
U_STABLE int32_t U_EXPORT2 uregex_split ( URegularExpression *  regexp,
UChar *  destBuf,
int32_t  destCapacity,
int32_t *  requiredCapacity,
UChar *  destFields[],
int32_t  destFieldsCapacity,
UErrorCode *  status 
)

Split a string into fields. Somewhat like split() from Perl. The pattern matches identify delimiters that separate the input into fields. The input data between the matches becomes the fields themselves.

Each of the fields is copied from the input string to the destination buffer, and NUL terminated. The position of each field within the destination buffer is returned in the destFields array.
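The buffer layout described above can be illustrated with a small stand-alone sketch. It does not call ICU: `UChar16`, `split_copy`, and `split_copy_demo` are hypothetical names, and a single delimiter code unit stands in for the regular expression. The only point is to show how NUL-terminated copies are packed into one destination buffer while an array of pointers records where each field begins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t UChar16;  /* stand-in for ICU's UChar (hypothetical name) */

/*
 * Hypothetical helper, not ICU's implementation: split `src` on a single
 * delimiter code unit, copy each field into destBuf followed by a NUL,
 * and store a pointer to each copy in destFields.
 * Returns the number of fields, or -1 if either destination is too small.
 */
static int32_t split_copy(const UChar16 *src, UChar16 delim,
                          UChar16 *destBuf, int32_t destCapacity,
                          UChar16 *destFields[], int32_t destFieldsCapacity)
{
    int32_t out = 0;       /* next free slot in destBuf */
    int32_t nFields = 0;
    const UChar16 *p = src;

    for (;;) {
        if (nFields >= destFieldsCapacity) return -1;
        destFields[nFields++] = destBuf + out;    /* field starts here */
        while (*p != 0 && *p != delim) {
            if (out >= destCapacity) return -1;
            destBuf[out++] = *p++;                /* copy field data */
        }
        if (out >= destCapacity) return -1;
        destBuf[out++] = 0;                       /* NUL-terminate the copy */
        if (*p == 0) break;                       /* end of input */
        p++;                                      /* skip the delimiter */
    }
    return nFields;
}

/* Splits "ab,c" and checks where each field landed in the buffer. */
static void split_copy_demo(void)
{
    const UChar16 src[] = { 'a', 'b', ',', 'c', 0 };
    UChar16 buf[8];
    UChar16 *fields[4];

    int32_t n = split_copy(src, ',', buf, 8, fields, 4);
    assert(n == 2);
    /* buf now holds: 'a' 'b' NUL 'c' NUL */
    assert(fields[0] == buf);
    assert(fields[1] == buf + 3);
    assert(fields[1][0] == 'c' && fields[1][1] == 0);
}
```

Because every field is NUL terminated inside the same buffer, each `fields[i]` can be handed directly to code that expects a terminated string, which is the convenience the copying design buys.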

Note: another choice for the design of this function would be to not copy the resulting fields at all, but to return indexes and lengths within the source text.

Advantages would be
o Faster. No copying.
o Nothing extra needed when field data may contain embedded NUL chars.
o Less memory needed if working on large data.

Disadvantages
o Less consistent with C++ split, which copies into an array of UnicodeStrings.
o No NUL termination, so extracted fields would be less convenient to use in most cases.
o Possible problems in the future, when support for Unicode Normalization could cause the fields to not correspond exactly to a range of the source text.
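For comparison, the alternative design mentioned in the note (returning indexes and lengths instead of copies) might look roughly like the following. This is only a sketch of the idea, not anything ICU provides: `UChar16`, `FieldRange`, `split_ranges`, and `split_ranges_demo` are hypothetical names, and a single delimiter code unit again stands in for the regular expression.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t UChar16;  /* stand-in for ICU's UChar (hypothetical name) */

/* Hypothetical result type: one field as a range of the source text. */
typedef struct {
    int32_t start;   /* index of the first code unit of the field */
    int32_t length;  /* number of code units in the field */
} FieldRange;

/*
 * Hypothetical helper: record each field as (start, length) within `src`.
 * Nothing is copied, and an explicit srcLength means embedded NULs are
 * ordinary data. Returns the number of fields, or -1 if `ranges` is too small.
 */
static int32_t split_ranges(const UChar16 *src, int32_t srcLength,
                            UChar16 delim,
                            FieldRange *ranges, int32_t rangesCapacity)
{
    int32_t nFields = 0;
    int32_t fieldStart = 0;

    for (int32_t i = 0; i <= srcLength; i++) {
        /* A field ends at a delimiter or at the end of the input. */
        if (i == srcLength || src[i] == delim) {
            if (nFields >= rangesCapacity) return -1;
            ranges[nFields].start = fieldStart;
            ranges[nFields].length = i - fieldStart;
            nFields++;
            fieldStart = i + 1;
        }
    }
    return nFields;
}

/* The first field contains an embedded NUL and is still reported whole. */
static void split_ranges_demo(void)
{
    const UChar16 src[] = { 'a', 0, 'b', ',', 'c' };
    FieldRange r[4];

    int32_t n = split_ranges(src, 5, ',', r, 4);
    assert(n == 2);
    assert(r[0].start == 0 && r[0].length == 3);
    assert(r[1].start == 4 && r[1].length == 1);
}
```

The trade-off listed in the note is visible here: the caller gets no terminated strings and must carry a length alongside every field.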

Parameters:
regexp The compiled regular expression.
destBuf A (UChar *) buffer to receive the fields that are extracted from the input string. The field pointers stored in destFields refer to positions within this caller-supplied destination buffer. Any extra positions within the destFields array will be set to NULL.
destCapacity The capacity of the destBuf.
requiredCapacity The actual capacity required of the destBuf. If destCapacity is too small, requiredCapacity receives the total capacity required to hold all of the output, and status is set to U_BUFFER_OVERFLOW_ERROR.
destFields An array to be filled with the position of each of the extracted fields within destBuf.
destFieldsCapacity The number of elements in the destFields array. If the number of fields found is less than destFieldsCapacity, the extra destFields elements are set to zero. If destFieldsCapacity is too small, the trailing part of the input, including any field delimiters, is treated as if it were the last field: it is copied to the destBuf, and its position in the destBuf is stored in the last element of destFields. This behavior mimics that of Perl. It is not an error condition, and no error status is returned when all destFields positions are used.
status A reference to a UErrorCode to receive any errors.
Returns:
The number of fields into which the input string was split.
Stable:
ICU 3.0
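The capacity rules in the parameter list can be sketched in the same stand-alone style. The code below is a hypothetical model, not ICU's implementation: `UChar16`, `split_limited`, and `split_limited_demo` are invented names, a single delimiter code unit replaces the regular expression, and an `overflow` flag stands in for U_BUFFER_OVERFLOW_ERROR. It assumes that the required capacity counts the terminating NULs and that field pointers are left NULL when the buffer is too small; the text above does not specify either detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t UChar16;  /* stand-in for ICU's UChar (hypothetical name) */

/*
 * Hypothetical model of the documented capacity rules.
 * Requires destFieldsCapacity >= 1. When only one destFields slot is left,
 * the rest of the input (delimiters included) becomes the last field.
 * *requiredCapacity receives the units needed, NULs included (an assumption).
 * *overflow is set to 1 if destBuf was too small.
 */
static int32_t split_limited(const UChar16 *src, UChar16 delim,
                             UChar16 *destBuf, int32_t destCapacity,
                             int32_t *requiredCapacity,
                             UChar16 *destFields[], int32_t destFieldsCapacity,
                             int *overflow)
{
    int32_t out = 0, nFields = 0;
    const UChar16 *p = src;

    for (;;) {
        int lastSlot = (nFields == destFieldsCapacity - 1);
        destFields[nFields++] = (out < destCapacity) ? destBuf + out : NULL;
        /* In the last slot, delimiters are copied as ordinary data. */
        while (*p != 0 && (lastSlot || *p != delim)) {
            if (out < destCapacity) destBuf[out] = *p;
            out++;
            p++;
        }
        if (out < destCapacity) destBuf[out] = 0;
        out++;                         /* count the NUL even when not stored */
        if (*p == 0) break;
        p++;                           /* skip the delimiter */
    }
    /* Unused destFields entries are cleared. */
    for (int32_t i = nFields; i < destFieldsCapacity; i++) destFields[i] = NULL;
    *requiredCapacity = out;
    *overflow = (out > destCapacity);
    return nFields;
}

static void split_limited_demo(void)
{
    const UChar16 src[] = { 'a', ',', 'b', ',', 'c', 0 };
    UChar16 buf[8];
    UChar16 *fields[4];
    int32_t needed = 0;
    int overflow = 0;

    /* Only two slots: the second field receives the remainder "b,c". */
    int32_t n = split_limited(src, ',', buf, 8, &needed, fields, 2, &overflow);
    assert(n == 2 && !overflow && needed == 6);
    assert(fields[1] == buf + 2 && fields[1][1] == ',' && fields[1][3] == 0);

    /* Preflight: NULL buffer with zero capacity reports the size needed. */
    n = split_limited(src, ',', NULL, 0, &needed, fields, 4, &overflow);
    assert(n == 3 && overflow && needed == 6);
    assert(fields[3] == NULL);
}
```

A caller following this pattern would make the preflight call first, allocate `needed` code units, and then call again with the real buffer.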

Definition at line 1741 of file uregex.cpp.

References FALSE, NULL, and U_ILLEGAL_ARGUMENT_ERROR.

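{
    // regexp2 is the URegularExpression * argument (documented above as regexp).
    RegularExpression *regexp = (RegularExpression*)regexp2;
    if (validateRE(regexp, status) == FALSE) {
        return 0;
    }
    // Reject a NULL destBuf (unless destCapacity is 0), a negative capacity,
    // a NULL destFields array, or a destFields array with no room for a field.
    if ((destBuf == NULL && destCapacity > 0) ||
        destCapacity < 0 ||
        destFields == NULL ||
        destFieldsCapacity < 1 ) {
        *status = U_ILLEGAL_ARGUMENT_ERROR;
        return 0;
    }

    // Arguments are valid; delegate the actual splitting.
    return RegexCImpl::split(regexp, destBuf, destCapacity, requiredCapacity, destFields, destFieldsCapacity, status);
}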

