int32_t UnicodeString::extract(int32_t start,
                               int32_t startLength,
                               char *target,
                               uint32_t targetLength,
                               const char *codepage) const

Copy the characters in the range [start, start + startLength) into an array of characters in a specified codepage. This function writes no more than targetLength characters but returns the length of the entire output string, so that the caller can allocate a larger buffer and call the function again if necessary. The output string is NUL-terminated if possible.
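The two-call pattern described above can be sketched as follows. This is a minimal, self-contained model and not ICU code: `extractLike` is a hypothetical stand-in that only mimics the documented contract (write at most the buffer capacity, NUL-terminate if there is room, return the full output length), and `extractAll` is an illustrative helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for an extract()-style call (NOT the ICU function).
// Writes at most `capacity` bytes, terminates only if there is room, and
// always returns the full length needed, excluding the NUL.
int32_t extractLike(const std::string &src, char *target, uint32_t capacity) {
    const uint32_t needed = static_cast<uint32_t>(src.size());
    if (target != nullptr && capacity > 0) {
        const uint32_t n = std::min(needed, capacity);
        std::memcpy(target, src.data(), n);
        if (n < capacity) {
            target[n] = '\0';  // NUL-terminate only when it fits
        }
    }
    return static_cast<int32_t>(needed);
}

// Try a small buffer first; if the returned length shows it was too small,
// allocate a larger buffer and call again.
std::string extractAll(const std::string &src) {
    std::vector<char> buf(8);
    int32_t needed = extractLike(src, buf.data(), static_cast<uint32_t>(buf.size()));
    if (static_cast<std::size_t>(needed) >= buf.size()) {
        buf.resize(static_cast<std::size_t>(needed) + 1);  // +1 for the NUL
        needed = extractLike(src, buf.data(), static_cast<uint32_t>(buf.size()));
    }
    assert(static_cast<std::size_t>(needed) < buf.size());
    return std::string(buf.data(), static_cast<std::size_t>(needed));
}
```

With the real function, the same shape would apply: compare the return value against the buffer size, and retry with a buffer of at least the returned length plus one.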

Recommendation: For invariant-character strings use extract(int32_t start, int32_t length, char *target, int32_t targetCapacity, enum EInvariant inv) const because it avoids object code dependencies of UnicodeString on the conversion code.

Parameters:
    start         offset of the first character to be copied
    startLength   the number of characters to extract
    target        the target buffer for extraction
    targetLength  the length of the target buffer
    codepage      the desired codepage for the characters. 0 has the special meaning of the default codepage. If codepage is an empty string (""), then a simple conversion is performed on the codepage-invariant subset ("invariant characters") of the platform encoding. See utypes.h. If target is NULL, then the number of bytes required for target is returned.

Returns:
    the output string length, not including the terminating NUL

Stable:
    ICU 2.0
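The three meanings of the codepage argument can be summarized in a small sketch. The enum and function names below are hypothetical and exist only for illustration; they are not part of ICU.

```cpp
#include <cassert>

// Hypothetical labels for the three cases described for `codepage`.
enum class ConversionKind { DefaultCodepage, InvariantCharacters, NamedCodepage };

// Classify a codepage argument the way the parameter description reads:
// a null pointer selects the default codepage, an empty string selects the
// invariant-character conversion, and anything else names a codepage.
ConversionKind classifyCodepage(const char *codepage) {
    if (codepage == nullptr) {
        return ConversionKind::DefaultCodepage;
    }
    if (*codepage == '\0') {
        return ConversionKind::InvariantCharacters;
    }
    return ConversionKind::NamedCodepage;
}
```

The order of the checks matters: the pointer must be tested for null before it is dereferenced to test for the empty string.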

Definition at line 140 of file unistr_cnv.cpp.

References toUTF8() and U_ZERO_ERROR.

    // if the arguments are illegal, then do nothing
    if(/*dstSize < 0 || */(dstSize > 0 && target == 0)) {
        return 0;
    }

    // pin the indices to legal values
    pinIndices(start, length);

    // We need to cast dstSize to int32_t for all subsequent code.
    // I don't know why the API was defined with uint32_t but we are stuck with it.
    // Also, dstSize==0xffffffff means "unlimited" but if we use target+dstSize
    // as a limit in some functions, it may wrap around and yield a pointer
    // that compares less-than target.
    int32_t capacity;
    if(dstSize < 0x7fffffff) {
        // Assume that the capacity is real and a limit pointer won't wrap around.
        capacity = (int32_t)dstSize;
    } else {
        char *targetLimit = target + 0x7fffffff;
        if(targetLimit < target) {
            // Pin the capacity so that a limit pointer does not wrap around.
            targetLimit = (char *)U_MAX_PTR(target);
            capacity = (int32_t)(targetLimit - target);
        } else {
            // Pin the capacity to the maximum int32_t value.
            capacity = 0x7fffffff;
        }
    }

    // create the converter
    UConverter *converter;
    UErrorCode status = U_ZERO_ERROR;

    // just write the NUL if the string length is 0
    if(length == 0) {
        return u_terminateChars(target, capacity, 0, &status);
    }

    // if the codepage is the default, use our cache
    // if it is an empty string, then use the "invariant character" conversion
    if (codepage == 0) {
        const char *defaultName = ucnv_getDefaultName();
        if(UCNV_FAST_IS_UTF8(defaultName)) {
            return toUTF8(start, length, target, capacity);
        }
        converter = u_getDefaultConverter(&status);
    } else if (*codepage == 0) {
        // use the "invariant characters" conversion
        int32_t destLength;
        if(length <= capacity) {
            destLength = length;
        } else {
            destLength = capacity;
        }
        u_UCharsToChars(getArrayStart() + start, target, destLength);
        return u_terminateChars(target, capacity, length, &status);
    } else {
        converter = ucnv_open(codepage, &status);
    }

    length = doExtract(start, length, target, capacity, converter, status);

    // close the converter
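    if (codepage == 0) {
        // the default converter came from the cache, so hand it back
        u_releaseDefaultConverter(converter);
    } else {
        // a converter opened with ucnv_open() must be closed
        ucnv_close(converter);
    }
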
    return length;
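
The capacity pinning in the listing above can be illustrated in isolation. This is a simplified sketch under stated assumptions, not the ICU implementation: `pinCapacity` is a hypothetical name, the buffer address is passed as an integer so that the wrap-around check is plain unsigned arithmetic, and the end of the address space is taken to be UINTPTR_MAX rather than whatever U_MAX_PTR computes on a given platform.

```cpp
#include <cassert>
#include <cstdint>

// Convert an unsigned buffer size into a non-negative int32_t capacity
// such that (address + capacity) cannot wrap around the address space.
int32_t pinCapacity(std::uintptr_t targetAddress, uint32_t dstSize) {
    if (dstSize < 0x7fffffffu) {
        // Assume the capacity is real and a limit pointer will not wrap.
        return static_cast<int32_t>(dstSize);
    }
    // Bytes available between the buffer start and the top of the address space.
    const std::uintptr_t room = UINTPTR_MAX - targetAddress;
    if (room < 0x7fffffffu) {
        // Pin so that a limit pointer does not wrap around.
        return static_cast<int32_t>(room);
    }
    // Pin to the maximum int32_t value.
    return 0x7fffffff;
}
```

Working on integer addresses avoids forming an out-of-range pointer such as target + 0x7fffffff, which the listing relies on but which standard C++ does not define.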
