
U_STABLE int32_t U_EXPORT2 u_unescape(const char *src, UChar *dest, int32_t destCapacity)

Unescape a string of characters and write the resulting Unicode characters to the destination buffer. The following escape sequences are recognized:

\uhhhh       4 hex digits; h in [0-9A-Fa-f]
\Uhhhhhhhh   8 hex digits
\xhh         1-2 hex digits
\x{h...}     1-8 hex digits
\ooo         1-3 octal digits; o in [0-7]
\cX          control-X; X is masked with 0x1F

as well as the standard ANSI C escapes:

\a => U+0007, \b => U+0008, \t => U+0009, \n => U+000A, \v => U+000B, \f => U+000C, \r => U+000D, \e => U+001B, \" => U+0022, \' => U+0027, \? => U+003F, \\ => U+005C
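
The single-character escapes and the \cX rule can be sketched as a lookup plus a mask. This is an illustration only, not the ICU code: simple_escape and control_escape are hypothetical names, and the sketch assumes an ASCII-based compiler codepage.

```c
#include <stdint.h>

/* Hypothetical helper: map the character that follows a backslash to the
 * code point listed above. Returns -1 if c is not one of those escapes. */
static int32_t simple_escape(char c) {
    switch (c) {
    case 'a':  return 0x07;
    case 'b':  return 0x08;
    case 't':  return 0x09;
    case 'n':  return 0x0A;
    case 'v':  return 0x0B;
    case 'f':  return 0x0C;
    case 'r':  return 0x0D;
    case 'e':  return 0x1B;
    case '"':  return 0x22;
    case '\'': return 0x27;
    case '?':  return 0x3F;
    case '\\': return 0x5C;
    default:   return -1;
    }
}

/* Hypothetical helper: value of \cX, that is, X masked with 0x1F.
 * Assumes x is an ASCII character. */
static int32_t control_escape(char x) {
    return (int32_t)(x & 0x1F);
}
```

Under these assumptions, control_escape('A') yields U+0001 and control_escape('[') yields U+001B, the same code point as \e.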

Anything else following a backslash is generically escaped. For example, "[a\\-z]" returns "[a-z]".

If an escape sequence is ill-formed, this function stores an empty string in dest (if possible) and returns 0. An example of an ill-formed sequence is "\\u" followed by fewer than 4 hex digits.
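
As an illustration of the ill-formed case, the following minimal sketch parses the four hex digits that must follow "\u" and reports failure when fewer are present. The names hex_value and parse_u4 are hypothetical and are not part of ICU.

```c
#include <stdint.h>

/* Value of one hex digit, or -1 if c is not in [0-9A-Fa-f]. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: s points just past the 'u' of a "\u" escape.
 * Returns the parsed value, or -1 if fewer than four hex digits follow
 * (the ill-formed case described above). */
static int32_t parse_u4(const char *s) {
    int32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        int d = hex_value(s[i]);   /* a zero terminator also yields -1 */
        if (d < 0) {
            return -1;
        }
        value = (value << 4) | d;
    }
    return value;
}
```

The loop stops at the first non-hex character, so it never reads past the zero terminator of a short string.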

The above characters are recognized in the compiler's codepage, that is, they are coded as 'u', '\', etc. Characters that are not parts of escape sequences are converted using u_charsToUChars().

This function is similar to UnicodeString::unescape() but not identical to it. The latter takes a source UnicodeString, so it does escape recognition but no conversion.

Parameters:
src           a zero-terminated string of invariant characters
dest          pointer to buffer to receive converted and unescaped text and, if there is room, a zero terminator. May be NULL for preflighting, in which case no UChars will be written, but the return value will still be valid. On error, an empty string is stored here (if possible).
destCapacity  the number of UChars that may be written at dest. Ignored if dest == NULL.
Returns:
the length of the unescaped string in UChars, not counting the zero terminator; 0 if an escape sequence is ill-formed.
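
The dest == NULL preflighting contract lends itself to a two-pass pattern: ask for the length first, then allocate and fill. The sketch below shows that pattern against a toy stand-in so that it compiles without ICU. sketch_unescape, unescape_alloc and UnitSketch are hypothetical names; the stand-in only understands \n and \t plus the generic escape, and it treats a trailing backslash as ill-formed. With ICU available, the same two calls would presumably be made to u_unescape with a UChar buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint16_t UnitSketch;   /* stand-in for ICU's UChar */

/* Toy stand-in for u_unescape, NOT the ICU implementation. It follows the
 * documented contract: dest may be NULL, at most destCapacity units are
 * written, a terminator is added only if there is room, and the return
 * value is the unescaped length. */
static int32_t sketch_unescape(const char *src, UnitSketch *dest,
                               int32_t destCapacity) {
    int32_t i = 0;
    while (*src != 0) {
        char c = *src++;
        if (c == '\\') {
            if (*src == 0) {            /* trailing backslash: error path */
                if (dest != NULL && destCapacity > 0) {
                    dest[0] = 0;
                }
                return 0;
            }
            c = *src++;
            if (c == 'n') {
                c = '\n';
            } else if (c == 't') {
                c = '\t';
            }                           /* anything else: generic escape */
        }
        if (dest != NULL && i < destCapacity) {
            dest[i] = (UnitSketch)(unsigned char)c;
        }
        ++i;                            /* count even when nothing is written */
    }
    if (dest != NULL && i < destCapacity) {
        dest[i] = 0;
    }
    return i;
}

/* Two-pass use: preflight for the length, then fill a right-sized buffer.
 * Returns a malloc'd, zero-terminated buffer (caller frees), or NULL if
 * allocation fails. */
static UnitSketch *unescape_alloc(const char *src, int32_t *outLen) {
    int32_t len = sketch_unescape(src, NULL, 0);
    UnitSketch *buf = malloc(((size_t)len + 1) * sizeof *buf);
    if (buf == NULL) {
        return NULL;
    }
    *outLen = sketch_unescape(src, buf, len + 1);
    return buf;
}
```

Because an ill-formed input and an empty input both yield 0, a caller that needs to tell them apart has to check whether src itself was empty.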
See also:
u_unescapeAt

UnicodeString::unescape()

UnicodeString::unescapeAt()

Stable: ICU 2.0

Definition at line 1361 of file ustring.c.

References NULL, UTF_APPEND_CHAR_UNSAFE, and UTF_CHAR_LENGTH.
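
For readers unfamiliar with the two macros, the following sketch shows the UTF-16 arithmetic they are understood to perform: one code unit for code points up to U+FFFF, a surrogate pair above that. utf16_length and utf16_append_unsafe are hypothetical function equivalents written for illustration, not the macro definitions from the ICU headers.

```c
#include <stdint.h>

/* Number of UTF-16 code units needed for code point c (1 or 2). */
static int32_t utf16_length(uint32_t c) {
    return c <= 0xFFFF ? 1 : 2;
}

/* Append c at dest[*i] and advance *i. "Unsafe" here means no bounds or
 * validity checks: the caller must ensure there is room and that
 * c <= 0x10FFFF. */
static void utf16_append_unsafe(uint16_t *dest, int32_t *i, uint32_t c) {
    if (c <= 0xFFFF) {
        dest[(*i)++] = (uint16_t)c;
    } else {
        dest[(*i)++] = (uint16_t)((c >> 10) + 0xD7C0);    /* lead surrogate */
        dest[(*i)++] = (uint16_t)((c & 0x3FF) | 0xDC00);  /* trail surrogate */
    }
}
```

This is why the function body checks the length against the remaining capacity before appending: the append step itself does no checking.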

{
    const char *segment = src;
    int32_t i = 0;
    char c;

    while ((c=*src) != 0) {
        /* '\\' intentionally written as compiler-specific
         * character constant to correspond to compiler-specific
         * char* constants. */
        if (c == '\\') {
            int32_t lenParsed = 0;
            UChar32 c32;
            if (src != segment) {
                if (dest != NULL) {
                    _appendUChars(dest + i, destCapacity - i,
                                  segment, (int32_t)(src - segment));
                }
                i += (int32_t)(src - segment);
            }
            ++src; /* advance past '\\' */
            c32 = (UChar32)u_unescapeAt(_charPtr_charAt, &lenParsed, (int32_t)uprv_strlen(src), (void*)src);
            if (lenParsed == 0) {
                goto err;
            }
            src += lenParsed; /* advance past escape seq. */
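            /* Write c32 only if it fits; otherwise just count its length.
             * UTF_APPEND_CHAR_UNSAFE advances i itself. */
            if (dest != NULL && UTF_CHAR_LENGTH(c32) <= (destCapacity - i)) {
                UTF_APPEND_CHAR_UNSAFE(dest, i, c32);
            } else {
                i += UTF_CHAR_LENGTH(c32);
            }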
            segment = src;
        } else {
            ++src;
        }
    }
    /* Flush the literal text after the last escape sequence, if any. */
    if (src != segment) {
        if (dest != NULL) {
            _appendUChars(dest + i, destCapacity - i,
                          segment, (int32_t)(src - segment));
        }
        i += (int32_t)(src - segment);
    }
    if (dest != NULL && i < destCapacity) {
        dest[i] = 0;
    }
    return i;

 err:
    if (dest != NULL && destCapacity > 0) {
        *dest = 0;
    }
    return 0;
}

