
UnicodeSet& UnicodeSet::applyPropertyPattern(const UnicodeString& pattern,
                                             ParsePosition& ppos,
                                             UErrorCode& ec) [private]

Parse the given property pattern at the given parse position and set this UnicodeSet to the result.

The original design document is out of date, but still useful. Ignore the property and value names: http://source.icu-project.org/repos/icu/icuhtml/trunk/design/unicodeset_properties.html

Recognized syntax:

[:foo:] [:^foo:] - white space not allowed within "[:" or ":]"
\p{foo} \P{foo}  - white space not allowed within "\p" or "\P"
\N{name}         - white space not allowed within "\N"

Other than the above restrictions, white space is ignored. Case is ignored except in "\p", "\P", and "\N". In 'name', leading and trailing white space is deleted, and internal runs of white space are collapsed to a single space.
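The white-space rule for 'name' can be illustrated with a minimal standalone sketch. The helper `normalizeName` is hypothetical and is not part of ICU; it works on `std::string` with ASCII white space only, whereas the library operates on UnicodeString.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not an ICU function): deletes leading and trailing
// white space and collapses each internal run to a single space.
std::string normalizeName(const std::string& in) {
    std::string out;
    bool pendingSpace = false;
    for (unsigned char c : in) {
        if (std::isspace(c)) {
            // Remember the gap; it is emitted only if more text follows
            // and some text has already been written.
            pendingSpace = !out.empty();
        } else {
            if (pendingSpace) out.push_back(' ');
            pendingSpace = false;
            out.push_back(static_cast<char>(c));
        }
    }
    return out;
}
```

Under this sketch, a name written as "  LATIN   SMALL LETTER  A " would be looked up as "LATIN SMALL LETTER A".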

We support binary properties, enumerated properties, and the following non-enumerated properties:

Numeric_Value
Name
Unicode_1_Name

Parameters:
pattern the pattern string
ppos on entry, the position at which to begin parsing. This should be one of the locations marked '^':

[:blah:]     \p{blah}     \P{blah}     \N{name}
^       %    ^       %    ^       %    ^       %

On return, the position after the last character parsed, that is, the locations marked '%'. If the parse fails, ppos is returned unchanged.

Returns:
a reference to this.
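The parse-position contract described for ppos can be sketched without ICU. The function `skipPropertyPattern` below is a hypothetical stand-in that only moves an index from a '^' location to the matching '%' location; it does not evaluate the property, and it omits the white-space skipping that the real parser performs after the opening delimiter.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the ppos contract (not ICU code).
// On success, advances pos past the close delimiter and returns true.
// On failure, returns false and leaves pos unchanged.
bool skipPropertyPattern(const std::string& pattern, std::size_t& pos) {
    // The shortest legal pattern, e.g. \p{L}, has 5 characters.
    if (pos + 5 > pattern.size()) return false;

    if (pattern.compare(pos, 2, "[:") == 0) {
        // POSIX-style form: the close delimiter is ":]" (2 characters).
        std::size_t close = pattern.find(":]", pos + 2);
        if (close == std::string::npos) return false;
        pos = close + 2;
        return true;
    }

    char kind = pattern[pos + 1];
    if (pattern[pos] == '\\' && (kind == 'p' || kind == 'P' || kind == 'N')) {
        // Perl-style form: requires "{" and closes with "}" (1 character).
        if (pattern[pos + 2] != '{') return false;
        std::size_t close = pattern.find('}', pos + 3);
        if (close == std::string::npos) return false;
        pos = close + 1;
        return true;
    }

    return false;  // no recognized opening delimiter at pos
}
```

A caller can therefore test the return value (in the real function, the error code) and trust that the index only moves when the whole pattern was consumed.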

Definition at line 1244 of file uniset_props.cpp.

References applyPropertyAlias(), UnicodeString::charAt(), complement(), UnicodeString::extractBetween(), FALSE, ParsePosition::getIndex(), UnicodeString::indexOf(), UnicodeString::length(), ParsePosition::setIndex(), TRUE, U_FAILURE, U_SUCCESS, and US_INV.

Referenced by applyPattern(), and applyPropertyPattern().

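UnicodeSet& UnicodeSet::applyPropertyPattern(const UnicodeString& pattern,
                                             ParsePosition& ppos,
                                             UErrorCode& ec) {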
    int32_t pos = ppos.getIndex();

    UBool posix = FALSE; // true for [:pat:], false for \p{pat} \P{pat} \N{pat}
    UBool isName = FALSE; // true for \N{pat}, otherwise false
    UBool invert = FALSE;

    if (U_FAILURE(ec)) return *this;

    // Minimum length is 5 characters, e.g. \p{L}
    if ((pos+5) > pattern.length()) {
        FAIL(ec);
    }

    // On entry, ppos should point to the start of an opening delimiter.
    // Look for an opening [:, [:^, \p, \P, or \N
    if (isPOSIXOpen(pattern, pos)) {
        posix = TRUE;
        pos += 2;
        pos = ICU_Utility::skipWhitespace(pattern, pos);
        if (pos < pattern.length() && pattern.charAt(pos) == COMPLEMENT) {
            ++pos;
            invert = TRUE;
        }
    } else if (isPerlOpen(pattern, pos) || isNameOpen(pattern, pos)) {
        UChar c = pattern.charAt(pos+1);
        invert = (c == UPPER_P);
        isName = (c == UPPER_N);
        pos += 2;
        pos = ICU_Utility::skipWhitespace(pattern, pos);
        if (pos == pattern.length() || pattern.charAt(pos++) != OPEN_BRACE) {
            // Syntax error; "\p", "\P", or "\N" not followed by "{"
            FAIL(ec);
        }
    } else {
        // Open delimiter not seen
        FAIL(ec);
    }

    // Look for the matching close delimiter, either :] or }
    int32_t close = pattern.indexOf(posix ? POSIX_CLOSE : PERL_CLOSE, pos);
    if (close < 0) {
        // Syntax error; close delimiter missing
        FAIL(ec);
    }

    // Look for an '=' sign.  If this is present, we will parse a
    // medium \p{gc=Cf} or long \p{GeneralCategory=Format}
    // pattern.
    int32_t equals = pattern.indexOf(EQUALS, pos);
    UnicodeString propName, valueName;
    if (equals >= 0 && equals < close && !isName) {
        // Equals seen; parse medium/long pattern
        pattern.extractBetween(pos, equals, propName);
        pattern.extractBetween(equals+1, close, valueName);
    } else {
        // Handle case where no '=' is seen, and \N{}
        pattern.extractBetween(pos, close, propName);
            
        // Handle \N{name}
        if (isName) {
            // This is a little inefficient since it means we have to
            // parse NAME_PROP back to UCHAR_NAME even though we already
            // know it's UCHAR_NAME.  If we refactor the API to
            // support args of (UProperty, char*) then we can remove
            // NAME_PROP and make this a little more efficient.
            valueName = propName;
            propName = UnicodeString(NAME_PROP, NAME_PROP_LENGTH, US_INV);
        }
    }

    applyPropertyAlias(propName, valueName, ec);

    if (U_SUCCESS(ec)) {
        if (invert) {
            complement();
        }
            
        // Move to the limit position after the close delimiter if the
        // parse succeeded.
        ppos.setIndex(close + (posix ? 2 : 1));
    }

    return *this;
}
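The step that splits the delimited text into a property name and a value name can be isolated as a small standalone sketch. `PropertyQuery` and `splitPropertyBody` are hypothetical names, and the string "na" is an assumed placeholder for the value of NAME_PROP, which the listing above does not show.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type (not an ICU type).
struct PropertyQuery {
    std::string propName;
    std::string valueName;
};

// Splits the text found between the open and close delimiters.
// isName is true for the \N{...} form, where '=' is not treated as a separator.
PropertyQuery splitPropertyBody(const std::string& body, bool isName) {
    PropertyQuery q;
    std::size_t equals = body.find('=');
    if (equals != std::string::npos && !isName) {
        // Medium or long form, e.g. gc=Cf or GeneralCategory=Format.
        q.propName = body.substr(0, equals);
        q.valueName = body.substr(equals + 1);
    } else {
        // Short form: the whole body is the property name, value left empty.
        q.propName = body;
        if (isName) {
            // \N{name}: the body is really the value of the Name property.
            q.valueName = q.propName;
            q.propName = "na";  // assumption: stands in for NAME_PROP
        }
    }
    return q;
}
```

In the listing, the two resulting strings are what gets passed on to applyPropertyAlias(), which decides how to interpret an empty value name.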

