
CollationKey & RuleBasedCollator::getCollationKey ( const UnicodeString & source,
CollationKey & sortkey,
UErrorCode & status
) const [virtual]

Transforms a specified region of the string into a series of characters that can be compared with CollationKey.compare. Use a CollationKey when you need to do repeated comparisons on the same string. For a single comparison, the compare method will be faster.

Parameters:
    source  the source string.
    key     the transformed key of the source string.
    status  the error code status.
Returns:
    the transformed key.
Deprecated:
    ICU 2.8. Use getSortKey(...) instead.

Retrieve a collation key for the specified string. The key can be compared with other collation keys using a bitwise comparison (e.g. memcmp) to find the ordering of their respective source strings. This is handy when doing a sort, where each sort key must be compared many times.
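As a minimal sketch of why precomputed keys help during a sort, the following compares keys bytewise with memcmp and sorts (key, source) pairs by key alone. The names keyLess and sortByKey are hypothetical and not part of ICU, and the keys are assumed to have been produced earlier by a collator.

```cpp
#include <algorithm>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: bitwise comparison of two sort keys held as byte strings.
// If one key is a prefix of the other, the shorter key orders first.
bool keyLess(const std::string& a, const std::string& b) {
    const std::size_t n = std::min(a.size(), b.size());
    const int c = std::memcmp(a.data(), b.data(), n);
    return c < 0 || (c == 0 && a.size() < b.size());
}

// Hypothetical helper: sorts (key, source) pairs by key only. Each key is
// computed once up front, so the sort never re-collates the source strings.
void sortByKey(std::vector<std::pair<std::string, std::string>>& keyed) {
    std::sort(keyed.begin(), keyed.end(),
              [](const std::pair<std::string, std::string>& x,
                 const std::pair<std::string, std::string>& y) {
                  return keyLess(x.first, y.first);
              });
}
```

Each comparison inside the sort is then a plain byte comparison, which is where the saving over repeated calls to compare comes from.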

The basic algorithm here is to find all of the collation elements for each character in the source string, convert them to an ASCII representation, and put them into the collation key. But it's trickier than that. Each collation element in a string has three components: primary ('A' vs 'B'), secondary ('u' vs 'ü'), and tertiary ('A' vs 'a'), and a primary difference at the end of a string takes precedence over a secondary or tertiary difference earlier in the string.

To account for this, we put all of the primary orders at the beginning of the string, followed by the secondary and tertiary orders. Each set of orders is terminated by nulls so that a key for a string which is an initial substring of another key will compare less without any special case.

Here's a hypothetical example, with the collation element represented as a three-digit number, one digit for primary, one for secondary, etc.

    String:              A     a     B     É
    Collation Elements:  101   100   201   511
    Collation Key:       1125<null>0001<null>1011<null>
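The layout in this hypothetical example can be sketched in a few lines. This is only an illustration of the level-by-level ordering under the same three-digit assumption, not the code in tblcoll.cpp; buildToyKey is a made-up name.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: each collation element is a three-digit number
// (hundreds = primary, tens = secondary, units = tertiary).
std::string buildToyKey(const std::vector<int>& elements) {
    std::string primary, secondary, tertiary;
    for (int ce : elements) {
        primary   += static_cast<char>('0' + (ce / 100) % 10);
        secondary += static_cast<char>('0' + (ce / 10) % 10);
        tertiary  += static_cast<char>('0' + ce % 10);
    }
    // All primaries first, then secondaries, then tertiaries,
    // each level terminated by a null byte.
    std::string key;
    key += primary;
    key.push_back('\0');
    key += secondary;
    key.push_back('\0');
    key += tertiary;
    key.push_back('\0');
    return key;
}
```

Calling buildToyKey with the elements 101, 100, 201, 511 yields the bytes 1125, null, 0001, null, 1011, null. Because every level ends in a null, the key for the single element 101 compares less than the key for 101, 100 with no special handling.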

To make things even trickier, secondary differences (accent marks) are compared starting at the *end* of the string in languages with French secondary ordering. But when comparing the accent marks on a single base character, they are compared from the beginning. To handle this, we reverse all of the accents that belong to each base character, then we reverse the entire string of secondary orderings at the end.
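The double reversal can be sketched as follows, assuming the secondary weights have already been grouped by base character. The function name and the grouped input shape are assumptions made for illustration and are not taken from the ICU source.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: accentsPerBase[i] holds the secondary weights of the
// accents attached to the i-th base character, in source order.
std::vector<int> frenchSecondaryOrder(
        const std::vector<std::vector<int>>& accentsPerBase) {
    std::vector<int> out;
    for (std::vector<int> group : accentsPerBase) {  // copy, input stays untouched
        // Step 1: reverse the accents that belong to one base character.
        std::reverse(group.begin(), group.end());
        out.insert(out.end(), group.begin(), group.end());
    }
    // Step 2: reverse the entire run of secondary weights.
    std::reverse(out.begin(), out.end());
    return out;
}
```

For the groups {1, 2}, {3}, {4, 5, 6}, the result is 4, 5, 6, 3, 1, 2: base characters are visited from the end of the string, while the accents on each base character keep their original order.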

Implements Collator.

Definition at line 443 of file tblcoll.cpp.

References UnicodeString::getBuffer(), and UnicodeString::length().

    return getCollationKey(source.getBuffer(), source.length(), sortkey, status);

