
Normalizer2 Class Reference

#include <normalizer2.h>

Inheritance: Normalizer2 derives from UObject, which derives from UMemory. FilteredNormalizer2 derives from Normalizer2.



Detailed Description

Unicode normalization functionality for standard Unicode normalization or for using custom mapping tables. All instances of this class are unmodifiable/immutable. Instances returned by getInstance() are singletons that must not be deleted by the caller.

The primary functions are to produce a normalized string and to detect whether a string is already normalized. The most commonly used normalization forms are those defined in Unicode Standard Annex #15 (http://www.unicode.org/unicode/reports/tr15/). However, this API supports additional normalization forms for specialized purposes. For example, NFKC_Casefold is provided via getInstance(NULL, "nfkc_cf", UNORM2_COMPOSE, errorCode) and can be used in implementations of UTS #46.

In addition to the standard compose and decompose modes, further modes are provided, as documented in the UNormalization2Mode enum.

Some of the functions in this class identify normalization boundaries. At a normalization boundary, the portions of the string before it and starting from it do not interact and can be handled independently.

spanQuickCheckYes() stops at a normalization boundary. When the goal is a normalized string, the text before the boundary can be copied, and the remainder can be processed with normalizeSecondAndAppend().

The hasBoundaryBefore(), hasBoundaryAfter() and isInert() functions test whether a character is guaranteed to be at a normalization boundary, regardless of context. This is used for moving from one normalization boundary to the next or preceding boundary, and for performing iterative normalization.

Iterative normalization is useful when only a small portion of a longer string needs to be processed. For example, in ICU, iterative normalization is used by the NormalizationTransliterator (to avoid replacing already-normalized text) and ucol_nextSortKeyPart() (to process only the substring for which sort key bytes are computed).

The set of normalization boundaries returned by these functions may not be complete: there may be more boundaries that could be returned. Different functions may return different boundaries.

Introduced in ICU 4.4.

Definition at line 77 of file normalizer2.h.


Public Member Functions

virtual UnicodeString & append (UnicodeString &first, const UnicodeString &second, UErrorCode &errorCode) const =0
virtual UClassID getDynamicClassID () const =0
virtual UBool hasBoundaryAfter (UChar32 c) const =0
virtual UBool hasBoundaryBefore (UChar32 c) const =0
virtual UBool isInert (UChar32 c) const =0
virtual UBool isNormalized (const UnicodeString &s, UErrorCode &errorCode) const =0
virtual UnicodeString & normalize (const UnicodeString &src, UnicodeString &dest, UErrorCode &errorCode) const =0
UnicodeString normalize (const UnicodeString &src, UErrorCode &errorCode) const
virtual UnicodeString & normalizeSecondAndAppend (UnicodeString &first, const UnicodeString &second, UErrorCode &errorCode) const =0
virtual UNormalizationCheckResult quickCheck (const UnicodeString &s, UErrorCode &errorCode) const =0
virtual int32_t spanQuickCheckYes (const UnicodeString &s, UErrorCode &errorCode) const =0

Static Public Member Functions

static const Normalizer2 * getInstance (const char *packageName, const char *name, UNormalization2Mode mode, UErrorCode &errorCode)
static UClassID U_EXPORT2 getStaticClassID ()
static void U_EXPORT2 operator delete (void *, void *) U_NO_THROW
static void U_EXPORT2 operator delete (void *p) U_NO_THROW
static void U_EXPORT2 operator delete[] (void *p) U_NO_THROW
static void *U_EXPORT2 operator new (size_t, void *ptr) U_NO_THROW
static void *U_EXPORT2 operator new (size_t size) U_NO_THROW
static void *U_EXPORT2 operator new[] (size_t size) U_NO_THROW

Generated by Doxygen 1.6.0