
BoyerMooreSearch Class Reference

#include <bmsearch.h>

Inheritance: BoyerMooreSearch inherits UObject, which inherits UMemory.

Detailed Description


This object holds the information needed to do a collation-sensitive Boyer-Moore search. It encapsulates the pattern, the "bad character" and "good suffix" tables, the Collator-based data needed to compute them, and a reference to the text being searched.
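
For readers unfamiliar with the "bad character" table, the following is a minimal sketch of the idea in its simplest form (the Horspool variant of Boyer-Moore). It is not ICU's implementation: it compares raw bytes rather than collation elements, it omits the "good suffix" table, and the names buildBadCharTable and horspoolFind are hypothetical, introduced only for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Simplified, byte-oriented illustration only. ICU's BoyerMooreSearch works
// on collation elements and also uses a good-suffix table.
using BadCharTable = std::array<std::size_t, 256>;

// For each byte value, store how far the pattern may shift when that byte
// is the text byte aligned with the last pattern position.
BadCharTable buildBadCharTable(const std::string &pattern) {
    BadCharTable table;
    table.fill(pattern.size());  // bytes absent from the pattern allow a full shift
    for (std::size_t i = 0; i + 1 < pattern.size(); ++i) {
        table[static_cast<unsigned char>(pattern[i])] = pattern.size() - 1 - i;
    }
    return table;
}

// Returns the index of the first match at or after `from`, or
// std::string::npos. An empty pattern is treated as "no match".
std::size_t horspoolFind(const std::string &text, const std::string &pattern,
                         std::size_t from = 0) {
    const std::size_t m = pattern.size();
    const std::size_t n = text.size();
    if (m == 0 || m > n) {
        return std::string::npos;
    }
    const BadCharTable table = buildBadCharTable(pattern);
    std::size_t pos = from;
    while (pos + m <= n) {
        // Compare right to left, as Boyer-Moore does.
        std::size_t j = m;
        while (j > 0 && text[pos + j - 1] == pattern[j - 1]) {
            --j;
        }
        if (j == 0) {
            return pos;  // every pattern byte matched
        }
        // Shift by the table entry for the text byte under the last pattern position.
        pos += table[static_cast<unsigned char>(text[pos + m - 1])];
    }
    return std::string::npos;
}
```

The table lets the search skip several positions after a mismatch instead of advancing one position at a time. A collation-sensitive search applies the same idea to collation elements, which is why the class needs Collator-based data to build its tables.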

To do a search, you first need to get a CollData object by calling CollData::open. Then construct a BoyerMooreSearch object from the CollData object, the pattern string and the target string, and call the search method. Here's a code sample:

 void boyerMooreExample(UCollator *collator, UnicodeString *pattern, UnicodeString *target)
 {
     UErrorCode status = U_ZERO_ERROR;
     CollData *collData = CollData::open(collator, status);

     if (U_FAILURE(status)) {
         // could not create a CollData object
         return;
     }

     BoyerMooreSearch *search = new BoyerMooreSearch(collData, *pattern, target, status);

     if (U_FAILURE(status)) {
         // could not create a BoyerMooreSearch object
         delete search;
         CollData::close(collData);
         return;
     }

     int32_t offset = 0, start = -1, end = -1;

     // Find all matches
     while (search->search(offset, start, end)) {
         // process the match between start and end

         // advance past the match
         offset = end;
     }

     // at this point, if offset == 0, there were no matches
     if (offset == 0) {
         // handle the case of no matches
     }

     delete search;
     CollData::close(collData);

     // CollData objects are cached, so the call to
     // CollData::close doesn't delete the object.
     // Call this if you don't need the object any more.
     CollData::flushCollDataCache();
 }
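
The match loop in the sample above can be tried without an ICU build by using a stand-in with the same call shape as the search method. The sketch below is only an illustration: NaiveSearch and findAll are hypothetical names, the matching is plain std::string::find with no collation, and treating end as an exclusive offset is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in with the same call shape as BoyerMooreSearch::search.
// It does plain code-unit matching via std::string::find, with no collation.
class NaiveSearch {
public:
    NaiveSearch(std::string pattern, std::string target)
        : pattern_(std::move(pattern)), target_(std::move(target)) {}

    // On success, sets [start, end) to the match and returns true.
    // Treating `end` as exclusive is an assumption of this sketch.
    bool search(int32_t offset, int32_t &start, int32_t &end) const {
        // Rejecting empty patterns keeps the caller's loop from stalling
        // on zero-length matches.
        if (pattern_.empty() || offset < 0 ||
            static_cast<std::size_t>(offset) > target_.size()) {
            return false;
        }
        const std::size_t found =
            target_.find(pattern_, static_cast<std::size_t>(offset));
        if (found == std::string::npos) {
            return false;
        }
        start = static_cast<int32_t>(found);
        end = static_cast<int32_t>(found + pattern_.size());
        return true;
    }

private:
    std::string pattern_;
    std::string target_;
};

// Collects every non-overlapping match, resuming each search at the
// end of the previous match.
std::vector<std::pair<int32_t, int32_t>> findAll(const NaiveSearch &s) {
    std::vector<std::pair<int32_t, int32_t>> matches;
    int32_t offset = 0, start = -1, end = -1;
    while (s.search(offset, start, end)) {
        matches.emplace_back(start, end);
        offset = end;  // advance past the match
    }
    return matches;
}
```

Because offset only changes when a match is found, it is still 0 after the loop exactly when there were no matches, which is the check the sample relies on.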

NOTE: This is a technology preview. The final version of this API may not bear any resemblance to this API.

Known limitations:

1) Backwards searching has not been implemented.

2) For Han and Hangul characters, this code ignores any Collation tailorings. In general, this isn't a problem, but in Korean locales, at strength 1, Hangul characters are tailored to be equal to Han characters with the same pronunciation. Because this code ignores tailorings, searching for a Hangul character will not find a Han character and vice versa.

3) In some cases, searching for a pattern that needs to be normalized and ends in a discontiguous contraction may fail. The only known cases of this are with the Tibetan script. For example, searching for the pattern "\u0F7F\u0F80\u0F81\u0F82\u0F83\u0F84\u0F85" will fail. (This case is artificial. We've been unable to find a practical, real-world example of this failure.)

For internal use only.

ICU 4.0.1 technology preview

Definition at line 107 of file bmsearch.h.

Public Member Functions

 BoyerMooreSearch (CollData *theData, const UnicodeString &patternString, const UnicodeString *targetString, UErrorCode &status)
UBool empty ()
BadCharacterTable * getBadCharacterTable ()
CollData * getData ()
virtual UClassID getDynamicClassID () const
GoodSuffixTable * getGoodSuffixTable ()
CEList * getPatternCEs ()
UBool search (int32_t offset, int32_t &start, int32_t &end)
void setTargetString (const UnicodeString *targetString, UErrorCode &status)
 ~BoyerMooreSearch ()

Static Public Member Functions

static UClassID getStaticClassID ()
static void U_EXPORT2 operator delete (void *, void *) U_NO_THROW
static void U_EXPORT2 operator delete (void *p) U_NO_THROW
static void U_EXPORT2 operator delete[] (void *p) U_NO_THROW
static void *U_EXPORT2 operator new (size_t, void *ptr) U_NO_THROW
static void *U_EXPORT2 operator new (size_t size) U_NO_THROW
static void *U_EXPORT2 operator new[] (size_t size) U_NO_THROW

Private Attributes

BadCharacterTable * badCharacterTable
GoodSuffixTable * goodSuffixTable
UnicodeString pattern
Target * target
