#include <rbnf.h>
The resources contain three predefined formatters for each locale: spellout, which spells out a value in words (123 is "one hundred twenty-three"); ordinal, which appends an ordinal suffix to the end of a numeral (123 is "123rd"); and duration, which shows a duration in seconds as hours, minutes, and seconds (123 is "2:03"). The client can also define more specialized RuleBasedNumberFormats by supplying programmer-defined rule sets.
The behavior of a RuleBasedNumberFormat is specified by a textual description that is either passed to the constructor as a String or loaded from a resource bundle. In its simplest form, the description consists of a semicolon-delimited list of rules. Each rule has a string of output text and a value or range of values it is applicable to. In a typical spellout rule set, the first twenty rules are the words for the numbers from 0 to 19:
zero; one; two; three; four; five; six; seven; eight; nine; ten; eleven; twelve; thirteen; fourteen; fifteen; sixteen; seventeen; eighteen; nineteen;
For larger numbers, we can use the preceding set of rules to format the ones place, and we only have to supply the words for the multiples of 10:
20: twenty[->>]; 30: thirty[->>]; 40: forty[->>]; 50: fifty[->>]; 60: sixty[->>]; 70: seventy[->>]; 80: eighty[->>]; 90: ninety[->>];
In these rules, the base value is spelled out explicitly and set off from the rule's output text with a colon. The rules are in a sorted list, and a rule is applicable to all numbers from its own base value to one less than the next rule's base value. The ">>" token is called a substitution and tells the formatter to isolate the number's ones digit, format it using this same set of rules, and place the result at the position of the ">>" token. Text in brackets is omitted if the number being formatted is an even multiple of 10 (the hyphen is a literal hyphen; 24 is "twenty-four," not "twenty four").
For even larger numbers, we can actually look up several parts of the number in the list:
100: << hundred[ >>];
The "<<" represents a new kind of substitution. The << isolates the hundreds digit (and any digits to its left), formats it using this same rule set, and places the result where the "<<" was. Notice also that the meaning of >> has changed: it now refers to both the tens and the ones digits. The meaning of both substitutions depends on the rule's base value. The base value determines the rule's divisor, which is the highest power of 10 that is less than or equal to the base value (the user can change this). To fill in the substitutions, the formatter divides the number being formatted by the divisor. The integral quotient is used to fill in the << substitution, and the remainder is used to fill in the >> substitution. The meaning of the brackets changes similarly: text in brackets is omitted if the value being formatted is an even multiple of the rule's divisor. The rules are applied recursively, so if a substitution is filled in with text that includes another substitution, that substitution is also filled in.
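The divisor arithmetic described above can be sketched in a few lines of standard C++. This is only an illustration of the stated rule (highest power of 10 not exceeding the base value, quotient for "<<", remainder for ">>"), not the library's implementation; the names `ruleDivisor`, `splitByDivisor`, and `omitBracketedText` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Highest power of `radix` that is <= baseValue (1 when baseValue < radix).
// With the default radix of 10 this is the default divisor described above.
std::int64_t ruleDivisor(std::int64_t baseValue, std::int64_t radix = 10) {
    assert(baseValue >= 0 && radix >= 2);
    std::int64_t divisor = 1;
    // Comparing baseValue / divisor against the radix avoids overflow.
    while (baseValue / divisor >= radix) {
        divisor *= radix;
    }
    return divisor;
}

// The two values a normal rule feeds to its substitutions.
struct SubstitutionOperands {
    std::int64_t quotient;   // fills "<<"
    std::int64_t remainder;  // fills ">>"
};

SubstitutionOperands splitByDivisor(std::int64_t number, std::int64_t baseValue) {
    const std::int64_t divisor = ruleDivisor(baseValue);
    return {number / divisor, number % divisor};
}

// Bracketed text is dropped when the number is an exact multiple of the divisor.
bool omitBracketedText(std::int64_t number, std::int64_t baseValue) {
    return number % ruleDivisor(baseValue) == 0;
}
```

For the rule with base value 100, `splitByDivisor(340, 100)` yields a quotient of 3 and a remainder of 40, so "<<" formats 3 and ">>" formats 40.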
This rule covers values up to 999, at which point we add another rule:
1000: << thousand[ >>];
Again, the meanings of the brackets and substitution tokens shift because the rule's base value is a higher power of 10, changing the rule's divisor. This rule can actually be used all the way up to 999,999. This allows us to finish out the rules as follows:
1,000,000: << million[ >>]; 1,000,000,000: << billion[ >>]; 1,000,000,000,000: << trillion[ >>]; 1,000,000,000,000,000: OUT OF RANGE!;
Commas, periods, and spaces can be used in the base values to improve legibility and are ignored by the rule parser. The last rule in the list is customarily treated as an "overflow rule," applying to everything from its base value on up, and often (as in this example) being used to print out an error message or default representation. Notice also that the size of the major groupings in large numbers is controlled by the spacing of the rules: because in English we group numbers by thousand, the higher rules are separated from each other by a factor of 1,000.
To see how these rules actually work in practice, consider the following example. Formatting 25,340 with this rule set would work like this:
<< thousand >>  [The rule whose base value is 1,000 is applicable to 25,340.]
twenty->> thousand >>  [25,340 over 1,000 is 25. The rule for 20 applies.]
twenty-five thousand >>  [25 mod 10 is 5. The rule for 5 is "five."]
twenty-five thousand << hundred >>  [25,340 mod 1,000 is 340. The rule for 100 applies.]
twenty-five thousand three hundred >>  [340 over 100 is 3. The rule for 3 is "three."]
twenty-five thousand three hundred forty  [340 mod 100 is 40. The rule for 40 applies. Since 40 divides evenly by 10, the hyphen and substitution in the brackets are omitted.]
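The trace above can be reproduced with a small recursive function. The sketch below hard-codes the English rules from this section in a simplified, hypothetical `Rule` struct rather than parsing rule text, and it stops at the "million" rule; `englishRules`, `divisorFor`, and `spellOut` are illustrative names, not part of the class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Simplified, hypothetical rule layout: optional "<<", literal text, and an
// optional bracketed part made of a separator followed by ">>".
struct Rule {
    std::int64_t baseValue;
    bool hasQuotient;       // rule text starts with "<<"
    std::string text;       // literal text such as "twenty" or " thousand"
    bool hasRemainder;      // rule text ends with "[<separator>>>]"
    std::string separator;  // "-" for the tens rules, " " for the larger ones
};

std::vector<Rule> englishRules() {
    static const char* const kSmall[] = {
        "zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"};
    static const char* const kTens[] = {
        "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};
    std::vector<Rule> rules;
    for (int i = 0; i < 20; ++i) rules.push_back({i, false, kSmall[i], false, ""});
    for (int i = 0; i < 8; ++i) rules.push_back({(i + 2) * 10, false, kTens[i], true, "-"});
    rules.push_back({100, true, " hundred", true, " "});
    rules.push_back({1000, true, " thousand", true, " "});
    rules.push_back({1000000, true, " million", true, " "});
    return rules;
}

// Highest power of 10 that is <= baseValue.
std::int64_t divisorFor(std::int64_t baseValue) {
    std::int64_t divisor = 1;
    while (baseValue / divisor >= 10) divisor *= 10;
    return divisor;
}

std::string spellOut(std::int64_t n, const std::vector<Rule>& rules) {
    assert(n >= 0 && !rules.empty());
    // The applicable rule is the last one whose base value is <= n.
    std::size_t idx = 0;
    while (idx + 1 < rules.size() && rules[idx + 1].baseValue <= n) ++idx;
    const Rule& rule = rules[idx];
    const std::int64_t divisor = divisorFor(rule.baseValue);

    std::string out;
    if (rule.hasQuotient) out += spellOut(n / divisor, rules);  // "<<"
    out += rule.text;
    // Bracketed text is skipped for exact multiples of the divisor.
    if (rule.hasRemainder && n % divisor != 0) {
        out += rule.separator + spellOut(n % divisor, rules);   // ">>"
    }
    return out;
}
```

Calling `spellOut(25340, englishRules())` follows the same steps as the trace and returns "twenty-five thousand three hundred forty". A linear scan is used for rule selection to keep the sketch short.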
The above syntax suffices only to format positive integers. To format negative numbers, we add a special rule:
-x: minus >>;
This is called a negative-number rule, and is identified by "-x" where the base value would be. This rule is used to format all negative numbers. The >> token here means "find the number's absolute value, format it with these rules, and put the result here."
We also add a special rule called a fraction rule for numbers with fractional parts:
x.x: << point >>;
This rule is used for all positive non-integers (negative non-integers pass through the negative-number rule first and then through this rule). Here, the << token refers to the number's integral part, and the >> to the number's fractional part. The fractional part is formatted as a series of single-digit numbers (e.g., 123.456 would be formatted as "one hundred twenty-three point four five six").
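The order in which the two special rules apply can be illustrated with a short sketch. To sidestep binary floating-point rounding it works on a decimal literal such as "-123.456" (an assumption of this example, not how the class receives its input), and it takes the integer formatter as a callback. `applySpecialRules` and `spellInteger` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Formats a plain decimal literal with an integer part, e.g. "-123.456".
// `spellInteger` stands in for the normal rules of the rule set.
std::string applySpecialRules(
        const std::string& literal,
        const std::function<std::string(std::int64_t)>& spellInteger) {
    assert(!literal.empty());
    // Negative-number rule: emit "minus", then format the absolute value.
    if (literal[0] == '-') {
        return "minus " + applySpecialRules(literal.substr(1), spellInteger);
    }
    const std::size_t dot = literal.find('.');
    if (dot == std::string::npos) {
        return spellInteger(std::stoll(literal));
    }
    // Fraction rule: "<<" is the integral part, ">>" the fractional part,
    // which is formatted one digit at a time.
    std::string out = spellInteger(std::stoll(literal.substr(0, dot))) + " point";
    for (std::size_t i = dot + 1; i < literal.size(); ++i) {
        out += " " + spellInteger(literal[i] - '0');
    }
    return out;
}
```

With a callback that spells integers in English, "-123.456" would pass through the negative-number rule first and the fraction rule second, matching the description above.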
To see how this rule syntax is applied to various languages, examine the resource data.
There is actually much more flexibility built into the rule language than the description above shows. A formatter may own multiple rule sets, which can be selected by the caller, and which can use each other to fill in their substitutions. Substitutions can also be filled in with digits, using a DecimalFormat object. There is syntax that can be used to alter a rule's divisor in various ways. And there is provision for much more flexible fraction handling. A complete description of the rule syntax follows:
The description of a RuleBasedNumberFormat's behavior consists of one or more rule sets. Each rule set consists of a name, a colon, and a list of rules. A rule set name must begin with a % sign. Rule sets with names that begin with a single % sign are public: the caller can specify that they be used to format and parse numbers. Rule sets with names that begin with %% are private: they exist only for the use of other rule sets. If a formatter only has one rule set, the name may be omitted.
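The naming convention can be expressed as a one-line check. The sketch below is only an illustration of the prefix rule stated above; `RuleSetVisibility` and `classifyRuleSetName` are hypothetical names.

```cpp
#include <cassert>
#include <string>

enum class RuleSetVisibility { Public, Private };

// "%%name" is private; "%name" is public.
RuleSetVisibility classifyRuleSetName(const std::string& name) {
    assert(!name.empty() && name[0] == '%');  // every rule set name starts with '%'
    const bool isPrivate = name.size() >= 2 && name[1] == '%';
    return isPrivate ? RuleSetVisibility::Private : RuleSetVisibility::Public;
}
```

For example, a name such as "%spellout" would be classified as public and "%%helper" as private (both names are made up for this example).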
The user can also specify a special "rule set" named %%lenient-parse. The body of %%lenient-parse isn't a set of number-formatting rules, but a RuleBasedCollator description which is used to define equivalences for lenient parsing. For more information on the syntax, see RuleBasedCollator. For more information on lenient parsing, see setLenientParse(). Note: symbols that have syntactic meaning in collation rules, such as '&', have no particular meaning when appearing outside of the lenient-parse rule set.
The body of a rule set consists of an ordered, semicolon-delimited list of rules. Internally, every rule has a base value, a divisor, rule text, and zero, one, or two substitutions. These parameters are controlled by the description syntax, which consists of a rule descriptor, a colon, and a rule body.
A rule descriptor can take one of the following forms (text in italics is the name of a token):
bv:  bv specifies the rule's base value. bv is a decimal number expressed using ASCII digits. bv may contain spaces, periods, and commas, which are ignored. The rule's divisor is the highest power of 10 less than or equal to the base value. 
bv/rad:  bv specifies the rule's base value. The rule's divisor is the highest power of rad less than or equal to the base value. 
bv>:  bv specifies the rule's base value. To calculate the divisor, let the radix be 10, and the exponent be the highest exponent of the radix that yields a result less than or equal to the base value. Every > character after the base value decreases the exponent by 1. If the exponent is positive or 0, the divisor is the radix raised to the power of the exponent; otherwise, the divisor is 1. 
bv/rad>:  bv specifies the rule's base value. To calculate the divisor, let the radix be rad, and the exponent be the highest exponent of the radix that yields a result less than or equal to the base value. Every > character after the radix decreases the exponent by 1. If the exponent is positive or 0, the divisor is the radix raised to the power of the exponent; otherwise, the divisor is 1. 
-x:  The rule is a negative-number rule. 
x.x:  The rule is an improper fraction rule. 
0.x:  The rule is a proper fraction rule. 
x.0:  The rule is a master rule. 
nothing  If the rule's rule descriptor is left out, the base value is one plus the preceding rule's base value (or zero if this is the first rule in the list) in a normal rule set. In a fraction rule set, the base value is the same as the preceding rule's base value. 
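The four numeric descriptor forms (bv, bv/rad, bv> and bv/rad>) reduce to the same calculation: read the base value, read an optional radix, find the exponent, then lower it once per '>' character. The sketch below follows that description using only the standard library; `Descriptor` and `parseDescriptor` are hypothetical names, and the special descriptors (negative-number, fraction, and master rules) are deliberately not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

struct Descriptor {
    std::int64_t baseValue = 0;
    std::int64_t radix = 10;
    std::int64_t divisor = 1;
};

// Parses the text before the colon for the numeric descriptor forms only.
Descriptor parseDescriptor(const std::string& text) {
    Descriptor d;
    std::size_t i = 0;
    // Reads ASCII digits, skipping the spaces, periods, and commas that the
    // syntax ignores.
    auto readNumber = [&]() {
        std::int64_t value = 0;
        for (; i < text.size(); ++i) {
            const char c = text[i];
            if (std::isdigit(static_cast<unsigned char>(c))) {
                value = value * 10 + (c - '0');
            } else if (c != ' ' && c != '.' && c != ',') {
                break;
            }
        }
        return value;
    };

    d.baseValue = readNumber();
    if (i < text.size() && text[i] == '/') {
        ++i;
        d.radix = readNumber();
    }
    assert(d.radix >= 2);

    // Exponent: highest power of the radix that is <= the base value.
    int exponent = 0;
    for (std::int64_t p = 1; d.baseValue / p >= d.radix; p *= d.radix) ++exponent;
    // Each trailing '>' lowers the exponent by one.
    for (; i < text.size() && text[i] == '>'; ++i) --exponent;
    // The divisor stays 1 when the exponent ends up zero or negative.
    for (int e = 0; e < exponent; ++e) d.divisor *= d.radix;
    return d;
}
```

Under these assumptions, "1000>" gives a base value of 1000 with a divisor of 100, and "3600/60" gives a divisor of 3600 because 60 squared is the highest power of 60 not exceeding 3600.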
A rule set may be either a regular rule set or a fraction rule set, depending on whether it is used to format a number's integral part (or the whole number) or a number's fractional part. Using a rule set to format a number's fractional part makes it a fraction rule set.
Which rule is used to format a number is defined according to one of the following algorithms:
If the rule set is a regular rule set, do the following:
If the rule set includes a master rule (and the number was passed in as a double), use the master rule. (If the number being formatted was passed in as a long, the master rule is ignored.)
If the rule set is a fraction rule set, do the following:
A rule's body consists of a string of characters terminated by a semicolon. The rule may include zero, one, or two substitution tokens, and a range of text in brackets. The brackets denote optional text (and may also include one or both substitutions). The exact meanings of the substitution tokens, and under what conditions optional text is omitted, depend on the syntax of the substitution token and the context. The rest of the text in a rule body is literal text that is output when the rule matches the number being formatted.
A substitution token begins and ends with a token character. The token character and the context together specify a mathematical operation to be performed on the number being formatted. An optional substitution descriptor specifies how the value resulting from that operation is used to fill in the substitution. The position of the substitution token in the rule body specifies the location of the resultant text in the original rule text.
The meanings of the substitution token characters are as follows:
>>  in normal rule  Divide the number by the rule's divisor and format the remainder.
>>  in negative-number rule  Find the absolute value of the number and format the result.
>>  in fraction or master rule  Isolate the number's fractional part and format it.
>>  in rule in fraction rule set  Not allowed.
>>>  in normal rule  Divide the number by the rule's divisor and format the remainder, but bypass the normal rule-selection process and just use the rule that precedes this one in this rule list.
>>>  in all other rules  Not allowed.
<<  in normal rule  Divide the number by the rule's divisor and format the quotient.
<<  in negative-number rule  Not allowed.
<<  in fraction or master rule  Isolate the number's integral part and format it.
<<  in rule in fraction rule set  Multiply the number by the rule's base value and format the result.
==  in all rule sets  Format the number unchanged.
[]  in normal rule  Omit the optional text if the number is an even multiple of the rule's divisor.
[]  in negative-number rule  Not allowed.
[]  in improper-fraction rule  Omit the optional text if the number is between 0 and 1 (same as specifying both an x.x rule and a 0.x rule).
[]  in master rule  Omit the optional text if the number is an integer (same as specifying both an x.x rule and an x.0 rule).
[]  in proper-fraction rule  Not allowed.
[]  in rule in fraction rule set  Omit the optional text if multiplying the number by the rule's base value yields 1.
The substitution descriptor (i.e., the text between the token characters) may take one of three forms:
a rule set name  Perform the mathematical operation on the number, and format the result using the named rule set. 
a DecimalFormat pattern  Perform the mathematical operation on the number, and format the result using a DecimalFormat with the specified pattern. The pattern must begin with 0 or #. 
nothing  Perform the mathematical operation on the number, and format the result using the rule set containing the current rule, except:

Whitespace is ignored between a rule set name and a rule set body, between a rule descriptor and a rule body, or between rules. If a rule body begins with an apostrophe, the apostrophe is ignored, but all text after it becomes significant (this is how you can have a rule's rule text begin with whitespace). There is no escape function: the semicolon is not allowed in rule set names or in rule text, and the colon is not allowed in rule set names. The characters beginning a substitution token are always treated as the beginning of a substitution token.
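The apostrophe convention is easiest to see in code. The following sketch shows one way the text after a rule descriptor's colon could be reduced to its significant rule text; `ruleTextFromBody` is a hypothetical helper, and the input is assumed to be a single rule body with its terminating semicolon already removed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the significant rule text for the raw text that follows "descriptor:".
std::string ruleTextFromBody(const std::string& rawBody) {
    // The semicolon terminates a rule, so it cannot appear inside rule text.
    assert(rawBody.find(';') == std::string::npos);
    // Whitespace between the descriptor and the body is ignored.
    const std::size_t start = rawBody.find_first_not_of(" \t\r\n");
    if (start == std::string::npos) return "";
    // A leading apostrophe is dropped; everything after it, including
    // whitespace, is kept verbatim.
    if (rawBody[start] == '\'') return rawBody.substr(start + 1);
    return rawBody.substr(start);
}
```

Given the raw body `  ' and >>`, the result keeps the space before "and", whereas `  hundred` simply loses its leading whitespace.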
See the resource data and the demo program for annotated examples of real rule sets using these features.
User subclasses are not supported. While clients may write subclasses, such code will not necessarily work and will not be guaranteed to work stably from release to release.
Localizations
Constructors are available that allow the specification of localizations for the public rule sets (and also allow more control over what public rule sets are available). Localization data is represented as a textual description. The description represents an array of arrays of strings. The first element is an array of the public rule set names; each of these must be one of the public rule set names that appear in the rules. Only names in this array will be treated as public rule set names by the API. Each subsequent element is an array of localizations of these names. The first element of one of these subarrays is the locale name, and the remaining elements are localizations of the public rule set names, in the same order as they were listed in the first array.
In the syntax, angle brackets '<', '>' are used to delimit the arrays, and comma ',' is used to separate elements of an array. Whitespace is ignored, unless quoted.
For example:
< < %foo, %bar, %baz >,
  < en, Foo, Bar, Baz >,
  < fr, 'le Foo', 'le Bar', 'le Baz' >,
  < zh, \u7532, \u4e59, \u4e19 > >
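To make the array-of-arrays structure concrete, here is a minimal, tolerant parser for the localization syntax written against the standard library only. It is a sketch of the described grammar (angle brackets delimit arrays, commas separate elements, whitespace is ignored unless quoted), not the parser the class uses; `Row`, `skipSpace`, `readElement`, and `parseLocalizations` are hypothetical names, and escapes such as \u7532 are kept as literal text rather than decoded.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

namespace {
void skipSpace(const std::string& s, std::size_t& i) {
    while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i]))) ++i;
}

// Reads one element: a 'quoted string', or a bare run of characters that
// ends at whitespace, ',' or '>'. The caller guarantees i < s.size().
std::string readElement(const std::string& s, std::size_t& i) {
    std::string out;
    if (s[i] == '\'') {
        for (++i; i < s.size() && s[i] != '\''; ++i) out += s[i];
        if (i < s.size()) ++i;  // consume the closing quote
    } else {
        while (i < s.size() && s[i] != ',' && s[i] != '>' &&
               !std::isspace(static_cast<unsigned char>(s[i]))) {
            out += s[i++];
        }
    }
    return out;
}
}  // namespace

// Parses "< < a, b >, < en, A, B > >" into rows of strings.
std::vector<Row> parseLocalizations(const std::string& s) {
    std::vector<Row> rows;
    std::size_t i = 0;
    skipSpace(s, i);
    assert(i < s.size() && s[i] == '<');  // the outer array must open first
    ++i;
    for (;;) {
        skipSpace(s, i);
        if (i < s.size() && s[i] == ',') { ++i; continue; }  // row separator
        if (i >= s.size() || s[i] != '<') break;             // end of outer array
        ++i;
        Row row;
        for (;;) {
            skipSpace(s, i);
            if (i >= s.size()) break;
            if (s[i] == '>') { ++i; break; }                 // end of this row
            if (s[i] == ',') { ++i; continue; }              // element separator
            row.push_back(readElement(s, i));
        }
        rows.push_back(row);
    }
    return rows;
}
```

The first row returned holds the public rule set names; each later row holds a locale name followed by the localized names in the same order. This sketch accepts rows whether or not a comma separates them, which is a leniency chosen for the example.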
See also: DecimalFormat. Stable: ICU 2.0
Definition at line 503 of file rbnf.h.
Public Types  
enum  EAlignmentFields { kIntegerField, kFractionField, kDecimalSeparatorField, kExponentSymbolField, kExponentSignField, kExponentField, kGroupingSeparatorField, kCurrencyField, kPercentField, kPermillField, kSignField, INTEGER_FIELD = kIntegerField, FRACTION_FIELD = kFractionField } 
enum  EStyles { kNumberStyle, kCurrencyStyle, kPercentStyle, kScientificStyle, kIsoCurrencyStyle, kPluralCurrencyStyle, kStyleCount } 
Public Member Functions  
virtual Format *  clone (void) const 
virtual UnicodeString &  format (const DigitList &number, UnicodeString &appendTo, FieldPosition &pos, UErrorCode &status) const 
virtual UnicodeString &  format (const DigitList &number, UnicodeString &appendTo, FieldPositionIterator *posIter, UErrorCode &status) const 
virtual UnicodeString &  format (const StringPiece &number, UnicodeString &appendTo, FieldPositionIterator *posIter, UErrorCode &status) const 
virtual UnicodeString &  format (int64_t number, UnicodeString &appendTo, FieldPositionIterator *posIter, UErrorCode &status) const 
virtual UnicodeString &  format (int32_t number, UnicodeString &appendTo, FieldPositionIterator *posIter, UErrorCode &status) const 
virtual UnicodeString &  format (double number, UnicodeString &appendTo, FieldPositionIterator *posIter, UErrorCode &status) const 
UnicodeString &  format (int64_t number, UnicodeString &appendTo) const 
virtual UnicodeString &  format (const Formattable &obj, UnicodeString &appendTo, FieldPositionIterator *posIter, UErrorCode &status) const 
UnicodeString &  format (int32_t number, UnicodeString &output) const 
UnicodeString &  format (double number, UnicodeString &output) const 
UnicodeString &  format (const Formattable &obj, UnicodeString &result, UErrorCode &status) const 
virtual UnicodeString &  format (const Formattable &obj, UnicodeString &toAppendTo, FieldPosition &pos, UErrorCode &status) const 
virtual UnicodeString &  format (double number, const UnicodeString &ruleSetName, UnicodeString &toAppendTo, FieldPosition &pos, UErrorCode &status) const 
virtual UnicodeString &  format (int64_t number, const UnicodeString &ruleSetName, UnicodeString &toAppendTo, FieldPosition &pos, UErrorCode &status) const 
virtual UnicodeString &  format (int32_t number, const UnicodeString &ruleSetName, UnicodeString &toAppendTo, FieldPosition &pos, UErrorCode &status) const 
virtual UnicodeString &  format (double number, UnicodeString &toAppendTo, FieldPosition &pos) const 
virtual UnicodeString &  format (int64_t number, UnicodeString &toAppendTo, FieldPosition &pos) const 
virtual UnicodeString &  format (int32_t number, UnicodeString &toAppendTo, FieldPosition &pos) const 
const UChar *  getCurrency () const 
virtual UnicodeString  getDefaultRuleSetName () const 
virtual UClassID  getDynamicClassID (void) const 
Locale  getLocale (ULocDataLocaleType type, UErrorCode &status) const 
const char *  getLocaleID (ULocDataLocaleType type, UErrorCode &status) const 
int32_t  getMaximumFractionDigits (void) const 
int32_t  getMaximumIntegerDigits (void) const 
int32_t  getMinimumFractionDigits (void) const 
int32_t  getMinimumIntegerDigits (void) const 
virtual int32_t  getNumberOfRuleSetDisplayNameLocales (void) const 
virtual int32_t  getNumberOfRuleSetNames () const 
virtual UnicodeString  getRules () const 
virtual UnicodeString  getRuleSetDisplayName (const UnicodeString &ruleSetName, const Locale &locale=Locale::getDefault()) 
virtual UnicodeString  getRuleSetDisplayName (int32_t index, const Locale &locale=Locale::getDefault()) 
virtual Locale  getRuleSetDisplayNameLocale (int32_t index, UErrorCode &status) const 
virtual UnicodeString  getRuleSetName (int32_t index) const 
UBool  isGroupingUsed (void) const 
virtual UBool  isLenient (void) const 
UBool  isParseIntegerOnly (void) const 
UBool  operator!= (const Format &other) const 
RuleBasedNumberFormat &  operator= (const RuleBasedNumberFormat &rhs) 
virtual UBool  operator== (const Format &other) const 
virtual void  parse (const UnicodeString &text, Formattable &result, UErrorCode &status) const 
virtual void  parse (const UnicodeString &text, Formattable &result, ParsePosition &parsePosition) const 
virtual Formattable &  parseCurrency (const UnicodeString &text, Formattable &result, ParsePosition &pos) const 
void  parseObject (const UnicodeString &source, Formattable &result, UErrorCode &status) const 
virtual void  parseObject (const UnicodeString &source, Formattable &result, ParsePosition &parse_pos) const 
RuleBasedNumberFormat (const RuleBasedNumberFormat &rhs)  
RuleBasedNumberFormat (URBNFRuleSetTag tag, const Locale &locale, UErrorCode &status)  
RuleBasedNumberFormat (const UnicodeString &rules, const UnicodeString &localizations, const Locale &locale, UParseError &perror, UErrorCode &status)  
RuleBasedNumberFormat (const UnicodeString &rules, const Locale &locale, UParseError &perror, UErrorCode &status)  
RuleBasedNumberFormat (const UnicodeString &rules, const UnicodeString &localizations, UParseError &perror, UErrorCode &status)  
RuleBasedNumberFormat (const UnicodeString &rules, UParseError &perror, UErrorCode &status)  
virtual void  setCurrency (const UChar *theCurrency, UErrorCode &ec) 
virtual void  setDefaultRuleSet (const UnicodeString &ruleSetName, UErrorCode &status) 
virtual void  setGroupingUsed (UBool newValue) 
virtual void  setLenient (UBool enabled) 
virtual void  setMaximumFractionDigits (int32_t newValue) 
virtual void  setMaximumIntegerDigits (int32_t newValue) 
virtual void  setMinimumFractionDigits (int32_t newValue) 
virtual void  setMinimumIntegerDigits (int32_t newValue) 
virtual void  setParseIntegerOnly (UBool value) 
virtual  ~RuleBasedNumberFormat () 
Static Public Member Functions  
static NumberFormat *U_EXPORT2  createCurrencyInstance (const Locale &inLocale, UErrorCode &) 
static NumberFormat *U_EXPORT2  createCurrencyInstance (UErrorCode &) 
static NumberFormat *U_EXPORT2  createInstance (const Locale &desiredLocale, EStyles choice, UErrorCode &success) 
static NumberFormat *U_EXPORT2  createInstance (const Locale &inLocale, UErrorCode &) 
static NumberFormat *U_EXPORT2  createInstance (UErrorCode &) 
static NumberFormat *U_EXPORT2  createPercentInstance (const Locale &inLocale, UErrorCode &) 
static NumberFormat *U_EXPORT2  createPercentInstance (UErrorCode &) 
static NumberFormat *U_EXPORT2  createScientificInstance (const Locale &inLocale, UErrorCode &) 
static NumberFormat *U_EXPORT2  createScientificInstance (UErrorCode &) 
static StringEnumeration *U_EXPORT2  getAvailableLocales (void) 
static const Locale *U_EXPORT2  getAvailableLocales (int32_t &count) 
static UClassID U_EXPORT2  getStaticClassID (void) 
static void U_EXPORT2  operator delete (void *, void *) U_NO_THROW 
static void U_EXPORT2  operator delete (void *p) U_NO_THROW 
static void U_EXPORT2  operator delete[] (void *p) U_NO_THROW 
static void *U_EXPORT2  operator new (size_t, void *ptr) U_NO_THROW 
static void *U_EXPORT2  operator new (size_t size) U_NO_THROW 
static void *U_EXPORT2  operator new[] (size_t size) U_NO_THROW 
static URegistryKey U_EXPORT2  registerFactory (NumberFormatFactory *toAdopt, UErrorCode &status) 
static UBool U_EXPORT2  unregister (URegistryKey key, UErrorCode &status) 
Protected Member Functions  
virtual void  getEffectiveCurrency (UChar *result, UErrorCode &ec) const 
void  setLocaleIDs (const char *valid, const char *actual) 
Static Protected Member Functions  
static void  syntaxError (const UnicodeString &pattern, int32_t pos, UParseError &parseError) 
Private Member Functions  
void  dispose () 
NFRuleSet *  findRuleSet (const UnicodeString &name, UErrorCode &status) const 
void  format (double number, NFRuleSet &ruleSet) 
Collator *  getCollator () const 
DecimalFormatSymbols *  getDecimalFormatSymbols () const 
NFRuleSet *  getDefaultRuleSet () const 
void  init (const UnicodeString &rules, LocalizationInfo *localizations, UParseError &perror, UErrorCode &status) 
void  initDefaultRuleSet () 
RuleBasedNumberFormat (const UnicodeString &description, LocalizationInfo *localizations, const Locale &locale, UParseError &perror, UErrorCode &status)  
void  stripWhitespace (UnicodeString &src) 
Private Attributes  
Collator *  collator 
DecimalFormatSymbols *  decimalFormatSymbols 
NFRuleSet *  defaultRuleSet 
UBool  lenient 
UnicodeString *  lenientParseRules 
Locale  locale 
LocalizationInfo *  localizations 
UBool  noParse 
NFRuleSet **  ruleSets 
Friends  
class  FractionalPartSubstitution 
class  NFRule 
class  NFSubstitution 