ucsdet.h File Reference

C API: Charset Detection API.

#include "unicode/utypes.h"
#include "unicode/localpointer.h"
#include "unicode/uenum.h"


Typedefs

typedef struct UCharsetDetector UCharsetDetector
typedef struct UCharsetMatch UCharsetMatch


Functions

U_STABLE void U_EXPORT2 ucsdet_close (UCharsetDetector *ucsd)
U_STABLE const UCharsetMatch *U_EXPORT2 ucsdet_detect (UCharsetDetector *ucsd, UErrorCode *status)
U_STABLE const UCharsetMatch **U_EXPORT2 ucsdet_detectAll (UCharsetDetector *ucsd, int32_t *matchesFound, UErrorCode *status)
U_STABLE UBool U_EXPORT2 ucsdet_enableInputFilter (UCharsetDetector *ucsd, UBool filter)
U_STABLE UEnumeration *U_EXPORT2 ucsdet_getAllDetectableCharsets (const UCharsetDetector *ucsd, UErrorCode *status)
U_STABLE int32_t U_EXPORT2 ucsdet_getConfidence (const UCharsetMatch *ucsm, UErrorCode *status)
U_STABLE const char *U_EXPORT2 ucsdet_getLanguage (const UCharsetMatch *ucsm, UErrorCode *status)
U_STABLE const char *U_EXPORT2 ucsdet_getName (const UCharsetMatch *ucsm, UErrorCode *status)
U_STABLE int32_t U_EXPORT2 ucsdet_getUChars (const UCharsetMatch *ucsm, UChar *buf, int32_t cap, UErrorCode *status)
U_STABLE UBool U_EXPORT2 ucsdet_isInputFilterEnabled (const UCharsetDetector *ucsd)
U_STABLE UCharsetDetector *U_EXPORT2 ucsdet_open (UErrorCode *status)
U_STABLE void U_EXPORT2 ucsdet_setDeclaredEncoding (UCharsetDetector *ucsd, const char *encoding, int32_t length, UErrorCode *status)
U_STABLE void U_EXPORT2 ucsdet_setText (UCharsetDetector *ucsd, const char *textIn, int32_t len, UErrorCode *status)

Detailed Description

C API: Charset Detection API.

This API provides a facility for detecting the charset or encoding of character data in an unknown text format. The input data is supplied as an array of bytes.

Character set detection is at best an imprecise operation. The detection process attempts to identify the charset that best matches the characteristics of the byte data, but the process is partly statistical in nature, and the results cannot be guaranteed to always be correct.

For best accuracy in charset detection, the input data should be primarily in a single language, and at least a few hundred bytes' worth of plain text in that language is needed. The detection process will attempt to ignore HTML- or XML-style markup that could otherwise obscure the content.

Definition in file ucsdet.h.
