GetStringTypeEx
The GetStringTypeEx
function returns character-type information for the characters in the specified
source string. For each character in the string, the function sets one or more
bits in the corresponding 16-bit element of the output array. Each bit
identifies a given character type, such as whether the character is a letter, a
digit, or neither.
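Unlike its close relatives GetStringTypeA and GetStringTypeW, this function
selects the appropriate ANSI or Unicode behavior through the #define UNICODE
switch.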
BOOL GetStringTypeEx(
    LCID Locale,        // locale identifier
    DWORD dwInfoType,   // information-type options
    LPCTSTR lpSrcStr,   // address of source string
    int cchSrc,         // size, in bytes or characters, of source string
    LPWORD lpCharType   // address of buffer for output
);
Parameters
Locale
Specifies the
locale identifier. This value uniquely defines the ANSI code page to use to
translate the string pointed to by lpSrcStr from ANSI to Unicode. The function
then analyzes each Unicode character for character type information. Note that
the W version of this function ignores this parameter.
This parameter can be a locale identifier created by the MAKELCID macro, or one
of the following predefined values:
LOCALE_SYSTEM_DEFAULT | Default system locale
LOCALE_USER_DEFAULT | Default user locale
dwInfoType
Specifies the
type of character information the user wants to retrieve. The various types are
divided into different levels (see the following Remarks section for a list of
the information included in each type). This parameter can specify one of the
following character type flags:
CT_CTYPE1 | Retrieve character type information.
CT_CTYPE2 | Retrieve bidirectional layout information.
CT_CTYPE3 | Retrieve text processing information.
lpSrcStr
Points to the
string for which character types are requested. If cchSrc is -1, the string is assumed to be null terminated. This
must be a Unicode string for the W version of this function, and an ANSI
string for the A version. Note that for the A version, this can be a double-byte character
set (DBCS) string if the locale is appropriate for DBCS.
cchSrc
Specifies the
size, in bytes (ANSI version) or characters (Unicode version), of the string
pointed to by the lpSrcStr parameter. If this count includes a null
terminator, the function returns character type information for the null
terminator. If this value is -1,
the string is assumed to be null terminated and the length is calculated
automatically.
lpCharType
Points to an array of 16-bit values. The array must be large enough to receive
one 16-bit value for each character in the source string. When the function
returns, this array contains one word corresponding to each character in the
source string.
Return Values
If the
function succeeds, the return value is nonzero.
If the function fails, the return value is zero. To get extended error
information, call GetLastError, which may return one of the following error
codes:
ERROR_INVALID_FLAGS
ERROR_INVALID_PARAMETER
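The parameters and return value described above can be sketched in a small helper. This is an illustration under stated assumptions, not part of the API: count_decimal_digits is a hypothetical name, the 256-element buffer limit is arbitrary, the input is assumed to be single-byte text, and the ANSI entry point GetStringTypeExA is called directly so that the helper can take a plain char string. The non-Windows branch is only a stand-in built on isdigit so that the sketch compiles elsewhere; it does not reproduce the real classification.

```c
#include <stddef.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Stand-in for non-Windows builds only (an assumption of this sketch). */
#include <ctype.h>
typedef unsigned short WORD;
#define C1_DIGIT 0x0004   /* value from the Ctype 1 table */
#endif

/* Hypothetical helper: returns how many characters of s carry C1_DIGIT,
   or -1 if the string is too long for the buffer or the call fails. */
int count_decimal_digits(const char *s)
{
    WORD types[256];              /* one 16-bit element per source character */
    size_t len = strlen(s);
    size_t i;
    int count = 0;

    if (len > sizeof types / sizeof types[0])
        return -1;                /* output array would be too small */
    if (len == 0)
        return 0;                 /* nothing to classify */

#ifdef _WIN32
    /* An explicit length is passed, so the null terminator is not classified. */
    if (!GetStringTypeExA(LOCALE_USER_DEFAULT, CT_CTYPE1, s, (int)len, types))
        return -1;                /* GetLastError() gives the reason */
#else
    for (i = 0; i < len; i++)     /* stand-in classification, ASCII digits only */
        types[i] = isdigit((unsigned char)s[i]) ? C1_DIGIT : 0;
#endif

    for (i = 0; i < len; i++)
        if (types[i] & C1_DIGIT)
            count++;
    return count;
}
```

The source string and the output array are separate buffers here, which matches the requirement that lpSrcStr and lpCharType must not be the same pointer.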
Remarks
The GetStringTypeEx
function exists to circumvent a limitation caused by the difference in
parameters of GetStringTypeA and GetStringTypeW. That parameter
difference prevents an application from automatically invoking the proper A or
W version of GetStringType* through the use of the #define
UNICODE switch. GetStringTypeEx, on the other hand, honors that switch,
so it is the recommended Win32 function.
The Locale
parameter is only used to perform string conversion to Unicode. It has nothing
to do with the CTYPEs the function returns. The CTYPEs are solely determined by
Unicode code points, and do not vary on a locale basis. For example, Greek
letters are C1_ALPHA for any Locale value.
The lpSrcStr
and lpCharType pointers must not be the same. If they are the same, the
function fails and GetLastError returns ERROR_INVALID_PARAMETER.
The
character-type bits are divided into several levels. The information for one
level can be retrieved by a single call to this function. Each level is limited
to 16 bits of information so that the other mapping routines, which are limited
to 16 bits of representation per character, can also return character-type
information.
The character
types supported by this function include the following.
Ctype 1
These types
support ANSI C and POSIX (LC_CTYPE) character-typing functions. A combination
of these values is returned in the array pointed to by the lpCharType
parameter when the dwInfoType parameter is set to CT_CTYPE1.
Name | Value | Meaning
C1_UPPER | 0x0001 | Uppercase
C1_LOWER | 0x0002 | Lowercase
C1_DIGIT | 0x0004 | Decimal digits
C1_SPACE | 0x0008 | Space characters
C1_PUNCT | 0x0010 | Punctuation
C1_CNTRL | 0x0020 | Control characters
C1_BLANK | 0x0040 | Blank characters
C1_XDIGIT | 0x0080 | Hexadecimal digits
C1_ALPHA | 0x0100 | Any linguistic character: alphabetic, syllabary, or ideographic
The following
character types are either constant or computable from basic types and do not
need to be supported by this function.
Type | Description
Alphanumeric | Alphabetic characters and digits (C1_ALPHA and C1_DIGIT)
Printable | Graphic characters and blank (all C1_* types except C1_CNTRL)
Ctype 2
These types
support proper layout of Unicode text. The direction attributes are assigned so
that the bidirectional layout algorithm standardized by Unicode produces
accurate results. These types are mutually exclusive. For more information
about the use of these attributes, see The Unicode Standard: Worldwide
Character Encoding, Volumes 1 and 2, Addison Wesley Publishing Company:
1991, 1992, ISBN 0201567881.
Name | Value | Meaning
Strong:
C2_LEFTTORIGHT | 0x1 | Left to right
C2_RIGHTTOLEFT | 0x2 | Right to left
Weak:
C2_EUROPENUMBER | 0x3 | European number, European digit
C2_EUROPESEPARATOR | 0x4 | European numeric separator
C2_EUROPETERMINATOR | 0x5 | European numeric terminator
C2_ARABICNUMBER | 0x6 | Arabic number
C2_COMMONSEPARATOR | 0x7 | Common numeric separator
Neutral:
C2_BLOCKSEPARATOR | 0x8 | Block separator
C2_SEGMENTSEPARATOR | 0x9 | Segment separator
C2_WHITESPACE | 0xA | White space
C2_OTHERNEUTRAL | 0xB | Other neutrals
Not applicable:
C2_NOTAPPLICABLE | 0x0 | No implicit directionality (for example, control codes)
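Since the Ctype 2 values are mutually exclusive, they behave like an enumeration rather than a set of flags, so a bitwise test can give false matches: C2_EUROPENUMBER (0x3) has the C2_RIGHTTOLEFT (0x2) bit set. The sketch below compares whole values instead. The c2_class enum and both function names are hypothetical, and the constants are copied from the Ctype 2 table so the code also builds without the Windows headers.

```c
/* Values copied from the Ctype 2 table; the Windows headers define the same names. */
#ifndef C2_LEFTTORIGHT
#define C2_NOTAPPLICABLE    0x0
#define C2_LEFTTORIGHT      0x1
#define C2_RIGHTTOLEFT      0x2
#define C2_EUROPENUMBER     0x3
#define C2_EUROPESEPARATOR  0x4
#define C2_EUROPETERMINATOR 0x5
#define C2_ARABICNUMBER     0x6
#define C2_COMMONSEPARATOR  0x7
#define C2_BLOCKSEPARATOR   0x8
#define C2_SEGMENTSEPARATOR 0x9
#define C2_WHITESPACE       0xA
#define C2_OTHERNEUTRAL     0xB
#endif

/* Hypothetical grouping that mirrors the table's Strong/Weak/Neutral headings. */
typedef enum {
    C2CLASS_NOT_APPLICABLE,
    C2CLASS_STRONG,
    C2CLASS_WEAK,
    C2CLASS_NEUTRAL,
    C2CLASS_UNKNOWN        /* value not listed in the table */
} c2_class;

c2_class classify_ctype2(unsigned short ct)
{
    switch (ct) {          /* compare whole values, never individual bits */
    case C2_LEFTTORIGHT:
    case C2_RIGHTTOLEFT:
        return C2CLASS_STRONG;
    case C2_EUROPENUMBER:
    case C2_EUROPESEPARATOR:
    case C2_EUROPETERMINATOR:
    case C2_ARABICNUMBER:
    case C2_COMMONSEPARATOR:
        return C2CLASS_WEAK;
    case C2_BLOCKSEPARATOR:
    case C2_SEGMENTSEPARATOR:
    case C2_WHITESPACE:
    case C2_OTHERNEUTRAL:
        return C2CLASS_NEUTRAL;
    case C2_NOTAPPLICABLE:
        return C2CLASS_NOT_APPLICABLE;
    default:
        return C2CLASS_UNKNOWN;
    }
}

/* Equality test; (ct & C2_RIGHTTOLEFT) would also match C2_EUROPENUMBER. */
int is_right_to_left(unsigned short ct)
{
    return ct == C2_RIGHTTOLEFT;
}
```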
Ctype 3
These types
are intended to be placeholders for extensions to the POSIX types required for
general text processing or for the standard C library functions. These types
are supported in the current version of Windows NT. A combination of these
values is returned when dwInfoType is set to CT_CTYPE3.
Name | Value | Meaning
C3_NONSPACING | 0x1 | Nonspacing mark
C3_DIACRITIC | 0x2 | Diacritic nonspacing mark
C3_VOWELMARK | 0x4 | Vowel nonspacing mark
C3_SYMBOL | 0x8 | Symbol
C3_KATAKANA | 0x10 | Katakana character
C3_HIRAGANA | 0x20 | Hiragana character
C3_HALFWIDTH | 0x40 | Half-width character
C3_FULLWIDTH | 0x80 | Full-width character
C3_IDEOGRAPH | 0x100 | Ideographic character
C3_KASHIDA | 0x200 | Arabic Kashida character
C3_ALPHA | 0x8000 | All linguistic characters (alphabetic, syllabary, and ideographic)
Not applicable:
C3_NOTAPPLICABLE | 0x0 | Not applicable
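Unlike Ctype 2, the Ctype 3 values are combinable flags, so one word can carry several of them at once (for example, a half-width Katakana character). As a rough illustration, the sketch below lists the names of the flags set in one word. The describe_ctype3 helper and its lookup table are hypothetical, and the constants are copied from the Ctype 3 table so the code also builds without the Windows headers.

```c
#include <stddef.h>
#include <string.h>

/* Values copied from the Ctype 3 table; the Windows headers define the same names. */
#ifndef C3_NONSPACING
#define C3_NOTAPPLICABLE 0x0
#define C3_NONSPACING    0x1
#define C3_DIACRITIC     0x2
#define C3_VOWELMARK     0x4
#define C3_SYMBOL        0x8
#define C3_KATAKANA      0x10
#define C3_HIRAGANA      0x20
#define C3_HALFWIDTH     0x40
#define C3_FULLWIDTH     0x80
#define C3_IDEOGRAPH     0x100
#define C3_KASHIDA       0x200
#define C3_ALPHA         0x8000
#endif

struct c3_name { unsigned short bit; const char *name; };

static const struct c3_name c3_names[] = {
    { C3_NONSPACING, "C3_NONSPACING" }, { C3_DIACRITIC, "C3_DIACRITIC" },
    { C3_VOWELMARK,  "C3_VOWELMARK"  }, { C3_SYMBOL,    "C3_SYMBOL"    },
    { C3_KATAKANA,   "C3_KATAKANA"   }, { C3_HIRAGANA,  "C3_HIRAGANA"  },
    { C3_HALFWIDTH,  "C3_HALFWIDTH"  }, { C3_FULLWIDTH, "C3_FULLWIDTH" },
    { C3_IDEOGRAPH,  "C3_IDEOGRAPH"  }, { C3_KASHIDA,   "C3_KASHIDA"   },
    { C3_ALPHA,      "C3_ALPHA"      }
};

/* Writes the space-separated names of the flags set in ct into out.
   Returns the number of flags found, or -1 if out is too small. */
int describe_ctype3(unsigned short ct, char *out, size_t out_size)
{
    size_t used = 0, i;
    int count = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (i = 0; i < sizeof c3_names / sizeof c3_names[0]; i++) {
        size_t n;
        if ((ct & c3_names[i].bit) == 0)
            continue;
        n = strlen(c3_names[i].name);
        /* Need room for an optional separator, the name, and the terminator. */
        if (used + (count ? 1 : 0) + n + 1 > out_size)
            return -1;
        if (count)
            out[used++] = ' ';
        memcpy(out + used, c3_names[i].name, n);
        used += n;
        out[used] = '\0';
        count++;
    }
    return count;
}
```

A word equal to C3_NOTAPPLICABLE has no bits set, so the helper returns 0 and leaves an empty string.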
See Also