GetStringTypeW
The GetStringTypeW
function returns character-type information for the characters in the specified
source string. For each character in the string, the function sets one or more
bits in the corresponding 16-bit element of the output array. Each bit
identifies a given character type, such as whether the character is a letter, a
digit, or neither.
BOOL GetStringTypeW(
    DWORD dwInfoType,    // information-type options
    LPCWSTR lpSrcStr,    // address of source string
    int cchSrc,          // number of characters in string
    LPWORD lpCharType    // address of buffer for output
);
Parameters
dwInfoType
Specifies the
type of character information the user wants to retrieve. The various types are
divided into different levels (see the following Remarks section for a list of
the information included in each type). This parameter can specify one of the
following character type flags:
CT_CTYPE1 | Retrieve character type information.
CT_CTYPE2 | Retrieve bidirectional layout information.
CT_CTYPE3 | Retrieve text processing information.
lpSrcStr
Points to the
string for which character types are requested. If cchSrc is -1, the string is assumed to be null terminated. This
must be a Unicode string.
cchSrc
Specifies the
size, in characters, of the string pointed to by the lpSrcStr parameter.
If this count includes a null terminator, the function returns character type
information for the null terminator. If this value is -1, the string is assumed to be null terminated and the
length is calculated automatically.
lpCharType
Points to an array of 16-bit values. The array must be large enough to receive
one 16-bit value for each character counted by the cchSrc parameter. When the
function returns, this array contains one word corresponding to each Unicode
character in the source string.
Return Values
If the function succeeds, the return value is nonzero.
If the function fails, the return value is zero. To get extended error
information, call GetLastError, which may return one of the following error
codes:
ERROR_INVALID_FLAGS
ERROR_INVALID_PARAMETER
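The parameter and return-value rules can be sketched as follows. This is a minimal illustration, not a reference implementation. The helper names ctype_words_needed and get_ctype1 are hypothetical. The sketch also assumes that the terminator receives an output word when cchSrc is -1, which this page does not state explicitly, so it reserves one extra element as the conservative choice. The Windows-only part is guarded so that the sizing helper compiles elsewhere.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: number of 16-bit elements the output array needs.
 * Assumption: when cchSrc is negative (-1), the terminator also gets an
 * element, so one extra slot is reserved. */
static size_t ctype_words_needed(const wchar_t *src, int cchSrc)
{
    if (cchSrc >= 0)
        return (size_t)cchSrc;
    return wcslen(src) + 1;
}

#ifdef _WIN32
/* Hypothetical wrapper: returns a malloc'd array of CT_CTYPE1 words,
 * or NULL on failure. The caller frees the result. */
static WORD *get_ctype1(const wchar_t *src, int cchSrc)
{
    size_t n = ctype_words_needed(src, cchSrc);
    WORD *types = malloc((n ? n : 1) * sizeof *types);
    if (types == NULL)
        return NULL;
    if (!GetStringTypeW(CT_CTYPE1, src, cchSrc, types)) {
        /* Zero return means failure; GetLastError() can be queried here. */
        free(types);
        return NULL;
    }
    return types;
}
#endif
```

Because the output array is allocated separately from the source string, the two pointers can never be the same, which avoids the ERROR_INVALID_PARAMETER case described in the Remarks.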
Remarks
Note that the
GetStringTypeA
The lpSrcStr
and lpCharType pointers must not be the same. If they are the same, the
function fails and GetLastError returns ERROR_INVALID_PARAMETER.
The
character-type bits are divided into several levels. The information for one
level can be retrieved by a single call to this function. Each level is limited
to 16 bits of information so that the other mapping routines, which are limited
to 16 bits of representation per character, can also return character-type
information.
The character
types supported by this function include the following.
Ctype 1
These types
support ANSI C and POSIX (LC_CTYPE) character-typing functions. A combination
of these values is returned in the array pointed to by the lpCharType
parameter when the dwInfoType parameter is set to CT_CTYPE1.
Name | Value | Meaning
C1_UPPER | 0x0001 | Uppercase
C1_LOWER | 0x0002 | Lowercase
C1_DIGIT | 0x0004 | Decimal digits
C1_SPACE | 0x0008 | Space characters
C1_PUNCT | 0x0010 | Punctuation
C1_CNTRL | 0x0020 | Control characters
C1_BLANK | 0x0040 | Blank characters
C1_XDIGIT | 0x0080 | Hexadecimal digits
C1_ALPHA | 0x0100 | Any linguistic character: alphabetic, syllabary, or ideographic
The following
character types are either constant or computable from basic types and do not
need to be supported by this function.
Type | Description
Alphanumeric | Alphabetic characters and digits (C1_ALPHA and C1_DIGIT)
Printable | Graphic characters and blanks (all C1_* types except C1_CNTRL)
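As a sketch of how the two derived types might be computed from a CT_CTYPE1 word, the helpers below test the bits directly. The function names are hypothetical, and the bit values are copied from the Ctype 1 table so the fragment also compiles without the Windows headers. The printable test reflects one reading of the description: at least one non-control bit is set and C1_CNTRL is clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* CT_CTYPE1 bit values from the table above; the Windows headers
 * normally supply these, so they are defined only if missing. */
#ifndef C1_UPPER
#define C1_UPPER  0x0001
#define C1_LOWER  0x0002
#define C1_DIGIT  0x0004
#define C1_SPACE  0x0008
#define C1_PUNCT  0x0010
#define C1_CNTRL  0x0020
#define C1_BLANK  0x0040
#define C1_XDIGIT 0x0080
#define C1_ALPHA  0x0100
#endif

/* Every C1_* bit except C1_CNTRL. */
#define C1_NONCONTROL_MASK \
    (C1_UPPER | C1_LOWER | C1_DIGIT | C1_SPACE | C1_PUNCT | \
     C1_BLANK | C1_XDIGIT | C1_ALPHA)

/* Hypothetical helper: alphanumeric means C1_ALPHA or C1_DIGIT is set. */
static bool ctype1_is_alnum(uint16_t type)
{
    return (type & (C1_ALPHA | C1_DIGIT)) != 0;
}

/* Hypothetical helper (assumed interpretation): printable means some
 * non-control bit is set and C1_CNTRL is not. */
static bool ctype1_is_printable(uint16_t type)
{
    return (type & C1_NONCONTROL_MASK) != 0 && (type & C1_CNTRL) == 0;
}
```

Each helper takes one element of the array that lpCharType points to after a CT_CTYPE1 call.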
Ctype 2
These types
support proper layout of Unicode text. The direction attributes are assigned so
that the bidirectional layout algorithm standardized by Unicode produces
accurate results. These types are mutually exclusive. For more information
about the use of these attributes, see The Unicode Standard: Worldwide
Character Encoding, Volumes 1 and 2, Addison Wesley Publishing Company:
1991, 1992, ISBN 0201567881.
Name | Value | Meaning
Strong: | |
C2_LEFTTORIGHT | 0x1 | Left to right
C2_RIGHTTOLEFT | 0x2 | Right to left
Weak: | |
C2_EUROPENUMBER | 0x3 | European number, European digit
C2_EUROPESEPARATOR | 0x4 | European numeric separator
C2_EUROPETERMINATOR | 0x5 | European numeric terminator
C2_ARABICNUMBER | 0x6 | Arabic number
C2_COMMONSEPARATOR | 0x7 | Common numeric separator
Neutral: | |
C2_BLOCKSEPARATOR | 0x8 | Block separator
C2_SEGMENTSEPARATOR | 0x9 | Segment separator
C2_WHITESPACE | 0xA | White space
C2_OTHERNEUTRAL | 0xB | Other neutrals
Not applicable: | |
C2_NOTAPPLICABLE | 0x0 | No implicit directionality (for example, control codes)
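Because the Ctype 2 values are mutually exclusive, they behave like an enumeration rather than a set of bit flags. For example, C2_EUROPENUMBER (0x3) has the same bit pattern as C2_LEFTTORIGHT combined with C2_RIGHTTOLEFT, so a bitwise test would misclassify it. The sketch below compares by equality instead. The DirClass enumeration and the ctype2_class helper are hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* CT_CTYPE2 values from the table above; defined only if the Windows
 * headers have not already supplied them. */
#ifndef C2_LEFTTORIGHT
#define C2_LEFTTORIGHT      0x0001
#define C2_RIGHTTOLEFT      0x0002
#define C2_EUROPENUMBER     0x0003
#define C2_EUROPESEPARATOR  0x0004
#define C2_EUROPETERMINATOR 0x0005
#define C2_ARABICNUMBER     0x0006
#define C2_COMMONSEPARATOR  0x0007
#define C2_BLOCKSEPARATOR   0x0008
#define C2_SEGMENTSEPARATOR 0x0009
#define C2_WHITESPACE       0x000A
#define C2_OTHERNEUTRAL     0x000B
#define C2_NOTAPPLICABLE    0x0000
#endif

/* Hypothetical grouping that mirrors the table's row groups. */
typedef enum {
    DIR_NOT_APPLICABLE,
    DIR_STRONG,
    DIR_WEAK,
    DIR_NEUTRAL
} DirClass;

/* Hypothetical helper: map one CT_CTYPE2 word to its group by equality. */
static DirClass ctype2_class(uint16_t type)
{
    switch (type) {
    case C2_LEFTTORIGHT:
    case C2_RIGHTTOLEFT:
        return DIR_STRONG;
    case C2_EUROPENUMBER:
    case C2_EUROPESEPARATOR:
    case C2_EUROPETERMINATOR:
    case C2_ARABICNUMBER:
    case C2_COMMONSEPARATOR:
        return DIR_WEAK;
    case C2_BLOCKSEPARATOR:
    case C2_SEGMENTSEPARATOR:
    case C2_WHITESPACE:
    case C2_OTHERNEUTRAL:
        return DIR_NEUTRAL;
    default:
        /* C2_NOTAPPLICABLE and any value not listed in the table. */
        return DIR_NOT_APPLICABLE;
    }
}
```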
Ctype 3
These types
are intended to be placeholders for extensions to the POSIX types required for
general text processing or for the standard C library functions. These types
are supported in the current version of Windows NT. A combination of these
values is returned when dwInfoType is set to CT_CTYPE3.
Name | Value | Meaning
C3_NONSPACING | 0x1 | Nonspacing mark
C3_DIACRITIC | 0x2 | Diacritic nonspacing mark
C3_VOWELMARK | 0x4 | Vowel nonspacing mark
C3_SYMBOL | 0x8 | Symbol
C3_KATAKANA | 0x10 | Katakana character
C3_HIRAGANA | 0x20 | Hiragana character
C3_HALFWIDTH | 0x40 | Half-width character
C3_FULLWIDTH | 0x80 | Full-width character
C3_IDEOGRAPH | 0x100 | Ideographic character
C3_KASHIDA | 0x200 | Arabic Kashida character
C3_ALPHA | 0x8000 | All linguistic characters (alphabetic, syllabary, and ideographic)
Not applicable: | |
C3_NOTAPPLICABLE | 0x0 | Not applicable
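Unlike Ctype 2, the Ctype 3 values are bit flags that can be combined, so a word is tested with a bitwise AND. The following sketch shows two such tests. The helper names are hypothetical, and grouping the three mark flags together is an illustrative choice, not something this page prescribes.

```c
#include <stdbool.h>
#include <stdint.h>

/* CT_CTYPE3 bit values from the table above; defined only if the
 * Windows headers have not already supplied them. */
#ifndef C3_NONSPACING
#define C3_NONSPACING 0x0001
#define C3_DIACRITIC  0x0002
#define C3_VOWELMARK  0x0004
#define C3_SYMBOL     0x0008
#define C3_KATAKANA   0x0010
#define C3_HIRAGANA   0x0020
#define C3_HALFWIDTH  0x0040
#define C3_FULLWIDTH  0x0080
#define C3_IDEOGRAPH  0x0100
#define C3_KASHIDA    0x0200
#define C3_ALPHA      0x8000
#endif

/* Hypothetical helper: true for Katakana or Hiragana characters. */
static bool ctype3_is_kana(uint16_t type)
{
    return (type & (C3_KATAKANA | C3_HIRAGANA)) != 0;
}

/* Hypothetical helper: true if any of the three mark flags is set. */
static bool ctype3_is_mark(uint16_t type)
{
    return (type & (C3_NONSPACING | C3_DIACRITIC | C3_VOWELMARK)) != 0;
}
```

A word equal to C3_NOTAPPLICABLE (0x0) has no bits set, so both helpers return false for it.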
See Also