NAME
mbtowc - converts a multibyte character to a wide character
SYNOPSIS
#include <stdlib.h>

int
mbtowc(wchar_t * restrict pwc, const char * restrict s, size_t n);
DESCRIPTION
The mbtowc() function converts the multibyte character pointed to by s to a wide character, and stores it in the wchar_t object pointed to by pwc. This function may inspect at most n bytes of the array pointed to by s.
Unlike mbrtowc(3), the first n bytes pointed to by s need to form an entire multibyte character. Otherwise, this function returns an error and the internal state will be undefined.
If a call to mbtowc() results in an undefined internal state, parsing of the string starting at s cannot continue, not even at a later byte, and mbtowc() must be called with s set to NULL to reset the internal state before it can safely be used again on a different string.
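The reset rule above can be sketched as a small wrapper. This is a minimal illustration, not part of the library: convert_or_reset is a hypothetical helper name, and the sketch assumes the caller gives up on the current string whenever -1 is returned.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: convert one multibyte character with mbtowc().
 * If the conversion fails, reset the internal state by calling
 * mbtowc() with s set to NULL, so that the function can safely be
 * used again on a different string.
 */
int
convert_or_reset(wchar_t *pwc, const char *s, size_t n)
{
    int len;

    len = mbtowc(pwc, s, n);
    if (len == -1)
        (void)mbtowc(NULL, NULL, 0);    /* back to the initial state */
    return len;
}
```

The wrapper returns the value of mbtowc() unchanged, so callers still see 0 for a null byte, a positive length for a valid character, and -1 for an error.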
The behaviour of mbtowc() is affected by the LC_CTYPE category of the current locale. Calling any other functions in libc never changes the internal state of mbtowc(), except for calling setlocale(3) with the LC_CTYPE category set to a different locale. Such setlocale(3) calls cause the internal state of this function to be undefined.
In state-dependent encodings such as ISO/IEC 2022-JP, s may point to a special sequence of bytes that changes the shift state. Because such byte sequences do not correspond to any individual wide character, mbtowc() treats them as if they were part of the subsequent multibyte character.
The following special cases apply to the arguments:
- s == NULL
- mbtowc() initializes its own internal state to the initial state, and determines whether the current encoding is state-dependent. mbtowc() returns 0 if the encoding is state-independent, otherwise non-zero. pwc is ignored.
- pwc == NULL
- mbtowc() behaves just as if pwc was not NULL, including modifications to internal state, except that the result of the conversion is discarded. This can be used to determine the size of the wide character representation of a multibyte string. Another use case is a check for illegal or incomplete multibyte sequences.
- n == 0
- In this case, the first n bytes of the array pointed to by s never form a complete character and mbtowc() always fails.
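As an illustration of the pwc == NULL case, the following sketch counts the wide characters that a multibyte string would convert to, and doubles as a validity check. count_mb_chars is a hypothetical name, and the sketch assumes that s is NUL-terminated and that (size_t)-1 is an acceptable error value for the caller.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: count the multibyte characters in the
 * NUL-terminated string s under the current LC_CTYPE locale.
 * Returns (size_t)-1 if s contains an invalid or incomplete sequence.
 */
size_t
count_mb_chars(const char *s)
{
    size_t count = 0;
    int len;

    (void)mbtowc(NULL, NULL, 0);        /* start from the initial state */

    /* pwc is NULL: the converted character is discarded. */
    while ((len = mbtowc(NULL, s, MB_CUR_MAX)) > 0) {
        s += len;
        count++;
    }
    if (len == -1) {
        (void)mbtowc(NULL, NULL, 0);    /* state is undefined, reset it */
        return (size_t)-1;
    }
    return count;
}
```

The count is the number of wchar_t elements needed for the converted string, not including the terminating null wide character.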
RETURN VALUES
Normally, mbtowc() returns:
- 0
- s points to a null byte (‘\0’).
- positive
- Number of bytes for the valid multibyte character pointed to by s. There are no cases where the value returned is greater than the value of the MB_CUR_MAX macro.
- -1
- s points to an invalid or an incomplete multibyte character. errno is set to indicate the error.
When s is NULL, mbtowc() returns:
- 0
- The current encoding is state-independent.
- non-zero
- The current encoding is state-dependent.
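The s == NULL return value can be used to decide whether parsing may resume after an error. The following is a minimal sketch; encoding_is_state_dependent is a hypothetical name. Note that the query also resets the internal state as a side effect, so it should not be made in the middle of parsing a string in a state-dependent encoding.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: report whether the encoding of the current
 * LC_CTYPE locale is state-dependent.  As a side effect, the internal
 * state of mbtowc() is reset to the initial state.
 * Returns 1 for state-dependent encodings, otherwise 0.
 */
int
encoding_is_state_dependent(void)
{
    return mbtowc(NULL, NULL, 0) != 0;
}
```

In the default "C" locale, which a program uses until it calls setlocale(3), the encoding is state-independent and the helper returns 0.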
EXAMPLES
The following program parses a UTF-8 string and reports encoding errors:
    #include <limits.h>
    #include <locale.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <wchar.h>

    int
    main(void)
    {
        char s[LINE_MAX];
        wchar_t wc;
        int i, len;

        setlocale(LC_CTYPE, "C.UTF-8");
        if (fgets(s, sizeof(s), stdin) == NULL)
            *s = '\0';
        /* Decode one character per iteration until the null byte. */
        for (i = 0, len = 1; len != 0; i += len) {
            switch (len = mbtowc(&wc, s + i, MB_CUR_MAX)) {
            case 0:
                printf("byte %d end of string 0x00\n", i);
                break;
            case -1:
                printf("byte %d invalid 0x%0.2hhx\n", i, s[i]);
                len = 1;    /* skip the offending byte */
                break;
            default:
                /* Cast to the types that %X and %lc expect. */
                printf("byte %d U+%0.4X %lc\n", i,
                    (unsigned int)wc, (wint_t)wc);
                break;
            }
        }
        return 0;
    }
Recovering from encoding errors and continuing to parse the rest of the string as shown above is only possible for state-independent character encodings. For full generality, the error handling can be modified to reset the internal state. In that case, the rest of the string has to be skipped if the encoding is state-dependent:
            case -1:
                printf("byte %d invalid 0x%0.2hhx\n", i, s[i]);
                len = !mbtowc(NULL, NULL, MB_CUR_MAX);
                break;
ERRORS
mbtowc() will set errno in the following cases:
- [EILSEQ]
- s points to an invalid or incomplete multibyte character.
SEE ALSO
STANDARDS
The mbtowc() function conforms to ANSI X3.159-1989 (“ANSI C89”). The restrict qualifier was added in ISO/IEC 9899:1999 (“ISO C99”). Setting errno is an IEEE Std 1003.1-2008 (“POSIX.1”) extension.
CAVEATS
On error, callers of mbtowc() cannot tell whether the multibyte character was invalid or incomplete. To treat incomplete data differently from invalid data, the mbrtowc(3) function can be used instead.
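A minimal sketch of that alternative follows. It relies on mbrtowc(3) returning (size_t)-2 for an incomplete character and (size_t)-1 for an invalid one. The enum and the function name classify_mb are hypothetical, and the sketch assumes that each call starts from the initial conversion state.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical result codes for classify_mb(). */
enum ch_status { CH_VALID, CH_END, CH_INCOMPLETE, CH_INVALID };

/*
 * Hypothetical helper: classify the multibyte character formed by
 * at most n bytes at s, using a private, freshly zeroed mbstate_t.
 */
enum ch_status
classify_mb(const char *s, size_t n)
{
    mbstate_t st;
    size_t len;

    memset(&st, 0, sizeof(st));         /* initial conversion state */
    len = mbrtowc(NULL, s, n, &st);     /* discard the wide character */

    if (len == (size_t)-1)
        return CH_INVALID;              /* errno is set to EILSEQ */
    if (len == (size_t)-2)
        return CH_INCOMPLETE;           /* more bytes are needed */
    if (len == 0)
        return CH_END;                  /* s points to a null byte */
    return CH_VALID;
}
```

A caller reading from a stream could fetch more input on CH_INCOMPLETE and report an encoding error only on CH_INVALID.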