mbrtowc - converts a multibyte character to a wide character
#include <wchar.h>
size_t
mbrtowc(wchar_t * restrict wc, const char * restrict s, size_t n, mbstate_t * restrict mbs);
The mbrtowc() function examines at most n bytes of the multibyte character
byte string pointed to by s, converts those bytes to a wide character, and
stores the wide character in the wchar_t object pointed to by wc if wc is
not NULL and s points to a valid character.
Conversion happens in accordance with the conversion state described by the
mbstate_t object pointed to by mbs. The mbstate_t object must be initialized
to zero before the application's first call to mbrtowc(). If the previous
call to mbrtowc() did not return (size_t)-1, the mbstate_t object can safely
be reused without reinitialization.
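As an illustration of the initialization rule above, the following is a minimal sketch of a conversion loop that zeroes its own mbstate_t before the first call. The helper name mb_to_wide() and its error convention are assumptions made for this example; they are not part of any library.

```c
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: converts the multibyte string src into dst, writing
 * at most dstlen wide characters including the terminating L'\0'.
 * Returns the number of wide characters stored (excluding the terminator),
 * or (size_t)-1 on an invalid or truncated sequence or if dst is too small.
 */
size_t
mb_to_wide(wchar_t *dst, size_t dstlen, const char *src)
{
	mbstate_t mbs;
	size_t srclen = strlen(src) + 1;	/* include the NUL byte */
	size_t count = 0;

	/* The state must be zeroed before the first call to mbrtowc(). */
	memset(&mbs, 0, sizeof(mbs));

	while (count < dstlen) {
		size_t r = mbrtowc(&dst[count], src, srclen, &mbs);

		if (r == (size_t)-1 || r == (size_t)-2)
			return (size_t)-1;
		if (r == 0)
			return count;	/* terminator stored in dst[count] */
		src += r;
		srclen -= r;
		count++;
	}
	return (size_t)-1;	/* dst too small */
}
```

The same mbstate_t is reused across all calls in the loop, which is safe because the loop stops as soon as (size_t)-1 is returned.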
The behaviour of mbrtowc() is affected by the LC_CTYPE category of the
current locale. If the locale is changed without reinitialization of the
mbstate_t object pointed to by mbs, the behaviour of mbrtowc() is undefined.
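One way to avoid carrying a conversion state across a locale change is to reset the state at the point where the locale is switched. The sketch below assumes a hypothetical wrapper named switch_ctype(); only setlocale(3) and memset(3) are real library calls.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: switches LC_CTYPE to the named locale and, on
 * success, resets the caller's conversion state so that it is never
 * carried across a locale change. Returns 1 on success, 0 on failure.
 */
int
switch_ctype(const char *name, mbstate_t *mbs)
{
	if (setlocale(LC_CTYPE, name) == NULL)
		return 0;	/* locale unchanged; state is left alone */
	memset(mbs, 0, sizeof(*mbs));
	return 1;
}
```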
mbrtowc() will accept an incomplete byte sequence pointed to by s which does
not form a complete character but is potentially part of a valid character.
In this case, mbrtowc() consumes all such bytes. The conversion state saved
in the mbstate_t object pointed to by mbs will be used to restart the
suspended conversion during the next call to mbrtowc().
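The restart behaviour can be sketched by feeding input one byte at a time, as if the bytes arrived separately from a stream. The helper count_chars_bytewise() and its pending flag are assumptions made for illustration only.

```c
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: feeds buf to mbrtowc() one byte at a time and counts
 * completed characters (a NUL byte counts as a character here).
 * Returns the count, or (size_t)-1 on an invalid sequence.
 * *pending is set to 1 if the input ended in the middle of a character.
 */
size_t
count_chars_bytewise(const char *buf, size_t len, int *pending)
{
	mbstate_t mbs;
	size_t count = 0;

	memset(&mbs, 0, sizeof(mbs));
	*pending = 0;

	for (size_t i = 0; i < len; i++) {
		size_t r = mbrtowc(NULL, &buf[i], 1, &mbs);

		if (r == (size_t)-1)
			return (size_t)-1;
		if (r == (size_t)-2) {
			/* Byte consumed; mbs remembers the partial character. */
			*pending = 1;
			continue;
		}
		/* r is 0 or 1: this byte completed a character. */
		*pending = 0;
		count++;
	}
	return count;
}
```

Because the partial character lives in the mbstate_t object, the caller does not need to buffer the earlier bytes itself.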
In state-dependent encodings, s
may point to a
special sequence of bytes called a “shift sequence”. Shift
sequences switch between character code sets available within an encoding
scheme. One encoding scheme using shift sequences is ISO/IEC 2022-JP, which
can switch e.g. from ASCII (which uses one byte per character) to JIS X 0208
(which uses two bytes per character). Shift sequence bytes correspond to no
individual wide character, so mbrtowc() treats
them as if they were part of the subsequent multibyte character. Therefore
they do contribute to the number of bytes in the multibyte character.
Special cases in interpretation of arguments are as follows:
- wc == NULL
- The conversion from a multibyte character to a wide
character is performed and the conversion state may be affected, but the
resulting wide character is discarded.
This can be used to find out how many bytes are contained in the multibyte
character pointed to by s.
- s == NULL
- mbrtowc() ignores wc and n, and behaves equivalently to
    mbrtowc(NULL, "", 1, mbs);
which attempts to use the mbstate_t object pointed to by mbs to start or
continue conversion using the empty string as input, and discards the
conversion result. If conversion succeeds, this call always returns zero.
Unlike mbtowc(3), the value returned does not indicate whether the current
encoding of the locale is state-dependent, i.e. uses shift sequences.
- mbs == NULL
- mbrtowc() uses its own
internal state object to keep the conversion state, instead of an
mbstate_t object pointed to by mbs. This
internal conversion state is initialized once at program startup. It is
not safe to call mbrtowc() again with a NULL mbs argument if mbrtowc()
returned (size_t)-1 because at this point the internal conversion state is
undefined. Calling any other functions in libc never changes the internal
conversion state object of mbrtowc().
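The wc == NULL case can be used to measure a character without storing it. The following sketch assumes a hypothetical helper named mbchar_len() that starts from a freshly zeroed state and treats an incomplete sequence as an error.

```c
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: returns the number of bytes in the multibyte
 * character at the start of s (examining at most n bytes), 0 if s starts
 * with the NUL terminator, or (size_t)-1 if the bytes are invalid or
 * do not form a complete character.
 */
size_t
mbchar_len(const char *s, size_t n)
{
	mbstate_t mbs;
	size_t r;

	/* Use a private, zeroed state instead of passing mbs == NULL. */
	memset(&mbs, 0, sizeof(mbs));

	/* wc == NULL: the character is converted but then discarded. */
	r = mbrtowc(NULL, s, n, &mbs);
	if (r == (size_t)-2)
		return (size_t)-1;
	return r;
}
```

Passing a local mbstate_t rather than NULL keeps the helper independent of the internal state object described above.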
The mbrtowc() function returns:
- 0
- The bytes pointed to by s form a terminating NUL character. If wc is not
NULL, a NUL wide character has been stored in the wchar_t object pointed to
by wc.
- positive
- s points to a valid character, and the value returned is the number of
bytes completing the character. If wc is not NULL, the corresponding wide
character has been stored in the wchar_t object pointed to by wc.
- (size_t)-1
- s points to an illegal
byte sequence which does not form a valid multibyte character in the
current locale. mbrtowc() sets
errno to EILSEQ. The conversion state
object pointed to by mbs is left in an
undefined state and must be reinitialized before being used again.
Because applications using mbrtowc() are
shielded from the specifics of the multibyte character encoding scheme, it
is impossible to repair byte sequences containing encoding errors. Such
byte sequences must be treated as invalid and potentially malicious input.
Applications must stop processing the byte string pointed to by
s and either discard any wide characters
already converted, or cope with truncated input.
- (size_t)-2
- s points to an incomplete
byte sequence of length n which has been
consumed and contains part of a valid multibyte character. The character
may be completed by calling mbrtowc() again
with s pointing to one or more subsequent
bytes of the multibyte character and mbs
pointing to the conversion state object used during conversion of the
incomplete byte sequence.
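The four outcomes described above can be told apart with a small wrapper. This is only a sketch: the enum mb_result, the function mb_classify() and the used output parameter are names invented for the example.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical result codes, one per class of mbrtowc() return value. */
enum mb_result { MB_NUL, MB_CHAR, MB_INVALID, MB_INCOMPLETE };

/*
 * Hypothetical wrapper: calls mbrtowc() once and classifies the result.
 * *used receives the number of bytes of s that were consumed, where the
 * return value reports it (it stays 0 for MB_NUL and MB_INVALID).
 */
enum mb_result
mb_classify(wchar_t *wc, const char *s, size_t n, mbstate_t *mbs,
    size_t *used)
{
	size_t r = mbrtowc(wc, s, n, mbs);

	*used = 0;
	if (r == (size_t)-1) {
		/* State is undefined after an error, so reinitialize it. */
		memset(mbs, 0, sizeof(*mbs));
		return MB_INVALID;
	}
	if (r == (size_t)-2) {
		*used = n;	/* all n bytes were consumed */
		return MB_INCOMPLETE;
	}
	if (r == 0)
		return MB_NUL;
	*used = r;
	return MB_CHAR;
}
```

On MB_INCOMPLETE the caller would call again with the following bytes and the same mbs; on MB_INVALID it would stop processing the string, as described above.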
The mbrtowc() function may cause an error in the following cases:
- s points to an invalid multibyte character.
- mbs points to an invalid
or uninitialized mbstate_t object.
The mbrtowc() function conforms to ISO/IEC 9899/AMD1:1995 (“ISO C90,
Amendment 1”). The restrict qualifier was added in ISO/IEC 9899:1999
(“ISO C99”).
mbrtowc() is not suitable for programs that care
about internals of the character encoding scheme used by the byte string
pointed to by s.
It is possible that mbrtowc() fails because of
locale configuration errors. An “invalid” character sequence may
simply be encoded in a different encoding than that of the current locale.
The special cases for s == NULL and mbs
== NULL do not make any sense. Instead of
can be used.
Earlier versions of this man page implied that calling mbrtowc() with a NULL
s argument would always set mbs to the initial conversion state. But this is
true only if the previous call to mbrtowc() did not return (size_t)-1 or
(size_t)-2.
It is recommended to zero the mbstate_t object instead.