MBRTOWC(3) | Library Functions Manual | MBRTOWC(3) |
mbrtowc - converts a multibyte character to a wide character (restartable)
#include <wchar.h>

size_t
mbrtowc(wchar_t * restrict wc, const char * restrict s, size_t n, mbstate_t * restrict mbs);
The mbrtowc() function examines at most n bytes of the multibyte character byte string pointed to by s, converts those bytes to a wide character, and stores the wide character in the wchar_t object pointed to by wc if wc is not NULL and s points to a valid character.
Conversion happens in accordance with the conversion state described by the mbstate_t object pointed to by mbs. The mbstate_t object must be initialized to zero before the application's first call to mbrtowc(). If the previous call to mbrtowc() did not return (size_t)-1, the mbstate_t object can safely be reused without reinitialization.
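The initialization and reuse rules above can be illustrated with a minimal sketch that counts the characters in a buffer. The helper name count_chars is an assumption made for this example and is not part of any library; the mbstate_t object is zeroed once with memset(3) and then reused across calls.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Count the characters in the first len bytes of s.
 * Returns (size_t)-1 if the input is invalid or ends mid-character.
 * count_chars is a hypothetical helper name.
 */
size_t
count_chars(const char *s, size_t len)
{
	mbstate_t mbs;
	size_t count = 0;

	assert(s != NULL);

	/* The state must be zeroed before the first call to mbrtowc(). */
	memset(&mbs, 0, sizeof(mbs));

	while (len > 0) {
		/* wc is NULL: only the byte length of the character is needed. */
		size_t r = mbrtowc(NULL, s, len, &mbs);

		if (r == (size_t)-1 || r == (size_t)-2)
			return (size_t)-1;
		if (r == 0)
			break;		/* terminating NUL reached */
		s += r;			/* same state object is reused */
		len -= r;
		count++;
	}
	return count;
}
```

The result depends on the LC_CTYPE category in effect when the function is called.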
The behaviour of mbrtowc() is affected by the LC_CTYPE category of the current locale. If the locale is changed without reinitialization of the mbstate_t object pointed to by mbs, the behaviour of mbrtowc() is undefined.
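One way to respect the locale rule above is to change the locale and reset the conversion state in a single step. The following is a sketch only; switch_ctype_locale is a hypothetical helper name, and the state is reset only when setlocale(3) reports success.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/*
 * Change the LC_CTYPE locale and reinitialize the conversion state.
 * Returns 1 on success, 0 if the locale could not be set.
 * switch_ctype_locale is a hypothetical helper name.
 */
int
switch_ctype_locale(const char *name, mbstate_t *mbs)
{
	assert(name != NULL && mbs != NULL);

	if (setlocale(LC_CTYPE, name) == NULL)
		return 0;	/* locale unchanged, state left as it was */

	/* A state left over from the old locale must not be reused. */
	memset(mbs, 0, sizeof(*mbs));
	return 1;
}
```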
Unlike mbtowc(3), mbrtowc() will accept an incomplete byte sequence pointed to by s which does not form a complete character but is potentially part of a valid character. In this case, mbrtowc() consumes all such bytes. The conversion state saved in the mbstate_t object pointed to by mbs will be used to restart the suspended conversion during the next call to mbrtowc().
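The restart behaviour can be sketched by handing mbrtowc() a single byte per call, as a program reading from a byte stream might. The name decode_bytewise is an assumption made for this example. A return value of (size_t)-2 means the byte was consumed and the conversion continues with the next call.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Decode s one byte at a time.
 * Returns the number of wide characters stored in out, or (size_t)-1
 * on an encoding error. A character left incomplete at the end of the
 * input is silently dropped in this sketch.
 * decode_bytewise is a hypothetical helper name.
 */
size_t
decode_bytewise(const char *s, size_t len, wchar_t *out, size_t outcap)
{
	mbstate_t mbs;
	size_t i, nout = 0;

	assert(s != NULL && out != NULL);
	memset(&mbs, 0, sizeof(mbs));

	for (i = 0; i < len && nout < outcap; i++) {
		wchar_t wc;
		size_t r = mbrtowc(&wc, &s[i], 1, &mbs);

		if (r == (size_t)-1)
			return (size_t)-1;	/* invalid input */
		if (r == (size_t)-2)
			continue;	/* byte consumed, character incomplete */
		out[nout++] = wc;	/* r is 0 (NUL) or 1 here */
	}
	return nout;
}
```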
In state-dependent encodings, s may point to a special sequence of bytes called a “shift sequence”. Shift sequences switch between character code sets available within an encoding scheme. One encoding scheme using shift sequences is ISO/IEC 2022-JP, which can switch e.g. from ASCII (which uses one byte per character) to JIS X 0208 (which uses two bytes per character). Shift sequence bytes correspond to no individual wide character, so mbrtowc() treats them as if they were part of the subsequent multibyte character. Therefore they do contribute to the number of bytes in the multibyte character.
Special cases in interpretation of arguments are as follows:
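wc == NULL
The conversion from a multibyte character to a wide character is performed and the conversion state may be affected, but the resulting wide character is discarded. This can be used to find out how many bytes are contained in the multibyte character pointed to by s.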
s == NULL
mbrtowc() ignores wc and n, and behaves equivalently to
mbrtowc(NULL, "", 1, mbs);
which attempts to use the mbstate_t object pointed to by mbs to start or continue conversion using the empty string as input, and discards the conversion result. If conversion succeeds, this call always returns zero. Unlike mbtowc(3), the value returned does not indicate whether the current encoding of the locale is state-dependent, i.e. uses shift sequences.
mbs == NULL
mbrtowc() uses its own internal state object to keep the conversion state, instead of an mbstate_t object pointed to by mbs. This internal conversion state is initialized once at program startup. It is not safe to call mbrtowc() again with a NULL mbs argument if mbrtowc() returned (size_t)-1, because at this point the internal conversion state is undefined. Calling any other function in libc never changes the internal conversion state object of mbrtowc().
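As a small illustration of the wc == NULL case, the sketch below measures the byte length of the first character in a buffer. The name mb_char_len is hypothetical. The example passes its own zeroed mbstate_t object rather than a NULL mbs, so it does not depend on the internal conversion state.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Return the number of bytes in the first multibyte character of s,
 * examining at most n bytes. The result is 0 for a NUL character, or
 * (size_t)-1 / (size_t)-2 as returned by mbrtowc().
 * mb_char_len is a hypothetical helper name.
 */
size_t
mb_char_len(const char *s, size_t n)
{
	mbstate_t mbs;

	assert(s != NULL);
	memset(&mbs, 0, sizeof(mbs));

	/* wc is NULL, so the converted wide character is discarded. */
	return mbrtowc(NULL, s, n, &mbs);
}
```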
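The mbrtowc() function returns:

0
The bytes pointed to by s form a terminating NUL character. If wc is not NULL, a NUL wide character has been stored in the wchar_t object pointed to by wc.

positive
s points to a valid character, and the value returned is the number of bytes completing the character. If wc is not NULL, the corresponding wide character has been stored in the wchar_t object pointed to by wc.

(size_t)-1
s points to an illegal byte sequence which does not form a valid multibyte character in the current locale. mbrtowc() sets errno to EILSEQ. The conversion state object pointed to by mbs is left in an undefined state and must be reinitialized before being used again.
Because applications using mbrtowc() are shielded from the specifics of the multibyte character encoding scheme, it is impossible to repair byte sequences containing encoding errors. Such byte sequences must be treated as invalid and potentially malicious input. Applications must stop processing the byte string pointed to by s and either discard any wide characters already converted, or cope with truncated input.

(size_t)-2
s points to an incomplete byte sequence of length n which has been consumed and contains part of a valid multibyte character. mbrtowc() may be called again with s pointing to one or more subsequent bytes of the multibyte character and mbs pointing to the conversion state object used during conversion of the incomplete byte sequence.

The mbrtowc() function may cause an error in the following cases:

EILSEQ
s points to an invalid or incomplete multibyte character.

EINVAL
mbs points to an invalid or uninitialized mbstate_t object.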
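The return values described above can be told apart along the following lines. This is a sketch under stated assumptions: next_char and enum mb_status are hypothetical names introduced for the example, and the caller owns the mbstate_t object.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical classification of mbrtowc() results. */
enum mb_status { MB_OK, MB_END, MB_INVALID, MB_INCOMPLETE };

/*
 * Convert one character from s and classify the result.
 * *used receives the number of bytes consumed from s.
 */
enum mb_status
next_char(wchar_t *wc, const char *s, size_t n, mbstate_t *mbs, size_t *used)
{
	size_t r;

	assert(wc != NULL && s != NULL && mbs != NULL && used != NULL);
	*used = 0;
	r = mbrtowc(wc, s, n, mbs);

	if (r == (size_t)-1) {
		/* The state is now undefined: zero it before any reuse. */
		memset(mbs, 0, sizeof(*mbs));
		return MB_INVALID;
	}
	if (r == (size_t)-2) {
		*used = n;	/* all n bytes were consumed */
		return MB_INCOMPLETE;
	}
	if (r == 0)
		return MB_END;	/* terminating NUL character */
	*used = r;
	return MB_OK;
}
```

On MB_INVALID the caller is expected to stop processing the input, in line with the advice given for (size_t)-1.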
The mbrtowc() function conforms to ISO/IEC 9899/AMD1:1995 (“ISO C90, Amendment 1”). The restrict qualifier was added in ISO/IEC 9899:1999 (“ISO C99”).
mbrtowc() is not suitable for programs that care about internals of the character encoding scheme used by the byte string pointed to by s.
It is possible that mbrtowc() fails because of locale configuration errors. An “invalid” character sequence may simply be encoded in a different encoding than that of the current locale.
The special cases for s == NULL and mbs == NULL do not make any sense. Instead of passing NULL for mbs, mbtowc(3) can be used.
Earlier versions of this man page implied that calling mbrtowc() with a NULL s argument would always set mbs to the initial conversion state. But this is true only if the previous call to mbrtowc() using mbs did not return (size_t)-1 or (size_t)-2. It is recommended to zero the mbstate_t object instead.
February 8, 2016 | OpenBSD-6.0 |