NAME
setlocale - select character encoding
SYNOPSIS
#include <locale.h>
char *
setlocale(int category, const char *locale);
DESCRIPTION
The setlocale() function sets and retrieves the active locale for the current process. The locale modifies the behaviour of some functions in the C library with respect to the character encoding, and on other operating systems also with respect to some language and cultural conventions. For more information about locales in general, see the locale(1) manual page.
On OpenBSD, the only useful value for the category is LC_CTYPE. It sets the locale used for character encoding, character classification, and case conversion. For compatibility with natural language support in packages(7), all other categories (LC_COLLATE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and LC_TIME) can be set and retrieved, too, but their values are ignored by the OpenBSD C library. A category of LC_ALL sets the entire locale generically, which is strongly discouraged for security reasons in portable programs.
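The following is a minimal sketch, not taken from any existing program, of the portable approach recommended above: only LC_CTYPE is set, and its value is taken from the environment by passing an empty string (described below). The helper name init_ctype_locale() is hypothetical, and the explicit fallback to "C" is a choice of this sketch.

```c
#include <locale.h>

/*
 * Hypothetical helper: select the character encoding requested by the
 * environment for LC_CTYPE only.  All other categories keep their
 * startup value "C".  If the environment asks for a locale that cannot
 * be provided, fall back to the "C" locale explicitly.
 * Returns the LC_CTYPE locale name now in force; the string belongs to
 * the C library and may be overwritten by later setlocale() calls.
 */
static const char *
init_ctype_locale(void)
{
    const char *name;

    name = setlocale(LC_CTYPE, "");
    if (name == NULL)
        name = setlocale(LC_CTYPE, "C");    /* "C" is always available */
    return name;
}
```

A program would call this once at startup, before using any of the LC_CTYPE-dependent functions listed below.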
The syntax and semantics of the locale argument are not standardized and vary among operating systems. On OpenBSD, if the locale string ends with ".UTF-8", the UTF-8 locale is selected; otherwise, the "C" locale is selected, which uses the ASCII character set. If the locale string contains a dot but does not end with ".UTF-8", setlocale() fails.
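As an illustration only, the OpenBSD naming rule can be restated as a pure string check. The function classify_locale_name() and the enum locale_kind are hypothetical; they do not call setlocale(), do not reproduce the C library's source, and ignore the length limit mentioned under RETURN VALUES.

```c
#include <string.h>

/* Possible outcomes of the naming rule described above. */
enum locale_kind {
    LOCALE_ASCII,   /* "C" locale, ASCII character set */
    LOCALE_UTF8,    /* UTF-8 locale */
    LOCALE_FAIL     /* setlocale() would return NULL */
};

/*
 * Hypothetical helper mirroring the documented rule for a non-empty,
 * non-NULL locale argument.  "" and NULL have special meanings and
 * are not handled here.
 */
static enum locale_kind
classify_locale_name(const char *locale)
{
    static const char suffix[] = ".UTF-8";
    size_t len = strlen(locale);
    size_t slen = sizeof(suffix) - 1;

    if (len >= slen && strcmp(locale + len - slen, suffix) == 0)
        return LOCALE_UTF8;
    if (strchr(locale, '.') != NULL)
        return LOCALE_FAIL;     /* some encoding other than UTF-8 */
    return LOCALE_ASCII;
}
```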
If locale is an empty string (""), the value of the environment variable LC_ALL, with a fallback to the variable corresponding to category, and with a further fallback to LANG, is used instead, as documented in the locale(1) manual page.
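The lookup order for an empty locale argument can be sketched as follows for LC_CTYPE. All names are hypothetical, and two details are assumptions of this sketch rather than statements of this manual: an empty variable is treated like an unset one, and "C" is used when none of the variables is set. See locale(1) for the authoritative rules.

```c
#include <stdlib.h>

/* Assumption: an empty variable counts as unset. */
static const char *
nonempty(const char *value)
{
    return (value != NULL && value[0] != '\0') ? value : NULL;
}

/*
 * Pure precedence rule: LC_ALL, then the variable named after the
 * category (for example LC_CTYPE), then LANG, else "C".
 */
static const char *
pick_locale(const char *lc_all, const char *lc_category, const char *lang)
{
    const char *v;

    if ((v = nonempty(lc_all)) != NULL)
        return v;
    if ((v = nonempty(lc_category)) != NULL)
        return v;
    if ((v = nonempty(lang)) != NULL)
        return v;
    return "C";
}

/* Name that an empty locale argument would resolve to for LC_CTYPE. */
static const char *
ctype_locale_from_env(void)
{
    return pick_locale(getenv("LC_ALL"), getenv("LC_CTYPE"),
        getenv("LANG"));
}
```

Keeping the precedence rule in pick_locale(), separate from getenv(), makes it testable without modifying the environment.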
If locale is NULL, the locale remains unchanged. This can be used to determine the currently active locale.
By default, C programs start in the "C" locale. The only function in the library that sets the locale is setlocale(); the locale is never changed as a side effect of some other routine.
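A NULL locale argument only queries the current setting, so a program can confirm that it still runs with the default described above. The helper locale_is_default() is hypothetical.

```c
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper: report whether a category still has the startup
 * value "C".  The NULL argument makes setlocale() a pure query.
 */
static int
locale_is_default(int category)
{
    const char *cur = setlocale(category, NULL);

    return cur != NULL && strcmp(cur, "C") == 0;
}
```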
The LC_CTYPE category modifies the behaviour of at least the following functions: iswctype(3), mblen(3), mbrlen(3), mbrtowc(3), mbsrtowcs(3), mbstowcs(3), mbtowc(3), towctrans(3), towlower(3), towupper(3), wcrtomb(3), wcscasecmp(3), wcsrtombs(3), wcstombs(3), wctomb(3), wctrans(3), wctype(3), and the functions documented in iswalnum(3).
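To illustrate how LC_CTYPE changes the meaning of a byte string, the hypothetical helper count_chars() below counts characters with mbrtowc(3). Under a UTF-8 locale, the two-byte sequence for "é" counts as one character; in the "C" locale, ASCII bytes count one each, and how bytes above 0x7F are treated may vary between systems.

```c
#include <string.h>
#include <wchar.h>

/*
 * Count the characters in a NUL-terminated multibyte string, decoding
 * it with mbrtowc(3) according to the current LC_CTYPE locale.
 * Returns (size_t)-1 if the string is invalid or truncated in that
 * encoding.
 */
static size_t
count_chars(const char *s)
{
    mbstate_t state;
    size_t left = strlen(s), n = 0, r;

    memset(&state, 0, sizeof(state));   /* initial conversion state */
    while (left > 0) {
        r = mbrtowc(NULL, s, left, &state);
        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;          /* invalid or incomplete */
        if (r == 0)                     /* decoded a null wide character */
            break;
        s += r;
        left -= r;
        n++;
    }
    return n;
}
```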
RETURN VALUES
In case of success, setlocale() returns a pointer to a static string describing the locale that is in force after the call. Subsequent calls to setlocale() may change the content of the string. The format of the string is not standardized and varies among operating systems.
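Because the returned string may be overwritten by the next call, a program that wants to change the locale temporarily has to copy the name first. This sketch uses the hypothetical helpers save_locale() and restore_locale(); passing a saved name back to setlocale() with the same category restores that setting, as the C standard specifies for the returned string.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy the current locale name for category into malloc'd storage,
 * because the string returned by setlocale() may be overwritten by
 * the next call.  Returns NULL on failure.
 */
static char *
save_locale(int category)
{
    const char *cur = setlocale(category, NULL);
    char *copy;
    size_t len;

    if (cur == NULL)
        return NULL;
    len = strlen(cur) + 1;
    if ((copy = malloc(len)) != NULL)
        memcpy(copy, cur, len);
    return copy;
}

/*
 * Reinstate a locale saved by save_locale() and free the copy.
 * Returns 1 on success, 0 if nothing was saved or setlocale()
 * rejected the saved name.
 */
static int
restore_locale(int category, char *saved)
{
    int ok;

    if (saved == NULL)
        return 0;
    ok = setlocale(category, saved) != NULL;
    free(saved);
    return ok;
}
```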
On OpenBSD, if setlocale() was never called with a non-NULL locale argument, the string "C" is returned. Otherwise, if the category was not LC_ALL or if the locale is the same for all categories, a copy of the locale argument is returned. Otherwise, the locales for the six categories LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and LC_TIME are concatenated in that order, with slash (‘/’) characters in between.
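The OpenBSD composite format can be taken apart by splitting at the slashes. The sketch below is specific to that format; other systems return different strings for LC_ALL, so portable code should treat the string as opaque and only pass it back to setlocale(). The function split_composite(), the index names, and the 64-byte field limit are hypothetical.

```c
#include <string.h>

#define NCATEGORIES 6
#define NAMELEN     64  /* arbitrary limit chosen for this sketch */

/* Field order of an OpenBSD composite LC_ALL string. */
enum { IDX_COLLATE, IDX_CTYPE, IDX_MESSAGES, IDX_MONETARY,
    IDX_NUMERIC, IDX_TIME };

/*
 * Split an LC_ALL result into one locale name per category.
 * Returns 0 on success, -1 on a wrong field count or an overlong name.
 */
static int
split_composite(const char *all, char out[NCATEGORIES][NAMELEN])
{
    const char *p = all, *slash;
    size_t len;
    int i;

    if (strchr(all, '/') == NULL) {
        /* The same locale is in force for every category. */
        if (strlen(all) >= NAMELEN)
            return -1;
        for (i = 0; i < NCATEGORIES; i++)
            strcpy(out[i], all);
        return 0;
    }
    for (i = 0; i < NCATEGORIES; i++) {
        slash = strchr(p, '/');
        len = slash != NULL ? (size_t)(slash - p) : strlen(p);
        if (len >= NAMELEN)
            return -1;
        memcpy(out[i], p, len);
        out[i][len] = '\0';
        if (slash == NULL)              /* last field: need exactly six */
            return i == NCATEGORIES - 1 ? 0 : -1;
        p = slash + 1;
    }
    return -1;                          /* more than six fields */
}
```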
In case of failure, setlocale() returns NULL. On OpenBSD, that can only happen if the category is invalid, if a character encoding other than UTF-8 is requested, if the requested locale name is of excessive length, or if memory allocation fails.
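Since setlocale() returns NULL on failure and, per the C standard, leaves the locale unchanged in that case, a program can try several names in turn. The helper select_ctype() is hypothetical.

```c
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: try each candidate name for LC_CTYPE in order
 * and keep the first one that setlocale() accepts.  A failed call
 * leaves the locale unchanged, so it is safe to keep trying.
 * Falls back to "C", which always succeeds.
 */
static const char *
select_ctype(const char *const candidates[], size_t n)
{
    const char *name;
    size_t i;

    for (i = 0; i < n; i++)
        if ((name = setlocale(LC_CTYPE, candidates[i])) != NULL)
            return name;
    return setlocale(LC_CTYPE, "C");
}
```

For example, a list such as { "en_US.UTF-8", "C.UTF-8" } tries the first name and then the second; which names exist depends on the system.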
EXAMPLES
Calling setlocale(LC_CTYPE, "en_US.UTF-8") at the beginning of a program selects the UTF-8 locale and returns "en_US.UTF-8". Calling setlocale(LC_ALL, NULL) right afterwards leaves the locale unchanged and returns "C/en_US.UTF-8/C/C/C/C".
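The two calls from the example above, wrapped in a hypothetical function run_example() that prints both return values. On OpenBSD, the output should match the strings given above; on other systems, the second line will likely have a different format, and en_US.UTF-8 may not be installed at all, in which case the function reports the failure.

```c
#include <locale.h>
#include <stdio.h>

/*
 * Run the two calls from the example.  Returns 0 on success, -1 if
 * the en_US.UTF-8 locale could not be selected.
 */
static int
run_example(void)
{
    const char *r;

    r = setlocale(LC_CTYPE, "en_US.UTF-8");
    if (r == NULL) {
        fprintf(stderr, "en_US.UTF-8 is not available\n");
        return -1;
    }
    printf("LC_CTYPE: %s\n", r);

    r = setlocale(LC_ALL, NULL);    /* query only, changes nothing */
    printf("LC_ALL:   %s\n", r != NULL ? r : "(null)");
    return 0;
}
```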
SEE ALSO
STANDARDS
The setlocale() function conforms to ANSI X3.159-1989 (“ANSI C89”).
HISTORY
The setlocale() function first appeared in 4.3BSD-Net/2.
CAVEATS
On systems other than OpenBSD, calling setlocale() with a category other than LC_CTYPE, or activating with uselocale(3) a locale object created by newlocale(3) for such a category, can cause erratic behaviour of many library functions. For security reasons, make sure that portable programs only use LC_CTYPE.
The following lists give examples of functions that may be affected. The lists are probably incomplete: other library functions may be affected if they directly or indirectly call affected functions, or if they imitate aspects of their behaviour. Functions that are not standardized may be affected, too.
LC_COLLATE
- glob(3), strcoll(3), strxfrm(3), wcscoll(3), wcsxfrm(3), and the functions documented in regexec(3)
LC_MESSAGES
- catgets(3), catopen(3), nl_langinfo(3), perror(3), psignal(3), strerror(3), strsignal(3), and the functions documented in err(3)
LC_MONETARY
- localeconv(3), nl_langinfo(3), strfmon()
LC_NUMERIC
- atof(3), localeconv(3), nl_langinfo(3), strfmon(), and the functions documented in printf(3), scanf(3), strtod(3), wcstod(3), wprintf(3), and wscanf(3). This category is particularly dangerous because it can cause bugs in the parsing and formatting of numbers, for example failures to recognize or properly write decimal points.
LC_TIME
- getdate(), nl_langinfo(3), strftime(3), strptime(3). Similarly, this category is prone to causing bugs in the parsing and formatting of date strings.
LC_CTYPE
- On systems other than OpenBSD, this category may affect the behaviour of additional functions, for example: btowc(3), isalnum(3), isalpha(3), isblank(3), iscntrl(3), isdigit(3), isgraph(3), islower(3), isprint(3), ispunct(3), isspace(3), isupper(3), isxdigit(3), mbsinit(3), strcasecmp(3), strcoll(3), strxfrm(3), tolower(3), toupper(3), vis(3), wcscoll(3), wcsxfrm(3), wctob(3)
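The LC_NUMERIC hazard can be contained even in a program whose locale was changed elsewhere, for example by a library: the hypothetical helper parse_c_double() below switches LC_NUMERIC to "C" around a call to strtod(3) and then restores it. This is a sketch, not a replacement for simply leaving LC_NUMERIC alone; it is not thread-safe, because setlocale() changes state for the whole process.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse a decimal number with '.' as the radix
 * character, whatever LC_NUMERIC is currently set to.  Stores 1 in
 * *ok if the whole string was consumed, else 0.  Not thread-safe.
 */
static double
parse_c_double(const char *s, int *ok)
{
    const char *cur = setlocale(LC_NUMERIC, NULL);
    char *saved = NULL, *end;
    double d;

    /* Copy the name: the next setlocale() call may overwrite cur. */
    if (cur != NULL && (saved = malloc(strlen(cur) + 1)) != NULL)
        strcpy(saved, cur);
    setlocale(LC_NUMERIC, "C");
    d = strtod(s, &end);
    *ok = end != s && *end == '\0';
    /* If the copy failed, LC_NUMERIC stays at the safe value "C". */
    if (saved != NULL) {
        setlocale(LC_NUMERIC, saved);
        free(saved);
    }
    return d;
}
```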