NAME
localedef - format and semantics of locale definition file

DESCRIPTION
This is a description of the syntax and meaning of the locale definition file that is provided as input to the localedef command to create a locale (see localedef(1M)). The following is a list of category tags, keywords, and subsequent expressions that are recognized by localedef. The order of keywords within a category is irrelevant, with the exception of the copy keyword and the other exceptions noted under the LC_COLLATE description. (Note that, by convention, category tags are composed of uppercase characters, while keywords are composed of lowercase characters.)

Category Tags and Keywords
The following keywords do not belong to any category and should appear at the beginning of the locale definition file:
- comment_char
A single character indicating the character to be interpreted as starting a comment line within the locale definition file. This character should be in the first column of a comment line. The default comment_char is #. All lines with a comment_char in the first column are ignored.
- escape_char
A single character indicating the character to be interpreted as an escape character within the script. The default escape_char is \. The escape_char is used to escape localedef metacharacters, removing their special meaning, and in the decimal, octal, and hexadecimal character constant formats. It is also used to continue a line onto the next line when escape_char is the last character on the line (before the newline character).
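The comment and continuation rules for comment_char and escape_char can be illustrated with a small filter that turns raw input into logical lines. This is a minimal sketch, not the localedef implementation: the function name preprocess is hypothetical, and it only drops comment lines and joins continued lines, leaving every other escape pair untouched for a later tokenizing step.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: copy src to dst, dropping comment lines and
 * joining continued lines.  Escape pairs other than escape_char plus
 * newline are copied unchanged for a later tokenizing stage.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or (size_t)-1 if dst is too small.
 */
size_t preprocess(const char *src, char comment_char, char escape_char,
                  char *dst, size_t dstsz)
{
    size_t n = 0;
    bool at_line_start = true;   /* at column 1 of a new logical line */

    if (dstsz == 0)
        return (size_t)-1;

    while (*src != '\0') {
        /* comment_char in the first column discards the physical line. */
        if (at_line_start && *src == comment_char) {
            while (*src != '\0' && *src != '\n')
                src++;
            if (*src == '\n')
                src++;
            continue;
        }
        if (*src == escape_char && src[1] != '\0') {
            if (src[1] == '\n') {
                /* escape_char right before the newline: continue the line. */
                src += 2;
            } else {
                /* Any other escape pair is kept verbatim. */
                if (n + 2 >= dstsz)
                    return (size_t)-1;
                dst[n++] = src[0];
                dst[n++] = src[1];
                src += 2;
            }
            at_line_start = false;
            continue;
        }
        if (n + 1 >= dstsz)
            return (size_t)-1;
        at_line_start = (*src == '\n');
        dst[n++] = *src++;
    }
    dst[n] = '\0';
    return n;
}
```

With the defaults # and \, a comment line disappears entirely, and a line ending in \ is returned joined to the line that follows it. An escaped escape character (\\) at the end of a line does not continue it.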
The following keyword can be used in any category:
- copy
A string naming another valid locale available on the system. This causes the category in the locale being created to be a copy of the same category in the named locale. Because the copy keyword defines the entire category, if it is used, it must be the only keyword in the category.

The following six categories are recognized:
- LC_CTYPE:
This category defines character classification, case conversion, and other character attributes. The following predefined character classifications are recognized:
- upper
Character codes classified as uppercase letters. Characters specified in the cntrl, digit, punct, or space classifications cannot be specified in this classification.
- lower
Character codes classified as lowercase letters. The same restrictions that apply to the upper classification apply to this classification.
- digit
Character codes classified as numeric. Only ten characters in contiguous ascending sequence by numerical value can be specified. Alternative digits cannot be specified here.
- space
Character codes classified as white-space. No character specified for the upper, lower, alpha, digit, graph, or xdigit classifications can be included in this classification.
- punct
Character codes classified as punctuation characters. No character included in the upper, lower, alpha, digit, cntrl, xdigit, or space classifications can be specified.
- cntrl
Character codes classified as control characters. No character included in the upper, lower, alpha, digit, punct, graph, print, or xdigit classifications can be included here.
- blank
Character codes classified as blank characters. The <space> and <tab> characters are automatically included.
- xdigit
Character codes classified as hexadecimal digits. Only the characters defined for the digit class can be specified, followed by one or more sets of six characters, with each set in ascending order.
- alpha
Character codes classified as letters. Characters classified as cntrl, digit, punct, or space cannot be specified. Characters specified in the upper and lower classes are automatically included in this class.
- print
Character codes classified as printable characters. Characters specified for the upper, lower, alpha, digit, xdigit, and punct classes and the <space> character are automatically included. No character from the cntrl classification can be specified.
- graph
Character codes classified as printable characters, except the <space> character. In all other respects this classification is similar to the print classification.
The following two are special classifications, used to designate valid first-of-two and second-of-two bytes. Note that these are byte classifications and not character classifications; hence, they cannot be used with the iswctype interface (see wctype(3C)) in the same manner as the other classifications.
- first
Valid first bytes of two-byte characters.
- second
Valid second bytes of two-byte characters.
Character case conversion definitions:
- toupper
Lowercase to uppercase character relationships.
- tolower
Uppercase to lowercase character relationships.
Miscellaneous character attributes and classifications:
- alt_punct
String mapped into the ASCII equivalent string ``b!"#$%&'()*+,-./:;<=>?@[\]^_`{}~'', where b is a blank (a langinfo(5) item).
- charclass
Defines one or more locale-specific character class names as strings separated by semicolons. Each named character class can then be defined subsequently in the LC_CTYPE definition. The first character of a character class name must be a letter, and the class name cannot match any of the predefined classifications (for example, space, alpha, cntrl).
- direction
String operand indicating text direction (a langinfo(5) item). The string operand "1" indicates right-to-left text direction.
- context
String operand indicating character context analysis. The string "1" indicates that Arabic context analysis is required.
- LC_COLLATE:
The LC_COLLATE category provides the collation sequence definition for relative ordering between collating elements (single and multi-character collating elements) in the locale. The following keywords belong to this category and should come between the category tag LC_COLLATE and END LC_COLLATE. The first two keywords can appear in any order, but must come before the order_start keyword. Any number of the first two keywords can be specified.
- collating-element <symbol> from string
Defines a multi-character collating element, symbol, composed of the characters in string. String is limited to two characters.
- collating-symbol <symbol>
Makes symbol a collating symbol, which can be used to define a place in the collating sequence. Symbol does not represent any actual character.
- order_start
Denotes the start of the collation sequence. The lines following the order_start keyword and before the order_end keyword contain collating element entries, one per line. Operands can optionally appear after the order_start keyword to define rules for string comparison using a multiple-weight scheme (if no operands are specified, a single forward operand is assumed); these directives affect how strings are collated. The possible operands are:
- forward
Specifies that comparison operations proceed from the start of the string towards its end.
- backward
Specifies that comparison operations proceed from the end of the string towards its beginning.
- order_end
Marks the end of the list of collating element entries.
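The effect of the forward and backward operands on a multiple-weight comparison can be sketched as a two-pass string compare. This is an illustration under stated assumptions, not the collation algorithm used by the C library: each order_start operand is assumed to govern one pass, the weights come from a hypothetical weight() function (case-folded on the first pass, case-sensitive on the second), and IGNORE, UNDEFINED, and multi-character collating elements are left out.

```c
#include <ctype.h>
#include <string.h>

enum direction { FORWARD, BACKWARD };

/* Hypothetical weights: pass 0 folds case, later passes keep it. */
static int weight(unsigned char c, int pass)
{
    return pass == 0 ? tolower(c) : c;
}

/* Compare a and b on one pass, scanning in direction dir. */
static int compare_pass(const char *a, const char *b, int pass,
                        enum direction dir)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t n = la < lb ? la : lb;

    for (size_t i = 0; i < n; i++) {
        size_t ia = dir == FORWARD ? i : la - 1 - i;
        size_t ib = dir == FORWARD ? i : lb - 1 - i;
        int wa = weight((unsigned char)a[ia], pass);
        int wb = weight((unsigned char)b[ib], pass);
        if (wa != wb)
            return wa < wb ? -1 : 1;
    }
    return (la > lb) - (la < lb);   /* on a tie, the shorter string sorts first */
}

/*
 * Run the passes in order; a later pass only breaks ties left by the
 * earlier ones.  dirs[k] is the operand assumed to govern pass k.
 */
int collate(const char *a, const char *b, const enum direction *dirs,
            int npasses)
{
    for (int pass = 0; pass < npasses; pass++) {
        int r = compare_pass(a, b, pass, dirs[pass]);
        if (r != 0)
            return r;
    }
    return 0;
}
```

With forward on both passes, "Ab" sorts before "aB" because the first case difference is found at the start of the strings; with backward on the second pass, the last character decides and the order reverses.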
- LC_MONETARY:
The LC_MONETARY category defines the rules and symbols used to format monetary numeric information. The following keywords belong to this category and should come between the category tag LC_MONETARY and END LC_MONETARY:
- int_curr_symbol
The operand is a four-character string used to designate the international currency symbol. The first three characters should contain the alphabetic international currency symbol in accordance with those specified in the ISO 4217 standard. The fourth character is the character used to separate the international currency symbol from the monetary quantity.
- currency_symbol
The operand is a string used as the local currency symbol.
- mon_decimal_point
The operand is a string containing the symbol used as the decimal delimiter (radix character).
- mon_thousands_sep
The operand is a string containing the symbol used as a separator for groups of digits to the left of the decimal delimiter.
- mon_grouping
The operand is a semicolon-separated list of integers. The initial integer defines the size of the group immediately preceding the decimal delimiter, and the following integers define the preceding groups. If the last integer is not -1, then the size of the previous group (if any) is used repeatedly for the remainder of the digits. If the last integer is -1, then no further grouping is performed.
- positive_sign
The operand is a string used to indicate a non-negative monetary quantity.
- negative_sign
The operand is a string used to indicate a negative monetary quantity.
- int_frac_digits
The operand is an integer representing the number of fractional digits used in formatted monetary values using int_curr_symbol.
- frac_digits
The operand is an integer representing the number of fractional digits used in formatted monetary values using currency_symbol.
- p_cs_precedes
The operand is an integer which, if set to 1, indicates that the currency_symbol precedes a non-negative monetary quantity, and if set to 0, that the symbol succeeds the value.
- p_sep_by_space
The operand is an integer which indicates the separation of the currency_symbol, the sign string, and the value for a non-negative formatted monetary quantity. The values of p_sep_by_space, n_sep_by_space, int_p_sep_by_space, and int_n_sep_by_space are interpreted as follows:
- 0
No space separates the currency symbol and the value.
- 1
If the currency symbol and sign string are adjacent, a space separates them from the value; otherwise, a space separates the currency symbol from the value.
- 2
If the currency symbol and sign string are adjacent, a space separates them; otherwise, a space separates the sign string from the value.
- n_cs_precedes
The operand is an integer which, if set to 1, indicates that the currency_symbol precedes a negative monetary quantity, and if set to 0, that the symbol succeeds the negative value.
- n_sep_by_space
The operand is an integer which indicates the separation of the currency_symbol, the sign string, and the value for a negative formatted monetary quantity.
- p_sign_posn
The operand is an integer which indicates the positioning of the positive_sign for a non-negative monetary quantity. The possible values are:
- 0
Parentheses surround the quantity and the currency_symbol or int_curr_symbol.
- 1
The sign string precedes the quantity and the currency_symbol or int_curr_symbol.
- 2
The sign string succeeds the quantity and the currency_symbol or int_curr_symbol.
- 3
The sign string immediately precedes the currency_symbol or int_curr_symbol.
- 4
The sign string immediately succeeds the currency_symbol or int_curr_symbol.
- n_sign_posn
The operand is an integer indicating the positioning of the negative_sign for a negative formatted monetary quantity. Its values are interpreted as for p_sign_posn.
- int_p_cs_precedes
The operand is an integer which, if set to 1, indicates that the int_curr_symbol precedes a non-negative monetary quantity, and if set to 0, that the symbol succeeds the value.
- int_p_sep_by_space
The operand is an integer which indicates the separation of the int_curr_symbol, the sign string, and the value for a non-negative internationally formatted monetary quantity.
- int_n_cs_precedes
The operand is an integer which, if set to 1, indicates that the int_curr_symbol precedes a negative monetary quantity, and if set to 0, that the symbol succeeds the negative value.
- int_n_sep_by_space
The operand is an integer which indicates the separation of the int_curr_symbol, the sign string, and the value for a negative internationally formatted monetary quantity.
- int_p_sign_posn
The operand is an integer which indicates the positioning of the positive_sign for a non-negative monetary quantity formatted with the international format.
- int_n_sign_posn
The operand is an integer which indicates the positioning of the negative_sign for a negative monetary quantity formatted with the international format.
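The interaction of the cs_precedes, sep_by_space, and sign_posn values can be made concrete with a small formatter. This is a sketch of one reading of the rules above, assuming the quantity has already been grouped and given its decimal delimiter; the function name format_money and its parameters are hypothetical, and it does not read the values from the current locale (a real program would obtain them from localeconv()).

```c
#include <stdio.h>

/*
 * Hypothetical helper: place sym and sign around an already-grouped
 * value.  cs_precedes, sep_by_space and sign_posn take the values
 * described for p_cs_precedes, p_sep_by_space and p_sign_posn (or
 * their n_ counterparts).  Returns the snprintf() result, or -1 for
 * an unsupported sign_posn.
 */
int format_money(char *out, size_t outsz, const char *value,
                 const char *sym, const char *sign,
                 int cs_precedes, int sep_by_space, int sign_posn)
{
    /* sep_by_space 1: space between the symbol (or symbol+sign pair) and the value. */
    const char *sv = sep_by_space == 1 ? " " : "";
    /* sep_by_space 2: space between the sign and what it touches; none for an empty sign. */
    const char *sg = (sep_by_space == 2 && sign[0] != '\0') ? " " : "";

    if (sign_posn == 0)                       /* parentheses, no sign string */
        return cs_precedes
            ? snprintf(out, outsz, "(%s%s%s)", sym, sv, value)
            : snprintf(out, outsz, "(%s%s%s)", value, sv, sym);

    if (cs_precedes) {
        switch (sign_posn) {
        case 1: case 3:                       /* sign symbol value */
            return snprintf(out, outsz, "%s%s%s%s%s", sign, sg, sym, sv, value);
        case 2:                               /* symbol value sign */
            return snprintf(out, outsz, "%s%s%s%s%s", sym, sv, value, sg, sign);
        case 4:                               /* symbol sign value */
            return snprintf(out, outsz, "%s%s%s%s%s", sym, sg, sign, sv, value);
        }
    } else {
        switch (sign_posn) {
        case 1:                               /* sign value symbol */
            return snprintf(out, outsz, "%s%s%s%s%s", sign, sg, value, sv, sym);
        case 2: case 4:                       /* value symbol sign */
            return snprintf(out, outsz, "%s%s%s%s%s", value, sv, sym, sg, sign);
        case 3:                               /* value sign symbol */
            return snprintf(out, outsz, "%s%s%s%s%s", value, sv, sign, sg, sym);
        }
    }
    return -1;
}
```

For example, a currency symbol "$" that precedes the value with sign position 1 gives "-$1,234.56" with sep_by_space 0, "- $12.50" with sep_by_space 2, and "($1,234.56)" with sign position 0.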
- LC_NUMERIC:
The LC_NUMERIC category defines the rules and symbols used to format non-monetary numeric information. The following keywords belong to this category and should come between the category tag LC_NUMERIC and END LC_NUMERIC:
- decimal_point
The operand is a string containing the symbol used as the decimal delimiter (radix character) in numeric, non-monetary formatted quantities. This keyword cannot be omitted and cannot be set to the empty string.
- thousands_sep
The operand is a string containing the symbol used as a separator for groups of digits to the left of the decimal delimiter.
- grouping
The operand is a semicolon-separated list of integers. The initial integer defines the size of the group immediately preceding the decimal delimiter, and the following integers define the preceding groups. If the last integer is not -1, then the size of the previous group (if any) is used repeatedly for the remainder of the digits. If the last integer is -1, then no further grouping is performed.
- alt_digit
String mapped into the ASCII equivalent string "0123456789b+-.,eE", where b is a blank (a langinfo(5) item). The alt_digit keyword is an HP extension to the POSIX localedef definition, and it has a different meaning from the alt_digits keyword defined in the POSIX standards.
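The grouping rule, which also applies to mon_grouping under LC_MONETARY, can be sketched as a function that inserts a separator into a string of integer digits. In this illustration the function name group_digits is hypothetical, and the semicolon-separated operand is assumed to have already been parsed into an int array, so "3;2" becomes {3, 2} and "3;-1" becomes {3, -1}.

```c
#include <string.h>

/*
 * Hypothetical helper: insert sep into the integer digit string digits.
 * groups[0] is the size of the group next to the decimal delimiter;
 * later entries describe groups further left.  The last size repeats
 * unless it is -1, which stops grouping.  Returns 0 on success, -1 if
 * out (of size outsz) may be too small.
 */
int group_digits(const char *digits, const int *groups, size_t ngroups,
                 char sep, char *out, size_t outsz)
{
    size_t len = strlen(digits);
    size_t n = 0, gi = 0;
    int size = ngroups > 0 ? groups[0] : -1;
    int count = 0;                  /* digits emitted in the current group */

    if (outsz < 2 * len + 1)        /* worst case: a group size of 1 */
        return -1;

    /* Walk from the last digit leftward, writing the result reversed. */
    for (size_t i = len; i-- > 0; ) {
        if (size > 0 && count == size) {
            out[n++] = sep;
            count = 0;
            /* Move to the next size; the last one stays in effect. */
            if (gi + 1 < ngroups)
                size = groups[++gi];
        }
        out[n++] = digits[i];
        count++;
    }
    out[n] = '\0';

    /* Reverse in place to restore left-to-right order. */
    for (size_t a = 0, b = n; a + 1 < b; a++, b--) {
        char t = out[a];
        out[a] = out[b - 1];
        out[b - 1] = t;
    }
    return 0;
}
```

For example, groups {3} turn 1234567 into 1,234,567, groups {3, 2} turn 123456789 into 12,34,56,789, and groups {3, -1} turn 1234567 into 1234,567.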
- LC_TIME:
The LC_TIME category defines the rules for generating locale-specific formatted date strings. The following mandatory keywords belong to this category and should come between the category tag LC_TIME and END LC_TIME:
- abday
Seven semicolon-separated strings giving abbreviated names for the days of the week, beginning with Sunday.
- day
Seven semicolon-separated strings giving full names for the days of the week, beginning with Sunday.
- abmon
Twelve semicolon-separated strings giving abbreviated names for the months, beginning with January.
- mon
Twelve semicolon-separated strings giving full names for the months, beginning with January.
- d_t_fmt
The operand is a string defining the appropriate date and time representation.
- d_fmt
The operand is a string defining the appropriate date representation.
- t_fmt
The operand is a string defining the appropriate time representation.
- am_pm
The operand is two semicolon-separated strings giving the representations for AM and PM.
- t_fmt_ampm
The operand is a string defining the appropriate time representation in the 12-hour clock format with am_pm.
- era
The operand is a semicolon-separated list of strings. Each string defines the name and date of an era or emperor for a locale. Each string should conform to the following format:
    direction:offset:start_date:end_date:name:format
where:
- direction
Either a + or a - character. The + character indicates that the time axis should be such that the years count in the positive direction when moving from the starting date towards the ending date. The - character indicates that the time axis should be such that the years count in the negative direction when moving from the starting date towards the ending date.
- offset
A number in the range [SHRT_MIN, SHRT_MAX] indicating the number of the first year of the era.
- start_date
A date in the form yyyy/mm/dd, where yyyy, mm, and dd are the year, month, and day numbers, respectively, of the start of the era. Years prior to the year 0 A.D. are represented as negative numbers. For example, an era beginning March 5th in the year 100 B.C. would be represented as -100/3/5. Years in the range [SHRT_MIN+1, SHRT_MAX-1] are supported.
- end_date
The ending date of the era, in the same form as the start_date above, or one of the two special values -* or +*. A value of -* indicates that the ending date of the era extends to the beginning of time, while +* indicates that it extends to the end of time. The ending date can be chronologically either before or after the starting date of an era. For example, the expressions for the Christian eras A.D. and B.C. would be:
    +:0:0000/01/01:+*:A.D.:%o %N
    +:1:-0001/12/31:-*:B.C.:%o %N
- name
A string representing the name of the era, which is substituted for the %N directive of date and strftime() (see date(1) and strftime(3C)).
- format
A string for formatting the %E directive of date and strftime(). This string is usually a function of the %o and %N directives. If format is not specified, the string specified for the LC_TIME category keyword era_d_fmt (see below) is used as a default.
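The era string format can be illustrated with a parser plus a function that computes the year within an era. This is a sketch of one reading of the direction and offset rules above: years are counted from start_date towards end_date, increasing from offset for + and decreasing for -. The names struct era, parse_era, and era_year are hypothetical, the fixed field sizes are arbitrary, and the caller is assumed to have already checked that the date falls inside the era.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one era string:
 * direction:offset:start_date:end_date:name:format */
struct era {
    char direction;                 /* '+' or '-' */
    int offset;
    int start_year, start_mon, start_day;
    int end_kind;                   /* 0 = explicit date, -1 = "-*", +1 = "+*" */
    int end_year, end_mon, end_day; /* meaningful only when end_kind == 0 */
    char name[32];
    char format[32];                /* empty if the field was omitted */
};

bool parse_era(const char *s, struct era *e)
{
    int pos = 0;

    /* direction, offset and start_date, plus the colon that follows */
    if (sscanf(s, "%c:%d:%d/%d/%d:%n", &e->direction, &e->offset,
               &e->start_year, &e->start_mon, &e->start_day, &pos) != 5
        || pos == 0)
        return false;
    if (e->direction != '+' && e->direction != '-')
        return false;

    const char *p = s + pos;
    if ((p[0] == '-' || p[0] == '+') && p[1] == '*') {
        e->end_kind = p[0] == '-' ? -1 : 1;
        p += 2;
    } else {
        int n = 0;
        if (sscanf(p, "%d/%d/%d%n", &e->end_year, &e->end_mon,
                   &e->end_day, &n) != 3)
            return false;
        e->end_kind = 0;
        p += n;
    }
    if (*p != ':')
        return false;
    p++;

    /* name runs to the next colon; the optional format is the rest. */
    const char *colon = strchr(p, ':');
    size_t nlen = colon ? (size_t)(colon - p) : strlen(p);
    const char *fmt = colon ? colon + 1 : "";
    if (nlen >= sizeof e->name || strlen(fmt) >= sizeof e->format)
        return false;
    memcpy(e->name, p, nlen);
    e->name[nlen] = '\0';
    strcpy(e->format, fmt);
    return true;
}

/* Year number within era e for calendar year y (assumed inside the era). */
int era_year(const struct era *e, int y)
{
    bool end_is_later;

    if (e->end_kind != 0) {
        end_is_later = e->end_kind > 0;   /* "+*" runs forward in time */
    } else {
        long start = e->start_year * 10000L + e->start_mon * 100L + e->start_day;
        long end = e->end_year * 10000L + e->end_mon * 100L + e->end_day;
        end_is_later = end >= start;
    }
    int distance = end_is_later ? y - e->start_year : e->start_year - y;
    return e->direction == '+' ? e->offset + distance : e->offset - distance;
}
```

Applied to the two Christian era strings above, year 1999 gives 1999 in the A.D. era, and year -1 gives 1 in the B.C. era, because the B.C. era counts upward from its offset while moving back in time.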
- era_d_fmt
The operand is a string defining the format of the date in era notation.
- era_t_fmt
The operand is a string defining the format of the time in era notation.
- era_d_t_fmt
The operand is a string defining the format of the date and time in era notation.
- alt_digits
The operand is a semicolon-separated list of strings. The first string is the alternative symbol corresponding to zero, the second string is the alternative symbol corresponding to one, and so on. Note that if the HP-UX-proprietary alt_digit keyword has been specified in the same locale, the first ten symbols should be identical for these two keywords.
In addition to the above, the following HP-UX-proprietary keywords are recognized (these are provided for backward compatibility and their use is otherwise not recommended): year_unit, mon_unit, day_unit, hour_unit, min_unit, sec_unit.
- LC_MESSAGES:
The LC_MESSAGES category defines the format and values for affirmative and negative responses. The following keywords belong to this category and should come between the category tag LC_MESSAGES and END LC_MESSAGES:
- yesexpr
The string operand is an Extended Regular Expression matching acceptable affirmative responses to yes/no queries.
- noexpr
The string operand is an Extended Regular Expression matching acceptable negative responses to yes/no queries.
- yesstr
The string operand identifies the affirmative response for yes/no questions. This keyword is now obsolete, and yesexpr should be used instead.
- nostr
The string operand identifies the negative response for yes/no questions. This keyword is now obsolete, and noexpr should be used instead.
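Because yesexpr and noexpr are Extended Regular Expressions, a response can be checked with the POSIX regcomp() and regexec() interfaces. The helper name matches_response below is hypothetical; a program would normally obtain the expression for the current locale with nl_langinfo(YESEXPR) or nl_langinfo(NOEXPR) rather than hard-coding it.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: true if response matches the Extended Regular
 * Expression expr, the way a yesexpr or noexpr operand is applied.
 * An expression that fails to compile is treated as "no match".
 */
bool matches_response(const char *expr, const char *response)
{
    regex_t re;

    /* REG_NOSUB: only success or failure is needed, not match offsets. */
    if (regcomp(&re, expr, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    bool ok = regexec(&re, response, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

For instance, the expression ^[yY] accepts yes and Y but rejects no.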
Keyword Operands
Keyword operands consist of character-code constants and symbols, strings, and metacharacters. The types of legal expressions are: character lists, string lists, integer lists, shift, collating element entries, regular expression, character constants, and string:
- character lists
Character list operands consist of single character-code constants or symbolic names separated by semicolons, or a character-code range consisting of a constant or symbolic name followed by an ellipsis followed by another constant or symbolic name. The constant preceding the ellipsis must have a smaller code value than the constant following the ellipsis. A range represents a set of consecutive character codes. If the list is longer than a single line, the escape character must be used at the end of each line as a continuation character. It is an error to use any symbolic name that is not defined in an accompanying charmap file (see charmap(4)).
- string lists
String list operands consist of strings separated by semicolons. If a list is longer than one line, the escape character must be used for continuation.
- string
String operands consist of a sequence of zero or more characters surrounded by double quotes ("). Within a string, the double-quote character must be preceded by an escape character. The following escape sequences can also be used:
- \n
newline
- \t
horizontal tab
- \b
backspace
- \r
carriage return
- \f
form feed
- \\
backslash
- \'
single quote
- \ddd
bit pattern
The escape \ddd consists of the escape character followed by 1, 2, or 3 octal digits specifying the value of the desired character (for other possible bit pattern specifications, see character constants below). Also, an escape character (\) and an immediately following newline are ignored. Although the backslash (\) has been used for illustration, another escape character can be substituted by the escape_char keyword.
- character constants
Constants represent character codes in the operands. They can be used in the following forms:
- decimal constants
An escape character followed by a 'd' followed by up to three decimal digits.
- octal constants
An escape character followed by up to three octal digits.
- hexadecimal constants
An escape character followed by an 'x' followed by two hexadecimal digits.
- Unicode constants
An escape character followed by a 'u' followed by four to eight hexadecimal digits, which specifies a Unicode scalar value in a charmap file to be used with the -u option of the localedef command.
- character constants
A single character (for example, A) having the numerical value of the character in the machine's character set.
- symbolic names
A string enclosed between < and > is a symbolic name. localedef input files are recommended to be written entirely in symbolic names, using a user-defined or system-supplied charmap file. This aids portability of localedef input files between different encoded character sets (see charmap(4)). Symbolic names can also be defined within a locale definition file by the collating-element and collating-symbol keywords. These are not character constants. It is an error if such an internally defined symbolic name collides with one defined in a charmap file.
- integer lists
Integer list operands consist of one or more decimal integers separated by semicolons.
- shift
Shift operands follow the keywords toupper and tolower, and must consist of two character-code constants enclosed by left and right parentheses and separated by a comma. Each such character pair is separated from the next by a semicolon. For tolower, the first constant represents an uppercase character and the second the corresponding lowercase character. For toupper, the first constant represents a lowercase character and the second the corresponding uppercase character.
- collating element entry
The order_start keyword is followed by collating element entries, one per line, in ascending order by collating position. The collating element entries have the form:
    collating_element [weight[;weight]]
collating_element can be a character, a collating symbol enclosed in angle brackets representing a character or collating element, the special symbol UNDEFINED, or an ellipsis (...). A character stands for itself; a collating symbol can be a symbolic name for a character that is interpreted by the charmap file, a multi-character collating element defined by a collating-element keyword, or a collating symbol defined by the collating-symbol keyword.
The special symbol UNDEFINED specifies the collating position of any characters not explicitly defined by collating element entries. For example, if some group of characters is to be omitted from the collation sequence and simply collate after all defined characters, a collating symbol can be defined before the order_start keyword, and an UNDEFINED entry can then be given that symbol as its only weight somewhere in the list of collating element entries. Because there is no second weight, on a second pass all such characters collate by their encoded value.
An ellipsis is interpreted as a list of characters with an encoded value higher than that of the character on the preceding line and lower than that of the character on the following line. Because it is tied to the encoded values of characters, the ellipsis is inherently non-portable. If it is used, a warning is issued and no output is generated unless the -c option was given.
The weight operands provide information about how the collating element is to be collated on the first and subsequent passes. Weight can be a two-character string, the special symbol IGNORE, or a collating element of any of the forms specified for collating_element except UNDEFINED. If there are no weights, the character collates strictly by its position in the list. If only one weight is given, the character sorts by its relative position in the list on the second collation pass.
An equivalence class is defined by a series of collating element entries all having the same character or symbol in the first weight position. For example, in many locales all forms of the character 'A' collate equal on the first pass. This is represented in the collating element entries as:
    'A' 'A';'A'  # first element of equivalence class
    'a' 'A';'a'  # next element of class
Two-to-one collating elements are specified by collating-element keywords defined before the order_start keyword. For example, the two-to-one collating element CH in Spanish would be defined before the order_start keyword as:
    collating-element <CH> from "CH"
It would then be used in a collating element entry as <CH>.
A one-to-two collating element is defined by having a two-character string in one of the weight positions. For example, if the character 'X' collates equal to the pair "AE", the collating element entry for 'X' would have the string "AE" in its first weight position.
A don't-care character is defined by the special symbol IGNORE. For example, the dash character '-' may be a don't-care character on the first collation pass; its collating element entry would then have IGNORE as its first weight.
Symbols defined by the collating-symbol keyword can be used to indicate that a given character collates higher or lower than some position in the sequence. For example, if all characters with an encoded value less than that of '0' are to collate lower than all other characters on the first pass, and in relative order on the second pass, define a collating symbol before the order_start keyword:
    collating-symbol <LOW>
The first two collating element entries are then:
    ...  <LOW>;...
    '0'  '0';'0'
This also illustrates the use of the ellipsis to indicate a range. The first ellipsis is interpreted as "all characters in the encoded character set with a value lower than '0'"; the second ellipsis means that all characters in the range defined by the first collate in relative order.
- regular expression
Regular expression operands conform to the Extended Regular Expressions specification as described in regexp(5).
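The decimal, octal, hexadecimal, and plain character constant forms described under character constants can be decoded with a short routine. This sketch is an illustration rather than localedef's own parser: the function name decode_char_const is hypothetical, the escape character is passed in so that a value set by escape_char is honored, and Unicode constants and symbolic names are not handled.

```c
#include <ctype.h>

/*
 * Hypothetical helper: decode one character constant starting at *pp
 * and advance *pp past it.  Handles \dNNN (decimal, up to 3 digits),
 * \NNN (octal, up to 3 digits), \xNN (exactly 2 hex digits) and a
 * plain character.  Returns -1 for a malformed constant.
 */
long decode_char_const(const char **pp, char escape_char)
{
    const unsigned char *p = (const unsigned char *)*pp;
    long value = 0;
    int ndigits = 0;

    if (*p == '\0')
        return -1;
    if (*p != (unsigned char)escape_char) {     /* plain character */
        *pp = (const char *)(p + 1);
        return *p;
    }
    p++;                                        /* skip the escape character */
    if (*p == 'd') {                            /* decimal */
        for (p++; ndigits < 3 && isdigit(*p); p++, ndigits++)
            value = value * 10 + (*p - '0');
    } else if (*p == 'x') {                     /* hexadecimal */
        for (p++; ndigits < 2 && isxdigit(*p); p++, ndigits++)
            value = value * 16
                  + (isdigit(*p) ? *p - '0' : tolower(*p) - 'a' + 10);
        if (ndigits != 2)
            return -1;
    } else {                                    /* octal */
        for (; ndigits < 3 && *p >= '0' && *p <= '7'; p++, ndigits++)
            value = value * 8 + (*p - '0');
    }
    if (ndigits == 0)
        return -1;
    *pp = (const char *)p;
    return value;
}
```

With the default escape character, \d65, \101, and \x41 all decode to 65, as does the plain character A in ASCII.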
Metacharacters
Metacharacters are characters having a special meaning to localedef in operands. To escape the special meaning of these characters, surround them with single quotes or precede them with an escape character. localedef metacharacters include:
- <
Indicates the beginning of a symbolic name.
- >
Indicates the end of a symbolic name.
- (
Indicates the beginning of a character shift pair following the toupper and tolower keywords.
- )
Indicates the end of a character shift pair.
- ,
Used to separate the characters of a character shift pair.
- "
Used to quote strings.
- ;
Used as a separator in list operands.
- escape character
Used to escape the special meaning of other metacharacters and of itself. It is backslash (\) by default, but can be redefined by the escape_char keyword.
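The (, comma, ), and ; metacharacters combine in the shift operands of toupper and tolower. The sketch below parses such an operand into a 256-entry case mapping; the function name parse_shift_pairs is hypothetical, and it accepts only plain single-byte characters, not symbolic names or escaped constants.

```c
#include <stdbool.h>

/* Skip blanks and tabs, the localedef separator characters. */
static const char *skip_blanks(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/*
 * Hypothetical helper: fill map[] from a shift operand such as
 * "(a,A);(b,B)".  Characters not named in the operand map to
 * themselves.  Returns false on a syntax error.
 */
bool parse_shift_pairs(const char *op, unsigned char map[256])
{
    for (int c = 0; c < 256; c++)
        map[c] = (unsigned char)c;

    const char *p = skip_blanks(op);
    if (*p == '\0')
        return true;                        /* empty operand */

    for (;;) {
        unsigned char from, to;

        if (*p != '(')
            return false;
        p = skip_blanks(p + 1);
        if (*p == '\0')
            return false;
        from = (unsigned char)*p;
        p = skip_blanks(p + 1);
        if (*p != ',')
            return false;
        p = skip_blanks(p + 1);
        if (*p == '\0')
            return false;
        to = (unsigned char)*p;
        p = skip_blanks(p + 1);
        if (*p != ')')
            return false;
        map[from] = to;

        p = skip_blanks(p + 1);             /* expect ';' or end of operand */
        if (*p == '\0')
            return true;
        if (*p != ';')
            return false;
        p = skip_blanks(p + 1);
    }
}
```

The same routine serves tolower, whose pairs list the uppercase character first.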
Comments
Comments are lines beginning with a comment character. The comment character is the pound sign (#) by default, but can be redefined by the comment_char keyword. Comments and blank lines are ignored.

Separators
Separator characters include blanks and tabs. Any number of separators can be used to delimit the keywords, metacharacters, constants, and strings that make up a localedef script, except that all characters between < and > are considered part of the symbolic name, even if they are <blank>s.

EXAMPLES
See the files under /usr/lib/nls/loc/src for examples of locale description files. These files were used to create the various locales that are delivered with HP-UX.