charmap(4)

HP-UX 11i Version 3: February 2007

NAME

charmap — symbolic translation file for localedef scripts

SYNOPSIS

localedef -f charmap locale_name

DESCRIPTION

Invoking the localedef command with the -f option causes symbolic names in the locale description file to be translated into the encodings given in the charmap file (see localedef(1M)). It is recommended that a locale description file be written entirely with symbolic names.

The charmap file has three sections: a declarations section, a character definition section, and an optional width specification section.

Declarations Section

Declarations can precede the character definitions.

Each declaration consists of the symbol (including the surrounding angle brackets), followed by one or more blanks (tabs or space characters), followed by the value of the symbol.

Certain declarations are required for multibyte character codesets. For single-byte codesets, all are optional.

Following is a list of possible declarations:

<code_set_name> value

  • Used to declare the name of the coded character set for which the charmap file is defined. This keyword is required for multibyte character codesets. For the HP15 encoding scheme, HP15 must be part of the name. For the EUC encoding scheme, EUC must be part of the name.

<cswidth> value

  • Used to declare the cswidth parameter of the coded character set for which the charmap file is defined (see eucset(1)).

<mb_cur_max> value

  • Used to declare the maximum number of bytes in a multibyte character. Defaults to 1 if not given. For multibyte character codesets, this keyword must be specified.

<mb_cur_min> value

  • Used to declare the minimum number of bytes in a character for the encoded character set. The value must be less than or equal to <mb_cur_max>. If not given, the default is equal to <mb_cur_max>.

<escape_char> value

  • Used to declare the escape character, which is used to escape characters that otherwise would have special meaning. If not given, the default is backslash (\).

<comment_char> value

  • Used to declare the comment character, which is used to begin comments and should be placed in column one of the charmap file. If not given, the default is the # character.
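The declaration rules and defaults listed above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the localedef implementation: the function name parse_declarations and the sample codeset name are hypothetical, and the sketch assumes declarations appear one per line before the CHARMAP line.

```python
def parse_declarations(text):
    """Collect '<keyword> value' declarations that precede the CHARMAP line."""
    comment_char = "#"  # documented default until <comment_char> is declared
    decls = {}
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "CHARMAP":
            break  # the character definition section starts here
        if not stripped or line.startswith(comment_char):
            continue  # skip empty lines and comments in column one
        parts = stripped.split(None, 1)  # symbol, blanks, value
        if len(parts) != 2:
            continue
        name, value = parts
        if not (name.startswith("<") and name.endswith(">")):
            continue
        decls[name] = value.strip()
        if name == "<comment_char>":
            comment_char = decls[name]
    # Defaults described in the declarations list above.
    decls.setdefault("<mb_cur_max>", "1")
    decls.setdefault("<mb_cur_min>", decls["<mb_cur_max>"])
    decls.setdefault("<escape_char>", "\\")
    decls.setdefault("<comment_char>", "#")
    return decls
```

Values are kept as strings so that a caller can decide how to validate them, for example checking that <mb_cur_min> does not exceed <mb_cur_max>.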

Character Definition Section

The character-set mapping definitions immediately follow an identifier line containing the string CHARMAP and precede a trailer line consisting of the string END CHARMAP. (Empty lines and lines beginning with the comment character are ignored.)
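As a rough illustration of the delimiting rule above, the following Python sketch returns the definition lines found between CHARMAP and END CHARMAP. The function name charmap_body is hypothetical, and the sketch assumes the identifier and trailer lines contain nothing but those strings.

```python
def charmap_body(text, comment_char="#"):
    """Return the definition lines between CHARMAP and END CHARMAP."""
    lines = []
    inside = False
    for line in text.splitlines():
        stripped = line.strip()
        if not inside:
            if stripped == "CHARMAP":
                inside = True  # identifier line found
            continue
        if stripped == "END CHARMAP":
            return lines  # trailer line closes the section
        if not stripped or line.startswith(comment_char):
            continue  # empty lines and comment lines are ignored
        lines.append(stripped)
    raise ValueError("CHARMAP section not found or not terminated")
```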

The character definitions are of two forms.

The first form defines a single character and its encoding:

  • <symbolic_name> encoding

A symbolic_name is one or more visible characters from the portable character set as specified by XPG, enclosed in angle brackets. Metacharacters such as angle brackets, escape characters, or comment characters must be escaped if they are used in the name. Two or more symbolic names can be given for the same encoding.

The encoding is a character constant in one of four forms:

decimal

An escape character followed by the letter d, followed by one to three decimal digits.

octal

An escape character followed by one to three octal digits.

hexadecimal

An escape character followed by an x, followed by two hexadecimal digits.

Unicode

An escape character followed by a u, followed by four or five hexadecimal digits. This encoding form can only be used when the -u option of the localedef command is specified.

Multibyte characters are represented by the concatenation of character constants. All constants used in the encoding of a multibyte character must be of the same form.
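The decimal, octal, and hexadecimal forms, together with the rule that a multibyte encoding may not mix forms, can be sketched as follows. The function name decode_encoding is hypothetical. The sketch assumes that each constant stands for one byte (so values above 255 are rejected), and it leaves out the Unicode form, which depends on the -u option of localedef.

```python
import re


def decode_encoding(encoding, escape_char="\\"):
    """Turn an encoding such as \\d129 or \\xa1\\xb2 into a bytes object."""
    esc = re.escape(escape_char)
    # (pattern for one constant, numeric base), one entry per form.
    forms = [
        (esc + r"d([0-9]{1,3})", 10),      # decimal: escape, d, 1-3 digits
        (esc + r"([0-7]{1,3})", 8),        # octal: escape, 1-3 octal digits
        (esc + r"x([0-9A-Fa-f]{2})", 16),  # hexadecimal: escape, x, 2 digits
    ]
    for constant, base in forms:
        # The whole encoding must be a concatenation of one single form.
        if re.fullmatch("(?:" + constant + ")+", encoding):
            values = [int(digits, base)
                      for digits in re.findall(constant, encoding)]
            if max(values) > 255:
                # Assumption of this sketch: one constant encodes one byte.
                raise ValueError("constant does not fit in one byte")
            return bytes(values)
    raise ValueError("unrecognized or mixed-form encoding: " + encoding)
```

Under these assumptions, \d129 yields a single byte with value 129, and \xa1\xb2 yields a two-byte sequence. An encoding that mixes forms, such as \d65\x41, is rejected.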

The second form defines a range of characters consisting of all characters from the first symbolic name to the second, inclusive:

  • <symbolic_name> ... <symbolic_name> encoding

Each symbolic name in a range must consist of one or more nonnumeric characters followed by an integer formed of one or more decimal digits. The integer part of the second symbolic name must be larger than that of the first. The range is then interpreted as a list of symbolic names consisting of the same character portion and successive integer values from the first through the last. These names are assigned successive encodings starting with the one given.

For example, the character definition line

<C4>...<C6> \d129

is equivalent to:

<C4> \d129
<C5> \d130
<C6> \d131
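The range expansion described above can be sketched in Python. The function name expand_range is hypothetical. For simplicity the sketch takes the starting encoding as an integer for a single-byte character, and it does not try to preserve leading zeros in the integer part of a name.

```python
import re


def expand_range(first, last, start_value):
    """Expand <nameN>...<nameM> into (symbolic name, value) pairs."""
    # Nonnumeric characters followed by a decimal integer, in angle brackets.
    pattern = re.compile(r"<(\D+)(\d+)>")
    m_first = pattern.fullmatch(first)
    m_last = pattern.fullmatch(last)
    if not m_first or not m_last:
        raise ValueError("names must be nonnumeric characters plus an integer")
    if m_first.group(1) != m_last.group(1):
        raise ValueError("both names must share the same character portion")
    low, high = int(m_first.group(2)), int(m_last.group(2))
    if high <= low:
        raise ValueError("second integer must be larger than the first")
    prefix = m_first.group(1)
    # Successive names receive successive encodings from start_value.
    return [("<%s%d>" % (prefix, n), start_value + (n - low))
            for n in range(low, high + 1)]
```

Calling expand_range("<C4>", "<C6>", 129) reproduces the three definitions of the example above.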

Width Specification

The following declarations can follow the character set mapping definitions (after the END CHARMAP statement). Each consists of one of the keywords shown in the following list, starting in column 1, followed by the value(s) associated with the keyword, as defined below.

WIDTH

Introduces a list of column width definitions for printable characters in the coded character set. Each definition consists of a symbolic character name followed by a column width, which is a positive integer value (either 1 or 2). Ellipses (...) can be used between two symbolic character names to specify a range of characters. The END WIDTH keyword terminates the WIDTH definitions. Defining a character with more than one WIDTH produces undefined results, as does specifying the width of a non-printable character.

WIDTH_DEFAULT

A positive integer value defining the default column width for any printable character not listed by one of the WIDTH keywords. If no WIDTH_DEFAULT keyword is included in the charmap, the default character width is 1.

EXAMPLES

For examples, see any of the files under the /usr/lib/nls/loc/charmaps directory.

After the END CHARMAP statement, a width definition could be written as follows:

WIDTH
<A> 1
<B> 1
<C>...<Z> 1
...
<wc1>...<wcn> 2
END WIDTH

In this example, the code point values represented by the symbols <A> and <B> are assigned a width of 1. The code point values <C> to <Z> inclusive, that is, <C>, <D>, <E>, and so on, are also assigned a width of 1, and the range <wc1>...<wcn> is assigned a width of 2. Using <A>...<Z> would have required fewer lines, but the longer form is shown to demonstrate flexibility. The keyword WIDTH_DEFAULT could also have been added as appropriate.
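To make the width syntax concrete, the following Python sketch reads a width specification into a dictionary plus a default width. The function name parse_width_section is hypothetical. The sketch assumes one definition per line, stores ranges as (first, last) pairs without expanding them, and does not check that widths are 1 or 2.

```python
def parse_width_section(text):
    """Return (widths, default) from the text after END CHARMAP."""
    widths = {}   # symbolic name, or (first, last) range, mapped to a width
    default = 1   # documented default when WIDTH_DEFAULT is absent
    in_width = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "WIDTH_DEFAULT" and len(fields) == 2:
            default = int(fields[1])
        elif fields == ["WIDTH"]:
            in_width = True
        elif fields == ["END", "WIDTH"]:
            in_width = False
        elif in_width and len(fields) >= 2:
            *name_parts, width = fields
            name = "".join(name_parts)  # tolerate blanks around the ellipsis
            if "..." in name:
                first, last = name.split("...", 1)
                widths[(first, last)] = int(width)
            else:
                widths[name] = int(width)
    return widths, default
```

A caller would look up a character by name first, then by range, and fall back to the default width for any printable character that is not listed.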

STANDARDS CONFORMANCE

localedef: POSIX.2, XPG4, UNIX 2003.

© 1983-2007 Hewlett-Packard Development Company, L.P.