4.5. The String Type

String is the datatype used for textual data (letters, punctuation marks, and other characters). A string literal is any combination of characters enclosed in quotation marks:

"asdfksldfsdfeoif"  // A frustrated string
"greetings"         // A friendly string
"moock@moock.org"   // A self-promotional string
"123"               // It may look like a number, but it's a string
'singles'           // Single quotes are acceptable too
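To see why "123" counts as a string rather than a number, consider the following minimal sketch. It uses only syntax that ActionScript shares with JavaScript, the variable names are purely illustrative, and it assumes that the + operator concatenates whenever either operand is a string:

```javascript
var looksNumeric = "123";   // A string literal, despite the digits
var realNumber = 123;       // A numeric literal

var stringKind = typeof looksNumeric;  // "string"
var numberKind = typeof realNumber;    // "number"

// With +, a string operand produces concatenation, not addition
var joined = looksNumeric + 4;  // "1234"
var summed = realNumber + 4;    // 127
```

The quotation marks, not the characters between them, determine the datatype.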

Before we see how to form string literals, let's examine which characters are permitted in strings.

4.5.1. Character Encoding

Like all computer data, text characters are stored internally using a numeric code. They are encoded for storage and decoded for display using a character set, which maps (i.e., relates) characters to their numeric codes. Character sets vary for different languages and alphabets. Older Western applications use some derivative of ASCII, a standard character set that includes only 128 characters -- the English alphabet, numbers, and basic punctuation marks. Modern applications support a family of character sets known collectively as ISO-8859. Each of the ISO-8859 character sets encodes the standard Latin alphabet ('A' to 'Z') plus a varying set of letters needed in the target languages. ActionScript uses ISO-8859-1, also known as Latin 1, as its primary character map.

The Latin 1 character set accommodates most Western European languages -- French, German, Italian, Spanish, Portuguese, and so on -- but not languages such as Greek, Turkish, Russian, or the other Slavic languages. Unicode, the preferred international standard for character encoding, which has room for more than a million characters, is not supported in ActionScript (support for Unicode would greatly increase the Flash Player size). However, ActionScript does support a second character set for Japanese characters called Shift-JIS. When working with text in ActionScript, we can use any character from Latin 1 or Shift-JIS.

Even though Unicode itself isn't supported, we can use the standard Unicode escape sequences to represent any character from Latin 1 or Shift-JIS. We can also manipulate character strings with Unicode-style functions. In theory, then, Unicode support could be added to Flash at some future date without breaking old code.

Appendix B, "Latin 1 Character Repertoire and Keycodes", lists each character's Unicode code point, which is the character's numeric position in the Unicode set. Later, we'll see how to use those code points to manipulate characters in our scripts.
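As a brief preview of working with code points, the following sketch retrieves the numeric code of a character with charCodeAt( ), one of the character code functions. It is an illustration only: the variable names are invented for the example, and it assumes charCodeAt( ) behaves as it does in standard ECMAScript.

```javascript
var letter = "A";
var atSign = "@";

var letterCode = letter.charCodeAt(0);   // 65, the code point of "A"
var atSignCode = atSign.charCodeAt(0);   // 64, the code point of "@"

// Code points are usually listed in hexadecimal; 0x40 is decimal 64
var sameValue = (atSignCode == 0x40);    // true
```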

4.5.2. String Literals

The most common way to make a string is to put either single or double quotation marks around a group of characters from the Latin 1 or Shift-JIS character sets:

"hello"
'Nice night for a walk.'
"The equation is 12 + 4 = 16, which programmers see as 12 + 4 == 16."

If we use a double quotation mark to start a string, we must end it with a double quotation mark as well. Likewise, if we use a single quotation mark to start a string, we must end that string with a single quotation mark. However, a double-quoted string may contain single-quoted characters and vice versa. These strings, for example, contain legal uses of single and double quotes:

"Nice night, isn't it?"               // Single (apostrophe) inside double quotes
'I said, "What a pleasant evening!"'  // Double quotes inside single quotes

4.5.2.2. Escape sequences

We saw earlier that single quotes (') may be used inside double-quoted literals, and double quotes (") may be used inside single-quoted literals. But what if we want to use both? For example:

'I remarked "Nice night, isn't it?"'

As is, that line of code causes an error because the interpreter thinks that the string literal ends with the apostrophe in the word "isn't." The interpreter reads it as:

'I remarked "Nice night, isn'  // The rest is considered unintelligible garbage

To use the single quote inside a string literal delimited by single quotes, we must use an escape sequence.

An escape sequence represents a single literal character using a backslash character (\) followed by either a code that stands for the desired character or the character itself. The escape sequences for single and double quotes are:

\'
\"

So, our cordial evening greeting, properly expressed as a string literal, should be:

'I remarked "Nice night, isn\'t it?"'  // Escape the apostrophe!
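The same greeting can be delimited with either kind of quotation mark, as long as the matching kind is escaped inside the literal. The following sketch (with illustrative variable names) shows that both forms yield identical strings and that the backslash itself is not stored:

```javascript
// Single-quoted literal: the apostrophe must be escaped
var singleQuoted = 'I remarked "Nice night, isn\'t it?"';

// Double-quoted literal: the inner double quotes must be escaped instead
var doubleQuoted = "I remarked \"Nice night, isn't it?\"";

// Both literals produce exactly the same characters
var identical = (singleQuoted == doubleQuoted);          // true

// The escaping backslash does not become part of the string
var hasBackslash = (singleQuoted.indexOf("\\") != -1);   // false
```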

Other escape sequences, which can be used to represent various special or reserved characters, are listed in Table 4-1.

Table 4-1. ActionScript Escape Sequences

Escape Sequence   Meaning
\b                Backspace character (ASCII 8)
\f                Form feed character (ASCII 12)
\n                Newline character; causes a line break (ASCII 10)
\r                Carriage return (CR) character; causes a line break (ASCII 13)
\t                Tab character (ASCII 9)
\'                Single quotation mark
\"                Double quotation mark
\\                Backslash character; necessary when using backslash as a literal character to prevent \ from being interpreted as the beginning of an escape sequence
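A short sketch can confirm a few of the entries in Table 4-1. Each escape sequence occupies two or more characters in the source code but produces a single character in the resulting string. The variable names here are illustrative, and the example assumes that charCodeAt( ) and the length property behave as in standard ECMAScript:

```javascript
var tab = "\t";
var newline = "\n";
var carriageReturn = "\r";

var tabCode = tab.charCodeAt(0);                // 9
var newlineCode = newline.charCodeAt(0);        // 10
var returnCode = carriageReturn.charCodeAt(0);  // 13

// Each escape sequence counts as one character in the string
var columns = "name\tscore";
var columnsLength = columns.length;   // 10 (4 letters + 1 tab + 5 letters)

// A literal backslash must be doubled in the source text
var path = "C:\\flash\\movies";
var pathLength = path.length;         // 15, because each \\ is stored as one \
```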

4.5.2.3. Unicode-style escape sequences

Not all characters from Latin 1 and Shift-JIS are accessible from a keyboard. In order to include inaccessible characters in a string, we use Unicode-style escape sequences. Note that Flash does not actually support Unicode; it merely emulates its syntax.

A Unicode-style escape sequence starts with a backslash and a lowercase u (i.e., \u) followed by a four-digit hex number that corresponds to the Unicode character's code point, such as:

\u0040  // The @ sign
\u00A9  // The copyright symbol
\u0041  // The capital letter "A"

A code point is a unique identification number that is assigned to each character in the Unicode character set. See Appendix B, "Latin 1 Character Repertoire and Keycodes" for a list of the Unicode code points for Latin 1. The Shift-JIS code points may be found at the Unicode Consortium site:

ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/JIS/SHIFTJIS.TXT

If we're only escaping characters from the Latin 1 character set, we may use a short form for the standard Unicode escape sequence. The short form consists of the prefix \x followed by a two-digit hexadecimal number that represents the Latin 1 encoding of the character. Since Latin 1 code points are the same as the first 256 Unicode code points, we can still use the reference chart in Appendix B, "Latin 1 Character Repertoire and Keycodes"; we simply replace the \u00 prefix with \x, as in the following examples:

\u0040  // Unicode escape sequence
\x40    // \x shortcut form
\u00A9  // Unicode...
\xA9    // ...you get the idea
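Inside a string literal, the two forms are interchangeable. The following sketch, whose variable names are illustrative, compares strings built with each form:

```javascript
var longForm = "\u00A9 2002";    // Copyright symbol via the Unicode-style escape
var shortForm = "\xA9 2002";     // The same symbol via the \x shortcut
var sameText = (longForm == shortForm);              // true

var address = "moock\x40moock.org";                  // \x40 is the @ sign
var matchesPlain = (address == "moock@moock.org");   // true

var firstCode = longForm.charCodeAt(0);              // 169, which is hex A9
```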

In addition to using Unicode escape sequences, we can insert any character into a string via the more cumbersome built-in function, fromCharCode( ), described later in Section 4.6.9, "Character Code Functions". Note that with both Unicode escape sequences and the fromCharCode( ) function, Flash 5 supports only those code points that map to characters in the Latin 1 and Shift-JIS character sets. Inserting other code points will not yield the correct Unicode character unless future versions of Flash support more of Unicode's character repertoire.
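For comparison, here is a minimal sketch of the function-based approach. It assumes that the function is invoked as String.fromCharCode( ) and accepts one or more code points, as in standard ECMAScript; the variable names are illustrative:

```javascript
var viaEscape = "\u00A9";                      // Copyright symbol as an escape sequence
var viaFunction = String.fromCharCode(169);    // 169 is hex A9
var sameChar = (viaEscape == viaFunction);     // true

// Several code points may be supplied in a single call
var word = String.fromCharCode(0x41, 0x42, 0x43);   // "ABC"
```

The function form is most useful when the code point is computed at runtime, because an escape sequence must be typed directly into the literal.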




Copyright © 2002 O'Reilly & Associates. All rights reserved.