[azertyuiop]
to define the list of letters on the first row of a French keyboard,
[a-z]
to specify all the characters between "a" and "z",
[^a-z]
for all the characters that are not between "a" and "z", but also
[-^\\]
to define the three characters "-", "^", and "\", or
[-+]
to specify the sign (plus or minus) of a decimal number.
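To try these classes, the following minimal Python sketch uses the standard `re` module as a stand-in. This is an assumption: `re` is not an XML Schema processor, but for simple classes like these (no `\p{...}` escapes, no class subtraction) its bracket syntax agrees with the XML Schema one. `re.fullmatch` is used because an XML Schema pattern always has to match the whole value; there are no `^`/`$` anchors. The helper name `xsd_like_match` is hypothetical.

```python
import re


# Hypothetical helper. XML Schema patterns are implicitly anchored, so we use
# re.fullmatch rather than re.search. Python's re is only a stand-in: it agrees
# with XML Schema for these simple classes, not for regular expressions in general.
def xsd_like_match(pattern: str, value: str) -> bool:
    return re.fullmatch(pattern, value) is not None


examples = {
    r"[azertyuiop]": ["a", "z", "p", "q"],
    r"[a-z]": ["m", "M", "-", "ab"],
    r"[^a-z]": ["m", "M", "-"],
    r"[-^\\]": ["-", "^", "\\", "a"],
    r"[-+]": ["+", "-", "."],
}

for pattern, values in examples.items():
    for value in values:
        print(f"{pattern:14} {value!r:6} {xsd_like_match(pattern, value)}")
```

Note that `"ab"` is rejected by `[a-z]`: a character class stands for exactly one character, and the anchoring forbids any extra characters.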
These examples are enough to show that what appears between the
square brackets follows its own syntax and semantics. As in the main
regular expression syntax, we have a list of atoms, but instead of
matching each atom in turn against a character of the instance string,
we define a logical OR between the atoms: the character class is the
set of characters matching any of the atoms found between the brackets.
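The "set of characters matching any of the atoms" can be made concrete with a small Python sketch that builds a class as the union of the sets matched by its atoms. Assumptions: atoms are passed already split, as a single character (`"x"`) or a range (`"a-z"`); escapes and negation are ignored; and the universe is restricted to ASCII so the sets stay small. The names `atom_set` and `class_set` are hypothetical.

```python
# A character class seen as a set: the union of the sets matched by its atoms.
# Assumptions: atoms are pre-split as single characters ("x") or ranges ("a-z");
# no escapes, no negation; universe restricted to ASCII for readability.
ASCII = [chr(cp) for cp in range(128)]


def atom_set(atom: str) -> set[str]:
    """Characters matched by one atom: a single character or a range lo-hi."""
    if len(atom) == 3 and atom[1] == "-":
        lo, hi = atom[0], atom[2]
        # Range bounds are compared by code point.
        return {ch for ch in ASCII if lo <= ch <= hi}
    return {atom}


def class_set(*atoms: str) -> set[str]:
    """The class matches a character if ANY of its atoms matches it (set union)."""
    result: set[str] = set()
    for atom in atoms:
        result |= atom_set(atom)
    return result


hex_digits = class_set("0-9", "a-f", "A-F")  # the set behind [0-9a-fA-F]
print("".join(sorted(hex_digits)))
```

Because the class is a union, the order of the atoms does not matter: `class_set("a-f", "0-9", "A-F")` denotes the same set.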
We also see two special characters whose meaning depends on their
position. The character "-", which is a range delimiter when it appears
between two characters (as in a-z), is a normal character when it is
just after the opening bracket or just before the closing bracket
([+-] and [-+] are, therefore, both legal). Conversely, "^", which is
a negator when it appears at the beginning of a class, loses this
special meaning and becomes a normal character anywhere later in the
class definition.
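The two positional rules can be illustrated with a hypothetical mini-parser, written in Python, for the text found between the brackets. It is a sketch under strong assumptions: it handles only plain characters and ranges (no escapes such as `\\`, no `\p{...}`, no subtraction), and it does not reject malformed classes the way a real XML Schema processor would. The names `parse_class` and `class_matches` are illustrative only.

```python
# Hypothetical mini-parser for the text between "[" and "]", applying the two
# positional rules: a leading "^" negates the class, and "-" forms a range only
# when it has a character on both sides (first or last, it is a literal).
# Assumptions: no escapes, no \p{...}, no subtraction, no error checking.


def parse_class(body: str) -> tuple[bool, set[str], list[tuple[str, str]]]:
    negated = body.startswith("^")
    if negated:
        body = body[1:]  # "^" is special only in the first position
    singles: set[str] = set()
    ranges: list[tuple[str, str]] = []
    i = 0
    while i < len(body):
        # "x-y" with a character on both sides of "-" is a range.
        if i + 2 < len(body) and body[i + 1] == "-":
            ranges.append((body[i], body[i + 2]))
            i += 3
        else:
            # A leading or trailing "-", or a later "^", lands here as a literal.
            singles.add(body[i])
            i += 1
    return negated, singles, ranges


def class_matches(body: str, ch: str) -> bool:
    negated, singles, ranges = parse_class(body)
    hit = ch in singles or any(lo <= ch <= hi for lo, hi in ranges)
    return hit != negated  # negation flips the result


for body in ["+-", "-+", "a-z", "^a-z", "a^"]:
    print(body, parse_class(body))
```

With these rules, `+-` and `-+` both parse to the two literals "+" and "-", while `a^` parses to the literals "a" and "^" and is not negated.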
[\p{IsBasicLatin}-[^\p{L}]]
Or, using the \P construct, which denotes the complement of the
corresponding \p property (\P{L} matches any character that is not a
letter), we can define the class as:
[\p{IsBasicLatin}-[\P{L}]]
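Both spellings of this subtraction can be checked with a small Python sketch built on the standard `unicodedata` module. Assumptions: `\p{L}` is approximated by "general category starting with L" as reported by `unicodedata.category`, whose Unicode version depends on the Python build (this makes no difference inside Basic Latin); `\p{IsBasicLatin}` is hard-coded as the block U+0000 to U+007F; and the helper names are hypothetical. Within that block the only letters are A to Z and a to z, so both forms should yield the 52 ASCII letters.

```python
import string
import unicodedata
from collections.abc import Callable

# \p{IsBasicLatin}: the Basic Latin block, code points U+0000..U+007F.
BASIC_LATIN = {chr(cp) for cp in range(0x0000, 0x0080)}


def p_letter(ch: str) -> bool:
    # \p{L}: general category Lu, Ll, Lt, Lm or Lo.
    return unicodedata.category(ch).startswith("L")


def negated_class_not_letter(ch: str) -> bool:
    # [^\p{L}]: a negated class whose only atom is \p{L}.
    return not p_letter(ch)


def complement_escape_not_letter(ch: str) -> bool:
    # \P{L}: the complement escape, matching the same characters as [^\p{L}].
    return not p_letter(ch)


def subtract(base: set[str], excluded: Callable[[str], bool]) -> set[str]:
    # [base-[excluded]]: characters of base that the excluded class does not match.
    return {ch for ch in base if not excluded(ch)}


with_negation = subtract(BASIC_LATIN, negated_class_not_letter)
with_complement = subtract(BASIC_LATIN, complement_escape_not_letter)
print(with_negation == with_complement == set(string.ascii_letters))  # → True
```

Representing the subtracted class as a predicate rather than a set avoids having to enumerate the (very large) complement of \p{L}: subtraction only needs to test the characters of the base class.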
The corresponding datatype definition would be: