NAME
sort - sort or merge files

SYNOPSIS
sort [-m] [-o output] [-bdfinruM] [-t char] [-k keydef] [-y [kmem]] [-z recsz] [-T dir] [file ...]

sort [-c] [-AbdfinruM] [-t char] [-k keydef] [-y [kmem]] [-z recsz] [-T dir] [file ...]

DESCRIPTION
sort
performs one of the following functions:
- 1.
Sorts lines of all the named files together and writes the result
to the specified output.
- 2.
Merges lines of all the named (presorted) files together and writes the result
to the specified output.
- 3.
Checks that a single input file is correctly presorted.

The standard input is read if - is used as a file name or no input files are specified.

Comparisons are based on one or more sort keys extracted
from each line of input.
By default, there is one sort key, the entire input line.
Ordering is lexicographic by characters using the collating sequence
of the current locale.
If the locale is not specified or is set to the POSIX locale,
then ordering is lexicographic by bytes in machine-collating sequence.
If the locale includes multi-byte characters,
single-byte characters are machine-collated before multi-byte characters.

Behavior Modification Options
The following options alter the default behavior:
- -A
Sorts on a byte-by-byte basis using each character's encoded value.
On some systems, extended characters will be considered negative values,
and so sort before ASCII characters. If you are sorting ASCII
characters in a non-C/POSIX locale, this flag performs much faster.
- -c
Check that the single input file is sorted according to the ordering rules.
No output is produced; the exit code is set to indicate the result.
- -m
Merge only; the input files are assumed to be already sorted.
- -o output
The argument given is the name of an output file
to use instead of the standard output.
This file can be the same as one of the input files.
- -u
Unique: suppress all but one in each set of lines having equal keys.
If used with the -c option, check to see that there are no lines with duplicate keys, in addition
to checking that the input file is sorted.
- -y [kmem]
The amount of main memory used by the sort
can have a large impact on its performance.
If this option is omitted, sort begins using a system default memory size,
and continues to use more space as needed.
If this option is presented with a value, kmem,
sort starts using that number of kilobytes of memory,
unless the administrative minimum or maximum is violated,
in which case the corresponding extremum will be used.
Thus, -y 0 is guaranteed to start with minimum memory.
By convention, -y (with no argument) starts with maximum memory.
- -z recsz
The size of the longest line read is recorded
in the sort phase so that buffers can be allocated
during the merge phase.
If the sort phase is omitted via the -c or -m options,
a system default buffer size is used.
Lines longer than the buffer size will cause sort to terminate abnormally.
Supplying the actual number of bytes in the longest line
to be merged (or some larger value)
will prevent abnormal termination.
- -T dir
Use dir as the directory for temporary scratch files rather
than the default directory, which is one of the following, tried in order:
the directory specified in the TMPDIR environment variable,
/var/tmp, and finally /tmp.
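
The behavior modification options can be tried in a short shell session. This is a minimal sketch rather than part of the original manual: the file names and sample data are invented for illustration, and LC_ALL=C is set only so that ordering is by byte value and easy to predict. The -A, -y, and -z options are left out because they may not be available, or may mean something different, on other implementations of sort.

```shell
# Sketch only: file names and data are hypothetical.
export LC_ALL=C                      # POSIX locale: ordering is by byte value
work=$(mktemp -d)                    # scratch directory for this demonstration

printf 'pear\napple\npear\nfig\n' > "$work/a.txt"   # unsorted, one duplicate
printf 'banana\ncherry\n'         > "$work/b.txt"   # already sorted

# -u keeps one line from each set of equal keys; -o names the output file.
sort -u -o "$work/a.sorted" "$work/a.txt"

# -c checks ordering and produces no output; only the exit status matters.
if sort -c "$work/a.txt" 2>/dev/null; then unsorted_status=0; else unsorted_status=$?; fi
if sort -c "$work/a.sorted"; then sorted_status=0; else sorted_status=$?; fi

# -m merges inputs that are already sorted; -T selects the scratch directory.
merged=$(sort -m -T "$work" "$work/a.sorted" "$work/b.txt")

printf '%s\n' "$merged"
rm -rf "$work"
```

Here the -c check is expected to fail for the unsorted file and succeed for the file written by -o, and the merge interleaves the two presorted inputs without re-sorting them.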

Ordering Rule Options
When ordering options appear before restricted
sort key specifications, the ordering rules are
applied globally to all sort keys.
When attached to a specific sort key (described below),
the ordering options override all global ordering options
for that key.

The following options override the default ordering rules:
- -d
Quasi-dictionary order:
only alphanumeric characters and blanks (spaces and tabs),
as defined by LC_CTYPE, are significant in comparisons (see environ(5)).
(UNIX Standard only, see standards(5).)
The behavior is undefined for a sort key to which
-i or -n also applies.
- -f
Fold letters.
Prior to being compared, all lowercase letters are
effectively converted into their uppercase equivalents, as defined by
LC_CTYPE.
- -i
In non-numeric comparisons, ignore all characters which are non-printable,
as defined by LC_CTYPE.
For the ASCII character set, octal character codes 001 through 037 and 0177 are ignored.
- -n
The sort key is restricted to
an initial numeric string
consisting of optional blanks, an optional minus sign,
zero or more digits with optional radix character, and
optional thousands separators.
The radix and thousands separator characters are defined by
LC_NUMERIC.
The field is sorted by arithmetic value.
An empty (missing) numeric field is treated as arithmetic zero.
Leading zeros and plus or minus signs on zeros do not affect the ordering.
The -n option implies the -b option (see below).
- -r
Reverse the sense of comparisons.
- -M
Compare as months.
The first several non-blank characters of the field
are folded to uppercase and compared with the langinfo(5) items
ABMON_1 < ABMON_2 < ... < ABMON_12.
For example, American month names are compared such that
JAN < FEB < ... < DEC.
An invalid field is treated as being less than the ABMON_1 string,
that is, less than all months.
The -M option implies the -b option (see below).
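
The effect of the ordering options is easiest to see on small inputs. The following is an illustrative sketch with invented data, run in the POSIX locale (LC_ALL=C) so that the month abbreviations are the American ones mentioned above. Each input avoids equal keys so that the result does not depend on how ties are broken.

```shell
# Sketch only: the sample data is made up for illustration.
export LC_ALL=C

# -n compares by arithmetic value; leading zeros do not affect the order.
numeric=$(printf '10\n9\n-3\n007\n' | sort -n)

# -r reverses the sense of the comparison (here combined with -n).
reversed=$(printf '10\n9\n-3\n' | sort -n -r)

# -f folds lowercase letters to uppercase before comparing.
folded=$(printf 'b\nC\na\n' | sort -f)

# -M compares abbreviated month names in calendar order.
months=$(printf 'MAR\nJAN\nFEB\n' | sort -M)

printf '%s\n' "$numeric" "$reversed" "$folded" "$months"
```

Without -n the first input would be ordered character by character, and without -f the uppercase C would sort before both lowercase letters in the POSIX locale.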

Field Separator Options
The treatment of field separators can be altered using the options:
- -t char
Use char as the field separator character;
char is not considered to be part of a field
(although it can be included in a sort key).
Each occurrence of char is significant
(for example, <char><char> delimits an empty field).
If -t is not specified, <blank> characters will be used as default
field separators; each maximal sequence of <blank>
characters that follows a non-<blank> character is a field separator.
- -b
Ignore leading blanks when determining the starting and ending
positions of a restricted sort key.
If the -b option is specified before the first -k option
(or +pos1 argument), it is applied to all -k options
(or +pos1 arguments).
Otherwise, the -b option can be attached independently to each
-k field_start or field_end option
(or +pos1 or -pos2 argument; see below).
Note that the -b option is only effective when restricted sort key
specifications are given.
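
A small sketch may help show how -t and -b change which characters end up in a key. The data is invented, and LC_ALL=C is assumed so that a space sorts before any letter. In the first command the doubled separator delimits an empty second field, so the digit is still the third field. In the other two, the second field of each line begins with its leading blanks unless -b is given.

```shell
# Sketch only: the sample lines are hypothetical.
export LC_ALL=C

# With -t:, every colon counts, so "b::1" has an empty second field
# and "1" is its third field.
by_third=$(printf 'a:x:2\nb::1\n' | sort -t: -k 3,3)

# Default separators: the blanks before a field belong to that field,
# so the keys compared are " a" and "  b" (two spaces sort first).
default_order=$(printf 'y a\nx  b\n' | sort -k 2,2)

# With -b the leading blanks are skipped and the keys are "a" and "b".
b_order=$(printf 'y a\nx  b\n' | sort -b -k 2,2)

printf '%s\n' "$by_third" "$default_order" "$b_order"
```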

Restricted Sort Key
- -k keydef
The keydef argument defines a restricted sort key.
The format of this definition is

field_start[type][,field_end[type]]

which defines a key field beginning at field_start and ending at field_end.
The characters at positions field_start and field_end
are included in the key field, provided that field_end
does not precede field_start.
A missing field_end means the end of the line.
Fields and characters within fields are numbered starting with 1.
Note that this is different from the obsolete form of restricted sort keys,
where numbering starts at 0.
See WARNINGS below.

Specifying field_start and field_end involves the notion of a field,
a minimal sequence of characters
followed by a field separator or a new-line.
By default, the first blank of a sequence of blanks
acts as the field separator.
All blanks in a sequence of blanks
are considered to be part of the next field;
for example, all blanks at the beginning of a line
are considered to be part of the first field.

The arguments field_start and field_end each have the form m.n,
optionally followed by one or more of the type options
b, d, f, i, n, r, or M.
These modifiers have the same effect on this key only
that their command-line counterparts have on the entire record.

A field_start position specified by m.n
is interpreted to mean the nth character in the mth field.
A missing n means .1, indicating the first character of the mth field.
If the -b option is in effect,
n is counted from the first non-blank character in the mth field.

A field_end position specified by m.n
is interpreted to mean the nth character in the mth field.
If n is missing, the key ends at the last character of the mth field.
If the -b option is in effect,
n is counted from the first non-<blank> character in the mth field.

Multiple -k options are permitted and are significant in command line order.
A maximum of 9 -k options can be given.
If no -k option is specified, a default sort key of the entire line is used.
When there are multiple sort keys, later keys
are compared only after all earlier keys compare equal.
Lines that otherwise compare equal are ordered
with all bytes significant.
If all the specified keys compare equal,
the entire record is used as the final key.

The -k option is intended to replace the obsolete
[+pos1 [-pos2]] notation, using field_start and field_end respectively.
The fully specified [+pos1 [-pos2]] form:

+w.x -y.z

is equivalent to:

-k w+1.x+1,y.0 (if z == 0)
-k w+1.x+1,y+1.z (if z > 0)
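
The key definitions above can be sketched with a few commands. Everything in this example is illustrative: the data is invented, LC_ALL=C is assumed, and obsolete_to_k is a hypothetical helper (not part of sort) that only prints the -k form given by the two equivalence rules, taking the obsolete key to be written as +w.x -y.z.

```shell
# Sketch only: data and the helper function are hypothetical.
export LC_ALL=C

# Fields: name, department, salary.
data='carol ops 300
alice dev 200
bob dev 1000
dave ops 50'

# First key: field 2 as text. Second key: field 3, numeric and reversed.
# The second key is consulted only when the first compares equal.
by_dept=$(printf '%s\n' "$data" | sort -k 2,2 -k 3,3nr)

# Key restricted to characters 2 through 3 of field 1 ("bz", "aa", "ab").
by_chars=$(printf 'xbz\nyaa\nzab\n' | sort -k 1.2,1.3)

# Print the -k equivalent of the obsolete key "+W.X -Y.Z".
obsolete_to_k() {
    w=$1 x=$2 y=$3 z=$4
    if [ "$z" -eq 0 ]; then
        printf '%s %d.%d,%d.0\n' -k "$((w + 1))" "$((x + 1))" "$y"
    else
        printf '%s %d.%d,%d.%d\n' -k "$((w + 1))" "$((x + 1))" "$((y + 1))" "$z"
    fi
}

whole_field=$(obsolete_to_k 1 0 2 0)   # from +1.0 -2.0
two_chars=$(obsolete_to_k 1 0 1 2)     # from +1.0 -1.2

printf '%s\n' "$by_dept" "$by_chars" "$whole_field" "$two_chars"
```

The helper makes the off-by-one relationship explicit: the starting position always shifts by one in both the field and the character number, while the ending field shifts only when z is greater than zero.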

Obsolete Restricted Sort Key
The notation +pos1 -pos2
restricts a sort key to one beginning at pos1 and ending at pos2.
The characters at positions pos1 and pos2
are included in the sort key (provided that pos2
does not precede pos1).
A missing -pos2 means the end of the line.

Specifying pos1 and pos2 involves the notion of a field,
a minimal sequence of characters followed
by a field separator or a new-line.
By default, the first blank (space or tab) of a sequence of
blanks acts as the field separator.
All blanks in a sequence of blanks are considered to be
part of the next field; for example,
all blanks at the beginning of a line are considered to be part of
the first field.

pos1 and pos2 each have the form m.n,
optionally followed by one or more of the flags bdfinrM.
A starting position specified by +m.n
is interpreted to mean character n+1 in field m+1.
A missing .n means .0, indicating the first character of field m+1.
If the b flag is in effect,
n is counted from the first non-blank in field m+1;
+m.0b refers to the first non-blank character in field m+1.

A last position specified by -m.n
is interpreted to mean the nth
character (including separators) after the last character of the mth field.
A missing .n means .0, indicating the last character of the mth field.
If the b flag is in effect,
n is counted from the last leading blank in field m+1;
-m.1b refers to the first non-blank in field m+1.

EXTERNAL INFLUENCES
For information about the UNIX standard environment, see standards(5).

Environment Variables
LC_COLLATE
determines the default ordering rules applied to the sort.

LC_CTYPE
determines the locale for interpretation of sequences of bytes of text
data as characters (e.g., single- versus multi-byte characters in
arguments and input files)
and the behavior of character classification for the
-b, -d, -f, -i, and -n options.

LC_NUMERIC
determines the definition of the radix and thousands separator characters
for the -n option.

LC_TIME
determines the month names for the -M option.

LC_MESSAGES
determines the language in which messages are displayed.

LC_ALL
determines the locale to use to override the values of all the other
internationalization variables.

NLSPATH
determines the location of message catalogs for the processing of
LC_MESSAGES.

LANG
provides a default value for the internationalization variables that are unset
or null. If LANG is unset or null, the default value of "C" (see lang(5))
is used.

If any of the internationalization variables contains an invalid setting,
sort behaves as if all internationalization variables are set to "C".
See environ(5).

International Code Set Support
Single- and multi-byte character code sets are supported.

EXAMPLES
Sort the contents of infile with the second field as the sort key:

sort -k 2,2 infile

Sort, in reverse order, the contents of infile1 and infile2,
placing the output in outfile
and using the first two characters of the second field as the sort key:

sort -r -o outfile -k 2.1,2.2 infile1 infile2

Sort, in reverse order, the contents of infile1 and infile2,
using the first non-blank character of the fourth field
as the sort key:

sort -r -k 4.1b,4.1b infile1 infile2

Print the password file (/etc/passwd)
sorted by numeric user ID
(the third colon-separated field):

sort -t: -k 3n,3 /etc/passwd

Print the lines of the presorted file infile,
suppressing all but the first occurrence of lines
having the same third field:

sort -um -k 3,3 infile

DIAGNOSTICS
sort exits with one of the following values:
- 0
All input files were output successfully, or
-c was specified and the input file was correctly presorted.
- 1
Under the -c option, the file was not ordered as specified, or if the
-c and -u options were both specified, two input lines were found with equal keys.
This exit status is not returned if the -c option is not used.
- >1
An error occurred such as when one or more input lines are too long.
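
The exit values can be observed directly from the shell. The sketch below uses invented input and a deliberately nonexistent file name to provoke an error. The exact value returned for errors is implementation specific, so only "greater than 1" is assumed.

```shell
# Sketch only: inputs and the missing file name are hypothetical.
export LC_ALL=C
work=$(mktemp -d)

# -c on correctly sorted input: exit status 0.
ok_status=0
printf 'a\nb\nc\n' | sort -c || ok_status=$?

# -c on unsorted input: exit status 1.
disorder_status=0
printf 'b\na\n' | sort -c 2>/dev/null || disorder_status=$?

# -c with -u also rejects sorted input that contains equal keys.
dup_status=0
printf 'a\na\nb\n' | sort -c -u 2>/dev/null || dup_status=$?

# An unreadable input file is an error: exit status greater than 1.
error_status=0
sort "$work/no-such-file" 2>/dev/null || error_status=$?

rm -rf "$work"
printf '%s\n' "$ok_status" "$disorder_status" "$dup_status" "$error_status"
```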

When the last line of an input file is missing a new-line character,
sort appends one, prints a warning message, and continues.

If an error occurs when accessing the tables
that contain the collation rules for the specified language,
sort prints a warning message and defaults to the POSIX locale.

If a -d, -f, or -i option is specified for a language with multi-byte characters,
sort prints a warning message and ignores the option.

WARNINGS
Numbering of fields and characters within fields
(-k option) has changed to conform to the POSIX standard.
Beginning at HP-UX Release 9.0, the -k
option numbers fields and characters within fields, starting with 1.
Prior to HP-UX Release 9.0, numbering started at 0.

A field separator specified by the -t
option is recognized only if it is a single-byte character.

The character type classification categories
alpha, digit, space, and print
are not defined for multi-byte characters.
For languages with multi-byte characters,
all characters are significant in comparisons.

For non-text input files, the behavior is undefined.

AUTHOR
sort was developed by OSF and HP.

FILES
/var/tmp/stm???
/tmp/stm???

STANDARDS CONFORMANCE
sort: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2