SORT(1) | General Commands Manual | SORT(1) |
sort - sort, merge, or sequence check text and binary files
sort [-bCcdfgHhiMmnRrsuVz] [-k field1[,field2]] [-o output] [-S size] [-T dir] [-t char] [file ...]
The sort utility sorts text and binary files by lines. A line is a record separated from the subsequent record by a newline (default) or NUL ‘\0’ character (-z option). A record can contain any printable or unprintable characters. Comparisons are based on one or more sort keys extracted from each line of input, and are performed lexicographically, according to the specified command-line options that can tune the actual sorting behavior. By default, if keys are not given, sort uses entire lines for comparison.
If no file is specified, or if file is ‘-’, the standard input is used.
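As a minimal sketch of the record handling described above, the following reads NUL-terminated records from standard input and sorts them with -z. The function name sort_nul_records is hypothetical, and LC_ALL=C is set only so the byte ordering is explicit on implementations that honor the locale.

```shell
# Sort NUL-terminated records from standard input, then turn the NUL
# terminators into newlines so the result is easy to read.
# (sort_nul_records is an illustrative name, not part of sort.)
sort_nul_records() {
    LC_ALL=C sort -z | tr '\0' '\n'
}

# No file operand is given, so sort reads standard input.
printf 'pear\0apple\0fig\0' | sort_nul_records
# prints: apple, fig, pear (one per line)
```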
The options are as follows:
-C, --check=silent|quiet
-c, --check
    Like -C, but additionally write a message to stderr if the input file is not sorted.
-m, --merge
-o output, --output=output
-S size, --buffer-size=size
    If no memory limit is specified, sort may use up to about 90% of available memory. If the input is too big to fit into the memory buffer, temporary files are used.
-s
-T dir, --temporary-directory=dir
    The default is the value of TMPDIR, or /tmp if TMPDIR is not defined.
-u, --unique
    If used with -C or -c, sort also checks that there are no lines with duplicate keys.

The following options override the default ordering rules. If ordering options appear before the first -k option, they apply globally to all sort keys. When attached to a specific key (see -k), the ordering options override all global ordering options for that key. Ordering options intended to apply globally should not appear after -k, or results may be unexpected.
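The interaction between global and per-key ordering options can be sketched as follows. The helper names (sample, text_reverse, numeric_reverse) are hypothetical, and LC_ALL=C is set only to make the text comparison explicit on implementations that honor the locale.

```shell
# Two records whose first fields compare equal.
sample() { printf '1 10\n1 9\n'; }

# The global -n applies to key 1, which has no modifiers of its own.
# Key 2 carries its own modifier (r), so it does not pick up the global -n
# and is compared as text, in reverse order.
text_reverse() { sample | LC_ALL=C sort -n -k1,1 -k2,2r; }

# Attaching both n and r to key 2 requests a reverse numeric comparison.
numeric_reverse() { sample | LC_ALL=C sort -n -k1,1 -k2,2nr; }

text_reverse      # prints "1 9" then "1 10"
numeric_reverse   # prints "1 10" then "1 9"
```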
-d, --dictionary-order
-f, --ignore-case
-g, --general-numeric-sort, --sort=general-numeric
    As opposed to -n, this option handles general floating points. It has a more permissive format than that allowed by -n but it has a significant performance drawback.
-h, --human-numeric-sort, --sort=human-numeric
    -h or -H options (human-readable).
-i, --ignore-nonprinting
-M, --month-sort, --sort=month
-n, --numeric-sort, --sort=numeric
-R, --random-sort, --sort=random
    --random-source. If multiple sort fields are specified, the same random hash function is used for all of them.
-r, --reverse
-V, --version-sort
    For example:

    $ ls sort* | sort -V
    sort-1.022.tgz
    sort-1.23.tgz
    sort-1.23.1.tgz
    sort-1.024.tgz
    sort-1.024.003.
    sort-1.024.003.tgz
    sort-1.024.07.tgz
    sort-1.024.009.tgz
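The difference between -n and -g noted above can be illustrated with a value written in exponent notation. This is a small sketch with hypothetical helper names (numeric_order, general_order): -n reads only the leading digits of 1e3, while -g reads the whole floating-point value.

```shell
# With -n, "1e3" is read as 1, so it sorts before 2.
numeric_order() { printf '1e3\n2\n' | LC_ALL=C sort -n; }

# With -g, "1e3" is read as 1000, so it sorts after 2.
general_order() { printf '1e3\n2\n' | LC_ALL=C sort -g; }

numeric_order   # prints "1e3" then "2"
general_order   # prints "2" then "1e3"
```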
The treatment of field separators can be altered using these options:
-b, --ignore-leading-blanks
    (see -k). If -b is specified before the first -k option, it applies globally to all key specifications. Otherwise, -b can be attached independently to each field argument of the key specifications. Note that -b should not appear after -k, and that it has no effect unless key fields are specified.
-k field1[,field2], --key=field1[,field2]
    The -k option may be specified multiple times, in which case subsequent keys are compared after earlier keys compare equal. The -k option replaces the obsolete options +pos1 and -pos2, but the old notation is also supported.
-t char, --field-separator=char
    If -t is not specified, the default field separator is a sequence of blank-space characters, and consecutive blank spaces do not delimit an empty field; further, the initial blank space is considered part of a field when determining key offsets. To use NUL as field separator, use -t '\0'.
-z, --zero-terminated
    NUL (‘\0’) is used as the record separator character.

Other options:
--batch-size=num
    sort at once. This option affects behavior when having many input files or using temporary files. The minimum value is 2. The default value is 16.
--compress-program=program
    When called with the -d option, it must decompress standard input to standard output. If program fails, sort will exit with an error. The compress(1) and gzip(1) utilities meet these requirements.
--debug
--files0-from=filename
--heapsort
    This sort algorithm cannot be used with -u and -s.
--help
-H, --mergesort
--mmap
--qsort
    This sort algorithm cannot be used with -u and -s.
--radixsort
--random-source=filename
--version
A field is defined as a maximal sequence of characters other than the field separator and record separator (newline by default). Initial blank spaces are included in the field unless -b has been specified; the first blank space of a sequence of blank spaces acts as the field separator and is included in the field (unless -t is specified). For example, by default all blank spaces at the beginning of a line are considered to be part of the first field.
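The effect of leading blanks on field contents can be seen in a short sketch. The helper names (default_fields, skip_blanks) are hypothetical. In the first record the second field is preceded by two blanks, and a blank sorts before a letter in byte order, so the result changes once -b is given.

```shell
# Field 2 is "  b" in the first record and " a" in the second.
sample() { printf 'x  b\nx a\n'; }

# Default: the blanks count as part of field 2, so "  b" sorts before " a".
default_fields() { sample | LC_ALL=C sort -k2; }

# With -b before the first -k, leading blanks are skipped: "a" before "b".
skip_blanks() { sample | LC_ALL=C sort -b -k2; }

default_fields   # prints "x  b" then "x a"
skip_blanks      # prints "x a" then "x  b"
```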
Fields are specified by the -k field1[,field2] option. If field2 is missing, the end of the key defaults to the end of the line.
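To illustrate the default end of a key, the sketch below compares -k2 (key runs to the end of the line) with -k2,2 (key is field 2 only), using ':' as the field separator. The helper names are hypothetical. The comment on the second case assumes the usual behavior in which records with equal keys are ordered by a final whole-line comparison when -s is not given.

```shell
sample() { printf 'x:1:b\ny:1:a\n'; }

# Key is "1:b" versus "1:a": the text after field 2 decides the order.
key_to_end_of_line() { sample | LC_ALL=C sort -t: -k2; }

# Key is "1" for both records, so the keys compare equal and the
# whole lines decide the order instead.
key_one_field() { sample | LC_ALL=C sort -t: -k2,2; }

key_to_end_of_line   # prints "y:1:a" then "x:1:b"
key_one_field        # prints "x:1:b" then "y:1:a"
```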
The arguments field1 and field2 have the form m.n (m,n > 0) and can be followed by one or more of the modifiers b, d, f, i, n, g, M and r, which correspond to the options discussed above. When b is specified, it applies only to field1 or field2 where it is specified, while the rest of the modifiers apply to the whole key field regardless of whether they are specified only with field1 or field2 or both. A field1 position specified by m.n is interpreted as the nth character from the beginning of the mth field. A missing .n in field1 means ‘.1’, indicating the first character of the mth field; if the -b option is in effect, n is counted from the first non-blank character in the mth field; m.1b refers to the first non-blank character in the mth field. 1.n refers to the nth character from the beginning of the line; if n is greater than the length of the line, the field is taken to be empty.
nth positions are always counted from the field beginning, even if the field is shorter than the number of specified positions. Thus, the key can really start from a position in a subsequent field.
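The m.n notation described above can be sketched with a key that covers only characters 3 and 4 of the first field. The helper name by_columns is hypothetical, and the records contain no blanks, so field 1 is the whole line.

```shell
# Sort numerically on characters 3-4 of field 1: "20", "10", "15".
by_columns() {
    printf 'ab20\ncd10\nef15\n' | LC_ALL=C sort -k1.3,1.4n
}

by_columns   # prints "cd10", "ef15", "ab20"
```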
A field2 position specified by m.n is interpreted as the nth character (including separators) from the beginning of the mth field. A missing .n indicates the last character of the mth field; m = 0 designates the end of a line. Thus the option -k v.x,w.y is synonymous with the obsolete option +v-1.x-1 -w-1.y; when y is omitted, -k v.x,w is synonymous with +v-1.x-1 -w.0. The obsolete +pos1 -pos2 option is still supported, except for -w.0b, which has no -k equivalent.
TMPDIR
    TMPDIR may be overridden by the -T option.

The sort utility exits with one of the following values:
0
    Successfully sorted the input files or, if used with -C or -c, the input file already met the sorting criteria.
1
    On disorder (or non-uniqueness) with the -C or -c options.
2
    An error occurred.

The sort utility is compliant with the IEEE Std 1003.1-2008 (“POSIX.1”) specification, except that it ignores the user's locale(1) and always assumes LC_ALL=C.
The flags [-gHhiMRSsTVz] are extensions to that specification.
All long options are extensions to the specification. Some are provided for compatibility with GNU sort, others are specific to this implementation.
Some implementations of sort honor the -b option even when no key fields are specified. This implementation follows historic practice and IEEE Std 1003.1-2008 (“POSIX.1”) in only honoring -b when it precedes a key field.
The historic practice of allowing the -o option to appear after the file is supported for compatibility with older versions of sort.
The historic key notations +pos1 and -pos2 are supported for compatibility with older versions of sort but their use is highly discouraged.
A sort command appeared in Version 1 AT&T UNIX.
Gabor Kovesdan <gabor@FreeBSD.org>
Oleg Moskalenko <mom040267@gmail.com>
This implementation of sort has no limits on input line length (other than imposed by available memory) or any restrictions on bytes allowed within lines.
The performance depends highly on efficient choice of sort keys and key complexity. The fastest sort is on whole lines, with option -s. For the key specification, the simpler the lines are to process, the faster the sort will be.
When sorting by arithmetic value, using -n results in much better performance than -g so its use is encouraged whenever possible.
March 31, 2022 | OpenBSD-current |