- Linux Shell Scripting Cookbook (Third Edition)
- Clif Flynt, Sarath Lakshman, Shantanu Tushar
- 2021-07-09 19:46:21
uniq
The uniq command finds the unique lines in a given input (stdin or a filename command line argument) and either reports or removes the duplicated lines.
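uniq compares only adjacent lines, so duplicates are detected only when they sit next to each other. The input should therefore be sorted, which is why uniq is often used together with the sort command.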
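The need for sorted input can be checked with a short experiment. This is a minimal sketch using made-up sample lines fed through printf (the variable names are illustrative only): the same four lines are passed to uniq once as-is and once after sorting.

```shell
# Hypothetical sample data: "hack" appears twice, but not on adjacent lines.
unsorted_out=$(printf 'hack\nbash\nhack\nfoss\n' | uniq)
sorted_out=$(printf 'hack\nbash\nhack\nfoss\n' | sort | uniq)

# Without sort, both copies of "hack" survive because they are not adjacent.
printf 'without sort:\n%s\n' "$unsorted_out"

# With sort, the duplicates become adjacent and are collapsed to one line.
printf 'with sort:\n%s\n' "$sorted_out"
```

The unsorted run should still contain both copies of hack, while the sorted run contains each word once.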
To produce the unique lines (all lines in the input are printed and duplicate lines are printed once), use this:
$ cat sorted.txt
bash
foss
hack
hack
$ uniq sorted.txt
bash
foss
hack
Alternatively, use this:
$ sort unsorted.txt | uniq
Display only unique lines (the lines that are not repeated or duplicated in the input file):
$ uniq -u sorted.txt
bash
foss
Alternatively, use this command:
$ sort unsorted.txt | uniq -u
To count how many times each of the lines appears in the file, use the following command:
$ sort unsorted.txt | uniq -c
      1 bash
      1 foss
      2 hack
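A common follow-up, not shown in the original recipe, is to feed the counts back into sort so that the most frequent lines come first. The sketch below assumes made-up sample input supplied by printf; the variable names ranked, top_count, and top_line are illustrative only.

```shell
# Hypothetical sample input; in practice this would be a file such as unsorted.txt.
ranked=$(printf 'hack\nbash\nhack\nfoss\n' | sort | uniq -c | sort -nr)
printf '%s\n' "$ranked"

# The first line holds the most frequent entry: its count, then the text.
top_count=$(printf '%s\n' "$ranked" | head -n 1 | awk '{print $1}')
top_line=$(printf '%s\n' "$ranked" | head -n 1 | awk '{print $2}')
printf 'most frequent: %s (%s times)\n' "$top_line" "$top_count"
```

The second sort uses -n to compare the counts numerically and -r to reverse the order. awk is used here only to strip the padding that uniq -c places in front of each count.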
To find duplicate lines in the file, use this:
$ sort unsorted.txt | uniq -d
hack
To specify keys, we can use a combination of the -s and -w arguments:
- -s: This specifies the number of leading characters to skip before the comparison starts
- -w: This specifies the maximum number of characters to be compared
The following example uses such a comparison key to decide which lines uniq treats as duplicates:
$ cat data.txt
u:01:gnu
d:04:linux
u:01:bash
u:01:hack
To compare only the two-digit field (the third and fourth characters of each line), we use -s 2 to skip the first two characters and -w 2 to compare only the next two:
$ sort data.txt | uniq -s 2 -w 2
d:04:linux
u:01:bash
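To see which characters take part in the comparison, the key can be printed next to the result. This is a minimal sketch that recreates the sample data with printf and uses cut -c3-4 as a stand-in for the key; the variable names are illustrative, and -w is assumed to be available (it is a GNU extension that is not part of POSIX uniq).

```shell
# Recreate the sample data from the example (normally read from data.txt).
data=$(printf 'u:01:gnu\nd:04:linux\nu:01:bash\nu:01:hack\n')

# Characters 3-4 of each sorted line: the key selected by -s 2 -w 2.
keys=$(printf '%s\n' "$data" | LC_ALL=C sort | cut -c3-4)
printf 'keys compared:\n%s\n' "$keys"

# Lines whose keys match the previous line's key are dropped.
deduped=$(printf '%s\n' "$data" | LC_ALL=C sort | uniq -s 2 -w 2)
printf 'result:\n%s\n' "$deduped"
```

After sorting, the three u:01 lines are adjacent and share the key 01, so only the first of them (u:01:bash) is kept.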
When the output of one command is passed as input to the xargs command, it is best to terminate each item with a zero byte. Output passed from uniq to xargs is no exception to this rule. Without a zero-byte terminator, xargs splits its input on whitespace (spaces and newlines). For example, a line containing the text "this is a line" read from stdin is taken as four separate arguments by xargs instead of a single one. When a zero byte, \0, is used as the delimiter, the full line, including its spaces, is interpreted as a single argument.
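The difference in splitting can be observed by counting how many arguments xargs produces. In this sketch, -n 1 makes xargs run echo once per argument, so the number of output lines equals the number of arguments; the variable names are illustrative only.

```shell
line='this is a line'

# Default behaviour: xargs splits on whitespace, giving one argument per word.
split_count=$(printf '%s\n' "$line" | xargs -n 1 echo | wc -l | tr -d ' ')

# Zero-byte terminated input read with -0: the whole line is one argument.
whole_count=$(printf '%s\0' "$line" | xargs -0 -n 1 echo | wc -l | tr -d ' ')

printf 'default splitting: %s arguments\n' "$split_count"
printf 'with -0:           %s argument\n' "$whole_count"
```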
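The -z option makes uniq use a zero byte instead of a newline as the line delimiter. In GNU uniq this applies to the input as well as the output, so the input must already be zero-byte terminated:
$ uniq -z file.txt
This command removes all the files whose names are listed, one per line, in files.txt. The list is first converted to zero-byte-terminated items and sorted so that duplicates are adjacent:
$ tr '\n' '\0' < files.txt | sort -z | uniq -z | xargs -0 rm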
If a filename appears multiple times, the uniq command writes it to stdout only once, which avoids an rm: cannot remove FILENAME: No such file or directory error.
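The removal pipeline can be tried safely in a scratch directory. The sketch below makes several assumptions: the directory comes from mktemp -d, the file names are invented for the test, and the list is newline-separated, so it is converted with tr before reaching uniq -z (in GNU coreutils, -z changes how input is read as well as how output is written).

```shell
# Work in a throwaway directory so nothing of value can be removed.
workdir=$(mktemp -d)
touch "$workdir/a file.txt" "$workdir/b.txt" "$workdir/keep.txt"

# Hypothetical list: one name contains a space and is listed twice.
printf '%s\n' "$workdir/a file.txt" "$workdir/b.txt" "$workdir/a file.txt" \
    > "$workdir/files.txt"

# Newlines become zero bytes, duplicates are made adjacent and collapsed,
# and xargs -0 hands each name to rm as a single argument.
tr '\n' '\0' < "$workdir/files.txt" | sort -z | uniq -z | xargs -0 rm
rm_status=$?

# Only the list itself and the unlisted file should remain.
remaining=$(ls "$workdir")
printf 'rm exit status: %s\nremaining files:\n%s\n' "$rm_status" "$remaining"

# Clean up the scratch directory.
rm -r "$workdir"
```

Because the duplicate entry is collapsed before rm runs, rm is asked to delete each file only once and exits with status 0.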