Advertisement
Advertisement


Select unique or distinct values from a list in UNIX shell script


Question

I have a ksh script that returns a long list of values, newline separated, and I want to see only the unique/distinct values. It is possible to do this?

For example, say my output is file suffixes in a directory:

tar
gz
java
gz
java
tar
class
class

I want to see a list like:

tar
gz
java
class
2009/03/06
1
243
3/6/2009 10:33:38 AM

Accepted Answer

You might want to look at the uniq and sort applications.

./yourscript.ksh | sort | uniq

(FYI, yes, the sort is necessary in this command line, uniq only strips duplicate lines that are immediately after each other)

EDIT:

Contrary to what has been posted by Aaron Digulla in relation to uniq's commandline options:

Given the following input:

class
jar
jar
jar
bin
bin
java

uniq will output all lines exactly once:

class
jar
bin
java

uniq -d will output all lines that appear more than once, and it will print them once:

jar
bin

uniq -u will output all lines that appear exactly once, and it will print them once:

class
java
2017/05/23
441
5/23/2017 10:31:38 AM

./script.sh | sort -u

This is the same as monoxide's answer, but a bit more concise.

2017/05/23

With zsh you can do this:

% cat infile 
tar
more than one word
gz
java
gz
java
tar
class
class
zsh-5.0.0[t]% print -l "${(fu)$(<infile)}"
tar
more than one word
gz
java
class

Or you can use AWK:

% awk '!_[$0]++' infile    
tar
more than one word
gz
java
class
2019/11/19

For larger data sets where sorting may not be desirable, you can also use the following perl script:

./yourscript.ksh | perl -ne 'if (!defined $x{$_}) { print $_; $x{$_} = 1; }'

This basically just remembers every line output so that it doesn't output it again.

It has the advantage over the "sort | uniq" solution in that there's no sorting required up front.

2009/03/06

Pipe them through sort and uniq. This removes all duplicates.

uniq -d gives only the duplicates, uniq -u gives only the unique ones (strips duplicates).

2015/05/29

With AWK you can do, I find it faster than sort

 ./yourscript.ksh | awk '!a[$0]++'
2017/05/22

Source: https://stackoverflow.com/questions/618378
Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Email: [email protected]