How do I use comm to compare two sorted files in Linux?



  • Man page for comm.

    comm compares two sorted files line by line.

    Running the command without any options produces three-column output.

    • Column one contains lines unique to FILE1
    • Column two contains lines unique to FILE2
    • Column three contains lines common to both files.

    Sample files used.

    $ cat fileA
    Flying by the seat of my pants.
    Surfing by the seat of my pants.
    Rowing by the seat of my pants.
    
    $ cat fileB
    Flying by the seat of my pants.
    Surfing by the seat of my pants.
    Swimming by the seat of my pants.
    

    comm expects both files to be sorted. You can either sort the files beforehand with the sort command, or sort them on the command line and pass the results to comm.

    $ comm <(sort fileA) <(sort fileB)
    		Flying by the seat of my pants.
    Rowing by the seat of my pants.
    		Surfing by the seat of my pants.
    	Swimming by the seat of my pants.
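
The other approach, sorting the files first, might look like the sketch below. The scratch directory and the names fileA.sorted and fileB.sorted are assumptions chosen for illustration, and sort -c is used only as a sanity check that each file really is in order.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the two sample files.
printf '%s\n' \
  'Flying by the seat of my pants.' \
  'Surfing by the seat of my pants.' \
  'Rowing by the seat of my pants.' > fileA
printf '%s\n' \
  'Flying by the seat of my pants.' \
  'Surfing by the seat of my pants.' \
  'Swimming by the seat of my pants.' > fileB

# Sort each file once (fileA.sorted and fileB.sorted are hypothetical names).
sort fileA > fileA.sorted
sort fileB > fileB.sorted

# sort -c exits non-zero if a file is not in sorted order.
sort -c fileA.sorted && sort -c fileB.sorted

# Compare the pre-sorted copies; no process substitution is needed.
comm fileA.sorted fileB.sorted
```

This form does not rely on process substitution, and the sorted copies can be reused across several comm runs.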
    

    Use options to suppress columns.

    • -1 suppress column 1 (lines unique to fileA)
    • -2 suppress column 2 (lines unique to fileB)
    • -3 suppress column 3 (lines that appear in both fileA and fileB)

    Think of the numbers as the columns you hide; the columns you do not name are what you get.

    Using -1 produces lines that are in both files and lines unique to fileB.

    $ comm -1 <(sort fileA) <(sort fileB)
    	Flying by the seat of my pants.
    	Surfing by the seat of my pants.
    Swimming by the seat of my pants.
    

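    We suppressed column 1, so we get the lines that are in both fileA and fileB and the
    lines that are only in fileB. The lines unique to fileB now start at the left margin,
    and the common lines are indented by one tab.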

    Another example: lines that appear in both files.

    $ comm -12 <(sort fileA) <(sort fileB)
    Flying by the seat of my pants.
    Surfing by the seat of my pants.
    

    Lines that only appear in fileA.

    $ comm -23 <(sort fileA) <(sort fileB)
    Rowing by the seat of my pants.
    

    Lines that only appear in fileB.

    $ comm -13 <(sort fileA) <(sort fileB)
    Swimming by the seat of my pants.
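
If these comparisons are needed often, they could be wrapped in a small helper. The sketch below is an illustration only: compare_sorted is a hypothetical function name, not part of comm, and it assumes mktemp is available. It sorts both inputs and hands the first one to comm on standard input.

```shell
# Hypothetical helper: compare_sorted FLAGS FILE1 FILE2
# Sorts both inputs, then runs comm with the given column flags (e.g. -12).
compare_sorted() {
  cs_flags=$1
  cs_tmp=$(mktemp) || return 1

  # Sort the second file into a temporary file.
  sort "$3" > "$cs_tmp"

  # Sort the first file and pipe it in; "-" tells comm to read standard input.
  sort "$2" | comm "$cs_flags" - "$cs_tmp"
  cs_status=$?

  # Clean up and report comm's exit status.
  rm -f "$cs_tmp"
  return "$cs_status"
}
```

For example, with the sample files in the current directory, compare_sorted -12 fileA fileB should print the lines common to both files, and compare_sorted -23 fileA fileB the line that appears only in fileA.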
    


© Lightnetics 2024