print the first k elements of each group within a comma-separated file

Suppose I am given the following csv file:
# input.csv
a,1
a,2
a,3
b,5
b,6
b,7

and I want to produce the first 2 elements of each group labeled by column 1.
# output.csv
a,1
a,2
b,5
b,6

This is not rocket science, but it is also not as straightforward as, say, sorting or deduplicating in bash. In the special case where k = 1, one could use the following sort syntax:
sort -t, -k1,1 -u input.csv > output.csv
However, I had been under the impression that there was no straightforward way to go beyond k = 1. Today I found a way using awk (with the shell variable k set to the number of rows to keep per group):
awk -F, -v k="$k" 'a[$1]++ < k' input.csv > output.csv

thanks to this article:
http://www.theunixschool.com/2012/06/awk-10-examples-to-group-data-in-csv-or.html
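
To spell out why the awk approach works: the array counts how many times each value of column 1 has been seen so far, and a line is printed only while that count is still below k. Below is a minimal sketch wrapping the idea in a shell function. The name first_k_per_group and the array name seen are my own choices for illustration, not something from the linked article.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration):
# print the first k lines of each group, where the group is column 1.
# Usage: first_k_per_group K [FILE...]   (reads stdin if no FILE is given)
first_k_per_group() {
    k=$1
    shift
    # -F, splits fields on commas; -v k=... passes the shell value into awk.
    # seen[$1]++ evaluates to the count *before* incrementing, so the
    # pattern is true for the first k lines carrying each key.
    awk -F, -v k="$k" 'seen[$1]++ < k' "$@"
}

# Example run on the sample data from the post.
printf 'a,1\na,2\na,3\nb,5\nb,6\nb,7\n' | first_k_per_group 2
```

Since awk keeps a separate counter per key, the input does not need to be sorted or grouped beforehand; lines come out in their original order.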

About aquazorcarson

math PhD at Stanford, studying probability