When working with big data, managing big files efficiently becomes crucial. Even printing a single line from a 150 GB text file (an SQL dump, RDF triples or quads, or anything else) can be painful, since you usually cannot just open it in a text editor. It takes some thinking, let alone editing the file.
Printing Big Files
Use head to get the first lines of a file and tail to get the last ones:
# first 5 lines
head -n 5 my_big_file
# last 5 lines
tail -n 5 my_big_file
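To try this without a real 150 GB file, here is a minimal sketch that generates a throwaway sample file (`sample.txt`, a hypothetical stand-in for my_big_file) and runs both commands on it. It also combines the two, `tail -n +1500` (start at line 1500) piped into `head -n 501` (keep 2000 - 1500 + 1 lines), which is one common way to print a line range without sed.

```shell
# Throwaway working directory, removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# sample.txt is a hypothetical stand-in for my_big_file: the numbers 1..100000.
seq 1 100000 > sample.txt

# First 5 lines: head exits once it has printed them.
head -n 5 sample.txt

# Last 5 lines: on a regular file, tail typically reads from the end
# instead of scanning everything before it.
tail -n 5 sample.txt

# Lines 1500 to 2000: start at line 1500, then keep 501 lines.
tail -n +1500 sample.txt | head -n 501 > range.txt
```

When head has printed its 501 lines it exits, and tail stops on the next write to the closed pipe, so the rest of the file is never read.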
Use sed (stream editor) to print any single line or range of lines:
# print line 1500
sed -n '1500p' my_big_file
# print lines 1500 to 2000
sed -n '1500,2000p' my_big_file
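With `sed -n '1500p'`, sed keeps reading the file after printing the line, all the way to the end, which on a 150 GB file can take a long time. A sketch of a variant that tells sed to quit with the `q` command once it has what it needs, run on a generated `sample.txt` (a hypothetical stand-in for my_big_file):

```shell
# Throwaway working directory, removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
seq 1 100000 > sample.txt

# Line 1500 only: 'd' discards lines 1..1499 without printing them;
# at line 1500, 'q' quits and sed prints that line on the way out.
sed '1500q;d' sample.txt > line.txt

# Lines 1500 to 2000: print the range, then quit right after line 2000
# (with -n, q does not print the line a second time).
sed -n '1500,2000p;2000q' sample.txt > range.txt
```

The earlier the requested lines sit in the file, the more time the early exit saves.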
Splitting Big Files
Editing a big file becomes doable if you split it into smaller pieces, edit the pieces, and finally concatenate them back together.
# split the file into pieces of 1000 lines each
split -l 1000 my_big_file my_big_file_segment_
# split the file into pieces of 500 MB each
split -b 500M my_big_file my_big_file_segment_
Both commands create files named my_big_file_segment_aa, my_big_file_segment_ab, and so on. Note that -b cuts at exact byte offsets, so a line can end up split across two pieces; use -l if every piece must contain whole lines.
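Here is a sketch of the whole split, edit, concatenate round trip on a small generated file. The edit itself (renaming `row` to `line` at the start of each line) is a hypothetical example; any per-piece sed, awk, or editor change works the same way. Concatenation relies on the shell expanding `my_big_file_segment_*` in sorted order, so the pieces come back as aa, ab, ac, and so on.

```shell
# Throwaway working directory, removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Sample input: "row 1" .. "row 10000".
seq 1 10000 | sed 's/^/row /' > my_big_file

# 1. Split into pieces of 1000 lines each (aa .. aj).
split -l 1000 my_big_file my_big_file_segment_

# 2. Edit every piece. Writing to a temp file and renaming it avoids
#    relying on sed -i, whose syntax differs between GNU and BSD sed.
for piece in my_big_file_segment_*; do
    sed 's/^row /line /' "$piece" > "$piece.tmp" && mv "$piece.tmp" "$piece"
done

# 3. Concatenate the pieces in glob (sorted) order, then clean up.
cat my_big_file_segment_* > my_big_file_edited
rm my_big_file_segment_*
```

Comparing the result against the same edit applied to the unsplit file (for example with `cmp`) is a cheap way to check that no piece was lost or reordered.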
When splitting files, it can be useful to know how many lines a file has.
wc -l my_big_file
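One use of the line count is choosing the `-l` value so that a file splits into a given number of pieces. A minimal sketch, assuming you want 4 pieces (`pieces` is a hypothetical parameter); reading the file through `<` makes wc print only the number, without the file name:

```shell
# Throwaway working directory, removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
seq 1 10007 > my_big_file

pieces=4   # hypothetical target number of pieces

# Count lines; '< file' keeps the file name out of wc's output.
total=$(wc -l < my_big_file)

# Round up so that 'pieces' files are enough to hold every line.
per_piece=$(( (total + pieces - 1) / pieces ))

split -l "$per_piece" my_big_file my_big_file_segment_
```

Rounding up means the last piece may be slightly shorter than the others. GNU split can also split into N files without breaking lines via `-n l/N`, although it balances the pieces by size rather than by line count.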
Remove First Line
Removing the first line is handy for dropping the column names (the header row) from a CSV file:
sed -i '1d' my_big_file.csv
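Keep in mind that `-i` is not part of POSIX sed: GNU sed accepts `sed -i '1d'`, while BSD/macOS sed expects a backup suffix argument, for example `sed -i '' '1d' my_big_file.csv`. In both cases sed writes a complete new copy of the file before replacing the original, so you temporarily need as much free disk space as the file takes. A portable alternative that makes the copy explicit is `tail -n +2`, sketched here on a tiny sample CSV (the file names other than my_big_file.csv are hypothetical):

```shell
# Throwaway working directory, removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf 'id,name\n1,alice\n2,bob\n3,carol\n' > my_big_file.csv

# Save the column names separately, in case you need them later.
head -n 1 my_big_file.csv > header.csv

# Copy everything from line 2 onward; the original file stays untouched.
tail -n +2 my_big_file.csv > my_big_file_no_header.csv
```

Keeping header.csv around also makes it easy to put the column names back in front of a piece later, e.g. `cat header.csv my_big_file_segment_aa`.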