Command Line Tricks

Process IDs

  • see all process ids: ps -ef
  • get a specific pid: pidof process_name_from_ps_dash_ef

Data Processing

Create file with random rows for sampling
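One possible approach, assuming GNU coreutils' `shuf` is available: keep the header line, then append a random selection of the remaining rows. The file names `data.csv` and `sample.csv` and the sample size of 10 are placeholders for illustration.

```shell
# Build a small sample input (hypothetical file names).
printf 'id,value\n' > data.csv
seq 1 100 | sed 's/$/,x/' >> data.csv

# Keep the header, then append 10 randomly chosen data rows.
head -n 1 data.csv > sample.csv
tail -n +2 data.csv | shuf -n 10 >> sample.csv

# Count the lines in the sample (header + 10 rows).
wc -l < sample.csv
```

`shuf -n` picks rows without replacement, so no row appears twice in the sample.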

Go To Specific Lines with less
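A short sketch of how this is commonly done: `less` accepts `+<number>` on the command line to start at that line, and `<number>g` inside the pager jumps to it. The file name `big_file.txt` and line 1000 are placeholders; the interactive commands are shown as comments so the block can run unattended.

```shell
# Build a small numbered file to try this on (hypothetical name).
seq 1 5000 | sed 's/^/row /' > big_file.txt

# Open the file with line 1000 at the top of the screen (interactive):
#   less +1000 big_file.txt
# Same, with line numbers shown in the margin:
#   less -N +1000 big_file.txt
# Inside less, type 1000g to jump to line 1000, or G to jump to the end.

# Non-interactive check of what is on that line:
sed -n '1000p' big_file.txt
```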

Merge Text Files with Headers

  • Write header to final_file
    • head -n 1 file_with_header > final_file
  • Append data from other files (skipping the header in them)
    • if final_file does not end with a newline, add one first: echo "" >> final_file (check this on the command line; editors may hide or silently add the trailing newline)
    • tail -n +2 file_to_append >> final_file
  • Notes:
    • head -n 1 grabs the first line of a file
    • tail -n +2 grabs the second line and every line after it
    • > overwrites the final_file and >> appends to it
  • Process Times
    • it took about 10 min to write a 30 GB final_file on EC2/EBS
    • this file had 65M rows and was a CSV with ~150 sparsely populated columns
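The steps above can be put together in a small loop. This is a minimal sketch: the `part1.csv`, `part2.csv` and `part3.csv` names are made up for illustration, and the `tail -c 1` check is one way (an assumption, not the only way) to test for a missing trailing newline.

```shell
# Create three small sample files (hypothetical names) sharing one header.
printf 'id,name\n1,a\n2,b\n' > part1.csv
printf 'id,name\n3,c\n4,d' > part2.csv      # note: no trailing newline
printf 'id,name\n5,e\n' > part3.csv

# Write the header once.
head -n 1 part1.csv > final_file

for f in part1.csv part2.csv part3.csv; do
  # If final_file does not end in a newline, add one before appending.
  # (Command substitution strips a trailing newline, so an empty result
  # means the last byte already is a newline.)
  if [ -n "$(tail -c 1 final_file)" ]; then
    echo "" >> final_file
  fi
  # Append everything except the header line.
  tail -n +2 "$f" >> final_file
done

cat final_file
```

Without the newline check, the last row of `part2.csv` and the first row of `part3.csv` would end up joined on one line.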

Delete all lines after match (including matched line)

  • sed -n '/text to match. pipes are ok in here./q;p' file_to_match_in > file_to_write_to
  • this writes everything before the match to a new file, so it can need up to the full file size in free disk space
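A small worked example of the command above, using made-up file contents. The marker text `STOP|HERE` is hypothetical and shows that a pipe is matched literally in sed's default (basic) regular expressions.

```shell
# Sample input: two lines to keep, a marker line, two lines to drop.
printf 'keep 1\nkeep 2\nSTOP|HERE\ndrop 1\ndrop 2\n' > file_to_match_in

# -n turns off automatic printing; p prints each line until the pattern
# matches, at which point q quits before the matched line is printed.
sed -n '/STOP|HERE/q;p' file_to_match_in > file_to_write_to

cat file_to_write_to
```

Because sed quits at the first match, it does not read the rest of the input, which helps on large files.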

Remove new line from end of file
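One way to do this, assuming GNU coreutils' `truncate` is available, is to drop the last byte of the file only when that byte is a newline. The file name `sample_file` is a placeholder.

```shell
# Sample file that ends with a newline (8 bytes in total).
printf 'a,b\n1,2\n' > sample_file

# tail -c 1 returns the last byte; command substitution strips a newline,
# so an empty result on a non-empty file means the file ends in a newline.
if [ -s sample_file ] && [ -z "$(tail -c 1 sample_file)" ]; then
  # Shrink the file by one byte, removing the trailing newline in place.
  truncate -s -1 sample_file
fi

# Size in bytes after the change.
wc -c < sample_file
```

The guard matters because `truncate -s -1` removes the last byte whatever it is; running it unguarded on a file without a trailing newline would cut off real data.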

Find and Replace: PSV to CSV

  • sed -i -e 's/|/,/g' alc_sample.csv
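A minimal sketch of the same substitution that writes to a new file instead of editing in place, which leaves the original untouched. The file names `sample.psv` and `sample.csv` and their contents are made up for illustration.

```shell
# Small pipe-separated sample file (hypothetical contents).
printf 'id|name|city\n1|Ann|Oslo\n' > sample.psv

# Replace every pipe with a comma; the g flag applies it to all
# occurrences on each line, not just the first.
sed 's/|/,/g' sample.psv > sample.csv

cat sample.csv
```

Two caveats: this is a plain character swap, so a field that already contains a comma will produce a broken CSV unless it is quoted first. Also, `-i` behaves differently on BSD/macOS sed, where it expects a backup suffix argument, so `sed -i -e ...` there leaves a backup copy with `-e` appended to the file name.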