8 Data Manipulation

This guide covers essential commands for handling text data in files: searching, sorting, counting, removing duplicates, extracting columns, and saving command output. These tools are widely used for data analysis and scripting in Unix-like environments.



8.2 Regular Expressions

Regular expressions extend grep’s search capabilities. Use the -E option to enable extended regular expressions, in which ?, +, and | work as operators without backslash escapes:

  • .: Matches any single character.
  • ^: Matches the start of a line.
  • $: Matches the end of a line.
  • [ ]: Matches any single character listed within the brackets, such as [abc], or within a range, such as [0-9].
  • ?: The preceding element is optional.
  • *: The preceding element can appear zero or more times.
  • +: The preceding element must appear one or more times.
  • |: Logical OR operator between expressions.

Example: Search for lines containing any of several words.

grep -E 'word1|word2|word3' myfile.txt
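The following is a minimal sketch that exercises several of the operators above against a small sample file. The file contents and the variable name demo are hypothetical; mktemp is used so that no existing file is overwritten. Parentheses group alternatives in extended mode, so ^(error|warning) anchors both words to the start of the line.

```shell
#!/usr/bin/env bash
# Create a throwaway sample file (hypothetical contents).
demo=$(mktemp)
printf '%s\n' 'color' 'colour' 'colouur' 'error: disk full' 'warning: low memory' 'ok' > "$demo"

# ? : the "u" is optional, so this matches "color" and "colour", not "colouur".
grep -E '^colou?r$' "$demo"

# + : one or more "u", so this matches "colour" and "colouur", not "color".
grep -E '^colou+r$' "$demo"

# ^ with | : lines that start with "error" or "warning".
grep -E '^(error|warning)' "$demo"

# [ ] and $ : a vowel followed by "k" at the end of a line (only "ok" here).
grep -E '[aeiou]k$' "$demo"
```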

8.3 Sorting File Lines

  • Ascending Order: By default, sort compares lines as text in ascending order (the exact ordering depends on the current locale).

    sort myfile.txt
  • Descending Order: Reverse the sort order.

    sort -r myfile.txt
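  • Random Order: Shuffle lines. sort -R orders lines by a random hash, so identical lines still end up next to each other; GNU coreutils' shuf performs a true shuffle.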

    sort -R myfile.txt
  • Numeric Sort: Compare lines by numeric value instead of as text, so 10 sorts after 9.

    sort -n myfile.txt
  • Saving Output: Use -o to write the sorted result to a file. Unlike redirecting with >, -o may name the input file itself, so sorting a file in place is safe.

    sort -o sorted_file.txt myfile.txt
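To show why -n matters, here is a small sketch comparing text and numeric ordering on a hypothetical list of numbers (the variable name nums is illustrative). The last command sorts the file in place with -o, which would not work with sort "$nums" > "$nums" because the shell truncates the file before sort reads it.

```shell
#!/usr/bin/env bash
nums=$(mktemp)
printf '%s\n' 10 9 100 2 > "$nums"

# Default: lines compared as text, so "10" and "100" sort before "2".
sort "$nums"          # 10, 100, 2, 9 (one per line)

# -n: compare numeric values.
sort -n "$nums"       # 2, 9, 10, 100

# -r combined with -n: largest number first.
sort -rn "$nums"      # 100, 10, 9, 2

# -o may name the input file itself, so this sorts the file in place.
sort -n -o "$nums" "$nums"
```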

8.4 Counting Text Elements

  • Basic Count: Displays line, word, and byte counts.

    wc myfile.txt
  • Lines Only: Count the number of lines.

    wc -l myfile.txt
  • Words Only: Count the number of words.

    wc -w myfile.txt
  • Bytes Only: Count the number of bytes.

    wc -c myfile.txt
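  • Characters Only: Count the number of characters. This differs from -c only for multibyte text: in a UTF-8 locale, é counts as one character but two bytes.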

    wc -m myfile.txt
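A short sketch of the counting options on a tiny hypothetical file (the variable name sample is illustrative). Reading the file through < instead of passing its name makes wc print only the number, which is convenient in scripts.

```shell
#!/usr/bin/env bash
sample=$(mktemp)
printf 'one two\nthree\n' > "$sample"

wc -l < "$sample"   # 2 lines
wc -w < "$sample"   # 3 words
wc -c < "$sample"   # 14 bytes: "one two\n" is 8, "three\n" is 6
wc "$sample"        # all three counts, followed by the file name
```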

8.5 Removing Duplicates with uniq

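  • Basic Usage: Collapse each run of adjacent identical lines into one. Duplicates that are not adjacent are kept, so sort the input first to remove all duplicates.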

    uniq myfile.txt
  • Saving Output: Redirect the output to a new file.

    uniq myfile.txt > result.txt
  • Count Occurrences: Prefix each line with the number of times it occurs in a row.

    uniq -c myfile.txt
  • Show Duplicates Only: Print one copy of each line that appears more than once in a row.

    uniq -d myfile.txt
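The sketch below contrasts uniq on unsorted and sorted input, then builds a simple frequency table with the common sort | uniq -c | sort -rn pipeline. The word list and the variable name words are hypothetical.

```shell
#!/usr/bin/env bash
words=$(mktemp)
printf '%s\n' apple banana apple cherry banana apple > "$words"

# No two identical lines are adjacent, so uniq alone removes nothing here.
uniq "$words"

# Sorting first groups identical lines, so uniq can collapse them.
sort "$words" | uniq                 # apple, banana, cherry

# Frequency table, most common first.
sort "$words" | uniq -c | sort -rn   # 3 apple, 2 banana, 1 cherry

# Lines that occur more than once.
sort "$words" | uniq -d              # apple, banana
```

For plain deduplication, sort -u produces the same result as sort | uniq in one step.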

8.6 Extracting Columns with cut

For files with delimited columns, cut allows you to extract specific fields:

  • Specify Delimiter: Use -d to set the field delimiter (the default is a tab).
  • Select Columns: Use -f to list the fields to extract, for example 2, 1,3, or 1-3.

Examples:

# Extract columns 1 to 3
cut -d ',' -f 1-3 myfile.txt

# Extract from column 3 onwards
cut -d ',' -f 3- myfile.txt
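A minimal sketch on a small hypothetical CSV file (the variable name csv and its contents are illustrative). Note that cut splits on every delimiter, so it does not understand quoted CSV fields that themselves contain commas.

```shell
#!/usr/bin/env bash
csv=$(mktemp)
printf '%s\n' 'name,age,city' 'alice,30,paris' 'bob,25,lima' > "$csv"

# A comma-separated list picks individual fields; output keeps the input delimiter.
cut -d ',' -f 1,3 "$csv"             # name,city / alice,paris / bob,lima

# Skip the header line with tail, then keep only the first field.
tail -n +2 "$csv" | cut -d ',' -f 1  # alice / bob

# Without -d, cut splits on tabs.
printf 'x\ty\tz\n' | cut -f 2        # y
```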

8.7 Redirection and Pipes

  • Standard Output to File (>): Create or overwrite a file with the command output.

    grep "myword" myfile.txt > result.txt
  • Append to File (>>): Add the command output to the end of an existing file.

    grep "myword" myfile.txt >> result.txt
  • Standard Error to File (2>): Redirect error messages to a file.

    grep "myword" myfile.txt 2> error.log
  • Combine Output and Errors (2>&1): Direct both standard output and errors to the same file. Order matters: 2>&1 must come after > result.txt, otherwise errors still go to the terminal.

    grep "myword" myfile.txt > result.txt 2>&1

  • Pipes (|): Use the output of one command as input to another.

    grep "myword" myfile.txt | sort
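The following sketch runs each redirection against a scratch directory so nothing outside it is touched. The directory variable dir and the file missing.txt (which deliberately does not exist, to produce an error message) are hypothetical.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
printf '%s\n' 'myword one' 'other' 'myword two' > "$dir/myfile.txt"

# > creates or overwrites; >> appends. result.txt ends up with 4 lines.
grep "myword" "$dir/myfile.txt" > "$dir/result.txt"
grep "myword" "$dir/myfile.txt" >> "$dir/result.txt"

# missing.txt does not exist, so grep writes an error to stderr, captured by 2>.
grep "myword" "$dir/missing.txt" 2> "$dir/error.log"

# Both streams in one file: 2 matching lines plus 1 error line.
grep "myword" "$dir/myfile.txt" "$dir/missing.txt" > "$dir/combined.txt" 2>&1

# Pipe: matching lines sorted in reverse order.
grep "myword" "$dir/myfile.txt" | sort -r   # myword two, myword one
```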

8.8 Viewing File Contents

To display the contents of a file directly in the terminal:

cat myfile.txt

This command prints the entire content of myfile.txt to the screen.
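The name cat comes from "concatenate": given several files, it prints them one after another, which also makes it a simple way to join files. A small sketch with two hypothetical temporary files (the variable names a and b are illustrative); for long files, a pager such as less is usually more convenient than cat.

```shell
#!/usr/bin/env bash
a=$(mktemp)
b=$(mktemp)
printf 'first\n' > "$a"
printf 'second\n' > "$b"

# Files are printed in the order given.
cat "$a" "$b"                 # first / second

# Combined with >, this joins the files into a new one.
cat "$a" "$b" > "$a.joined"
```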

8.9 Interactive Terminal Input

To type several lines of input for a command such as sort directly in the terminal, use a here document:

sort -n << END

After you run this command, the shell shows a continuation prompt (usually >). Type the numbers you want to sort, one per line. When you are done, type END on a line by itself; the shell then passes everything you typed to sort, which prints it in numeric order.
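Here documents also work non-interactively inside scripts. In this sketch (the variable names out, name, and literal are hypothetical), the closing END must stand alone on its line with no surrounding spaces. Quoting the delimiter, as in << 'END', turns off variable expansion in the body.

```shell
#!/usr/bin/env bash
out=$(mktemp)

# Everything between "<< END" and the line containing only END becomes sort's input.
sort -n << END > "$out"
10
2
33
END

cat "$out"          # 2 / 10 / 33

# With a quoted delimiter, $name is kept literally instead of being expanded.
name=world
literal=$(cat << 'END'
Hello, $name
END
)
echo "$literal"     # Hello, $name
```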

8.10 Conclusion

These commands form the foundation of text processing and data manipulation in Unix-like systems, enabling efficient analysis and transformation of data.