Linux Text Processing & File Reading

Introduction

Reading and extracting meaning from text is one of the most common tasks on a Linux system, whether the source is a log file, a configuration file, a user database, or a command's output stream. This note covers the core toolchain: searching with grep and find, paging through files with less and more, previewing them with head and tail, and slicing structured data with cut, awk, and sed.


Searching Content with grep

grep filters input line by line, returning only lines that match a pattern. It reads from the files you name or, when none are given, from standard input (for example, the output of a pipe).

grep aliases /etc/bash.bashrc        # Search a single file
grep target data.py                  # Search for "target" in data.py
grep -i firewall anaconda-ks.cfg    # Case-insensitive search

Searching Across Directories

By default, grep operates on the files you specify — it does not recurse into subdirectories unless told to.

grep -i firewall *          # Non-hidden files in the current directory (subdirectories are not searched)
grep -iR firewall .         # Recurse from the current directory, hidden files included
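The difference between a shell glob and a recursive search is easy to miss, so here is a minimal sketch. It assumes GNU grep and builds a throwaway directory whose file names (ks.cfg, .hidden.cfg, sub/notes.txt) are hypothetical. The shell expands * before grep runs, so dotfiles are never passed in; starting from . lets grep walk the tree itself.

```shell
# Throwaway tree; every file name here is hypothetical.
demo=$(mktemp -d)
mkdir "$demo/sub"
echo 'firewall --enabled' > "$demo/ks.cfg"
echo 'Firewall rules'     > "$demo/.hidden.cfg"
echo 'restart FIREWALL'   > "$demo/sub/notes.txt"

# The glob expands to "ks.cfg sub": the dotfile is skipped, and the
# directory only produces an "Is a directory" message on stderr.
glob_hits=$(cd "$demo" && grep -il firewall * 2>/dev/null || true)

# Starting from "." makes grep descend into sub/ and read dotfiles too.
tree_hits=$(cd "$demo" && grep -iRl firewall . | LC_ALL=C sort)

printf 'glob:\n%s\n' "$glob_hits"
printf 'recursive:\n%s\n' "$tree_hits"
rm -rf "$demo"
```

The glob run should list only ks.cfg, while the recursive run should list all three files.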

Inverting Matches

The -v flag flips the logic — it shows every line that does not match the pattern.

grep -vi firewall anaconda-ks.cfg   # Everything except lines containing "firewall"

Flag Reference

Flag    Description
-i      Case-insensitive matching
-v      Invert: show non-matching lines only
-l      List filenames only, suppress matched content
-n      Prefix each match with its line number
-R      Recursive: search through subdirectories, following symbolic links (-r recurses without following links it meets along the way)
-a      Treat binary files as text
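The flags without an example above can be tried on a small sample. This is a sketch only; the two files (ks.cfg, other.cfg) and their contents are made up for illustration.

```shell
# Hypothetical sample files in a throwaway directory.
demo=$(mktemp -d)
printf 'lang=en\nfirewall --enabled\nFirewall --port=22\nreboot\n' > "$demo/ks.cfg"
printf 'nothing relevant here\n' > "$demo/other.cfg"

# -n prefixes each matching line with its line number.
numbered=$(grep -in firewall "$demo/ks.cfg")

# -l prints only the names of files that contain a match.
names=$(cd "$demo" && grep -il firewall ks.cfg other.cfg)

# -v keeps the lines that do NOT match.
rest=$(grep -vi firewall "$demo/ks.cfg")

printf '%s\n---\n%s\n---\n%s\n' "$numbered" "$names" "$rest"
rm -rf "$demo"
```

With this sample, -n should report lines 2 and 3, -l should name only ks.cfg, and -v should leave the lang=en and reboot lines.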

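Security note: Use grep -a on binary or unknown files. Without it, GNU grep stops printing matching lines once it detects binary data (such as a null byte) and only reports that the file matches. Forcing text mode is useful when triaging suspicious uploads or packed payloads.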
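A small sketch of the binary-file behaviour, assuming GNU grep. The sample file and its token= line are invented for the demonstration; the null bytes on the first line are what make grep classify the file as binary.

```shell
demo=$(mktemp -d)
# Hypothetical payload: two NUL bytes on line 1, a readable string on line 2.
printf 'MZ\000\000header\ntoken=abc123\n' > "$demo/sample.bin"

# Default mode: grep reports only that the binary file matches.
# The exact wording and output stream differ between grep versions.
grep token "$demo/sample.bin" || true

# -a treats the file as text, so the matching line itself is printed.
found=$(grep -a token "$demo/sample.bin")
echo "$found"
rm -rf "$demo"
```

The second command should print the token=abc123 line in full.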


Locating Files with find

find is a recursive search engine for the filesystem itself — it locates files by metadata rather than content.

find /var/log -name "*.log"              # All .log files under /var/log
find . -type f -size 1033c               # Files exactly 1033 bytes
find /home -user vagrant -type f         # Files owned by vagrant
find . -perm -4000                       # Files with the SUID bit set

Combining with Other Commands

find can pass its results directly into another command using -exec:

find . -type f -size 1033c ! -executable -exec file {} +

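This finds non-executable files of a specific size and passes them as arguments to file, which identifies each one by its signature. No pipe is involved: find runs the command itself. The -executable test is a GNU find extension.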

Performance note: Ending an -exec chain with + instead of \; passes as many results as will fit to each invocation of the command, so it usually runs once rather than once per file. This is significantly faster on large directory trees.
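The two terminators can be compared directly. In this sketch the helper command is just sh -c 'echo "$#"', which prints how many file arguments a single invocation received; the three empty files are placeholders.

```shell
demo=$(mktemp -d)
touch "$demo/a" "$demo/b" "$demo/c"

# \; starts one helper process per file, so each one sees a single argument.
per_file=$(find "$demo" -type f -exec sh -c 'echo "$#"' sh {} \;)

# + collects the files and hands them to one helper process.
batched=$(find "$demo" -type f -exec sh -c 'echo "$#"' sh {} +)

printf 'per-file runs report:\n%s\n' "$per_file"
printf 'batched run reports: %s\n' "$batched"
rm -rf "$demo"
```

The \; form should print 1 three times, and the + form should print 3 once.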

Flag Reference

Flag       Description
-type f    Restrict results to regular files
-name      Match by filename pattern (quote the pattern so the shell does not expand it)
-size      Filter by size (c = bytes, k = kibibytes of 1024 bytes; sizes are rounded up to the unit)
-user      Filter by file owner
-perm      Filter by permission bits (octal or symbolic)
-exec      Run a command against the results
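The size units behave differently enough to be worth a quick check. This is a sketch under the assumption that find rounds sizes up to whole units, as GNU find documents; the file names are hypothetical.

```shell
demo=$(mktemp -d)
printf '%1033s' '' > "$demo/payload.dat"   # exactly 1033 bytes (all spaces)
printf 'short\n'   > "$demo/readme.txt"    # 6 bytes

# c compares the exact byte count.
by_bytes=$(cd "$demo" && find . -type f -size 1033c)

# k counts 1024-byte units, rounded up: 1033 bytes counts as 2k, not 1k.
by_kib=$(cd "$demo" && find . -type f -size 2k)

# -name takes a quoted glob so that find, not the shell, does the matching.
by_name=$(cd "$demo" && find . -type f -name '*.txt')

printf '%s\n%s\n%s\n' "$by_bytes" "$by_kib" "$by_name"
rm -rf "$demo"
```

Because of the rounding, byte-exact searches are usually better expressed with c than with k.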

Reading Files

less — Scrollable Viewer

Unlike cat, which dumps content and exits, less opens a scrollable view. You can navigate with arrow keys and search with /pattern — the same syntax as Vim.

less /var/log/syslog

more — Paginated Viewer

more is the older cousin of less. It displays content one screen at a time: Space advances a full screen and Enter advances a single line, with no arrow-key scrolling. When reading a file, a progress percentage is shown at the bottom.

more /var/log/syslog

In practice, less has superseded more on nearly every modern system. Use less unless you’re on a minimal environment where it isn’t available.


Previewing File Boundaries

head — First Lines

head shows the beginning of a file. By default, it reads the first ten lines.

head /etc/passwd              # First 10 lines (default)
head -n 20 /etc/passwd        # First 20 lines (-20 is an older shorthand)

tail — Last Lines

tail does the opposite — it reads from the end of the file. Its standout feature is follow mode, which streams new lines as they’re appended.

tail /var/log/auth.log         # Last 10 lines (default)
tail -n 2 /var/log/auth.log    # Last 2 lines (-2 is an older shorthand)
tail -f /var/log/auth.log      # Live — follow new lines as they appear

tail -f is indispensable when monitoring logs in real time — authentication attempts, service failures, or live traffic events.
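head and tail also combine well when only a slice from the middle of a file is needed. A minimal sketch, using a throwaway file of numbered lines generated with seq (the file name is hypothetical):

```shell
demo=$(mktemp -d)
seq 1 100 > "$demo/numbers.txt"    # 100 lines containing 1..100

first=$(head -n 3 "$demo/numbers.txt")   # lines 1-3
last=$(tail -n 2 "$demo/numbers.txt")    # lines 99-100

# Lines 16-20: take the first 20 lines, then keep the last 5 of those.
middle=$(head -n 20 "$demo/numbers.txt" | tail -n 5)

printf '%s\n---\n%s\n---\n%s\n' "$first" "$last" "$middle"
rm -rf "$demo"
```

Since each line holds its own line number, the output makes it easy to confirm which range was selected.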


Extracting Structured Data

cut — Field Extraction

cut splits each line by a delimiter and extracts specific fields. It’s lightweight and fast for simple columnar data.

Consider /etc/passwd — each line is colon-delimited:

root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
vagrant:x:1000:1000:vagrant:/home/vagrant:/bin/bash

To extract only usernames (field one):

cut -d: -f1 /etc/passwd
root
daemon
vagrant

The -d flag sets the delimiter; -f selects which field, or comma-separated list or range of fields, to extract.
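Selecting more than one field works the same way. The sketch below copies the three sample lines shown above into a scratch file (passwd.sample is a hypothetical name) so that it does not depend on the real /etc/passwd.

```shell
demo=$(mktemp -d)
cat > "$demo/passwd.sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
vagrant:x:1000:1000:vagrant:/home/vagrant:/bin/bash
EOF

# A comma-separated list picks individual fields: username and shell.
pairs=$(cut -d: -f1,7 "$demo/passwd.sample")

# A range picks consecutive fields: username, password placeholder, UID.
ids=$(cut -d: -f1-3 "$demo/passwd.sample")

printf '%s\n---\n%s\n' "$pairs" "$ids"
rm -rf "$demo"
```

cut reuses the input delimiter in its output, so the selected fields stay colon-separated.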

awk — Pattern Processing

awk does everything cut can, and significantly more. It treats each line as a record with positional fields and supports conditional logic, formatting, and arithmetic.

awk -F ':' '{print $1}' /etc/passwd       # Same result as the cut example
awk -F ':' '{print $1, $7}' /etc/passwd   # Username and shell

Variable  Meaning
$0        The entire line
$1        First field
$2        Second field
$n        nth field

Where cut is sufficient, prefer it — it’s simpler and faster. Reach for awk when you need filtering, formatting, or multi-field logic in a single pass.
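The filtering and formatting mentioned above might look like the following sketch. It runs against a scratch copy of the sample passwd lines (passwd.sample is a hypothetical name), and the UID threshold of 1000 for regular accounts is an assumption that varies between distributions.

```shell
demo=$(mktemp -d)
cat > "$demo/passwd.sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
vagrant:x:1000:1000:vagrant:/home/vagrant:/bin/bash
EOF

# Numeric condition: print usernames whose UID (field 3) is 1000 or higher.
regular=$(awk -F: '$3 >= 1000 {print $1}' "$demo/passwd.sample")

# String condition plus formatted output.
bash_users=$(awk -F: '$7 == "/bin/bash" {printf "%s uses %s\n", $1, $7}' "$demo/passwd.sample")

# Regex condition with a counter reported once at the end of input.
nologin_count=$(awk -F: '$7 ~ /nologin$/ {n++} END {print n + 0}' "$demo/passwd.sample")

printf '%s\n---\n%s\n---\n%s\n' "$regular" "$bash_users" "$nologin_count"
rm -rf "$demo"
```

Each command reads the file once; the condition before the braces decides which records the action applies to.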

sed — Stream Editing

sed performs find-and-replace (and much more) on input streams. Its s/pattern/replacement/flags form matches Vim's :s command, although the two regex dialects differ in places.

sed 's/old/new/g' file.txt                    # Replace all occurrences, print to stdout
sed -i 's/old/new/g' file.txt                 # Edit the file in-place
sed -i 's/old/new/g' *.txt                    # All .txt files in the directory

Insert text at a specific line number:

sed -i '2i\This is the inserted text' file.txt   # Insert before line 2, so the new text becomes line 2 (GNU sed one-line form)

Caution with -i: In-place editing overwrites the original file. On macOS/BSD sed, -i requires an extension argument (sed -i '' ...). Always test without -i first.
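One cautious workflow is to preview the substitution first and then edit in place while keeping a backup. The sketch below assumes a scratch file with made-up contents; attaching the suffix directly to the flag (-i.bak) is a form that both GNU and BSD sed accept.

```shell
demo=$(mktemp -d)
printf 'old value\nkeep this\nold again, old\n' > "$demo/file.txt"

# Dry run: the result goes to stdout and the file stays untouched.
preview=$(sed 's/old/new/g' "$demo/file.txt")

# In-place edit that first saves the original as file.txt.bak.
sed -i.bak 's/old/new/g' "$demo/file.txt"

after=$(cat "$demo/file.txt")
backup=$(cat "$demo/file.txt.bak")

printf 'edited:\n%s\nbackup:\n%s\n' "$after" "$backup"
rm -rf "$demo"
```

If the preview looks wrong, nothing has been changed yet; if the in-place edit looks wrong afterwards, the .bak copy still holds the original.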


Command Cheat Sheet

Command  Purpose                          Key Strength
grep     Search file content by pattern   Fast pattern matching across files
find     Locate files by metadata         Recursive, attribute-based search
less     Scrollable file viewer           Navigation and search within files
head     Preview file start               Quick look at headers or config
tail     Preview file end                 Real-time log monitoring with -f
cut      Extract fields by delimiter      Simple columnar extraction
awk      Process and transform text       Multi-field logic in one pass
sed      Stream substitution and editing  In-place find-and-replace