Linux Text Processing & File Reading
Introduction
Reading and extracting meaning from text is one of the most common tasks on a Linux system — log files, configuration, user databases, output streams. This note covers the core toolchain: searching with grep and find, navigating files with less, head, and tail, and slicing structured data with cut, awk, and sed.
Searching Content with grep
grep filters input line by line, returning only lines that match a pattern. It reads from files, pipes, or stdin.
grep aliases /etc/bash.bashrc # Search a single file
grep target data.py # Search for "target" in data.py
grep -i firewall anaconda-ks.cfg # Case-insensitive search
Searching Across Directories
By default, grep operates on the files you specify — it does not recurse into subdirectories unless told to.
grep -i firewall * # All files in the current directory
grep -iR firewall * # Recurse into subdirectories
Inverting Matches
The -v flag flips the logic — it shows every line that does not match the pattern.
grep -vi firewall anaconda-ks.cfg # Everything except lines containing "firewall"
Flag Reference
| Flag | Description |
|---|---|
| -i | Case-insensitive matching |
| -v | Invert: show non-matching lines only |
| -l | List filenames only, suppress matched content |
| -n | Prefix each match with its line number |
| -R | Recursive: search through subdirectories |
| -a | Force text processing on binary files |
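The flags above can be combined in a single invocation. The following is a minimal sketch, assuming a scratch directory with two hypothetical sample files (app.log and notes.txt) that exist only for illustration:

```shell
# Create a scratch directory with two hypothetical sample files.
workdir=$(mktemp -d)
printf 'Firewall enabled\nssh allowed\nfirewall reload\n' > "$workdir/app.log"
printf 'nothing relevant here\n' > "$workdir/notes.txt"

# -i and -n together: case-insensitive matches, each prefixed with its line number.
numbered=$(grep -in firewall "$workdir/app.log")

# -l with -R: print only the names of files that contain a match.
matching_files=$(grep -ilR firewall "$workdir")

# -v with -i: every line that does NOT mention the pattern.
remaining=$(grep -vi firewall "$workdir/app.log")

printf '%s\n' "$numbered" "$matching_files" "$remaining"
rm -rf "$workdir"
```

Single-letter flags can be written separately (-i -n) or bundled (-in); both spellings behave the same.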
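Security note: Use grep -a on binary or unknown files. Without it, GNU grep stops printing matched lines once it detects binary data such as a null byte, and only reports that the binary file matches. Forcing text mode is useful when triaging suspicious uploads or packed payloads.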
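To illustrate the note above, here is a small sketch that builds a hypothetical file containing NUL bytes and pulls a readable string out of it. The file contents and the token= pattern are invented for the example, and -o (print only the matched part) is an extra flag not listed in the table:

```shell
# Build a hypothetical file that mixes text with NUL bytes, as a packed payload might.
sample=$(mktemp)
printf 'header\0\0\ntoken=abc123\0trailer\n' > "$sample"

# -a forces text mode; -o prints only the matched part, so no NUL bytes reach the terminal.
found=$(grep -a -o 'token=[a-z0-9]*' "$sample")

printf '%s\n' "$found"
rm -f "$sample"
```

Pairing -a with -o keeps the output limited to the matched string, which avoids dumping raw binary into the terminal.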
Locating Files with find
find is a recursive search engine for the filesystem itself — it locates files by metadata rather than content.
find /var/log -name "*.log" # All .log files under /var/log
find . -type f -size 1033c # Files exactly 1033 bytes
find /home -user vagrant -type f # Files owned by vagrant
find . -perm -4000 # Files with the SUID bit set
Combining with Other Commands
find can pass its results directly into another command using -exec:
find . -type f -size 1033c ! -executable -exec file {} +
This finds non-executable files of a specific size and passes them as arguments to file for signature analysis.
Performance note: Ending an -exec clause with + instead of \; batches the results into as few process invocations as the argument-length limit allows, which is significantly faster on large directory trees.
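The difference between the two terminators can be observed by counting how often the command runs. This is a rough sketch, assuming a scratch directory with three hypothetical empty files; the inner sh -c 'echo invoked' is only a stand-in for a real command:

```shell
# Three hypothetical files in a scratch directory.
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt" "$workdir/c.txt"

# With \; the command runs once per file: three files, three invocations.
per_file=$(find "$workdir" -type f -exec sh -c 'echo invoked' sh {} \; | wc -l | tr -d ' ')

# With + the paths are appended to one command line: a single invocation here.
batched=$(find "$workdir" -type f -exec sh -c 'echo invoked' sh {} + | wc -l | tr -d ' ')

echo "per-file: $per_file, batched: $batched"
rm -rf "$workdir"
```

With only three paths, the + form fits in one invocation; on very large trees find may still split the work into several batches.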
Flag Reference
| Flag | Description |
|---|---|
| -type f | Restrict results to regular files |
| -name | Match by filename pattern |
| -size | Filter by size (c = bytes, k = kilobytes) |
| -user | Filter by file owner |
| -perm | Filter by octal permissions |
| -exec | Run a command against each result |
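Several of these tests can be chained, and find treats consecutive tests as a logical AND. The sketch below assumes a hypothetical scratch tree (logs/app.log, logs/big.log, readme.txt) created just for the demonstration:

```shell
# Build a small hypothetical directory tree.
workdir=$(mktemp -d)
mkdir "$workdir/logs"
printf 'x' > "$workdir/logs/app.log"            # 1 byte
printf '0123456789' > "$workdir/logs/big.log"   # 10 bytes
printf 'note' > "$workdir/readme.txt"

# -type and -name combined: regular files ending in .log, at any depth.
log_files=$(find "$workdir" -type f -name '*.log' | sort)

# -size with the c suffix matches an exact byte count.
ten_bytes=$(find "$workdir" -type f -size 10c)

printf '%s\n' "$log_files" "$ten_bytes"
rm -rf "$workdir"
```

Quoting the -name pattern ('*.log') keeps the shell from expanding it before find sees it.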
Reading Files
less — Scrollable Viewer
Unlike cat, which dumps content and exits, less opens a scrollable view. You can navigate with arrow keys and search with /pattern — the same syntax as Vim.
less /var/log/syslog
more — Paginated Viewer
more is the older cousin of less. It displays content one screen at a time, advancing a full screen with Space or a single line with Enter instead of scrolling with arrow keys. A progress percentage is shown at the bottom.
more /var/log/syslog
In practice, less has superseded more on nearly every modern system. Use less unless you’re on a minimal environment where it isn’t available.
Previewing File Boundaries
head — First Lines
head shows the beginning of a file. By default, it reads the first ten lines.
head /etc/passwd # First 10 lines (default)
head -20 /etc/passwd # First 20 lines
tail — Last Lines
tail does the opposite — it reads from the end of the file. Its standout feature is follow mode, which streams new lines as they’re appended.
tail /var/log/auth.log # Last 10 lines (default)
tail -2 /var/log/auth.log # Last 2 lines
tail -f /var/log/auth.log # Live: follow new lines as they appear
tail -f is indispensable when monitoring logs in real time, such as authentication attempts, service failures, or live traffic events.
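head and tail can also be chained to pull a range of lines out of the middle of a file. A minimal sketch, assuming a hypothetical ten-line file standing in for a real log, and using -n, the POSIX spelling of the line-count option:

```shell
# Ten numbered lines stand in for a real log file.
sample=$(mktemp)
for i in 1 2 3 4 5 6 7 8 9 10; do echo "line $i"; done > "$sample"

# Lines 4 through 6: keep the first six lines, then the last three of those.
middle=$(head -n 6 "$sample" | tail -n 3)

# tail -n +8 starts output AT line 8 instead of counting back from the end.
from_eight=$(tail -n +8 "$sample")

printf '%s\n' "$middle" "$from_eight"
rm -f "$sample"
```

The +N form is handy for skipping a fixed-size header, for example tail -n +2 to drop the first line of a CSV file.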
Extracting Structured Data
cut — Field Extraction
cut splits each line by a delimiter and extracts specific fields. It’s lightweight and fast for simple columnar data.
Consider /etc/passwd — each line is colon-delimited:
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
vagrant:x:1000:1000:vagrant:/home/vagrant:/bin/bash
To extract only usernames (field one):
cut -d: -f1 /etc/passwd
root
daemon
vagrant
The -d flag sets the delimiter; -f selects which field to extract.
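The -f flag also accepts comma-separated lists and ranges. The sketch below works on a temporary copy of the three sample lines shown earlier rather than the real /etc/passwd, so the output is predictable:

```shell
# Write the three sample passwd lines to a scratch file.
passwd_sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'vagrant:x:1000:1000:vagrant:/home/vagrant:/bin/bash' > "$passwd_sample"

# Fields 1 and 7: username and login shell, still colon-separated in the output.
user_shell=$(cut -d: -f1,7 "$passwd_sample")

# A range: fields 3 through 4 (UID and GID).
ids=$(cut -d: -f3-4 "$passwd_sample")

printf '%s\n' "$user_shell" "$ids"
rm -f "$passwd_sample"
```

cut keeps the input delimiter between the selected fields, so the first command prints lines such as root:/bin/bash.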
awk — Pattern Processing
awk does everything cut can, and significantly more. It treats each line as a record with positional fields and supports conditional logic, formatting, and arithmetic.
awk -F ':' '{print $1}' /etc/passwd # Same result as the cut example
awk -F ':' '{print $1, $7}' /etc/passwd # Username and shell
| Variable | Meaning |
|---|---|
| $0 | The entire line |
| $1 | First field |
| $2 | Second field |
| $n | nth field |
Where cut is sufficient, prefer it — it’s simpler and faster. Reach for awk when you need filtering, formatting, or multi-field logic in a single pass.
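As an illustration of the filtering and formatting that cut cannot do, here is a short sketch. It again uses a scratch copy of the three sample passwd lines, and the UID threshold of 1000 is an assumption about where regular accounts begin on the system in question:

```shell
# Write the three sample passwd lines to a scratch file.
passwd_sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'vagrant:x:1000:1000:vagrant:/home/vagrant:/bin/bash' > "$passwd_sample"

# Filter: print the username only when the UID (field 3) is 1000 or higher.
regular_users=$(awk -F ':' '$3 >= 1000 {print $1}' "$passwd_sample")

# Format: a fixed-width, two-column report of accounts whose shell is bash.
bash_report=$(awk -F ':' '$7 == "/bin/bash" {printf "%-8s %s\n", $1, $6}' "$passwd_sample")

printf '%s\n' "$regular_users" "$bash_report"
rm -f "$passwd_sample"
```

The expression before the braces is the condition; the block inside the braces runs only for lines where that condition is true.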
sed — Stream Editing
sed performs find-and-replace (and much more) on input streams. It follows the same substitution syntax as Vim.
sed 's/old/new/g' file.txt # Replace all occurrences, print to stdout
sed -i 's/old/new/g' file.txt # Edit the file in-place
sed -i 's/old/new/g' *.txt # All .txt files in the directory
Insert text at a specific line number:
sed -i '2i\This is the inserted text' file.txt # Insert at line 2
Caution with -i: In-place editing overwrites the original file. On macOS/BSD sed, -i requires an extension argument (sed -i '' ...). Always test without -i first.
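One cautious workflow is to preview the substitution first and then edit in place with a backup suffix. This is a sketch on a hypothetical scratch file; the .bak suffix is an arbitrary choice, and the attached form -i.bak (no space) is accepted by both GNU and BSD sed:

```shell
# A hypothetical three-line file to edit.
sample=$(mktemp)
printf 'old value\nkeep this\nold again\n' > "$sample"

# Dry run: without -i the result goes to stdout and the file is untouched.
preview=$(sed 's/old/new/g' "$sample")

# Edit in place, keeping the original content in a copy with a .bak suffix.
sed -i.bak 's/old/new/g' "$sample"

edited=$(cat "$sample")
backup=$(cat "$sample.bak")

printf '%s\n' "$edited"
rm -f "$sample" "$sample.bak"
```

If the edit turns out to be wrong, the .bak copy can simply be moved back over the original.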
Command Cheat Sheet
| Command | Purpose | Key Strength |
|---|---|---|
| grep | Search file content by pattern | Fast pattern matching across files |
| find | Locate files by metadata | Recursive, attribute-based search |
| less | Scrollable file viewer | Navigation and search within files |
| head | Preview file start | Quick look at headers or config |
| tail | Preview file end | Real-time log monitoring with -f |
| cut | Extract fields by delimiter | Simple columnar extraction |
| awk | Process and transform text | Multi-field logic in one pass |
| sed | Stream substitution and editing | In-place find-and-replace |