Binary Inspection [strings, xxd, base64]


Introduction

When you’re handed a compiled binary, an encoded string, or a raw data blob, you need tools that can peer inside without executing it. Three essential utilities for this work are strings, xxd, and base64. They allow you to extract readable text, inspect raw hexadecimal, and encode or decode data streams — all without leaving the safety of your terminal.

This note covers their usage in depth, with practical patterns drawn from forensics work and CTF challenges like OverTheWire Bandit.


strings — ASCII Extraction

strings scans a file byte by byte and prints any sequence of printable characters that meets a minimum length. It doesn’t parse structure — it just finds human-readable text buried in binary noise.

strings suspicious_binary

Filtering Significant Sequences

By default, strings prints sequences of four or more printable characters. Use -n to raise the threshold and filter out short, meaningless fragments:

strings -n 10 binary_file        # Only sequences of 10+ characters

This is particularly useful when scanning compiled binaries where short strings (like ELF, /lib, ld.) are structural noise rather than meaningful content.
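
The effect of the threshold is easy to see on a small synthetic file. The sketch below is illustrative only: the temporary file and its contents are made up for the demonstration, and it assumes GNU strings from binutils.

```shell
# Build a throwaway sample mixing printable runs with NUL bytes.
sample=$(mktemp)
printf 'abcd\0\0longer_string_here\0\0xy\0' > "$sample"

# Default threshold (4): "abcd" and "longer_string_here" are reported,
# while the 2-character run "xy" is dropped.
default_out=$(strings "$sample")

# Raised threshold: only the 18-character run survives.
long_out=$(strings -n 10 "$sample")

printf 'default:\n%s\n\n-n 10:\n%s\n' "$default_out" "$long_out"
rm -f "$sample"
```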

Selecting the Character Encoding

Not every binary stores its text as single-byte ASCII. GNU strings looks for 7-bit ASCII by default (-e s) and has to be told explicitly to look for other encodings:

strings -e l binary_file          # 16-bit little-endian (UTF-16LE)
strings -e b binary_file          # 16-bit big-endian (UTF-16BE)
strings -e L binary_file          # 32-bit little-endian (UTF-32LE)

This matters when extracting strings from Windows executables, which typically store text as UTF-16LE, and from firmware images where wide-character encodings are common.
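
A minimal sketch of why the encoding flag matters, assuming GNU strings. The sample file is synthetic: it holds the word "secret" as UTF-16LE, so every ASCII byte is followed by a NUL byte.

```shell
sample=$(mktemp)
# "secret" in UTF-16LE: s 00 e 00 c 00 r 00 e 00 t 00
printf 's\0e\0c\0r\0e\0t\0' > "$sample"

# The default scan sees no run of 4 printable bytes, so it finds nothing.
ascii_hits=$(strings "$sample")

# Asking for 16-bit little-endian characters recovers the word.
wide_hits=$(strings -e l "$sample")

echo "default scan: '${ascii_hits}'"
echo "-e l scan:    '${wide_hits}'"
rm -f "$sample"
```

When the encoding of a target is unknown, running both scans and comparing the results is a cheap first check.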

Combining with Other Tools

strings is rarely used alone. Pipe its output into grep to search for specific patterns:

strings binary_file | grep -i password
strings binary_file | grep -E "flag\{.*\}"     # CTF flag pattern
strings binary_file | grep -E "https?://"       # URLs embedded in binary

Practical Applications

Use Case                      Command Pattern
Find hardcoded credentials    strings binary | grep -iE "pass|key|token|secret"
Extract URLs or IPs           strings binary | grep -E "https?|[0-9]+\.[0-9]+\.[0-9]+"
Find embedded config paths    strings binary | grep -E "^/|\.conf|\.cfg|\.ini"
Hunt for flags in CTFs        strings binary | grep -E "flag\{.*\}|FLAG|CTF"
Scan for email addresses      strings binary | grep -E "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

Key Flags

Flag             Description
-n <length>      Minimum string length (default: 4)
-e <encoding>    Character encoding (s = 7-bit ASCII, the default; l = 16-bit little-endian; b = 16-bit big-endian; L = 32-bit little-endian)
-o               Print the offset of each string in octal (GNU strings; same as -t o)
-t x             Prefix each string with its offset in hexadecimal (o = octal, d = decimal)
--radix=x        Long form of -t x

Forensics discipline: Run strings before attempting to execute an unknown binary. It’s passive — it reads without modifying the file or triggering any embedded behaviour.
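
The offset flags become useful once a string of interest has been found: the offset reported by strings can be handed to a hex dumper to look at the surrounding bytes. The sketch below assumes GNU strings and xxd are installed; the sample file and the marker text are made up.

```shell
sample=$(mktemp)
# 32 bytes of NUL padding followed by a marker string.
head -c 32 /dev/zero > "$sample"
printf 'MARKER_STRING\0' >> "$sample"

# -t x prefixes each hit with its hexadecimal offset.
hit=$(strings -t x "$sample" | grep MARKER_STRING)
offset=$(printf '%s\n' "$hit" | awk '{print $1}')
echo "offset: 0x$offset"

# Jump straight to that offset in a hex dump.
xxd -s "0x$offset" -l 16 "$sample"
rm -f "$sample"
```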


xxd — Hexadecimal Manipulation

xxd creates a hex dump of a file, giving you a byte-by-byte view of its raw content. It can also reverse a hex dump back into its original binary form.

Generating a Hex Dump

xxd data.bin

The default output shows three columns: the byte offset (address), the hex values in groups of two bytes, and the ASCII representation:

00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0300 3e00 0100 0000 4010 0000 0000 0000  ..>.....@.......

The rightmost column is invaluable — it highlights readable strings embedded in the binary data, letting you spot text fragments without running strings.

Limiting Output

For large files, you rarely need the entire dump:

xxd -l 256 data.bin              # First 256 bytes only
xxd -s 0x100 -l 128 data.bin    # Start at offset 0x100, read 128 bytes

Plain Hex Output

When you need raw hex without address offsets or ASCII columns:

xxd -p data.bin

This outputs a continuous stream of hex characters — useful when piping into other tools or when the address offsets are unnecessary noise.

Reversing a Hex Dump

xxd can reconstruct binary data from a hex dump:

xxd -r dump.txt restored.bin       # Standard hex dump back to binary
xxd -r -p plain_hex.txt restored.bin   # Plain hex (no offsets) back to binary
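
A quick way to gain confidence in a reversal is to round-trip a file and compare it with the original. This is a sketch on a synthetic file; the temporary file names and the result variables are illustrative.

```shell
orig=$(mktemp); dump=$(mktemp); restored=$(mktemp)
# A few ELF-like header bytes followed by some text and a NUL.
printf '\177ELF\002\001\001\000payload\000' > "$orig"

# Standard dump (offsets + hex + ASCII) and back.
xxd "$orig" > "$dump"
xxd -r "$dump" "$restored"
cmp -s "$orig" "$restored" && std_result=match || std_result=differ

# Plain hex and back, entirely through a pipe.
xxd -p "$orig" | xxd -r -p > "$restored"
cmp -s "$orig" "$restored" && plain_result=match || plain_result=differ

echo "standard round trip: $std_result"
echo "plain round trip:    $plain_result"
rm -f "$orig" "$dump" "$restored"
```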

Practical Patterns

Inspect a file’s header to identify its type:

xxd -l 16 unknown_file

Common magic bytes you’ll recognise:

Hex Sequence    File Type
7f 45 4c 46     ELF executable (Linux)
4d 5a           PE executable (Windows)
89 50 4e 47     PNG image
50 4b 03 04     ZIP archive
1f 8b           gzip compressed
25 50 44 46     PDF document
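
The table above can be turned into a small lookup. The function name identify_magic is hypothetical, and the sketch covers only the signatures listed here, so it is a rough first check rather than a full file-type detector.

```shell
# identify_magic: hypothetical helper that maps the first bytes of a file
# to one of the types in the table above.
identify_magic() {
    # First 4 bytes as plain lowercase hex, e.g. "7f454c46".
    magic=$(xxd -p -l 4 "$1")
    case "$magic" in
        7f454c46) echo "ELF executable" ;;
        4d5a*)    echo "PE executable (MZ header)" ;;
        89504e47) echo "PNG image" ;;
        504b0304) echo "ZIP archive" ;;
        1f8b*)    echo "gzip compressed" ;;
        25504446) echo "PDF document" ;;
        *)        echo "unknown ($magic)" ;;
    esac
}
```

Usage: identify_magic unknown_file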

Search for a specific byte pattern in a binary:

# -g 1 prints each byte separately so the spaced pattern can match
xxd -g 1 data.bin | grep "ca fe ba be"    # Java class file magic bytes
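
A line-based grep over a hex dump misses any pattern that straddles two dump lines. One way around this, sketched below, is to flatten the file into a single hex stream and keep only matches that begin on a byte boundary. The name find_bytes is hypothetical, and the sketch relies on grep -b, which GNU and BSD grep support but POSIX does not require.

```shell
# find_bytes FILE HEXPATTERN
# Prints the file offset (in hex) of each byte-aligned match.
# HEXPATTERN is lowercase hex without spaces, e.g. cafebabe.
find_bytes() {
    xxd -p "$1" | tr -d '\n' | grep -ob "$2" | while IFS=: read -r pos _; do
        # Two hex digits per byte: an odd position is a match that
        # starts in the middle of a byte, so skip it.
        if [ $(( pos % 2 )) -eq 0 ]; then
            printf '0x%x\n' $(( pos / 2 ))
        fi
    done
}
```

Usage: find_bytes data.bin cafebabe. Because grep -o reports non-overlapping matches, a misaligned hit can in rare cases hide an aligned one that overlaps it.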

Create a hex dump, edit it, and restore:

xxd data.bin > dump.txt              # Dump to text
vim dump.txt                         # Edit specific bytes
xxd -r dump.txt data_modified.bin    # Reconstruct the modified binary
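
For a one-byte change, a full dump and restore is not strictly necessary. The xxd shipped with Vim honours the offset on each input line when reversing, so a single dump line can patch an existing file in place. The sketch below assumes that implementation; other xxd variants (BusyBox, for example) may behave differently, so test on a copy first.

```shell
target=$(mktemp)
printf 'Hello, world\n' > "$target"

# Overwrite the byte at offset 0 with 0x4a ("J").
# In reverse mode xxd does not truncate the output file,
# so the remaining bytes are left untouched.
echo "00000000: 4a" | xxd -r - "$target"

patched=$(cat "$target")
echo "$patched"
rm -f "$target"
```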

Key Flags

Flag           Description
-l <length>    Limit output to specified number of bytes
-s <offset>    Seek to a specific byte offset before reading
-p             Plain hex output, with no addresses or ASCII column
-r             Reverse: convert hex dump back to binary
-c <cols>      Number of hex octets per line (default: 16)
-g <bytes>     Group hex octets (default: 2)

When to reach for xxd: Use it when you need to see the raw bytes — file header identification, offset-specific inspection, or manual patching of binary data. If you only need readable strings, strings is faster and cleaner.


base64 — Stream Encoding

base64 encodes binary data into a 64-character ASCII alphabet, making it safe for transport through systems that only handle text (email, JSON, URLs, config files). It also decodes base64 back to its original form.

Encoding

echo "secret message" | base64
# Output: c2VjcmV0IG1lc3NhZ2UK

Or encode a file directly:

base64 data.bin > data.b64
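
One detail of the echo example is worth spelling out: echo appends a newline, and that newline is encoded along with the message. When the exact bytes matter (tokens, passwords, hashes), printf '%s' avoids it. A short comparison:

```shell
# echo adds a trailing newline, which becomes part of the encoded data.
with_newline=$(echo "secret message" | base64)

# printf '%s' emits exactly the given bytes.
without_newline=$(printf '%s' "secret message" | base64)

echo "$with_newline"      # c2VjcmV0IG1lc3NhZ2UK
echo "$without_newline"   # c2VjcmV0IG1lc3NhZ2U=
```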

Decoding

echo "c2VjcmV0IG1lc3NhZ2UK" | base64 -d
# Output: secret message

Or decode from a file:

base64 -d encoded.b64 > original.bin

Bandit-Style Patterns

In OverTheWire Bandit, base64 files appear frequently. The standard workflow:

cat encoded.txt | base64 -d        # Decode file contents to stdout
base64 -d < encoded.txt            # Same result, using redirection
base64 -d encoded.txt              # Direct file argument (also works)

Handling Errors

When decoding fails, it’s almost always one of these:

Whitespace or non-base64 characters in the input:

base64 -d -i corrupted.b64         # -i ignores non-alphabet characters

Missing padding: Padded base64 always has a length that is a multiple of 4. If the padding (= characters) has been stripped, which is common in URLs, add it back manually:

# "c2VjcmV0cw" (10 characters) is missing padding; "c2VjcmV0cw==" (12) is correct
echo "c2VjcmV0cw==" | base64 -d
# Output: secrets
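
Restoring padding by hand gets tedious, so it can be scripted. The helper below is a sketch with a hypothetical name, pad_base64. It only appends = characters and does not validate the input, so a string whose length leaves a remainder of 1 when divided by 4 (never valid base64) will still fail to decode.

```shell
# pad_base64 STRING
# Appends "=" until the length is a multiple of 4.
pad_base64() {
    s=$1
    while [ $(( ${#s} % 4 )) -ne 0 ]; do
        s="${s}="
    done
    printf '%s\n' "$s"
}

pad_base64 "c2VjcmV0cw" | base64 -d   # prints: secrets
```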

Encoding detection: If you’re unsure whether something is base64, look for the telltale signs: the character set is limited to A-Z, a-z, 0-9, +, /, and = for padding. Length is always a multiple of 4 (with padding).
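
Those signs can be checked mechanically. The function below is a heuristic sketch with a hypothetical name, looks_like_base64. It tests only the alphabet, the padding position, and the length, so ordinary words such as "test" also pass; treat a positive result as a hint, not proof.

```shell
# looks_like_base64 STRING
# Returns 0 if STRING has the shape of padded base64, 1 otherwise.
looks_like_base64() {
    s=$1
    [ -n "$s" ] || return 1
    # Padded base64 is always a multiple of 4 characters long.
    [ $(( ${#s} % 4 )) -eq 0 ] || return 1
    # Only A-Z, a-z, 0-9, + and /, with at most two "=" at the very end.
    printf '%s\n' "$s" | grep -Eq '^[A-Za-z0-9+/]+={0,2}$'
}

looks_like_base64 "c2VjcmV0IG1lc3NhZ2UK" && echo "candidate"
looks_like_base64 "not base64!" || echo "rejected"
```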

Key Flags

Flag     Description
-d       Decode mode
-i       Ignore non-alphabet characters during decoding
-w 0     Disable line wrapping during encoding (useful for pipes and scripts)
-w 76    Wrap encoded output at 76 characters (default)

Practical note: When piping base64 output into other tools or scripts, always use -w 0 to prevent line breaks from interfering:

base64 -w 0 data.bin | curl -X POST -d @- https://example.com/upload
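
The difference wrapping makes is easy to measure. This sketch assumes GNU coreutils base64 (the -w flag and the 76-column default are not universal) and encodes 100 zero bytes, which produce 136 base64 characters.

```shell
# Default wrapping at 76 columns: 136 characters spill onto a second line.
wrapped_lines=$(head -c 100 /dev/zero | base64 | wc -l | tr -d ' ')

# -w 0 keeps everything on a single line.
flat=$(head -c 100 /dev/zero | base64 -w 0)

echo "lines with default wrapping: $wrapped_lines"
echo "length with -w 0: ${#flat}"
```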

Tool Comparison

Tool       Input                 Output                                    Primary Use
strings    Binary or any file    Human-readable text sequences             Find embedded text in binaries
xxd        Any file              Hex dump (or binary from hex)             Inspect raw bytes, identify file types, patch data
base64     Binary or text        ASCII-encoded data (or decoded binary)    Transport-safe encoding, decode CTF challenges

Chaining Them Together

These tools complement each other. A typical forensic triage workflow:

# Step 1: Quick check for readable content
strings -n 8 unknown.bin | head -20

# Step 2: Inspect the file header
xxd -l 32 unknown.bin

# Step 3: If it's base64-encoded, decode and re-inspect
base64 -d unknown.b64 > decoded.bin
strings decoded.bin | grep -iE "password|key|flag"

# Step 4: Hex dump for deeper analysis
xxd decoded.bin | less
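
The read-only parts of this workflow can be bundled into one function for repeated use. The sketch below uses a hypothetical name, triage, and the thresholds and keywords are arbitrary choices to adjust per case. It only reads the file and never executes it.

```shell
# triage FILE
# Prints the header bytes, the longer strings, and any keyword hits.
triage() {
    f=$1
    echo "== header (first 32 bytes) =="
    xxd -l 32 "$f"
    echo "== strings (8+ characters, first 20) =="
    strings -n 8 "$f" | head -n 20
    echo "== keyword hits =="
    strings "$f" | grep -iE "password|key|flag" || echo "(none)"
}
```

Usage: triage unknown.bin | less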