How to Compare Text Files: Diff Tools and Techniques
Comparing text is a daily task for developers. Whether reviewing code changes, debugging configuration drift, or merging documents, understanding diff output is essential. This guide covers the algorithms, tools, and techniques for effective text comparison.
Understanding Diff Output
The classic unified diff format shows changes between two files:
--- original.txt
+++ modified.txt
@@ -1,5 +1,5 @@
 Line 1: unchanged
-Line 2: removed text
+Line 2: modified text
+Line 2.5: added line
 Line 3: unchanged
 Line 4: unchanged
-Line 5: also removed
- Lines starting with - were removed (from the original)
- Lines starting with + were added (in the modified version)
- Lines starting with a space are unchanged context
- @@ markers show the line numbers affected
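The same format can be generated programmatically. The following is a minimal sketch using Python's standard difflib module; the two line lists simply mirror the sample files above, and the variable names are illustrative.

```python
import difflib

# The two versions of the file, one list entry per line.
original = [
    "Line 1: unchanged",
    "Line 2: removed text",
    "Line 3: unchanged",
    "Line 4: unchanged",
    "Line 5: also removed",
]
modified = [
    "Line 1: unchanged",
    "Line 2: modified text",
    "Line 2.5: added line",
    "Line 3: unchanged",
    "Line 4: unchanged",
]

# lineterm="" because the input lines carry no trailing newlines.
diff_lines = list(
    difflib.unified_diff(
        original,
        modified,
        fromfile="original.txt",
        tofile="modified.txt",
        lineterm="",
    )
)
print("\n".join(diff_lines))
```

Each element of diff_lines is one line of the unified diff: the two file headers, the @@ hunk header, and then the prefixed content lines.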
Diff Algorithms
Myers Algorithm
Myers is the default algorithm in Git and many other diff tools. It finds the shortest edit script, that is, the minimum number of insertions and deletions needed to transform one file into another. It produces clean, readable diffs for most content.
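To make "shortest edit script" concrete, here is a minimal sketch of the greedy search at the core of the Myers algorithm. It only computes the length of the shortest script (insertions plus deletions), not the script itself, and it is an illustration rather than the code Git actually runs. The function name is hypothetical.

```python
def shortest_edit_distance(a, b):
    """Return the minimum number of insertions plus deletions turning a into b."""
    n, m = len(a), len(b)
    # furthest[k] is the furthest x reached so far on diagonal k, where k = x - y.
    furthest = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]      # move down: take an insertion from b
            else:
                x = furthest[k - 1] + 1  # move right: take a deletion from a
            y = x - k
            # Follow the diagonal for free while the elements match.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return n + m


old = ["Line 1", "Line 2: removed text", "Line 3", "Line 4", "Line 5"]
new = ["Line 1", "Line 2: modified text", "Line 2.5", "Line 3", "Line 4"]
print(shortest_edit_distance(old, new))
```

The inputs can be any indexable sequences, so the same function works on lists of lines or on plain strings treated as sequences of characters.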
Patience Diff
Patience diff is often better for structured text like source code. Instead of searching directly for the shortest edit, it first matches lines that appear exactly once in each file, then diffs the sections between those matches. This often produces more meaningful diffs that align with logical code blocks.
git diff --patience
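The anchoring step described above can be sketched in a few lines. This is only an illustration of the idea, not Git's implementation: it finds lines that are unique in both files and keeps the longest set of them that appears in the same order on both sides. A full patience diff would then recurse into the gaps between anchors. The function name is hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def patience_anchors(a, b):
    """Return (index_in_a, index_in_b) pairs for unique lines kept in order in both files."""
    count_a, count_b = Counter(a), Counter(b)
    pos_in_b = {line: j for j, line in enumerate(b) if count_b[line] == 1}
    # Candidate pairs, already sorted by their index in a.
    pairs = [
        (i, pos_in_b[line])
        for i, line in enumerate(a)
        if count_a[line] == 1 and line in pos_in_b
    ]

    # Longest increasing subsequence over the b-indices (patience sorting).
    tail_values = []   # smallest b-index ending an increasing run of each length
    tail_pairs = []    # index into pairs for each entry of tail_values
    previous = [None] * len(pairs)
    for idx, (_, j) in enumerate(pairs):
        k = bisect_left(tail_values, j)
        previous[idx] = tail_pairs[k - 1] if k > 0 else None
        if k == len(tail_values):
            tail_values.append(j)
            tail_pairs.append(idx)
        else:
            tail_values[k] = j
            tail_pairs[k] = idx

    # Walk the back-pointers from the end of the longest run.
    anchors = []
    idx = tail_pairs[-1] if tail_pairs else None
    while idx is not None:
        anchors.append(pairs[idx])
        idx = previous[idx]
    return anchors[::-1]


old = ["int a() {", "  return 1;", "}", "int b() {", "  return 2;", "}"]
new = [
    "int a() {", "  return 1;", "}",
    "int c() {", "  return 3;", "}",
    "int b() {", "  return 2;", "}",
]
print(patience_anchors(old, new))
```

Note that the repeated closing braces are never used as anchors, which is why this approach tends to avoid misaligned matches on boilerplate lines.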
Histogram Diff
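An extension of patience diff that also anchors on lines occurring only a few times, so it handles repeated lines better and often produces cleaner output for files with significant structural changes. It is not Git's default (Myers is); select it with git diff --histogram or by setting diff.algorithm to histogram in your Git config.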
Comparing Text Online
For quick comparisons without installing tools, our Text Diff Checker provides side-by-side and inline diff views directly in your browser. Paste two texts and see changes highlighted instantly; all processing happens locally.
Command-Line Diff Tools
diff (POSIX)
The classic Unix tool:
# Unified format (most readable)
diff -u file1.txt file2.txt
# Side-by-side (a GNU/BSD extension, not required by POSIX)
diff -y file1.txt file2.txt
# Ignore all whitespace (also an extension; POSIX only specifies -b)
diff -w file1.txt file2.txt
# Recursive directory comparison
diff -r dir1/ dir2/
git diff
git diff also works outside a Git repository (via --no-index) and offers output options that plain diff lacks:
# Compare two files
git diff --no-index file1.txt file2.txt
# Word-level diff (highlights changed words, not whole lines)
git diff --word-diff
# Stat summary (files changed, insertions, deletions)
git diff --stat
colordiff / delta
For colored terminal output:
# colordiff: drop-in replacement for diff
colordiff file1.txt file2.txt
# delta: modern diff viewer for Git
git diff | delta
Diff for Code Review
Effective code review depends on readable diffs. Here are techniques to improve diff quality:
1. Keep Commits Focused
Large diffs spanning hundreds of lines are hard to review. Each commit should address one concern:
- Separate formatting changes from logic changes
- Split large refactors into incremental steps
- Move files in one commit, modify them in another
2. Use Word-Level Diff
Line-level diffs hide the actual change when a line has a small modification buried in a long string:
# Shows only the changed words, not entire lines
git diff --word-diff
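To illustrate what a word-level diff does, here is a minimal sketch built on Python's difflib. It splits each line on whitespace and marks removed and added words with markers similar to those of git diff --word-diff. The function name is hypothetical, and Git's own tokenization and spacing differ in detail.

```python
import difflib


def word_diff(old_line, new_line):
    """Mark removed words as [-word-] and added words as {+word+}."""
    old_words, new_words = old_line.split(), new_line.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_words[i1:i2])
            continue
        if i2 > i1:  # words present only in the old line
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if j2 > j1:  # words present only in the new line
            out.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(out)


print(word_diff("const timeout = 5000;", "const timeout = 10000;"))
```

Only the token that changed is marked, while the unchanged words around it are printed once.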
3. Ignore Whitespace in Reviews
Formatting changes add noise to meaningful diffs:
git diff -w # Ignore all whitespace changes
git diff -b # Ignore whitespace amount changes
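The difference between the two flags is easier to see in code. The sketch below approximates their behavior by normalizing lines before comparing them; the function names and the mode values are assumptions made for this example, not part of any tool.

```python
import re


def normalize(line, mode):
    """Approximate how -w and -b treat whitespace before lines are compared."""
    if mode == "w":
        # Like -w: whitespace is ignored entirely.
        return re.sub(r"\s+", "", line)
    if mode == "b":
        # Like -b: runs of whitespace count as one space, trailing whitespace is ignored.
        return re.sub(r"\s+", " ", line).rstrip()
    return line


def same_ignoring_whitespace(old_line, new_line, mode="b"):
    return normalize(old_line, mode) == normalize(new_line, mode)


print(same_ignoring_whitespace("x  =  1 ", "x = 1", mode="b"))  # amount changed
print(same_ignoring_whitespace("x=1", "x = 1", mode="b"))       # whitespace introduced
print(same_ignoring_whitespace("x=1", "x = 1", mode="w"))       # all whitespace ignored
```

In this approximation, a line that gains whitespace where there was none still counts as changed under the "b" mode, but not under the "w" mode.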
4. Review with Context
More context lines help understand the surrounding code:
git diff -U10 # Show 10 lines of context (default is 3)
Handling Merge Conflicts
When Git encounters conflicting changes, it marks the conflict in the file:
<<<<<<< HEAD
const timeout = 5000;
=======
const timeout = 10000;
>>>>>>> feature-branch
To resolve:
- Understand both changes: why was each made?
- Decide which version to keep, or combine them
- Remove the conflict markers
- Test the result
For complex merges, use a three-way merge tool that shows the common ancestor alongside both versions.
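The marker layout is regular enough to parse. As a minimal sketch, the function below extracts the two sides of the first conflict in a file so they can be inspected side by side. It assumes Git's default two-way conflict style (no ||||||| base section), and the function name is hypothetical.

```python
def split_conflict(text):
    """Return (ours, theirs) line lists from the first conflict block in text."""
    ours, theirs = [], []
    state = None
    for line in text.splitlines():
        if state is None and line.startswith("<<<<<<< "):
            state = "ours"        # lines from the current branch follow
        elif state == "ours" and line.startswith("======="):
            state = "theirs"      # lines from the incoming branch follow
        elif state == "theirs" and line.startswith(">>>>>>> "):
            break                 # end of the first conflict block
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
    return ours, theirs


conflicted = """\
<<<<<<< HEAD
const timeout = 5000;
=======
const timeout = 10000;
>>>>>>> feature-branch
"""
ours, theirs = split_conflict(conflicted)
print("ours:  ", ours)
print("theirs:", theirs)
```

A script like this can help list or summarize conflicts, but deciding which side to keep still requires understanding why each change was made.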
Comparing Non-Text Content
JSON Diff
Standard text diff struggles with JSON because key order and formatting changes create noise. Semantic JSON diff compares the actual data structure. Check our JSON Diff tool for structural comparison.
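As a rough illustration of what "semantic" comparison means, the sketch below parses both documents and walks the resulting structures, so key order and indentation never show up as changes. It is a simplified example with a hypothetical function name, not how any particular JSON diff tool works; lists are compared position by position only when their lengths match.

```python
import json


def json_diff(old, new, path=""):
    """Return a list of (kind, path, detail) tuples describing structural changes."""
    changes = []
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(set(old) | set(new)):
            child = f"{path}/{key}"
            if key not in new:
                changes.append(("removed", child, old[key]))
            elif key not in old:
                changes.append(("added", child, new[key]))
            else:
                changes.extend(json_diff(old[key], new[key], child))
    elif isinstance(old, list) and isinstance(new, list) and len(old) == len(new):
        for index, (old_item, new_item) in enumerate(zip(old, new)):
            changes.extend(json_diff(old_item, new_item, f"{path}/{index}"))
    elif old != new:
        changes.append(("changed", path or "/", (old, new)))
    return changes


before = json.loads('{"name": "app", "port": 8080, "tags": ["a", "b"]}')
after = json.loads('{"tags": ["a", "c"], "port": 8080,\n  "name": "app", "debug": true}')
for change in json_diff(before, after):
    print(change)
```

The reordered keys and the extra line break in the second document produce no output; only the added key and the changed list element are reported.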
CSV Diff
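Tabular data needs column-aware comparison. Standard diff treats each row as a single string, so it flags the whole row as changed without showing which cell differs, and it reports reordered rows as removals and additions.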
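A column-aware comparison can be sketched with Python's csv module. The example assumes each row has a unique key column (here a hypothetical id column) so that rows can be matched regardless of order; it reports changed cells only and ignores rows that exist on just one side.

```python
import csv
import io


def csv_cell_diff(old_text, new_text, key):
    """Return (row_key, column, old_value, new_value) for every changed cell."""
    old_rows = {row[key]: row for row in csv.DictReader(io.StringIO(old_text))}
    new_rows = {row[key]: row for row in csv.DictReader(io.StringIO(new_text))}
    changes = []
    # Only rows present in both files are compared cell by cell.
    for row_key in sorted(old_rows.keys() & new_rows.keys()):
        old_row, new_row = old_rows[row_key], new_rows[row_key]
        for column, old_value in old_row.items():
            if column in new_row and new_row[column] != old_value:
                changes.append((row_key, column, old_value, new_row[column]))
    return changes


old_csv = "id,name,price\n1,apple,1.00\n2,pear,2.50\n"
new_csv = "id,name,price\n2,pear,2.75\n1,apple,1.00\n"
print(csv_cell_diff(old_csv, new_csv, key="id"))
```

Although the rows are in a different order in the second file, the result contains a single entry for the one cell that actually changed.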
Binary Files
Diff cannot meaningfully compare binary files. For images, use visual diff tools. For documents, convert to text first or use format-specific comparison tools.
Diff in Automation
CI/CD Pipelines
Use diff to verify expected output in tests:
command_under_test > actual_output.txt
diff expected_output.txt actual_output.txt
# Exit code 0 = identical, 1 = different, >1 = error (for example, a missing file)
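The same check can live inside a test suite. The helper below is a minimal sketch with a hypothetical name: it compares expected and actual output and, on mismatch, fails with a unified diff in the error message so the CI log shows exactly what changed.

```python
import difflib


def assert_matches_expected(expected, actual):
    """Raise AssertionError with a unified diff if the two texts differ."""
    if expected == actual:
        return
    diff = difflib.unified_diff(
        expected.splitlines(),
        actual.splitlines(),
        fromfile="expected_output.txt",
        tofile="actual_output.txt",
        lineterm="",
    )
    raise AssertionError("output differs:\n" + "\n".join(diff))


try:
    assert_matches_expected("status: ok\ncount: 3\n", "status: ok\ncount: 4\n")
except AssertionError as error:
    print(error)
```

In a real test, the two arguments would typically be the contents of the expected output file and the captured output of the command under test.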
Configuration Drift Detection
Compare production config against the expected state:
diff deployed_config.yaml expected_config.yaml
Documentation Change Tracking
Track changes in documentation for review:
git diff --stat HEAD~5..HEAD -- docs/
FAQ
What does "hunk" mean in diff output?
A hunk is a contiguous block of changes in a diff, together with its surrounding context lines. Each @@ line starts a new hunk. Git groups nearby changes into a single hunk when their context lines overlap or touch; with the default 3 lines of context, that means changes separated by roughly six or fewer unchanged lines. Hunks can be staged independently using git add -p.
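The numbers in a hunk header can be read mechanically. As a small sketch (the function and pattern names are made up for this example), the code below parses an @@ line into the start line and line count for each side. When a count is omitted in the header, it defaults to 1.

```python
import re

# Matches headers such as "@@ -1,5 +1,5 @@" or "@@ -12 +12,2 @@ def f():".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count) for a hunk header line."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


print(parse_hunk_header("@@ -1,5 +1,5 @@"))
print(parse_hunk_header("@@ -12 +12,2 @@ def f():"))
```

The first pair describes the range in the original file and the second pair the range in the modified file; any text after the closing @@ is optional context such as a function name.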
How do I compare two branches in Git?
Use git diff branch1..branch2 to see all differences between the tips of two branches. Add --stat for a summary, or -- path/to/file to compare a specific file. To see only what branch2 has changed since it diverged from branch1, use three dots: git diff branch1...branch2.
Related Resources
- Text Diff Checker: Compare text side-by-side in your browser
- JSON Diff Debugging Guide: Structural comparison for JSON data
- Regex Cheat Sheet: Pattern matching for finding specific changes