B1: Terminal Basics
Your computer has a command line — and it’s more powerful than you think
Learning Objectives
By the end of this module, you should be able to:
- Explain why economists use the command line and how it supports reproducible research
- Navigate your computer’s filesystem using terminal commands (pwd, ls, cd)
- Read, search, and manipulate files from the command line (cat, head, grep, wc)
- Combine commands using pipes and redirection to answer data questions without opening a spreadsheet
- Use the terminal confidently enough to follow along with Git, Stata batch mode, and other research tools
Why Economists Need the Terminal
You already have Finder (Mac) or File Explorer (Windows). You can point, click, drag, and drop. Why learn a text-based interface from the 1970s?
Three reasons:
1. Reproducibility. If you cleaned a dataset by clicking through menus, can you reproduce exactly what you did six months later? A sequence of terminal commands (or a script) is a precise, replayable record of every step.
2. Automation. Renaming 200 files, running the same Stata do-file on 50 datasets, or checking whether every CSV in a replication package has the right number of columns — these are trivial on the command line and painful with a mouse.
3. Access to power tools. Git, SSH, cloud computing, package managers, and many research tools either require or work best through the terminal. If you plan to do any computational work beyond Excel, you will encounter the command line.
Think of the terminal as Stata’s command window for your entire computer. In Stata, you could do everything through the menus — but you don’t, because typing commands is faster, more precise, and reproducible. The terminal is the same idea, applied to your whole filesystem instead of just your data.
Opening Your Terminal
Mac
You already have a terminal. Open Terminal.app (find it in Applications > Utilities, or hit Cmd + Space and type “Terminal”).
You’ll see something like this:
emilys-macbook:~ emily$
That’s your prompt. It’s waiting for you to type something. The ~ means you’re in your home directory (more on that shortly).
Many people eventually switch to iTerm2, a more feature-rich terminal for Mac. It’s free and excellent. But the built-in Terminal.app works fine for everything in this module.
Windows
Windows doesn’t natively use the same commands we’ll cover here; its built-in shells, PowerShell and Command Prompt, have their own command sets. You have two main options:
| Option | What It Is | Best For |
|---|---|---|
| Git Bash | Comes with Git for Windows; provides a Unix-like terminal | Quick setup, light use |
| WSL (Windows Subsystem for Linux) | A full Linux environment inside Windows | Serious work, long-term use |
For this course, Git Bash is sufficient. If you plan to do more computational work, WSL is worth the setup time.
Reading Files
You don’t need to open a file in an application to see what’s in it. The terminal gives you several ways to peek at file contents.
See the whole file (cat)
cat (short for “concatenate”) prints the entire contents of a file to your screen.
$ cat README.md
# My Research Project
This project estimates the effect of...
Good for short files. For long files, your screen will fill with text before you can read it.
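The name “concatenate” makes more sense once you give cat more than one file: it prints them back to back, in the order listed. The sketch below is only an illustration; the two files (header.csv and rows.csv) and the scratch folder are hypothetical and are created just for the demonstration.

```shell
# Hypothetical files in a temporary scratch folder (not part of any real project).
workdir=$(mktemp -d)
echo "hhid,income" > "$workdir/header.csv"
echo "1001,45000"  > "$workdir/rows.csv"

# cat prints the files one after another, in the order given.
combined=$(cat "$workdir/header.csv" "$workdir/rows.csv")
echo "$combined"

# Remove the scratch folder when done.
rm -r "$workdir"
```

This prints the header line followed by the data line, as if they had been one file all along.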
See just the beginning or end (head, tail)
$ head data.csv # first 10 lines (default)
$ head -n 5 data.csv # first 5 lines
$ tail -n 20 results.log # last 20 lines
head is invaluable for checking the structure of a CSV file without loading it:
$ head -n 3 household_survey.csv
hhid,district,treatment,income,n_children
1001,Nairobi,1,45000,3
1002,Mombasa,0,32000,1
Now you know the variable names and delimiter without opening Excel or Stata.
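tail can also count from the top of the file: tail -n +2 starts printing at line 2, which is a common way to skip a CSV header. The sketch below rebuilds the three-line excerpt above in a scratch folder (the path is hypothetical) and separates the header from the data rows.

```shell
# Recreate the three-line excerpt in a temporary scratch folder.
workdir=$(mktemp -d)
csv="$workdir/household_survey.csv"
printf '%s\n' \
    "hhid,district,treatment,income,n_children" \
    "1001,Nairobi,1,45000,3" \
    "1002,Mombasa,0,32000,1" > "$csv"

header=$(head -n 1 "$csv")       # first line only: the variable names
data_rows=$(tail -n +2 "$csv")   # everything from line 2 onward: the observations
last_row=$(tail -n 1 "$csv")     # the final line of the file

echo "$header"
echo "$data_rows"

# Remove the scratch folder when done.
rm -r "$workdir"
```

Note the plus sign: tail -n 2 means “the last two lines”, while tail -n +2 means “from line two to the end”. In this tiny file the two happen to give the same rows; in a longer file they would not.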
Page through a file (less)
less lets you scroll through a file interactively:
$ less analysis.log
- Use arrow keys or j/k to scroll
- Press Space for the next page
- Press / then type a search term to find text
- Press q to quit
Think of cat as list in Stata (dumps everything), head as list in 1/10 (first few observations), and less as the Stata data browser (you can scroll and search). You pick the right tool based on how much you need to see.
Searching: grep
grep is one of the most powerful commands you’ll learn. It searches for patterns in files and returns matching lines.
Basic usage
$ grep "income" analysis.do
gen log_income = ln(income)
reg log_income treatment age education, robust
label var income "Monthly household income (KES)"
This found every line in analysis.do that contains the string “income”, including lines where it appears inside a longer name such as log_income.
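grep matches the characters income anywhere in a line, which is why log_income shows up too. If that is too broad, the -w flag (available in both GNU and BSD grep) restricts matches to whole words. A small sketch, recreating the three lines above in a hypothetical scratch file:

```shell
# Recreate the three matching lines in a temporary scratch file.
workdir=$(mktemp -d)
dofile="$workdir/analysis.do"
cat > "$dofile" <<'EOF'
gen log_income = ln(income)
reg log_income treatment age education, robust
label var income "Monthly household income (KES)"
EOF

# -c prints the number of matching lines instead of the lines themselves.
substring_hits=$(grep -c "income" "$dofile")       # "income" anywhere in the line
whole_word_hits=$(grep -c -w "income" "$dofile")   # "income" standing alone as a word

echo "substring: $substring_hits, whole word: $whole_word_hits"

# Remove the scratch folder when done.
rm -r "$workdir"
```

The second count is lower because the reg line mentions only log_income, and grep treats the underscore as part of a word.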
Useful flags
| Flag | What It Does | Example |
|---|---|---|
| -i | Case-insensitive search | grep -i "income" file.do matches “Income”, “INCOME”, etc. |
| -n | Show line numbers | grep -n "regress" analysis.do shows which line each match is on |
| -r | Search recursively through all files in a directory | grep -r "treatment" ./do-files/ |
| -l | Show only file names (which files contain a match) | grep -rl "robust" . lists all files mentioning “robust” |
| -c | Count matching lines | grep -c "district" data.csv counts how many rows mention “district” |
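To see several of these flags side by side, here is a minimal sketch run against two tiny do-files. The folder and file names (do-files, a.do, b.do) and their contents are hypothetical, made up for the demonstration.

```shell
# Hypothetical project: two short do-files in a temporary scratch folder.
workdir=$(mktemp -d)
mkdir "$workdir/do-files"
printf 'gen Income = 1\nregress y treatment, robust\n' > "$workdir/do-files/a.do"
printf 'sum income\n' > "$workdir/do-files/b.do"

# -i with -c: count lines matching "income" regardless of capitalization.
case_insensitive=$(grep -i -c "income" "$workdir/do-files/a.do")

# -n: prefix each matching line with its line number.
numbered=$(grep -n "regress" "$workdir/do-files/a.do")

# -r with -l: search the whole folder, print only the names of files with a match.
files_with_match=$(cd "$workdir" && grep -rl "robust" do-files)

echo "$case_insensitive"
echo "$numbered"
echo "$files_with_match"

# Remove the scratch folder when done.
rm -r "$workdir"
```

Flags can be combined, either separately (-i -c) or fused (-rl), and the order does not matter.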
Searching a project
Suppose you have a replication package and you want to find every file that uses a particular variable:
$ grep -rn "hh_consumption" ./code/
./code/01_clean.do:45: gen hh_consumption = food_exp + nonfood_exp
./code/02_analysis.do:12: sum hh_consumption, detail
./code/02_analysis.do:31: reg hh_consumption treatment, cluster(village)
In seconds, you know exactly where that variable is created and used, across every file in the project. Try doing that by opening files one at a time.
grep "treatment effect" file.do looks for the exact string “treatment effect” (with the space). If you want to search for either “treatment” or “effect” separately, those are two separate searches.
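If you do want either term in a single pass, one option is an extended regular expression: with the -E flag, a | inside the pattern means “or”. The sketch below assumes a hypothetical four-line file.do, created in a scratch folder, to show the difference.

```shell
# Hypothetical do-file in a temporary scratch folder.
workdir=$(mktemp -d)
dofile="$workdir/file.do"
cat > "$dofile" <<'EOF'
* estimate the treatment effect
gen treatment = (arm == 1)
* effect sizes by district
sum income
EOF

exact_phrase=$(grep -c "treatment effect" "$dofile")     # lines containing the full phrase
either_word=$(grep -c -E "treatment|effect" "$dofile")   # lines containing at least one of the two words

echo "exact phrase: $exact_phrase, either word: $either_word"

# Remove the scratch folder when done.
rm -r "$workdir"
```

The | here sits inside the quotes, so it is part of the search pattern and not a pipe between commands.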
File Operations
Create a directory (mkdir)
$ mkdir replication-package
$ mkdir -p project/data/raw # -p creates parent directories as needed
Copy files (cp)
$ cp analysis.do analysis_backup.do # copy a file
$ cp -r code/ code_backup/ # copy a directory (-r = recursive)
Move or rename files (mv)
$ mv old_name.do new_name.do # rename a file
$ mv analysis.do ./code/ # move a file to a different folder
Remove files (rm)
$ rm temp_file.csv # delete a file
$ rm -r temp_folder/ # delete a directory and everything in it
There is no Trash, no Recycle Bin, no undo. When you rm a file, it is gone. This is not like deleting a file in Finder.
Safety habits:
- Use ls before rm to verify you’re targeting the right files
- Never run rm -rf / or rm -rf ~: this would delete your entire filesystem or home directory
- Consider rm -i, which asks for confirmation before each deletion
- When in doubt, mv to a trash folder instead of deleting
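The last habit can be sketched as follows. This is one possible convention and not a built-in feature: the trash folder is just an ordinary directory with a hypothetical name, and the files are empty placeholders created in a scratch folder.

```shell
# Hypothetical scratch folder with two temporary files and one file worth keeping.
workdir=$(mktemp -d)
touch "$workdir/temp_data.csv" "$workdir/temp_results.csv" "$workdir/survey.csv"

# An ordinary folder used as a holding area; nothing is lost until you empty it yourself.
mkdir -p "$workdir/trash"

ls "$workdir"/temp_*.csv                     # check what the pattern matches first
mv "$workdir"/temp_*.csv "$workdir/trash/"   # move instead of rm

remaining=$(ls "$workdir")
in_trash=$(ls "$workdir/trash")
echo "$remaining"
echo "$in_trash"

# The whole scratch folder is disposable, so it is safe to remove here.
rm -r "$workdir"
```

If you moved the wrong file, you can simply mv it back; with rm there would be nothing to move.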
$ ls temp_*.csv # check what matches
temp_data.csv temp_results.csv
$ rm temp_*.csv # now delete (you know what you're removing)
Pipes and Redirection
This is where the command line gets genuinely powerful. Pipes let you chain commands together, sending the output of one command as input to the next.
The pipe operator (|)
The | (pipe) takes the output of one command and feeds it into the next:
$ cat household_survey.csv | head -n 5
This says: “print the file, but only show me the first 5 lines.” (Same result as head -n 5 household_survey.csv, but the pipe pattern becomes essential for longer chains.)
Counting things (wc)
wc stands for word count, but it does more than that:
$ wc -l household_survey.csv # count lines
5001 household_survey.csv
$ wc -w README.md # count words
342 README.md
Since a CSV file has one row per line (usually), wc -l tells you how many observations you have (minus 1 for the header). It’s a quick way to check dataset size without loading anything.
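If you would rather have the terminal handle the “minus 1” for you, one approach is to drop the header before counting. The sketch below is an illustration on a hypothetical five-line survey.csv; it uses tail -n +2, which prints a file from line 2 onward. One caveat: wc -l counts newline characters, so a file whose last line lacks a trailing newline will come out one short.

```shell
# Hypothetical CSV in a temporary scratch folder: one header row, four observations.
workdir=$(mktemp -d)
csv="$workdir/survey.csv"
printf '%s\n' \
    "hhid,treatment,income,district" \
    "1001,1,45000,Nairobi" \
    "1002,0,32000,Mombasa" \
    "1003,1,51000,Nairobi" \
    "1004,0,28000,Kisumu" > "$csv"

# Skip the header (start at line 2), then count what is left.
n_obs=$(tail -n +2 "$csv" | wc -l)

echo "observations:" $n_obs

# Remove the scratch folder when done.
rm -r "$workdir"
```

Because wc reads from the pipe and not from a named file, it prints only the number, with no file name after it.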
Combining pipes
Now the real payoff. You can chain as many commands as you need:
How many observations are in the treatment group?
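$ grep ",1," household_survey.csv | wc -l
2487
This says: find all lines containing ,1, (the treatment indicator, surrounded by commas), then count them. Because grep matches against the whole line, the count is right only if no other middle column (here, district or income) can equal exactly 1.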
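Matching ,1, against the whole line is quick but fragile. A somewhat sturdier sketch pulls out the treatment column with cut before counting. It assumes treatment is the third comma-separated field and that no field contains a quoted comma; the three data rows, including one with income recorded as 1, are hypothetical.

```shell
# Hypothetical CSV in a temporary scratch folder.
workdir=$(mktemp -d)
csv="$workdir/household_survey.csv"
printf '%s\n' \
    "hhid,district,treatment,income,n_children" \
    "1001,Nairobi,1,45000,3" \
    "1002,Mombasa,0,32000,1" \
    "1003,Kisumu,0,1,2" > "$csv"   # made-up control row whose income is 1

# Whole-line pattern: also counts the control row, because its income field is ",1,".
loose=$(grep -c ",1," "$csv")

# cut -d, -f3 keeps only the third field; grep -x matches lines that are exactly "1".
strict=$(cut -d, -f3 "$csv" | grep -c -x "1")

echo "pattern match: $loose, treatment column: $strict"

# Remove the scratch folder when done.
rm -r "$workdir"
```

Here the whole-line pattern reports 2 while the column-based count reports 1, the single treated household.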
What variables are in this dataset?
$ head -n 1 household_survey.csv
hhid,district,treatment,income,n_children
Which do-files use the regress command?
$ grep -rl "regress" ./code/ | sort
./code/02_analysis.do
./code/03_robustness.do
./code/05_heterogeneity.do
Output redirection (> and >>)
Instead of printing to the screen, you can send output to a file:
$ grep "ERROR" analysis.log > errors.txt # write to a new file (overwrites)
$ grep "WARNING" analysis.log >> errors.txt # append to existing file
Pipes are the terminal’s version of method chaining or piping in R (%>%). Each command does one thing well, and you compose them to answer complex questions. It’s the Unix philosophy: small, focused tools that combine. This is also how Stata works: gen, replace, collapse, and merge each do one thing, and you chain them together in a do-file.
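The difference between > and >> matters in practice, because > silently replaces whatever the file held before. A minimal sketch with a hypothetical three-line analysis.log, created in a scratch folder:

```shell
# Hypothetical log file in a temporary scratch folder.
workdir=$(mktemp -d)
log="$workdir/analysis.log"
printf '%s\n' \
    "ERROR: file not found" \
    "WARNING: 3 missing values" \
    "note: done" > "$log"

grep "ERROR" "$log" > "$workdir/errors.txt"      # creates errors.txt (or overwrites it)
grep "WARNING" "$log" >> "$workdir/errors.txt"   # appends to the end of errors.txt
after_append=$(cat "$workdir/errors.txt")

grep "WARNING" "$log" > "$workdir/errors.txt"    # > again: the earlier contents are replaced
after_overwrite=$(cat "$workdir/errors.txt")

echo "$after_append"
echo "---"
echo "$after_overwrite"

# Remove the scratch folder when done.
rm -r "$workdir"
```

After the append, errors.txt holds both the ERROR and WARNING lines; after the final >, only the WARNING line is left.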
A Few More Useful Commands
| Command | What It Does | Example |
|---|---|---|
| clear | Clear the terminal screen | clear |
| history | Show your recent commands | history (then re-run one with !42) |
| man | Read the manual for a command | man grep (press q to exit) |
| which | Find where a program lives | which stata shows the path to Stata |
| echo | Print text | echo "hello" or echo $PATH to see your PATH |
If the terminal seems frozen or stuck in a command:
- Ctrl + C: cancel the current command
- q: quit interactive views (like less or man)
- Ctrl + D: exit the terminal session
- Ctrl + L: clear the screen (same as clear)
You will use Ctrl + C constantly. It’s the universal “nevermind, stop” signal.
Exercise: Exploring a Replication Package
This exercise simulates what you’d actually do when you download a replication package or start working with a collaborator’s project. You’ll use only the terminal — no Finder, no Stata, no Excel.
Setup
Pick a project folder on your computer — ideally one with some do-files, CSVs, or other research files. If you don’t have one handy, create a practice structure:
$ mkdir -p ~/practice-project/code
$ mkdir -p ~/practice-project/data/raw
$ mkdir -p ~/practice-project/output
$ echo "hhid,treatment,income,district" > ~/practice-project/data/raw/survey.csv
$ echo "1001,1,45000,Nairobi" >> ~/practice-project/data/raw/survey.csv
$ echo "1002,0,32000,Mombasa" >> ~/practice-project/data/raw/survey.csv
$ echo "1003,1,51000,Nairobi" >> ~/practice-project/data/raw/survey.csv
$ echo "1004,0,28000,Kisumu" >> ~/practice-project/data/raw/survey.csv
Tasks
Work through these using only the terminal:
- Navigate to the project folder and confirm your location with pwd
- Explore the folder structure: What directories exist? What files are in each?
- Check the data: How many observations (rows) are in the CSV? What are the variable names?
- Search: If you have do-files, find all lines that contain gen or regress. If using the practice data, search for “Nairobi” in the CSV.
- Count: How many observations are from Nairobi? (Use grep and wc -l)
- Save your work: Redirect the results of your Nairobi search to a file called nairobi_obs.txt
- Verify: Use cat to confirm the file was created correctly
Sample solution (for the practice data)
$ cd ~/practice-project
$ pwd
/Users/emily/practice-project
$ ls -R
code data output
./code:
./data:
raw
./data/raw:
survey.csv
./output:
$ wc -l data/raw/survey.csv
5 data/raw/survey.csv
$ head -n 1 data/raw/survey.csv
hhid,treatment,income,district
$ grep "Nairobi" data/raw/survey.csv
1001,1,45000,Nairobi
1003,1,51000,Nairobi
$ grep "Nairobi" data/raw/survey.csv | wc -l
2
$ grep "Nairobi" data/raw/survey.csv > output/nairobi_obs.txt
$ cat output/nairobi_obs.txt
1001,1,45000,Nairobi
1003,1,51000,Nairobi
Discussion Questions
- Many economics journals now require replication packages. How does command-line literacy help you create better replication packages? How does it help you evaluate someone else’s package?
- A colleague says “I can do all of this in Stata — why learn another tool?” What can the terminal do that Stata can’t? What’s the value of having a tool that works outside of any specific application?
- Think about a repetitive task you’ve done manually (renaming files, checking data, copying folders). How might you approach it differently with the terminal?
- Why do you think the command line has survived for 50+ years while graphical interfaces have changed completely every decade? What does this tell you about which skills are worth investing in?
Key Takeaways
- The terminal is a reproducible interface to your computer. Every action is a typed command that can be recorded, shared, and replayed — unlike point-and-click workflows.
- A handful of commands covers most needs. pwd, ls, cd, cat, head, grep, wc, and pipes will handle the majority of what you need as an economist.
- Pipes are the key insight. Combining small, focused commands into chains lets you answer complex questions without writing a script or opening an application.
- This is the foundation for everything else. Git, remote computing, Stata batch mode, and AI coding tools all assume you can navigate a terminal. Time invested here pays off repeatedly.
For instructors: This module works best as a live-coding session where students follow along on their own machines. Go slowly through the first few commands (pwd, ls, cd) — students who have never used a terminal will need time to build confidence. The exercise at the end can be done individually or in pairs.
Common student issues: (1) Windows students may need help installing Git Bash or WSL before the session; consider sending setup instructions in advance. (2) Students often forget that a cd persists across commands (they expect each command to “reset” to the starting folder). (3) Tab completion is the single most impactful thing you can teach early, so demonstrate it repeatedly.
Adaptation: For a shorter session (~45 min), skip the “File Operations” and “A Few More Useful Commands” sections and focus on navigation, reading files, grep, and pipes. These are the highest-value skills for the exercise.
Connection to other modules: This module is a prerequisite for B2 (Git Basics), which assumes students can navigate the terminal. Consider scheduling them in the same week.