Missing Semester Learning Notes
These are my notes from the Missing Semester course.
Ref: https://missing.csail.mit.edu/
Shell
Environment Variable
If the shell is asked to execute a command that doesn’t match one of its programming keywords, it consults an environment variable called $PATH that lists which directories the shell should search for programs when it is given a command:
missing:~$ echo $PATH
When we run the echo command, the shell sees that it should execute the program echo, and then searches through the :-separated list of directories in $PATH for a file by that name. When it finds it, it runs it (assuming the file is executable; more on that later). We can find out which file is executed for a given program name using the which program. We can also bypass $PATH entirely by giving the path to the file we want to execute.
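As a rough illustration of the lookup described above, the sketch below prints each directory in $PATH, resolves a program name, and then runs the same program through its full path. It uses the shell builtin command -v in place of which (an assumption made here only because which is not installed everywhere), and ls is just a convenient example program.

```shell
# Print the colon-separated $PATH, one directory per line.
echo "$PATH" | tr ':' '\n'

# Resolve the file the shell would execute for the name "ls".
ls_path=$(command -v ls)
echo "ls resolves to: $ls_path"

# Bypass $PATH by invoking the program through its full path.
"$ls_path" / > /dev/null && echo "ran $ls_path directly"
```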
Permission of directory
missing:~$ ls -l /home
Above, only the owner is allowed to modify (w) the missing directory (i.e., add/remove files in it). To enter a directory, a user must have “search” (represented by “execute”: x) permissions on that directory (and its parents). To list its contents, a user must have read (r) permissions on that directory.
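To see these permission bits on a real directory, here is a small sketch that works in a scratch directory created with mktemp -d. The variable names demo_dir and perms are made up for this example.

```shell
# Create a scratch directory with a unique name.
demo_dir=$(mktemp -d)

# Owner may read, write and search; group and others get nothing.
chmod 700 "$demo_dir"
perms=$(ls -ld "$demo_dir" | cut -c1-10)
echo "$perms"                        # drwx------

# Also let everyone else list (r) and enter (x) the directory.
chmod 755 "$demo_dir"
ls -ld "$demo_dir" | cut -c1-10      # drwxr-xr-x

rmdir "$demo_dir"
```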
Connect programs
The simplest form of redirection is < file and > file. These let you rewire the input and output streams of a program to a file respectively:
missing:~$ echo hello > hello.txt
You can also use >> to append to a file.
The | operator lets you “chain” programs such that the output of one is the input of another:
missing:~$ ls -l / | tail -n1
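The redirection and pipe operators above can be tried together in one short session. This is only a sketch: scratch is a hypothetical temporary file, not something from the course.

```shell
scratch=$(mktemp)                 # hypothetical scratch file

echo hello > "$scratch"           # > overwrites the file with the output
echo world >> "$scratch"          # >> appends instead of overwriting
contents=$(cat < "$scratch")      # < feeds the file to cat as its input
echo "$contents"

# The pipe sends the output of cat into tail, which keeps the last line.
last_line=$(cat "$scratch" | tail -n1)
echo "$last_line"                 # world

rm "$scratch"
```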
Shell Script
Variable and String
To assign variables in bash, use the syntax foo=bar and access the value of the variable with $foo. Note that foo = bar will not work since it is interpreted as calling the foo program with arguments = and bar. In general, in shell scripts the space character will perform argument splitting.
Strings in bash can be defined with ' and " delimiters, but they are not equivalent. Strings delimited with ' are literal strings and will not substitute variable values whereas " delimited strings will.
foo=bar
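A minimal sketch of the difference between the two quoting styles. The variable names double and single are chosen only for this illustration.

```shell
foo=bar                      # no spaces around =
double="Value is $foo"       # double quotes substitute the variable
single='Value is $foo'       # single quotes keep the text literal

echo "$double"               # Value is bar
echo "$single"               # Value is $foo
```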
Function and Argument
As with most programming languages, bash supports control flow techniques including if, case, while and for. Similarly, bash has functions that take arguments and can operate with them. Here is an example of a function that creates a directory and cds into it.
mcd () {
Here $1 is the first argument to the script/function. Unlike other scripting languages, bash uses a variety of special variables to refer to arguments, error codes, and other relevant variables. Below is a list of some of them. A more comprehensive list can be found here.
- $0 - Name of the script
- $1 to $9 - Arguments to the script. $1 is the first argument and so on.
- $@ - All the arguments
- $# - Number of arguments
- $? - Return code of the previous command
- $$ - Process identification number (PID) for the current script
- !! - Entire last command, including arguments. A common pattern is to execute a command only for it to fail due to missing permissions; you can quickly re-execute the command with sudo by doing sudo !!
- $_ - Last argument from the last command. If you are in an interactive shell, you can also quickly get this value by typing Esc followed by .
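To make the special variables concrete, here is a sketch with two small functions. mcd follows the description above (create a directory, then cd into it); its body is my reconstruction, not necessarily the course's exact code. show_args is a hypothetical helper added only for illustration.

```shell
# A function along the lines of mcd: create a directory, then cd into it.
mcd () {
    mkdir -p "$1" && cd "$1"
}

# Hypothetical helper that reports how it was called.
show_args () {
    echo "number of arguments: $#"
    echo "first argument: $1"
    echo "all arguments: $@"
}

show_args one two three

# $? holds the return code of the previous command; capture it right away.
false || status=$?
echo "return code of false: $status"    # 1
echo "PID of this shell: $$"
```

Inside a function, $1, $@ and $# refer to the function's own arguments rather than the script's.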
Return Value
Commands will often return output using STDOUT, errors through STDERR, and a Return Code to report errors in a more script-friendly manner. The return code or exit status is the way scripts/commands have to communicate how execution went. A value of 0 usually means everything went OK; anything different from 0 means an error occurred.
Exit codes can be used to conditionally execute commands using && (and operator) and || (or operator), both of which are short-circuiting operators. Commands can also be separated within the same line using a semicolon ;. The true program will always have a 0 return code and the false command will always have a 1 return code. Let’s see some examples:
false || echo "Oops, fail"
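A few more combinations of the operators described above, as a sketch. The comments state what each line does given that true returns 0 and false returns 1.

```shell
false || echo "Oops, fail"             # prints: false failed, so || runs echo
true  || echo "Will not be printed"    # true succeeded, so || skips echo
true  && echo "Things went well"       # prints: true succeeded, so && runs echo
false && echo "Will not be printed"    # false failed, so && skips echo
true ; echo "This will always run"     # ; runs echo regardless of the return code
```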
Temporary file
Another common pattern is wanting to get the output of a command as a variable. This can be done with command substitution. Whenever you place $( CMD ) it will execute CMD, get the output of the command and substitute it in place. For example, if you do for file in $(ls), the shell will first call ls and then iterate over those values. A lesser known similar feature is process substitution: <( CMD ) will execute CMD and place the output in a temporary file and substitute the <() with that file’s name. This is useful when commands expect values to be passed by file instead of by STDIN. For example, diff <(ls foo) <(ls bar) will show differences between files in dirs foo and bar.
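Both kinds of substitution can be tried in a throwaway directory. The sketch below needs bash, since process substitution is not available in plain sh. The names work, count and listing_diff are made up for the example.

```shell
work=$(mktemp -d)
mkdir "$work/foo" "$work/bar"
touch "$work/foo/a.txt" "$work/foo/b.txt" "$work/bar/b.txt" "$work/bar/c.txt"

# Command substitution: the output of the pipeline becomes the value.
count=$(ls "$work/foo" | wc -l | tr -d ' ')
echo "foo holds $count files"

# Process substitution: diff receives two file names standing in for the
# two listings. diff returns 1 when its inputs differ, hence the || true.
listing_diff=$(diff <(ls "$work/foo") <(ls "$work/bar") || true)
echo "$listing_diff"

rm -r "$work"
```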
An example of shell script
#!/bin/bash
When performing comparisons in bash, try to use double brackets [[ ]] in favor of simple brackets [ ].
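Two behaviours that make [[ ]] more forgiving than [ ], shown as a bash-only sketch: unquoted variables are not word-split inside [[ ]], and == can match a glob-style pattern. The variable names are invented for the example.

```shell
name=""
# Inside [[ ]] an unquoted empty variable is still treated as one (empty) word.
if [[ $name == "" ]]; then
    echo "name is empty"
fi

file="notes.txt"
# [[ ]] also supports glob-style pattern matching on the right-hand side.
if [[ $file == *.txt ]]; then
    kind="text file"
fi
echo "$kind"    # text file
```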
Shell globbing
When launching scripts, you will often want to provide arguments that are similar. Bash has ways of making this easier, expanding expressions by carrying out filename expansion. These techniques are often referred to as shell globbing.
- Wildcards - Whenever you want to perform some sort of wildcard matching, you can use ? and * to match one or any amount of characters respectively. For instance, given files foo, foo1, foo2, foo10 and bar, the command rm foo? will delete foo1 and foo2 whereas rm foo* will delete all but bar.
- Curly braces {} - Whenever you have a common substring in a series of commands, you can use curly braces for bash to expand this automatically. This comes in very handy when moving or converting files.
convert image.{png,jpg}
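A safe way to check what a glob will match is to echo it before handing it to rm or mv. The sketch below recreates the files from the wildcard example in a scratch directory (glob_dir is a hypothetical name) and prints the expansions. It needs bash for the brace expansion.

```shell
glob_dir=$(mktemp -d)
( cd "$glob_dir" && touch foo foo1 foo2 foo10 bar )

# ? matches exactly one character, * matches any number of characters.
single=$(cd "$glob_dir" && echo foo?)
any=$(cd "$glob_dir" && echo foo*)
echo "foo? expands to: $single"     # foo1 foo2
echo "foo* expands to: $any"

# Brace expansion happens before the command runs and needs no existing files.
braces=$(echo image.{png,jpg})
echo "$braces"                      # image.png image.jpg

rm -r "$glob_dir"
```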
Shell check
Finding files
One of the most common repetitive tasks that every programmer faces is finding files or directories. All UNIX-like systems come packaged with find, a great shell tool to find files. find will recursively search for files matching some criteria. Some examples:
# Find all directories named src
Beyond listing files, find can also perform actions over files that match your query. This property can be incredibly helpful to simplify what could be fairly monotonous tasks.
# Delete all files with .tmp extension
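Both tasks named above (finding directories called src and deleting .tmp files) can be sketched against a scratch tree so nothing real is touched. The directory layout and the names find_root and remaining are assumptions made for this example.

```shell
find_root=$(mktemp -d)
mkdir -p "$find_root/project/src"
touch "$find_root/project/a.tmp" "$find_root/project/src/b.tmp" "$find_root/project/keep.txt"

# Find all directories named src.
src_dirs=$(find "$find_root" -name src -type d)
echo "$src_dirs"

# Delete all files with the .tmp extension by running rm on each match.
find "$find_root" -name '*.tmp' -type f -exec rm {} \;

# Only keep.txt should be left.
remaining=$(find "$find_root" -type f | wc -l | tr -d ' ')
echo "files left: $remaining"       # files left: 1

rm -r "$find_root"
```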
Despite find’s ubiquitousness, its syntax can sometimes be tricky to remember. For instance, to simply find files that match some pattern PATTERN you have to execute find -name '*PATTERN*' (or -iname if you want the pattern matching to be case insensitive).
my example:
find . -name 'source*'
Finding code
Finding files by name is useful, but quite often you want to search based on file content. A common scenario is wanting to search for all files that contain some pattern, along with where in those files said pattern occurs. To achieve this, most UNIX-like systems provide grep, a generic tool for matching patterns from the input text. grep is an incredibly valuable shell tool that we will cover in greater detail during the data wrangling lecture.
For now, know that grep has many flags that make it a very versatile tool. Some I frequently use are -C for getting Context around the matching line and -v for inverting the match, i.e. print all lines that do not match the pattern. For example, grep -C 5 will print 5 lines before and after the match. When it comes to quickly searching through many files, you want to use -R since it will Recursively go into directories and look for files for the matching string.
But grep -R can be improved in many ways, such as ignoring .git folders, using multi CPU support, etc. Many grep alternatives have been developed, including ack, ag and rg. All of them are fantastic and pretty much provide the same functionality. For now I am sticking with ripgrep (rg), given how fast and intuitive it is. Some examples:
# Find all python files where I used the requests library
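The grep flags mentioned earlier (-C, -v and -R) can be tried on a couple of throwaway files. This sketch sticks to plain grep because rg may not be installed. The file names and contents are invented for the example, and -l (list matching file names only) is an extra standard grep flag not covered in the notes.

```shell
grep_root=$(mktemp -d)
printf 'import requests\nprint("hi")\n' > "$grep_root/a.py"
printf 'import os\n' > "$grep_root/b.py"

# -R searches the directory recursively; -l prints only matching file names.
matches=$(grep -R -l 'import requests' "$grep_root")
echo "$matches"

# -v inverts the match: keep the lines that do NOT contain "import".
non_import=$(grep -v 'import' "$grep_root/a.py")
echo "$non_import"                  # print("hi")

# -C 1 prints one line of context before and after each match.
grep -C 1 'print' "$grep_root/a.py"

rm -r "$grep_root"
```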
Vim
Modal editing
Vim’s design is based on the idea that a lot of programmer time is spent reading, navigating, and making small edits, as opposed to writing long streams of text. For this reason, Vim has multiple operating modes.
- Normal: for moving around a file and making edits
- Insert: for inserting text
- Replace: for replacing text
- Visual (plain, line, or block): for selecting blocks of text
- Command-line: for running a command
Keystrokes have different meanings in different operating modes. For example, the letter x in Insert mode will just insert a literal character ‘x’, but in Normal mode, it will delete the character under the cursor, and in Visual mode, it will delete the selection.
In its default configuration, Vim shows the current mode in the bottom left. The initial/default mode is Normal mode. You’ll generally spend most of your time between Normal mode and Insert mode.
You change modes by pressing <ESC> (the escape key) to switch from any mode back to Normal mode. From Normal mode, enter Insert mode with i, Replace mode with R, Visual mode with v, Visual Line mode with V, Visual Block mode with <C-v> (Ctrl-V, sometimes also written ^V), and Command-line mode with :.
Command-line
Command mode can be entered by typing : in Normal mode. Your cursor will jump to the command line at the bottom of the screen upon pressing :. This mode has many functionalities, including opening, saving, and closing files, and quitting Vim.
- `:q` quit (close window)
- `:w` save (“write”)
- `:wq` save and quit
- `:e {name of file}` open file for editing
- `:ls` show open buffers
- `:help {topic}` open help
    - `:help :w` opens help for the `:w` command
    - `:help w` opens help for the `w` movement
### Movement
You should spend most of your time in Normal mode, using movement commands to navigate the buffer. Movements in Vim are also called “nouns”, because they refer to chunks of text.
- Basic movement: `hjkl` (left, down, up, right)
- Words: `w` (next word), `b` (beginning of word), `e` (end of word)
- Lines: `0` (beginning of line), `^` (first non-blank character), `$` (end of line)
- Screen: `H` (top of screen), `M` (middle of screen), `L` (bottom of screen)
- Scroll: `Ctrl-u` (up), `Ctrl-d` (down)
- File: `gg` (beginning of file), `G` (end of file)
- Line numbers: `:{number}<CR>` or `{number}G` (line {number})
- Misc: `%` (corresponding item)
- Find: `f{character}`, `t{character}`, `F{character}`, `T{character}`
    - find/to forward/backward {character} on the current line
    - `,` / `;` for navigating matches
- Search: `/{regex}`, `n` / `N` for navigating matches
### Selection
Visual modes:
- Visual
- Visual Line
- Visual Block
You can use movement keys to make a selection.
### Edits
Everything that you used to do with the mouse, you now do with the keyboard using editing commands that compose with movement commands. Here’s where Vim’s interface starts to look like a programming language. Vim’s editing commands are also called “verbs”, because verbs act on nouns.
- `i` enter Insert mode
    - but for manipulating/deleting text, want to use something more than backspace
- `o` / `O` insert line below / above
- `d{motion}` delete {motion}
    - e.g. `dw` is delete word, `d$` is delete to end of line, `d0` is delete to beginning of line
- `c{motion}` change {motion}
    - e.g. `cw` is change word
    - like `d{motion}` followed by `i`
- `x` delete character (equal to `dl`)
- `s` substitute character (equal to `xi`)
- Visual mode + manipulation
    - select text, `d` to delete it or `c` to change it
- `u` to undo, `<C-r>` to redo
- `y` to copy / “yank” (some other commands like `d` also copy)
- `p` to paste
- Lots more to learn: e.g. `~` flips the case of a character
### Counts
You can combine nouns and verbs with a count, which will perform a given action a number of times.
- `3w` move 3 words forward
- `5j` move 5 lines down
- `7dw` delete 7 words
> To be continued
# Data Wrangling
### Filtering in remote server
```shell
ssh myserver 'journalctl | grep sshd | grep "Disconnected from"' | less
```
Why the additional quoting? Well, our logs may be quite large, and it’s wasteful to stream it all to our computer and then do the filtering. Instead, we can do the filtering on the remote server, and then massage the data locally. less gives us a “pager” that allows us to scroll up and down through the long output. To save some additional traffic while we debug our command-line, we can even stick the current filtered logs into a file so that we don’t have to access the network while developing:
ssh myserver 'journalctl | grep sshd | grep "Disconnected from"' > ssh.log
sed
sed is a “stream editor” that builds on top of the old ed editor. In it, you basically give short commands for how to modify the file, rather than manipulate its contents directly (although you can do that too). There are tons of commands, but one of the most common ones is s: substitution. For example, we can write:
ssh myserver journalctl
What we just wrote was a simple regular expression, a powerful construct that lets you match text against patterns. The s command is written in the form s/REGEX/SUBSTITUTION/, where REGEX is the regular expression you want to search for, and SUBSTITUTION is the text you want to substitute matching text with.
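The s command can be tried locally without a server. In the sketch below the log line is a made-up example in roughly the shape sshd writes, and the expression strips everything up to and including "Disconnected from ".

```shell
# Hypothetical log line; the host, user and address are invented.
line='Jan 17 03:13:00 myserver sshd[2631]: Disconnected from user alice 10.0.0.1 port 55920'

# s/REGEX/SUBSTITUTION/: here REGEX is ".*Disconnected from " and the
# SUBSTITUTION is empty, so the matched prefix is simply removed.
trimmed=$(echo "$line" | sed 's/.*Disconnected from //')
echo "$trimmed"    # user alice 10.0.0.1 port 55920
```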