If you have any questions—or discover cool UNIX tricks—please get in touch! You can email Jeff Rich (jrich@carnegiescience.edu) or me (amatthews@carnegiescience.edu) or find us in our offices (#130 and #132).

UNIX Introduction

UNIX is an operating system that was first developed in the 1960s and remains under active development today. It is made up of three parts: 1. the kernel, 2. the shell, and 3. the programs. This tutorial is all about the shell, which interfaces between the kernel (the hub of the operating system) and the user (you!).

Why learn UNIX?

It’s powerful, but don’t take my word for it. The research publication Nature listed “Five reasons why researchers should learn to love the command line,” summarized here:

  1. Wrangle files. You can repeat simple tasks (like renaming, adding date stamps, or converting to a different format) across multiple files.
  2. Handle big data. At a certain point, data sets become too big for GUI programs to handle. Using the command-line you can quickly sort through large numbers of files (>millions) and/or extract information from large files.
  3. Manipulate spreadsheets. From the command line you can cut and extract columns, count words/lines/characters, and filter files for a certain condition. Using the ‘pipe’ function you can combine these functions.
  4. Parallelize your work. You almost always (maybe actually always?) interact with high performance computing systems (HPCs) through the command-line and bash scripts.
  5. Automate. Anything you can type on the command line can be included in a bash/shell script. This means you can line up various commands to run one after another, and you can even schedule them to run at a particular time!

    Opening a terminal:

    There are many ways to open a terminal (also called the “command line”), but here are some quick options. On a Mac: press ⌘ Cmd + Space, type terminal, and hit enter. On Linux: click Applications > System Tools > Terminal. On Windows: your machine should have bash accessible from the Windows Terminal, or if you have Visual Studio you can use that! We will help you figure it out.

There are different types of shells (e.g. bash, csh, zsh, etc.), but we will be using the bash shell. To check what shell you’re using, type: echo $SHELL in the terminal:

$ echo $SHELL
/bin/bash

echo tells the terminal to write the given text to screen. For example:

$ echo 'Am I a cat?'
Am I a cat?

The $ before echo is the shell prompt, which marks the beginning of the line; you do not type it. The prompt may look different depending on whether you are using a Linux or a Mac machine. The $ before “SHELL” in echo $SHELL indicates a shell variable. Shell variables can be useful for scripting and customizing your workspace.
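As a small sketch of how shell variables behave (the variable name greeting is made up for illustration; HOME is set by the shell itself):

```shell
# Define a variable: there must be no spaces around the = sign.
greeting='Hello Carnegie!'

# Put $ in front of the name to read the value back.
echo "$greeting"

# HOME is a variable the shell sets for you; it holds the path of your home directory.
echo "$HOME"
```

Wrapping the variable in double quotes keeps values that contain spaces in one piece.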

Directories and Navigation

Let’s get familiar with navigating the filesystem on our computer. The home directory is indicated by the special character ~. On a Mac, this is equivalent to /Users/[your username].

We’re going to be downloading a sample directory to play around with, but before we do, let’s create a suitable place for it to live. To make sure we are in our home directory, type cd in the command line. On its own, cd changes to the home directory regardless of where you are in your filesystem. To create a new directory, we will use the command mkdir:

$ mkdir cassi

This will create a new directory named cassi (you can name it whatever you want!) in the home directory. To enter the directory, type cd cassi. If you want to list all files and subdirectories within a directory, type ls. Since we just created this directory, there are no files and nothing should appear.
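Here is a minimal sketch of that sequence. As an assumption for safe experimenting, it works inside a throwaway directory made with mktemp rather than in your real home directory:

```shell
# Create a scratch directory and move into it (a stand-in for your home directory).
scratch=$(mktemp -d)
cd "$scratch"

# Make the new directory and enter it.
mkdir cassi
cd cassi

# Show where we are; the path ends in /cassi.
pwd

# The directory is brand new, so ls prints nothing.
ls
```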

There are two command line tools commonly used to download files from a server: wget and curl. wget is a simpler transfer utility that can be used with URLs beginning with http, https, ftp, and ftps. curl has expanded capabilities, such as supporting more protocols (sets of rules governing the communication and exchange of data over the internet) and more options for uploading files. If it’s a simple task, wget is most likely sufficient.

In the cassi directory, we are going to download a sample directory set to play around with. To download with wget:

$ wget https://github.com/allison-matthews/unixtutorial/archive/main.zip

and to download with curl:

$ curl -L -O https://github.com/allison-matthews/unixtutorial/archive/main.zip

Notice the -O in the curl command. This is necessary to save the contents to a file/directory; otherwise everything would be printed on the command line. The -L tells curl to follow any redirects on the way to the file. wget will automatically save the contents to a file/directory.
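Not every machine has both tools installed (Macs typically ship with curl but not wget). One possible way to handle that is a small helper function that uses whichever is available. The function name fetch_url is made up for this sketch:

```shell
# Hypothetical helper: download a URL with wget if it is installed, otherwise with curl.
fetch_url() {
    if command -v wget > /dev/null 2>&1; then
        wget "$1"
    elif command -v curl > /dev/null 2>&1; then
        # -L follows redirects, -O saves to a file named after the URL
        curl -L -O "$1"
    else
        echo "Neither wget nor curl is installed." >&2
        return 1
    fi
}

# Usage (needs a network connection), run from inside the cassi directory:
# fetch_url https://github.com/allison-matthews/unixtutorial/archive/main.zip
```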

tar and zip

Compressed and/or archived files and directories can have different file endings corresponding to the tool used to create them: .zip, .gz, .tar, .tgz, .tar.gz, etc. tar was originally designed for creating archives to store files on magnetic tape, which is why its name is short for “tape archive”. If there is a z in the file extension (.gz, .tgz, .tar.gz), the file/directory was compressed; files ending simply in .tar are archived but uncompressed. There are many options you can use with tar when extracting an archive. A common usage is: tar -xzvf filename.tar.gz.

The options in tar -xzvf mean:

  • x: extract archive
  • z: uncompress archive
  • v: verbose
  • f: filename/location of archive

To create an archive with tar, you might use something like: tar -czvf tarfile.tar.gz * (c means create). Because of the z option, this creates a compressed tar file named “tarfile.tar.gz” with all the files in the current directory. To list the contents of an archive: tar -tzvf filename.
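To see create, list, and extract working together, here is a sketch of a full round trip. The directory and file names (toppings, herbs.txt, cheese.txt) are invented for the example, and everything happens in a scratch directory:

```shell
# Work in a throwaway directory.
work=$(mktemp -d)
cd "$work"

# Make a small directory with two files to archive.
mkdir toppings
echo 'basil' > toppings/herbs.txt
echo 'mozzarella' > toppings/cheese.txt

# c: create, z: compress with gzip, v: verbose, f: name of the archive
tar -czvf toppings.tar.gz toppings

# t: list the contents without extracting
tar -tzvf toppings.tar.gz

# x: extract; -C chooses the directory to extract into
mkdir extracted
tar -xzvf toppings.tar.gz -C extracted
cat extracted/toppings/herbs.txt
```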

The ZIP format supports lossless file compression, and files with the extension .zip may contain one or more directories. To extract the contents of a .zip archive, use the unzip command: unzip filename.zip. If you don’t want to see the printed messages: unzip -q filename.zip (-q means quiet). Likewise, you can use zip in the same way to compress a file: zip filename.zip file or a directory: zip -r directoryname.zip directory/.

Practice:

Extract the contents of unixtutorial-main.zip

Exploring detailed information of files/directories

If you type ls into the command line now, you should see a list of files and subdirectories. In order to list more information on the contents, enter ls -l. You should see something similar to the image below.

[Image: ls -l example]

Lines that begin with d indicate directories. The first column lists the file/directory permissions: r: read permission, w: write permission, x: execute permission.

| COLUMN | EXAMPLE |
| --- | --- |
| Permissions | drwxr-xr-x |
| User | amatthews |
| Group | obsusers |
| Size | 4096 bytes |
| Modification time | Jun 8 2020 |
| Filename | data |

Let’s explore the Permissions a little more:

Breaking down drwxr-xr-x:

  • d: directory
  • rwx: USER can read, write, and execute
  • r-x: GROUP can read and execute, but not write
  • r-x: OTHERS can read and execute, but not write

Sometimes you may want to change the permission of a file or directory. In order to do so, you use the chmod command. For example, to give write permissions for the file README.md to group members and all others on the computer:

$ chmod go+w README.md

Now the user, the group, and all others can write to the file README.md. To remove permissions, use - in the place of +. As shown above, g and o correspond to group and others permissions, and it follows that u is used when changing user permissions. If you don’t specify u, g, or o, chmod applies the change to all three, except for any permission bits masked out by your umask setting.
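A quick way to get a feel for chmod is to try it on a throwaway file. The sketch below uses an empty stand-in README.md in a scratch directory, not the one from the tutorial download:

```shell
# Work in a throwaway directory with an empty stand-in file.
work=$(mktemp -d)
cd "$work"
touch README.md

# Start from a known state: user can read/write, group and others can only read.
chmod u=rw,go=r README.md
ls -l README.md      # permissions now read -rw-r--r--

# Give write permission to group and others.
chmod go+w README.md
ls -l README.md      # permissions now read -rw-rw-rw-

# Take it away again.
chmod go-w README.md
ls -l README.md      # back to -rw-r--r--
```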

You may have noticed that my sample directory is named unixpizza instead of unixtutorial-main. The mv (move) command changes a file’s complete path, so we can use the command to simply change a file or directory name. For example, while in the cassi directory, you can input the following to change the directory name of unixtutorial-main to something easier to type.

$ mv unixtutorial-main unixpizza

Suppose we want to move the makepizza.py file into the sub-directory: flatbread/. While in the unixpizza/ directory:

$ mv makepizza.py flatbread/makepizza.py

While in the unixpizza/ directory, if you wanted to move something (the makepizza.py file, for example) in the flatbread directory back into the unixpizza/ directory:

$ mv flatbread/makepizza.py makepizza.py

OR

$ mv flatbread/makepizza.py .

The period . indicates the current directory, so that command is telling the computer to move the file makepizza.py from the flatbread/ directory into the current directory, keeping its name makepizza.py. There are times when you may want to move a file to a different directory and change its name in the new directory. This can be done with one command in the terminal. For example, to move the python program makepizza.py into the flatbread/ sub-directory and change its name to make_me_a_pizza.py:

$ mv makepizza.py flatbread/make_me_a_pizza.py

WARNING: If you use mv to change a filename to something that already exists in that directory, it will overwrite the existing file. There is no way to recover it.
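If you are worried about overwriting something, mv has two options that can help: -i asks for confirmation before overwriting, and -n refuses to overwrite at all. A small sketch with made-up file contents, run in a scratch directory:

```shell
# Work in a throwaway directory with two small files.
work=$(mktemp -d)
cd "$work"
echo 'original recipe' > makepizza.py
echo 'new recipe' > draft.py

# -n ("no clobber") tells mv to leave an existing destination file alone.
# Depending on your version of mv, the skipped move may be reported as an error,
# so "|| true" keeps a script from stopping here.
mv -n draft.py makepizza.py || true

# makepizza.py still has its original contents, and draft.py was not moved.
cat makepizza.py
ls
```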

| COMMAND | ACTION |
| --- | --- |
| pwd | (present working directory) show the full path of your current directory |
| cd | (change directory); on its own, this will go to your home directory |
| cd - | go to the last directory you were in |
| cd .. | go up one directory |
| ls | list items in directory |
| ls -l | a “long listing” of items in the current directory with modification time and file permissions |
| ls -lt | a “long listing” of items in the current directory with modification time and file permissions, sorted by most recently modified |
| mkdir | make new directory within current directory |
| chmod | change mode: change the permissions of a file or directory |
| mv | move a file to a different directory or change its name |
| cp | copy a file to a new filename or into a different directory |
| cp -r | -r tells the computer to copy directories recursively, saving you from having to copy files individually |
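The copy and navigation commands in the table can be tried out with a sketch like the following. It builds a miniature flatbread directory in a scratch location; the file ranch.txt and its contents are placeholders:

```shell
# Build a tiny directory tree in a throwaway location.
work=$(mktemp -d)
cd "$work"
mkdir -p flatbread/white_pizza
echo 'ranch' > flatbread/white_pizza/ranch.txt

# Copy a single file to a new filename in the current directory.
cp flatbread/white_pizza/ranch.txt ranch_backup.txt

# Copy a whole directory, including everything inside it.
cp -r flatbread flatbread_copy
ls flatbread_copy/white_pizza

# Move down two levels, then back up two levels.
cd flatbread/white_pizza
pwd
cd ../..
pwd
```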

Control Characters

Your mouse won’t be able to move your cursor on the command line, but control characters allow you to move around with ease. You might even find it faster. Here are some scenarios where control characters come in handy:

  1. Suppose you’re trying to run a python script and notice you made a typo at the very beginning of the line. Instead of hitting the left arrow repeatedly in frustration, you can type cntrl-a and your cursor will move to the start of the line.
  2. Sometimes you haven’t had your tea or coffee yet and you start running a program before realizing you forgot to change a parameter or filename within the code. The good news is you don’t have to wait until the program finishes to go back and make your changes. Instead, typing cntrl-c into the terminal will cancel whatever command is running.

Below is an incomplete list of some control characters that make life just a little more efficient:

| CONTROL CHAR | ACTION |
| --- | --- |
| cntrl-a | move cursor to start of line |
| cntrl-e | move cursor to end of line |
| cntrl-c | cancel command |
| cntrl-l | clear terminal screen (same as clear) |
| cntrl-k | delete from cursor to end of line |
| cntrl-a cntrl-k | go to beginning of line and delete whole line |

Tab Completion

If you start typing a few characters of a file or directory, e.g.

$ cd Downl

and then hit the tab key, the shell will automatically complete the directory (or file) name that matches the start of what you typed. This can be very helpful with long filenames. The shell will complete the name until there is no longer a unique match. Example: suppose you had a directory with the following files:

Tile_8615432346_15.pkl 
Tile_8615432350_15.pkl
Tile_8615432357_15.pkl

Typing ls -l Ti and then tab will result in this on the command line:

$ ls -l Tile_86154323

Entering another character (say 4) will result in only one unique match, and then tab will complete the entire filename:

$ ls -l Tile_8615432346_15.pkl

Aliases

If typing ls -l is more trouble than it’s worth, we can make an alias for it! In the terminal, type alias. This will list the current aliases you have set up thus far, which may be none - but that will change. To establish an alias for ls -l:

$ alias ll='ls -l'

This can also be useful if you frequently make the same typos. For example:

$ alias emasc='emacs'

Now, if you enter alias into the terminal, it should read

alias ll='ls -l'
alias emasc='emacs'
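An alias typed at the prompt only lasts for the current shell session. A common way to keep one is to put the same alias line in a startup file such as ~/.bashrc (on some systems ~/.bash_profile is read instead). The sketch below only defines, inspects, and removes aliases in the current shell; it does not edit any startup file:

```shell
# Define the alias and print its definition to confirm it exists.
alias ll='ls -l'
alias ll

# To keep it for future sessions, this same line would go in ~/.bashrc:
#   alias ll='ls -l'

# unalias removes an alias you no longer want.
alias emasc='emacs'
unalias emasc
```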

Text editors via the terminal

[Image: emacs vs vim comic]

Jupyter notebooks are useful for some things, but you often want to create and run Python programs directly from the command line. There are many text editors, but two extremely popular ones (and therefore likely to be available on most computers) are emacs and vim. Scientists and programmers have been arguing about which is better for decades (see above comic), but in reality they both get the job done. A good text editor to start with is nano. To create a file with nano:

$ nano my_new_file.txt

Practice:

Work with your neighbor to create a python program that prints “Hello Carnegie!” using nano. To run your python program, type: python program.py on the command line.

Miscellaneous, but very useful, commands

A note before going through individual commands: if you ever need more information on what a command does or what additional options you can feed it to get a certain output, use man (manual) or --help.

$ man cat

$ cat --help

NB: There is a package called cheat which you can download that gives much more readable information and examples. See the “Resources” section for details on installing and using it.

top and ps

There may come a time when you need to kill a program and cntrl-c is not doing the trick. Entering the top command in your terminal will list all processes on your computer with their process IDs (PIDs).

If you need to kill a program, find the associated PID (3130 for example) and type

$ kill 3130

This asks the program to terminate. If it still refuses to exit, kill -9 3130 forces it to stop.

top is also useful for determining how much memory and CPU a program is using. If a program seems to be running much more slowly than you anticipated, it could be because there is no more available RAM (random access memory).
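To practice killing a process without risking anything important, you can start a harmless one yourself. In this sketch, sleep 300 stands in for a long-running program, and the shell variable $! supplies its PID so there is no need to look it up:

```shell
# Start a long-running stand-in program in the background (& returns the prompt right away).
sleep 300 &

# $! holds the PID of the most recent background command.
pid=$!
echo "sleep is running with PID $pid"

# Ask the program to terminate, then wait for it to finish exiting.
kill "$pid"
wait "$pid" 2> /dev/null || true

# kill -0 sends no signal; it only checks whether the PID still exists.
if kill -0 "$pid" 2> /dev/null; then
    echo "still running"
else
    echo "process $pid is gone"
fi
```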

ps is another useful command for finding a PID, especially if you are looking for processes with a name or string that you already know. When run on a personal computer ps -A returns a list of all processes run by every user, including root.

This is likely many hundreds of processes, so you can either use this command in combination with ‘grep’ (see below) or you can direct the output to a text file using ps -A > proc.txt. You can then view proc.txt with a text editor and search for the PID. In general > can be used to direct text output to a file and may be handy in other situations.
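Both approaches might look like the sketch below. A background sleep serves as a process with a known name to search for, and proc.txt is written to a scratch directory:

```shell
# Work in a throwaway directory and start a process we can search for.
work=$(mktemp -d)
cd "$work"
sleep 300 &
pid=$!

# Option 1: filter the process list for a name you already know.
ps -A | grep sleep

# Option 2: save the whole list to a file and search it afterwards.
ps -A > proc.txt
grep sleep proc.txt

# Clean up the stand-in process.
kill "$pid"
wait "$pid" 2> /dev/null || true
```

Note that > overwrites proc.txt if it already exists; use >> instead to append to the end of a file.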

cat

The cat command (short for concatenate) outputs the contents of a file to the command line. It also works on multiple files, printing them one after another. Here is a good resource for more information and examples on how you can use cat, including printing a file with line numbers, creating files, and copying the contents of one file to another. My favorite: to display the lines of a file in reverse order use tac instead of cat (clever).
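A few of these uses of cat, sketched with two small made-up files (base.txt and toppings.txt) in a scratch directory:

```shell
# Create two small files to work with.
work=$(mktemp -d)
cd "$work"
printf 'dough\nsauce\n' > base.txt
printf 'cheese\nbasil\n' > toppings.txt

# Print one file.
cat base.txt

# Print several files, one after another.
cat base.txt toppings.txt

# -n numbers the lines of the output.
cat -n base.txt

# Combine cat with > to join files into a new one.
cat base.txt toppings.txt > recipe.txt
cat recipe.txt
```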

head, tail, wc

The commands head and tail enable you to view the first (head) or last (tail) lines of a file without having to open it. By default, these commands will show the first/last 10 lines. If you want to only see the first/last 3 lines, you can use the -n option followed by the number of lines you wish to see. There is also a command called wc (word count) which tells you three statistics about a file: 1) the number of lines, 2) the number of words, and 3) the number of characters. See the example below.

[Image: head/tail example]
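You can try these three commands on a file whose contents are easy to predict. The sketch below uses seq to write the numbers 1 to 20, one per line, into a made-up file numbers.txt:

```shell
# Create a 20-line file in a throwaway directory.
work=$(mktemp -d)
cd "$work"
seq 1 20 > numbers.txt

# First three lines: 1, 2, 3
head -n 3 numbers.txt

# Last three lines: 18, 19, 20
tail -n 3 numbers.txt

# Lines, words, and characters: 20, 20, and 51 for this file
wc numbers.txt

# -l reports only the number of lines.
wc -l numbers.txt
```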

grep, sed

Regular expressions are used for textual pattern matching and are almost a computer language unto themselves. They are built into most modern programming languages (the Python package is called re) and text editors. On the command line, you can use regular expressions through the grep command (the name comes from “global regular expression print”). To use grep:

$ grep <pattern> <files to search>

Here are some simple search examples:

| grep input | translation |
| --- | --- |
| $ grep cheese carne.txt | select all lines in carne.txt containing ‘cheese’ |
| $ grep '^p' carne.txt | ^ = start of line, select all lines starting with ‘p’ |
| $ grep 's$' carne.txt | $ = end of line, select all lines ending with ‘s’ |

Square brackets match a single character. For example, [bc]at will match ‘bat’ and ‘cat’, but not ‘hat’. You can place a range of letters or numbers inside the brackets:

| input | translation |
| --- | --- |
| [b-k]at | matches ‘bat’, ‘cat’, ‘gat’, but not ‘mat’ or ‘pat’ |
| [3-6]5 | matches 35, 45, 55, 65, but not 75, 85, etc. |
| [a-z] | matches any single lowercase letter |
| [a-zA-Z] | matches any single upper or lowercase letter |
| [a-zA-Z]$ | matches any letter that appears at the end of a line |
| [^abc] | matches any character except ‘a’, ‘b’, or ‘c’ |
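To watch these patterns in action, here is a sketch that runs them against a small stand-in file. The contents of carne.txt below are invented for the example and are not the file from the tutorial download:

```shell
# Create a small stand-in file in a throwaway directory.
work=$(mktemp -d)
cd "$work"
printf 'pepperoni\ncheese\nolives\nparmesan cheese\nbat\ncat\nhat\n' > carne.txt

# Lines containing 'cheese': cheese, parmesan cheese
grep cheese carne.txt

# Lines starting with 'p': pepperoni, parmesan cheese
grep '^p' carne.txt

# Lines ending with 's': olives
grep 's$' carne.txt

# 'bat' and 'cat', but not 'hat'
grep '[bc]at' carne.txt

# -c counts matching lines instead of printing them: 2
grep -c cheese carne.txt
```

Putting the pattern in single quotes stops the shell from interpreting characters like $ and [ before grep sees them.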

The website https://regex101.com/ allows you to test regular expressions to ensure they match the pattern you want. They also have a fun quiz to practice your regex skills!

Practice:

Without opening any files:

  1. Use grep to find the pizzas that contain onions.
  2. Combine grep and sort to find the pizzas that use the word “organic” the most times. (Hint: remember you can type --help after a command to see additional options!)
  3. Head to https://regex101.com/quiz and try out their practice problems!

sed stands for stream editor, and while it has a ton of capabilities (e.g. searching through a file, find and replace, insertion and deletion) the most common thing it is used for is find and replace. For example, in the past I’ve run programs that needed a separate file that listed the values of parameters being used. I wanted to run the program many times for a range of parameters, so instead of manually opening the file, finding the right line, and making the change, I used sed! If you want more examples of how to use it, go here.
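A rough sketch of that kind of parameter sweep is shown below. The file params.txt, its format, and the parameter names are all hypothetical. The sed expression s/old/new/ replaces the first match of old on each line with new:

```shell
# Create a hypothetical parameter file in a throwaway directory.
work=$(mktemp -d)
cd "$work"
printf 'temperature = 450\ntime = 12\n' > params.txt

# Print the file with the temperature line replaced; params.txt itself is unchanged.
sed 's/^temperature = .*/temperature = 500/' params.txt

# Write one modified copy of the file per temperature value.
for temp in 400 450 500; do
    sed "s/^temperature = .*/temperature = $temp/" params.txt > "params_$temp.txt"
done

cat params_400.txt
```

Writing to a new file with > sidesteps sed's in-place option (-i), which takes slightly different arguments on Linux and Mac.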

pipe, sort

The pipe (the | character) allows you to feed the output of one command directly into another. Suppose you want to output the sorted contents of a file to the terminal. You can do this by ‘piping’ the output of cat to the command sort: cat pe.txt | sort. Truthfully, you would have gotten the same output by entering sort pe.txt, without the pipe. If you want to sort a file numerically, use the -n option (sort -n).
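The difference between the default sort and sort -n is easiest to see on a small example. The file contents below are made up (the real pe.txt from the tutorial download will differ):

```shell
# Create two small files in a throwaway directory.
work=$(mktemp -d)
cd "$work"
printf 'pepperoni\nanchovy\nmushroom\n' > pe.txt
printf '10\n9\n100\n' > counts.txt

# Pipe the output of cat into sort: anchovy, mushroom, pepperoni
cat pe.txt | sort

# The default sort compares text character by character: 10, 100, 9
sort counts.txt

# -n compares by numeric value: 9, 10, 100
sort -n counts.txt

# Pipes can be chained; this keeps only the first line of the sorted output: anchovy
cat pe.txt | sort | head -n 1
```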

Practice:

Without opening any files:

  1. Combine wc with sort to list the files from those with the fewest lines to the most lines.
  2. Same as above, but list the files from those with the fewest characters to the most characters. (Hint: remember you can use --help after a command to see additional options!)

history and !

As you are working you may find that you’ve forgotten a command or sequence and want to double check that you didn’t accidentally delete the wrong file, or you may want to run a command again. Typing history will return a list of all the commands run in that terminal window or shell. If this list is long, you can pipe the list to more:

$ history | more

If you would like to re-run a command in your history, you can use ! followed by the command number in the history. For instance, if you want to rerun command 37 in the list generated by history:

$ !37

will issue that same command. The ! command has a lot of functionality and may be handy in certain situations; see more at https://www.redhat.com/sysadmin/bash-bang-commands

Wildcards * , ? and []

When dealing with files, using wildcards can be a useful way to handle several files at once. If you would like to list all files ending in “.tar” in your current directory for instance, you can use

$ ls *.tar

The string *.tar allows the command to act on any file whose name ends in “.tar”. The * may be placed in one or more places in a string:

$ ls *a*
$ ls flatbread/*/*txt

The first command lists all files with an “a” somewhere in the filename; the second command lists all files ending in “txt” in all of the subdirectories of “flatbread”.

You can use ? as a wildcard for single characters:

$ ls flatbread/???_pizza/
$ ls flatbread/???_pizza/*txt

The first command lists everything inside the subdirectories of “flatbread” whose names are any three characters followed by “_pizza”; the second lists only the files ending in “txt” in those subdirectories.

Finally, [] can be used to list files with a range of letters or numbers. This command can be useful when managing lists of files:

$ ls flatbread/white_pizza/[p-s]*txt

WARNING: Be extremely careful when using wildcards when deleting files!
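One cautious habit is to preview what a wildcard matches with ls or echo before handing the same pattern to a command that deletes things. The sketch below builds a miniature flatbread tree with invented filenames in a scratch directory:

```shell
# Build a small directory tree of empty files in a throwaway location.
work=$(mktemp -d)
cd "$work"
mkdir -p flatbread/white_pizza flatbread/red_pizza
touch flatbread/white_pizza/parmesan.txt flatbread/white_pizza/ranch.txt
touch flatbread/white_pizza/garlic.txt flatbread/red_pizza/marinara.txt

# * matches any number of characters: all four .txt files
ls flatbread/*/*txt

# ??? matches exactly three characters: only red_pizza qualifies
ls flatbread/???_pizza/

# [p-s] matches one character in the range p to s: parmesan.txt and ranch.txt
ls flatbread/white_pizza/[p-s]*txt

# Preview a deletion: echo prints the command that would run without running it.
echo rm flatbread/white_pizza/[p-s]*txt
```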

To learn more about wildcards, visit https://www.warp.dev/terminus/linux-wildcards

Replacing Characters with ^

If you are issuing a series of commands and you make a typo, you can replace a string in the command you just issued using ^. This can be especially handy if you’re dealing with file or directory names:

$ cat flatbread/white_pizza/ransh.txt
cat: flatbread/white_pizza/ransh.txt: No such file or directory
$ ^ransh^ranch

This will replace the first instance of “ransh” in the previous command with “ranch” and run the corrected command.

Connecting to a remote machine

(This section came from a previous tutorial put together by Drew Newman and Mahdi Qezlou - thank you!)

This summer, some of you may be asked to connect to a remote machine or download/upload files to/from it. You can do it all from your terminal via the ssh (secure shell) command.

NOTE: To connect to computers on the Carnegie network from off campus, you need to be connected via the VPN. If you need this access, please ask one of us or Chris Burns and we will get it set up.

First, you need to know the domain, username, and password of the remote machine. If the remote machine is a high-performance computing center like Mies in Carnegie, ask the support group to share this info with you. You can also set up ssh access to a desktop located somewhere else. The simplest working example of how to use ssh is:

$ ssh username@domain

For example:

$ ssh amatthews@ssh.obs.carnegiescience.edu

It will ask for your password and then connect you to the shell on the remote machine. From then on, you are using the terminal on that machine.

Transferring files to/from the remote machine:

If you need to transfer files between your local computer and the remote machine, you can use a command like rsync. This command is based on ssh; therefore, the username and password are the same. rsync updates the desired files on your local or remote machine, and any file that already exists in the destination directory and is the same as the new one will not be transferred again. Example transferring from your local machine to a remote machine:

$ rsync -a -P path/to/local/files amatthews@ssh.obs.carnegiescience.edu:/path/to/remote/files

Example transferring from a remote machine to your local one:

$ rsync -a -P amatthews@ssh.obs.carnegiescience.edu:/path/to/remote/files /path/to/local/destination

The -a (archive) option copies directories recursively and preserves info on the files like ownership, groups, permissions, and modification time. The -P option shows a progress bar and permits partial transfer in case your network cuts off. Once reconnected, running rsync again transfers the remaining portion of the files.

Additional Resources

  • A long list of UNIX commands.
  • cheat is a package you can download that enables you to type a UNIX command right in the terminal and see the definition, how it is used, and various examples (it’s really useful).
    • You can download cheat using the package manager Homebrew.
    • If you don’t have Homebrew installed, follow the instructions on the homepage to do so.
    • To install cheat:
        brew install cheat
      
    • Or you can look at additional ways to install it here.
    • Here is an example of what the cheat command outputs when you use it. It gives you not only directions for use, but real examples with explanations. NB: tail -f can be very useful if you’re having a program update a file and you want to check on its progress! [Image: cheat example]
  • If you need to repeat many commands or sequences of commands, especially complex ones, you might consider writing a shell script, a text file that can do this for you. See e.g. https://www.tutorialspoint.com/unix/shell_scripting.htm
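As a taste of what a shell script looks like, here is a minimal sketch. The script name count_lines.sh and its job (reporting the number of lines in each .txt file of a directory) are invented for illustration. The example writes the script to a scratch directory and then runs it:

```shell
# Work in a throwaway directory.
work=$(mktemp -d)
cd "$work"

# Write the script to a file. Everything between the two EOF markers goes into the file as is.
cat > count_lines.sh << 'EOF'
#!/bin/bash
# Print the number of lines in every .txt file in a directory.
# Usage: bash count_lines.sh [directory]   (defaults to the current directory)
dir="${1:-.}"
for f in "$dir"/*.txt; do
    [ -e "$f" ] || continue    # skip if nothing matched the wildcard
    echo "$f: $(wc -l < "$f" | tr -d ' ') lines"
done
EOF

# Make a two-line test file and run the script on the current directory.
printf 'a\nb\n' > demo.txt
bash count_lines.sh .
```

After chmod u+x count_lines.sh you could also run the script directly as ./count_lines.sh.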