Command Line Basics

Logging in

Start up Terminal (or PuTTY). If you have an RCDS account:

ssh username@marvin.ibest.uidaho.edu

If not, an account will have been created for you on our classroom servers:

ssh username@jayne.ibest.uidaho.edu


You should see our Message Of The Day (MOTD):

Benjamins-iMac:chef-cookbooks boswald$ ssh benji@marvin.ibest.uidaho.edu
benji@marvin.ibest.uidaho.edu's password:

   IIDS RESEARCH COMPUTING AND DATA SERVICES
   WARNING: To protect the system from unauthorized use and to
   ensure that the system is functioning properly, activities
   on this system are monitored recorded and subject to audit.
   Use of this system is expressed consent to such monitoring
   and recording. Any unauthorized access or use of this system
   is prohibited and subject to criminal and civil penalties.

   +-----------------------------------------------+
   |        Current Standalone System Usage        |
   +-----------------------------------------------+
   |          SERVER   MEM Usage(%)   CPU Usage(%) |
   |           crick           2.48           0.00 |
   |            ford           3.27           0.29 |
   |          marvin           5.87           6.22 |
   |          slarti           7.68           0.64 |
   |           tesla          24.15           0.68 |
   |        trillian           4.00           0.07 |
   |          watson           1.29           2.00 |
   |           whale           1.53           1.70 |
   |          zaphod          29.41           0.11 |
   +-----------------------------------------------+

   SYSTEM:  marvin.ibest.uidaho.edu
   CORES:   16
   MEMORY:  60445 MB
   SUMMARY: (collected Thu Sep  8 16:05:01 PDT 2016)
      * CPU Usage (total average) = 6.41%
      * Memory used (real)        = 14010 MB
      * Memory free (cache)       = 56634 MB
      * Swap in use               = 872 MB
      * Load average              = 1.00  1.00  1.00

   QUESTIONS: Submit all questions, requests, and system issues
   to: comp-core@uidaho.edu

  Good afternoon Benjamin

benji@marvin ~ $

Note the server name you just logged into in the usage table. If it’s especially busy (CPU Usage > 70%), you’ll be better off logging into a different server.

Your Home Directory

When you log in, you are brought to your home directory by default:

pwd

You should get the response:

/mnt/home/your_user_name

No matter which server you log into, your home directory will be the same. This is the magic of distributed file systems.

Bash

By default you are running in the Bash Shell, which is how you interact with the file system, start programs etc. If you want to search for a command, include the word bash in your query. For example you could google ‘bash create directory path’. Here are some of the most common and useful bash commands:

  • ls - show me the contents of the current directory

  • mkdir <dir_name> - create a new directory

    mkdir your_name_here

  • cd <dir_name> - change directory

    cd your_name_here

  • cd .. - change to the parent directory

Example:

benji@marvin ~ $ mkdir workshop
benji@marvin ~ $ cd workshop
benji@marvin ~/workshop $ cd ..

  • nano <filename> - edit (and create if necessary) a file

    nano somefile.txt

  • rm <filename> - delete a file

  • mv <filename> <destination> - move or rename a file

  • man <command> - show help documentation

  • which <command> - locate the actual executable file of a command (and test whether it exists)

  • top - show system utilization

  • cat <filename> - print the contents of a file to screen (std out)

  • less <filename> - show the contents of a file interactively
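The commands above can be tried together in a throwaway directory. The following is a minimal sketch: the directory and file names (scratch, notes.txt, renamed.txt) are made up for illustration, and mktemp -d is used so that nothing in your home directory is touched.

```shell
# Work in a temporary directory so nothing important is touched
workdir=$(mktemp -d)
cd "$workdir"

mkdir scratch                 # create a new directory
cd scratch                    # move into it
echo "hello" > notes.txt      # create a small file (instead of opening nano)
cat notes.txt                 # print its contents to the screen
mv notes.txt renamed.txt      # rename the file
listing=$(ls)                 # capture the directory listing in a variable
echo "$listing"
rm renamed.txt                # delete the file
cd ..                         # change back to the parent directory
```

After the last line you are back in the temporary directory, and scratch is empty again.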

Getting data to the server

To download data directly from the internet, use wget

Let’s get some data to work with: Mycobacterium tuberculosis 16S ribosomal RNA sequences.

wget -nc http://www.hpc.uidaho.edu/example-data/Myco.tb.fasta

If your data is on your local computer, you can scp the data to the server:

scp /path/to/local/data user@server.domain.com:/path/to/destination

Note: scp does not deal well with spaces in paths or filenames.


Using the RCDS’s installed software

We use the environment modules package to dynamically load and unload most of the specialized software. To list the available modules:

module avail

To load a module use:

module load module_name/[version]

For example:

module load clustalw

And run:

clustalw2 Myco.tb.fasta -output=nexus

Now that we’ve aligned the sequences - let’s make a tree using MrBayes. First create a MrBayes command file mb.run with nano that contains:

begin mrbayes;
   set autoclose=yes nowarn=yes;
   execute Myco.tb.nxs;
   lset nst=6 rates=gamma;
   mcmc nruns=1 ngen=10000 samplefreq=10 file=Myco.tb.mbout1;
   sumt burnin=500;
end;

Load the MrBayes module and run that file

module load mrbayes
mb mb.run

MrBayes chokes on our nexus file because it doesn’t like ‘|’ characters in the species names. So let’s use some command line voodoo to change those things.

head -n 25 Myco.tb.nxs
head -n 25 Myco.tb.nxs | sed -r "s/gi\|.*gb\|(.*)\.1\|(.*)/\1\2/g"
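To see what the expression does before applying it to the whole file, it can help to run it on a single line. This sketch uses a made-up taxon line (the gi number 12345 and the sequence fragment are assumptions, not taken from Myco.tb.nxs) and the -E flag, which GNU sed treats the same as -r.

```shell
# A hypothetical line in the style of the names MrBayes rejects
line='gi|12345|gb|AF498004.1|   ACGT--ACGT'

# gi\|.*gb\|   matches everything up to and including "gb|"
# (.*)\.1\|    captures the accession as group 1 and drops the ".1|" suffix
# (.*)         captures the rest of the line as group 2
cleaned=$(printf '%s\n' "$line" | sed -E 's/gi\|.*gb\|(.*)\.1\|(.*)/\1\2/g')
echo "$cleaned"
```

Only the accession and the sequence remain, with no | characters left in the name.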

You’ll probably need to spend some time with the regular expression to make it work. Once it’s tuned, perform the actual substitution and edit the mb.run file from above to use the new file:

cat Myco.tb.nxs | sed -r "s/gi\|.*gb\|(.*)\.1\|(.*)/\1\2/g" > Myco.tb.mb.nxs
nano mb.run

The | character is commonly called a pipe; it takes the output of one command and passes it as input to the next. The cat command by default prints to STDOUT (the screen), but we ‘piped’ it to sed, which is a character stream editor. The sed command also prints to STDOUT by default, but we redirected its output to a file with the > operator. The > operator creates or overwrites whatever file name follows it - without warning! - so you need to be careful using it. To append to the end of a file, use the >> operator.
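The difference between overwriting and appending is easy to check in a scratch directory. This is a small sketch with made-up file names (demo.txt, other.txt), not part of the workshop data:

```shell
workdir=$(mktemp -d)
cd "$workdir"

echo "first line" > demo.txt     # > creates the file (or overwrites it)
echo "second line" >> demo.txt   # >> appends to the end of the file

echo "replaced" > other.txt
echo "again" > other.txt         # other.txt now holds only "again"

# Pipe: the output of cat becomes the input of wc, which counts the lines
lines=$(cat demo.txt | wc -l)
echo "demo.txt has $lines lines"
cat other.txt
```

If accidental overwrites are a worry, Bash has a noclobber option (set -o noclobber) that makes > refuse to overwrite an existing file.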

Try the mrbayes run again

mb mb.run

It works!

Another useful command is grep, which is mostly used for searching through files. For example, we could effectively pull out the sequence data for one of the species in our alignment file with:

grep "AF498004" Myco.tb.mb.nxs

benji@marvin ~/workshop $ grep "AF498004" Myco.tb.mb.nxs
AF498004   --------------------------------------------------
AF498004   --------------------------------------------------
AF498004   --------------CTNAAATGAGAGTTTGATCCTGGCTCAGGNCCGAAC
AF498004   GCTGGCGGCGTGCTTA-ACNCATGCTAGTCGNACGCA---AAGGTCTCCT
AF498004   CGGAGAT--------TCTCGA----GT-GGCGANCGGGTGAGTAACA--C
AF498004   GTGGGTGATCTGCCGTGCATTCGGGATAAGCCTGGGAA--NCTGGGTCTA
AF498004   ATACCGGATAGGACCCCGGAATGCATNNCCTGTGGTGTNTANCGNTTAGC
AF498004   GNNATGGGATGAGCCCGNG------CTATGCGCTGTTGTGGNGTC-TCGT
AF498004   C-CNCCNTNCCCGNCCNGTNGCNGGCAAAANTNGNNNGGGATTTTCCAAA
AF498004   AGGGTTTC-CAAAGGNNNTNAAA---------------------------
AF498004   --------------------------------------------------
AF498004   --------------------------------------------------
AF498004   -----------------------------------------------
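A few grep options come up often enough to be worth trying. The sketch below runs them on a tiny stand-in file; the second name (X52917) and all of the sequence fragments are made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A three-line stand-in for an alignment file
printf '%s\n' 'AF498004   ACGT' 'X52917     TTGA' 'AF498004   GGCC' > sample.txt

matches=$(grep -c "AF498004" sample.txt)    # -c counts the matching lines
others=$(grep -v "AF498004" sample.txt)     # -v prints the lines that do NOT match
numbered=$(grep -n "X52917" sample.txt)     # -n prefixes each match with its line number

echo "$matches"
echo "$others"
echo "$numbered"
```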

We’re starting to generate a bunch of files, so let’s clean up. First make a new directory and then move files into it using the glob or * character.

mkdir Myco
mv Myco.tb.* Myco/

The * character is a wildcard - it will match any character(s). Now let’s create a single compressed archive of all the Myco.tb files.

tar -cjf Myco.tar.bz2 Myco

The tar (tape archive) command used with the -c option creates an archive (single file) from a directory or a list of files. The -j option specifies that we want the archive compressed to save disk space, and the -f option is for specifying the archive file name. Note that tar does not remove the original files; you have to do that explicitly:

rm -rf Myco
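Because rm -rf cannot be undone, it can be worth listing the contents of an archive before deleting the originals. The sketch below shows that check on a made-up directory (Demo). It uses -z (gzip) instead of -j (bzip2) only because gzip is more widely installed; the -t listing step works the same way with -j.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir Demo
echo "ACGT" > Demo/a.txt
echo "TTGA" > Demo/b.txt

tar -czf Demo.tar.gz Demo           # create a gzip-compressed archive
contents=$(tar -tzf Demo.tar.gz)    # -t lists the archive without extracting it
echo "$contents"

# Delete the originals only if the archive could be read back
if tar -tzf Demo.tar.gz > /dev/null; then
  rm -rf Demo
fi

tar -xzf Demo.tar.gz                # restore the directory from the archive
restored=$(cat Demo/a.txt)
echo "$restored"
```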

The archive file is now easy to retrieve using scp. Let’s un-archive those files and work with them some more:

tar -xf Myco.tar.bz2
cd Myco

Screen

If you were really wanting to create a good phylogeny for these Mycobacterium, you would want to run MrBayes for much longer. However, you also don’t want to have to stay logged into the RCDS servers while the program is running. In order to keep a program running without being logged in, use the screen command.

screen

What happened? You’re now running commands inside a new interactive shell that you can detach from by pressing Ctrl-A then Ctrl-D. Let’s start a longer running MrBayes job: edit your mb.run file so that the number of generations is 200000, and then start it.

nano mb.run
mb mb.run

Now detach from the screen with Ctrl-A then Ctrl-D

benji@marvin ~/workshop/Myco $ screen
[detached]
benji@marvin ~/workshop/Myco $

And you should be able to see your MrBayes job still running by using the top (or htop) command:

top -u benji

At this point you could log out, and MrBayes would continue on until it finished. You reattach with:

screen -r

When you’re done with a screen, close it out by typing exit when you’re in the screen.

Now is as good a time as any to say - be a good computational neighbor. Our servers are shared by many researchers, so please don’t start a computationally intensive job on a server that is already really busy (pay attention to the MOTD, or use top/htop). The servers will cope relatively well with an overloaded processor, but if you run them out of memory, they first slow down markedly as they start using hard disk space to offload memory (called swapping). If memory still runs out, the Linux kernel invokes the OOM (out of memory) killer, which kills processes - usually the ones using the most memory, and not necessarily yours - in a last ditch effort to keep the system from freezing completely.

If you accidentally start a process and want to stop it, press Ctrl-C (when it’s running interactively). If you know the process id (from top), you can stop it with the kill command:

kill 2345
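The process id does not have to come from top. For a job started in the background from the same shell, the special variable $! holds it. A minimal sketch, using sleep as a stand-in for a long-running program:

```shell
sleep 300 &            # start a long-running process in the background
pid=$!                 # $! holds the process id of the last background job
kill "$pid"            # send the default signal (TERM), asking it to stop
wait "$pid"            # collect the exit status of the stopped process
status=$?
echo "process $pid ended with status $status"
```

A status above 128 means the process was ended by a signal rather than finishing on its own. If a process ignores the default signal, kill -9 followed by the process id forces it to stop, at the cost of not letting it clean up after itself.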

Command History

If you use the up and down arrow keys, you can scroll through all the commands you’ve previously entered. To see a list of all the commands you’ve entered, use the history command. When you have a bunch of commands in your history (it will store about 1000), pipe the history through another command like less or tail.

history
history | less
history | tail -n 40

Programming in Bash

You can write a program to do whatever you want using only Bash - but the syntax is a bit different than most other programming languages. First and foremost, spaces matter. In most programming languages the following three lines are equivalent:

a=10
a = 10
a= 10

In Bash, only the first is correct. Once you assign a value to a variable, refer to it by prepending a $

echo $a

The echo command simply means print to the screen (STDOUT). If you just enter $a, the Bash shell will try to run the command 10 (and give you an error). Similarly, only the first of these commands will work:

if [ $a -lt 11 ]; then echo "less than eleven"; fi
if[ $a -lt 11]; then echo "less than eleven"; fi
if [$a -lt 11]; then echo "less than eleven"; fi
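The spacing rules make more sense once you know that [ is itself a command and ] is its final argument, so each piece has to be a separate word. A small sketch (the variable names a, b and result are arbitrary):

```shell
a=10                     # no spaces around = in an assignment
b="value with spaces"    # quote values that contain spaces
echo "$a"
echo "$b"
echo "${a}th"            # braces mark where the variable name ends

# [ is a command; "$a", -lt, 11 and ] are its arguments, hence the spaces
if [ "$a" -lt 11 ]; then
  result="less than eleven"
else
  result="eleven or more"
fi
echo "$result"
```

Putting double quotes around "$a" inside the test keeps the command well formed even if the variable turns out to be empty.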

Let’s experiment with looping and conditionals. First, let’s create a new directory

cd ..
mkdir bashfun
cd bashfun

Now let’s create a bunch of input files from the built in $RANDOM variable

for i in {1..50}; do echo $RANDOM > num.$i; done

As an exercise, we’ll now create two directories and sort the files by whether the numbers in them are even or odd.

mkdir even odd
for nf in $(ls num.*); do rn=$(cat $nf); if [ $(expr $rn % 2) -eq 0 ]; then mv $nf even/ ; else mv $nf odd; fi ; done
ls even
ls odd
cat even/*
cat odd/*

Let’s deconstruct the above for statement:

                       # when you wrap text in a $(), that tells Bash to execute the commands within
for nf in $(ls num.*)  # list all the files that start with num. and loop over them 
 do                    # starts the execution loop
   rn=$(cat $nf)       # read the file with the name stored in nf and store it as rn
                       # this really only works when the file contains a single line
   if [ $(expr $rn % 2) -eq 0 ] # expr tells Bash to do mathematical operations
     then                       # % means modulo, or the remainder of integer division
                                # compare numbers in Bash with -lt -gt and -eq
      mv $nf even/     # move the file to the even directory
   else                # the above if returned false, so
      mv $nf odd;      # move the file to the odd directory
   fi                  # end if command
done                   # end for loop
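The loop above works for this exercise, but it relies on parsing the output of ls, which breaks on file names that contain spaces. The variant below is offered as a sketch rather than a replacement: it lets the shell expand the glob directly, quotes every variable, and uses the built-in $(( )) arithmetic instead of calling expr. The fixed numbers written to the files are only there to make the result predictable.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir even odd

# Predictable input instead of $RANDOM
echo 7 > num.1
echo 12 > num.2
echo 9 > num.3

for nf in num.*; do                 # the shell expands the glob itself
  read -r rn < "$nf"                # read the first line of the file into rn
  if [ $(( rn % 2 )) -eq 0 ]; then  # $(( )) does integer arithmetic in the shell
    mv "$nf" even/
  else
    mv "$nf" odd/
  fi
done

ls even
ls odd
```

With these inputs, num.2 ends up in even and the other two files in odd.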

Scripts

All of the above commands could be put into a script, and then executed repeatedly. Here’s what that would look like:

#!/bin/bash

# create the output directories (no error if they already exist)
mkdir -p even odd

# create a bunch of random numbers
for i in {1..50}; do echo $RANDOM > num.$i; done

# sort them
for nf in $(ls num.*); do
  rn=$(cat $nf)
  if [ $(expr $rn % 2) -eq 0 ]; then
    mv $nf even/
  else
    mv $nf odd/
  fi
done

Create a file named random_sort.sh with the above script. The first line is called the shebang line, and indicates what interpreter to use to run the script - in our case Bash (other options could be python or perl etc…). Comment lines start with a #, and are skipped over by Bash. To make this script executable, we need to set the executable bit:

chmod +x random_sort.sh

Then we can execute it with:

./random_sort.sh

Why the ./? This tells Bash to look in the current directory for the executable, which it would otherwise not do - because it is a security risk. Bash looks for executables in the $PATH. To see what directories are currently in the $PATH, we can just echo it out.

echo $PATH

If you’ve still got the modules from above loaded, you should see their directories listed. Unload the modules and see how the $PATH changes.

module unload mrbayes
echo $PATH

Mostly, the module command just manipulates your $PATH (It also can set other environment variables and load other modules).
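The effect of a $PATH change can be seen without involving modules at all. In this sketch, a tiny script is placed in a directory of its own and that directory is prepended to $PATH by hand; the script name hello_demo and the directory layout are made up for illustration.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/bin"

# A one-line script in its own directory (plain sh is enough here)
cat > "$workdir/bin/hello_demo" <<'EOF'
#!/bin/sh
echo "hello from hello_demo"
EOF
chmod +x "$workdir/bin/hello_demo"

# Prepend the directory to $PATH for this shell session only
PATH="$workdir/bin:$PATH"

# One directory per line is easier to read than the raw variable
echo "$PATH" | tr ':' '\n'

# The script is now found by name, without ./ or a full path
output=$(hello_demo)
echo "$output"
```

The change lasts only as long as the current shell; a new login starts again with the default $PATH.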

Let’s modify our script to accept a command line argument - the number of random number files to generate.

#!/bin/bash

# exit with an error if no argument was given
if [ -z "$1" ]; then
  echo "You need to enter a number"
  exit 1
fi

# create the output directories (no error if they already exist)
mkdir -p even odd

# create a bunch of random numbers
for i in $(seq 1 $1); do echo $RANDOM > num.$i; done

# sort them
for nf in $(ls num.*); do
  rn=$(cat $nf)
  if [ $(expr $rn % 2) -eq 0 ]; then
    mv $nf even/
  else
    mv $nf odd/
  fi
done

Command line arguments are passed to a script in the variables $1, $2, $3 … etc. (the variable $0 contains the name of the script/command). At the top of the script we check to see if the $1 variable is empty (-z), and if it is the script exits. Now if we run our script with a number, it will generate that many files.

./random_sort.sh 10
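The -z test only catches a missing argument; something like ./random_sort.sh abc would still reach seq and fail there. One way to check that the argument is a positive whole number is a case pattern, sketched below. The function name is_positive_int is made up for this example.

```shell
is_positive_int() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
    *) [ "$1" -gt 0 ] ;;       # all digits: accept anything greater than zero
  esac
}

# Try it on a few values
for arg in 10 abc 0 ""; do
  if is_positive_int "$arg"; then
    echo "'$arg' is a usable count"
  else
    echo "'$arg' is not a usable count"
  fi
done
```

In a script, the same function could replace the -z check: call it with "$1" and exit with an error message when it fails.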

Of course, there are more advanced methods to parse command line arguments.
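One such method is the getopts builtin, which handles short options such as -n 25. The sketch below is an illustration only: the option letters -n and -d, the function name parse_args, and the default values are all assumptions, not part of random_sort.sh.

```shell
parse_args() {
  count=10       # default number of files (hypothetical)
  outdir=.       # default output directory (hypothetical)
  OPTIND=1       # reset so the function can be called more than once
  while getopts "n:d:" opt; do
    case "$opt" in
      n) count=$OPTARG ;;    # -n takes a value, hence the colon after n
      d) outdir=$OPTARG ;;   # -d takes a value as well
      *) echo "usage: parse_args [-n count] [-d dir]" >&2
         return 1 ;;
    esac
  done
}

parse_args -n 25 -d results
echo "$count files in $outdir"
```

In a real script the loop would usually sit at the top level and parse the script's own arguments, followed by shift $((OPTIND - 1)) to drop the options that were consumed.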

Practice exercises:

  • modify the random_sort.sh script to print out the mean of the evens and the mean of the odds
  • modify the random_sort.sh script to create two files - one of the even numbers and one of the odd numbers, and sort the numbers in each file