September 2017
This project is maintained by UofABioinformaticsHub
Now we’ve learned how to navigate directories in bash,
let’s move on to some useful tools, tricks and commands.
A very helpful & time-saving tool in bash
is the ability to automatically complete a command, file or directory name using the <tab>
key.
Move to the directory above the Bash_Workshop
directory using the cd
command.
(If you’re already in the Bash_Workshop
directory, you’ll simply need cd ..
).
Now try typing ls Bash
& then hit the <tab>
key.
Notice how Bash_Workshop
is completed automatically!
This functionality will automatically fill as far as it can until conflicting options are reached.
In this case, there was only one option so it was able to complete all the way to the end of the file path.
Where multiple options are present, you can hit the <tab>
key twice and all options will be given to you.
This can be used to also find command names.
Type in he
followed by two strikes of the <tab>
key and it will show you all of the commands that begin with the string he
, such as head
, help
or any others that may be installed on your computer.
If we’d hit the <tab>
key after typing hea
, then the command head
would have auto-completed, although clearly this wouldn’t have saved you any typing.
So far we have explored the commands pwd
, cd
, ls
& man
as well as the pager less
.
Inspect the man
pages for the commands in the following table & fill in the appropriate fields.
Have a look at the useful options & try to understand what they will do if specified when invoking the command.
Write your answers on a piece of paper, or in a plain text file.
Command | Description of function | Useful options |
---|---|---|
man | Display on-line manual | -k |
pwd | Print working directory, i.e. show where you are | none commonly used |
ls | List contents of a directory | -a, -h, -l |
cd | Change directory | (scroll down in man builtins to find cd) |
mv | | -b, -f, -u |
cp | | -b, -f, -u |
rm | | -r (careful…) |
rmdir | | |
mkdir | | -p |
cat | | |
less | | |
wc | | -l |
head | | -n# (e.g., -n100) |
tail | | -n# (e.g., -n100) |
echo | | -e |
cut | | -d, -f, -s |
sort | | |
uniq | | -c |
wget | | |
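Sometimes the side effects of a command can also be useful.
For example, the command touch is mainly intended to update a file’s timestamps, but if the named file doesn’t exist it creates an empty one, so we can create an empty file using the command string touch filename.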
To demonstrate this, let’s create an empty text file.
touch blank.txt
We can read it, but it won’t have anything in it, as it’s an empty file.
cat blank.txt
Let’s delete it as it’s really a bit of a pointless file.
rm blank.txt
All the information we’ve seen in the terminal so far has been from a data stream known as standard output, or stdout
for short.
There are two primary data streams in play when we use commands in bash
.
The first as we’ve seen is stdout
, with the alternative stream being standard error, or stderr
for short.
This is where commands and tools send their error messages.
We’ll ignore that for the rest of the day, but it’s good to know it exists.
When a command sends information to us via stdout
, we refer to this as printing to stdout
.
This dates back to the days before everyone had printers, when printing to the screen was the main method of interacting with computers.
We can display a line of plain text in stdout
by using the command echo
.
The simplest program that people learn to write in most languages is called Hello World
and we’ll do the same thing today.
echo 'Hello World'
That’s pretty amazing, isn’t it? You can make the terminal window say anything you want without meaning it.
echo 'This computer will self destruct in 10 seconds!'
There are a few subtleties about text which are worth noting. Inspect the man echo page & note the effects of the -e option. This allows you to specify tabs, new lines & other special characters by using the backslash to signify these characters. This is an important concept & the use of a backslash to escape the normal meaning of a character is very common. Try the following three commands & see what effects these special characters have.
echo 'Hello\tWorld'
echo -e 'Hello\tWorld'
echo -e 'Hello\nWorld'
As we’ve seen above, the command echo
just repeats any subsequent text.
Now enter
echo ~
Why did this happen?
Although this may have seemed trivial, we often include lines like the above in scripts to provide messages about the progress of our tasks.
As we’ll see later today, echo
is actually a very heavily used command.
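As a rough sketch of the progress messages mentioned above, the lines below show how echo might be used between the steps of a task. The file name progress_demo.txt and the two steps are made up purely for illustration.

```shell
# Hypothetical two-step task; progress_demo.txt is a made-up file name
echo 'Step 1 of 2: creating an empty file'
touch progress_demo.txt
echo 'Step 2 of 2: deleting it again'
rm progress_demo.txt
echo 'All steps finished'
```

Each message appears in the terminal as its step begins, which makes it easier to follow where a longer task is up to.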
Sending stdout To a File
Instead of just sending the output of a command to stdout
, we can redirect this output into a file using the >
symbol.
If the file doesn’t exist, bash
will simply create the specified file and write the output into it.
If the file does exist, it will be immediately overwritten without any warnings.
Let’s see the >
symbol in action.
First, let’s make sure we’re in ~/Bash_Workshop
cd ~/Bash_Workshop
Now we’ll combine the echo
command with the redirection:
echo 'Hello' > hello.txt
Notice that the word Hello
was no longer printed to your screen.
Instead a file called hello.txt
was created and we can simply view the entire contents of the file using the command cat
cat hello.txt
If we wish to add any additional information to the file, we can use the >>
symbol which appends the new information to the file.
Again, if the file doesn’t already exist it will be created, but this time if the file does exist, the file will not be overwritten, but rather the new information will be added to the end of the existing information.
Let’s see this in action.
echo "It's me." >> hello.txt
cat hello.txt
We can also check how many lines we have in this file using the command wc
.
wc hello.txt
In the previous line, what do the three numbers represent?
Did you remember to use tab auto-complete in the above commands?
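The difference between > and >> can be seen side by side in a small sketch. The file name redirect_demo.txt is an assumption chosen so that nothing else in your folder is touched.

```shell
# redirect_demo.txt is a made-up file name used only for this sketch
echo 'first' > redirect_demo.txt     # the file is created and holds: first
echo 'second' > redirect_demo.txt    # overwritten, so it now holds only: second
echo 'third' >> redirect_demo.txt    # appended, so it now holds: second, then third
cat redirect_demo.txt
```

Once you have looked at the result, the file can be deleted with rm redirect_demo.txt.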
Sending stdout to Another Command
We’ve just seen how we can take stdout
and send it to a file, but we can also send it to another command.
This is one of the more common things you will do in bash
and we can perform complex manipulations on the data within a file, without ever opening the file in the manner you would be most familiar with.
To send stdout
to another command, we simply use the pipe symbol (|).
As a simple example, we could take the output from a long ls
command and pipe it into less
for easier browsing.
The following command will usually print more output than your terminal window can show at once.
ls -R /usr/bin
If we wanted to keep all this for closer examination, we could send it to less
instead of having all of the information dumped on the screen.
ls -R /usr/bin | less
Page through this for a while, then when you’re bored quit using the q
key.
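Pipes are not limited to pagers. As a minimal sketch, assuming /usr/bin exists and holds at least five entries on your system, the same kind of listing can be sent to head or wc instead:

```shell
# Show only the first five entries instead of the whole listing
ls /usr/bin | head -n5
# Count the entries (ls writes one name per line when its output is piped)
ls /usr/bin | wc -l
```

Neither command needs q to exit, because head and wc simply print their result and finish.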
Copying Files (cp)
Copying a file using bash
is so easy it hardly requires any effort.
The command simply follows the syntax cp sourceFile destinationFile
.
By way of a simple example:
cp hello.txt hello.txt-copy
We don’t need these any more, so let’s be brave and delete both.
rm hello*
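A copy is a separate file, so later changes to one do not affect the other. The sketch below illustrates this with two made-up file names, cp_demo.txt and cp_demo_copy.txt.

```shell
# cp_demo.txt and cp_demo_copy.txt are made-up names for this sketch
echo 'Hello' > cp_demo.txt
cp cp_demo.txt cp_demo_copy.txt
echo 'Only in the copy' >> cp_demo_copy.txt   # change the copy only
cat cp_demo.txt                               # the original still holds just: Hello
cat cp_demo_copy.txt                          # the copy holds both lines
```

Both files can be removed afterwards with rm cp_demo*.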
Downloading Files (wget)
Later today, we’re going to look through a file containing a list of words.
Let’s download this from the internet and place it in our Bash_Workshop
folder, using the command wget
, which stands for web get
.
(Feel free to copy & paste the URL to the file. It’s pretty long.)
cd ~/Bash_Workshop
wget https://uofabioinformaticshub.github.io/Intro-Bash-Sept-2017/files/american-english
Renaming Files (mv)
The name for this file isn’t as convenient as we’d like, so we can use the command mv
to rename it as the file words
instead of american-english
.
Note that under bash
to rename a file, we move it to the same folder, but with a different file name.
This is a slightly unconventional way to think about renaming, but once you get used to it, it does make sense.
mv american-english words
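The idea of renaming by moving can also be tried on a throwaway file first. In this sketch, mv_demo_old.txt and mv_demo_new.txt are made-up names.

```shell
# mv_demo_old.txt and mv_demo_new.txt are made-up names for this sketch
touch mv_demo_old.txt
mv mv_demo_old.txt mv_demo_new.txt
ls mv_demo_*        # only mv_demo_new.txt is listed, the old name is gone
```

Unlike cp, no second file is left behind. The same file simply has a new name.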
We can look at the first 5 lines of the file using
head -n5 words
Or we can look at the last 10 lines of the file using
tail -n10 words
We could even page through the file using less
. (Remember to hit q
to exit the pager.)
less words
We can even find how many lines there are in the file by using
wc -l words
We’ll come back to this file in the next section when we explore regular expressions.
In bioinformatics, we often have to deal with compressed files as it is far easier to transmit large files across network connections if they are compressed. The most common types of compression you will see are:
File Suffix | Compression Command | Extraction Command | Useful Arguments |
---|---|---|---|
.zip | zip | unzip | -d, -c, -f |
.gz | gzip | gunzip, zcat | -d, -c, -f |
.tar.gz | tar | tar | -x, -v, -f, -z |
.bz2 | bzip2 | bunzip2 | |
Often, files you download will be archived (tar: tape archive) and compressed (zipped).
If you see file name suffixes like .tar, .zip, .gz, and/or .bz2, among others, that is what these are.
To explore what these command-line options do, please check the man
or --help
pages.
A very useful trick is the use of the command zcat
.
This extracts a gzip
compressed file to stdout
leaving the original file unchanged.
From here we can pipe (|) into less
or head
to quickly look through a file before extracting.
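The zcat trick can be tried on a tiny file before using it on real data. This is only a sketch: zcat_demo.txt is a made-up file, and printf is used here simply to write three short lines into it.

```shell
# zcat_demo.txt is a made-up file; printf writes three short lines into it
printf 'line 1\nline 2\nline 3\n' > zcat_demo.txt
gzip -f zcat_demo.txt                 # replaces it with zcat_demo.txt.gz
zcat zcat_demo.txt.gz | head -n2      # peek at the first two lines
ls zcat_demo.txt.gz                   # the compressed file is still there
```

On some systems (macOS, for example) zcat may expect a different file suffix, in which case gunzip -c does the same job.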
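Let’s try a new task. However, this time you’ll have to think of how to execute the commands yourself.
1. Use the cd command to make sure you are in the folder Bash_Workshop.
2. Use wget to download the compressed gff file ftp://ftp.ensembl.org/pub/release-89/gff3/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.89.gff3.gz
3. Look at the first few lines of the compressed file using zcat and head along with the pipe symbol (|).
4. Extract the file using gunzip. (Hint: After typing gunzip, use tab auto-complete to add the file name.)
5. Rename the extracted file to dm6.gff using the command mv.
6. Look at the start of the file using the head command.
7. Look at just the first 5 lines using head -n5.
8. Look at the end of the file using tail.
9. Page through the file using less.
10. Count the lines in the file using wc -l.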
The file we have just downloaded begins with comment lines (starting with #
) then follows a column layout, with columns being separated by <tab>
markers.
As each line can be very long, they may wrap across more than one line.
Make sure you can spot this format though.
Extracting Columns (cut)
We can use the command cut
to cut one or more columns from this file.
Call up the man
page.
man cut
(And if you think that sounds violent, try man kill
or man killall
)
We introduced this earlier, and the options we’re interested in now are -f
and -s
.
Why is the -s
option relevant here?
The third column contains information about what type of feature is on each line. We can get just this column using the following.
cut -f3 -s dm6.gff
This will just dump all the information to stdout
.
In the next couple of sections we’ll sort then count these entries.
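Before moving on, note that cut splits on tabs unless told otherwise, which is why no delimiter was specified for the gff file. As a small sketch of the -d option from the earlier table, the made-up comma-separated file cut_demo.csv below is cut on commas instead.

```shell
# cut_demo.csv is a made-up file with comma-separated columns
printf 'gene1,chr2L,plus\ngene2,chr3R,minus\n' > cut_demo.csv
cut -d, -f2 cut_demo.csv      # second column only
cut -d, -f1,3 cut_demo.csv    # first and third columns
```

When more than one column is requested, cut keeps the delimiter between them in its output.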
Sorting (sort)
This command is one of the most intuitive to use so requires little explanation.
Let’s pipe the output of the previous cut command into the sort
command.
cut -f3 -s dm6.gff | sort
This has sorted all the types into alphabetical order.
Finding Unique Entries (uniq)
Now we have these sorted we can use the command uniq
to just return a single entry for each feature type.
cut -f3 -s dm6.gff | sort | uniq
This has given us a simple list of the different feature types in this file.
We can also count these using the -c
option.
cut -f3 -s dm6.gff | sort | uniq -c
What would happen if we didn’t include the sort
step?
What does this tell us about how the uniq
command works?
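If you would like to experiment with these questions on something small, one option is a toy file. The sketch below builds toy.gff, a made-up file with one comment line and five invented tab-separated records, then runs the same pipeline on it.

```shell
# toy.gff is a made-up file: one comment line, then tab-separated records
printf '##gff-version 3\n' > toy.gff
printf 'chr1\tdemo\tgene\t1\t100\n' >> toy.gff
printf 'chr1\tdemo\texon\t1\t50\n' >> toy.gff
printf 'chr1\tdemo\tgene\t200\t300\n' >> toy.gff
printf 'chr1\tdemo\texon\t200\t250\n' >> toy.gff
printf 'chr1\tdemo\texon\t260\t300\n' >> toy.gff

# The same pipeline as above, run on the toy file
cut -f3 -s toy.gff | sort | uniq -c
```

Because the file is so short, you can check the counts by eye, then remove one step at a time and compare the output.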