In this exercise you learn how to
vg construct
,vg view
,vg sim
,vg index
,vg map
.First you will need to activate the conda environment that we will be using.
conda activate variation
vg is a large collection of tools that is under very active development, to ensure we have access easily to the lastest versions of the software, we use the Docker. It is already available on the course workstations. If you want to bulid it on your laptop, follow the instructions at the vg homepage (note that building vg and all submodules from source can take ~1h).
To test if things are working smoothly, you can run:
docker run quay.io/vgteam/vg date
The command above will run the vg “container” if it already exist on your machine, update it if there’s a newer version, and download it if this is the first time you are to use the container.
In this exercise, you will use small toy examples from the test
directory. So make sure you have checked out vg:
cd data
git clone https://github.com/vgteam/vg.git
Now create a directory to work on for this tutorial:
mkdir exercise1
cp vg/test/tiny/tiny* exercise1/
cd exercise1
How do we give docker access to our files?
To simplify our lives, we are going to create an alias for docker’s call to vg:
alias vg="docker run -v $PWD:/data quay.io/vgteam/vg vg"
The above command replaces the long command with vg
. However, note that it binds always the local directory to /data
path inside the container.
Like many other toolkits, vg is comes with many different subcommands. First we will use vg construct
to build our first graph. Run it without parameters to get information on its usage:
vg construct
Let’s construct a graph from just one sequence in file tiny/tiny.fa
, which looks like this:
>x
CAAATAAGGCTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTG
To construct a graph, run
vg construct -r tiny.fa -m 32 > tiny.ref.vg
This will create a (very boring) graph that just consists of a linear chain of nodes, each with 32 characters.
The switch -m
tells vg to put at most 32 characters into each graph node. (You might want to run it with different values and observe the different results.) To visualize a graph, you can use vg view
. Per default, vg view
will output a graph in GFA format. By adding -j
or -d
, you can generate JSON or DOT output.
vg view /data/tiny.ref.vg
vg view -d /data/tiny.ref.vg
To work with the JSON output the tool jq comes in handy. To get all sequences in the graph, for instance, try
# Install jq:
conda install -c conda-forge jq
vg view -j /data/tiny.ref.vg | jq '.node[].sequence'
Next, we use graphviz to layout the graph representation in DOT format.
# Install dot:
conda install -c anaconda graphviz
vg view -d /data/tiny.ref.vg | dot -Tpdf -o tiny.ref.pdf
View the PDF and compare it to the input sequence. Now vary the parameter passed to -m
of vg construct
and visualize the result.
Ok, let’s build a new graph that has some variants built into it. First, take a look at at tiny/tiny.vcf.gz
, which contains variants in (gzipped) VCF format.
vg construct -r /data/tiny.fa -v /data/tiny.vcf.gz -m 32 > tiny.vg
Visualize the outcome.
Ok, that’s nice, but you might wonder which sequence of nodes actually corresponds to the sequence (tiny.fa
) you started from? To keep track of that, vg adds a path to the graph. Let’s add this path to the visualization.
vg view -dp /data/tiny.vg | dot -Tpdf -o tiny.pdf
You find the output too crowded? Option -S
removes the sequence labels and only plots node IDs.
vg view -dpS /data/tiny.vg | dot -Tpdf -o tiny.pdf
Another tool that comes with the graphviz package is Neato. It creates force-directed layouts of a graph.
vg view -dpS /data/tiny.vg | neato -Tpdf -o tiny.pdf
For these small graphs, the difference is not that big, but for more involved cases, these layouts can be much easier to read.
A better solution for larger graphs is Bandage, which is designed for visualizing assembly graphs. It can’t display path information, but it can scale to multi-megabase graphs with ease. Adjust the --nodewidth
parameter (e.g. --nodewidth 100
, up to 1000) if you are rendering a large graph and the resulting image appears blank.
vg view /data/tiny.vg > x.gfa
Bandage image /data/x.gfa /data/x.gfa.png
You can also apply vg viz
to obtain a special kind of linear layout that scales well without losing path information.
It requires the xg
index, which we’ll build first.
vg index -x /data/tiny.xg /data/tiny.vg
vg viz -x /data/tiny.xg -o /data/tiny.svg
If the graph is very big, you’ll need to view tiny.svg in chrome or chromium web browser.