tophat install and usage note

###install tophat

I got all of our RNA-seq data a couple of days ago. After spending a while playing with the Python keggrest package to grab KEGG annotation data, I decided to try installing tophat on my PC and run the mapping with real project data.

Generally, just following the instructions on tophat’s official page is enough. Some points worth noting are listed below:

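  • bowtie. Download and unpack the latest bowtie2, copy it wherever you wish, and most importantly add ‘bowtie2’, ‘bowtie2-build’ and ‘bowtie2-inspect’ to your $PATH like this: `export PATH=$PATH:/home/popcui/bio/bowtie*` (replace ‘bowtie*’ with the full name of the unpacked directory, since the shell does not expand wildcards inside a variable assignment). A better way is to append this command to ‘~/.bashrc’ so that it takes effect in every new shell.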

  • tophat. You can either download the precompiled binary release, which is more convenient but may be a little slower, or build tophat from source. Before building from source, you need to install SAM tools.

  • SAM tools. First download SAM tools, then build it with ‘make’ (you may want to check the INSTALL file for more information). You can also build the optional razip target with `make razip`. Note: do not forget to copy ‘samtools’, together with the ‘misc’ and ‘bcftools’ programs, to a directory of your choice and add that directory to your $PATH.
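
The PATH setup described in the points above can be sketched as a small shell snippet. This is only a sketch under assumptions: `BOWTIE_DIR` and its default value are hypothetical placeholders for wherever you unpacked bowtie2, and `add_to_path` and `check_tools` are helper names made up for this note, not part of any of the tools.

```shell
# Hypothetical install location; replace it with the directory you unpacked bowtie2 into.
BOWTIE_DIR="${BOWTIE_DIR:-$HOME/bio/bowtie2}"

# Append a directory to PATH only if it is not already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

# Print, for each program name given, whether it can be found on PATH.
# Returns non-zero if at least one program is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

add_to_path "$BOWTIE_DIR"

# To make the change permanent, the export line could be appended to ~/.bashrc:
#   echo "export PATH=\$PATH:$BOWTIE_DIR" >> ~/.bashrc

check_tools bowtie2 bowtie2-build bowtie2-inspect samtools tophat2 \
    || echo "Some tools are not on PATH yet."
```

Running the check before starting a long mapping job is cheap, and it shows at a glance which of the programs still needs its directory added.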

Now, if everything is OK, download the test data and run tophat on it to make sure everything works as expected.
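
A test run could look roughly like the sketch below. It rests on several assumptions: that the test data has been unpacked into a directory called `test_data` (the `TEST_DIR` variable is hypothetical), that it contains an index named `test_ref` and the reads `reads_1.fq` and `reads_2.fq`, and that tophat writes to its default output directory `tophat_out`. `tophat_output_ok` is a helper made up for this note.

```shell
# Assumed location of the unpacked test data; adjust to your own setup.
TEST_DIR="${TEST_DIR:-./test_data}"

# Succeeds when an output directory holds a non-empty accepted_hits.bam
# and a junctions.bed file, the main result files of a tophat run.
tophat_output_ok() {
    [ -s "$1/accepted_hits.bam" ] && [ -f "$1/junctions.bed" ]
}

if command -v tophat >/dev/null 2>&1 && [ -d "$TEST_DIR" ]; then
    # Run inside the test directory so the relative file names resolve.
    ( cd "$TEST_DIR" && tophat -r 20 test_ref reads_1.fq reads_2.fq )
    if tophat_output_ok "$TEST_DIR/tophat_out"; then
        echo "test run looks fine"
    else
        echo "test run did not produce the expected files"
    fi
else
    echo "tophat or $TEST_DIR not found, skipping the test run"
fi
```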

###prepare your reference

If you have bowtie2 installed and want to use it with tophat2, you must create Bowtie2 indexes for your data using bowtie2-build. As I already have ref.gtf and ref.fa in hand, all I need to do is: `bowtie2-build ./rna-data/ref.fa ./bt2/ref.fa` The reference genome .fa file comes first, followed by the base name of the index. The index files will be saved with the prefix ‘./bt2/ref.fa’ (for example ‘./bt2/ref.fa.1.bt2’).
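
Since building the index takes a while, it can be handy to skip the step when the index is already there. The sketch below assumes a small index (six files ending in `.bt2`; large genomes get `.bt2l` files instead, which this check does not cover). `index_exists` is a helper made up for this note, and the two variables simply hold the paths used above.

```shell
REF_FASTA="${REF_FASTA:-./rna-data/ref.fa}"   # reference genome
INDEX_BASE="${INDEX_BASE:-./bt2/ref.fa}"      # base name of the index files

# Succeeds when all six files of a small Bowtie2 index exist for the given base name.
index_exists() {
    for suffix in 1 2 3 4 rev.1 rev.2; do
        [ -f "$1.$suffix.bt2" ] || return 1
    done
    return 0
}

if index_exists "$INDEX_BASE"; then
    echo "index already present at $INDEX_BASE"
elif command -v bowtie2-build >/dev/null 2>&1 && [ -f "$REF_FASTA" ]; then
    # Create the output directory first in case it does not exist yet.
    mkdir -p "$(dirname "$INDEX_BASE")"
    bowtie2-build "$REF_FASTA" "$INDEX_BASE"
else
    echo "bowtie2-build or $REF_FASTA not found, nothing built"
fi
```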

Now that we have everything ready, let’s begin mapping right away: `tophat2 -p 4 -G ./rna-data/ref.gtf -o col_th ./bt2/ref.fa ./rna-data/Col_0.clean.fq` ‘-p’ specifies the number of threads to use; since I have an i3-380 CPU which supports Hyper-Threading, I use 4 threads. ‘-G’ supplies the gene and transcript annotation (GTF) file. ‘-o’ sets the directory where the output files are saved. Then come the Bowtie2 index base name and the input clean reads.
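
To avoid retyping the command for each reads file, the call can be wrapped in a small function. This is a sketch, not a required part of the workflow: `run_tophat` and the `DRY_RUN` switch are made up for this note, and the paths are the ones used above. With `DRY_RUN=1` (the default here) the command is only printed, which is a cheap way to check the arguments before starting a long run.

```shell
GTF="${GTF:-./rna-data/ref.gtf}"            # gene and transcript annotation
INDEX_BASE="${INDEX_BASE:-./bt2/ref.fa}"    # Bowtie2 index base name

# Usage: run_tophat <reads.fq> <output_dir> <threads>
# Prints the command when DRY_RUN is 1 (default), otherwise executes it.
run_tophat() {
    # Rebuild the positional parameters as the full tophat2 command line.
    set -- tophat2 -p "$3" -G "$GTF" -o "$2" "$INDEX_BASE" "$1"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run_tophat ./rna-data/Col_0.clean.fq col_th 4
```

Setting `DRY_RUN=0` in the environment before calling `run_tophat` starts the real mapping.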