Linux and RNA-Seq read alignment
Brian J. Knaus
USDA Forest Service, Pacific Northwest Research Station
Outline
•Intro to Linux
•Reference types
•Read filtering
•Short read alignment
The Linux operating system
•Many ‘flavors’ of Linux (Ubuntu, Fedora, CentOS, openSUSE, Slackware).
•Frequently includes a GUI (GNOME, KDE).
•Its strength is in the shell: it is a programmer’s OS.
•Permissions.
•Multiple shells (bash, tcsh, ksh).
•Text editors (gedit, vi, emacs).
•Finding help.
Interacting with a server (PC options)
•PuTTY: http://www.chiark.greenend.org.uk/~sgtatham/putty/
•Xming: http://www.straightrunning.com/XmingNotes/
Shell commands
ls
ls -lh
cd ~
cd ..
pwd
mv
cp
mkdir
df
rm
rmdir
rm -rf   # Will delete everything without asking.
cat filename.txt
head filename.txt
less filename.txt
gedit filename.txt &
top
chmod u+x filename.txt
tar -xvzf file.tar.gz
(Google ‘linux cheat sheet’)
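A short practice session can make these commands concrete. The sketch below is only an illustration: the directory and file names (project, notes.txt, copy.txt) are made up, and everything happens in a throwaway directory created with mktemp so that the final rm -rf cannot touch real data.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so nothing important can be deleted.
scratch=$(mktemp -d)
cd "$scratch"

mkdir project                           # make a directory
printf 'line %s\n' 1 2 3 > notes.txt    # create a three-line text file
cp notes.txt project/copy.txt           # copy it into the directory
mv notes.txt project/notes.txt          # move (or rename) the original
pwd                                     # show where we are
ls -lh project                          # long listing, human-readable sizes
head -n 2 project/notes.txt             # show the first two lines
chmod u+x project/notes.txt             # give the owner execute permission

# Count the entries in the directory (hypothetical variable name).
n_files=$(ls project | wc -l)
echo "files in project: $n_files"

# Clean up. rm -rf deletes without asking, so double-check the path first.
cd /
rm -rf "$scratch"
```

Quoting "$scratch" and leaving the directory before removing it are cautious habits rather than requirements.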
Shell commands
•Tab completion
•history
Finding help with Linux
$ man command
$ info command
Google ‘Linux what you need help on’.
O’Reilly books (http://oreilly.com/).
Reference types
•From a genome project (model organisms).
•De novo or from cDNA.
Are all isoforms present?
How will exon skipping affect inference of regulation?
What’s in a name?
•Bowtie truncates reference names at spaces.
•Some characters don’t mix well with the sequence ontologies.
http://www.sequenceontology.org/resources/gff3.html
Note the difference between sequence ontology and gene ontology.
http://www.geneontology.org/
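Because names are cut at the first space, it can help to shorten the reference names yourself before building an index and then check that they are still unique. The following is a minimal sketch, not part of the original workflow: the function names clean_fasta_names and duplicate_names and the example records are made up, and it assumes every header starts with ">" followed directly by the name.

```shell
#!/usr/bin/env bash

# Hypothetical helper: keep only the first word of each FASTA header line,
# which is what remains after truncation at the first space.
clean_fasta_names() {
    awk '/^>/ { print $1; next } { print }' "$@"
}

# Hypothetical helper: list names that occur more than once after cleaning.
duplicate_names() {
    clean_fasta_names "$@" | grep '^>' | sort | uniq -d
}

# Made-up example records written to a temporary file.
demo=$(mktemp)
printf '>contig1 len=80 cov=12\nACGT\n>contig2 len=76\nTTGA\n' > "$demo"

clean_fasta_names "$demo"    # headers become >contig1 and >contig2
duplicate_names "$demo"      # prints nothing when every name is unique

rm -f "$demo"
```

If duplicate_names prints anything, two references would share a name after truncation, and read counts for them could not be told apart downstream.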
SAM file format

@HD VN:1.0 SO:sorted
@PG TopHat VN:1.0.13 CL:/local/cluster/bin/tophat -p 4 --solexa1.3-quals ../indexes/psme_ref ../psme_seqs.fq
ILLUMINA-3AB384_0001:6:24:19059:8781#GATT 0 0_54_255 1 255 80M * 0 0 TCTTCTTCATGTTTGGCACGTGTATTCGGGCCTACTTCGCCTTTCCTTCACAGTAGGCGCCTTATCATTATTGGTCAGTT CCCCCCCCCCCCCCCCDCCCCCCCC@CBCBBCCBCCCCCCCCCCCCCCCCCCCDCD@C@CCCC4=CCBCCCCAC>B>BBC NM:i:1
HWI-EAS121_0024_FC61F8DAAXX:7:101:7452:15154#CTGT 0 0_54_255 17 255 76M * 0 0 CACGTGTATTCGGGCCTACTTCGCCTTTCCTTCACAGTAGGCGCCTTGTCATTATTGGTCAGTTATGACCTTAATT GGGGGGGGGGFEGFFGFEEFFBEECEFFFFFGGDGFDDGE:FBBFEGFFD?DEDEFB=DDD=ECCC=EAACDEDC= NM:i:0

Header line 1 (@HD): file format version
Header line 2 (@PG): program which created the file

Alignment fields:
1 Query (read) name
2 Flag
3 Reference name
4 Leftmost mapping position
5 Mapping quality
6 CIGAR string
7 Reference name of mate
8 Position of the mate
9 Template length
10 Fragment sequence
11 Fragment quality
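Since the alignment fields are tab-separated and come in a fixed order, awk can pull them out by number. The sketch below is one possible way to do this, not the method used in the original analysis: the function name sam_summary and the short example record are made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print name, flag, reference, position, mapping
# quality, CIGAR string and read length for every alignment line.
sam_summary() {
    awk -F '\t' '
        /^@/ { next }        # header lines start with @ and are skipped
        NF >= 11 {           # an alignment has at least 11 fields
            printf "%s\tflag=%s\tref=%s\tpos=%s\tmapq=%s\tcigar=%s\tlen=%d\n",
                   $1, $2, $3, $4, $5, $6, length($10)
        }
    ' "$@"
}

# Made-up two-line SAM input: one header line and one alignment.
printf '@HD\tVN:1.0\tSO:sorted\nread1\t0\tref1\t1\t255\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNM:i:0\n' |
    sam_summary
```

On a real file the call would look like sam_summary sample.sam, where sample.sam is a placeholder name. Optional tags such as NM:i:0 follow field 11 and are ignored here.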