Linux and RNA-Seq read alignment
Brian J. Knaus
USDA Forest Service
Pacific Northwest Research Station
Outline
• Intro to Linux
• Reference types
• Read filtering
• Short read alignment
The Linux operating system
• Many ‘flavors’ of Linux (Ubuntu, Fedora, CentOS, openSUSE, Slackware).
• Frequently includes a GUI (GNOME, KDE).
• Its strength is the shell: a programmer’s OS.
• Permissions.
• Multiple shells (bash, tcsh, ksh).
• Text editors (gedit, vi, emacs).
• Finding help.
Interacting with a server (PC options)
• PuTTY (SSH client)
• Xming (X server for Windows)
Shell commands
ls
ls -lh
cd ~
cd ..
pwd
mv
cp
mkdir
df
rm
rmdir
rm -rf # Will delete everything without asking.
cat filename.txt
head filename.txt
less filename.txt
gedit filename.txt &
top
chmod u+x filename.txt
tar -xvzf file.tar.gz
(Google ‘linux cheat sheet’)
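A short practice session may help tie these commands together. The sketch below is only an illustration: it works inside a throwaway directory created with mktemp, and the names sandbox, project, notes.txt and project.tar.gz are hypothetical. Because rm -rf never asks for confirmation, it is only used here on files the session itself created.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so nothing important can be deleted.
sandbox=$(mktemp -d)               # hypothetical scratch location
cd "$sandbox" || exit 1
pwd                                # show where we are

mkdir project                      # make a directory
echo "hello" > notes.txt           # create a small text file
cp notes.txt project/              # copy it into the directory
mv notes.txt notes_old.txt         # rename the original
ls -lh                             # long listing, human-readable sizes
cat project/notes.txt              # print the copy

chmod u+x notes_old.txt            # give the owner execute permission

tar -cvzf project.tar.gz project   # pack the directory (-c creates, -x extracts)
rm -rf project                     # deletes without asking; safe only in the sandbox
tar -xvzf project.tar.gz           # restore the directory from the archive
```

After the last command the project directory is back, which shows why keeping an archive before running rm -rf is a sensible habit.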
Shell commands
• Tab completion
• history
Finding help with Linux
$ man command
$ info command
• Google ‘Linux what you need help on’.
• O’Reilly books
Reference types
• From a genome project (model organisms).
• De novo or from cDNA.
Are all isoforms present?
How will exon skipping affect inference of regulation?
What’s in a name?
• Bowtie truncates reference names at spaces.
• Some characters don’t mix well with the sequence ontologies.
Note the difference between sequence ontology and gene ontology.
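Since reference names are cut at the first space anyway, one option is to shorten the FASTA headers yourself before building an index, so the names in the reference and in the alignments are known to match. The sketch below is one possible approach, not part of Bowtie: clean_fasta_names is a hypothetical helper, and the two-sequence file is made-up example data. It keeps only the first whitespace-delimited token of each header line and leaves sequence lines untouched. It does not try to replace other problem characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the text before the first space or tab
# on each FASTA header line (lines starting with ">").
clean_fasta_names() {
    awk '/^>/ { print $1; next } { print }' "$1"
}

# Made-up example reference with descriptions after the names.
fa=$(mktemp)
printf '>gene1 hypothetical protein\nACGT\n>gene2\tisoform A\nGGCC\n' > "$fa"

# Show the cleaned headers and sequences.
clean_fasta_names "$fa"
```

Redirect the output to a new file (for example with > cleaned.fasta, a hypothetical name) rather than overwriting the original reference.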
SAM file format
@HD VN:1.0 SO:sorted
@PG TopHat VN:1.0.13 CL:/local/cluster/bin/tophat -p 4 --solexa1.3-quals ../indexes/psme_ref ../psme_seqs.fq
ILLUMINA-3AB384_0001:6:24:19059:8781#GATT 0 0_54_255 1 255 80M * 0 0 TCTTCTTCATGTTTGGCACGTGTATTCGGGCCTACTTCGCCTTTCCTTCACAGTAGGCGCCTTATCATTATTGGTCAGTT CCCCCCCCCCCCCCCCDCCCCCCCC@CBCBBCCBCCCCCCCCCCCCCCCCCCCDCD@C@CCCC4=CCBCCCCAC>B>BBC NM:i:1
HWI-EAS121_0024_FC61F8DAAXX:7:101:7452:15154#CTGT 0 0_54_255 17 255 76M * 0 0 CACGTGTATTCGGGCCTACTTCGCCTTTCCTTCACAGTAGGCGCCTTGTCATTATTGGTCAGTTATGACCTTAATT GGGGGGGGGGFEGFFGFEEFFBEECEFFFFFGGDGFDDGE:FBBFEGFFD?DEDEFB=DDD=ECCC=EAACDEDC= NM:i:0

@header line 1: file format version
@header line 2: program which created the file
1 Query (read) name
2 Flag
3 Reference name
4 Leftmost mapping position
5 Mapping quality
6 CIGAR string
7 Reference name of mate
8 Position of the mate
9 Template length
10 Fragment sequence
11 Fragment quality
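Because SAM is plain tab-separated text, the numbered fields above can be pulled out with standard shell tools. The sketch below is a minimal illustration, assuming awk is available: sam_summary is a hypothetical helper, and the two alignment records are shortened, made-up examples rather than real data. It skips the header lines (those starting with "@") and prints fields 1, 3, 4 and 6: read name, reference name, leftmost position and CIGAR string.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print read name, reference name, leftmost position
# and CIGAR string for every alignment line of a SAM file.
sam_summary() {
    awk 'BEGIN { FS = OFS = "\t" } !/^@/ { print $1, $3, $4, $6 }' "$1"
}

# Made-up, shortened example file: one header line and two alignments.
sam=$(mktemp)
{
    printf '@HD\tVN:1.0\tSO:sorted\n'
    printf 'read1\t0\t0_54_255\t1\t255\t8M\t*\t0\t0\tTCTTCTTC\tCCCCCCCC\tNM:i:1\n'
    printf 'read2\t0\t0_54_255\t17\t255\t8M\t*\t0\t0\tCACGTGTA\tGGGGGGGG\tNM:i:0\n'
} > "$sam"

sam_summary "$sam"
```

The same pattern extends to other columns by changing the field numbers, for example $5 for mapping quality.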