Bioinformatics

Outline
● What is bioinformatics?
  – Who are bioinformaticians?
● Hardware
● Software
What is bioinformatics?
What is bioinformatics?
● Someone to analyze my data
● The boring stuff I do between experiments
● Someone to help me think about my data
● People sitting in a dark room analyzing data
● A person who writes complex algorithms
● A person who knows what an HMM is
● That bloke who fixes my computer
● Someone who builds websites
● perl, python, R, linux, java, C++, bash, ruby, HTML
Who are bioinformaticians?
● Scientists trying to get tenure, get grants, publish papers, train students
● Scientists trying to help others analyze their data
Who are bioinformaticians? YOU!
Hardware
Torrent Server Recommended
● Torrent Server
  – Processors: two six-core processors
  – RAM: 48 GB
  – HDD capacity: eight 2 TB hard drives in RAID 5 with 12 TB usable
  – Network: quad-port gigabit NIC
  – GPU: NVIDIA graphics processing unit
  – Chassis: Dell Precision T7500 tower; no rack mount available
  – Monitor/keyboard: not included; file access available via SSH or web service
● $12,500
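The storage figure can be checked with a little arithmetic. The sketch below applies the standard RAID 5 rule (one drive's worth of capacity goes to parity) to the eight 2 TB drives from the slide, then converts decimal terabytes to binary tebibytes. That the quoted "12 TB usable" reflects binary units plus file system overhead is an assumption, not something the slide states; the function names are made up for illustration.

```python
# Figures taken from the slide.
DRIVES = 8       # number of hard drives
DRIVE_TB = 2     # capacity per drive, in decimal terabytes (10**12 bytes)


def raid5_usable_tb(drives: int, drive_tb: float) -> float:
    """Usable RAID 5 capacity: one drive's worth of space holds parity."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drives - 1) * drive_tb


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes to binary tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


usable_tb = raid5_usable_tb(DRIVES, DRIVE_TB)
usable_tib = tb_to_tib(usable_tb)
print(f"{usable_tb} TB after parity = {usable_tib:.2f} TiB")
# prints: 14 TB after parity = 12.73 TiB
```

After parity the array holds 14 TB, or about 12.7 TiB, which is in the neighbourhood of the quoted 12 TB once formatting overhead is subtracted.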
Computers
● My cluster
  – 51-node cluster
  – most nodes: 16 CPUs with 8 cores each, 132 GB RAM, 1 TB local storage (/usr/data), InfiniBand interconnects
  – totals: 6,528 cores; 6,732 GB RAM; 50 TB scratch storage
● 192 TB Lustre FS
  – connected to most nodes via InfiniBand
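The cluster totals follow directly from the per-node figures. This is a minimal sketch that assumes every one of the 51 nodes has the "most nodes" configuration, which the slide does not guarantee; the function name is hypothetical.

```python
# Per-node figures from the slide; assumed here to apply to all nodes.
NODES = 51
CPUS_PER_NODE = 16
CORES_PER_CPU = 8
RAM_GB_PER_NODE = 132


def cluster_totals(nodes, cpus_per_node, cores_per_cpu, ram_gb_per_node):
    """Return (total cores, total RAM in GB) for a uniform cluster."""
    total_cores = nodes * cpus_per_node * cores_per_cpu
    total_ram_gb = nodes * ram_gb_per_node
    return total_cores, total_ram_gb


total_cores, total_ram_gb = cluster_totals(
    NODES, CPUS_PER_NODE, CORES_PER_CPU, RAM_GB_PER_NODE
)
print(f"{total_cores:,} cores; {total_ram_gb:,} GB RAM")
# prints: 6,528 cores; 6,732 GB RAM
```

Both results match the totals quoted on the slide.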
Computers
● rambox
  – 24 processors with 6 cores each
  – 198 MB RAM
● edwards.sdsu.edu
  – lab web server
  – 24 processors with 6 cores each
  – 50M RAM
  – 19 TB RAID 6 storage
  – 18 TB used
Computers
● File servers and backup servers
  – 4 secret servers!
  – 48 TB backups and archival storage
Software
Software
● Locally installed software
● Remote (web) software
Local Software
● bioperl
● biopython
● bowtie2
● cdhit
● crass
● diamond
● fastQC
● FOCUS
● FragGeneScan
● genemark
● groopm
● idba_ud
● jellyfish
● last
● masurca
● mauve
● metabat
● metagenemark
● mira
● MUMmer
● Muscle
● PEAR
● phylip
● prinseq
● qiime
● qudaich
● rapsearch
● scaffold_builder
● seed-servers
● spades
● tagcleaner
● tRNAscan-SE
● velvet
Metagenomics Processing
[Figure: workflow diagram. Stage labels: preprocessing, contamination removal, contig clustering / binning, gene prediction, taxonomic assignments, functional assignments.]
Metagenomics
● Quality control
  – Prinseq
  – Deconseq
● Annotation
  – FOCUS
  – Real time metagenomics
  – mg-rast
  – Super FOCUS
● Statistics
  – STAMP
● Population genomes
  – crAss
  – metabat
  – ContigClustering
Metagenomics Processing
● Preprocessing
  – FASTQC, FastX Toolkit, fitGCP, NGS QC Toolkit, Non-pareil, Prinseq, QC-Chain, Streaming Trim
● Contig clustering
  – AbundanceBin, CompostBin, concoct, crAss, tetra
● Gene Prediction
  – FragGeneScan, GlimmerMG, MetaGeneAnnotator, MetaGeneMark, MetaGun, Orphelia, Prodigal
● Taxonomic assignment and functional assignment
  – CLAMS, Sequedex, CARMA, myTaxa, DiScRIBinATE, SORT-ITEMS, FOCUS, PhylopythiaS, genometa, SPANNER, KRAKEN, phymmbl, GSMer, SPHINX, LMAT, RAIphy, PPLACER, TaxSOM, MEGAN, TACOA, RTMg, Treephyler, Metaplan, Taxy
Recommend
More recommend