Linux for Biology DEDAN GITHAE, BIOINFORMATICIAN BECA-ILRI HUB

Importance of computers to biology
◦ Availability of vast research data shared online
◦ Automated analysis leading to the generation of massive data
◦ Interaction with other research communities and shared databases
◦ Speed and efficiency in processing, storage and data mining

BIG Data: Volume, Variety, Velocity & Veracity
Volume:
◦ More content has already been generated and is available over open access
◦ More content is being generated per run as a result of technology advancement
◦ Costs are getting cheaper over time

Velocity:
◦ Technology is making data generation faster and more efficient
Variety:
◦ Sequences, annotation, structures, image processing
Veracity:
◦ Some ambiguities, inconsistencies, incomplete data, model approximations

Other computational tasks: Analysis and interpretation
Biology activities:
◦ Prediction: functional and structural
◦ Pattern recognition: domains, homology
◦ Sequence alignments
◦ Statistical analysis
◦ Structural modelling
◦ Genetic diversity and interactions between organisms and between populations

Linux

What is Linux?
A family of free and open-source software operating system distributions built around the Linux kernel.

◦ A family: Ubuntu? Fedora? Mint? Debian? openSUSE?
◦ Free: anyone is freely licensed to use, copy, study, and change the software in any way
◦ Open-source software: the source code is openly shared so that people are encouraged to voluntarily improve the design of the software
◦ Operating system: system software that manages computer hardware and software resources and provides common services for computer programs
◦ Distributions built around the Linux kernel: the kernel is the part of the operating system that mediates access to system resources, e.g. it handles input/output requests from software and translates them into data-processing instructions for the central processing unit

Kernel

Some applications to biological tasks
Repetitive tasks: processing several sequences
Automating analysis processes: scripts / piping to programs
Text processing
◦ Regex; grep; sed
◦ Extracting fields using cut / awk
◦ We'll see more of this in the tutorial
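A minimal sketch of the text-processing tools named above, run on a small made-up FASTA file. The file name, sequence IDs and sequences are hypothetical and only there for illustration; the awk step assumes each sequence sits on a single line.

```shell
#!/usr/bin/env bash
# Hypothetical example data: a tiny FASTA file with three made-up records.
workdir=$(mktemp -d)
cat > "$workdir/example.fasta" <<'EOF'
>seq1 organism=fly
ATGCGTAC
>seq2 organism=cow
ATGGGCTA
>seq3 organism=fly
TTGACCGA
EOF

# grep: count the sequences by counting header lines (those starting with ">").
n_seqs=$(grep -c '^>' "$workdir/example.fasta")

# grep | sed | cut: list the sequence IDs
# (strip the leading ">", then keep the first space-separated field).
ids=$(grep '^>' "$workdir/example.fasta" | sed 's/^>//' | cut -d ' ' -f 1)

# awk: remember the ID on each header line, then print ID and length
# for the sequence line that follows it.
lengths=$(awk '/^>/ {id = substr($1, 2); next} {print id, length($0)}' "$workdir/example.fasta")

echo "sequences: $n_seqs"
echo "$ids"
echo "$lengths"
```

Piping grep into sed and then cut is the "piping to programs" idea in practice: each tool does one small job and hands its output to the next.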

The ILRI High Performance Computing (HPC) Cluster

Users log into HPC (the master). To log in:
  ssh userX@hpc.ilri.cgiar.org
Then "jump" to the rest of the cluster (the computing servers). To do this, type:
  interactive

Software
To check whether the software and version you need is installed, type:
  module avail
To use a piece of software, e.g. BLAST, type:
  module load blast
To see which software is ready for use (loaded), type:
  module list

SLURM: Simple Linux Utility for Resource Management
Interactive jobs have a time limit of 8 hours. If you are running a longer job, write a batch script to schedule it.
How do we write scripts?

Writing a Slurm script
◦ To see the available options, type: sbatch -u
  [ see man sbatch for a detailed explanation of usage ]

Example of a batch script

    #!/usr/bin/env bash
    #SBATCH -p batch
    #SBATCH -J blastn
    #SBATCH -n 4

    # load the blast module
    module load blast/2.6.0+

    # run blast with 4 CPU threads (cores)
    blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt -num_threads 4

To run the script, type:

    sbatch [ scriptname.sbatch ]

Best practice: overview
Run the job on a computing node:
  interactive
Make a directory in the scratch space and "go" there:
  mkdir -p /var/scratch/userX ; cd $_
Create the script
Run the script:
  sbatch [scriptname.sbatch]
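The steps above can be sketched as one shell session. This is an illustration under assumptions, not the cluster's official procedure: a temporary directory stands in for /var/scratch so the sketch runs on any machine, the file name blastn.sbatch is hypothetical, and the job is not actually submitted.

```shell
#!/usr/bin/env bash
# Stand-in for /var/scratch so this sketch runs anywhere (assumption).
scratch_root=$(mktemp -d)
scratch_dir="$scratch_root/userX"

# Make the scratch directory and move into it.
# The slide writes this as: mkdir -p /var/scratch/userX ; cd $_
# where bash expands $_ to the last argument of the previous command.
mkdir -p "$scratch_dir"
cd "$scratch_dir" || exit 1

# Create the batch script, modelled on the example batch script.
# blastn.sbatch is a hypothetical file name.
cat > blastn.sbatch <<'EOF'
#!/usr/bin/env bash
#SBATCH -p batch
#SBATCH -J blastn
#SBATCH -n 4

module load blast/2.6.0+
# -num_threads 4 matches the 4 CPUs requested with -n 4
blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt -num_threads 4
EOF

# Check the script for shell syntax errors without running it.
if bash -n blastn.sbatch; then
    syntax_ok=yes
else
    syntax_ok=no
fi
echo "syntax check: $syntax_ok"

# Show the submit command; it is not run here, so nothing is queued.
echo "on the cluster, submit with: sbatch blastn.sbatch"
```

Checking the script with bash -n before submitting catches typos such as an unclosed quote without waiting for the job to start and fail in the queue.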

Enjoy!
