Programming in Python Lecture 8: Python Online Michael Schroeder - PowerPoint PPT Presentation

Programming in Python, Lecture 8: Python Online. Michael Schroeder, Melissa Adasme

Motivation: Access to Web Resources
Are wildcards possible? Can I filter somewhere? Can I combine two different searches? In most cases no, because web GUIs are simplified access points to the data.

Solution: Programmatic Access (use programmatic access via power-user gateways)
Example query (URL): https://www.ebi.ac.uk/chembl/api/data/molecule?molecule_properties__mw_freebase__lte=300&pref_name__iendswith=nib
1 Filtering by selected properties
2 Combination of different criteria
3 Wildcards / search for substrings
Schema of ChEMBL data: https://www.ebi.ac.uk/chembl/api/data/molecule/schema

HTTP/REST
• HTTP (Hypertext Transfer Protocol) is a protocol for transferring data on the web
• It specifies how data can be transferred between machines in a network
• It defines several methods, e.g. GET, POST and DELETE
• REST (Representational State Transfer) describes how the architecture of HTTP can/should be used as a uniform interface
• REST or REST-like structures are available in many web service APIs
• A request is usually defined by a URL (web address) and an HTTP method (action on that address)

http://biowebsitexyz.com/pug/proteins
  GET     List all proteins
  POST    Create a new protein entry (the data is sent to the server separately; the server creates a new URL)
http://biowebsitexyz.com/pug/proteins/p21
  GET     Get the data for protein 21
  DELETE  Delete the entry for protein 21 on the server
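The URL-plus-method idea above can be sketched with Python's standard library. This is only an illustration: biowebsitexyz.com is the fictional service from the slide, the JSON payload for POST is made up, and nothing is sent over the network because the requests are only constructed.

```python
import json
from urllib.request import Request

BASE = "http://biowebsitexyz.com/pug/proteins"  # fictional service from the slide


def build_requests():
    """Build (but do not send) one request per URL/method pair on the slide."""
    # Made-up payload: the slide does not say what a protein entry looks like.
    new_entry = json.dumps({"name": "example protein"}).encode("utf-8")
    return [
        Request(BASE, method="GET"),                      # list all proteins
        Request(BASE, data=new_entry, method="POST",      # create a new entry
                headers={"Content-Type": "application/json"}),
        Request(BASE + "/p21", method="GET"),             # data for protein 21
        Request(BASE + "/p21", method="DELETE"),          # delete protein 21
    ]


for request in build_requests():
    print(request.get_method(), request.full_url)
```

The same address appears with different methods, which is the point of the uniform interface: the URL names the resource and the method names the action.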

Where can I use it?
Non-biological databases and services, etc.
Biological databases and services:
• UniProt (sequences)
• ENRICHR (ontology enrichment)
• PubMed (literature)
• PubChem, ChEMBL (chemical structures)
• PDB (structures)
• etc.

Constructing Queries
http://biowebsitexyz.com/pug/proteins
  Just the base URL for the service
  GET: List all proteins
http://biowebsitexyz.com/pug/proteins?num_aa_gte=100
  Simple filter
  GET: List all proteins with at least 100 amino acids
http://biowebsitexyz.com/pug/proteins?num_aa_gte=100&organism=homo_sapiens
  Multiple criteria
  GET: List all human proteins with at least 100 amino acids
We will focus on GET queries, since you will mostly just need to read data from servers.
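The three query URLs above can be assembled programmatically instead of by hand. The following is a minimal sketch using `urllib.parse.urlencode`; the helper name `build_query_url` is made up for this example, and the service is the fictional one from the slide.

```python
from urllib.parse import urlencode

BASE = "http://biowebsitexyz.com/pug/proteins"  # fictional service from the slide


def build_query_url(base, **filters):
    """Return the base URL, followed by ?key=value&... when filters are given."""
    if not filters:
        return base
    return base + "?" + urlencode(filters)


all_proteins = build_query_url(BASE)
long_proteins = build_query_url(BASE, num_aa_gte=100)
human_long_proteins = build_query_url(BASE, num_aa_gte=100, organism="homo_sapiens")

print(human_long_proteins)
# http://biowebsitexyz.com/pug/proteins?num_aa_gte=100&organism=homo_sapiens
```

Letting `urlencode` join the criteria also takes care of escaping characters that are not allowed in a query string.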

Revision: XML files
■ We can store any data in XML, the eXtensible Mark-up Language, e.g. Medline
■ Logical data organisation: yes, XML schema, which is enforced
■ Physical data organisation: none, we cannot optimise retrieval for common queries
■ Hierarchical organisation
■ Commonly used as an exchange format for data
See also lecture 2

    <Article>
      <Journal>
        <ISSN> 0270-7306 </ISSN>
        <JournalIssue>
          <Volume> 19 </Volume>
          <Issue> 11 </Issue>
          <PubDate>
            <Year> 1999 </Year>
            <Month> Nov </Month>
          </PubDate>
        </JournalIssue>
      </Journal>
      <ArticleTitle> Differential regulation of the cell wall integrity mitogen-activated protein kinase pathway in budding yeast by the protein tyrosine phosphatases Ptp2 and Ptp3. </ArticleTitle>
      <Pagination>
        <MedlinePgn> 7651-60 </MedlinePgn>
      </Pagination>
      <Abstract>
        <AbstractText> Mitogen-activated protein kinases (MAPKs) are inactivated by dual-specificity and protein tyrosine phosphatases (PTPs) in yeasts. In Saccharomyces cerevisiae, two PTPs, Ptp2 and Ptp3, inactivate the MAPKs, Hog1 and Fus3, with different specificities... </AbstractText>
      </Abstract>
      <Affiliation> Department of Chemistry, University of Colorado, Boulder, Colorado 80309-0215, USA. </Affiliation>
      …
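The hierarchical organisation can be seen by walking the tree with paths. The sketch below uses the standard library's `xml.etree.ElementTree` rather than lxml, purely so that it runs without extra packages; the snippet is an abridged, closed copy of the Medline example above, and the helper `text_at` is a name made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the Medline example, closed so that it is well-formed.
SNIPPET = """
<Article>
  <Journal>
    <ISSN> 0270-7306 </ISSN>
    <JournalIssue>
      <Volume> 19 </Volume>
      <Issue> 11 </Issue>
      <PubDate>
        <Year> 1999 </Year>
        <Month> Nov </Month>
      </PubDate>
    </JournalIssue>
  </Journal>
  <Pagination>
    <MedlinePgn> 7651-60 </MedlinePgn>
  </Pagination>
</Article>
""".strip()

article = ET.fromstring(SNIPPET)


def text_at(element, path):
    """Return the stripped text of the first match for path, or None if absent."""
    value = element.findtext(path)
    return value.strip() if value is not None else None


# Each path step descends one level in the hierarchy.
print(text_at(article, "Journal/ISSN"))
print(text_at(article, "Journal/JournalIssue/PubDate/Year"))
print(text_at(article, "Pagination/MedlinePgn"))
```

Because there is no physical organisation such as an index, every lookup like this is a traversal of the tree.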

Application I: What's the most recent article from the Schroeder group?
https://www.ncbi.nlm.nih.gov/pubmed
https://www.ncbi.nlm.nih.gov/home/develop/api/

Application I, step 1: First we run the main query to obtain all articles from the group (with the author name Michael Schroeder):
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=Michael+Schroeder%5Bauthor%5D
The result contains the ID of the last article published.
Documentation at https://www.ncbi.nlm.nih.gov/pmc/tools/developers/
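A minimal sketch of this first step: building the search URL and pulling the article IDs out of the XML result. The sample response is a heavily abridged, hand-written stand-in for a real ESearch result (an `IdList` of `Id` elements), not actual server output, and the function names are made up for this example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(author):
    """Build the PubMed author search URL shown on the slide."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": f"{author}[author]"})


def parse_ids(result_xml):
    """Return the article IDs listed in an ESearch XML result."""
    root = ET.fromstring(result_xml)
    return [id_element.text for id_element in root.findall("./IdList/Id")]


# Hand-written, abridged stand-in for a server response (illustration only).
SAMPLE_RESULT = (
    b"<eSearchResult><Count>2</Count>"
    b"<IdList><Id>31811259</Id><Id>27626687</Id></IdList>"
    b"</eSearchResult>"
)

print(esearch_url("Michael Schroeder"))
print(parse_ids(SAMPLE_RESULT))
```

The square brackets around the field name are percent-encoded by `urlencode`, which gives the same `%5Bauthor%5D` as in the URL above.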

Application I, step 2: Then, using the article ID, we get the details for it:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=31811259&format=xml
The returned XML contains the article title.
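Both steps can be chained into one function. The sketch below is an assumption-laden illustration: it takes the first ID in the search result to be the most recent article, it passes the download function in as a parameter so the logic can be tried offline, and the canned responses (including the placeholder title) are hand-written stand-ins, not real server output.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def http_get(url):
    """Download a URL and return the raw bytes (used for real queries)."""
    with urlopen(url) as response:
        return response.read()


def latest_title(author, fetch=http_get):
    """Search PubMed by author, then fetch the title of the first listed article."""
    search = EUTILS + "esearch.fcgi?" + urlencode(
        {"db": "pubmed", "term": f"{author}[author]"})
    ids = [e.text for e in ET.fromstring(fetch(search)).iter("Id")]
    if not ids:
        return None
    # Assumption: the first ID in the result is the most recent article.
    details = EUTILS + "efetch.fcgi?" + urlencode(
        {"db": "pubmed", "id": ids[0], "format": "xml"})
    title = ET.fromstring(fetch(details)).find(".//ArticleTitle")
    return "".join(title.itertext()).strip() if title is not None else None


def canned_fetch(url):
    """Offline stand-in for http_get with hand-written, abridged responses."""
    if "esearch" in url:
        return b"<eSearchResult><IdList><Id>31811259</Id></IdList></eSearchResult>"
    return (b"<PubmedArticleSet><PubmedArticle><MedlineCitation><Article>"
            b"<ArticleTitle>Placeholder title</ArticleTitle>"
            b"</Article></MedlineCitation></PubmedArticle></PubmedArticleSet>")


print(latest_title("Michael Schroeder", fetch=canned_fetch))
```

Calling `latest_title("Michael Schroeder")` without the `fetch` argument would send two real requests to the server.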

Application II: ChEMBL. Find compounds with desired properties
1 https://www.ebi.ac.uk/chembl
2 https://chembl.gitbook.io/chembl-interface-documentation/web-services/chembl-data-web-services
Not the same for all web services!

Application II, step 1: Let's find compounds ending with "rin" with a MW between 150 and 200:
https://www.ebi.ac.uk/chembl/api/data/molecule?molecule_properties__mw_freebase__gte=150&molecule_properties__mw_freebase__lte=200&pref_name__iendswith=rin
Result: Aspirin, with canonical SMILES CC(=O)Oc1ccccc1C(=O)O
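A sketch of this query in Python. Several things here are assumptions rather than facts from the slides: that the service returns JSON when `format=json` is added, and that each hit carries `pref_name` and `molecule_structures.canonical_smiles` under a top-level `molecules` list. The sample response is hand-written and reduced to those fields; check the ChEMBL documentation before relying on it.

```python
import json
from urllib.parse import urlencode

CHEMBL_MOLECULE = "https://www.ebi.ac.uk/chembl/api/data/molecule"


def chembl_molecule_url(mw_min, mw_max, name_suffix):
    """Combine a molecular-weight range with a name-suffix filter."""
    params = {
        "molecule_properties__mw_freebase__gte": mw_min,
        "molecule_properties__mw_freebase__lte": mw_max,
        "pref_name__iendswith": name_suffix,
        "format": "json",  # assumption: JSON output is selected this way
    }
    return CHEMBL_MOLECULE + "?" + urlencode(params)


def names_and_smiles(response_text):
    """Return (name, canonical SMILES) pairs from a JSON response."""
    data = json.loads(response_text)
    hits = []
    for molecule in data.get("molecules", []):  # assumed field names
        structures = molecule.get("molecule_structures") or {}
        hits.append((molecule.get("pref_name"), structures.get("canonical_smiles")))
    return hits


# Hand-written sample reduced to the assumed fields (illustration only).
SAMPLE_RESPONSE = json.dumps({"molecules": [{
    "pref_name": "ASPIRIN",
    "molecule_structures": {"canonical_smiles": "CC(=O)Oc1ccccc1C(=O)O"},
}]})

print(chembl_molecule_url(150, 200, "rin"))
print(names_and_smiles(SAMPLE_RESPONSE))
```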

Application II, step 2: Let's find another molecule with aspirin as a substructure:
https://www.ebi.ac.uk/chembl/api/data/substructure/CC(=O)Oc1ccccc1C(=O)O
(XML result data not shown)
Structures shown: Aspirin, CC(=O)Oc1ccccc1C(=O)O, and the second hit (CHEMBL7666)
Documentation at https://www.ebi.ac.uk/chembl/ws
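Here the SMILES string is part of the URL path rather than the query string. SMILES can contain characters such as `/`, `#`, `=` and parentheses that have a special meaning in URLs, so a cautious approach is to percent-encode it first. This is a sketch under the assumption that the server decodes the path in the standard way; the slide itself uses the raw string.

```python
from urllib.parse import quote, unquote

SUBSTRUCTURE = "https://www.ebi.ac.uk/chembl/api/data/substructure/"


def substructure_url(smiles):
    """Append a percent-encoded SMILES string to the substructure endpoint."""
    # safe="" also encodes "/", which would otherwise split the path.
    return SUBSTRUCTURE + quote(smiles, safe="")


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
url = substructure_url(aspirin)
print(url)
# https://www.ebi.ac.uk/chembl/api/data/substructure/CC%28%3DO%29Oc1ccccc1C%28%3DO%29O

# Decoding the last path segment gives back the original SMILES.
print(unquote(url.rsplit("/", 1)[1]) == aspirin)
```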

Important Information
With great power comes great responsibility!
• Read the documentation of each service you are using
• Sometimes you will need keys to get access
• Don't send too many requests to the server (you could crash it or be blocked)
• Some services don't allow parallel requests

USAGE POLICY: Please note that PUG REST is not designed for very large volumes (millions) of requests. We ask that any script or application not make more than 5 requests per second, in order to avoid overloading the PubChem servers. If you have a large data set that you need to compute with, please contact us for help on optimizing your task, as there are likely more efficient ways to approach such bulk queries.
https://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST.html
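One simple way to stay under a limit such as 5 requests per second is to enforce a minimum delay between consecutive requests. The class below is a minimal sketch, not an official client: the name `Throttle` is made up, and the clock and sleep functions are parameters only so that the behaviour can be checked without waiting.

```python
import time


class Throttle:
    """Space out calls so that at most max_per_second happen per second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous call, None before the first

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


throttle = Throttle(max_per_second=5)
for compound_id in ["2244", "3672", "1983"]:  # made-up example identifiers
    throttle.wait()  # call this right before each request
    print("would request compound", compound_id)
```

This sequential approach also avoids parallel requests, which some services forbid.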

Web Resources in Python Part I: Choosing your tools
• urllib library for fetching web resources
• lxml for parsing XML result files
Simple example: Extract all authors for a paper

    # Import the libraries
    from urllib.request import urlopen  # function to open the URL
    from lxml import etree  # module to parse XML

    baseurl = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?"
    query = "db=pubmed&id=27626687&format=xml"
    url = baseurl + query
    f = urlopen(url)  # opens the URL
    resultxml = f.read()  # reads the content returned for the URL (bytes)
    xml = etree.XML(resultxml)  # parses the content into an XML tree
    resultelements = xml.xpath("//LastName")  # searches for all tags matching the given XPath
    for element in resultelements:
        print([element.text])
