From Content Publishing to Data Solutions via Machine Learning - PowerPoint PPT Presentation

From Content Publishing to Data Solutions via Machine Learning Presentation to Los Angeles Machine Learning Meetup 2019-02-19 Bradley Allen, Chief Architect, Elsevier

Twenty-three years ago “ It’s hard to imagine a sweeter business than publishing academic journals. The editorial content is contributed free of charge by scholars desperate to publish to get tenure. School libraries are automatic customers—professors insist on it. ... Is the party over? It may be nearing its end. The Internet is closing in. ” - Forbes, December 18, 1995

Today: from content publishing to data solutions
Read this, Search this, Do this
Products shown: Cell, ScienceDirect, Sherpath Fundamentals, Scopus, Mendeley, Gray’s Anatomy, ClinicalKey, Knovel, Reaxys

Our five main customer segments
• Clinicians: ‘You could use this treatment to save a life’
• Researchers: ‘This article answers your questions’
• Governments: ‘This is the research to invest in’
• Pharmaceutical companies: ‘This is the cancer treatment you should pursue’
• Nursing students: ‘This is the area you need to improve to qualify’

The challenges our customers face
• Global research spend is growing every year. [1] Predicted research spend in 2016: $1.9TN, up 3.4% from 2015.
• Researchers lack the tools they need to be effective. [2] Studies: 70-80% of research asks the wrong questions or cannot be reproduced.
• Life-saving drugs are expensive to develop. [3] Success rate of drugs: 1 / 20. Median pharmaceutical spend per drug: $2.5BN.
• Health providers cannot save lives without the best information. [4] Preventable medical errors are the third largest cause of death in the US: heart disease 611k, cancer 585k, medical error 225k, respiratory illness 149k.
Sources: 1. Industrial Research Institute 2. The Lancet 3. Tufts 4. World Health Organization

The assets we have at hand
Content
• Chemistry database: 500m published experimental facts
• User queries: 13m monthly users on ScienceDirect
• Books: 35,000 published books
• Drug database: 100% of drug information from pharmaceutical companies, updated daily
• Research: 16% of the world’s research data and articles published by Elsevier
Technology
• 1,000 technologists employed by Elsevier
• Machine reading: 475m facts extracted from ScienceDirect
• Semantic enhancement: knowledge on 50m chemicals captured as 11B facts
• Machine learning: over 1,000 predictive models trained on 1.5 billion electronic health care events
• Collaborative filtering: 1bn scientific articles added by 2.5m researchers, analyzed daily to generate over 250m article recommendations

How we think about delivering data solutions
• Determine the question (including use case and personae)
• Describe the data that needs to be produced to address the question
• If we have that data, reuse it
• If not, use the data we have to create it
• If we don’t have data we need, acquire what we’re missing
From Justin O’Beirne, “Google Maps’ Moat – How far ahead of Apple Maps is Google Maps?”, 2017-12. Retrieved from https://www.justinobeirne.com/google-maps-moat on 2018-05-31.

Breaking it down into eight simple steps
1. Market Definition: Determine target market personae & product features
2. Use Case Definition: Describe tasks performed by personae yielding use cases
3. Data & Query Specification: Describe data schemas & features to support use cases
4. Data Acquisition: Acquire content & data in multiple formats from multiple sources
5. Data Enhancement: Extract entities, attributes & relations, map entities to ontologies & taxonomies
6. Data Linking: Link extracted entities with other entities in existing enterprise data
7. Knowledge Graph Construction: Store mapped & linked data for access & discovery
8. Knowledge Delivery: Deliver query & visualisation of data
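A minimal sketch of how steps 5, 7 and 8 might fit together, assuming a toy taxonomy and an in-memory set of triples. The names `TAXONOMY`, `map_entity`, `build_graph` and `query`, the concept identifiers, and the relation label are all hypothetical and are not taken from the talk.

```python
# Hypothetical taxonomy: surface form -> canonical concept id (step 5).
TAXONOMY = {
    "pe": "concept:pulmonary_embolism",
    "pulmonary embolism": "concept:pulmonary_embolism",
    "dyspnoea": "concept:dyspnea",
    "dyspnea": "concept:dyspnea",
}


def map_entity(mention):
    """Map an extracted mention to a taxonomy concept, or None if unknown."""
    return TAXONOMY.get(mention.strip().lower())


def build_graph(extracted_relations):
    """Store mapped relations as triples, skipping unmapped mentions (step 7)."""
    graph = set()
    for subj, pred, obj in extracted_relations:
        s, o = map_entity(subj), map_entity(obj)
        if s and o:
            graph.add((s, pred, o))
    return graph


def query(graph, subject=None, predicate=None, obj=None):
    """Return sorted triples matching the pattern; None is a wildcard (step 8)."""
    return sorted(
        t for t in graph
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Illustrative extractor output: two spellings of one fact plus an unmapped term.
relations = [
    ("PE", "has_clinical_finding", "dyspnoea"),
    ("Pulmonary Embolism", "has_clinical_finding", "Dyspnea"),
    ("PE", "has_clinical_finding", "unmapped term"),
]
graph = build_graph(relations)
print(query(graph, subject="concept:pulmonary_embolism"))
```

Mapping mentions to canonical identifiers before storing them is what lets the two spellings collapse into a single triple; a production system would presumably use a dedicated graph store rather than a Python set.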

Knowledge graphs make it all hang together
“I really believe that the key battleground in any industry is that of its knowledge graph. Google has it for media/advertising, Netflix has it for filmed entertainment, Uber has it for inner city transportation, Facebook has it across social media as well as messaging and the multiples speak for themselves.”
Tony Askew, Founder/Partner at REV (personal communication, September 29, 2016)

The role that machine learning (ML) plays
• Our goal is to drive business by enabling better outcomes through:
− Delivery of timely, appropriate advice for decision making & problem solving
− Enhanced discovery and query over massive amounts of information
• We plan to achieve this by using ML to build knowledge graphs that enable the rapid development of data solutions
− Implementing entity/object extraction, relation extraction, entity disambiguation, classification, and sentiment analysis
− Based on the scientific & medical literature, experimental data, and the data exhaust associated with the practice of scientific communication & medical practice

Breaking down our ML efforts
• Early wins
− Deployed systems adding value to existing products and solutions
• Roofshots
− Task-specific use of ML to improve discoverability, knowledge delivery
• Practicalities
− Human-in-the-loop NLP pipelines augmented with ML components to scale entity and relation extraction, entity linking for knowledge graph construction
• Moonshots
− Use of multi-task learning architectures to develop a general-purpose approach to question answering from the scientific and medical literature and from experimental data

Early win: Recognizing decision graphs in medical content
• ClinicalKey is Elsevier’s flagship medical reference search product
• Clinicians prefer “answers” in the form of tables or flowcharts
− Eliminates the need to page through retrieved content to find actionable information
• ClinicalKey provides a sidebar section displaying answers, but this feature depends on very labor-intensive manual curation
• Solution: automatically classify images in the medical content corpus at index time
• Benefits: lower cost and improved user experience

Early win: Recognizing decision graphs in medical content
• Perfect fit for a transfer learning approach
− Input to the classifier is an image and output is one of 8 classes: Photo, Radiological, Data graphic, Illustration, Microscopy, Flowchart, Electrophoresis, Medical decision graph
− Image dataset is augmented by producing variations of the training images by rotating, flipping, transposing, jittering, etc.
− Reuses all but the last two Dense layers of a pre-trained model (VGG-CNN, available from Caffe’s “model zoo”)
− VGG-CNN was trained on ImageNet (14 million images from the Web, 1000 general topic classes, e.g., Cat, Airplane, House)
− Last layer is a multinomial logistic regression (or softmax) classifier
• Model trained on 10,167 images with a 70/30 train/test split
• Achieves 93% test set accuracy
− Evaluated an image + caption text model but did not get a big performance boost
• Searchable image base used to support training set and model development
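The augmentation and softmax steps named above can be illustrated with a small pure-Python sketch. It is not the pipeline from the talk: images are stand-in nested lists, the helper names are hypothetical, and the logits passed to `predict` are made-up values standing in for the output of the final dense layer.

```python
import math

# The eight classes listed on the slide.
CLASSES = ["Photo", "Radiological", "Data graphic", "Illustration",
           "Microscopy", "Flowchart", "Electrophoresis", "Medical decision graph"]


def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def transpose(img):
    """Swap rows and columns."""
    return [list(col) for col in zip(*img)]


def rotate90(img):
    """Rotate 90 degrees clockwise: reverse the row order, then transpose."""
    return transpose(img[::-1])


def augment(img):
    """Return the original image plus three simple geometric variations."""
    return [img, flip_horizontal(img), transpose(img), rotate90(img)]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (max-shifted for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return the most probable class and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


tiny_image = [[1, 2], [3, 4]]           # stand-in for real pixel data
variants = augment(tiny_image)
label, confidence = predict([0.2, 0.1, 0.0, 0.3, 0.1, 1.5, 0.0, 2.4])  # made-up logits
print(len(variants), label, round(confidence, 3))
```

In a transfer learning setup only the final layers producing these logits would be trained, while the reused convolutional layers stay fixed.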

Early win: Generating topic pages from scientific content
[Diagram: take a taxonomy, take a ScienceDirect article, find occurrences of concepts; outputs shown are a definition and snippet(s).]
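The idea of finding occurrences of taxonomy concepts in an article could be sketched as follows. This is a minimal illustration using plain label matching with regular expressions; the taxonomy, the function names and the second sentence of the sample text are assumptions made for the example, and the approach actually used for topic pages is not described on the slide.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_snippets(text, taxonomy):
    """Map each concept to the sentences that mention any of its labels."""
    snippets = {}
    for sentence in split_sentences(text):
        for concept, labels in taxonomy.items():
            for label in labels:
                pattern = r"\b" + re.escape(label) + r"\b"
                if re.search(pattern, sentence, re.IGNORECASE):
                    snippets.setdefault(concept, []).append(sentence)
                    break  # one hit per concept per sentence is enough
    return snippets


# Hypothetical taxonomy: concept name -> labels and synonyms to look for.
taxonomy = {
    "Pulmonary embolism": ["pulmonary embolism", "PE"],
    "Dyspnea": ["dyspnea", "dyspnoea"],
}
article = (
    "Three clinical symptoms were considered to be highly suggestive of PE: "
    "recent dyspnoea, recent chest pain and unusual tachycardia. "
    "Treatment depends on severity."
)
topic_pages = find_snippets(article, taxonomy)
for concept, sentences in topic_pages.items():
    print(concept, "->", sentences)
```

Case-insensitive matching of short abbreviations such as "PE" can produce false positives, which is one reason a real system would likely add disambiguation on top of simple lookup.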

Roofshot: Extracting clinically useful relationships from medical content

Roofshot: Extracting clinically useful relationships from medical content
Example sentence: “Three clinical symptoms were considered to be highly suggestive of PE: recent dyspnoea, recent chest pain and unusual tachycardia >75/min.”
Extracted relation: Pulmonary Embolism, has Clinical Finding, Dyspnea

Roofshot: Extracting clinically useful relationships from medical content
• CNN implemented in Keras
• Input: relationship labels and syntax paths linking relation arguments, semantic tagging
• Syntactic analysis with spaCy on Apache Spark
• Semantic analysis using FPE annotation
• Embed in 64-dimensional space (like Word2vec)
• Compute 1-dimensional convolution to learn path structure
• Perform final softmax activation to predict one of N relations
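The embed, convolve and softmax sequence can be traced in a small untrained forward pass written in plain Python. This is only a sketch of the general technique, not the Keras model from the talk: the weights are random, the vocabulary of path tokens and the relation labels are hypothetical, and the kernel size, filter count, ReLU activation and global max pooling are assumptions the slide does not state.

```python
import math
import random

EMBED_DIM = 64      # embedding size named on the slide
KERNEL_SIZE = 3     # assumption: convolution window over path tokens
NUM_FILTERS = 4     # assumption: kept small for readability
RELATIONS = ["has_clinical_finding", "has_treatment", "no_relation"]  # hypothetical

rng = random.Random(0)


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical vocabulary of syntax-path tokens.
VOCAB = {"<pad>": 0, "nsubj": 1, "suggestive": 2, "prep_of": 3, "appos": 4}
embedding = rand_matrix(len(VOCAB), EMBED_DIM)
filters = rand_matrix(NUM_FILTERS, KERNEL_SIZE * EMBED_DIM)  # one row per filter
dense = rand_matrix(len(RELATIONS), NUM_FILTERS)


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def forward(path_tokens):
    """Embed -> 1D convolution + ReLU -> global max pool -> dense -> softmax."""
    ids = [VOCAB[t] for t in path_tokens]
    while len(ids) < KERNEL_SIZE:            # pad paths shorter than the window
        ids.append(VOCAB["<pad>"])
    vectors = [embedding[i] for i in ids]
    pooled = []
    for f in filters:
        activations = []
        for start in range(len(vectors) - KERNEL_SIZE + 1):
            # Flatten the window of consecutive token vectors and apply the filter.
            window = [x for vec in vectors[start:start + KERNEL_SIZE] for x in vec]
            activations.append(max(0.0, sum(w * x for w, x in zip(f, window))))
        pooled.append(max(activations))
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in dense]
    return softmax(logits)


probs = forward(["nsubj", "suggestive", "prep_of", "appos"])
predicted = RELATIONS[max(range(len(probs)), key=probs.__getitem__)]
print(predicted, [round(p, 3) for p in probs])
```

Because the weights are random, the predicted label is meaningless here; the point is the shape of the computation, where each filter slides along the path and the softmax turns the pooled features into a distribution over N relations.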

Roofshot: Assistants for the interpretation of pathology imagery
[Diagram contrasting “What we need” and “What we have”: images with their captions, and annotated raw images. Example caption: “Notice the multiple subependymal nodules in fig 3.”]

Practicality: Building continuous modeling and quality control into our deployment workflows
