An Ontology-Based Approach for Facilitating Information Retrieval from Disparate Sources: Patent System as an Exemplar
Kincho H. Law
Professor of Civil and Environmental Engineering
Engineering Informatics Group, Stanford University
Collaborators:
• Jay P. Kesan, Professor, College of Law, UIUC
• Siddharth Taduri (Former Student), Stanford University
• Gloria Lau, Consulting Assoc. Professor, Stanford University
Ontology Summit, March 10, 2016
Ref: S. Taduri, Information Retrieval Across Multiple Information Sources Using Knowledge-Based Approach, Engineering Degree Thesis, Stanford University, March 2012.
Motivation
Patents: Can we obtain all relevant (validity, enforceability, and infringement) information related to patents in a particular sector, category, or market segment, and analyze that information?
In the patent context:
• What are the issued patents in a given space?
• What is the legal scope of protection for the same or similar patents?
• Who are the competitors?
• Have any of the same or similar patents been challenged in court?
• Are there relevant scientific publications, prior court decisions, laws, or regulations that could be used to challenge and invalidate some patent claims?
Focus: Biomedical Patents
Other similar problems: integrating administrative agencies, courts, technical/scientific literature, and technical product literature in a host of law and science areas (pharmaceuticals, biofuels, …)
Problem Statement
Information sources: Issued Patents and Applications; File Wrappers; Court Cases; Technical Publications; Regulations and Laws
• Patent validity and infringement/enforcement questions involve analysis of documents in various domains: patents, USPTO file wrappers, court documents, scientific/technical publications, and technical product literature
• These documents are owned by disparate public (government) and private sectors
• The information is often available online, but siloed into several diverse information sources
• Today, the analysis is done manually, and poorly, by companies offering various patent research and strategy services
Use-Case: Erythropoietin (Repository)
• Synthetic production of the hormone has made it possible to treat diseases such as anemia
• Core patents: U.S. Patents 5,621,080, 5,756,349, 5,955,422, 5,547,933, 5,618,698
• 135 directly related patents and over 3000 related publications
• Around 30 court cases: patent litigation involving major companies including Amgen, Hoechst Marion Roussel, Inc., and Transkaryotic Therapies, Inc.
• Over 162,000 full-text scientific publications from 49 prominent journals in biomedicine from the TREC 2007 Genome Dataset (http://ir.ohsu.edu/genomics/2007protocol.html)
• Comprehensive domain knowledge available
Domain Terminology is Everywhere
Excerpt from scientific publication
Title: Regional variability in the incidence of end-stage renal disease: an epidemiological approach.
Abstract: Regional variability in the incidence of end-stage renal disease (ESRD) in Austria is reported. Our aim was … low rates in the state of Tyrol. … ESRD incidence data were obtained from … Between 1995 and 1999, 4811 new cases of ESRD were recorded; … in the state of Tyrol (T) … incidence of ESRD patients with type 2 diabetes mellitus … the difference in the overall ESRD incidence … prevalence of DM, a highly significant correlation was found between ESRD incidence and DM. … variability in the ESRD incidence in Austria is explained mainly by regional differences in DM-2. Data from similar studies … allocation for ESRD …
Excerpt from U.S. Patent# 5,441,868
Title: Production of recombinant erythropoietin
Abstract: Disclosed are novel polypeptides possessing part or all of the primary structural conformation and one or more of the biological properties of mammalian erythropoietin ("EPO") which are characterized in preferred forms by being the product of procaryotic or eucaryotic host expression of an exogenous DNA sequence. Illustratively, genomic DNA, cDNA and manufactured DNA sequences coding for part or all of the sequence of amino acid residues of EPO or for analogs thereof are incorporated into autonomously replicating plasmid or viral vectors employed to transform or transfect suitable procaryotic or eucaryotic host cells such as bacteria, yeast or vertebrate cells in culture. Upon isolation from culture media or cellular lysates or fragments, products of expression of the …
Excerpt from court case: Amgen, Inc. v/s Chugai Pharm.
… On June 30, 1987, the United States Patent and Trademark Office (PTO) issued to Dr. Rodney Hewick U.S. Patent 4,677,195, entitled "Method for the Purification of Erythropoietin and Erythropoietin Compositions" (the '195 patent). The patent claims both homogeneous EPO and compositions thereof and a method for purifying human EPO using reverse phase high performance liquid chromatography. The method claims are not before us.
Problem Statement
[Diagram: the information sources (Issued Patents and Applications, File Wrappers, Court Cases, Technical Publications, Regulations and Laws) are integrated using two knowledge sources. Knowledge Source 1: Patent System Ontology. Knowledge Source 2: Bio Ontology (specific technical domain).]
• Sources are diverse in structure, formats, semantics and syntax
• How to retrieve patent information in a particular technological space?
A knowledge-driven (ontology-based) approach:
• Knowledge of scientific/technical domain
• Knowledge of patent system domain
Why Ontology?
An ontology is an explicit description of a domain:
• concepts
• properties and attributes of concepts
• constraints on properties and attributes
An ontology defines:
• a common vocabulary
• a shared understanding
Domain (Bio) Ontologies
Bio ontologies serve as standards for terminology in the biomedical (science) domain (Ref: Bioportal.bioontology.org, accessed March 2012)
Using Concept Hierarchy to Determine Relevancy
Doc 1: … erythropoietin … colony stimulating factor …
Doc 2: … EPO … growth factor …
Bio Ontology (class hierarchy): Hematopoietic Growth Factor > Colony Stimulating Factor > Erythropoietin (EPO)
Diagram annotations: no direct similarity between the two documents; use of super class concept for relevancy
• Direct term-based matching cannot relate the two documents
• The bio-ontology reveals that EPO and erythropoietin are synonymous
• The class hierarchy provides concepts (such as colony stimulating factor) useful for determining relevance between documents (with an appropriate weighting scheme)
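As a rough illustration of this idea, the sketch below maps each document term to an ontology concept, adds its super classes with decaying weights, and scores two documents on their shared concepts. The tiny hierarchy follows the slide; the decay factor, the function names, and the scoring formula are assumptions made for illustration, not the weighting scheme used in the original work.

```python
# Toy fragment of a bio-ontology. Labels follow the slide; the data
# structures and the decay value are illustrative assumptions.
PARENT = {
    "erythropoietin": "colony stimulating factor",
    "colony stimulating factor": "hematopoietic growth factor",
}
SYNONYMS = {"epo": "erythropoietin"}  # surface form -> preferred label
DECAY = 0.5  # hypothetical weight multiplier per step up the hierarchy


def normalize(term):
    """Lowercase a term and resolve it to its preferred concept label."""
    t = term.lower()
    return SYNONYMS.get(t, t)


def expand(term):
    """Return {concept: weight} for the term's concept and its super classes."""
    weights = {}
    concept, weight = normalize(term), 1.0
    while concept is not None:
        weights[concept] = max(weights.get(concept, 0.0), weight)
        concept = PARENT.get(concept)  # None once the root is passed
        weight *= DECAY
    return weights


def doc_vector(terms):
    """Merge the expansions of all terms, keeping the highest weight per concept."""
    vec = {}
    for term in terms:
        for concept, w in expand(term).items():
            vec[concept] = max(vec.get(concept, 0.0), w)
    return vec


def relevance(doc_a, doc_b):
    """Sum of weight products over the concepts both documents share."""
    a, b = doc_vector(doc_a), doc_vector(doc_b)
    return sum(a[c] * b[c] for c in a.keys() & b.keys())


def term_overlap(doc_a, doc_b):
    """Plain term matching: number of identical (lowercased) terms."""
    return len({t.lower() for t in doc_a} & {t.lower() for t in doc_b})


doc1 = ["erythropoietin", "colony stimulating factor"]
doc2 = ["EPO", "growth factor"]

print(term_overlap(doc1, doc2))  # 0: no direct term match
print(relevance(doc1, doc2))     # positive: related through the ontology
```

Plain term matching finds nothing in common, while the concept-based score is positive because EPO resolves to erythropoietin and both documents reach the same super classes.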
Expanded Query (with domain ontology)
Original Term: Erythropoietin
Synonyms: Erythropoietin, Recombinant Erythropoietin, erythropoietin receptor binding, Hematopoietin, Recombinant EPO, Erythrocyte Colony Stimulating Factor, Epoetin, EPO …
Children: Darbepoetin Alfa, Epoetin Alfa, Epoetin Beta …
Parents: Colony Stimulating Factors, cytokine receptor binding, recombinant hematopoietic growth factors …
Grand-Parents: hematopoietic growth factor, receptor binding, recombinant growth factor …
An appropriate ranking function is applied to balance the more general terms. Heuristically, we assign a higher weight to synonyms, and progressively lower weights as we traverse away from the concept node.
Resulting Query: “original term” OR [synonyms]^weight OR [children]^weight OR …
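A minimal sketch of how such a weighted query string might be assembled is shown below. The function name, the grouping syntax, and the numeric weights are assumptions chosen for illustration: the slide only states that synonyms get higher weights than more distant terms, and the output format mirrors the slide's notation rather than the syntax of any particular search engine.

```python
def build_expanded_query(term, expansions, weights):
    """Join the original term with weighted groups of related terms."""
    clauses = [f'"{term}"']
    for relation, terms in expansions.items():
        if not terms:
            continue  # skip relations with no terms in the ontology
        group = " OR ".join(f'"{t}"' for t in terms)
        clauses.append(f"({group})^{weights[relation]}")
    return " OR ".join(clauses)


# A subset of the expansion terms listed on the slide.
EXPANSIONS = {
    "synonyms": ["Recombinant Erythropoietin", "Epoetin", "EPO"],
    "children": ["Epoetin Alfa", "Epoetin Beta"],
    "parents": ["Colony Stimulating Factors"],
    "grandparents": ["hematopoietic growth factor"],
}

# Hypothetical weights: highest for synonyms, lower further from the concept.
WEIGHTS = {"synonyms": 0.9, "children": 0.7, "parents": 0.5, "grandparents": 0.3}

query = build_expanded_query("Erythropoietin", EXPANSIONS, WEIGHTS)
print(query)
```

Keeping the weights in a separate table makes it easy to tune the balance between specific and general terms without touching the expansion logic.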
Patent System Ontology (patent documents, court cases, file wrappers)
Competency Questions
Patent Domain:
• Return all patent documents which contain the phrase ‘recombinant erythropoietin receptor’ in the claims
• Return all the patent documents which contain the phrase ‘recombinant erythropoietin receptor’, have at least 3 claims, were issued before 02-02-1999, and are assigned to Genetics Inc.
Court Case Domain:
• Return all court cases which contain the term ‘erythropoietin’
• Return all court cases from the District Court of Massachusetts which involve the company Amgen Inc. as either the plaintiff or the defendant
Multi-domain:
• Return all patents which contain the term ‘erythropoietin’ in their claims and which are involved in at least one court litigation
• Return all court cases with the term ‘erythropoietin’. From these court cases, return the patents involved. From these patents, follow the backward and forward citations to identify more important patents.
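To make the multi-domain questions concrete, the sketch below answers them over a handful of in-memory records. All patent identifiers, claim texts, and the court case are made-up placeholders, and the class and function names are hypothetical; the original work poses these questions against an ontology, not against Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class Patent:
    number: str
    claims: list
    cites: list = field(default_factory=list)  # backward citations


@dataclass
class CourtCase:
    name: str
    text: str
    patents: list  # numbers of the patents involved in the case


# Placeholder records for illustration only (not real patents or cases).
PATENTS = {
    "P1": Patent("P1", ["A recombinant erythropoietin polypeptide ..."], cites=["P3"]),
    "P2": Patent("P2", ["A method for purifying a protein ..."], cites=["P1"]),
    "P3": Patent("P3", ["A host cell transformed with DNA ..."]),
    "P4": Patent("P4", ["An erythropoietin composition ..."]),
}
CASES = [CourtCase("Case A", "... dispute over erythropoietin production ...", ["P1"])]


def patents_with_term_in_claims(term):
    """Patent numbers whose claims mention the term (case-insensitive)."""
    t = term.lower()
    return {n for n, p in PATENTS.items() if any(t in c.lower() for c in p.claims)}


def cases_with_term(term):
    """Court cases whose text mentions the term (case-insensitive)."""
    return [c for c in CASES if term.lower() in c.text.lower()]


def litigated_patents(cases):
    """Patent numbers involved in at least one of the given cases."""
    return {n for case in cases for n in case.patents}


def citation_neighbors(numbers):
    """Patents cited by (backward) or citing (forward) the given patents."""
    seeds = set(numbers)
    backward = {c for n in seeds for c in PATENTS[n].cites}
    forward = {n for n, p in PATENTS.items() if seeds & set(p.cites)}
    return (backward | forward) - seeds


# Question 1: term in the claims AND involved in at least one litigation.
q1 = patents_with_term_in_claims("erythropoietin") & litigated_patents(CASES)

# Question 2: cases with the term -> patents involved -> citation neighbors.
seed = litigated_patents(cases_with_term("erythropoietin"))
q2 = citation_neighbors(seed)

print(sorted(q1))  # ['P1']
print(sorted(q2))  # ['P2', 'P3']
```

Each question is a join across domains: the court case records supply patent numbers, and the patent records supply the claims and citation links.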
Patent Documents
• More than 8 million U.S. patents (2.2 million in force today)
• In 2009, 485,312 patent applications were filed
• Information is contained in various sections of the documents; a full-text search alone is not sufficient, and other metrics such as classification and citations need to be considered
• Documents are available in HTML format and can be easily parsed
Patent System Ontology Conceptual View of Patent Documents