Towards Employing Multilingual Term Resources for Intelligent Patents Search Galia Angelova and Irina Temnikova Institute of Information and Communication Technologies (IICT) Bulgarian Academy of Sciences
Presentation Structure ● Introduction/Motivations ● Related Work ● Our Approach 2
Work Motivations 3
Introduction/Motivations 1 1. 40 million patents available electronically 2. Search mainly limited to: – keywords – Boolean operators – proximity – truncation/wildcards 3. Manual query formulation (average of 5 minutes per query) 4
Introduction/Motivations 2 4. Up to 40 hours for 15 queries in 100 documents (average 12 hours) [Joho, H. et al., 2010] 5. Limited contribution of Natural Language Processing (NLP) 6. Multilingual search restricted to category numbers 5
Related work 6
Related Work 1 1. NLP for Patents – Various applications: – Patent language analysis [Sheremetyeva, S., 2003; Lamirel, J.-Ch. et al., 2003; Hsin-hung Lin, D. et al., 2010] – Patent readability improvement [Shinmori, A. et al., 2010] – Patent text generation [Sheremetyeva, S. et al., 1996] – Text classification [Nanba, H. et al., 2009] – Patent translation [Waeschle, K. et al., 2011; Orsnes, B. et al., 1996; Choi, S.-K. et al., 2007] 7
Related Work 2 1. NLP for Patents – Patent search: – Document retrieval using Differential Latent Semantic Index and Template Matching Technique [Chen, L. et al., 2001] – Detection of networks of patents, publications and persons [Li, H. et al., 2011] – The role of spelling errors in patent search [Stein, B. et al., 2012] – Query term distillation [Itoh, H. et al., 2005] – Transforming a patent application into a search query [Xue, X. et al., 2009] – Query expansion with synonyms using WordNet [Magdy, W. et al., 2011] 8
Related Work 3 ● Use of Wikipedia in NLP – Text Simplification and Machine Translation [Coster, W. et al., 2011; Wubben, S., 2012] – Question Answering [Dornescu, I., 2012] – Word Sense Disambiguation in Wikipedia [Ratinov, L., 2011] – Untangling Wikipedia cross-lingual links [De Melo, G. et al., 2010] – Relation extraction [Yan, Y. et al., 2009] – Named Entity disambiguation using Wikipedia [Cucerzan, S., 2007] 9
Our Approach 10
Our Approach 1 ● Aims: – Improve patent search – Make multilingual search possible ● How: – By applying NLP techniques – With the help of a large external multilingual terminological resource (extracted from Wikipedia) 11
Our Approach 2 ● Expanding search by: – Annotation/indexing of patent applications – Adding term equivalents to the search query itself ● Types of terms added: – Synonyms – Paraphrases – Multilingual equivalents 12
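The query-side expansion listed above can be sketched as follows. This is only an illustration, assuming a small in-memory term resource: the names `TERM_RESOURCE` and `expand_query` are hypothetical, the entries reuse the examples given elsewhere in this presentation, and the Boolean output syntax is a generic one rather than that of any particular patent search engine.

```python
from typing import Dict, List

# Hypothetical term resource: for each term, equivalents grouped by type.
# The entries reuse the examples from this presentation.
TERM_RESOURCE: Dict[str, Dict[str, List[str]]] = {
    "date of birth": {
        "synonyms": [],
        "paraphrases": ["birthdate"],
        "multilingual": [],
    },
    "childbirth": {
        "synonyms": [],
        "paraphrases": [],
        "multilingual": ["accouchement", "travail", "naissance",
                         "parturition", "parto"],
    },
}


def expand_query(terms: List[str],
                 resource: Dict[str, Dict[str, List[str]]]) -> str:
    """Build a Boolean query: each term becomes an OR-group of its
    equivalents, and the groups are joined with AND."""
    groups = []
    for term in terms:
        variants = [term]
        entry = resource.get(term.lower(), {})
        for kind in ("synonyms", "paraphrases", "multilingual"):
            for variant in entry.get(kind, []):
                if variant not in variants:
                    variants.append(variant)
        quoted = " OR ".join(f'"{v}"' for v in variants)
        groups.append(f"({quoted})")
    return " AND ".join(groups)


if __name__ == "__main__":
    print(expand_query(["date of birth", "record"], TERM_RESOURCE))
```

Terms that are missing from the resource pass through unchanged, so the expanded query never matches fewer documents than the original one.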
Our Approach 3 ● Recognizing the correct sense of a term using: – Patent classification labels + – Wikipedia page titles, controlled by experts + – The long texts of Wikipedia articles 13
Our Approach 5 Patent term disambiguation using Wikipedia 14
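One simple way to combine these three sources is word overlap: pick the candidate Wikipedia article whose title and text share the most words with the patent classification label. The sketch below assumes this overlap heuristic, which is a stand-in and not necessarily the method used in this work. The function `pick_sense`, the classification label strings and the article snippets are all invented for illustration.

```python
import re
from typing import Dict, Set


def tokens(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def pick_sense(classification_label: str, candidates: Dict[str, str]) -> str:
    """Return the candidate article title whose title plus text shares
    the most words with the classification label (first one wins ties)."""
    label_words = tokens(classification_label)

    def score(title: str) -> int:
        article_words = tokens(title) | tokens(candidates[title])
        return len(label_words & article_words)

    return max(candidates, key=score)


# Toy candidate senses for the ambiguous term "file" (invented snippets).
CANDIDATES = {
    "Computer file": "A computer file is a resource that records data "
                     "on a storage device of a digital computer.",
    "File (tool)": "A file is a hand tool used to remove fine amounts "
                   "of material from a metal workpiece.",
}

if __name__ == "__main__":
    print(pick_sense("electric digital data processing", CANDIDATES))
    print(pick_sense("hand tools and metal working", CANDIDATES))
```

Longer article texts give the overlap score more evidence to work with, which is one reason full articles can help more than titles alone.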
Our Approach 4 International Patent Classification (IPC) terms 15
Our Approach 5 The Wikipedia “Computer file” article 16
Our Approach 6 NLP term enrichment details: 1. Extraction of synonyms • textual markers “also known as” • Wikipedia “redirect” pages {{Redirect|term}} 2. Noun phrase paraphrases • “Date of birth” = “Birthdate” 17
Our Approach 7 NLP term enrichment details: 3. Multilingual equivalents En: Childbirth • French: accouchement, travail, naissance, parturition • Italian: parto 18
Our Approach 8 Multilingual indexing of patent terms 19
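The two synonym sources named above can be sketched with regular expressions over article wikitext. This is a minimal illustration, assuming the `{{Redirect|term}}` notation shown on the slide and a plain "also known as" marker. The function `extract_synonyms` and the sample wikitext are invented, and real articles would need more robust parsing.

```python
import re
from typing import List

# Matches the first argument of a {{Redirect|term}} template.
REDIRECT_RE = re.compile(r"\{\{\s*Redirect\s*\|\s*([^|}]+?)\s*[|}]",
                         re.IGNORECASE)
# Matches the phrase following "also known as", up to the next punctuation.
AKA_RE = re.compile(r"also known as\s+(?:an?\s+|the\s+)?([^,.;()]+)",
                    re.IGNORECASE)


def extract_synonyms(wikitext: str) -> List[str]:
    """Collect synonym candidates from redirect templates and from
    'also known as' markers, without duplicates, in order of discovery."""
    found = [m.group(1) for m in REDIRECT_RE.finditer(wikitext)]
    # Strip the quote marks that wikitext uses for bold and italics.
    found += [m.group(1).strip(" '") for m in AKA_RE.finditer(wikitext)]

    seen = set()
    result = []
    for term in found:
        key = term.strip().lower()
        if key and key not in seen:
            seen.add(key)
            result.append(term.strip())
    return result


# Invented sample wikitext, for illustration only.
SAMPLE = ("{{Redirect|Birthing}}\n"
          "'''Childbirth''', also known as '''parturition''', "
          "is the ending of pregnancy.")

if __name__ == "__main__":
    print(extract_synonyms(SAMPLE))
```

Each extracted string is only a candidate: a redirect may also point to a misspelling or a related topic, so some filtering would still be needed before it is added to the term resource.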
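A minimal sketch of such multilingual indexing, assuming that every surface form in any language is mapped to one language-independent concept before it is indexed. The equivalents are the childbirth examples from the previous slide. The function names, document identifiers and document texts are invented for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Multilingual equivalents per concept (examples from this presentation).
EQUIVALENTS: Dict[str, Dict[str, List[str]]] = {
    "childbirth": {
        "en": ["childbirth"],
        "fr": ["accouchement", "travail", "naissance", "parturition"],
        "it": ["parto"],
    },
}


def build_lookup(equivalents: Dict[str, Dict[str, List[str]]]) -> Dict[str, str]:
    """Map every surface form, in any language, to its concept."""
    lookup = {}
    for concept, forms_by_language in equivalents.items():
        for forms in forms_by_language.values():
            for form in forms:
                lookup[form.lower()] = concept
    return lookup


def index_documents(docs: Dict[str, str],
                    lookup: Dict[str, str]) -> Dict[str, Set[str]]:
    """Build an inverted index from concept to the documents mentioning it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            concept = lookup.get(word)
            if concept is not None:
                index[concept].add(doc_id)
    return index


def search(term: str, index: Dict[str, Set[str]],
           lookup: Dict[str, str]) -> List[str]:
    """Return the documents that mention the term in any indexed language."""
    concept = lookup.get(term.lower())
    return sorted(index.get(concept, set())) if concept else []


# Invented toy documents with hypothetical identifiers.
DOCS = {
    "EP-1": "Device for monitoring childbirth",
    "FR-2": "Dispositif de surveillance de l'accouchement",
    "IT-3": "Dispositivo per il parto",
    "US-4": "Method for compressing a computer file",
}

if __name__ == "__main__":
    lookup = build_lookup(EQUIVALENTS)
    index = index_documents(DOCS, lookup)
    print(search("parto", index, lookup))
```

A single-word lookup like this over-matches ambiguous forms such as the French "travail", which also means "work", so in practice it would have to be combined with the sense disambiguation described earlier.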
Summary 1 ● Improving patent search with NLP techniques ● Using a large, controlled and regularly updated terminological resource (Wikipedia) ● Challenge: term sense disambiguation – Solutions: ● Patent classification categories ● Long texts of Wikipedia articles 20
Summary 2 ● Patent application annotation + search expansion with: – Synonyms – NP paraphrases – Multilingual equivalents 21
Main References Hunt, D., L. Nguyen, and M. Rodgers (Eds.). Patent searching: tools and techniques. Wiley, 2007. Joho, H., L. A. Azzopardi, and W. Vanderbauwhede. A survey of patent users: an analysis of tasks, behavior, search functionality and system requirements. In Proceedings of the third symposium on Information interaction in context, ACM, 2010, pp. 13-24. Lupu, M., K. Mayer, J. Tait, and A. J. Trippe (Eds.). Current challenges in patent information retrieval. The Information Retrieval Series, Vol. 29, Springer, 2011. 22
Thank you! Any comments or advice? 23