Towards Employing Multilingual Term Resources for Intelligent Patents Search

Galia Angelova and Irina Temnikova, Institute of Information and Communication Technologies (IICT), Bulgarian Academy of Sciences


  1. Towards Employing Multilingual Term Resources for Intelligent Patents Search
     Galia Angelova and Irina Temnikova
     Institute of Information and Communication Technologies (IICT), Bulgarian Academy of Sciences

  2. Presentation Structure
     ● Introduction/Motivations
     ● Related Work
     ● Our Approach

  3. Work Motivations

  4. Introduction/Motivations 1
     1. 40 million patents available electronically
     2. Search mainly limited to:
        – keywords
        – Boolean operators
        – proximity
        – truncation/wildcards
     3. Manual query formulation (average of 5 minutes per query)

  5. Introduction/Motivations 2
     4. Up to 40 hours for 15 queries in 100 documents (average 12 hours) [Joho, H. et al., 2010]
     5. Limited contribution of Natural Language Processing (NLP)
     6. Multilingual search restricted to category numbers

  6. Related Work

  7. Related Work 1
     NLP for Patents: various applications
     – Patent language analysis [Sheremetyeva, S., 2003; Lamirel, J.-Ch. et al., 2003; Hsin-hung Lin, D. et al., 2010]
     – Patent readability improvement [Shinmori, A. et al., 2010]
     – Patent text generation [Sheremetyeva, S. et al., 1996]
     – Text classification [Nanba, H. et al., 2009]
     – Patent translation [Waeschle, K. et al., 2011; Orsnes, B. et al., 1996; Choi, S.-K. et al., 2007]

  8. Related Work 2
     NLP for Patents: patent search
     – Document retrieval using Differential Latent Semantic Index and Template Matching Technique [Chen, L. et al., 2001]
     – Patents, publications and persons network detection [Li, H. et al., 2011]
     – The role of spelling errors in patent search [Stein, B. et al., 2012]
     – Query term distillation [Itoh, H. et al., 2005]
     – Transforming a patent application into a search query [Xue, X. et al., 2009]
     – Query expansion with synonyms using WordNet [Magdy, W. et al., 2011]

  9. Related Work 3
     ● Use of Wikipedia in NLP
       – Text Simplification and Machine Translation [Coster, W. et al., 2011; Wubben, S., 2012]
       – Question Answering [Dornescu, I., 2012]
       – Word Sense Disambiguation in Wikipedia [Ratinov, L., 2011]
       – Untangling Wikipedia cross-lingual links [De Melo, G. et al., 2010]
       – Relation extraction [Yan, Y. et al., 2009]
       – Named Entity disambiguation using Wikipedia [Cucerzan, S., 2007]

  10. Our Approach

  11. Our Approach 1
      ● Aims:
        – Improve patent search
        – Make multilingual search possible
      ● How:
        – By applying NLP techniques
        – With the help of a large external multilingual terminological resource (extracted from Wikipedia)

  12. Our Approach 2
      ● Expanding search by:
        – Annotation/indexing of patent applications
        – Adding term equivalents to the search query itself
      ● Types of terms added:
        – Synonyms
        – Paraphrases
        – Multilingual equivalents
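The second expansion route (adding term equivalents to the query itself) can be sketched as follows. This is only an illustration under assumptions: the `TERMS` table and the `expand_query` function are hypothetical names, the table is filled by hand with the examples that appear later in the slides, and the Boolean `OR` output format is an assumed target syntax, not one the slides specify.

```python
from typing import Dict, List

# Hypothetical term resource. In the approach described in the slides it would
# be extracted from Wikipedia; here it is a tiny hand-written sample that only
# reuses the examples given in the presentation.
TERMS: Dict[str, Dict[str, List[str]]] = {
    "date of birth": {"paraphrases": ["birthdate"]},
    "childbirth": {
        "fr": ["accouchement", "travail", "naissance", "parturition"],
        "it": ["parto"],
    },
}


def expand_query(query: str, terms: Dict[str, Dict[str, List[str]]] = TERMS) -> str:
    """Return a Boolean OR-query made of the original term and its known equivalents."""
    original = query.strip()
    variants = [original]
    # Look the term up case-insensitively and collect every equivalent once.
    for group in terms.get(original.lower(), {}).values():
        for variant in group:
            if variant not in variants:
                variants.append(variant)
    return " OR ".join(f'"{v}"' for v in variants)


print(expand_query("Date of birth"))
```

A term that is missing from the table is returned unchanged (only quoted), so the expansion never narrows the original search.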

  13. Our Approach 3
      ● Recognition of the correct sense of the term, combining:
        – Patent classification labels
        – Wikipedia page titles, controlled by experts
        – The long texts of Wikipedia articles
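The slides name the evidence used for sense recognition but not a scoring method. As a hedged illustration only, the sketch below uses plain word overlap between a classification label and candidate article texts as an assumed stand-in. The function names are hypothetical, and the two article texts are invented one-sentence stand-ins, not quotations from Wikipedia.

```python
import re
from typing import Dict, Set


def tokens(text: str) -> Set[str]:
    """Lowercase a text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def pick_sense(class_label: str, candidates: Dict[str, str]) -> str:
    """Pick the candidate article whose text shares the most words with the label.

    Word overlap is an assumption made for this sketch; ties go to the first
    candidate in insertion order.
    """
    label_words = tokens(class_label)
    return max(candidates, key=lambda title: len(label_words & tokens(candidates[title])))


# Invented stand-in texts for two senses of the ambiguous term "file".
candidates = {
    "File (tool)": "A file is a tool used to remove fine amounts of material from a workpiece.",
    "Computer file": "A computer file is a resource for recording data, used in digital data processing.",
}

print(pick_sense("Electric digital data processing", candidates))
```

With the classification label pointing at data processing, the "Computer file" sense wins because its text shares the words "digital", "data" and "processing" with the label, while the tool sense shares none.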

  14. Our Approach 5: Patent term disambiguation using Wikipedia

  15. Our Approach 4: International Patent Classification (IPC) terms

  16. Our Approach 5: The Wikipedia “Computer file” article

  17. Our Approach 6
      NLP term enrichment details:
      1. Extraction of synonyms
         • textual markers such as “also known as”
         • Wikipedia “redirect” pages: {{Redirect|term}}
      2. Noun phrase paraphrases
         • “Date of birth” = “Birthdate”
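The two synonym sources listed above can be sketched with regular expressions. This is a minimal illustration, not the authors' extractor: `extract_synonyms` is a hypothetical name, the patterns only cover the “also known as” marker and the `{{Redirect|term}}` form exactly as written on the slide, and the sample wikitext is invented for the example.

```python
import re
from typing import List

# Text after the marker, up to the next punctuation mark or bracket.
ALSO_KNOWN_AS = re.compile(r"also known as\s+([^,.;()]+)", re.IGNORECASE)
# First argument of a {{Redirect|term}} template, as written on the slide.
REDIRECT = re.compile(r"\{\{Redirect\|([^|}]+)")


def extract_synonyms(wikitext: str) -> List[str]:
    """Collect synonym candidates from redirect templates and textual markers."""
    found: List[str] = []
    seen = set()
    raw = [m.group(1) for m in REDIRECT.finditer(wikitext)]
    raw += [m.group(1) for m in ALSO_KNOWN_AS.finditer(wikitext)]
    for candidate in raw:
        # Drop surrounding spaces and quote characters (including ''' bold markup).
        cleaned = candidate.strip(" '\"")
        if cleaned and cleaned.lower() not in seen:
            seen.add(cleaned.lower())
            found.append(cleaned)
    return found


# Invented sample text, for illustration only.
sample = "{{Redirect|Birthdate}}\nThe date of birth, also known as '''birthdate''', is recorded."
print(extract_synonyms(sample))
```

Candidates are de-duplicated case-insensitively, so a term found both in a redirect template and after the textual marker is reported once.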

  18. Our Approach 7
      NLP term enrichment details:
      3. Multilingual equivalents
         English: childbirth
         • French: accouchement, travail, naissance, parturition
         • Italian: parto

  19. Our Approach 8: Multilingual indexing of patent terms
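One way to picture multilingual indexing is a concept-level inverted index, where every language variant of a term points to the same concept. The sketch below is an assumption about how such an index could look, not the system shown on the slide: all function names, the document identifiers and the document texts are hypothetical, and only single-word variants are matched.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Equivalents taken from the childbirth example in the slides.
EQUIVALENTS: Dict[str, List[str]] = {
    "childbirth": ["accouchement", "travail", "naissance", "parturition", "parto"],
}


def build_variant_map(equivalents: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every variant (and the concept name itself) to its concept."""
    variant_to_concept: Dict[str, str] = {}
    for concept, variants in equivalents.items():
        for variant in [concept, *variants]:
            variant_to_concept[variant.lower()] = concept
    return variant_to_concept


def index_documents(docs: Dict[str, str], variant_to_concept: Dict[str, str]) -> Dict[str, Set[str]]:
    """Annotate each document with the concepts whose variants occur in it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            concept = variant_to_concept.get(word)
            if concept is not None:
                index[concept].add(doc_id)
    return dict(index)


def search(term: str, index: Dict[str, Set[str]], variant_to_concept: Dict[str, str]) -> List[str]:
    """Return the documents annotated with the concept behind a query term."""
    concept = variant_to_concept.get(term.lower())
    return sorted(index.get(concept, set())) if concept else []


# Hypothetical documents and identifiers, for illustration only.
docs = {
    "EP-1": "Device for monitoring childbirth",
    "FR-2": "Dispositif de surveillance de l'accouchement",
    "IT-3": "Apparecchio per il parto",
}
variant_map = build_variant_map(EQUIVALENTS)
index = index_documents(docs, variant_map)
print(search("parto", index, variant_map))
```

A query in any of the covered languages then retrieves all three documents. An ambiguous variant such as French "travail" would also match texts about work, which is why the sense disambiguation step matters before indexing.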

  20. Summary 1
      ● Improving patent search with NLP techniques
      ● Using a large, controlled and regularly updated terminological resource (Wikipedia)
      ● Challenge: term sense disambiguation
        – Solutions:
          ● Patent classification categories
          ● The long texts of Wikipedia articles

  21. Summary 2
      ● Patent application annotation + search expansion with:
        – Synonyms
        – NP paraphrases
        – Multilingual equivalents

  22. Main References
      Hunt, D., L. Nguyen, and M. Rodgers (Eds.). Patent searching: tools and techniques. Wiley, 2007.
      Joho, H., L. A. Azzopardi, and W. Vanderbauwhede. A survey of patent users: an analysis of tasks, behavior, search functionality and system requirements. In Proceedings of the third symposium on Information interaction in context, ACM, 2010, pp. 13-24.
      Lupu, M., K. Mayer, J. Tait, and A. J. Trippe (Eds.). Current challenges in patent information retrieval. The Information Retrieval Series, Vol. 29, Springer, 2011.

  23. Thank you! Any comments or advice?
