Presentation of OpenNLP. Presenter: Dr Ir Robert Viseur

[ RMLL 2013, Bruxelles – Thursday 11th July 2013 ]
Presentation of OpenNLP
Presenter: Dr Ir Robert Viseur

What is OpenNLP?
• Toolkit for the processing of natural language text.
• Project of the Apache Software Foundation.
• Developed in Java.
• Under Apache License, Version 2.0.
• Download and documentation: http://opennlp.apache.org/ .

What are the features?
• Common NLP tasks:
  • tokenization,
  • sentence segmentation,
  • part-of-speech tagging,
  • named entity extraction,
  • chunking.

What is part-of-speech tagging?
• Example: [image on the original slide, not preserved].
• See more: http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html .
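As a rough illustration of what a part-of-speech tagger produces: to my understanding of the OpenNLP manual linked above, the command-line tagger writes each token followed by an underscore and its tag (for example `Pierre_NNP Vinken_NNP`). The sketch below only parses such a line with the Java standard library; it does not call OpenNLP, and the class names `PosOutputParser` and `TaggedToken` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PosOutputParser {

    /** One token with its part-of-speech tag (hypothetical helper type). */
    static final class TaggedToken {
        final String token;
        final String tag;

        TaggedToken(String token, String tag) {
            this.token = token;
            this.tag = tag;
        }
    }

    /** Splits one line of "token_TAG token_TAG ..." text into (token, tag) pairs. */
    static List<TaggedToken> parse(String line) {
        List<TaggedToken> result = new ArrayList<TaggedToken>();
        for (String item : line.trim().split("\\s+")) {
            // The tag follows the LAST underscore, so tokens containing '_' still work.
            int sep = item.lastIndexOf('_');
            if (sep <= 0 || sep == item.length() - 1) {
                continue; // skip items that do not have the token_TAG shape
            }
            result.add(new TaggedToken(item.substring(0, sep), item.substring(sep + 1)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample line in the assumed output format (Penn Treebank style tags).
        for (TaggedToken t : parse("Pierre_NNP Vinken_NNP ,_, 61_CD years_NNS old_JJ")) {
            System.out.println(t.token + "\t" + t.tag);
        }
    }
}
```

Here `NNP` marks a proper noun, `CD` a number, `NNS` a plural noun and `JJ` an adjective.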

What is named entity extraction?
• Example: [image on the original slide, not preserved].
• See more: http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html .
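To make named entity extraction more concrete: as far as I know from the OpenNLP manual linked above, the command-line name finder marks entities inline, for example `<START:person> Pierre Vinken <END>`. The sketch below pulls the entities of one type out of such annotated text using only `java.util.regex`; it does not call OpenNLP, and `NameTagExtractor` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NameTagExtractor {

    // Group 1 is the entity type, group 2 the entity text between the markers.
    private static final Pattern ENTITY =
            Pattern.compile("<START:([^>]+)>\\s*(.*?)\\s*<END>");

    /** Returns the text of every entity of the given type, in order of appearance. */
    static List<String> extract(String taggedText, String type) {
        List<String> names = new ArrayList<String>();
        Matcher m = ENTITY.matcher(taggedText);
        while (m.find()) {
            if (m.group(1).equals(type)) {
                names.add(m.group(2));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String tagged = "<START:person> Pierre Vinken <END> , 61 years old , will join the board .";
        System.out.println(extract(tagged, "person"));
    }
}
```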

How does it work? (1/2)
• The features are associated with pre-trained models.
• Each pre-trained model is created for one language and for one type of use.
• Supported languages: da, de, en, es, nl, pt, se.
• Warnings:
  – The functional coverage varies between languages.
  – The French language is not supported!
• See http://opennlp.sourceforge.net/models-1.5/ .
• Use from the command line or as a Java library.
• Warning: model loading time when using the CLI.
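The loading-time warning suggests one design choice when OpenNLP is used as a Java library rather than from the CLI: load each model once and reuse it. The sketch below is a generic cache keyed by language and type of use. It is an assumption about how one might organise this, not part of OpenNLP; `ModelCache`, the key format and the loader function are all hypothetical, and the demo loader returns a plain string instead of a real model.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class ModelCache<M> {

    private final Map<String, M> loaded = new HashMap<String, M>();
    private final Function<String, M> loader;

    /** The loader receives a key such as "en-pos" and returns the loaded model. */
    ModelCache(Function<String, M> loader) {
        this.loader = loader;
    }

    /** Returns the model for (language, task), loading it only on first use. */
    synchronized M get(String language, String task) {
        String key = language + "-" + task;
        M model = loaded.get(key);
        if (model == null) {
            model = loader.apply(key); // the expensive step, done once per key
            loaded.put(key, model);
        }
        return model;
    }

    public static void main(String[] args) {
        // Stand-in loader: a real one would read a model file from disk.
        ModelCache<String> cache = new ModelCache<String>(key -> {
            System.out.println("loading " + key);
            return "model:" + key;
        });
        cache.get("en", "pos");
        cache.get("en", "pos"); // served from the cache, no second load
        cache.get("es", "pos"); // different language, separate model
    }
}
```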

How does it work? (2/2)
• Example (English vs Spanish): [image on the original slide, not preserved].

What are the selection criteria?
• Support of the product.
• License.
• Available languages.
• Precision / Recall.
• Speed of text processing.
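For the precision / recall criterion, a minimal sketch of how the two measures could be computed when comparing the entities a tool extracts with a hand-made reference list. Precision is the share of extracted items that are correct; recall is the share of reference items that were found. The class name `PrecisionRecall` and the sample names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class PrecisionRecall {

    /** Number of predicted items that also appear in the reference (gold) set. */
    private static int truePositives(Set<String> predicted, Set<String> gold) {
        Set<String> common = new HashSet<String>(predicted);
        common.retainAll(gold);
        return common.size();
    }

    /** Correct predictions divided by all predictions (0.0 if nothing was predicted). */
    static double precision(Set<String> predicted, Set<String> gold) {
        if (predicted.isEmpty()) {
            return 0.0;
        }
        return (double) truePositives(predicted, gold) / predicted.size();
    }

    /** Correct predictions divided by all reference items (0.0 if the reference is empty). */
    static double recall(Set<String> predicted, Set<String> gold) {
        if (gold.isEmpty()) {
            return 0.0;
        }
        return (double) truePositives(predicted, gold) / gold.size();
    }

    public static void main(String[] args) {
        Set<String> predicted = new HashSet<String>(Arrays.asList("Alice", "Bob", "Paris", "Monday"));
        Set<String> gold = new HashSet<String>(Arrays.asList("Alice", "Bob", "Carol"));
        System.out.println("precision = " + precision(predicted, gold));
        System.out.println("recall    = " + recall(predicted, gold));
    }
}
```

In this sample, two of the four extracted items are correct (precision 0.5) and two of the three reference items are found (recall about 0.67).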

Are there free (as in freedom) alternative tools?
• Other light tools:
  • Stanford Log-linear Part-Of-Speech Tagger (POST),
  • Stanford Named Entity Recognizer (NER),
  • TagEN,
  • Java Automatic Term Extraction toolkit.
• Frameworks:
  • In Java: UIMA, GATE.
  • In other languages: NLTK (Python).

Example: tag cloud creation (1/6)
• Starting point: a website.
• Example: www.adacore.com .
• What we want (from the website content):
  • common tag cloud,
  • circular tag cloud.
• Main steps: crawl, cleaning of HTML documents, extraction of named entities (persons) and of terminology (+ merge), and display (tag cloud).

Example: tag cloud creation (2/6)
• Cleaning:
  • Remove the HTML tags and keep only the useful content.
  • Warnings:
    • NLP tools are sensitive to noise in raw data.
    • Pay attention to the language of the document.
  • Use of an HTML boilerplate removal tool (HTML -> TXT).
  • Tool: Boilerpipe.
  • See http://code.google.com/p/boilerpipe/ .
• Next: normalization of the text.
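To show what the HTML -> TXT step has to achieve, here is a deliberately naive stand-in written with regular expressions only. It is not Boilerpipe and does not detect boilerplate such as menus or footers; it only drops scripts, styles, comments and tags, decodes a few entities, and collapses whitespace. Treating whitespace collapsing as the "normalization" step is my assumption, since the slide does not define it. `NaiveHtmlCleaner` is a hypothetical name.

```java
final class NaiveHtmlCleaner {

    /** Very rough HTML-to-text conversion; a real pipeline would use a dedicated tool. */
    static String toText(String html) {
        String text = html
                // Drop script and style elements together with their content.
                .replaceAll("(?is)<(script|style)\\b.*?</\\1\\s*>", " ")
                // Drop HTML comments.
                .replaceAll("(?s)<!--.*?-->", " ")
                // Drop every remaining tag, keeping the text between tags.
                .replaceAll("(?s)<[^>]*>", " ")
                // Decode a handful of common entities; "&amp;" goes last to avoid double decoding.
                .replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&");
        // Assumed normalization: collapse runs of whitespace into single spaces.
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p{color:red}</style></head>"
                + "<body><h1>Products</h1>\n<p>Tools &amp; services</p></body></html>";
        System.out.println(toText(html));
    }
}
```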

Example: tag cloud creation (3/6)
• Named entity extraction.
  • Standard in OpenNLP: OpenNLP adds tags in the text.
  • Here: extraction of Person named entities.
• Terminology extraction.
  • First: part-of-speech tagging (POST).
  • Next: identification and filtering (threshold) of:
    • collocations (e.g. Noun_Noun, Adjective_Noun, ...),
    • proper names (often brands or people).
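The terminology step can be sketched as follows, assuming the text has already been tokenized and POS-tagged, and assuming Penn Treebank style tags where nouns start with `NN` (including proper nouns, `NNP`) and adjectives with `JJ`. The code counts adjacent noun+noun and adjective+noun pairs and keeps those reaching a frequency threshold. The exact patterns and threshold used in the talk are not given, so these are illustrative choices; `CollocationExtractor` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class CollocationExtractor {

    /** Counts Noun_Noun and Adjective_Noun bigrams and keeps those seen at least minCount times. */
    static Map<String, Integer> extract(String[] tokens, String[] tags, int minCount) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + 1 < tokens.length; i++) {
            boolean firstOk = tags[i].startsWith("NN") || tags[i].startsWith("JJ");
            boolean secondOk = tags[i + 1].startsWith("NN");
            if (firstOk && secondOk) {
                String key = (tokens[i] + " " + tokens[i + 1]).toLowerCase(Locale.ROOT);
                Integer old = counts.get(key);
                counts.put(key, old == null ? 1 : old + 1);
            }
        }
        // Filtering step: drop candidates below the threshold.
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] tokens = {"Static", "analysis", "finds", "bugs", ".", "Static", "analysis", "is", "fast", "."};
        String[] tags = {"JJ", "NN", "VBZ", "NNS", ".", "JJ", "NN", "VBZ", "JJ", "."};
        System.out.println(extract(tokens, tags, 2));
    }
}
```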

Example: tag cloud creation (4/6)
• Process:
  • Website (Internet) -> crawl -> website (local).
  • Raw HTML document -> conversion to text.
  • Normalization.
  • POS tagging.
  • Terminology extraction and NE extraction.
  • Merge -> tags.
  • Tag cloud (for a website).
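The last two steps of the process (merge, then tag cloud) could look like the sketch below. It assumes that both extraction steps deliver a map from tag to frequency. When a tag appears in both maps the higher count is kept, to avoid counting the same occurrences twice; that merge rule and the linear font-size scaling are my assumptions, not something stated in the talk. `TagCloud` and the sample tags are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

final class TagCloud {

    /** Merges terminology and named-entity counts; a tag present in both keeps the higher count. */
    static Map<String, Integer> merge(Map<String, Integer> terms, Map<String, Integer> entities) {
        Map<String, Integer> merged = new TreeMap<>(terms);
        for (Map.Entry<String, Integer> e : entities.entrySet()) {
            Integer old = merged.get(e.getKey());
            merged.put(e.getKey(), old == null ? e.getValue() : Math.max(old, e.getValue()));
        }
        return merged;
    }

    /** Maps each count linearly onto a font size between minPx and maxPx. */
    static Map<String, Integer> fontSizes(Map<String, Integer> tags, int minPx, int maxPx) {
        int lo = Integer.MAX_VALUE;
        int hi = Integer.MIN_VALUE;
        for (int count : tags.values()) {
            lo = Math.min(lo, count);
            hi = Math.max(hi, count);
        }
        Map<String, Integer> sizes = new TreeMap<>();
        for (Map.Entry<String, Integer> e : tags.entrySet()) {
            int size = (hi == lo)
                    ? maxPx // all tags equally frequent: use a single size
                    : minPx + (int) Math.round((double) (e.getValue() - lo) * (maxPx - minPx) / (hi - lo));
            sizes.put(e.getKey(), size);
        }
        return sizes;
    }

    public static void main(String[] args) {
        Map<String, Integer> terms = new TreeMap<>();
        terms.put("static analysis", 4);
        terms.put("jane doe", 2);
        Map<String, Integer> entities = new TreeMap<>();
        entities.put("jane doe", 3);
        entities.put("john smith", 1);
        System.out.println(fontSizes(merge(terms, entities), 10, 30));
    }
}
```

The resulting sizes can then be handed to whatever component draws the cloud, whether common or circular.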

Example: tag cloud creation (5/6)
• Result: common tag cloud [image on the original slide, not preserved].

Example: tag cloud creation (6/6)
• Result: circular tag cloud [image on the original slide, not preserved].

Thanks for your attention. Any questions?

Contact
Dr Ir Robert Viseur
Email (@CETIC): robert.viseur@cetic.be
Email (@UMONS): robert.viseur@umons.ac.be
Phone: 0032 (0) 479 66 08 76
Website: www.robertviseur.be
This presentation is covered by the "CC-BY-ND" license.
