
Matching needs and resources: How NLP can help theoretical linguistics (presentation slides)



  1. Matching needs and resources: How NLP can help theoretical linguistics
     Alexis Dimitriadis
     Utrecht Institute of Linguistics OTS, Utrecht University

  2. Motivation
     • Insight: Linguistic research could be carried out more efficiently (or with better results) if all that NLP horsepower could be brought to bear. But how?
     • Some linguists have both theoretical and NLP knowledge, and can choose the techniques to apply. But most theoretical linguists are not self-sufficient in this respect:
       • Theoretical linguists should want help from NLP.
       • Computational linguists should want to help.

  3. How NLP could help I: Best-case scenario
     • Collaboration, to tackle a research question interesting to both sides. Examples: language genealogy, learning algorithms (e.g., learning OT constraint rankings).
     • However, suitable research problems are limited.
     • Many theoretical projects have modest (i.e., boring) computational needs. A large corpus of English is enough for a lot of linguists...

  4. Some collaborative research topics
     • Computational cladistics for the Indo-European language family (Ringe, Warnow and Taylor 2002)
     • Algorithms for learning constraint rankings for Optimality-Theoretic systems (phonology, stress, syntax...) (Tesar and Smolensky 2000)
     • Cognitive modeling of various linguistic phenomena (cf. Workshop 5)
     • Studying the semantics of lexical entailments, using a new custom-developed corpus (Winter et al., ongoing)
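Tesar and Smolensky's (2000) Recursive Constraint Demotion is a standard example of the kind of ranking-learning algorithm the second topic refers to. The following is a minimal Python sketch of the idea, not their own formulation. It assumes each piece of learning data is a winner-loser pair of violation profiles, given as dictionaries keyed by constraint names (the names used here are illustrative). The result is a stratified ranking, highest stratum first.

```python
from typing import Dict, List, Tuple

# Violation profile of one candidate: constraint name -> number of violations.
Violations = Dict[str, int]
# A winner-loser pair: the attested output and a competing, non-optimal candidate.
WinnerLoserPair = Tuple[Violations, Violations]


def prefers_loser(constraint: str, pair: WinnerLoserPair) -> bool:
    """True if the constraint assigns fewer violations to the loser than to the winner."""
    winner, loser = pair
    return loser[constraint] < winner[constraint]


def prefers_winner(constraint: str, pair: WinnerLoserPair) -> bool:
    """True if the constraint assigns fewer violations to the winner than to the loser."""
    winner, loser = pair
    return winner[constraint] < loser[constraint]


def recursive_constraint_demotion(
    constraints: List[str], pairs: List[WinnerLoserPair]
) -> List[List[str]]:
    """Return a stratified ranking (highest stratum first) consistent with all pairs.

    Assumes every violation dict has an entry for every constraint.
    Raises ValueError if no consistent ranking exists.
    """
    unranked = list(constraints)
    unexplained = list(pairs)
    strata: List[List[str]] = []
    while unranked:
        # Rank next every constraint that prefers no loser among the still-unexplained pairs.
        stratum = [c for c in unranked if not any(prefers_loser(c, p) for p in unexplained)]
        if not stratum:
            raise ValueError("the data are inconsistent: no constraint can be ranked next")
        strata.append(stratum)
        unranked = [c for c in unranked if c not in stratum]
        # A pair is explained once some constraint in the new stratum prefers its winner.
        unexplained = [p for p in unexplained if not any(prefers_winner(c, p) for c in stratum)]
    return strata
```

With toy data in which the attested output deletes a coda consonant (the winner violates Max once) rather than keeping it (the first loser violates NoCoda) or epenthesizing (the second loser violates Dep), the sketch returns `[["NoCoda", "Dep"], ["Max"]]`: both NoCoda and Dep must dominate Max.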

  5. How NLP could help II: Lending a helping hand
     • Freebies: use of existing corpora, parsers, etc.
     • Altruism: the computational linguist undertakes to help the theorist. Examples: Linguist’s Search Engine (Resnik et al. 2005), Natural Language Toolkit (Bird et al. 2009).
     • Problems: existing resources are often hard to use for the uninitiated. Altruism is a limited resource. How can it be used most effectively?

  6. How NLP could help III: Learning from NLP methodology
     • Data-driven orientation
     • Reproducibility, inter-annotator consistency
     • Testing against a corpus of data
     • Evaluation metrics
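Two of these methodological points, inter-annotator consistency and evaluation metrics, come down to simple arithmetic. A minimal sketch, assuming annotations arrive as parallel lists of labels and that system output and gold standard are sets of items: Cohen's kappa, κ = (p_o - p_e) / (1 - p_e), corrects the observed agreement p_o for the agreement p_e expected by chance, while precision, recall and F1 score a system's output against a gold standard.

```python
from collections import Counter
from typing import Hashable, Sequence, Set, Tuple


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Expected agreement if each annotator labelled at random with their own label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both annotators use one and the same label")
    return (observed - expected) / (1 - expected)


def precision_recall_f1(predicted: Set, gold: Set) -> Tuple[float, float, float]:
    """Score a set of system outputs against a gold-standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, two annotators who tag four words as N N V V and N V V V agree on 3/4 of the items, chance agreement is 1/2, and kappa comes out at 0.5.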

  7. Utilizing concrete resources, and technical stumbling blocks
     • Many existing, boring tools and resources would be useful to a theoretical linguist: corpora, parsers, web crawlers...
     • However, linguists typically lack the necessary technical know-how, compilers, or even the knowledge that such tools exist.


  9. The Linguist’s Search Engine (Resnik and Elkiss 2005)
     • Searching the web by word, POS and syntactic structure. Search engine results are parsed and filtered with enriched queries.
     • Stored core corpus for fast results, supplemented with real-time crawling and parsing if desired.
     • User-friendly web interface designed for the theoretical linguist. Graphical query construction by example.
     • Defunct. Too much work to create and support such special tools?
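The core idea of searching by word and part of speech is easy to approximate locally once a corpus is tagged. A minimal sketch, assuming sentences are already POS-tagged (the toy corpus and its Penn Treebank-style tags are illustrative assumptions): a query is a sequence of slots, each constraining the word form, a tag prefix, or neither. It does not attempt the LSE’s searches over syntactic structure, which would require parsed trees.

```python
from typing import List, Optional, Sequence, Tuple

Token = Tuple[str, str]                     # (word, part-of-speech tag)
Slot = Tuple[Optional[str], Optional[str]]  # (word, tag prefix); None matches anything


def slot_matches(token: Token, slot: Slot) -> bool:
    word, tag = token
    want_word, want_tag = slot
    # Word forms are compared case-insensitively; no lemmatization is done.
    if want_word is not None and word.lower() != want_word.lower():
        return False
    # Tag prefixes let "VB" match VB, VBD, VBZ, ... in Penn Treebank-style tags.
    if want_tag is not None and not tag.startswith(want_tag):
        return False
    return True


def search(corpus: Sequence[Sequence[Token]], pattern: Sequence[Slot]) -> List[str]:
    """Return every contiguous token span in the corpus that matches the pattern."""
    if not pattern:
        raise ValueError("pattern must contain at least one slot")
    hits = []
    for sentence in corpus:
        for start in range(len(sentence) - len(pattern) + 1):
            span = sentence[start:start + len(pattern)]
            if all(slot_matches(tok, slot) for tok, slot in zip(span, pattern)):
                hits.append(" ".join(word for word, _ in span))
    return hits


# Hypothetical pre-tagged toy corpus.
toy_corpus = [
    [("She", "PRP"), ("gave", "VBD"), ("him", "PRP"), ("the", "DT"), ("book", "NN")],
    [("They", "PRP"), ("gave", "VBD"), ("the", "DT"), ("book", "NN"), ("to", "TO"), ("her", "PRP")],
]
```

For instance, `search(toy_corpus, [("gave", "VB"), (None, "PRP")])` returns `["gave him"]`, i.e. *gave* immediately followed by a pronoun, which picks out the double-object sentence but not the prepositional one.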

  10. The Natural Language Toolkit (Bird et al. 2009)
      • A collection of Python modules for text analysis and various NLP tasks
      • Interactive command-line environment for linguistic exploration
      • Relatively little technical skill required
      • Documented in a very accessible book (Bird et al. 2009) aimed at the “ordinary working linguist”
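A keyword-in-context concordance is a typical first step in this kind of interactive exploration. NLTK offers one as the `concordance()` method of its `Text` objects; the sketch below instead reimplements the idea in a few lines of plain Python, so it runs without installing anything. Its regular-expression tokenizer is a deliberately crude assumption, far simpler than NLTK's.

```python
import re
from typing import List


def kwic(text: str, keyword: str, window: int = 3) -> List[str]:
    """Keyword-in-context lines: `window` tokens on each side of every match."""
    # Crude tokenizer: runs of word characters, or single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{token}] {right}")
    return lines
```

For example, `kwic("The cat saw the dog. The dog saw a cat.", "saw", window=2)` returns `["The cat [saw] the dog", "The dog [saw] a cat"]`.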


  12. Benefits of the no-frills approach
      • Command-line tools are easier to write and maintain. They are also better for scripting, creating workflows, etc.
      • An integrated tool is less flexible in the tasks it can carry out. A little scripting unlocks the power of computational techniques.
      • Command-line tools still need to be reasonably easy to install and use, and should be well documented at an accessible level.
      • The NLTK and similar resources will still be beyond the reach of linguists who are unable, or unwilling, to make the required time investment.
      • Is this a big problem? I don’t believe it is.
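To make "no-frills" concrete, here is a minimal sketch of the kind of command-line tool meant: a word-frequency counter that reads files or standard input and writes tab-separated output, so it can be chained with other tools. The tool, its options and its simple `\w+` tokenization are all illustrative assumptions, not an existing program.

```python
import argparse
import collections
import fileinput
import re
from typing import Iterable, List, Optional, Sequence, Tuple


def count_words(lines: Iterable[str], lowercase: bool = True) -> List[Tuple[str, int]]:
    """Count word tokens across lines; most frequent first (ties keep first-seen order)."""
    counts: collections.Counter = collections.Counter()
    for line in lines:
        if lowercase:
            line = line.lower()
        counts.update(re.findall(r"\w+", line))
    return counts.most_common()


def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Print word frequencies as tab-separated lines.")
    parser.add_argument("files", nargs="*", help="input files (default: read standard input)")
    parser.add_argument("--keep-case", action="store_true", help="do not lowercase words")
    parser.add_argument("--top", type=int, default=None, help="print only the N most frequent words")
    args = parser.parse_args(argv)
    # "-" tells fileinput to read standard input, so the tool also works at the end of a pipe.
    with fileinput.input(files=args.files or ["-"]) as lines:
        ranked = count_words(lines, lowercase=not args.keep_case)
    for word, count in ranked[:args.top]:
        print(f"{word}\t{count}")
    return 0
```

Saved as a script (the name `wordfreq.py` is hypothetical) with the usual `if __name__ == "__main__": raise SystemExit(main())` guard at the bottom, it can be used in a pipeline such as `cat corpus.txt | python wordfreq.py --top 20`. Keeping the counting logic in a plain function also means it can be reused from an interactive session.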

  13. Conclusions I
      • To benefit from NLP, theoretical linguists must incorporate its methods and values into their work habits.
      • Joint research is the best scenario, but is limited in reach. Many needs are simpler.
      • Tool creation is not a strength of theoretical linguists. Making resources available is great, and the tools don’t need to be point-and-click.

  14. Conclusions II
      • We theoretical linguists need to help ourselves, by adopting methodological insights and learning to use the associated techniques and tools.
      • Hopefully, computationally savvy theorists will go on to formulate more research questions suitable for real collaboration with computational linguists.
