Focusing on Tighter Integration of CAT Tools and Corpora
Miloš Jakubíček (Lexical Computing), milos.jakubicek@sketchengine.co.uk
Translating and the Computer 38, London, November 17, 2016
Background
Sketch Engine
• online service, since 2003
• large collections of texts (text corpora)
• built-in tools for creating user corpora
• efficient search & advanced analysis
• as of 2016:
  • corpora for 85 languages
  • annotation tools (PoS tagging, lemmatization) for ca 30 of them
  • 10,000s of users:
    • lexicographers
    • linguists
    • teachers and students
    • copywriters, terminologists and translators
CAT tools landscape as of 2016
• many advanced tools on the market
• main focus on:
  • wide support of document formats, import/export options
  • project management and accounting
  • translation memory management
  • text formatting
• …not so much on the language itself
• in fact CAT tools are decade(s) behind the development in natural language processing
CAT tools and NLP
Why?
• because they are not accurate enough?
• because they are not available?
• because the community is not aware of them?
• because they are hard to use?
CAT tools and NLP
• accuracy: not perfect, but good enough to actually help
• availability: big issue (including IP issues)
• awareness: I don’t know!
• user-friendliness: big issue
NLP and translation
NLP is doing much more for machine translation than for human translation!
NLP and translation
• most NLP tools are more useful in a semi-automatic setting than a fully-automated one
• community focus on MT + post-editing rather than exploiting NLP resources and tools that MT is built on to help translators directly
NLP offers
• data
  • huge amounts, both monolingual and bilingual
  • used to some extent
• tools
  • tokenizers/segmenters
  • part-of-speech taggers, morphological analyzers
  • parsers (of any kind)
  • language modelling and prediction tools
  • …
Challenges
• technical interoperation: solvable
• legal issues: hopefully solvable too
• workflow adaptation and integration: a real challenge
Workflow adaptation and integration
• urgent need for seamless integration of state-of-the-art NLP tools that will:
  • be transparent to the user
  • fit the workflow
  • bring measurable savings
Sketch Engine integration with CAT tools
• Sketch Engine & CAT tools: a typical example of all the aforementioned issues
• just released: Sketch Engine plugin for SDL Trados Studio
• Sketch Engine workshop: Friday, 2pm–5pm, Education Room
• free 3-month trial for all participants
Conclusions
• NLP tools and resources can do A LOT for human translators
  • if integrated well into the translator environment
⇒ this is what needs to be worked on