
Towards Transparent Linguistic Analysis of Dutch Newspaper Article Genres using Machine Learning
Erik Tjong Kim Sang, Kim Smeenk, Aysenur Bilgin, Tom Klaver, Laura Hollink, Jacco van Ossenbruggen, Frank Harbers and Marcel Broersma
CLIN29, Groningen, 31/01/2019

Task
Task: automatically predict genres of Dutch newspaper articles
Data: 2,930 Dutch newspaper articles with 16 different genre labels
Examples of genre labels: news, column, editorial, interview

Results
Previous work: Harbers and Lonij (2017) obtained 65% accuracy on this task
Our method: machine learning (MLP, NB, RF, SVM)
Result: 70% accuracy with SVM (interannotator agreement: 77%)
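
To make the setup concrete, here is a minimal sketch of one of the listed learner types (NB, Naive Bayes) together with the accuracy measure. It is not the system from the talk: the bag-of-words features, the add-one smoothing, the function names and the English toy sentences are all assumptions made for illustration, whereas the real data are Dutch articles with 16 genre labels.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Collect the counts needed for multinomial Naive Bayes."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # genre -> word frequencies
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the genre with the highest log-probability for one document."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # add-one smoothing (an assumption of this sketch)
            score += math.log((word_counts[label][tok] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(gold, predicted):
    """Fraction of articles whose predicted genre equals the gold genre."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

# Invented toy data, for illustration only.
train_docs = [
    "minister announces new budget".split(),
    "police report accident on highway".split(),
    "we asked the author about her book".split(),
    "she told us about her childhood".split(),
]
train_labels = ["news", "news", "interview", "interview"]
model = train_nb(train_docs, train_labels)

test_docs = [
    "police announces budget report".split(),
    "the author told us about her work".split(),
]
test_gold = ["news", "interview"]
test_pred = [predict_nb(model, doc) for doc in test_docs]
print(test_pred, accuracy(test_gold, test_pred))
```

The same `accuracy` function applies to any of the learners; the percentages on the slide compare system labels (or a second annotator's labels) with the gold labels in this way.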

Application
We want to use the distribution of genres over time (1955-1995) to study the effects of depillarization of Dutch newspapers
The quality of the proposed genre labels should be very good; in particular, their predicted distributions should be excellent
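
The requirement on distributions can be illustrated with a small sketch. It assumes each article is reduced to a hypothetical `(year, genre)` pair and uses total variation distance to compare a predicted genre distribution with the gold one; both the data layout and the choice of distance are assumptions, not something stated in the talk.

```python
from collections import Counter, defaultdict

def genre_distribution_by_year(records):
    """Map each year to the relative frequency of every genre in that year."""
    counts = defaultdict(Counter)
    for year, genre in records:
        counts[year][genre] += 1
    return {
        year: {genre: n / sum(c.values()) for genre, n in c.items()}
        for year, c in sorted(counts.items())
    }

def total_variation(p, q):
    """Half the summed absolute difference between two distributions."""
    genres = set(p) | set(q)
    return 0.5 * sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in genres)

# Invented toy data: four articles from one year.
gold = [(1965, "news"), (1965, "news"), (1965, "column"), (1965, "interview")]
pred = [(1965, "news"), (1965, "news"), (1965, "news"), (1965, "interview")]

gold_dist = genre_distribution_by_year(gold)[1965]
pred_dist = genre_distribution_by_year(pred)[1965]
print(gold_dist, pred_dist, total_variation(gold_dist, pred_dist))
```

In this toy case three of four labels are correct, yet the predicted distribution contains no columns at all, which is the kind of distortion that matters when genre shares are tracked over decades.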

Question
Can you convince us that the genre prediction system works well enough to base our future studies on?

Approach
1. Open the genre classification system
2. Look for components that could introduce bias
3. Improve the transparency of the system with data visualizations
We have built a platform supporting step 3

Dealing with OCR errors VOOR AAN DE RADIO t TWEEDE DIVISIE A i Portu„a__Psv Hilversum -EDO _ Enschede rviv RCH — Graafschap 5 Go ZFC — Zwolse Boys f ADO — Telstii. Heerenveen — Wageningen .. . ï DWS — Sitter__, Zwartemeer — AGOVV i VlVV — HerVclés Vitesse — Spel. Cambuur 5 Sparta — Nac PEC-FC Zaanstreek EERSTE- DIVISIE ' ' Haarlem~Tubantia i SS- ar.™ TWEEDE DIVISIE B 'f Willen, H -lve'lov Fortuna Vl.- Xerxes •' VW— Blanw » Baronie — 't Gooi tSSB-S&» gfcfZe.DvS ■ :.::::. i 'SS:3S""'U" ■ ' ■ >'* ""' ■ ' ■ " ■ &£-__e*i_- ■:::::::: !• Helmondia— Limburgia «t zijn opgenomen in de sport-toto. De curfl.--.j_. ' '" drukte z'.l') reserve-wedstrijden. j"""v «__. ' *A- - - -v"-'^"-"JV-_-_-__r_-__^-».---I^v-"--__nj_- Paper version Digital version

Example of important features for genre class comparisons: Interview (blue) vs Reportage (red)

Visual explanation of genre class choice based on feature values

Visual explanation of genre class accuracies and genre class confusion
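
The numbers behind such a visualization can be sketched as follows. This is an assumed, minimal computation with invented labels: confusion counts are tallied per (gold, predicted) pair, and "genre class accuracy" is taken to mean the fraction of gold articles of a genre that received the correct label (per-class recall). The platform from the talk may define these differently.

```python
from collections import Counter

def confusion_counts(gold, predicted):
    """Count how often each gold genre was labeled as each predicted genre."""
    return Counter(zip(gold, predicted))

def per_class_accuracy(gold, predicted):
    """Fraction of the gold articles of each genre that were labeled correctly."""
    totals = Counter(gold)
    correct = Counter(g for g, p in zip(gold, predicted) if g == p)
    return {label: correct[label] / totals[label] for label in sorted(totals)}

# Invented toy labels, for illustration only.
gold = ["news", "news", "news", "interview", "interview",
        "reportage", "reportage", "reportage"]
pred = ["news", "news", "reportage", "interview", "reportage",
        "reportage", "reportage", "interview"]

confusion = confusion_counts(gold, pred)
class_acc = per_class_accuracy(gold, pred)
for (g, p), n in sorted(confusion.items()):
    print(f"gold={g:10s} predicted={p:10s} count={n}")
print(class_acc)
```

Off-diagonal pairs such as (interview, reportage) show which genres the classifier confuses, which is what a confusion plot makes visible at a glance.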

Gold standard data Machine labeled data

Current state of the project
The domain scientists regard the current quality of the predicted genre labels as too low to be used as a basis for further study.
This involves both the label accuracy and the provided explanations for the labels.

Directions of current work
1. Collect more training data to improve model accuracy
2. Employ word vectors to overcome lack of training data
3. Look for better features, to generate better explanations
4. Evaluate alternative more advanced machine learners

Concluding remark
Improving the transparency of our classifier has improved insight into the classification task, both for domain scientists and computer scientists
