Chair of Network Architectures and Services
Department of Informatics
Technical University of Munich

An empirical approach towards analysis of discussions on mailing lists
Simon Klimek
March 21, 2018

• Motivation
• Related Work
• Approach
• Evaluation
• Future Work
• Bibliography

S. Klimek – Discussion Analysis

Background - IETF
Figure 1: IETF Logo
• Development of standards
• 121 active working groups
• RFCs (Request For Comments)

Motivation
• Discussions are held via mailing lists.
• Can we analyze them automatically?
• Can the gained data help us to better understand IETF processes?

Related Work
• Conversational Speech
    • Telephone Conversations (human to human)
    • Online Chats
    • Plan recognition in dialogues
• Formal Speech
    • Q&A Forum

Conversational Speech
• Dialogue Acts labeling [6] on the Switchboard corpus [3]
• Online Chat between multiple participants [7]
• Plan recognition in dialogues [1]

Formal Speech
• Question-Answer Forums [2]

Previous Work
Nikolai Schwellnus' bachelor thesis "A Heat Map for IETF Standardization Activities" [5]

Figure 2: Database Schemata
• mails: messageid text, file text, key integer, date timestamp with time zone, date_local timestamp, sender_addr text, receiver text, announce boolean, subject text, id integer, inreply text, spam boolean, spamscore numeric, sender_name text, person bigint, fast_person bigint
• list: name text
• sentimentvalues: file text, key integer, polarity real, subjectivity real, mostusedword text, sentencecount integer
• mail_threads: leaf varchar(788), depth integer
• mail_on_list: messageid text, list text
• Relations: list:name, leaf:messageid, messageid:messageid

Finding Discussions
Finding discussion threads?

Figure 3: Thread Structure (reply tree of the thread "[Doh] operational considerations"; participants: Eliot Lear, Martin J. Dürst, Patrick McManus, Jim Reid, Rory Hewitt)

Figure: Histograms of occurrences (logarithmic axis, 10^0 to 10^6) over the number of mails in one thread (0 to 80) and over the number of replies (0 to 300)

• In-Reply-To
• Thread-View MHonArc (https://www.mhonarc.org)
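
How In-Reply-To references can be turned into threads may be easier to see in code. The following is a minimal sketch, not the implementation used in the talk: the function name `build_threads` and the input format (a list of message id and In-Reply-To pairs) are assumptions made for illustration.

```python
from collections import defaultdict


def build_threads(mails):
    """Group mails into threads using their In-Reply-To references.

    `mails` is a list of (message_id, in_reply_to) pairs; in_reply_to is
    None for a mail that starts a thread. A mail whose parent is not in
    the list is treated as a thread root as well (assumption).
    Returns a dict mapping each root id to the ids in its thread,
    root first, replies in depth-first order.
    """
    known = {message_id for message_id, _ in mails}
    children = defaultdict(list)
    roots = []
    for message_id, in_reply_to in mails:
        if in_reply_to is None or in_reply_to not in known:
            roots.append(message_id)
        else:
            children[in_reply_to].append(message_id)

    threads = {}
    for root in roots:
        order, stack = [], [root]
        while stack:
            current = stack.pop()
            order.append(current)
            # Reverse so that the earliest reply is visited first.
            stack.extend(reversed(children.get(current, [])))
        threads[root] = order
    return threads
```

The length of each returned list is the number of mails in that thread, which is the quantity a thread-size histogram would be built from.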

-- Walk every reply chain from its thread root and record the depth of each mail.
WITH RECURSIVE replies (messageid, spam, sender_addr, receiver, depth, inreply) AS (
    -- Anchor: non-spam mails that start a thread.
    SELECT messageid, spam, sender_addr, receiver, 1 AS depth, inreply
    FROM mails
    WHERE spam IS FALSE AND inreply IS NULL
  UNION ALL
    -- Recursive step: mails that reply to a mail already in the result.
    SELECT m.messageid, m.spam, m.sender_addr, m.receiver, tm.depth + 1 AS depth, m.inreply
    FROM replies tm
    JOIN mails m ON m.inreply = tm.messageid
)
-- The main query is not shown on the slide; selecting all rows is an assumption.
SELECT * FROM replies;
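
To try the recursive query without a PostgreSQL instance, a variant of it can be run against an in-memory SQLite database. This is only a sketch under assumptions: the table is reduced to four columns, `spam = 0` stands in for `spam IS FALSE`, each reply carries its own `inreply` value, and the helper name `thread_depths` is hypothetical.

```python
import sqlite3

QUERY = """
WITH RECURSIVE replies(messageid, sender_addr, depth, inreply) AS (
    SELECT messageid, sender_addr, 1 AS depth, inreply
    FROM mails
    WHERE spam = 0 AND inreply IS NULL
    UNION ALL
    SELECT m.messageid, m.sender_addr, tm.depth + 1, m.inreply
    FROM replies tm
    JOIN mails m ON m.inreply = tm.messageid
)
SELECT messageid, depth FROM replies ORDER BY depth, messageid
"""


def thread_depths(rows):
    """Return (messageid, depth) for all mails reachable from non-spam roots.

    `rows` holds (messageid, sender_addr, inreply, spam) tuples.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mails ("
            "messageid TEXT, sender_addr TEXT, inreply TEXT, spam INTEGER)"
        )
        conn.executemany("INSERT INTO mails VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()
```

Because the recursion starts only from non-spam roots, replies to a spam mail are never reached and therefore do not appear in the result.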

Processing Mails
Extract text from mails.
1. Multipurpose Internet Mail Extensions (MIME) [4]
    • text
        • text/plain
        • text/html
    • multipart
        • mixed
        • alternative
2. Remove HTML-tags, decode
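
The two extraction steps can be sketched with Python's standard `email` package. This is an illustration rather than the code behind the talk: the function names `extract_text` and `strip_html` are assumptions, text/plain is preferred over text/html by choice, and the regex-based tag removal is a rough approximation of a proper HTML parser.

```python
import html
import re
from email import message_from_string, policy

TAG_RE = re.compile(r"<[^>]+>")


def strip_html(markup):
    """Remove tags and decode entities (rough regex-based approximation)."""
    return html.unescape(TAG_RE.sub("", markup)).strip()


def extract_text(raw_mail):
    """Return the readable text of a raw mail given as a string.

    Walks all MIME parts, skips multipart containers and attachments,
    and prefers text/plain parts; text/html is used only as a fallback.
    """
    message = message_from_string(raw_mail, policy=policy.default)
    plain_parts, html_parts = [], []
    for part in message.walk():
        if part.is_multipart() or part.is_attachment():
            continue
        content_type = part.get_content_type()
        if content_type == "text/plain":
            # get_content() undoes the transfer encoding and charset.
            plain_parts.append(part.get_content().strip())
        elif content_type == "text/html":
            html_parts.append(strip_html(part.get_content()))
    if plain_parts:
        return "\n".join(plain_parts)
    return "\n".join(html_parts)
```

Preferring text/plain matters for multipart/alternative mails, where both parts carry the same message and only one of them should be analyzed.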

Quotation and Referencing
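
The slides for this step carry no text, so the following is only one plausible way to separate quoted from newly written text. It assumes the common convention that quoted lines start with ">" and groups consecutive lines of the same kind into one mail-block; the function name `chunk_mail_blocks` is hypothetical.

```python
import re

QUOTE_RE = re.compile(r"^\s*>")
QUOTE_PREFIX_RE = re.compile(r"^[>\s]+")


def chunk_mail_blocks(body):
    """Split a mail body into alternating quoted and new blocks.

    Returns a list of (is_quoted, text) tuples. Consecutive lines of the
    same kind are joined with a space; empty lines are dropped.
    """
    blocks = []
    for line in body.splitlines():
        quoted = bool(QUOTE_RE.match(line))
        # Drop the quote markers (any nesting level) from quoted lines.
        text = QUOTE_PREFIX_RE.sub("", line).strip() if quoted else line.strip()
        if not text:
            continue
        if blocks and blocks[-1][0] == quoted:
            blocks[-1] = (quoted, blocks[-1][1] + " " + text)
        else:
            blocks.append((quoted, text))
    return blocks
```

A new block that directly follows a quoted block can then be read as a response to that quotation.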

Processing of Mail-Blocks
• Tokenization
• Lexical Analysis
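
As a rough illustration of tokenization followed by a simple lexical statistic, the sketch below splits a mail-block into lowercase word tokens and picks the most frequent one. The regular expression, the function names and the optional stop word set are assumptions; the talk does not state which tokenizer was used.

```python
import re
from collections import Counter

# Words made of letters or digits, optionally joined by "'" or "-".
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*")


def tokenize(text):
    """Return the lowercase word tokens of a mail-block."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def most_used_word(text, stopwords=frozenset()):
    """Return the most frequent token outside `stopwords`, or None."""
    counts = Counter(t for t in tokenize(text) if t not in stopwords)
    return counts.most_common(1)[0][0] if counts else None
```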

Further Analysis
• sentence based
    • dialogue acts
• mail-block based
    • subjectivity
    • polarity
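
The slide does not say how subjectivity and polarity are computed. To make the two measures concrete, here is a toy lexicon-based sketch: the word lists are made up for illustration, polarity is taken as the balance of positive and negative words in [-1, 1], and subjectivity as the share of opinion words in [0, 1]. A real analysis would rely on a trained model or an established lexicon.

```python
import re

# Hypothetical mini-lexicon, for illustration only.
POSITIVE = frozenset({"good", "great", "agree", "useful"})
NEGATIVE = frozenset({"bad", "broken", "disagree", "wrong"})


def score_block(text):
    """Return (polarity, subjectivity) for one mail-block.

    polarity     = (positive - negative) / (positive + negative)
    subjectivity = (positive + negative) / number of tokens
    Both are 0.0 when there is nothing to score.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0, 0.0
    positive = sum(token in POSITIVE for token in tokens)
    negative = sum(token in NEGATIVE for token in tokens)
    opinion = positive + negative
    polarity = (positive - negative) / opinion if opinion else 0.0
    subjectivity = opinion / len(tokens)
    return polarity, subjectivity
```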

Framework Overview
• Finding mail-threads
• Reading a single mail thread
• Pipeline for Preprocessing
    • Decoding
    • Mail-block chunking
    • Tokenization
    • Quotation/Referencing
    • Polarity/Subjectivity
• Analyzer

Results - Influential People
Figure: Bar chart of "# final says" per sender address (axis from 1,000 to 6,500)
dotis@mail-abuse.org
nico@cryptonector.com
john-ietf@jck.com
ted.lemon@nominum.com
alexandru.petrescu@gmail.com
bclaise@cisco.com
fred@cisco.com
fluffy@cisco.com
henrik@levkowetz.com
magnus.westerlund@ericsson.com
mnot@mnot.net
martin.thomson@gmail.com
j.schoenwaelder@jacobs-university.de
kent@bbn.com
alexey.melnikov@isode.com
paul.hoffman@vpnc.org
dhc@dcrocker.net
trac@tools.ietf.org
harald@alvestrand.no
pekkas@netcore.fi
stpeter@stpeter.im
stephen.farrell@cs.tcd.ie
christer.holmberg@ericsson.com
touch@isi.edu
moore@cs.utk.edu
jari.arkko@piuha.net
brian.e.carpenter@gmail.com
julian.reschke@gmx.de
bidulock@openss7.org
notifications@github.com
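
The slide does not define "final say". One possible reading, used here purely as an assumption, is that a sender has the final say when they wrote a reply that nobody answered, i.e. a leaf of the thread tree. Under that reading the count per sender could be sketched as follows; the function name and the input format are hypothetical.

```python
from collections import Counter


def count_final_says(mails):
    """Count, per sender, the replies that received no further reply.

    `mails` is a list of (message_id, in_reply_to, sender_addr) tuples.
    Thread starters without any reply are not counted, since there was
    no discussion in which to have the last word (assumption).
    """
    replied_to = {
        in_reply_to for _, in_reply_to, _ in mails if in_reply_to is not None
    }
    return Counter(
        sender
        for message_id, in_reply_to, sender in mails
        if in_reply_to is not None and message_id not in replied_to
    )
```

Calling `most_common()` on the returned counter yields a ranking of sender addresses by this count.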