


  1. Analysis of Wikileaks Cables Using NLP Techniques
     CS671: Natural Language Processing
     Arpit Jain, Sugam Anand
     Mentor: Dr. Amitabha Mukerjee

  2. Why Wikileaks?
     - The Wikileaks embassy cable release covered a huge dataset of official documents: 251,287 cables from more than 250 US embassies and consulates worldwide.
     - The cables show the extent of US spying on its allies and the UN; turning a blind eye to corruption and human rights abuse in "client states"; backroom deals with supposedly neutral countries; lobbying for US corporations; and the measures US diplomats take to advance those who have access to them.
     - Such a large, rich, and structured dataset lends itself to analysis with natural language processing and information retrieval techniques.

  3. Distribution of cables (source: http://wikileaks.org/cablegate.html)

  4. Structure of Cables
     Each cable contains:
     - Source: the embassy that sent the cable
     - Destination: the target embassies
     - Date: the sending date
     - Body: the raw text
     - Tags: meta-information about the cable, such as whether it is classified, unclassified, or secret
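A minimal way to hold these fields in code, together with the filtering needed to pull one embassy's cables for one time period, might look like the sketch below. The `Cable` class, its field names and types, and the `select_cables` helper are hypothetical; they do not describe the layout of any particular Wikileaks data dump.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Iterable, List


@dataclass
class Cable:
    """One cable, holding the fields listed on the slide.

    Hypothetical schema: field names and types are assumptions, not the
    layout of any particular Wikileaks data dump.
    """
    source: str               # embassy that sent the cable
    destinations: List[str]   # target embassies
    sent: date                # sending date
    body: str                 # raw text
    tags: List[str] = field(default_factory=list)  # meta-information, e.g. classification


def select_cables(cables: Iterable[Cable], embassy: str,
                  start: date, end: date) -> List[Cable]:
    """Return cables sent by `embassy` with start <= sent < end, oldest first."""
    chosen = [c for c in cables if c.source == embassy and start <= c.sent < end]
    return sorted(chosen, key=lambda c: c.sent)


# Usage with made-up cables:
cables = [
    Cable("Islamabad", ["Washington"], date(2005, 10, 12), "Relief update ...", ["UNCLASSIFIED"]),
    Cable("New Delhi", ["Washington"], date(2005, 7, 20), "NSSP talks ...", ["CONFIDENTIAL"]),
    Cable("Islamabad", ["Washington"], date(2006, 1, 3), "Madrassa reform ...", ["SECRET"]),
]
islamabad_2005 = select_cables(cables, "Islamabad", date(2005, 1, 1), date(2006, 1, 1))
```

The half-open interval `[start, end)` lets consecutive time periods be chained without a cable being counted twice.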

  5. Objective
     - Diplomats' cables discuss topics that reference people, places, and organizations.
     - Extract these entities from the leaked cables.
     - Infer what the topic is.
     - Determine the diplomats' (and, by extension, America's) opinion toward the topic.
     - Map these results over timelines.

  6. Methodology
     - Collect cables from the given embassies for multiple time periods.
     - Extract entities using the NLTK named entity recognizer or the Stanford CoreNLP toolkit.
     - Score these entities by their frequency of occurrence across the cables in a particular time frame.
     - Infer topics using a topic modeling approach such as LDA, PLSA, or LSI.
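The scoring step can be sketched as follows, assuming the named entity recognizer has already produced a list of entity strings for each cable. Two choices here are assumptions rather than the authors' stated method: time frames are calendar months (the hypothetical `month_bucket` helper), and an entity's score in a frame is the fraction of that frame's cables mentioning it, so that frames with different cable counts stay comparable.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple


def month_bucket(d: date) -> str:
    """Time-frame key: calendar month, e.g. '2005-10' (an assumed granularity)."""
    return f"{d.year:04d}-{d.month:02d}"


def score_entities(tagged: Iterable[Tuple[date, List[str]]],
                   bucket=month_bucket) -> Dict[str, Dict[str, float]]:
    """For each time frame, score entity e as (#cables mentioning e) / (#cables).

    `tagged` holds one (sending date, entities found by NER) pair per cable.
    Repeated mentions inside a single cable count once.
    """
    doc_freq: Dict[str, Counter] = defaultdict(Counter)
    n_cables: Counter = Counter()
    for sent, entities in tagged:
        frame = bucket(sent)
        n_cables[frame] += 1
        doc_freq[frame].update(set(entities))  # set(): one vote per cable
    return {
        frame: {ent: cnt / n_cables[frame] for ent, cnt in counts.items()}
        for frame, counts in doc_freq.items()
    }


# Usage with made-up NER output:
tagged = [
    (date(2005, 10, 12), ["Musharraf", "Kashmir", "Musharraf"]),
    (date(2005, 10, 20), ["Kashmir"]),
    (date(2005, 11, 2), ["Balochistan"]),
]
scores = score_entities(tagged)
# scores["2005-10"] == {"Musharraf": 0.5, "Kashmir": 1.0}
```

Swapping in raw occurrence counts instead of per-cable presence only requires dropping the `set()` call and the division.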

  7. Progress
     - For Iran RPO Dubai: 3,853 entities in total, such as 'IRIG', 'supreme leader Khameni', 'Khatami', 'Mousavi', 'Islamic Revolution', 'Middle East'.
     - For Islamabad: 'Kashmir', 'Balochistan', 'Musharraf', 'North West Frontier Province'.
     - For New Delhi: 'PM Manmohan Singh', 'BJP', 'NSSP', 'Tsunami Relief'.

  8. LDA Results for Islamabad
     - Relief operation by UN:
       ['0.211*"usaid/dart" + 0.178*"relief" + 0.115*"water" + 0.114*"earthquake" + 0.113*"shelter" + 0.112*"tents" + 0.103*"october" + 0.101*"u.n." + 0.097*"sanitation" + 0.095*"food"']
     - Existence of extremists in madrassas:
       ["0.018*ssp + 0.016*( + 0.012*2005 + 0.010*groups + 0.010*domestic + 0.010*leaders + 0.010*extremist + 0.010*madrassa + 0.009*'s + 0.008*its", '0.000*rns. + 0.000*opened + 0.000*increase + 0.000*2005. + 0.000*receiving + 0.000*viable + 0.000*shows + 0.000*rebuilding + 0.000*e. + 0.000*jalil']
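The second Islamabad topic contains tokens such as "(", "'s", "2005." and "rns.", which suggests punctuation, numbers, and sentence-final periods reached the topic model unfiltered. A minimal preprocessing sketch is below. It assumes a deliberately tiny, hypothetical stopword list (a real run would use a fuller one) and keeps slash- and dot-joined forms such as "usaid/dart" and "u.n." intact.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "its", "for", "on"}

# Letters only, optionally joined by '/', '.' or '-' (so "usaid/dart" and
# "u.n." stay whole), with an optional trailing period.
TOKEN_RE = re.compile(r"[a-z]+(?:[/.-][a-z]+)*\.?")


def tokenize(text: str) -> List[str]:
    """Lowercase, drop numbers/punctuation/stopwords, keep abbreviations."""
    tokens = []
    for tok in TOKEN_RE.findall(text.lower()):
        if tok.endswith(".") and "." not in tok[:-1]:
            tok = tok[:-1]  # sentence-final period, not an abbreviation like "u.n."
        if len(tok) > 1 and tok not in STOPWORDS:  # also drops stray letters like "s", "e"
            tokens.append(tok)
    return tokens


# Usage on a made-up sentence:
print(tokenize("SSP and its extremist madrassa leaders (2005). U.N. relief via USAID/DART."))
# → ['ssp', 'extremist', 'madrassa', 'leaders', 'u.n.', 'relief', 'via', 'usaid/dart']
```

Because the possessive apostrophe is not a joining character, "pakistan's" splits into "pakistan" and "s", and the one-letter remainder is discarded.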

  9. LDA Results for New Delhi
     - Nuclear deal:
       ['0.115*"saran" + 0.113*"bjp" + 0.109*"nuclear" + 0.107*"congress" + 0.105*"jaishankar" + 0.103*"king" + 0.099*"pakistan" + 0.097*"nssp" + 0.094*"nepal" + 0.080*"iraq"']
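Each weight*"word" term in the LDA results is a word's probability under that topic, listed from most to least probable. To make that concrete, here is a self-contained sketch of LDA fitted by collapsed Gibbs sampling, which is one standard inference method and not necessarily the one the authors used. The notation resembles what gensim's `LdaModel.print_topics` prints, but gensim fits LDA with online variational Bayes instead. The hyperparameters `alpha`, `beta`, `iters`, and the function names are assumptions chosen for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List


def lda_gibbs(docs: List[List[str]], num_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iters: int = 200,
              seed: int = 0) -> List[Dict[str, float]]:
    """Fit LDA by collapsed Gibbs sampling; return p(word | topic) for each topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    topics = range(num_topics)
    n_dk = [[0] * num_topics for _ in docs]    # tokens in doc d assigned to topic k
    n_kw = [defaultdict(int) for _ in topics]  # times word w is assigned to topic k
    n_k = [0] * num_topics                     # total tokens assigned to topic k

    def update(d: int, w: str, k: int, delta: int) -> None:
        n_dk[d][k] += delta
        n_kw[k][w] += delta
        n_k[k] += delta

    # Start from a random topic for every token.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            update(d, w, k, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # take the token out of the counts
                # p(z_i = t | all other assignments), up to a normalizing constant
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                           for t in topics]
                z[d][i] = rng.choices(topics, weights=weights)[0]
                update(d, w, z[d][i], +1)

    # Smoothed estimate of each topic's word distribution (sums to 1 per topic).
    return [{w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
            for k in topics]


def format_topic(phi: Dict[str, float], topn: int = 5) -> str:
    """Render a topic as 'weight*"word" + ...', the notation used on the slides."""
    top = sorted(phi.items(), key=lambda kv: -kv[1])[:topn]
    return " + ".join(f'{p:.3f}*"{w}"' for w, p in top)


# Usage on made-up token lists (one list per cable):
docs = [
    ["relief", "water", "tents", "shelter"] * 3,
    ["nuclear", "bjp", "congress", "deal"] * 3,
    ["relief", "earthquake", "water", "food"] * 3,
]
for phi in lda_gibbs(docs, num_topics=2):
    print(format_topic(phi))
```

Collapsed Gibbs sampling integrates out the document-topic and topic-word distributions and resamples only the per-token topic labels, which keeps the whole sampler down to a few count tables.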

