A DOCUMENT SUMMARIZER FOR NOVICES REX RUBIN
WHY A DOCUMENT SUMMARIZER?
Getting into a field of research is:
- Daunting because of the amount of information presented
- Difficult when it comes to discerning what is important and what isn't
How a summarizer will help:
- It presents the most relevant information and removes the excess
EXTRACTION VS ABSTRACTION
Extraction[1]:
- Pulls sentences straight from the input
- Does not make its own sentences
Abstraction[1]:
- Creates sentences by joining several together
- Works better for several documents at once
TEXTRANK
- Extraction based[2]
- Creates a web of sentences
- This web is used as the input for PageRank
- PageRank ranks the sentences[3]
- The top-ranked sentences are given as the output summary
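The sketch below illustrates this pipeline in Python. It is a minimal sketch, not the program used in this project: networkx supplies PageRank, while the sentence splitter and the word-overlap similarity are stand-in choices for whatever tokenizer and similarity measure the actual system uses.

```python
import itertools
import math
import re

import networkx as nx


def split_sentences(text):
    """Naive sentence splitter; a real system would use a proper tokenizer."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Word-overlap similarity normalized by sentence length, as in TextRank."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if len(wa) < 2 or len(wb) < 2:
        return 0.0
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def summarize(text, n_sentences=5):
    """Build the web of sentences, rank it with PageRank, return the top sentences."""
    sentences = split_sentences(text)
    graph = nx.Graph()
    graph.add_nodes_from(range(len(sentences)))
    # Connect every pair of sentences whose similarity is non-zero.
    for i, j in itertools.combinations(range(len(sentences)), 2):
        w = similarity(sentences[i], sentences[j])
        if w > 0:
            graph.add_edge(i, j, weight=w)
    scores = nx.pagerank(graph, weight="weight")
    top = sorted(scores, key=scores.get, reverse=True)[:n_sentences]
    return " ".join(sentences[i] for i in sorted(top))  # keep document order
```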
HOW TO IMPROVE THIS MODEL?
- The glossary should consist of terms relevant to the original document
- Because of the way TextRank works, the glossary allows similar sentences to connect and score higher
- This should give more informative sentences in the summary
- Note that more informative does not mean easier to read
MY TEXTRANK MODIFICATION
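A minimal sketch of how the glossary could be folded into the TextRank input, reusing summarize() from the sketch above. The slides do not spell out the exact mechanism, so treating the glossary as plain text appended to the document before TextRank runs is an assumption here.

```python
# Assumption: the glossary is a list of plain-text entries appended to the
# document, so sentences that share glossary terms connect more strongly
# in the sentence web and score higher under PageRank.
def summarize_with_glossary(text, glossary_entries, n_sentences=5):
    # glossary_entries: hypothetical example input, e.g.
    #   ["Phishing: an attempt to trick a user into revealing credentials.",
    #    "Malware: software designed to damage or gain access to a system."]
    augmented = text + " " + " ".join(glossary_entries)
    return summarize(augmented, n_sentences=n_sentences)  # summarize() from the earlier sketch
```

Under this reading, the glossary adds extra word overlap between glossary terms and the sentences that use them, which raises those sentences' edge weights and, through PageRank, their final scores.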
RESEARCH QUESTION Will including a glossary of related terms in the original document bring about more informative sentences?
HYPOTHESIS Including a glossary in the original document will produce more informative sentences in the final summary
EXPERIMENT OVERVIEW
Two experimental groups:
- Control Group (Y)
- Test Group (X)
Both groups take a test on the original document
MY SUMMARY The summaries were generated from a document focused on cybersecurity, and the glossary was filled with related cybersecurity terms
PARTICIPANTS
Participants: Union College students aged 18-22, a mixed group of CS and non-CS students
Two groups:
- Control (Y) read the summary produced by the original TextRank program
- Test (X) read the summary produced by my modified TextRank program
TEST GIVEN TO PARTICIPANTS
- The test given to participants was based on the main points of the original document
- Why the main points? The main points should appear in the summary
Question types:
- 3 Multiple Choice
- 3 Open Answer
AVERAGE SCORES OF QUESTIONS
[Bar chart: average scores for Questions 1-6 and the total score. For each question, the left bar is the control group (Y) and the right bar is the test group (X).]
AVERAGE SCORES OF QUESTIONS, OUTLIERS REMOVED
[Bar chart: average scores for Questions 1-6 and the total score with outliers removed, same layout as the previous chart.]
DIFFERENCES IN RESULTS, X-Y
[Bar chart: difference in average score (X minus Y) for Questions 1-6 and the total score.]
DIFFERENCES X-Y, OUTLIERS REMOVED
[Bar chart: difference in average score (X minus Y) for Questions 1-6 and the total score, outliers removed.]
WAS MY HYPOTHESIS CORRECT? With these results, I can say my hypothesis is incorrect
SOMETHING ELSE?
The differences for Questions 4 and 6 were significant:
- Question 4: X average 0.44, Y average 0.89, difference (Y-X) 0.45
- Question 6: X average 0.89, Y average 0.56, difference (X-Y) 0.33
CITATIONS
[1] Julian Kupiec, Jan Pedersen, and Francine Chen. A trainable document summarizer. Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, pages 68-73, 1995.
[2] Rada Mihalcea and Paul Tarau. TextRank: Bringing order into texts. 2004.
[3] Mario Kubek and Herwig Unger. Topic detection based on the PageRank's clustering property. 2011.