Name-dropping in 18th Century Public Discourse


  1. Name-dropping in 18th Century Public Discourse
     Aleksi Jalavala (1), Annika Pensola (1), Bruno Sartini (3), David Rosson (2), Peeter Tinits (5), Selina Lehtoranta (1), Sophie Schneider (4), Veera Oksala (1)
     Team leaders: Simon Hengchen (1), Tanja Säily (1)
     (1) University of Helsinki, (2) Aalto University, (3) University of Bologna, (4) Potsdam University of Applied Sciences, (5) Tallinn University

  2. Research Questions
     Which personal names are frequently mentioned in 18th century British publications?
     On the basis of the frequency and co-occurrence of individual names, what kind of patterns can we detect that are characteristic of genres and time periods?

  3. Data
     Eighteenth Century Collections Online (ECCO)
     ● High representativeness: ca. 50% of all 18th century British printed texts (180,000 titles, 32 million pages)
     ● OCR issues, unreliable metadata

  4. OCR example They do not ddfcovcr much taste or ingenuity in building their hoifllts; though the defe& is rather in the delign than the execution. Those of the lower people are poor huts, thole of the better are larger and more comfortable. Their houfcs, properly speaking, are thatched roofs or sheds supported by pofgs and r:fters dilpofed in a tolerably judicious manner.

  5. Data (continued)
     Eighteenth Century Collections Online (ECCO)
     ● 3 distinct subsets (history, religion, social affairs), selected by combining the metadata and keyword analysis

  6. Keyword analysis Keyness analysis
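The slides do not say which keyness statistic was used. As an illustration only, the sketch below scores words with the log-likelihood (G2) measure that is common in corpus linguistics, comparing a target subset against a reference corpus. The function names and the toy token lists are hypothetical and are not taken from the project.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 score for a word seen a times in a target corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the target corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(target_tokens, reference_tokens, top_n=5):
    """Return the top_n words that are overrepresented in the target corpus."""
    target, reference = Counter(target_tokens), Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scored = []
    for word, a in target.items():
        b = reference[word]
        if a / c > b / d:  # keep only words relatively more frequent in the target
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy data, for illustration only.
religion = "god church sermon god faith the the".split()
history = "king war the the battle the army".split()
top = keywords(religion, history)
```

Words with a high score in one subset and not in the others could then serve as candidate keywords for assigning documents to that subset.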

  7. Final subset from ECCO

  8. Methods
     1) Data extraction: subsets from the ECCO corpus
     2) Named entity extraction from the subsets
     3) NER validation: both qualitative (manual checking) and quantitative (automatic match with DBpedia)
     4) Sampling of the data
     5) Visualization of the data through networks and metadata filtering
     6) Qualitative examination of the results in the visualization
     7) Refinement of the quantitative techniques based on the feedback of the qualitative examination
     (Repeat step 3 until we are satisfied)
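The quantitative half of step 3 can be sketched as a simple match rate: the share of extracted names that also appear in a reference list of known people. This is a minimal sketch under assumptions. The slides only state that names were matched automatically against DBpedia, so the small in-memory `reference_labels` list stands in for a real DBpedia lookup, and `normalise` and `match_rate` are hypothetical helper names.

```python
def normalise(name):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def match_rate(extracted, reference_labels):
    """Return (share of extracted names found in the reference, unmatched names)."""
    reference = {normalise(label) for label in reference_labels}
    unmatched = [name for name in extracted if normalise(name) not in reference]
    if not extracted:
        return 0.0, unmatched
    return 1 - len(unmatched) / len(extracted), unmatched


# Illustrative data only: hypothetical NER output (including one OCR-damaged
# name) and a tiny stand-in for a list of DBpedia person labels.
extracted = ["Cicero", "Virgil", "Jefus", "cicero "]
reference_labels = ["Cicero", "Virgil", "Jesus"]
rate, unmatched = match_rate(extracted, reference_labels)
print(rate, unmatched)
```

The unmatched names are the natural input for the qualitative half of the step: a manual check can tell OCR damage apart from people who are simply missing from the reference.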

  15. Quiz Go to www.menti.com and use the code: 91 83 31

  16. Results
      ● Identified the most common people mentioned in texts

  17. Results (continued)
      ● Looked at this over time (20-year periods)
        ○ Religion more static, others more dynamic
        ○ Classics and religious figures, e.g. Jesus, Cicero, Virgil, are mentioned across genres and remain constant throughout the century
          ■ Hints at name-dropping as a proof of one’s education
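Grouping mentions into 20-year periods, as described on this slide, could look like the sketch below. It assumes each mention is available as a `(publication_year, name)` pair. That data layout, the period origin of 1700, and the function names are assumptions made for illustration, not details from the project.

```python
from collections import Counter, defaultdict


def period_start(year, width=20, origin=1700):
    """Map a year to the first year of its period, e.g. 1719 -> 1700."""
    return origin + ((year - origin) // width) * width


def top_names_by_period(mentions, top_n=3):
    """Count name mentions per period and return the most frequent names."""
    counts = defaultdict(Counter)
    for year, name in mentions:
        counts[period_start(year)][name] += 1
    return {
        period: [name for name, _ in counter.most_common(top_n)]
        for period, counter in sorted(counts.items())
    }


# Toy data, for illustration only.
mentions = [(1705, "Cicero"), (1719, "Cicero"), (1719, "Virgil"), (1721, "Jesus")]
by_period = top_names_by_period(mentions)
```

Comparing the top lists of successive periods gives a rough view of how static or dynamic a subset is.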

  18. Results (continued)
      ● The people mentioned in books do reveal similarity in content
        ○ Clusters similar to modules in ECCO
        ○ A way to approximate genres?

  19. Results (continued)
      ● Patterns in types of people referred to by genre
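One way to make "the people mentioned in books reveal similarity in content" concrete is to link two books when the sets of names they mention overlap strongly. The sketch below uses Jaccard similarity for that. The slides do not state which similarity measure or clustering method the team used, so the measure, the threshold, and the book identifiers here are all illustrative assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of names two books have in common, out of all names in either."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(names_by_book, threshold=0.3):
    """Return (book, book, similarity) edges for pairs at or above the threshold."""
    edges = []
    for (book1, names1), (book2, names2) in combinations(sorted(names_by_book.items()), 2):
        score = jaccard(names1, names2)
        if score >= threshold:
            edges.append((book1, book2, score))
    return edges


# Toy data, for illustration only.
names_by_book = {
    "sermon_a": {"Jesus", "Paul", "Cicero"},
    "sermon_b": {"Jesus", "Paul"},
    "history_a": {"Cicero", "Caesar"},
}
edges = similarity_edges(names_by_book)
```

The resulting edge list can be loaded into a network tool, where groups of densely connected books can be compared against the ECCO modules.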

  20. Future research
      ● Automatic genre classification based on named entity networks
      ● Improve NER and subsets by using domain-specific resources
      ● Examine less popular entities and their role for certain genres
      ● Focus on 1st editions, inspecting specific sections of books
      ● Creating interactive visualizations

  21. Future visualization

  22. Public outreach

  23. TWITTER @GenreAndStyle
      ● Regular tweeting on process and related things
      ● Audience: academic community, DHH19 participants

  24. Blogs - medium.com and blogs.helsinki.fi @GenreAndStyle
      ● One post a day, each group member wrote once
      ● Audience: academic community, DHH19 participants

  25. INSTAGRAM
      Personal accounts
      ○ Stories, links to blog
      ○ Audience: personal contacts within as well as outside of the academic community

  26. Thank you! Questions?
