Who? Investigating the social entities in a corpus. Max Kemman


  1. Who? Investigating the social entities in a corpus Max Kemman University of Luxembourg December 7, 2015 Doing Digital History: Introduction to Tools and Technology

  2. Today • Final assignment • From Hermeneutics to Data to Networks • Preparing the data with Palladio • Visualising with Palladio • Reflections on the tools • Next time

  3. Final assignment Sources in Moodle: Sonja Kmec (2004) Noblewomen and Family Fortunes in Seventeenth Century England and France. A Study of the Lives of the Countess of Derby and her sister-in-law, the Duchess de La Trémoïlle Two collections of letters: 1. Letters by André Rivet (sent between 1606 and 1646) 2. Letters by Abraham Rambour (sent between 1619 and 1650) Both were Protestant preachers in France, writing about daily life

  4. The assignment 1. Prepare the sources so they can be analysed as data Describe all the letters in a Google spreadsheet (see link in Moodle) This is a group effort: around 200 pages of letters, so approx. 12 pages per person Check the work of at least one other person Once the Google Sheet is done, make your own copy so you can annotate it further 2. Analyse the letters with the W questions 3. Reflect upon your analysis

  5. W questions 1. What? What are the letters about? How does this change over time? 2. Where? Where are the letters sent from & to? Where are the locations mentioned in the letters? What does this say about the (inter)national perspective of the writer? 3. When? When were the letters sent? How do the letters change over time? 4. Who? Who are the letters sent from & to? Who are the people mentioned in the letters, and how do they relate to the writer & reader? What does this say about the social perspective of the writer? Can you come up with more W questions?

  6. The report Work in groups of two or three Include a link to your Google Sheet (via the Share button) or other sources Hand in the assignment in HTML, include your names and a decent profile photo 3000-4000 words, in English

  7. Grading Grading of the course • Weekly assignments (30%) • Final group project (70%) Grading of the final assignment • 1pt for the HTML • 1pt for CSS • 2pts for documentation of your process • 4pts for discussion of the W questions • 2pts for critical reflection

  8. Deadline Send in your assignment before Sunday January 31st 2016, 23:59 Send it to max.kemman@uni.lu as usual

  9. From Hermeneutics to Data to Networks Today's lecture is based on Marten Düring's tutorial From Hermeneutics to Data to Networks: Data Extraction and Network Visualization of Historical Sources Available from http://programminghistorian.org/lessons/creating-network-diagrams-from-historical-sources Tools we will be using: Google Sheets and Palladio

  10. Structured data Last week we used letters as a network • Nodes: senders & receivers • Edges: the sending of a letter • Attribute of nodes: location An Excel sheet of letters is what we call structured data But what if the data is unstructured?
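
A minimal sketch of how a structured letters table maps onto nodes, edges and a node attribute. The column names (`sender`, `receiver`, `sender_location`) and the rows are invented for illustration and are not taken from the course data:

```python
import csv
import io

# Hypothetical letters table; names and places are invented for illustration.
LETTERS_CSV = """sender,receiver,sender_location
Alice,Paul,Paris
Paul,Alice,London
Alice,Clara,Paris
"""

def letters_to_network(csv_text):
    """Return (nodes, edges).

    nodes maps each person to a location (None if unknown);
    edges is a list of (sender, receiver) pairs, one per letter.
    """
    nodes = {}
    edges = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The sender's location is stored as a node attribute.
        nodes[row["sender"]] = row["sender_location"]
        # A receiver we know nothing about yet still becomes a node.
        nodes.setdefault(row["receiver"], None)
        # Each letter is one directed edge.
        edges.append((row["sender"], row["receiver"]))
    return nodes, edges

nodes, edges = letters_to_network(LETTERS_CSV)
```

Because every row already names a sender and a receiver, no interpretation is needed: the table itself defines the relations.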

  11. Anything goes When the data does not itself define the relations, we can define the relations we are interested in ourselves For example: besides people, nodes can be “a film, a place, a job title, a point in time, a venue” Likewise, besides direct connections, edges can represent how “two theaters could be connected by a film shown in both of them, or by co-ownership, geographical proximity, or being in business in the same year” The nature of the nodes and edges thus depends on your research interests

  12. Network Data Extraction It is more difficult to extract network data from unstructured text The challenge is to “systematize text interpretation” The data will not represent the full complexity of the source, but acts as a model of the relationships you are interested in Any data you produce will only be as clear as your coding scheme

  13. Developing a coding scheme First task: decide who should be part of the network, and which relations between actors are to be coded Questions to ask: 1. Which aspects of relationships between two actors are relevant? 2. Who is part of the network? Who is not? 3. Which attributes matter? 4. What do you aim to find?

  14. Düring's research Marten Düring's PhD concerned covert support networks during WWII Three research questions: 1. To what extent can social relationships help explain why ordinary people took the risks associated with helping? 2. How did such relationships enable people to provide these acts of help given that only very limited resources were available to them? 3. How did social relationships help Jewish refugees to survive in the underground? Case study: first-person narrative of Ralph Neuman, a Jewish survivor of the Holocaust. PDF: http://bit.ly/neumantext

  15. His answers to develop his coding scheme 1. Which aspects of relationships between two actors are relevant? “Any action which directly contributed to the survival of persecuted persons in hiding” 2. Who is part of the network? Who is not? “Anyone who is mentioned as a helper, involved in helping activities, involved in activities which aimed to suppress helping behaviour” 3. Which attributes matter? Concerning edges: “Rough categorizations of: Form of help, intensity of relationships, duration of help, time of help, time of first meeting (both coded in 6-months steps).” Concerning nodes: “Mainly racial status according to National Socialist legislation.” 4. What do you aim to find? “A deeper understanding of who helps whom how, and discovery of patterns in the data that correspond to network theory”

  16. Creating our own coding scheme What do we know we will need to describe? • Nodes: givers & recipients of help • Relations: help given • Attributes: ? Let's create a Google Sheet with columns Giver and Recipient Consider the sentence: “Alice gave Paul some food for the road”. What can we describe? Another sentence: “In September 1944 Paul stayed at his friend Alice’s place; they had met around Easter the year before” We need at least two more columns to describe the attributes

  17. Coding the sample sentence “In September 1944 Paul stayed at his friend Alice’s place; they had met around Easter the year before”

  18. Values Notice that instead of text, the data contain numbers: easier to process afterwards Notice the 99: this represents an unknown value What if we have multiple values? For example: “In September 1944 Paul stayed at his friend Alice’s place; Alice gave Paul forged documents for the road” Solution: Make another row to describe the second relation
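
The idea of numeric codes, the 99 placeholder, and one row per relation can be sketched as follows. The codebook numbers, the column names `Form of Help` and `Time Step`, and the time step value 10 are hypothetical and chosen for illustration only; they are not Düring's actual codes:

```python
UNKNOWN = 99  # the slides use 99 for an unknown value

# Hypothetical codebook: these numbers are illustrative, not Düring's actual codes.
FORM_OF_HELP = {"accommodation": 1, "forged documents": 2, "food": 3}

def code_relation(giver, recipient, form_of_help, time_step=None):
    """Return one row of the relations sheet as a dict of codes."""
    return {
        "Giver": giver,
        "Recipient": recipient,
        # Anything not in the codebook is recorded as unknown.
        "Form of Help": FORM_OF_HELP.get(form_of_help, UNKNOWN),
        "Time Step": UNKNOWN if time_step is None else time_step,
    }

# Two kinds of help in one sentence become two rows.
rows = [
    code_relation("Alice", "Paul", "accommodation", time_step=10),  # 10 is a made-up step
    code_relation("Alice", "Paul", "forged documents"),  # time not stated: 99
]
```

Keeping the codebook in one place makes it easier to go back and recode earlier rows when the scheme changes.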

  19. Describing the actors Now we know that Alice helped Paul, but what can we tell about these people? Remember: Düring was interested in the helping of Jews, and self-help In a new sheet, we can describe the actors

  20. Coding all sources Unfortunately, sources rarely contain sentences like “Person A is connected to Persons B, C and D through relation X at time Y” So, a lot of close reading is required Moreover, when reading more sources, you will discover more actors and connections of interest, expanding your codes and forcing you to go back and update earlier coded sources

  21. Let's try Let's try with the case study: http://bit.ly/neumantext Look up p. 15, “Living underground”, and describe codes for the first 3 paragraphs

  22. Preparing the data with Palladio To visualize the coded data, we will use Palladio: http://palladio.designhumanities.org/ First we need to prepare the data for Palladio We will use the sample data set from http://bit.ly/duringdata (No need to copy anything just yet)

  23. Loading the actors/nodes Select the Sheet Attributes which describes the actors Copy everything by ctrl+a ctrl+c (Windows) or cmd+a cmd+c (Apple) In the Palladio screen, paste and click Load Palladio now contains a primary table of actors When using Chrome: rename the table to something like People

  24. Adding relations Click on Person Click Add new table Go back to the Google Sheet, select the Relations sheet, and copy & paste all the data into Palladio, click Load, and click Done

  25. Connecting the two tables We now have two tables in Palladio We will link them by the names of the actors involved These names act as identifiers

  26. Connecting the two tables Select Giver in the second table At the bottom, select Extension and select the first table (such as People) Click Done, and repeat the same for Recipient
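
Because the names act as identifiers, the link only works when every Giver and Recipient is spelled exactly as in the actors table. A minimal sketch of a check you could run before pasting, assuming the actors table has a `Person` column and using invented rows:

```python
def unmatched_names(people, relations):
    """Return sorted names used as Giver or Recipient that are missing from the people table."""
    known = {person["Person"] for person in people}
    used = set()
    for relation in relations:
        used.add(relation["Giver"])
        used.add(relation["Recipient"])
    return sorted(used - known)

# Invented rows for illustration.
people = [{"Person": "Alice"}, {"Person": "Paul"}]
relations = [
    {"Giver": "Alice", "Recipient": "Paul"},
    {"Giver": "alice", "Recipient": "Clara"},  # different spelling, and an undescribed actor
]
missing = unmatched_names(people, relations)
```

Any name in `missing` points either to a typo in the relations sheet or to an actor who still needs a row in the actors sheet.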

  27. Temporal data For the When question, we can let Palladio use the Time data as temporal data Select Time Step Start and change the Data type to Date, and click Done Repeat the same for Time Step End You'll notice the data are not actual dates, but at least the data shows some chronology
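
If you want actual dates rather than step numbers, the steps could be converted before loading. This is only a sketch: it assumes 1-based six-month steps and a hypothetical starting year (`base_year`), both of which must be checked against the real codebook:

```python
from datetime import date

UNKNOWN = 99  # the slides use 99 for an unknown value

def step_to_date(step, base_year=1938):
    """Map a 1-based six-month step to the first day of that half-year.

    base_year is a hypothetical origin; the real codebook defines its own.
    Returns None for the unknown code.
    """
    if step == UNKNOWN:
        return None
    if step < 1:
        raise ValueError("steps are 1-based")
    year = base_year + (step - 1) // 2
    # Odd steps start in January, even steps in July.
    month = 1 if step % 2 == 1 else 7
    return date(year, month, 1)

first_half = step_to_date(1)
second_half = step_to_date(2)
```

The converted values can be written back to the sheet in ISO form with `step_to_date(step).isoformat()`.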

  28. Ready? Now we have the data ready for visualisation Do not close or refresh the Palladio tab: otherwise you will have to start all over

  29. Visualising with Palladio Now let's look at the network by selecting Graph at the top bar As a source, choose the Source Giver and close the popup As a target, choose the Target Recipient Watch the result!

  30. Palladio Graph Settings Try the two Highlighting check-boxes Try Size nodes What can we learn from this graph?
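
One thing node size can tell us is who is involved in the most relations. A minimal sketch of that count, using invented Giver and Recipient pairs; whether Palladio's Size nodes option uses exactly this measure is an assumption:

```python
from collections import Counter

def degree_counts(relations):
    """Count, per actor, how many relations they appear in as giver or recipient."""
    counts = Counter()
    for giver, recipient in relations:
        counts[giver] += 1
        counts[recipient] += 1
    return counts

# Invented (giver, recipient) pairs for illustration.
relations = [("Alice", "Paul"), ("Alice", "Clara"), ("Dora", "Paul")]
counts = degree_counts(relations)
```

Here Alice and Paul each take part in two relations, so they would be drawn larger than Clara and Dora.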
