Who? Investigating the social entities in a corpus Max Kemman University of Luxembourg December 6, 2016 Doing Digital History: Introduction to Tools and Technology
Where assignment How is the assignment going so far? Any questions about the tools?
Today • Final assignment • Networks • From Hermeneutics to Data to Networks • Next time
Final assignment 1. Analyse the 30k emails with the W-questions, or specify a subselection 2. Reflect upon your analysis
W questions 1. What? What are the emails about? How does this change over time? 2. Where? Where are the locations mentioned in the emails? What does this say about the (inter)national perspective of the writer(s)? 3. When? When were the emails sent? How do the emails change over time? 4. Who? Who are the emails sent from & to? Who are the people mentioned in the emails, and how do they relate to the writer & reader? What does this say about the social perspective of the writer? Can you come up with more W questions?
The report Work in groups of three or four (in a group of 3: discuss 3 W questions) Include a link to your Google Sheet (via the Share button) or other sources Hand in the assignment in HTML, include your name and a decent profile photo 3000-5000 words, in English
Grading Grading of the course • Weekly assignments (40%) • Final group project (60%) Grading of the final assignment • 1pt for the HTML • 1pt for CSS • 2pts for documentation of your process • 4pts for discussion of the W questions • 2pts for critical reflection
Deadline Send in your assignment before 20 January 2017 23:59 (tentative) Send them to max.kemman@uni.lu as usual: I will confirm your submission
Networks Our final W question Historical research incorporates: • What - what happened? • Where - where did this happen? • When - when did this happen? • Who - who was involved?
How to describe the people Given a corpus, multiple ways of describing people • A list of all the people • Biographies • Classes of people • Genealogies • Networks of people
What is a network? Two components: 1. Actors - the people - represented as nodes 2. Relations - the connections - represented as edges (Images and information based on Martin Grandjean's tutorial)
What is a network? Attributes of nodes: 1. Label Here: Name 2. Colour Here: Gender 3. Size Number of connections Not in the data, but derived
What is a network? Attributes of edges: 1. Label 2. Colour 3. Size 4. Direction Networks can be directed or undirected Here: directed
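As a minimal sketch of the two components, a network can be written down as a table of nodes with attributes plus a list of directed edges. All names and attribute values below are made up for illustration; the size of a node is not stored but derived by counting its connections.

```python
# Nodes: each actor has a label (the name) and a colour attribute (here: gender).
# All names and values are invented illustration data.
nodes = {
    "John": {"gender": "m"},
    "Diana": {"gender": "f"},
    "Peter": {"gender": "m"},
    "Mary": {"gender": "f"},
}

# Directed edges as (source, target); the order matters in a directed network.
edges = [
    ("John", "Diana"),
    ("John", "Peter"),
    ("Peter", "Diana"),
    ("Mary", "Diana"),
]

def connection_counts(nodes, edges):
    """Derive the 'size' attribute: the number of connections per node."""
    counts = {name: 0 for name in nodes}
    for source, target in edges:
        counts[source] += 1
        counts[target] += 1
    return counts

sizes = connection_counts(nodes, edges)
for name, attrs in nodes.items():
    print(name, attrs["gender"], sizes[name])
```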
Reading a network Imagine the connection here means "likes" • John likes many people, but no one likes John • Everybody likes Diana, but Diana doesn't like anyone • There are no 2 people who like each other • Everyone is connected • No isolated nodes
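The observations above can be checked mechanically by counting incoming and outgoing edges. The sketch below uses a hypothetical "likes" network: only John and Diana come from the slide, the other names and all edges are assumptions.

```python
# Hypothetical "likes" edges (source likes target).
people = ["Diana", "John", "Mary", "Peter"]
likes = [
    ("John", "Diana"),
    ("John", "Peter"),
    ("John", "Mary"),
    ("Peter", "Diana"),
    ("Mary", "Diana"),
]

out_degree = {p: 0 for p in people}  # how many people this person likes
in_degree = {p: 0 for p in people}   # how many people like this person
for source, target in likes:
    out_degree[source] += 1
    in_degree[target] += 1

# Pairs who like each other: an edge exists in both directions.
edge_set = set(likes)
mutual = [(a, b) for a, b in likes if a < b and (b, a) in edge_set]

# Isolated nodes: no incoming and no outgoing edges.
isolated = [p for p in people if in_degree[p] == 0 and out_degree[p] == 0]

print("likes given:", out_degree)
print("likes received:", in_degree)
print("mutual pairs:", mutual, "isolated:", isolated)
```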
Types of network 1. Graphs - a web of relations including circles 2. Trees - no circles 3. Bipartite - 2 sets of nodes with links between the sets but not within each set
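One way to tell a tree from a general graph is to test whether any edge closes a circle. The sketch below does this for an undirected edge list with a union-find structure; the function name `has_cycle` and the node labels are made up for illustration, and repeated edges or self-loops would count as circles here.

```python
def has_cycle(edges):
    """Return True if the undirected edge list contains a circle (cycle)."""
    parent = {}

    def find(node):
        # Follow parent links up to the representative of the node's group.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # shorten the path as we go
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # a and b were already connected: this edge closes a circle
        parent[root_a] = root_b
    return False

tree_edges = [("A", "B"), ("A", "C"), ("C", "D")]
graph_edges = tree_edges + [("D", "A")]

print(has_cycle(tree_edges))
print(has_cycle(graph_edges))
```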
Analysing the network Four types of centrality measures 1. Degree centrality - the number of connections 2. Closeness centrality - closeness to the entire network 3. Betweenness centrality - bridges 4. Eigenvector centrality - connection to well-connected nodes
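A rough sketch of the first two measures, using only the Python standard library on a made-up undirected network. Degree centrality is normalised here by the number of other nodes, and closeness is computed as (n - 1) divided by the sum of shortest-path distances, which assumes the network is connected. Betweenness and eigenvector centrality are left out to keep the example short; libraries such as NetworkX provide all four.

```python
from collections import deque

# Made-up undirected network: two small groups joined through "C" and "D".
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("D", "F")]

neighbours = {}
for a, b in edges:
    neighbours.setdefault(a, set()).add(b)
    neighbours.setdefault(b, set()).add(a)

def degree_centrality(neighbours):
    """Share of the other nodes that each node is directly connected to."""
    n = len(neighbours)
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}

def closeness_centrality(neighbours):
    """(n - 1) / sum of shortest-path distances; assumes a connected network."""
    n = len(neighbours)
    result = {}
    for start in neighbours:
        # Breadth-first search gives shortest distances in an unweighted network.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        result[start] = (n - 1) / sum(dist.values())
    return result

degree = degree_centrality(neighbours)
closeness = closeness_centrality(neighbours)
for node in sorted(neighbours):
    print(node, round(degree[node], 2), round(closeness[node], 2))
```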
Central nodes 1. Which node has the most connections? 2. Which node is the closest to the entire network? 3. Which node acts as a bridge between different communities? 4. Which node is connected to well-connected nodes? Besides nodes, we see communities
A network of letter writers For historical research, letters are an interesting corpus for network analysis We (usually) know: 1. Sender 2. Location of the sender 3. Receiver 4. Location of the receiver 5. Date of the letter 6. Contents of the letter
ePistolarium For example, ePistolarium or Six Degrees of Francis Bacon
From Hermeneutics to Data to Networks The following slides are based on Marten Düring's tutorial From Hermeneutics to Data to Networks: Data Extraction and Network Visualization of Historical Sources Available from http://programminghistorian.org/lessons/creating-network-diagrams-from-historical-sources
Structured data As mentioned, we can show letters (or emails) as a network • Nodes: senders & receivers • Edges: the sending of a letter • Attribute of nodes: location An Excel sheet of metadata of letters is what we call structured data But what if the data is unstructured?
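To illustrate how structured metadata maps onto a network, the sketch below reads a few made-up rows (the column names, people, places and dates are all assumptions, standing in for an exported spreadsheet) and derives weighted edges and a location attribute per node.

```python
import csv
import io
from collections import Counter

# Invented metadata rows standing in for an exported spreadsheet.
metadata = """sender,sender_location,receiver,receiver_location,date
Anna,Amsterdam,Ben,Paris,1640-03-01
Anna,Amsterdam,Ben,Paris,1640-05-12
Ben,Paris,Clara,London,1641-01-20
"""

rows = list(csv.DictReader(io.StringIO(metadata)))

# Edges: one directed edge per sender -> receiver pair, weighted by letter count.
edge_weights = Counter((row["sender"], row["receiver"]) for row in rows)

# Node attribute: location of each correspondent.
# If a person appears with several locations, the last row seen wins.
locations = {}
for row in rows:
    locations[row["sender"]] = row["sender_location"]
    locations[row["receiver"]] = row["receiver_location"]

for (sender, receiver), weight in edge_weights.items():
    print(f"{sender} ({locations[sender]}) -> {receiver} ({locations[receiver]}): {weight}")
```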
Anything goes When the data does not itself define the relations, we can define the relations we are interested in ourselves For example: besides people, nodes can be “a film, a place, a job title, a point in time, a venue” Likewise, besides direct connections, edges can represent how “two theaters could be connected by a film shown in both of them, or by co-ownership, geographical proximity, or being in business in the same year” The nature of the nodes and edges thus depends on your research interests
Network Data Extraction It is more difficult to extract network data from unstructured text The challenge is to “systematize text interpretation” The data will not represent the full complexity of the source, but acts as a model of the relationships you are interested in Any data you produce will only be as clear as your coding scheme
Developing a coding scheme First task: decide who should be part of the network, and which relations between actors are to be coded Questions to ask: 1. Which aspects of relationships between two actors are relevant? 2. Who is part of the network? Who is not? 3. Which attributes matter? 4. What do you aim to find?
Düring's research Marten Düring's PhD concerned the covert support networks during WWII Three research questions: 1. To what extent can social relationships help explain why ordinary people took the risks associated with helping? 2. How did such relationships enable people to provide these acts of help given that only very limited resources were available to them? 3. How did social relationships help Jewish refugees to survive in the underground? Case study: first person narrative of Ralph Neuman, a Jewish survivor of the Holocaust. PDF: http://bit.ly/neumantext
His answers to develop his coding scheme 1. Which aspects of relationships between two actors are relevant? “Any action which directly contributed to the survival of persecuted persons in hiding” 2. Who is part of the network? Who is not? “Anyone who is mentioned as a helper, involved in helping activities, involved in activities which aimed to suppress helping behaviour” 3. Which attributes matter? Concerning edges: “Rough categorizations of: Form of help, intensity of relationships, duration of help, time of help, time of first meeting (both coded in 6-months steps).” Concerning nodes: “Mainly racial status according to National Socialist legislation.” 4. What do you aim to find? “A deeper understanding of who helps whom how, and discovery of patterns in the data that correspond to network theory”
Creating our own coding scheme What do we know we will need to describe? • Nodes: givers & recipients of help • Relations: help given • Attributes: ? Let's create a Google Sheet with columns Giver and Recipient Consider the sentence: “Alice gave Paul some food for the road”. What can we describe? Another sentence: “In September 1944 Paul stayed at his friend Alice’s place; they had met around Easter the year before” We need at least two columns describing the attributes
Coding the sample sentence “In September 1944 Paul stayed at his friend Alice’s place; they had met around Easter the year before”
Values Notice that instead of text, the data contain numbers: easier to process afterwards Notice the 99: this represents an unknown value What if we have multiple values? For example: “In September 1944 Paul stayed at his friend Alice’s place; Alice gave Paul forged documents for the road” Solution: Make another row to describe the second relation
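A small sketch of what such coded rows might look like for the sentence with two forms of help. Only the 99 for unknown values comes from the slide; the column names and the numeric codebook below are invented for this example and are not Düring's actual codes.

```python
UNKNOWN = 99  # the slide's convention for an unknown value

# Hypothetical codebook: these numbers are invented for this sketch.
FORM_OF_HELP = {1: "food", 2: "accommodation", 3: "forged documents"}

# One row per relation; the second form of help gets its own row.
# The sentence does not say when Alice and Paul first met, hence UNKNOWN.
rows = [
    {"giver": "Alice", "recipient": "Paul", "form_of_help": 2,
     "year_of_help": 1944, "year_first_met": UNKNOWN},
    {"giver": "Alice", "recipient": "Paul", "form_of_help": 3,
     "year_of_help": 1944, "year_first_met": UNKNOWN},
]

def describe(row):
    """Turn one coded row back into a readable description."""
    form = FORM_OF_HELP.get(row["form_of_help"], "unknown")
    met = "unknown" if row["year_first_met"] == UNKNOWN else str(row["year_first_met"])
    return (f'{row["giver"]} -> {row["recipient"]}: {form} '
            f'({row["year_of_help"]}), first met: {met}')

for row in rows:
    print(describe(row))
```

Keeping the codebook in one place makes it easier to go back and recode earlier rows when the scheme changes.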
Describing the actors Now we know that Alice helped Paul, but what can we tell about these people? Remember: Düring was interested in the helping of Jews, and self-help In a new sheet, we can describe the actors
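A sketch of how the two sheets relate: the relations sheet names the actors, and the actors sheet holds one row of attributes per name. The attribute column and its codes are placeholders, and "Erna" is a made-up third actor; the check lists actors that appear in a relation but have not been described yet.

```python
# Relations sheet as (giver, recipient) pairs; "Erna" is an invented example.
relations = [("Alice", "Paul"), ("Alice", "Paul"), ("Paul", "Erna")]

# Actors sheet: one entry per person; the attribute and its codes are placeholders.
actors = {
    "Alice": {"sex": 2},
    "Paul": {"sex": 1},
}

# Every name used in a relation should have a row in the actors sheet.
mentioned = {name for pair in relations for name in pair}
missing = sorted(mentioned - set(actors))

print("actors still to describe:", missing)
```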
Coding all sources Unfortunately, the source will rarely contain sentences like “Person A is connected to Persons B, C and D through relation X at time Y” So, a lot of close reading is required Moreover, when reading more sources, you will discover more actors and connections of interest, expanding your codes and forcing you to go back and update earlier coded sources
Let's try Let's try with the case study: http://bit.ly/neumantext Look up p15, Living underground and describe codes for the first 3 paragraphs
To Networks Now that we have structured data, we can create a network This is for next week!
For next time 13 December Who? Investigating the social entities in a corpus Reading: (see Moodle) • Weingart, S. (2013). Networks Demystified 8: When Networks are Inappropriate. http://www.scottbot.net/HIAL/?p=39600