The Codex: Building a Graph of History
What is Codex?
- Text-as-a-Graph, aiming at the deep integration of text and data
- Manages standoff property documents linked to graph entities
- Graph meta-model defined by composition from statements and meta-relations
- An easily extensible system in which new annotations and new entities can be integrated
- Text annotations are high-resolution and multidimensional
Technology (diagram labels): Neo4jClient, API, SPEEDy, Neo4jClient-Vector extensions
Core entities
- Texts/Standoff Properties: annotations
- Agents: entities built up through composition
- Statements: events, traits (“aspect-oriented ontology”)
- Meta-Relations: dynamic, bi-directional, hierarchical (higher-order)
- Properties: agent data-points (latitude, longitude, height, weight, etc.)
- Time: fuzzy dates (degrees of precision)
- Concepts: shared vocabulary (the glue); hierarchical labels
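
To make "fuzzy dates (degrees of precision)" concrete, here is a minimal Python sketch. It is not Codex's implementation: the function names `fuzzy_date_range` and `may_coincide` are hypothetical, and the sketch assumes a fuzzy date can be treated as the interval of calendar days it is consistent with (using Python's proleptic Gregorian calendar, which ignores historical calendar differences).

```python
import calendar
from datetime import date


def fuzzy_date_range(year, month=None, day=None):
    """Return the (earliest, latest) dates consistent with a fuzzy date.

    Hypothetical helper: precision is implied by which parts are given.
    """
    if month is None:
        # Year precision: the whole year is possible.
        return date(year, 1, 1), date(year, 12, 31)
    if day is None:
        # Month precision: the whole month is possible.
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    # Day precision: a single day.
    exact = date(year, month, day)
    return exact, exact


def may_coincide(a, b):
    """True if two fuzzy dates (as ranges) could refer to the same day."""
    return a[0] <= b[1] and b[0] <= a[1]


flood = fuzzy_date_range(1498, 11, 24)   # known to the day
november = fuzzy_date_range(1498, 11)    # known only to the month
print(may_coincide(flood, november))
```

Treating each date as a range means that dates of different precision can still be compared with one overlap test.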
Text as a graph
The graph representation of a text can itself be stored in a graph database such as Neo4j. With standoff properties, graph queries (written in the Cypher language) map directly onto the parts of the text in question. Because properties can overlap, a text can be graphed along many axes: entities, events, concepts, commentaries, or even syntax trees (ASTs) generated by NLP tools.
Text as a Graph
The goal
◦ To fully annotate a text you need to be able to represent overlapping ranges
The problem
◦ In HTML/XML a range is represented by a node or an element
◦ However, HTML/XML can only represent nodes in a tree structure, whereas annotations cross nodes and are better mapped to graphs, not trees
◦ XML elements are document-based rather than connected in a database
The solution
◦ Separate the text and the annotations; don’t embed properties in the text as HTML/XML does
◦ A property is a data structure that represents a text range (e.g., 0 -> 10) and a type (e.g., ‘italics’, ‘place’, ‘person’)
◦ Properties that are not embedded in the text are called ‘standoff properties’
◦ Standoff property nodes act as connective tissue between the plain text and the graph meta-model
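
The idea of a property as "a text range and a type" can be sketched in a few lines of Python. This is an illustration only, not Codex's data model: the class name `StandoffProperty`, the half-open `[start, end)` convention, and the `event` type are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandoffProperty:
    """A range over the plain text plus a type (hypothetical structure)."""
    start: int  # index of the first character covered
    end: int    # index one past the last character (assumed half-open)
    type: str


# The text stays plain; nothing is embedded in it.
text = "Savonarola preached in Florence."

properties = [
    StandoffProperty(0, 10, "person"),
    StandoffProperty(0, 19, "italics"),
    StandoffProperty(11, 31, "event"),   # partially overlaps the italics range
    StandoffProperty(23, 31, "place"),
]


def covered_text(text, prop):
    """Resolve a property back to the characters it annotates."""
    return text[prop.start:prop.end]


for prop in properties:
    print(prop.type, "->", covered_text(text, prop))
```

The `italics` and `event` ranges cross without either containing the other, which is exactly the case a single XML tree cannot express but a list of standoff ranges handles without special treatment.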
XML overlap
Standoff properties
1. Don’t suffer from the XML overlap problem
2. Are stored externally from the text stream, which is left in a plain format
3. Support annotations inside words, of single characters, and between characters
4. Annotation layers (or strata) can be exported or imported
5. Support multidimensional queries across overlapping annotations and layers
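
As a rough illustration of points 3 to 5, the following Python sketch models single-character and zero-width (between-character) annotations, filters by layer, and runs a simple query across overlapping annotations. All names here (`Prop`, `overlaps`, `co_occurring`, `layer`, and the annotation types and layer names) are hypothetical, and the half-open range convention is an assumption.

```python
from typing import NamedTuple


class Prop(NamedTuple):
    start: int   # first character covered
    end: int     # one past the last character; start == end means "between characters"
    type: str
    layer: str   # the annotation layer (stratum) this property belongs to


TEXT = "Savonarola preached in Florence."

PROPS = [
    Prop(0, 10, "person", "entities"),
    Prop(23, 31, "place", "entities"),
    Prop(0, 19, "italics", "style"),
    Prop(4, 5, "unclear-letter", "editorial"),  # a single character inside a word
    Prop(10, 10, "line-break", "editorial"),    # zero-width: sits between two characters
]


def overlaps(a, b):
    """True if the two ranges share at least one position (half-open ranges)."""
    return a.start < b.end and b.start < a.end


def co_occurring(props, type_a, type_b):
    """All pairs (a, b) where a property of type_a overlaps one of type_b."""
    return [(a, b)
            for a in props if a.type == type_a
            for b in props if b.type == type_b and overlaps(a, b)]


def layer(props, name):
    """Select one annotation layer, e.g. for export."""
    return [p for p in props if p.layer == name]


print(co_occurring(PROPS, "person", "italics"))
print(layer(PROPS, "editorial"))
```

Because every annotation is just a range, a query such as "people mentioned in italics" reduces to an interval-overlap test across layers.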
Standoff property graph
Meta-relations
Meta-relations are
- User-created relations
- Bi-directional
- No need to choose between “parent_of” or “child_of” in the model
- One query can bring back both parties
- Each part is a noun rather than a verb (“parent” vs “parent_of”)
- Linked to a relationship graph for higher-order relationships
- Family: parent/child; sibling; married; son-in-law/mother-in-law
- Friendship: friend; close friend; girlfriend/boyfriend; correspondent
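
One way to picture a bi-directional relation whose parts are nouns is the Python sketch below. It is an assumption-laden illustration rather than the Codex model: `MetaRelation`, its `parts` field, and `relations_of` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MetaRelation:
    """One relation with two noun-labelled parts (hypothetical structure)."""
    kind: str                          # e.g. "Family"
    parts: Tuple[Tuple[str, str], Tuple[str, str]]  # ((role, agent), (role, agent))


RELATIONS = [
    MetaRelation("Family", (("parent", "Lorenzo de' Medici"),
                            ("child", "Piero de' Medici"))),
]


def relations_of(agent, relations):
    """Return (kind, own_role, other_role, other_agent) for each relation
    the agent takes part in, whichever side the agent is on."""
    found = []
    for rel in relations:
        first, second = rel.parts
        if first[1] == agent:
            found.append((rel.kind, first[0], second[0], second[1]))
        elif second[1] == agent:
            found.append((rel.kind, second[0], first[0], first[1]))
    return found


print(relations_of("Lorenzo de' Medici", RELATIONS))
print(relations_of("Piero de' Medici", RELATIONS))
```

The relation is stored once, so the same lookup answers both "who is Lorenzo's child?" and "who is Piero's parent?" without separate `parent_of` and `child_of` edges.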
Meta-relations
- IsDominant: true/false
Statements
A statement is a quasi-grammatical complex used to represent simple events or predicates, roughly resembling an RDF triple or a Wikidata snak.
- Composed (optionally) of one or more Agents, a Concept, and a Time node
- Construction is quasi-grammatical, as the parts are related mainly by prepositions (SUBJECT, OBJECT, WITH, AT, ON, UNDER, NEAR, etc.)
- E.g., Subject: The Arno; action: was in flood; at: Florence; according-to: Luca Landucci; on: 1498/11/24
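
The Arno example can be written out as a small Python structure to show how the parts hang together. This is a sketch under assumptions, not the Codex schema: the `Statement` class, its field names, and the `describe` and `agents_in_role` helpers are hypothetical, and the time is kept as a plain string.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Statement:
    """A concept plus role-tagged agents and an optional time (hypothetical)."""
    concept: str                                   # e.g. "was in flood"
    agents: List[Tuple[str, str]] = field(default_factory=list)  # (role, agent)
    time: Optional[str] = None


def agents_in_role(statement, role):
    """All agents attached to the statement under the given role."""
    return [agent for r, agent in statement.agents if r == role]


def describe(statement):
    """Read the statement back as a quasi-grammatical sentence."""
    subjects = agents_in_role(statement, "SUBJECT")
    others = [f"{role.lower()} {agent}"
              for role, agent in statement.agents if role != "SUBJECT"]
    parts = subjects + [statement.concept] + others
    if statement.time:
        parts.append(f"on {statement.time}")
    return " ".join(parts)


flood = Statement(
    concept="was in flood",
    agents=[("SUBJECT", "The Arno"),
            ("AT", "Florence"),
            ("ACCORDING-TO", "Luca Landucci")],
    time="1498/11/24",
)

print(describe(flood))
```

Keeping the source of the claim (ACCORDING-TO) as just another role is what lets the same structure record who asserted an event alongside the event itself.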
Statements
- AgentRole: SUBJECT; OBJECT; AT; WITH; ACCORDING-TO; etc.
Aspect-oriented ontology
The statement complex can also be used to express ontological claims. While the claim that the Renaissance preacher “Girolamo Savonarola is a man” can be represented with an “is a” relationship, what about other aspects of Savonarola? An “is a” relation tells us nothing about who made the claim or when it was made. It also conditions us towards class-type classifications which conform to ontologies. We can use trait statements to capture aspects …
Neo4jClient
- Switches seamlessly between REST and Bolt interfaces
- Cypher expression builder (keywords mostly)
- Safe parameterisation of values
- Powerful deserialiser turns JSON results into complex objects
Neo4jClient-Vector
- Adds extension methods to Neo4jClient
- Adds Vector<> class to generate Cypher paths
  - Node labels
  - Relationship types and directions
  - Reversible relationships (subset_of_concept <-> children_of_concept)
  - Overloadable relationships (text_has_standoff_property -> text_has_sentence_standoff_property)
- Removes the noise to put the focus on the path patterns
  - (lex)-(head), (t)-(asp)-(a)
Neo4jClient-Vector
If(condition, func) makes it simpler to branch Cypher expressions based on form input …
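
The general pattern behind a conditional builder step can be sketched in Python. This is not the Neo4jClient-Vector API (which is a C# library); the `CypherQuery` class, its methods, and the `Agent` label and property names are hypothetical, written only to show how an `if`-style step keeps a fluent query chain unbroken when form fields are optional.

```python
class CypherQuery:
    """A toy fluent builder that collects Cypher lines and parameters."""

    def __init__(self):
        self._lines = []
        self._params = {}
        self._has_where = False

    def match(self, pattern):
        self._lines.append("MATCH " + pattern)
        self._has_where = False   # a new MATCH starts a new WHERE group
        return self

    def where(self, predicate, **params):
        # The first predicate after a MATCH uses WHERE, later ones use AND.
        keyword = "AND" if self._has_where else "WHERE"
        self._lines.append(f"{keyword} {predicate}")
        self._params.update(params)
        self._has_where = True
        return self

    def returns(self, expression):
        self._lines.append("RETURN " + expression)
        return self

    def if_(self, condition, func):
        """Apply func to the builder only when condition holds."""
        return func(self) if condition else self

    def build(self):
        return "\n".join(self._lines), dict(self._params)


def search_agents(name=None, born_after=None):
    """Build a search query from optional form fields (hypothetical schema)."""
    return (CypherQuery()
            .match("(a:Agent)")
            .if_(name is not None,
                 lambda q: q.where("a.Name CONTAINS $name", name=name))
            .if_(born_after is not None,
                 lambda q: q.where("a.Born > $born_after", born_after=born_after))
            .returns("a")
            .build())


query, params = search_agents(name="Savonarola")
print(query)
print(params)
```

Values travel as parameters rather than being spliced into the query text, so optional filters can be added without string concatenation of user input.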
Pagination
Extending the projection class from Search<T> enables the results to be paginated. An ORDER BY expression builder simplifies conditional ordering.
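
Pagination with conditional ordering usually comes down to appending ORDER BY, SKIP and LIMIT clauses to a query. The Python sketch below shows that idea only; it does not reproduce Search<T> or the ORDER BY builder, and the `paginate` function, its parameters, and the `a.Name` field are hypothetical.

```python
def paginate(base_query, page, page_size, order_by=None, descending=False,
             allowed_order_fields=("a.Name",)):
    """Append optional ORDER BY plus SKIP/LIMIT to a Cypher query.

    Pages are 1-indexed. order_by is checked against a whitelist because
    field names cannot be passed as Cypher parameters.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    lines = [base_query]
    if order_by is not None:
        if order_by not in allowed_order_fields:
            raise ValueError(f"ordering by {order_by!r} is not allowed")
        direction = " DESC" if descending else ""
        lines.append(f"ORDER BY {order_by}{direction}")
    lines.append("SKIP $skip LIMIT $limit")
    params = {"skip": (page - 1) * page_size, "limit": page_size}
    return "\n".join(lines), params


paged_query, paged_params = paginate(
    "MATCH (a:Agent) RETURN a", page=3, page_size=20,
    order_by="a.Name", descending=True)
print(paged_query)
print(paged_params)
```

The page number is converted to a SKIP offset once, in one place, so callers only ever deal in pages and page sizes.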
Contact
If you have any questions, or would be interested in trialling Codex as an alpha user, I can be contacted as below.
Iian Neill
◦ Email: iian.d.neill@gmail.com
◦ Twitter: codexeditor
The Hunger Games
Q1. What is a standoff property?
Q2. What is a meta-relation?
Q3. What is a Codex statement?
https://docs.google.com/forms/d/e/1FAIpQLSeiY5YBj2ir_Jmi4f6GaIbyAzQcAZjYLsAJ4r-Wbr5zScJAww/viewform?usp=sf_link