
MantisTable: an automatic approach for the Semantic Table Interpretation



  1. MantisTable: an automatic approach for the Semantic Table Interpretation
     Marco Cremaschi, Roberto Avogadro, and David Chieregato
     Department of Computer Science, Systems and Communication (DISCo)
     University of Milano - Bicocca

  2. Semantic Table Interpretation: an example

     TABLE
     Name           Coordinates            Height  Range
     Mont Blanc     45°49′57″N 06°51′52″E  4808    Mont Blanc massif
     Lyskamm        45°55′20″N 07°50′08″E  4527    Pennine Alps
     Monte Cervino  45°58′35″N 07°39′31″E  4478    Pennine Alps

     KNOWLEDGE GRAPH
     ● Schema-level labels in the figure: Mountain, xsd:string, xsd:integer, Range
     ● Entity-level labels in the figure: Mont_Blanc, 45°49′57″N 06°51′52″E, 4808, Massif
     ● Predicates in the figure: georss:point, dbo:elevation, dbo:mountainRange

     Column types
     ● Subject column (S-column)
     ● Named-Entity column (NE-column)
     ● Literal column (L-column)

     An RDF* triple is a subject, predicate, and object construct which makes data easily interlinked. Labels on the triple figure: SUBJECT, PREDICATE, OBJECT; URI; URI or Datatype.
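
As a rough illustration of what the interpretation of one row could produce, the sketch below turns the Mont Blanc row into subject, predicate, object tuples. The function name `row_to_triples` is hypothetical, the `dbr:` identifiers are taken from the Entity Linking slide, and the pairing of `xsd:integer` with `dbo:elevation` is an assumed reading of the figure, not something the slide states explicitly.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as prefixed names


def row_to_triples(subject: str, coordinates: str, height: int,
                   mountain_range: str) -> List[Triple]:
    """Build one triple per non-subject column of the example table.

    Hypothetical helper: the predicate for each column is hard-coded here,
    whereas the approach in the slides discovers it automatically.
    """
    return [
        # Literal column: the object is a plain string literal.
        (subject, "georss:point", f'"{coordinates}"'),
        # Literal column: the object is a typed literal (assumed xsd:integer).
        (subject, "dbo:elevation", f'"{height}"^^xsd:integer'),
        # Named-entity column: the object is another entity.
        (subject, "dbo:mountainRange", mountain_range),
    ]


triples = row_to_triples(
    "dbr:Mont_Blanc", "45°49′57″N 06°51′52″E", 4808, "dbr:Mont_Blanc_massif"
)
for s, p, o in triples:
    print(s, p, o, ".")
```

The tuples use prefixed names for readability; a real serialization would expand them to full IRIs.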

  3. Semantic Table Interpretation: enhanced approach (unsupervised, complete and automatic)

     Pipeline: (1) Data Preparation, (2) Column Analysis, (3) Concept and Datatype Annotation, (4) Predicate Annotation, (5) Entity Linking

     1. Data Preparation, which aims to prepare the data inside the table
     2. Column Analysis, whose tasks are the semantic classification that assigns types to columns (NE-column or L-column), and the detection of the subject column (S-column)
     3. Concept and Datatype Annotation, which deals with mappings between columns (or headers, if they are available) and semantic elements (concepts or datatypes) in a KG
     4. Predicate Annotation, whose task is to find relations, in the form of predicates, between the main column and the other columns to set the overall meaning of the table
     5. Entity Linking, which deals with mappings between cells and entities in a KG

  4. Semantic Table Interpretation: enhanced approach (unsupervised, complete and automatic)

     Phase 1: Data Preparation, which aims to prepare the data inside the table
     ● removal of HTML tags and stop words
     ● transformation of the text into lowercase
     ● resolution of acronyms and abbreviations
     ● normalization of units of measurement by applying regular expressions

     Name           Coordinates            Height  Range
     mont blanc     45°49′57″N 06°51′52″E  4808    mont blanc massif
     lyskamm        45°55′20″N 07°50′08″E  4527    pennine alps
     monte cervino  45°58′35″N 07°39′31″E  4478    pennine alps
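
The four cleaning steps listed for this phase could be sketched roughly as follows. This is a minimal illustration, not the MantisTable implementation: the function `prepare_cell`, the stop-word list, the abbreviation map, and the unit patterns are all hypothetical placeholders.

```python
import re
from html import unescape

# Hypothetical, deliberately tiny resources; a real system would use larger ones.
STOP_WORDS = {"the", "of", "a", "an"}
ABBREVIATIONS = {"mt.": "mount", "st.": "saint"}
UNIT_PATTERNS = [
    (re.compile(r"(\d+(?:\.\d+)?)\s*(?:metres|meters|m)\b"), r"\1 m"),
    (re.compile(r"(\d+(?:\.\d+)?)\s*(?:kilometres|kilometers|km)\b"), r"\1 km"),
]


def prepare_cell(text: str) -> str:
    """Clean one cell: strip HTML, lowercase, expand abbreviations,
    drop stop words, and normalize unit spellings with regular expressions."""
    text = re.sub(r"<[^>]+>", " ", text)  # remove HTML tags
    text = unescape(text)                 # decode entities such as &amp;
    text = text.lower()                   # lowercase everything
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    tokens = [tok for tok in tokens if tok not in STOP_WORDS]
    text = " ".join(tokens)
    for pattern, replacement in UNIT_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


print(prepare_cell("<b>Mont Blanc</b>"))
print(prepare_cell("4808 metres"))
```

Applying the steps in a fixed order keeps the function easy to reason about: tags are removed before tokenizing so that markup never reaches the stop-word or abbreviation lookups.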

  5. Semantic Table Interpretation: enhanced approach (unsupervised, complete and automatic)

     Phase 2: Column Analysis, whose tasks are the semantic classification that assigns types to columns (NE-column or L-column), and the detection of the subject column (S-column)
     ● Detection of L-columns by 16 regular expressions that identify the regextype (e.g., geo coordinate, address, hex color code, URL)
     ● Detection of the S-column considers several statistical features

     Column types marked in the figure: S-column, NE-column, L-column

     Name           Coordinates            Height  Range
     mont blanc     45°49′57″N 06°51′52″E  4808    mont blanc massif
     lyskamm        45°55′20″N 07°50′08″E  4527    pennine alps
     monte cervino  45°58′35″N 07°39′31″E  4478    pennine alps
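
A regex-based detection of L-columns might look like the sketch below. The slides mention 16 regular expressions but do not list them, so the four patterns here, the `threshold` parameter, and the rule that every other column is treated as an NE-column candidate are assumptions made for illustration only.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical subset of regextypes; the actual 16 patterns are not given in the slides.
REGEX_TYPES = {
    "geo_coordinate": re.compile(
        r"^\d{1,3}°\s*\d{1,2}′\s*\d{1,2}″\s*[NS]\s+"
        r"\d{1,3}°\s*\d{1,2}′\s*\d{1,2}″\s*[EW]$"
    ),
    "hex_color": re.compile(r"^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$"),
    "url": re.compile(r"^https?://\S+$"),
    "integer": re.compile(r"^[+-]?\d+$"),
}


def cell_regextype(cell: str) -> Optional[str]:
    """Return the name of the first pattern matching the cell, or None."""
    value = cell.strip()
    for name, pattern in REGEX_TYPES.items():
        if pattern.match(value):
            return name
    return None


def classify_column(cells: List[str], threshold: float = 0.8) -> Tuple[str, Optional[str]]:
    """Label a column as L-column when enough cells share one regextype.

    Columns that do not reach the (assumed) threshold are labelled NE-column.
    """
    counts = Counter(t for t in map(cell_regextype, cells) if t is not None)
    if counts:
        best, hits = counts.most_common(1)[0]
        if hits / len(cells) >= threshold:
            return ("L-column", best)
    return ("NE-column", None)


print(classify_column(["45°49′57″N 06°51′52″E", "45°55′20″N 07°50′08″E"]))
print(classify_column(["mont blanc", "lyskamm", "monte cervino"]))
```

Subject-column detection is not covered here, since the slides only say that it relies on statistical features without naming them.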

  6. Semantic Table Interpretation: enhanced approach (unsupervised, complete and automatic)

     Phase 3: Concept and Datatype Annotation, which deals with mappings between columns (or headers, if they are available) and semantic elements (concepts or datatypes) in a KG
     ● Retrieval of a set of candidate entities, performing the entity linking by searching the Knowledge Graph with the content of a cell
     ● Retrieval of the abstract and concepts for each item in the set of retrieved entities
     ● Application of heuristics for the identification of the most frequent concept of the column

     Name           Coordinates            Height  Range
     mont blanc     45°49′57″N 06°51′52″E  4808    mont blanc massif
     lyskamm        45°55′20″N 07°50′08″E  4527    pennine alps
     monte cervino  45°58′35″N 07°39′31″E  4478    pennine alps

     Column annotations shown under the table: MOUNTAIN, PLACE, HEIGHT, MASSIF
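
The last step, picking the most frequent concept of a column, can be approximated by a simple vote. The sketch below assumes the candidate concepts for each cell have already been retrieved from the Knowledge Graph; the function `annotate_column` and the sample concept names are hypothetical, and the real heuristics may weigh candidates differently.

```python
from collections import Counter
from typing import Iterable, List, Optional


def annotate_column(candidate_concepts_per_cell: Iterable[List[str]]) -> Optional[str]:
    """Return the concept proposed by the largest number of cells.

    Each cell contributes at most one vote per concept, so a cell with many
    candidate entities of the same type does not dominate the column.
    """
    votes: Counter = Counter()
    for concepts in candidate_concepts_per_cell:
        votes.update(set(concepts))
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Assumed candidate concepts for the three cells of the "Name" column.
name_column = [
    ["Mountain", "Place"],
    ["Mountain"],
    ["Mountain", "Place"],
]
print(annotate_column(name_column))
```

Ties are not handled specially in this sketch; a fuller version would need a tie-breaking rule, for example preferring the more specific concept.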

  7. Semantic Table Interpretation: enhanced approach (unsupervised, complete and automatic)

     Phase 3: Concept and Datatype Annotation, which deals with mappings between columns (or headers, if they are available) and semantic elements (concepts or datatypes) in a KG

     [Figure; labels: text in the cell, row of the table, header of the column, abstract of the entity inside the KG]

  8. Semantic Table Interpretation: enhanced approach (unsupervised, complete and automatic)

     Phase 4: Predicate Annotation, whose task is to find relations, in the form of predicates, between the main column and the other columns to set the overall meaning of the table
     ● The winning concept of the S-column is considered the subject of the relationship, and the annotations of the other columns are considered the objects
     ● The Knowledge Graph is searched for the subject and the object to collect possible predicates

     Name           Coordinates            Height  Range
     mont blanc     45°49′57″N 06°51′52″E  4808    mont blanc massif
     lyskamm        45°55′20″N 07°50′08″E  4527    pennine alps
     monte cervino  45°58′35″N 07°39′31″E  4478    pennine alps

     Column annotations shown under the table: MOUNTAIN, PLACE, HEIGHT, MASSIF
     Predicates shown in the figure: georss:point, dbo:elevation, dbo:mountainRange
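
One way to picture the predicate search is a vote over the predicates that connect subject and object in the Knowledge Graph. The sketch below is a simplification under stated assumptions: it works on row-level entity pairs rather than on the column concepts described in the slide, the in-memory `KG` set stands in for a real graph query, and the `ex:near` triple is an invented distractor.

```python
from collections import Counter
from typing import Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Toy in-memory Knowledge Graph; "ex:near" is a made-up predicate used as noise.
KG: Set[Triple] = {
    ("dbr:Mont_Blanc", "dbo:mountainRange", "dbr:Mont_Blanc_massif"),
    ("dbr:Lyskamm", "dbo:mountainRange", "dbr:Pennine_Alps"),
    ("dbr:Monte_Cervino", "dbo:mountainRange", "dbr:Pennine_Alps"),
    ("dbr:Mont_Blanc", "ex:near", "dbr:Mont_Blanc_massif"),
}


def annotate_predicate(pairs: Iterable[Tuple[str, str]], kg: Set[Triple]) -> Optional[str]:
    """Return the predicate that links the most (subject, object) pairs."""
    votes: Counter = Counter()
    for subject, obj in pairs:
        for s, p, o in kg:
            if s == subject and o == obj:
                votes[p] += 1
    return votes.most_common(1)[0][0] if votes else None


# (S-column entity, Range-column entity) for each row of the example table.
rows = [
    ("dbr:Mont_Blanc", "dbr:Mont_Blanc_massif"),
    ("dbr:Lyskamm", "dbr:Pennine_Alps"),
    ("dbr:Monte_Cervino", "dbr:Pennine_Alps"),
]
print(annotate_predicate(rows, KG))
```

The linear scan over `kg` is only acceptable for a toy graph; against a real Knowledge Graph the lookup would be a query per pair.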

  9. Semantic Table Interpretation: enhanced approach (unsupervised, complete and automatic)

     Phase 4: Predicate Annotation, whose task is to find relations, in the form of predicates, between the main column and the other columns to set the overall meaning of the table

     Predicate Contexts [Zhang 2017]

  10. Semantic Table Interpretation: enhanced approach (unsupervised, complete and automatic)

      Phase 5: Entity Linking, which deals with mappings between cells and entities in a KG
      ● Already discovered annotations are used to create a query for the disambiguation of the cell content
      ● If more than one entity is returned for a cell, the one with the smallest edit distance is taken

      Name                                 Coordinates            Height  Range
      mont blanc (dbr:Mont_Blanc)          45°49′57″N 06°51′52″E  4808    mont blanc massif (dbr:Mont_Blanc_massif)
      lyskamm (dbr:Lyskamm)                45°55′20″N 07°50′08″E  4527    pennine alps (dbr:Pennine_Alps)
      monte cervino (dbr:Monte_Cervino)    45°58′35″N 07°39′31″E  4478    pennine alps (dbr:Pennine_Alps)
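
The tie-breaking rule in the second bullet can be illustrated with a plain Levenshtein distance. In the sketch below, `pick_entity` and the way a label is derived from a `dbr:` identifier are assumptions; the slides do not say which edit-distance variant or which entity label is compared against the cell.

```python
from typing import List, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def pick_entity(cell: str, candidates: List[str]) -> Optional[str]:
    """Choose the candidate whose label is closest to the cell text."""
    if not candidates:
        return None

    def label(uri: str) -> str:
        # Assumed labelling: drop the prefix and turn underscores into spaces.
        return uri.split(":", 1)[-1].replace("_", " ").lower()

    return min(candidates, key=lambda uri: edit_distance(cell.lower(), label(uri)))


print(pick_entity("mont blanc", ["dbr:Mont_Blanc_massif", "dbr:Mont_Blanc"]))
```

When two candidates are equally close, `min` returns the first one in the list, so the candidate order from the query acts as an implicit tie-breaker in this sketch.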

  11. Semantic Table Interpretation: enhanced approach (unsupervised, complete and automatic)

                CTA                  CEA                  CPA
                Primary  Secondary   Primary  Secondary   Primary  Secondary
                score    score       score    score       score    score
      Round 1    .929     .933       1        1            .965     .991
      Round 2   1.049     .247        .614     .673        .460     .544
      Round 3   1.648     .269        .633     .679        .518     .595
      Round 4   1.682     .322        .973     .983        .787     .841

      Search for the path in the graph that links all the entities in the row

  12. MANTIS TABLE
      ● Load tables in JSON format
      ● Download annotations (RDF/XML, N3, NTriples, Turtle and JSON-LD)
      ● Possibility to explore the output of each phase
      ● Manual annotation editing function
      ● Integration of the API provided by ABSTAT for auto-completion and suggestions

  13. Thank you
      Marco Cremaschi, PhD Student@UNIMIB
      Department of Informatics, Systems and Communication (DISCo)
      marco.cremaschi@unimib.it
