HowToKB: Mining HowTo Knowledge from Online Communities


  1. HowToKB: Mining HowTo Knowledge from Online Communities. Cuong Chu (MPI Saarbruecken), Niket Tandon (Allen Institute for AI), Gerhard Weikum (MPI Saarbruecken). Task frame: How to paint a wall

  2. Task frame: How to paint a wall. Labelled parts of the frame: attributes, edges

  3. Related work on HowTo knowledge acquisition. Chart axes: input, representation

  4. Related work on HowTo knowledge acquisition. Input: generic vs. domain specific. Representation: syntactic structures vs. semantic expressivity; one label reads "reduced expressivity". Shown: Yang et al. SIGIR'15 (tasks are not semantic frames), OpenIE, ConceptNet, PropBank

  5. Related work on HowTo knowledge acquisition. Same axes. Shown: Yang et al. SIGIR'15, OpenIE, ConceptNet, PropBank, VerbNet, FrameNet, Knowlywood

  6. Related work on HowTo knowledge acquisition. Same axes. Shown: Yang et al. SIGIR'15, HowToKB, OpenIE, Schank'75, Fillmore'76, Minsky'74, ConceptNet, PropBank, VerbNet, FrameNet, Knowlywood

  7. Related work on HowTo knowledge acquisition. Same axes and systems as the previous slide, with two notes: no phrases/tasks; manually populated. Message: HowToKB's knowledge representation is different.

  8. Related work on HowTo knowledge acquisition. Chart axes: task, model

  9. Related work on HowTo knowledge acquisition. Task: schema based vs. schema free. Model: unsupervised vs. supervised. Shown: semantic frame parsing, semantic role labeling, OpenIE, syntactic structures extraction

  10. Related work on HowTo knowledge acquisition. Same axes. Shown: semantic frame parsing, HowToKB, semantic role labeling, Knowlywood (mapped to WordNet, closed sense repository), OpenIE, syntactic structures extraction. Message: our task is different.

  11. WikiHow: our input dataset

  12. WikiHow: our input dataset. An article provides: task, sub-tasks, previous task, next task, …, participating objects

  13. WikiHow: our input dataset. An article provides: task, sub-tasks, images, videos, previous task, next task, …, participating objects. Message: WikiHow data is very rich, and can be exploited.

  14. System overview: WikiHow → frame construction → frame organization → HowToKB

  15. System overview: WikiHow → frame construction → frame organization → HowToKB. Stage 1a: convert unstructured articles to structured task frames. Stage 1b: sequence the task frames. Novel knowledge representation

  16. System overview: WikiHow → frame construction → frame organization → HowToKB. Stage 2: organize the sequenced task frames. Novel hierarchical organization with distributional senses

  17. KB construction: extraction
     § OpenIE naturally suits task frame construction; easy mapping to task attributes

     attribute              OpenIE mapping
     location               location
     time                   time
     participating agent    subject
     participating object   subject/object

  18. KB construction: extraction
     § OpenIE naturally suits task frame construction; easy mapping to task attributes
     § Attribute type-checking increases precision from 75% to 97%

     attribute              OpenIE mapping    type-checking (head word vs. WordNet)
     location               location          WN-noun
     time                   time              WN-time
     participating agent    subject           WN-living
     participating object   subject/object    WN-nonliving

     Message: Type checking helps to postprocess OpenIE results.

  19. KB construction: extraction
     § OpenIE naturally suits task frame construction; easy mapping to task attributes
     § Attribute type-checking increases precision from 75% to 97%
     Result: 1.2 million task frames

     attribute              OpenIE mapping    type-checking (WN: WordNet)
     location               location          noun phrase
     time                   time              WN-time
     participating agent    subject           WN-living
     participating object   subject/object    WN-nonliving

     Message: 1.2M task frames are isolated from each other.
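
The role-to-attribute mapping and the type check on the slides above can be illustrated with a minimal sketch. It is not the authors' implementation: `TOY_TYPES` is a hypothetical hand-written lexicon standing in for WordNet, the head word is assumed to be the last token of a phrase, and all function names are invented for the example.

```python
# Hypothetical toy lexicon standing in for WordNet: head word -> coarse type.
TOY_TYPES = {
    "painter": "living",
    "brush": "nonliving",
    "wall": "nonliving",
    "morning": "time",
    "garage": "noun",
}

# OpenIE role -> task-frame attributes it may fill (from the slide's table).
ROLE_TO_ATTRIBUTES = {
    "location": ["location"],
    "time": ["time"],
    "subject": ["participating agent", "participating object"],
    "object": ["participating object"],
}

# Type the head word must have for each attribute; location accepts any known noun.
REQUIRED_TYPE = {
    "time": "time",
    "participating agent": "living",
    "participating object": "nonliving",
}


def head_word(phrase):
    """Assumption: the head of a short noun phrase is its last token."""
    return phrase.lower().split()[-1]


def type_ok(attribute, phrase):
    """Accept a phrase only if its head word has the type the attribute requires."""
    head_type = TOY_TYPES.get(head_word(phrase))
    if head_type is None:
        return False  # unknown head word: reject to favour precision
    if attribute == "location":
        return True
    return head_type == REQUIRED_TYPE[attribute]


def build_frame(extraction):
    """Map one OpenIE-style extraction (role -> phrase) to a task frame."""
    frame = {}
    for role, phrase in extraction.items():
        for attribute in ROLE_TO_ATTRIBUTES.get(role, []):
            if type_ok(attribute, phrase):
                frame.setdefault(attribute, []).append(phrase)
    return frame


example = {
    "subject": "the painter",
    "object": "a brush",
    "location": "the garage",
    "time": "next week",  # "week" is not in the toy lexicon, so it is dropped
}
frame = build_frame(example)
```

In this sketch a subject can land in either participant attribute, and the type check decides which one; phrases with unknown heads are discarded rather than guessed.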

  20. Why KB organization?
     Frame 1: task: paint wall; participating object: brush, paint, ..; sub-task: clean the surface, dip the roller..
     Frame 2: task: paint ceiling; participating object: paint, roller, ..; sub-task: clean the surface, dip the roller..
     Message: KB organization is essential for: a) better redundancy: aggregated frames: paint wall, paint ceiling

  21. Why KB organization?
     Task: use keyboard. Categories shown: iPhone, Mac, Android, Windows; Music listening, Music appreciation. Further row: visuals
     Message: KB organization is essential for: a) better redundancy: aggregated frames: paint wall, paint ceiling; b) disambiguation of tasks: use keyboard: piano or computer?

  22. Approach to KB organization
     § For the 1.2 million frames, the number of clusters is unknown.
     § Hierarchical clustering is natural, but expensive
     Example frames: press keystrokes, use keyboard

  23. Approach to KB organization
     § For the 1.2 million frames, the number of clusters is unknown.
     § Hierarchical clustering is natural, but expensive
     Expected organization (figure): a hierarchy whose nodes group use keyboard with press keystrokes

  24. Approach to KB organization
     § For the 1.2 million frames, the number of clusters is unknown.
     § Hierarchical clustering is natural, but expensive
     We propose a two-stage clustering. Stage 1: coarse-grained clustering. Stage 2: fine-grained clustering

  25. Preparing for clustering: Multi-dimensional similarity model

     Attribute     Task frame f1      Task frame f2
     Task title    paint a wall       color the room ceiling
     Location      house, wall, …     bedroom, ceiling, …
     …             …                  …
     Category      home & garden      house decoration

     A similarity score between f1 and f2 is computed per attribute (task title, location, …).

  26. Preparing for clustering: Multi-dimensional similarity model. Same comparison table. Finally, logistic regression over the attributes:
     $$\mathrm{sim}(f_1, f_2) = \mathrm{sigmoid}\Big(w_0 + \sum_{i=1}^{n} w_i \, \mathrm{sim}_i(f_1, f_2)\Big)$$

  27. Preparing for clustering: Multi-dimensional similarity model. Same comparison table and logistic regression over the attributes. Message: Our task frame pairs are dissimilar with an empirical confidence of 99.9% if a combination of their categorical and lexical similarity is less than a threshold
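
A minimal sketch of such a similarity model, under stated assumptions: per-attribute similarity is taken to be Jaccard overlap (the slides do not say which measure is used), and the weights and bias are hand-set hypothetical values rather than ones learned by logistic regression from labelled frame pairs.

```python
import math


def jaccard(a, b):
    """Set overlap of two value lists; defined as 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def frame_similarity(f1, f2, weights, bias):
    """sigmoid(bias + sum_i weight_i * sim_i(f1, f2)) over the weighted attributes."""
    score = bias
    for attribute, weight in weights.items():
        score += weight * jaccard(f1.get(attribute, ()), f2.get(attribute, ()))
    return 1.0 / (1.0 + math.exp(-score))


# Frames from the slide, tokenised by hand for the example.
f1 = {
    "title": ["paint", "wall"],
    "location": ["house", "wall"],
    "category": ["home & garden"],
}
f2 = {
    "title": ["color", "room", "ceiling"],
    "location": ["bedroom", "ceiling"],
    "category": ["house decoration"],
}

# Hypothetical parameters; in practice they would be fitted on labelled pairs.
WEIGHTS = {"title": 1.5, "location": 1.5, "category": 1.5}
BIAS = -2.0

sim_different = frame_similarity(f1, f2, WEIGHTS, BIAS)
sim_identical = frame_similarity(f1, f1, WEIGHTS, BIAS)
```

With purely lexical overlap the two example frames share no values, so their score reduces to `sigmoid(BIAS)`; this is the kind of case where a distributional measure would be needed to recognise that the frames are related.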

  28. Coarse-grained clustering. 1.2 million task frames (e.g. use keyboard, use mac keyboard, press keystrokes) → lexical grouping (efficient hash-based grouping) → 375K groups (e.g. use keyboard; press keystrokes)

  29. Coarse-grained clustering. 1.2 million task frames → lexical grouping (efficient hash-based grouping) → 375K groups → distributional grouping (fewer pairs, efficient top-k similarity) → 200K groups (e.g. use keyboard, press keystrokes). Message: Pruning helps to efficiently reduce the search space.
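
The two coarse-grained steps can be sketched as follows. This is an illustration under assumptions, not the system's code: the lexical hash key is assumed to be (first token, last token) of the task title, the group vectors are toy values, and groups are compared pairwise instead of through the top-k similarity search mentioned on the slide.

```python
import math
from collections import defaultdict


def lexical_key(title):
    """Assumed hash key: first token (verb) and last token (head noun)."""
    tokens = title.lower().split()
    return (tokens[0], tokens[-1])


def lexical_grouping(titles):
    """One pass over all titles; frames with the same key share a group."""
    groups = defaultdict(list)
    for title in titles:
        groups[lexical_key(title)].append(title)
    return dict(groups)


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def distributional_grouping(groups, vectors, threshold):
    """Merge lexical groups whose vectors are similar (union-find over group keys)."""
    keys = list(groups)
    parent = {k: k for k in keys}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(b)] = find(a)

    merged = defaultdict(list)
    for k in keys:
        merged[find(k)].extend(groups[k])
    return list(merged.values())


titles = ["use keyboard", "use mac keyboard", "press keystrokes", "paint wall"]
lexical_groups = lexical_grouping(titles)

# Hypothetical distributional vectors, one per lexical group.
toy_vectors = {
    ("use", "keyboard"): (1.0, 0.1),
    ("press", "keystrokes"): (0.9, 0.2),
    ("paint", "wall"): (0.0, 1.0),
}
coarse_groups = distributional_grouping(lexical_groups, toy_vectors, threshold=0.9)
```

The point of the ordering is cost: hashing touches each frame once, so the more expensive vector comparisons only run over groups rather than over all frame pairs.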

  30. Fine-grained clustering. 1.2 million task frames (use keyboard, use mac keyboard, press keystrokes, …) → lexical grouping → 375K groups → distributional grouping → 200K groups → final clusters. The coarse groups allow fast, parallel hierarchical clustering
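
Because the coarse groups are independent, each one can be clustered on its own. The sketch below assumes single-link agglomerative clustering cut at a similarity threshold, with token overlap of titles standing in for the frame similarity model; neither choice is confirmed by the slides. A thread pool keeps the example short; CPU-bound clustering would more likely use separate processes.

```python
from concurrent.futures import ThreadPoolExecutor


def token_similarity(a, b):
    """Stand-in similarity: Jaccard overlap of title tokens."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)


def agglomerate(titles, threshold):
    """Single-link agglomerative clustering, stopped once no pair reaches the threshold."""
    clusters = [[t] for t in titles]
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                link = max(
                    token_similarity(a, b)
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if link >= threshold and (best is None or link > best[0]):
                    best = (link, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]


def cluster_groups(groups, threshold=0.5, workers=4):
    """Cluster every coarse group independently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda group: agglomerate(group, threshold), groups))


coarse = [
    ["use keyboard", "use mac keyboard", "press keystrokes"],
    ["paint wall", "paint ceiling"],
]
fine = cluster_groups(coarse)
```

The quadratic pair loop only ever runs inside one group, which is what makes the coarse stage worthwhile when the full collection has over a million frames.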

  31. Recap of system architecture: WikiHow → frame construction → frame organization → HowToKB

  32. Resulting HowToKB
     § 0.5 million grouped task frames
     § Avg. per frame: 12 attribute values, 2 images
     § Precision > 85% (Wilson confidence intervals)

  33. Resulting HowToKB
     § 0.5 million grouped task frames
     § Avg. per frame: 12 attribute values, 2 images
     § Precision > 85% (Wilson confidence intervals)
     § As ground truth, turkers fill "very likely" attribute values for 150 frames
     Example: In some context such as decorate the house, the most likely location when we paint a wall is ____
     Message: HowToKB maintains high precision at large scale.
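
For reference, a Wilson score interval for a sampled precision can be computed as in the sketch below. The counts in the example are hypothetical (135 correct out of 150 judged), chosen only to match the sample size on the slide; they are not results reported by the authors.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts for illustration only.
low, high = wilson_interval(successes=135, n=150)
```

Unlike the plain normal approximation, the Wilson interval stays inside [0, 1] and behaves well for small samples and for proportions near 1, which is why it suits precision estimates from a few hundred judgements.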

  34. Use case: finding YouTube videos for a HowTo task. Task query: make caramel corn. YouTube video: Gourmet Caramel Popcorn "Thanks Monique"

  35. Use case: finding YouTube videos for a HowTo task. Task query: make caramel corn. Expansion using task frames (attributes, edges): brown sugar ..., popcorn ..., syrup, teaspoon.., bake soda ..., vanilla .. YouTube video: Gourmet Caramel Popcorn "Thanks Monique"
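
A minimal sketch of the expansion idea, assuming videos are ranked by simple token overlap between the expanded query and the video title. The ranking function, the frame layout, and the second video title are hypothetical; only the query, the expansion terms, and "Gourmet Caramel Popcorn" come from the slides.

```python
def expand_query(query, frame):
    """Append the frame's attribute values to the query terms, without duplicates."""
    terms = query.lower().split()
    for values in frame.values():
        for value in values:
            for token in value.lower().split():
                if token not in terms:
                    terms.append(token)
    return terms


def rank_videos(terms, video_titles):
    """Order titles by how many query terms they contain (stable for ties)."""
    term_set = set(terms)

    def score(title):
        return len(term_set & set(title.lower().split()))

    return sorted(video_titles, key=score, reverse=True)


query = "make caramel corn"
# Hypothetical frame layout holding the expansion terms shown on the slide.
task_frame = {
    "participating object": ["brown sugar", "popcorn", "syrup", "vanilla"],
}
videos = ["Corn maze tour", "Gourmet Caramel Popcorn"]

plain_ranking = rank_videos(query.lower().split(), videos)
expanded_ranking = rank_videos(expand_query(query, task_frame), videos)
```

With the bare query both titles match one term, so the unrelated video keeps its position; after expansion the term popcorn lifts the relevant video to the top.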
