  1. KNOWLEDGE MANAGEMENT AND APPLICATIONS
     David Sánchez, Department of Computer Science and Mathematics, April 2013

  2. Tarragona

  3. The university
     - Created in 1991
     - 52 programmes of study
     - Over 12,000 students

  4. The faculty
     - Engineering degrees: Computer Science; Telematics
     - Masters: Computer Security and Intelligent Systems; Artificial Intelligence; Security of Information and Communication Technologies
     - Doctoral program: Computer Engineering

  5. Research group
     - 9 professors and lecturers
     - 6 postdoctoral researchers
     - 7 Ph.D. students
     - 7 research assistants
     - Topics: data privacy and electronic commerce; privacy and security in mobile environments; private information recovery and codes

  6. Contents
     - Introduction
     - Knowledge acquisition
     - Semantic operators
     - Applications to privacy

  7. Motivation
     - Numerical data is easy to manage and transform:
       - 3 < 4 = true
       - (1 + 2) / 2 = 1.5
       - {3, 2, 5} -> {2, 3, 5}
     - A plethora of algorithms rely on arithmetical functions to deal with numerical data

  8. Motivation
     - What about text?
       - car > bike?
       - (apple + orange) / 2 = ?
       - {flu, cold, pneumonia} -> {?, ?, ?}
     - Arithmetical functions do not make sense for text
     - Text (words, noun phrases) refers to concepts
     - Concepts should be managed according to their formal semantics

  9. Ontologies
     - Provide a structured representation of a shared conceptualization
     - Elements:
       - Classes (concepts)
       - Instances (individuals)
     - Semantics:
       - Properties (semantic relationships)
       - Restrictions (logical definition of meanings)
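To make the listed elements concrete, here is a minimal in-memory sketch. The class name `Ontology`, its fields and the example facts (borrowed from the examples on later slides) are illustrative assumptions, not the API of any ontology toolkit; real ontologies are often expressed in OWL and handled with dedicated libraries.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology: classes, instances, property assertions and symmetric-property restrictions."""
    parents: dict[str, set[str]] = field(default_factory=dict)     # class -> direct superclasses
    instance_of: dict[str, str] = field(default_factory=dict)      # individual -> class
    facts: set[tuple[str, str, str]] = field(default_factory=set)  # (subject, property, object)
    symmetric: set[str] = field(default_factory=set)               # properties declared symmetric

    def add_class(self, name: str, *supers: str) -> None:
        self.parents.setdefault(name, set()).update(supers)
        for s in supers:
            self.parents.setdefault(s, set())

    def superclasses(self, name: str) -> set[str]:
        """All transitive subsumers of a class (the class itself excluded)."""
        seen: set[str] = set()
        stack = list(self.parents.get(name, ()))
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(self.parents.get(c, ()))
        return seen

    def holds(self, s: str, p: str, o: str) -> bool:
        """A fact holds if asserted, or if p is symmetric and the inverse fact is asserted."""
        return (s, p, o) in self.facts or (p in self.symmetric and (o, p, s) in self.facts)


onto = Ontology()
onto.add_class("disease")
onto.add_class("cancer", "disease")
onto.add_class("melanoma", "cancer")
onto.add_class("country")
onto.instance_of.update({"Spain": "country", "France": "country"})
onto.facts.add(("Spain", "borders", "France"))
onto.symmetric.add("borders")  # restriction: borders(a, b) -> borders(b, a)
```

The symmetric-property set stands in for a restriction: asking whether France borders Spain succeeds even though only the inverse fact was asserted.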

  10. Contents
      - Introduction
      - Knowledge acquisition
      - Semantic operators
      - Applications to privacy

  11. Creating ontologies
      - Manually:
        - Knowledge formalization is challenging
        - Knowledge can be subjective
        - Time-consuming
      - Assisted:
        - Proactive knowledge modelling tools
        - Wizards
        - Reasoners to check knowledge consistency
      - Knowledge engineering methods: Ontology Development 101, METHONTOLOGY, On-To-Knowledge

  12. Ontology learning
      - Semantics are implicitly expressed in text
      - Textual corpora can be analysed to acquire knowledge:
        - Discover concepts and individuals
        - Discover and label relations:
          - Taxonomic (cancer is a disease)
          - Non-taxonomic (cancer is treated with radiotherapy)
          - Attributes (cancer is non-contagious)
        - Discover restrictions:
          - Axioms (Spain borders France -> France borders Spain)
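The axiom example can be illustrated with a simple heuristic: propose a relation as symmetric when most of the entity pairs extracted for it are also observed in the reverse direction. This is a hypothetical sketch, not the method of the cited axiom-learning work; the extracted triples and the 0.5 threshold are made up.

```python
from __future__ import annotations

from collections import defaultdict


def symmetry_score(triples: list[tuple[str, str, str]]) -> dict[str, float]:
    """Per relation: fraction of observed (a, b) pairs that were also observed as (b, a)."""
    pairs: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for subj, rel, obj in triples:
        if subj != obj:  # self-loops carry no evidence about symmetry
            pairs[rel].add((subj, obj))
    return {
        rel: sum((obj, subj) in observed for subj, obj in observed) / len(observed)
        for rel, observed in pairs.items()
    }


def propose_symmetric(triples: list[tuple[str, str, str]], threshold: float = 0.5) -> set[str]:
    """Relations whose symmetry score reaches a (hypothetical) acceptance threshold."""
    return {rel for rel, score in symmetry_score(triples).items() if score >= threshold}


# Hypothetical extractions in the spirit of "Spain borders France" / "France borders Spain".
extracted = [
    ("Spain", "borders", "France"),
    ("France", "borders", "Spain"),
    ("Portugal", "borders", "Spain"),
    ("Spain", "borders", "Portugal"),
    ("cancer", "is treated with", "surgery"),
    ("cancer", "is treated with", "radiotherapy"),
]
```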

  13. Ontology learning from the Web
      - Corpus: the Web
        - The largest electronic repository
        - Heterogeneous
        - It approximates the distribution of information at a social scale
      - Availability of large-scale information retrieval (IR) tools: Web search engines

  14. Knowledge discovery from text
      - NL processing tools identify nouns, noun phrases and named entities (candidate concepts and individuals)
      - Linguistic patterns discover semantics:
        - Taxonomic: "cities such as (Nimes)", "cancers like (melanoma)"
        - Non-taxonomic: "cancer is treated with (surgery)"
        - Attributes: "camera has (10MP resolution)", "camera features (3x zoom)"
        - Axioms (functionality, transitivity, symmetry, reflexivity, etc.): "Spain borders France" + "France borders Spain" -> symmetry
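A rough sketch of pattern-based extraction with regular expressions follows. The pattern set, the `extract` helper and the toy sentences are assumptions for illustration; instead of the NL processing tools mentioned on the slide, it simply assumes each candidate phrase ends at punctuation, which a real noun-phrase chunker would handle far more robustly.

```python
from __future__ import annotations

import re

# Hypothetical lexico-syntactic patterns modelled on the slide's examples.
# "{c}" is replaced by the escaped seed concept; group 1 captures the text that follows.
PATTERNS = {
    "taxonomic": r"{c}\s+(?:such as|like)\s+([^.;:()]+)",
    "non-taxonomic": r"{c}\s+is treated with\s+([^.;:()]+)",
    "attribute": r"{c}\s+(?:has|features)\s+([^.;:()]+)",
}


def split_enumeration(text: str) -> list[str]:
    """Split an enumeration such as 'A, B and C' into ['A', 'B', 'C']."""
    parts = re.split(r",\s*|\s+and\s+|\s+or\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def extract(concept: str, kind: str, corpus: list[str]) -> list[str]:
    """Return the terms captured by the pattern of the given kind, in corpus order."""
    regex = re.compile(PATTERNS[kind].format(c=re.escape(concept)), re.IGNORECASE)
    found: list[str] = []
    for sentence in corpus:
        for match in regex.finditer(sentence):
            found.extend(split_enumeration(match.group(1)))
    return found


# Toy sentences; each candidate phrase ends at punctuation (the sketch relies on that).
corpus = [
    "The region has historic cities such as Nimes, Arles and Avignon.",
    "Doctors see many cancers like melanoma.",
    "In many cases, cancer is treated with surgery or radiotherapy.",
    "The camera has 10MP resolution.",
    "This camera features 3x zoom.",
]
```

For instance, `extract("cities", "taxonomic", corpus)` yields the three city names as hyponym candidates of "city".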

  15. Retrieval of suitable corpora
      - Create appropriate Web search queries:
        - Taxonomic: "cities such as" […]
        - Non-taxonomic: "cancer is treated with" […]
        - Attributes: "camera features" […]
        - Axioms: "Spain borders" & "France borders"

  16. Statistical assessment
      - Statistical assessor: Web search engine (WSE) page counts approximate query probabilities at a social scale
      - An association score filters noisy extractions: pointwise mutual information (PMI)
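As a sketch of the statistical filter, the snippet below computes pointwise mutual information from page counts, treating hits(x) / total_pages as p(x). All counts, the index size and the zero threshold are hypothetical; real search engines only return estimated hit counts, and `hits_ab` is assumed to be the count of a query containing both terms.

```python
from __future__ import annotations

import math


def pmi(hits_a: int, hits_b: int, hits_ab: int, total_pages: int) -> float:
    """log2(p(a, b) / (p(a) * p(b))), with p(x) = hits(x) / total_pages."""
    if min(hits_a, hits_b, hits_ab) <= 0:
        return float("-inf")  # no (co-)occurrence evidence at all
    p_a, p_b, p_ab = hits_a / total_pages, hits_b / total_pages, hits_ab / total_pages
    return math.log2(p_ab / (p_a * p_b))


def filter_extractions(candidates: dict[str, tuple[int, int, int]], total_pages: int,
                       threshold: float = 0.0) -> list[str]:
    """Keep candidates whose PMI with the queried concept reaches the threshold."""
    return [term for term, (h_a, h_b, h_ab) in candidates.items()
            if pmi(h_a, h_b, h_ab, total_pages) >= threshold]


# Made-up figures for illustration only, not real search engine counts.
TOTAL_PAGES = 1_000_000
candidates = {
    # term: (hits("cities"), hits(term), hits(query with both))
    "Nimes": (1000, 2000, 500),
    "Monday": (1000, 50_000, 10),
}
```

A positive PMI means the terms co-occur more often than independence would predict; a frequent but unrelated word such as "Monday" scores below zero and is discarded.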

  17. References
      - Taxonomic learning: David Sánchez, Antonio Moreno: Pattern-based automatic taxonomy learning from the Web. AI Communications 21(1): 27-48 (2008)
      - Non-taxonomic learning: David Sánchez, Antonio Moreno: Learning non-taxonomic relationships from web documents for domain ontology construction. Data & Knowledge Engineering 64(3): 600-623 (2008)
      - Attribute learning: David Sánchez: A methodology to learn ontological attributes from the Web. Data & Knowledge Engineering 69(6): 573-597 (2010)
      - Axiom learning: David Sánchez, Antonio Moreno, Luis Del Vasto Terrientes: Learning relation axioms from text: An automatic Web-based approach. Expert Systems with Applications 39(5): 5792-5805 (2012)

  18. Contents
      - Introduction
      - Knowledge acquisition
      - Semantic operators
      - Applications to privacy

  19. Exploiting ontologies
      - Structured knowledge enables a semantically coherent interpretation of textual data by defining semantically grounded operators
      - Semantic similarity is the most basic operator, e.g., Similarity(apple, orange) > Similarity(apple, bike)

  20. Semantic similarity
      - Semantic similarity: degree of taxonomic resemblance (e.g., dogs and cats are similar because both are mammals)
      - Semantic relatedness: non-taxonomic relationships are also considered (e.g., car and wheel, or pencil and paper)
      - Similarity measures can be grouped into several families according to:
        - the type of knowledge exploited
        - the principles on which similarity estimation relies

  21. Ontology-based similarity

  22. Edge-counting measures
      - $\mathrm{Distance}(a,b) = |\mathrm{min\_path}(a,b)|$, the length of the shortest taxonomic path between a and b
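The edge-counting distance can be sketched as a breadth-first search over is-a links treated as undirected edges. The toy taxonomy is hypothetical and uses single inheritance for brevity.

```python
from __future__ import annotations

from collections import deque

# Hypothetical toy taxonomy: child -> parent.
IS_A = {
    "apple": "fruit", "orange": "fruit", "fruit": "food", "food": "entity",
    "bike": "vehicle", "car": "vehicle", "vehicle": "artifact", "artifact": "entity",
    "dog": "mammal", "cat": "mammal", "mammal": "animal", "animal": "entity",
}


def neighbours(taxonomy: dict[str, str]) -> dict[str, set[str]]:
    """Undirected adjacency built from the is-a links."""
    adj: dict[str, set[str]] = {}
    for child, parent in taxonomy.items():
        adj.setdefault(child, set()).add(parent)
        adj.setdefault(parent, set()).add(child)
    return adj


def path_distance(a: str, b: str, taxonomy: dict[str, str]) -> int:
    """Number of is-a edges on the shortest path between a and b (breadth-first search)."""
    adj = neighbours(taxonomy)
    frontier, seen = deque([(a, 0)]), {a}
    while frontier:
        node, dist = frontier.popleft()
        if node == b:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    raise ValueError(f"{a!r} and {b!r} are not connected")
```

In this toy taxonomy apple and orange are 2 edges apart (via fruit) while apple and bike are 6 apart (via entity), matching the intuition Similarity(apple, orange) > Similarity(apple, bike).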

  23. IC-based measures
      - $\mathrm{Sim}(a,b) = IC(LCS(a,b))$, where $LCS(a,b)$ is the Least Common Subsumer of a and b
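A small sketch of the IC-based measure: the LCS is taken as the shared subsumer with the highest IC (which, in a tree, is simply the deepest one). Both the taxonomy and the IC values are invented for illustration.

```python
from __future__ import annotations

# Hypothetical toy taxonomy (child -> parent) and hypothetical IC values.
IS_A = {
    "apple": "fruit", "orange": "fruit", "fruit": "food", "food": "entity",
    "bike": "vehicle", "car": "vehicle", "vehicle": "artifact", "artifact": "entity",
}
IC = {
    "entity": 0.0, "food": 1.5, "fruit": 2.8, "apple": 4.0, "orange": 4.1,
    "artifact": 1.2, "vehicle": 2.5, "bike": 3.9, "car": 3.6,
}


def subsumers(c: str, taxonomy: dict[str, str]) -> set[str]:
    """c together with all of its taxonomic ancestors."""
    result = {c}
    while c in taxonomy:
        c = taxonomy[c]
        result.add(c)
    return result


def lcs(a: str, b: str, taxonomy: dict[str, str], ic: dict[str, float]) -> str:
    """Least Common Subsumer, taken as the most informative (highest IC) shared subsumer."""
    return max(subsumers(a, taxonomy) & subsumers(b, taxonomy), key=ic.__getitem__)


def ic_sim(a: str, b: str, taxonomy: dict[str, str], ic: dict[str, float]) -> float:
    """Sim(a, b) = IC(LCS(a, b))."""
    return ic[lcs(a, b, taxonomy, ic)]
```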

  24. IC-based semantic similarity
      - IC computation relies on probability estimates: $IC(c) = -\log p(c)$
      - Based on corpora:
        - Requires general and heterogeneous corpora
        - Language ambiguity hampers results
        - Data sparseness produces weak statistics
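A corpus-based IC computation can be sketched as follows, assuming (in the usual Resnik style) that every occurrence of a concept also counts as an occurrence of all its subsumers. The frequencies are hypothetical and the natural logarithm is an arbitrary choice.

```python
from __future__ import annotations

import math

# Hypothetical taxonomy (child -> parent) and hypothetical corpus frequencies.
IS_A = {"apple": "fruit", "orange": "fruit", "fruit": "food", "bread": "food", "food": "entity"}
COUNTS = {"apple": 40, "orange": 30, "fruit": 20, "bread": 60, "food": 50}


def corpus_ic(taxonomy: dict[str, str], counts: dict[str, int]) -> dict[str, float]:
    """IC(c) = -log p(c); each occurrence of a concept also counts for all its subsumers."""
    concepts = set(counts) | set(taxonomy) | set(taxonomy.values())
    totals = dict.fromkeys(concepts, 0)
    for concept, n in counts.items():
        node = concept
        totals[node] += n
        while node in taxonomy:  # propagate the count up to the root
            node = taxonomy[node]
            totals[node] += n
    n_total = sum(counts.values())
    # Concepts never observed (directly or via hyponyms) get infinite IC.
    return {c: -math.log(t / n_total) if t > 0 else math.inf for c, t in totals.items()}


ic = corpus_ic(IS_A, COUNTS)
```

The root subsumes every observation, so its IC is 0, and IC grows as concepts become more specific; a concept absent from the corpus shows the data sparseness problem noted on the slide.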

  25. Ontology-based IC computation
      - Assumption: concepts with many hyponyms in an ontology are more likely to appear in corpora
      - Concept probabilities are approximated intrinsically from taxonomic knowledge (the number of hyponyms of each concept):
        $IC(c) = -\log \frac{|\mathrm{hyponyms}(c)|}{\mathrm{ontology\_size}}$
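The intrinsic formula needs only the taxonomy. In this sketch, `|hyponyms(c)| + 1` replaces `|hyponyms(c)|` (the concept counts itself), an assumption added so that leaf concepts get a finite IC and the root gets IC 0; the toy taxonomy is hypothetical.

```python
from __future__ import annotations

import math

# Hypothetical toy taxonomy (child -> parent); 9 concepts in total.
IS_A = {
    "apple": "fruit", "orange": "fruit", "fruit": "food", "bread": "food",
    "food": "entity", "bike": "vehicle", "car": "vehicle", "vehicle": "entity",
}


def hyponyms(c: str, taxonomy: dict[str, str]) -> set[str]:
    """All concepts transitively below c."""
    children: dict[str, list[str]] = {}
    for child, parent in taxonomy.items():
        children.setdefault(parent, []).append(child)
    found: set[str] = set()
    stack = list(children.get(c, []))
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(children.get(node, []))
    return found


def intrinsic_ic(c: str, taxonomy: dict[str, str]) -> float:
    """-log((|hyponyms(c)| + 1) / ontology_size); the +1 (c counts itself) is an assumption."""
    ontology_size = len(set(taxonomy) | set(taxonomy.values()))
    return -math.log((len(hyponyms(c, taxonomy)) + 1) / ontology_size)
```

No corpus statistics are involved: "fruit" (2 hyponyms) gets a lower IC than the leaf "apple", and the root "entity" gets the minimum.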

  26. Feature-based measures
      - $\mathrm{Sim}(a,b) = \frac{\mathrm{common\_features}(a,b)}{\mathrm{disjoint\_features}(a,b)}$
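For the feature-based ratio, one needs a notion of features. This sketch assumes, as a hypothesis, that the features of a concept are the concept itself plus its taxonomic subsumers, and returns infinity for identical concepts, where no feature is disjoint. The toy taxonomy is hypothetical.

```python
from __future__ import annotations

import math

# Hypothetical toy taxonomy (child -> parent).
IS_A = {
    "apple": "fruit", "orange": "fruit", "fruit": "food", "food": "entity",
    "bike": "vehicle", "vehicle": "artifact", "artifact": "entity",
}


def features(c: str, taxonomy: dict[str, str]) -> set[str]:
    """Assumed feature set: the concept plus all of its taxonomic subsumers."""
    result = {c}
    while c in taxonomy:
        c = taxonomy[c]
        result.add(c)
    return result


def feature_sim(a: str, b: str, taxonomy: dict[str, str]) -> float:
    """Sim(a, b) = common_features(a, b) / disjoint_features(a, b)."""
    fa, fb = features(a, taxonomy), features(b, taxonomy)
    common = len(fa & fb)    # shared subsumers
    disjoint = len(fa ^ fb)  # features owned by only one of the two concepts
    return math.inf if disjoint == 0 else common / disjoint
```

Here apple and orange share three features (fruit, food, entity) and differ in two, giving 1.5, whereas apple and bike share only "entity" against six disjoint features.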

  27. References
      - Feature-based similarity measures:
        - Montserrat Batet, David Sánchez, Aïda Valls: An ontology-based measure to compute semantic similarity in biomedicine. Journal of Biomedical Informatics 44(1): 118-125 (2011)
        - David Sánchez, Montserrat Batet, David Isern, Aïda Valls: Ontology-based semantic similarity: A new feature-based approach. Expert Systems with Applications 39(9): 7718-7728 (2012)
      - IC-based similarity measures:
        - Based on corpora:
          - David Sánchez, Montserrat Batet, Aïda Valls, Karina Gibert: Ontology-driven web-based semantic similarity. Journal of Intelligent Information Systems 35(3): 383-413 (2010)
        - Based on ontologies:
          - David Sánchez, Montserrat Batet, David Isern: Ontology-based information content computation. Knowledge-Based Systems 24(2): 297-303 (2011)
          - David Sánchez, Montserrat Batet: A New Model to Compute the Information Content of Concepts from Taxonomic Knowledge. International Journal on Semantic Web and Information Systems 8(2): 34-50 (2012)
          - David Sánchez, Montserrat Batet: Semantic similarity estimation in the biomedical domain: An ontology-based information-theoretic perspective. Journal of Biomedical Informatics 44(5): 749-759 (2011)

  28. Other semantic operators
      - Semantic similarity/distance is the basis for developing other semantically grounded operators over a sample of textual data
      - Aggregation (mean/centroid):
        $\mathrm{Mean}(x_1, x_2, \dots, x_n) = \arg\min_{c} \sum_{i=1}^{n} \mathrm{distance}(c, x_i)$

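  29. Aggregation
      - Sample: colic, lumbago, lumbago, lumbago, migraine, migraine, pain, appendicitis, gastritis
      - Each candidate is scored by its summed distance to the sample values, weighted by their frequencies (in parentheses); the mean is the candidate with the lowest sum, ache (16)

      | Candidate    | colic (1) | lumbago (3) | migraine (2) | appendicitis (1) | gastritis (1) | pain (1) | Sum of distances |
      |--------------|-----------|-------------|--------------|------------------|---------------|----------|------------------|
      | colic        | 0         | 3           | 3            | 4                | 4             | 1        | 24               |
      | lumbago      | 3         | 0           | 2            | 5                | 5             | 2        | 19               |
      | migraine     | 3         | 2           | 0            | 5                | 5             | 2        | 21               |
      | appendicitis | 4         | 5           | 5            | 0                | 2             | 3        | 34               |
      | gastritis    | 4         | 5           | 5            | 2                | 0             | 3        | 34               |
      | pain         | 1         | 2           | 2            | 3                | 3             | 0        | 17               |
      | ache         | 2         | 1           | 1            | 4                | 4             | 1        | 16               |
      | inflammation | 3         | 4           | 4            | 1                | 1             | 2        | 27               |
      | symptom      | 2         | 3           | 3            | 2                | 2             | 1        | 22               |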
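The aggregation table can be checked mechanically. The snippet copies the distances and value frequencies from the slide, recomputes each candidate's weighted sum and picks the argmin, following the Mean formula of slide 28; restricting the candidates to the table's rows is an assumption of this sketch.

```python
from __future__ import annotations

# Column order of the distance lists below (distinct sample values) and their frequencies.
SAMPLE_VALUES = ["colic", "lumbago", "migraine", "appendicitis", "gastritis", "pain"]
FREQUENCIES = [1, 3, 2, 1, 1, 1]

# Distances copied from the slide's table: candidate -> distance to each sample value.
DIST = {
    "colic":        [0, 3, 3, 4, 4, 1],
    "lumbago":      [3, 0, 2, 5, 5, 2],
    "migraine":     [3, 2, 0, 5, 5, 2],
    "appendicitis": [4, 5, 5, 0, 2, 3],
    "gastritis":    [4, 5, 5, 2, 0, 3],
    "pain":         [1, 2, 2, 3, 3, 0],
    "ache":         [2, 1, 1, 4, 4, 1],
    "inflammation": [3, 4, 4, 1, 1, 2],
    "symptom":      [2, 3, 3, 2, 2, 1],
}


def sum_of_distances(candidate: str) -> int:
    """Frequency-weighted sum of distances from a candidate to every value in the sample."""
    return sum(d * f for d, f in zip(DIST[candidate], FREQUENCIES))


def semantic_mean(candidates: list[str]) -> str:
    """argmin over the candidates of the summed distance to the sample."""
    return min(candidates, key=sum_of_distances)


sums = {c: sum_of_distances(c) for c in DIST}
mean = semantic_mean(list(DIST))
```

Note that the mean, ache, is not itself a sample value: the operator can return a concept that generalizes the sample better than any of its members.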

  30. Sorting algorithm
      Algorithm. Sorting procedure
      Inputs: P (dataset)
      Output: P' (P sorted)
        Compute the mean of all values in P
        Let f be the value in P most distant from the mean
        Add f to P' and remove it from P
        while (|P| > 0) do
            Let r be the value in P least distant from f
            Add r to P' and remove it from P
        Output P'
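A runnable version of the sorting procedure, reusing slide 29's sample and distance table. It follows the listed steps literally, so f stays fixed throughout the loop; breaking ties by first occurrence is an assumption, since the slide does not specify it.

```python
from __future__ import annotations

# Distance table from the slide 29 example (rows: candidates, columns: sample values).
COLUMNS = ["colic", "lumbago", "migraine", "appendicitis", "gastritis", "pain"]
ROWS = {
    "colic":        [0, 3, 3, 4, 4, 1],
    "lumbago":      [3, 0, 2, 5, 5, 2],
    "migraine":     [3, 2, 0, 5, 5, 2],
    "appendicitis": [4, 5, 5, 0, 2, 3],
    "gastritis":    [4, 5, 5, 2, 0, 3],
    "pain":         [1, 2, 2, 3, 3, 0],
    "ache":         [2, 1, 1, 4, 4, 1],
    "inflammation": [3, 4, 4, 1, 1, 2],
    "symptom":      [2, 3, 3, 2, 2, 1],
}
SAMPLE = ["colic", "lumbago", "lumbago", "lumbago", "migraine", "migraine",
          "pain", "appendicitis", "gastritis"]


def distance(a: str, b: str) -> int:
    """Symmetric table lookup; at least one argument must be a column (sample value)."""
    if b in COLUMNS and a in ROWS:
        return ROWS[a][COLUMNS.index(b)]
    return ROWS[b][COLUMNS.index(a)]


def semantic_mean(values: list[str]) -> str:
    """Candidate minimising the summed distance to all values."""
    return min(ROWS, key=lambda c: sum(distance(c, x) for x in values))


def semantic_sort(values: list[str]) -> list[str]:
    remaining = list(values)   # P (copied so the input list is not modified)
    ordered: list[str] = []    # P'
    mean = semantic_mean(remaining)
    # Value most distant from the mean; max() keeps the first of tied values.
    f = max(remaining, key=lambda v: distance(v, mean))
    ordered.append(f)
    remaining.remove(f)
    while remaining:
        # Remaining value least distant from f (f stays fixed, as in the listed steps).
        r = min(remaining, key=lambda v: distance(v, f))
        ordered.append(r)
        remaining.remove(r)
    return ordered
```

On this sample the mean is ache, the ordering starts from appendicitis (tied with gastritis at distance 4) and ends with the values farthest from it, lumbago and migraine.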

  31. References
      - Sergio Martínez, Aïda Valls, David Sánchez: Semantically-grounded construction of centroids for datasets with textual attributes. Knowledge-Based Systems 35: 160-172 (2012)
      - Sergio Martínez, David Sánchez, Aïda Valls: A semantic framework to protect the privacy of electronic health records with non-numerical attributes. Journal of Biomedical Informatics 46(2): 294-303
      - Josep Domingo-Ferrer, David Sánchez, Guillem Rufian-Torrell: Anonymization of Nominal Data Based on Semantic Marginality. Information Sciences (to appear)

  32. Contents
      - Introduction
      - Knowledge acquisition
      - Semantic operators
      - Applications to privacy
