KNOWLEDGE MANAGEMENT AND APPLICATIONS
  David Sánchez
  Department of Computer Science and Mathematics
  April 2013
Tarragona
The university
  Created in 1991
  52 programmes of study
  Over 12,000 students
The faculty
  Engineering degrees
    Computer science
    Telematics
  Masters
    Computer Security and Intelligent Systems
    Artificial Intelligence
    Security of the Information and Communication Technologies
  Doctoral program
    Computer Engineering
Research group
  9 professors and lecturers
  6 postdoctoral researchers
  7 Ph.D. students
  7 research assistants
  Data privacy and electronic commerce
  Privacy and security in mobile environments
  Private information recovery and codes
Contents
  Introduction
  Knowledge acquisition
  Semantic operators
  Applications to privacy
Motivation
  Numerical data is easy to manage and transform
    3 < 4 = true
    (1 + 2) / 2 = 1.5
    {3, 2, 5} -> {2, 3, 5}
  A plethora of algorithms rely on arithmetical functions to deal with numerical data
Motivation
  What about text?
    Is car > bike?
    (apple + orange) / 2 = ??
    {flu, cold, pneumonia} -> {?, ?, ?}
  Arithmetical functions do not make sense
  Text (words, noun phrases) refers to concepts
  Concepts should be managed according to their formal semantics
Ontologies
  Provide a structured representation of a shared conceptualization
  Elements
    Classes (concepts)
    Instances (individuals)
  Semantics
    Properties (semantic relationships)
    Restrictions (logical definition of meanings)
Creating ontologies
  Manually
    Knowledge formalization is challenging
    Knowledge can be subjective
    Time consuming
  Assisted
    Proactive knowledge modelling tools
      Wizards
      Reasoners to check knowledge consistency
    Knowledge engineering methods
      101, METHONTOLOGY, On-To-Knowledge
Ontology learning
  Semantics are implicitly referred to in text
  Textual corpora can be analysed to acquire knowledge
    Discover concepts and individuals
    Discover and label relations
      Taxonomic (cancer is a disease)
      Non-taxonomic (cancer is treated with radiotherapy)
      Attributes (cancer is non-contagious)
    Discover restrictions
      Axioms (Spain borders France -> France borders Spain)
Ontology learning from the Web
  Corpora: the Web
    The largest electronic repository
    Heterogeneous
    It approximates the distribution of information at a social scale
  Availability of massive IR tools: Web search engines
Knowledge discovery from text
  NL processing tools to identify nouns, noun phrases and named entities
    Concepts and individuals
  Linguistic patterns to discover semantics
    Taxonomic: “cities such as (Nimes)”, “cancers like (melanoma)”
    Non-taxonomic: “cancer is treated with (surgery)”
    Attributes: “camera has (10MP resolution)”, “camera features (3x zoom)”
    Axioms (functionality, transitivity, symmetry, reflexivity, etc.): “Spain borders France”, “France borders Spain” -> symmetry
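The pattern-based step can be illustrated with a minimal Python sketch. It assumes plain regular expressions in place of the NL processing tools mentioned on the slide, and the snippets, pattern set and function name are hypothetical stand-ins rather than the authors' implementation.

```python
import re

# Hypothetical snippets standing in for text retrieved from the Web.
SNIPPETS = [
    "We visited cities such as Nimes during the trip.",
    "Skin cancers like melanoma are often linked to sun exposure.",
    "In most cases cancer is treated with surgery.",
]

# One illustrative regular expression per relation type (an assumption of
# this sketch); each captures the anchor term and the candidate term.
PATTERNS = {
    "taxonomic": re.compile(r"\b(\w+) (?:such as|like) (\w+)"),
    "non_taxonomic": re.compile(r"\b(\w+) is treated with (\w+)"),
}

def extract(snippets):
    """Return (relation, anchor, candidate) triples matched in the snippets."""
    found = []
    for text in snippets:
        for relation, pattern in PATTERNS.items():
            for anchor, candidate in pattern.findall(text):
                found.append((relation, anchor, candidate))
    return found

for triple in extract(SNIPPETS):
    print(triple)
```

Single-word captures keep the sketch short; noun phrases and named entities would need a proper chunker or tagger.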
Retrieval of suitable corpora
  Create appropriate web search queries
    Taxonomic: “cities such as” […]
    Non-taxonomic: “cancer is treated with” […]
    Attributes: “camera features” […]
    Axioms: “Spain borders” & “France borders”
Statistical assessment
  Statistical assessor
    Web search engine (WSE) page counts approximate query probabilities at a social scale
  Use an association score to filter noisy extractions
    Point-wise mutual information
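As a rough illustration of the association score, the sketch below computes point-wise mutual information from page counts. The function name, the base-2 logarithm and the hit counts in the usage lines are assumptions made for the example; no real search engine is queried.

```python
import math

def pmi(hits_ab, hits_a, hits_b, total_pages):
    """Point-wise mutual information estimated from page counts.

    hits_ab: pages matching both terms; hits_a, hits_b: pages matching each
    term alone; total_pages: assumed size of the indexed Web.
    """
    if min(hits_ab, hits_a, hits_b) <= 0:
        return float("-inf")  # no evidence of co-occurrence
    p_ab = hits_ab / total_pages
    p_a = hits_a / total_pages
    p_b = hits_b / total_pages
    return math.log2(p_ab / (p_a * p_b))

# Hypothetical counts: the pair co-occurs far more often than chance predicts.
print(pmi(hits_ab=100, hits_a=1000, hits_b=1000, total_pages=1_000_000))
```

A candidate extraction would be kept only if its score exceeds some threshold; the threshold value is a tuning choice not given on the slide.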
References
  Taxonomic learning
    David Sánchez, Antonio Moreno: Pattern-based automatic taxonomy learning from the Web. AI Communications 21(1): 27-48 (2008)
  Non-taxonomic learning
    David Sánchez, Antonio Moreno: Learning non-taxonomic relationships from web documents for domain ontology construction. Data & Knowledge Engineering 64(3): 600-623 (2008)
  Attribute learning
    David Sánchez: A methodology to learn ontological attributes from the Web. Data & Knowledge Engineering 69(6): 573-597 (2010)
  Axiom learning
    David Sánchez, Antonio Moreno, Luis Del Vasto Terrientes: Learning relation axioms from text: An automatic Web-based approach. Expert Systems with Applications 39(5): 5792-5805 (2012)
Exploiting ontologies
  Structured knowledge enables a semantically coherent interpretation of textual data by defining semantically grounded operators
  Semantic similarity is the most basic operator
    Similarity(apple, orange) > Similarity(apple, bike)
Semantic similarity
  Semantic similarity
    Degree of taxonomical resemblance
    e.g., dogs and cats are similar as they are mammals
  Semantic relatedness
    Other non-taxonomic relationships are also considered
    e.g., car and wheel or pencil and paper
  Similarity measures can be grouped in several families according to
    the type of knowledge exploited
    the principles on which similarity estimation relies
Ontology-based similarity
Edge-counting measures
  $\mathrm{Distance}(a, b) = |\mathrm{min\_path}(a, b)|$
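A minimal sketch of edge counting follows, assuming a tiny hand-made taxonomy (the PARENT dictionary is hypothetical, not drawn from any real ontology). The distance is the number of is-a edges on the shortest path between two concepts, found with a breadth-first search.

```python
from collections import deque

# Hypothetical toy taxonomy, stored as child -> parent links.
PARENT = {
    "dog": "mammal", "cat": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal", "animal": "entity",
}

def edge_distance(a, b, parent=PARENT):
    """Length (in edges) of the shortest taxonomic path between a and b."""
    # Treat is-a links as undirected edges.
    adjacency = {}
    for child, par in parent.items():
        adjacency.setdefault(child, set()).add(par)
        adjacency.setdefault(par, set()).add(child)
    queue = deque([(a, 0)])
    seen = {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # the concepts are not connected

print(edge_distance("dog", "cat"))      # path dog - mammal - cat
print(edge_distance("dog", "sparrow"))  # path through animal
```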
IC-based measures
  $\mathrm{Sim}(a, b) = IC(\mathrm{LCS}(a, b))$
  Least Common Subsumer (LCS)
IC-based semantic similarity
  IC calculation relies on probability assessments
    $IC(c) = -\log p(c)$
  Based on corpora
    Requires general and heterogeneous corpora
    Language ambiguity hampers results
    Data sparseness produces weak statistics
Ontology-based IC computation
  Assumption: concepts with many hyponyms in an ontology are more likely to appear in corpora
  Concept probabilities are intrinsically approximated according to taxonomic knowledge
    Number of hyponyms
    $IC(c) = -\log\left(\frac{\mathrm{hyponyms}(c)}{\mathrm{ontology\_size}}\right)$
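The intrinsic IC idea, combined with the IC-based similarity Sim(a, b) = IC(LCS(a, b)) from the earlier slide, can be sketched as follows. The toy taxonomy is hypothetical, and the sketch adds 1 to the hyponym count so that leaf concepts do not produce log(0); that smoothing is an assumption of this example, not something stated on the slide.

```python
import math

# Hypothetical toy taxonomy, stored as child -> parent links.
PARENT = {
    "dog": "mammal", "cat": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal", "animal": "entity",
}

def concepts(parent=PARENT):
    return set(parent) | set(parent.values())

def ancestors(concept, parent=PARENT):
    """Chain from the concept itself up to the root."""
    chain = [concept]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def hyponym_count(concept, parent=PARENT):
    """Number of concepts located below `concept` in the taxonomy."""
    return sum(1 for other in concepts(parent)
               if other != concept and concept in ancestors(other, parent))

def intrinsic_ic(concept, parent=PARENT):
    # Assumed smoothing: +1 keeps leaves finite and gives the root IC = 0.
    size = len(concepts(parent))
    return -math.log((hyponym_count(concept, parent) + 1) / size)

def lcs(a, b, parent=PARENT):
    """Least common subsumer: first shared ancestor found walking up from a."""
    b_chain = set(ancestors(b, parent))
    for node in ancestors(a, parent):
        if node in b_chain:
            return node
    return None

def ic_similarity(a, b, parent=PARENT):
    return intrinsic_ic(lcs(a, b, parent), parent)

print(ic_similarity("dog", "cat"), ic_similarity("dog", "sparrow"))
```

In this toy taxonomy the pair sharing the more specific subsumer (mammal) scores higher than the pair that only shares animal.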
Feature-based measures
  $\mathrm{Sim}(a, b) = \frac{\mathrm{common\_features}(a, b)}{\mathrm{disjoint\_features}(a, b)}$
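Read literally, the ratio on the slide can be sketched with set operations. The example assumes that a concept's features are itself plus its taxonomic ancestors, and the feature sets are hypothetical; the published feature-based measures use more elaborate formulations than this plain ratio.

```python
def feature_similarity(features_a, features_b):
    """Ratio of shared features to features held by only one concept."""
    common = len(features_a & features_b)
    disjoint = len(features_a ^ features_b)  # symmetric difference
    if disjoint == 0:
        return float("inf")  # identical feature sets
    return common / disjoint

# Hypothetical feature sets: each concept plus its ancestors in a toy taxonomy.
DOG = {"dog", "mammal", "animal", "entity"}
CAT = {"cat", "mammal", "animal", "entity"}
SPARROW = {"sparrow", "bird", "animal", "entity"}

print(feature_similarity(DOG, CAT))      # 3 common / 2 disjoint
print(feature_similarity(DOG, SPARROW))  # 2 common / 4 disjoint
```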
References
  Feature-based similarity measures
    Montserrat Batet, David Sánchez, Aïda Valls: An ontology-based measure to compute semantic similarity in biomedicine. Journal of Biomedical Informatics 44(1): 118-125 (2011)
    David Sánchez, Montserrat Batet, David Isern, Aïda Valls: Ontology-based semantic similarity: A new feature-based approach. Expert Systems with Applications 39(9): 7718-7728 (2012)
  IC-based similarity measures
    Based on corpora
      David Sánchez, Montserrat Batet, Aïda Valls, Karina Gibert: Ontology-driven web-based semantic similarity. Journal of Intelligent Information Systems 35(3): 383-413 (2010)
    Based on ontologies
      David Sánchez, Montserrat Batet, David Isern: Ontology-based information content computation. Knowledge-Based Systems 24(2): 297-303 (2011)
      David Sánchez, Montserrat Batet: A New Model to Compute the Information Content of Concepts from Taxonomic Knowledge. International Journal on Semantic Web and Information Systems 8(2): 34-50 (2012)
      David Sánchez, Montserrat Batet: Semantic similarity estimation in the biomedical domain: An ontology-based information-theoretic perspective. Journal of Biomedical Informatics 44(5): 749-759 (2011)
Other semantic operators
  Semantic similarity/distance is the basis for developing other semantically grounded operators over a sample of textual data
  Aggregation (mean/centroid)
    $\mathrm{Mean}(x_1, x_2, \ldots, x_n) = \arg\min_{c} \sum_{i=1}^{n} \mathrm{distance}(c, x_i)$
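Aggregation
  Sample (frequency): colic (1), lumbago (3), migraine (2), appendicitis (1), gastritis (1), pain (1)
  Distance from each mean candidate to each sample value; "Sum dist" weights every distance by the value's frequency
  Mean candidates  colic (1)  lumbago (3)  migraine (2)  appendicitis (1)  gastritis (1)  pain (1)  Sum dist
  colic                    0            3             3                 4              4         1        24
  lumbago                  3            0             2                 5              5         2        19
  migraine                 3            2             0                 5              5         2        21
  appendicitis             4            5             5                 0              2         3        34
  gastritis                4            5             5                 2              0         3        34
  pain                     1            2             2                 3              3         0        17
  ache                     2            1             1                 4              4         1        16
  inflammation             3            4             4                 1              1         2        27
  symptom                  2            3             3                 2              2         1        22
  The candidate with the smallest sum of distances, ache (16), is taken as the mean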
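The aggregation example can be reproduced with a short sketch. The frequencies and pairwise distances are copied from the slide; the variable and function names are hypothetical.

```python
# Sample values and their frequencies, as listed on the slide.
SAMPLE_COUNTS = {
    "colic": 1, "lumbago": 3, "migraine": 2,
    "appendicitis": 1, "gastritis": 1, "pain": 1,
}
COLUMNS = list(SAMPLE_COUNTS)  # column order of the distance rows below

# Distance from each mean candidate to each sample value (slide data).
DISTANCES = {
    "colic":        [0, 3, 3, 4, 4, 1],
    "lumbago":      [3, 0, 2, 5, 5, 2],
    "migraine":     [3, 2, 0, 5, 5, 2],
    "appendicitis": [4, 5, 5, 0, 2, 3],
    "gastritis":    [4, 5, 5, 2, 0, 3],
    "pain":         [1, 2, 2, 3, 3, 0],
    "ache":         [2, 1, 1, 4, 4, 1],
    "inflammation": [3, 4, 4, 1, 1, 2],
    "symptom":      [2, 3, 3, 2, 2, 1],
}

def weighted_sum(candidate):
    """Sum of distances to the sample, weighting each value by its frequency."""
    return sum(dist * SAMPLE_COUNTS[value]
               for dist, value in zip(DISTANCES[candidate], COLUMNS))

def semantic_mean():
    """Candidate minimising the summed distance to all sample values."""
    return min(DISTANCES, key=weighted_sum)

print(semantic_mean(), weighted_sum(semantic_mean()))
```

Note that the selected mean, ache, does not occur in the sample itself: candidates are drawn from the ontology, which is what distinguishes this centroid from simply picking the most frequent value.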
Sorting algorithm
  Algorithm. Sorting procedure
  Inputs: P (dataset)
  Output: P' (P sorted)
    Compute the mean of all values in P
    Consider the most distant value f to the mean
    Add f to P' and remove it from P
    while (|P| > 0) do
      Obtain the least distant value r to f
      Add r to P' and remove it from P
    Output P'
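A possible Python rendering of the sorting procedure is sketched below. It follows the steps as written on the slide, so the reference value f stays fixed throughout the loop. Two simplifications are assumptions of the sketch: the mean is restricted to values already present in the dataset, and the usage line plugs in a numeric distance only to keep the example self-contained; a semantic distance over an ontology would be passed in the same way.

```python
def semantic_sort(values, distance):
    """Order values by increasing distance to the value farthest from the mean."""
    remaining = list(values)
    # Mean/centroid: the value minimising the summed distance to all others
    # (restricted to the dataset itself, an assumption of this sketch).
    centroid = min(remaining,
                   key=lambda c: sum(distance(c, x) for x in remaining))
    # f: the value most distant from the mean.
    first = max(remaining, key=lambda x: distance(x, centroid))
    remaining.remove(first)
    ordered = [first]
    while remaining:
        # r: the remaining value least distant from f.
        nearest = min(remaining, key=lambda x: distance(x, first))
        remaining.remove(nearest)
        ordered.append(nearest)
    return ordered

# Numeric stand-in for a semantic distance (hypothetical usage).
print(semantic_sort([1, 2, 3, 10, 4], lambda a, b: abs(a - b)))
```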
References
  Sergio Martínez, Aïda Valls, David Sánchez: Semantically-grounded construction of centroids for datasets with textual attributes. Knowledge-Based Systems 35: 160-172 (2012)
  Sergio Martínez, David Sánchez, Aïda Valls: A semantic framework to protect the privacy of electronic health records with non-numerical attributes. Journal of Biomedical Informatics 46(2): 294-303
  Josep Domingo-Ferrer, David Sánchez, Guillem Rufian-Torrell: Anonymization of Nominal Data Based on Semantic Marginality. Information Sciences. To appear