Search for Appropriate Textual Information Sources

Adam Albert 1, Marie Duží 1, Marek Menšík 1, Miroslav Pajr 2, Vojtěch Patschka 1

1 VSB-Technical University Ostrava, Department of Computer Science FEI, 17. listopadu 15, 708 33 Ostrava, Czech Republic
2 Silesian University in Opava, Institute of Computer Science, Bezručovo nám. 13, 746 01 Opava, Czech Republic
Problem to be solved
• One aspect of globalization is the dissemination of knowledge.
• There is a huge amount of information in textual resources.
• The task: search for relevant information resources in the labyrinth of input textual data.
• For instance, by googling ‘cat’ we obtain approximately 3 180 000 000 results, of these types:
  • computer-assisted translation
  • a well-known excavator brand
  • an animal
• Too much information ⇒ information overload
How to deal with the problem
• Our system generates explications of the concept in question, extracted from many textual resources.
• Background theory: Transparent Intensional Logic (TIL)
  • procedural semantics
  • concepts are defined as meaning procedures
• An explication of a concept is a molecular procedure defining the object in question, e.g.
  • “A cat is a feline animal”: ‘Cat = λwλt λx [[‘Feline ‘Animal]_wt x]
  • explicandum: ‘Cat; explication: the molecular construction on the right-hand side (see the sketch below)
• Based on a chosen explication, the system computes and recommends the most relevant textual resources
  • by applying the data-mining method of association rules.
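To make the later sketches concrete, here is a minimal, hypothetical Python representation of an explication: the explicandum (an atomic concept) paired with the constituents of the molecular construction that explicates it. This is an illustration only, not the data model of the actual system.

```python
from dataclasses import dataclass, field

@dataclass
class Explication:
    """An explicandum together with the constituents of its explication."""
    explicandum: str                              # e.g. "'Cat"
    constituents: set[str] = field(default_factory=set)

# "A cat is a feline animal", reduced to its constituent concepts:
cat = Explication("'Cat", {"'Feline", "'Animal"})
```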
Search for Appropriate Textual Information Sources
• Input: an atomic concept (the explicandum) + textual resources
• Extraction and TIL formalization of the sentences that mention the concept in question (the explicandum)
• Generating Carnapian explications
  • machine-learning methods applied to the formalized sentences
• Results: molecular concepts, i.e. closed TIL constructions, that explicate the atomic concept
• Evaluation of the results (relevant documents can be overlooked in a large amount of data):
  • checking inconsistencies and/or looking for similarities, etc.
• Based on associations between the constituents of the molecular concepts, the algorithm computes and recommends other relevant resources (a skeleton sketch of the pipeline follows).
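A hypothetical skeleton of this pipeline, only to show how the pieces fit together; every helper below is a trivial placeholder, not the system's real API (the actual extraction, TIL formalization and learning are far more involved).

```python
def extract_sentences(resource: str, explicandum: str) -> list[str]:
    # Placeholder: keep only sentences mentioning the explicandum.
    return [s.strip() for s in resource.split(".") if explicandum.lower() in s.lower()]

def formalize(sentence: str) -> frozenset[str]:
    # Placeholder for TIL formalization of a sentence into constituent concepts.
    return frozenset(w.strip(",;") for w in sentence.split() if w[:1].isupper())

def generate_explications(explicandum: str, resources: list[str]) -> list[frozenset[str]]:
    # One explication per resource: here simply the union of constituents found
    # in its relevant sentences (the learning heuristics are sketched next).
    explications = []
    for r in resources:
        constituents = frozenset()
        for s in extract_sentences(r, explicandum):
            constituents |= formalize(s)
        explications.append(constituents)
    return explications
```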
Machine learning (generating explications)
• A symbolic method of supervised machine learning
• Based on positive/negative examples: inserting or adjusting constituents of a molecular concept
• Three heuristic methods (sketched in code below):
  • Negative example → Specialization inserts negated concepts.
  • Positive example → Refinement inserts new constituents into the molecular construction learned so far.
  • Generalization adjusts the constituents.
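A minimal sketch of the three heuristics, assuming the simplified set-of-constituents representation from the earlier snippet; the operators and example concepts below (e.g. 'Domesticated) are crude, hypothetical stand-ins for the actual specialization/refinement/generalization algorithms.

```python
def specialize(hypothesis: set[str], negative: set[str]) -> set[str]:
    """Negative example: insert negated concepts that exclude the example."""
    return hypothesis | {f"not-{c}" for c in negative - hypothesis}

def refine(hypothesis: set[str], positive: set[str]) -> set[str]:
    """Positive example: insert new constituents into the construction learned so far."""
    return hypothesis | positive

def generalize(hypothesis: set[str], positive: set[str]) -> set[str]:
    """Adjust constituents: keep only those compatible with the positive example
    (a crude stand-in for, e.g., widening a numeric interval)."""
    return {c for c in hypothesis if c in positive or c.startswith("not-")}

hyp = {"'Mammal"}
hyp = refine(hyp, {"'Has-fur"})           # positive example adds a constituent
hyp = specialize(hyp, {"'Domesticated"})  # hypothetical negative example
```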
Example: explication of Wild Cat

[‘Typ-p λwλtλx [[‘≤ [‘Weight_wt x] ‘11] [‘≥ [‘Weight_wt x] ‘1.2]] [‘Wild ‘Cat]]
[‘Req ‘Mammal [‘Wild ‘Cat]]
[‘Req ‘Has-fur [‘Wild ‘Cat]]
[‘Typ-p λwλtλx [[‘≤ [[‘Average ‘Body-Length]_wt x] ‘80] [‘≥ [[‘Average ‘Body-Length]_wt x] ‘47]] [‘Wild ‘Cat]]
[‘Typ-p λwλtλx [‘= [[‘Average ‘Skull-Size]_wt x] ‘41.25] [‘Wild ‘Cat]]
[‘Typ-p λwλtλx [‘= [[‘Average ‘Height]_wt x] ‘37.6] [‘Wild ‘Cat]]
Association rule B ⟹ C
• An association between items occurring in a dataset that satisfies a predefined minimal support and confidence.
• Support indicates how frequently the itemset appears in the dataset (E is the set of examples/resources):
  supp(B) = |{u ∈ E : B ⊆ u}| / |E|
• Confidence indicates how much we can rely on the validity of the rule:
  conf(B ⟹ C) = supp(B ∪ C) / supp(B)
• By computing the rules that hold with at least a user-defined minimal confidence, the algorithm proposes other textual resources that might also be relevant (see the sketch below).
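A direct, minimal implementation of the two definitions above, where E is a list of itemsets, one per textual resource.

```python
def supp(itemset: set, E: list[set]) -> float:
    """Fraction of examples in E that contain the itemset."""
    return sum(1 for u in E if itemset <= u) / len(E)

def conf(B: set, C: set, E: list[set]) -> float:
    """Confidence of the rule B => C."""
    return supp(B | C, E) / supp(B, E)
```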
Incidence matrix (rows = resources/explications e1–e8, columns = constituents 1–17)

      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
e1    1  1  1  1  1  1  1  1  0  0  0  0  0  0  0  0  0
e2    0  0  0  0  0  0  0  0  1  1  1  0  0  0  0  0  0
e3    0  0  0  0  0  0  0  0  0  0  0  1  1  1  1  1  1
e4    1  1  0  0  0  0  1  0  0  0  1  0  0  0  1  1  0
e5    0  0  0  0  1  0  0  0  0  0  1  0  0  0  1  1  0
e6    0  0  0  0  1  0  0  0  0  0  1  0  0  1  0  1  0
e7    1  1  1  0  0  0  0  0  1  0  1  0  0  1  1  0  0
e8    0  0  0  0  1  0  0  0  0  1  0  0  0  0  0  1  0

Constituents 1–8 (those of e1):
1. ‘Mammal
2. ‘Has-fur
3. λwλtλx [‘≤ [‘Weight_wt x] ‘11]
4. λwλtλx [‘≥ [‘Weight_wt x] ‘1.2]
5. λwλtλx [‘≥ [[‘Average ‘Body-Length]_wt x] ‘47]
6. λwλtλx [‘≤ [[‘Average ‘Body-Length]_wt x] ‘80]
7. λwλtλx [‘= [[‘Average ‘Skull-Size]_wt x] ‘41.25]
8. λwλtλx [‘= [[‘Average ‘Height]_wt x] ‘37.6]
Simple example of the computed results
• Wild-cat: the atomic concept that has been explicated
• Eight resources and thus eight explications
• The user voted for the first one (e1), the biological explication (mammal, weight, body length, skull size, etc.)
• Confidence = 0.66
• The system computed e4 and e7, which describe the behaviour of wild cats (reproduced in the sketch below)
• The rule, with antecedent constituents taken from e1:
  {‘Mammal, ‘Has-fur} ⟹ {λwλtλx [[‘Ter-Marking_wt x ‘Clawing] ∨ [‘Ter-Marking_wt x ‘Urinating] ∨ [‘Ter-Marking_wt x ‘Leaves-Droppings]]}
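The numbers on this slide can be reproduced from the incidence matrix above; the sketch below assumes that constituent 11 stands for the territory-marking concept (the legend for columns 9–17 is not given on the slides).

```python
E = {
    "e1": {1, 2, 3, 4, 5, 6, 7, 8},
    "e2": {9, 10, 11},
    "e3": {12, 13, 14, 15, 16, 17},
    "e4": {1, 2, 7, 11, 15, 16},
    "e5": {5, 11, 15, 16},
    "e6": {5, 11, 14, 16},
    "e7": {1, 2, 3, 9, 11, 14, 15},
    "e8": {5, 10, 16},
}

def supp(itemset: set) -> float:
    return sum(1 for u in E.values() if itemset <= u) / len(E)

B, C = {1, 2}, {11}            # {'Mammal, 'Has-fur} => territory-marking concept
print(supp(B | C) / supp(B))   # 0.666..., the confidence reported on the slide

# Resources containing both the antecedent and the consequent get recommended:
print([name for name, items in E.items() if (B | C) <= items])   # ['e4', 'e7']
```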
Thank you for your attention