

  1. Bayesian Networks and ITS

  2. Overview
● Knowledge acquisition is hard in general, and not well understood.
● It is time-consuming when everything has to be hand-coded.
● Can the machine automatically gather the needed information?
● Machine learning
● A number of approaches are now available.
● Do they work well?

  3. [Diagram: a control box takes inputs and observables from the core system and makes changes to it; annotations: "Machine learning outputs – quality?", "What to do?"]

  4. Machine Learning
● Rote Learning (being told)
● Case-Based
● Example-Based Learning
– Version Space Learning
– Neural Networks
– Inductive Logic Programming
● Rule Induction
● Reinforcement Learning
● Fully Automated Discovery System

  5. Learn What?
● Domain knowledge
– Correct knowledge – concepts, dependencies, rules, etc.
– Misconceptions – perturbation model
– Interventions
● Student model
● Tutoring model

  6. ….
● The student model can be induced from behaviour records, test records, etc.
● Interventions can improve with records of past cases.
● The perturbation model is used to generate misconceptions.
– Machine learning?
● Bayesian networks – a probability model of a domain, whose probabilities change with time...
– Learning!

  7. Overview...
● Uncertainty is fundamental to education!
● Our knowledge of the learning process is uncertain, as is our access to the learner's state of knowledge and where he/she is going.
● The strength of an ITS lies in effective prediction of the learner's next step, and in choosing the right action.
● Modelling uncertainty is critical.
– What kind of uncertainty?
– What kind of model?

  8. Uncertainty models
● Certainty Factors
● Fuzzy Logic
● Non-monotonic logics & reasoning
● Dempster-Shafer theory
● Non-numeric models
● Dependency networks
● Probability models – Bayesian networks

  9. Uncertainty
● Given the current state of the world, form beliefs about the student's knowledge level.
● Given the knowledge level, decide on an action for a situation.
● Selection of the next problem.
● Probability models may be a good start.
– Bayesian networks

  10. Bayesian Networks
● Causal concept networks with attached probabilities.
● Bayesian methods are capable of handling noisy and incomplete information.
● Bayes' theorem saves us from massive probability computations.

  11. Basic rules
● Probability: P(A) ≥ 0; P(A) = 1 − P(¬A)
● Conditional probability: P(A|B) = P(A ∧ B) / P(B), if P(B) ≠ 0
● Product rule: P(A ∧ B) = P(A|B) P(B)
● Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)
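The four rules above can be checked mechanically on a small joint distribution. A minimal sketch, using made-up numbers over two Boolean variables (only the rules themselves come from the slide):

```python
# A hand-made joint distribution over two Boolean variables A, B.
joint = {
    (True, True): 0.2,    # P(A=T, B=T)
    (True, False): 0.3,
    (False, True): 0.1,
    (False, False): 0.4,
}

def p(a=None, b=None):
    """Marginal or joint probability by summing matching joint entries."""
    return sum(v for (av, bv), v in joint.items()
               if (a is None or av == a) and (b is None or bv == b))

def cond(a, b):
    """Conditional probability P(A=a | B=b) = P(A=a, B=b) / P(B=b)."""
    return p(a, b) / p(b=b)

# Complement rule: P(not A) = 1 - P(A)
assert abs(p(a=False) - (1 - p(a=True))) < 1e-12
# Product rule: P(A, B) = P(A|B) P(B)
assert abs(p(True, True) - cond(True, True) * p(b=True)) < 1e-12
# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)
p_b_given_a = p(True, True) / p(a=True)
assert abs(cond(True, True) - p_b_given_a * p(a=True) / p(b=True)) < 1e-12
```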

  12. Computing posterior probability from the full joint distribution: P(Cavity | Toothache) = ?

  13. ...
● P(A|B) = P(A ∧ B) / P(B)
● P(Cavity | Toothache) = P(Cavity ∧ Toothache) / P(Toothache)
● P(Cavity ∧ Toothache) = 0.04
● P(Toothache) = ?
● P(C|T) = ?
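To complete the arithmetic, here is the computation with an assumed value for the missing marginal: P(Toothache) = 0.05 is an illustrative choice, not a number from the slides.

```python
# P(Cavity ∧ Toothache) = 0.04 as given on the slide.
p_cavity_and_toothache = 0.04
# P(Toothache) is left open on the slide; 0.05 is an assumed value.
p_toothache = 0.05

# Definition of conditional probability.
p_cavity_given_toothache = p_cavity_and_toothache / p_toothache
print(round(p_cavity_given_toothache, 2))  # 0.8
```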

  14. The problem
● P(I1 | o1, o2, o3, o4, …, on) – given the current state of the network, what is my estimate of using intervention I1?
● I need the full joint probability of all variables.
● 1) Cannot derive P(A|B,C) from P(A|B) and P(A|C) – unless ....
● 2) P(A|B) is not easy in general – but P(B|A) may be easier.

  15. Independence
● Two random variables A, B are (absolutely) independent iff P(A ∧ B) = P(A) P(B).
– If n Boolean variables are independent, the full joint is P(X1, …, Xn) = Π_i P(Xi).
● Two random variables A, B are conditionally independent given C iff P(A ∧ B | C) = P(A|C) P(B|C).
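The two definitions can be verified numerically. The sketch below builds a three-variable joint in which A and B are conditionally independent given C by construction, then checks that the conditional-independence identity holds while absolute independence fails; all probability values are made up.

```python
from itertools import product

# Factors chosen so that A ⊥ B | C by construction (illustrative numbers).
pC = {True: 0.3, False: 0.7}
pA_C = {True: 0.9, False: 0.2}   # P(A=True | C)
pB_C = {True: 0.8, False: 0.1}   # P(B=True | C)

# Full joint P(A, B, C) = P(C) P(A|C) P(B|C).
joint = {}
for a, b, c in product([True, False], repeat=3):
    pa = pA_C[c] if a else 1 - pA_C[c]
    pb = pB_C[c] if b else 1 - pB_C[c]
    joint[(a, b, c)] = pC[c] * pa * pb

def marg(**fix):
    """Sum joint entries matching the fixed variable values."""
    return sum(v for (a, b, c), v in joint.items()
               if all({'a': a, 'b': b, 'c': c}[k] == w for k, w in fix.items()))

# Conditional independence: P(A,B|C) = P(A|C) P(B|C) for both values of C.
for c in [True, False]:
    lhs = marg(a=True, b=True, c=c) / marg(c=c)
    rhs = (marg(a=True, c=c) / marg(c=c)) * (marg(b=True, c=c) / marg(c=c))
    assert abs(lhs - rhs) < 1e-12

# ...but A and B are NOT absolutely independent here:
assert abs(marg(a=True, b=True) - marg(a=True) * marg(b=True)) > 1e-3
```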

  16. ….
● P(I|A,B) = P(A,B|I) P(I) / P(A,B) = P(A|I) P(B|I) P(I) / (P(A) P(B))
– The numerator factorizes if A and B are conditionally independent given I; the denominator factorizes if A and B are (absolutely) independent.
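In code, this update is just a normalization over the two values of I, which avoids computing P(A, B) separately. The factorization assumes A and B are conditionally independent given I; all probability values below are illustrative.

```python
# Naive-Bayes-style inversion: P(I | A, B) ∝ P(A|I) P(B|I) P(I),
# normalized over both values of I.
p_i = 0.4                               # prior P(I)
p_a_given = {True: 0.7, False: 0.2}     # P(A | I), P(A | not I)
p_b_given = {True: 0.6, False: 0.3}     # P(B | I), P(B | not I)

def score(i):
    """Unnormalized posterior weight for I = i, given A and B observed."""
    prior = p_i if i else 1 - p_i
    return p_a_given[i] * p_b_given[i] * prior

# Normalizing constant = P(A, B) under the model.
posterior = score(True) / (score(True) + score(False))
print(round(posterior, 3))  # 0.824
```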

  17. Data availability
● P(solves_p1 | knows_c) is easier to estimate than P(knows_c | solves_p1).
● Bayes' theorem provides a way to build one from the other.
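A minimal sketch of this flip: the likelihood and prior below are assumed numbers, and the denominator is obtained by marginalizing over knows_c.

```python
# All numbers here are illustrative assumptions, not measured values.
p_knows = 0.5                 # prior P(knows_c)
p_solve_given_knows = 0.9     # P(solves_p1 | knows_c): easy to estimate
p_solve_given_not = 0.2       # slip/guess rate P(solves_p1 | ¬knows_c)

# Marginal P(solves_p1) by summing over both mastery states.
p_solve = (p_solve_given_knows * p_knows
           + p_solve_given_not * (1 - p_knows))

# Bayes' theorem gives the diagnostic direction.
p_knows_given_solve = p_solve_given_knows * p_knows / p_solve
print(round(p_knows_given_solve, 3))  # 0.818
```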

  18. Bayesian Network
● Network of probability influencers!
● Nothing else will influence a node: the Markov assumption.
– => everything else is conditionally independent.
● Every node has an associated conditional probability (CP) distribution as a function of its parents.
– P(¬X) = 1 − P(X)
● Information can flow in any direction.

  19. Network
● Each concept is represented by a node in the graph.
● A directed edge from one concept to another is added if knowledge of the former is a prerequisite for understanding the latter.
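Such a prerequisite graph can be represented as a plain adjacency dict; the concept names below are invented for illustration.

```python
# Edge u -> v means "u is a prerequisite for v" (hypothetical concepts).
edges = {
    "variable_assignment": ["for_loop"],
    "relational_operators": ["for_loop"],
    "incr_decr_operators": ["for_loop"],
    "for_loop": ["nested_loops"],
}

def parents(concept):
    """All direct prerequisites of a concept (its BN parents)."""
    return sorted(u for u, targets in edges.items() if concept in targets)

print(parents("for_loop"))
# ['incr_decr_operators', 'relational_operators', 'variable_assignment']
```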

  20. CPD for For-loop: P(For-Loop | Variable Asgn, Rel Ops, Incr/Decr Oper)
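The table itself is not reproduced on the slide, so the sketch below fills it with an assumed rule: each unmastered prerequisite halves the mastery probability. Neither the rule nor the numbers are from Andes; they only show the shape of a CPD over three Boolean parents.

```python
from itertools import product

BASE = 0.8  # assumed P(For-Loop mastered) with all prerequisites mastered

# CPT keyed by parent values (VarAsgn, RelOps, Incr/Decr): 2^3 = 8 rows.
cpt = {}
for combo in product([True, False], repeat=3):
    # Halve the base probability for each unmastered prerequisite.
    cpt[combo] = BASE * 0.5 ** combo.count(False)

print(cpt[(True, True, True)])     # 0.8
print(cpt[(False, False, False)])  # 0.1
```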

  21. Belief network example
● Neighbors John and Mary promised to call if the alarm goes off. Sometimes the alarm starts because of an earthquake. If the alarm went off, what is the probability of burglary?
● Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls (n = 5 variables)
● Network topology reflects "causal" knowledge

  22. Belief network example – cont.

  23. Semantics in belief networks
● In a BN, the full joint distribution is defined as the product of the local conditional distributions: P(X1, …, Xn) = Π_{i=1..n} P(Xi | Parents(Xi))
● E.g. P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(¬B) P(¬E) P(A | ¬B ∧ ¬E) P(J|A) P(M|A) = 0.999 × 0.998 × 0.001 × 0.90 × 0.70 ≈ 0.00063
● Each node is conditionally independent of its descendants given its parents.
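The product can be checked term by term with the quoted CPT entries (note that it comes to roughly 6.3 × 10⁻⁴):

```python
# Each factor below is one node's CPT entry, as quoted on the slide.
p_not_b = 0.999                # P(¬B)
p_not_e = 0.998                # P(¬E)
p_a_given_not_b_not_e = 0.001  # P(A | ¬B, ¬E)
p_j_given_a = 0.90             # P(J | A)
p_m_given_a = 0.70             # P(M | A)

# Chain-rule product over the network's factorization.
p = (p_not_b * p_not_e * p_a_given_not_b_not_e
     * p_j_given_a * p_m_given_a)
print(round(p, 6))  # 0.000628
```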

  24. Example...
● P(M|B)?
● = P(M|A) P(A|B) + P(M|¬A) P(¬A|B)
● P(A|B) = P(A|B,E) P(E) + P(A|B,¬E) P(¬E) (summing out E, which is independent of B)
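Evaluating this numerically requires CPT entries the slide does not list; the sketch below fills them with the classic textbook alarm-network values, treated here as assumptions.

```python
# CPT entries: the values below are the standard textbook alarm numbers,
# used as assumptions since the slide does not list them all.
p_e = 0.002
p_a_given_be = {(True, True): 0.95, (True, False): 0.94,
                (False, True): 0.29, (False, False): 0.001}  # P(A | B, E)
p_m_given_a = {True: 0.70, False: 0.01}                      # P(M | A)

# P(A | B) = Σ_E P(A | B, E) P(E), since E is independent of B.
p_a_given_b = (p_a_given_be[(True, True)] * p_e
               + p_a_given_be[(True, False)] * (1 - p_e))

# P(M | B) = P(M | A) P(A | B) + P(M | ¬A) P(¬A | B).
p_m_given_b = (p_m_given_a[True] * p_a_given_b
               + p_m_given_a[False] * (1 - p_a_given_b))
print(round(p_m_given_b, 3))  # 0.659
```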

  25. Building a BBN
● Expert-centric
– A human expert creates the structure and the probability values.
– Can guess where the real value is not available.
– Hidden nodes are a problem!
● Data-centric
– Use population data from real trials, etc.
– Approaches vary in what is constructed from the data.
● Efficiency-centric
– A combination, using domain knowledge to increase efficiency.

  26. Using a B.N.
● Diagnostic reasoning
– Given leaf nodes, predict the probability of intermediate or root nodes.
● Predictive reasoning
– Given root nodes, etc., predict the probability of intermediate and leaf nodes.
● Explaining away
– Sibling propagation: knowledge of an earthquake helps "reduce" the probability of burglary, given the alarm.

  27. Andes' Bayesian network
● Andes' Bayesian networks encode two kinds of knowledge:
– Domain-general knowledge: general concepts and procedures that define proficiency in Newtonian physics – needs to persist across sessions.
– Task-specific knowledge: knowledge related to a student's performance on a specific problem or example – can be removed at the end of the task.

  28. The domain-general part
● The domain-general part of the student model consists of:
– Rule nodes
– Context-Rule nodes
● A student has mastered a rule when he/she is able to apply it correctly in all possible contexts (problems).
● Rule nodes have binary values T and F, indicating the probability that each rule is mastered or not.
● Context-Rule nodes represent mastery of physics rules in specific problem-solving contexts.

  29. The task-specific part
● The task-specific part of the Bayesian student model contains four types of nodes:
– Fact,
– Goal,
– Rule-application and
– Strategy nodes
● Fact and Goal nodes represent information that is derived while solving a problem by applying rules from the knowledge base.
● Goal and Fact nodes have binary values T and F indicating whether they are do-able (by the student).
● They have as many parents as there are ways to derive them.

  30. In Andes
● Andes uses its rules to solve each physics problem in all possible ways, and accumulates all possible derivations of the correct answer.
● The derivations are collected in a data structure called the solution graph.
● The consolidated solution graph for the full Andes system runs into thousands of nodes – too heavy for BN update, etc.
● A BN can handle only propositional information, while general solution-graph nodes are first-order.

  31. Dynamic BN
● A dynamic BN is one whose set of nodes changes over time.
● Andes uses a version of BN to handle this problem.
● Each problem is mapped to a different solution graph, and hence a different Bayesian network.
– Fully propositional!
● The Bayesian networks for different tasks are completely distinct and share no nodes.
● However, the prior probabilities of the domain-general nodes are set to the posterior probabilities of the domain-general nodes from the network of the preceding exercise.
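The carry-over of priors can be sketched as follows. The single-observation update function is a simplified stand-in for the real network update, and all numbers are illustrative.

```python
def update(prior, p_correct_if_mastered=0.9, p_correct_if_not=0.2):
    """Posterior P(mastered) after observing one correct step (Bayes)."""
    num = p_correct_if_mastered * prior
    return num / (num + p_correct_if_not * (1 - prior))

# Domain-general rule node: its posterior at the end of one exercise
# becomes its prior in the next exercise's (otherwise disjoint) network.
prior_task1 = 0.5
posterior_task1 = update(prior_task1)   # belief after task 1
prior_task2 = posterior_task1           # carried over as the new prior
posterior_task2 = update(prior_task2)   # belief keeps growing across tasks
print(round(posterior_task1, 3), round(posterior_task2, 3))  # 0.818 0.953
```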

  32. Rule-application nodes
● Rule-application nodes connect Context-Rule nodes, Strategy nodes and Proposition nodes to newly derived Proposition nodes.
● The nodes have values indicating whether they are Doable or Not-doable.
● The node is Doable if the student has applied or can apply the corresponding Context-Rule correctly.

  33. Strategy nodes
● Strategy nodes represent points where the student can choose among alternative plans to solve a problem.
● These are the only non-binary nodes in the network: they have as many values as there are alternative plans.
● The node is always paired with a Goal node, and it is used when there is more than one mutually exclusive way to address the goal.

  34. A physics problem and a segment of the corresponding solution graph

  35. Probabilistic student modeling

  36. Updating the SM
● Values may be changed depending on the number and type of hints used.
● Number of mistakes made.
– "Guess" probability?
● Skipped steps – how to attribute credit to the involved knowledge elements?
– Use of other related elements can help.
● Multiple rule applications for a node that has been reached.
– Sharing credit among them...
