[PPT] - A unifying computational framework for teaching and active learning PowerPoint Presentation

SLIDE 1

A unifying computational framework for teaching and active learning

Scott Cheng-Hsin Yang, Wai Keen Vong, Yue Yu & Patrick Shafto

SLIDE 2

World Learner

Active learning

SLIDE 3

Teacher

Teaching

World Learner

SLIDE 4

Self as teacher

Self-teaching

World Learner

SLIDE 5

World: h* Learner

Active learning

Active learning strategy: $P_L(x)$

The learner intervenes with $x$, observes the consequence $y$, and updates its belief to $P_L(h \mid x, y)$.

[Figure: the learner's belief over h1, h2, h3, h4 at step 0, step 1, and step 2]
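The intervene, observe, update loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the toy world (`likelihood`, in which the outcome is 1 exactly when the intervention is at or past a boundary `h`), the function name `update_belief`, and the chosen interventions are all assumptions made for the example.

```python
def likelihood(y, x, h):
    """P(y | x, h) for a hypothetical boundary toy: y = 1 iff x >= h."""
    return 1.0 if y == int(x >= h) else 0.0


def update_belief(prior, x, y):
    """Return the posterior P_L(h | x, y), proportional to P(y | x, h) P_L(h)."""
    unnorm = [likelihood(y, x, h) * p for h, p in enumerate(prior)]
    total = sum(unnorm)
    if total == 0:
        raise ValueError("observation has zero probability under the prior")
    return [u / total for u in unnorm]


belief_step0 = [0.25] * 4                         # step 0: uniform over h1..h4
belief_step1 = update_belief(belief_step0, 1, 1)  # step 1: intervene x=1, observe y=1
belief_step2 = update_belief(belief_step1, 0, 0)  # step 2: intervene x=0, observe y=0
```

Each update only rescales the hypotheses that remain consistent with the observation, so belief narrows from four candidates to two and then to one in this toy run.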

SLIDE 6

World: h* Learner

Teaching

Teacher

Teaching strategy: $P_T(x, y \mid h^*)$ (Shafto et al. 2008, 2014)

Active learning strategy: $P_L(x)$

The teacher shows $(x, y)$; the learner updates its belief to $P_L(h \mid x, y)$.

The teacher knows $y$ and $h^*$; the learner does not.

Learner's inference: $$P_L(h \mid x, y) \propto P_T(x, y \mid h)\, P_L(h)$$

Teacher's selection: $$P_T(x, y \mid h) \propto P_L(h \mid x, y)\, P_T(x, y)$$
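The two proportionalities refer to each other, so one common way to obtain a consistent pair is to iterate them until they stop changing. The sketch below does that under stated assumptions: each pair $(x, y)$ is collapsed into a single data index `d`, the iteration starts from a 0/1 consistency matrix, and the names `cooperative_inference`, `consistent`, `prior_h`, and `prior_d` are hypothetical. It is not taken from the cited papers.

```python
def normalize(values):
    """Scale non-negative values to sum to 1 (all-zero input is returned as is)."""
    total = sum(values)
    return [v / total for v in values] if total > 0 else list(values)


def cooperative_inference(consistent, prior_h, prior_d, n_iter=100):
    """Iterate the learner's inference and the teacher's selection.

    consistent[d][h] is 1 if data d is consistent with hypothesis h.
    Returns (teacher, learner) with teacher[h][d] = P_T(d | h)
    and learner[d][h] = P_L(h | d).
    """
    n_d, n_h = len(consistent), len(consistent[0])
    # Assumed starting point: plain Bayesian inference from consistency.
    learner = [normalize([consistent[d][h] * prior_h[h] for h in range(n_h)])
               for d in range(n_d)]
    teacher = []
    for _ in range(n_iter):
        # Teacher's selection: P_T(d | h) proportional to P_L(h | d) P_T(d).
        teacher = [normalize([learner[d][h] * prior_d[d] for d in range(n_d)])
                   for h in range(n_h)]
        # Learner's inference: P_L(h | d) proportional to P_T(d | h) P_L(h).
        learner = [normalize([teacher[h][d] * prior_h[h] for h in range(n_h)])
                   for d in range(n_d)]
    return teacher, learner


# Data 0 fits both hypotheses; data 1 fits only hypothesis 1.
consistent = [[1, 1], [0, 1]]
teacher, learner = cooperative_inference(consistent, [0.5, 0.5], [0.5, 0.5])
```

In this toy case the teacher for hypothesis 1 comes to prefer the unambiguous data 1, and the learner in turn reads the ambiguous data 0 as evidence for hypothesis 0.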

SLIDE 7

World: h* Learner

Teaching (marginalize out y)

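Teacher

Teaching strategy ($y$ marginalized): $P_T(x \mid h^*)$ (Yang & Shafto 2017)

Active learning strategy: $P_L(x)$

The teacher shows $x$; the learner observes the consequence $y$ and updates its belief to $P_L(h \mid x, y)$.

Learner's inference: $$P_L(h \mid x, y) \propto P(y \mid x, h)\, P_T(x \mid h)\, P_L(h)$$

Teacher's selection: $$P_T(x \mid h) = \sum_{y \in Y} P_T(x, y \mid h)$$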
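The step from the joint teaching strategy $P_T(x, y \mid h)$ to the learner's inference on this slide can be spelled out with the chain rule. The following is a sketch under the assumption that the consequence $y$ is generated by the world given $x$ and $h$, so the teacher only chooses $x$:

```latex
\begin{align*}
P_T(x, y \mid h) &= P(y \mid x, h)\, P_T(x \mid h) \\
P_L(h \mid x, y) &\propto P_T(x, y \mid h)\, P_L(h)
                  = P(y \mid x, h)\, P_T(x \mid h)\, P_L(h)
\end{align*}
```

Summing the first line over $y$ recovers the teacher's selection, because $\sum_{y \in Y} P(y \mid x, h) = 1$.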

SLIDE 8

World: h* Learner

Knowledgeability (marginalize out “h”)

Teacher

Shafto, Eaves, et al. 2012

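Knowledgeable teacher, $\delta(g \mid h)$ (columns: truth $h$; rows: teacher's belief $g$)

     h1   h2   h3   h4
g1   1    0    0    0
g2   0    1    0    0
g3   0    0    1    0
g4   0    0    0    1

Teaching strategy ($y$ marginalized): $P_T(x \mid h^*)$, with

$$P_T(x \mid h) = \sum_{g \in H} P_T(x \mid g)\, \delta(g \mid h)$$

Self-teacher, $\delta_{ST}(g \mid h) = P_L(g)$ (columns: truth $h$; rows: learner's belief $g$)

     h1   h2   h3   h4
g1   1/4  1/4  1/4  1/4
g2   1/4  1/4  1/4  1/4
g3   1/4  1/4  1/4  1/4
g4   1/4  1/4  1/4  1/4

Teaching strategy ($y$ and $h^*$ marginalized):

$$P_T(x) = \sum_{g \in H} P_T(x \mid g)\, P_L(g)$$

This equals the active learning strategy: $P_T(x) = P_L(x)$.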
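The effect of the two knowledgeability matrices can be checked numerically. In this sketch the teaching strategy `teach` is a made-up table of $P_T(x \mid g)$ values over three interventions, and `marginalize_knowledge` is a hypothetical helper name; only the summation itself comes from the slide.

```python
def marginalize_knowledge(teach, delta):
    """Return out[h][x] = sum over g of P_T(x | g) * delta(g | h).

    teach[g][x] holds P_T(x | g); delta[g][h] holds the teacher's belief
    that g is true when h is the actual truth.
    """
    n_g, n_x, n_h = len(teach), len(teach[0]), len(delta[0])
    return [
        [sum(teach[g][x] * delta[g][h] for g in range(n_g)) for x in range(n_x)]
        for h in range(n_h)
    ]


# Hypothetical teaching strategy over three interventions for h1..h4.
teach = [
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
prior = [0.25] * 4  # learner's belief P_L(g)

identity = [[1.0 if g == h else 0.0 for h in range(4)] for g in range(4)]
self_teacher = [[prior[g] for _ in range(4)] for g in range(4)]

knowledgeable = marginalize_knowledge(teach, identity)    # recovers P_T(x | h)
self_taught = marginalize_knowledge(teach, self_teacher)  # one row repeated for every h
```

With the identity matrix the strategy is unchanged. With the uniform matrix every row collapses to the same distribution over $x$, which no longer depends on the true hypothesis.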

SLIDE 9

World: h* Learner

Self-teaching

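Self-teaching: $P_T(x) = P_L(x)$; the learner selects $x$, observes $y$, and updates its belief to $P_L(h \mid x, y)$.

Learner's inference: $$P_L(h \mid x, y) = \frac{P(y \mid x, h)\, P_T(x)\, P_L(h)}{\sum_{h' \in H} P(y \mid x, h')\, P_T(x)\, P_L(h')}$$

Self-teacher's selection: $$P_T(x) = \sum_{g \in H} P_T(x \mid g)\, P_L(g)$$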
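One consequence of the learner's inference on this slide is worth writing out. Since $P_T(x)$ does not depend on $h$, it appears as a common factor in the numerator and denominator and cancels (assuming $P_T(x) > 0$ for the chosen $x$):

```latex
\[
P_L(h \mid x, y)
= \frac{P(y \mid x, h)\, P_T(x)\, P_L(h)}
       {\sum_{h' \in H} P(y \mid x, h')\, P_T(x)\, P_L(h')}
= \frac{P(y \mid x, h)\, P_L(h)}
       {\sum_{h' \in H} P(y \mid x, h')\, P_L(h')}
\]
```

Read this way, self-teaching changes which intervention $x$ is selected, while the belief update itself is ordinary Bayesian inference.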

SLIDE 10

How does the Self-Teaching model differ from the most common active learning objective, optimizing for expected information gain? Does the Self-Teaching model capture human active learning behavior?

SLIDE 11

Self-Teaching

Meta-reasons about oneself as the teacher.

$$P_T(x) = \sum_{g \in H} \sum_{y \in Y} \frac{P_L(g \mid x, y)\, P_T(x, y)}{Z(g)}\, P_L(g)$$

Uses only the rules of probability.

Hypothesis testing for distinctive hypotheses.

Expected information gain

Reasons about the world.

$$\mathrm{EIG}(x) = H(h) - \sum_{y \in Y} P_L(y \mid x)\, H(h \mid x, y)$$

Also uses entropy and subtraction.

Overall uncertainty reduction.
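For comparison, expected information gain can be computed directly from its definition. The sketch below is illustrative only: it assumes a toy world in which the outcome is 1 exactly when the intervention is at or past a boundary `h`, and the names `likelihood`, `entropy`, and `expected_information_gain` are hypothetical.

```python
from math import log2


def entropy(dist):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * log2(p) for p in dist if p > 0)


def likelihood(y, x, h):
    """P(y | x, h) for a hypothetical boundary toy: y = 1 iff x >= h."""
    return 1.0 if y == int(x >= h) else 0.0


def expected_information_gain(x, prior, outcomes=(0, 1)):
    """EIG(x) = H(h) - sum over y of P_L(y | x) * H(h | x, y)."""
    eig = entropy(prior)
    for y in outcomes:
        joint = [likelihood(y, x, h) * p for h, p in enumerate(prior)]
        p_y = sum(joint)  # P_L(y | x)
        if p_y > 0:
            posterior = [j / p_y for j in joint]
            eig -= p_y * entropy(posterior)
    return eig


prior = [0.25] * 4
eig = [expected_information_gain(x, prior) for x in range(3)]
best_x = max(range(3), key=lambda x: eig[x])
```

In this toy setting the middle intervention splits the four hypotheses evenly and so has the highest expected information gain, one full bit.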

SLIDE 12

Self-teaching: confirming distinctive h

A distinctive hypothesis is one that is on average less likely to be inferred if all interventions and observations are equally likely to occur.

$$Z(g) = \sum_{y \in Y} \sum_{x \in X} P_L(g \mid x, y)\, P_T(x, y)$$

[Figure: the learner's posterior over h1 to h4 for each pair of intervention (x1 to x3) and observation (y0, y1), the distinctiveness of each hypothesis, and the self-teaching probability of x1, x2, and x3]

$$P_T(x) = \sum_{g \in H} P_T(x \mid g)\, P_L(g) = \sum_{g \in H} \sum_{y \in Y} P_L(g \mid x, y)\, P_T(x, y)\, P_L(g)\, Z(g)^{-1}$$
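The distinctiveness term and the self-teaching probability can be computed end to end on a small example. This is a sketch under explicit assumptions rather than the authors' code: the world is a toy in which the outcome is 1 exactly when the intervention is at or past a boundary `g`, $P_T(x, y)$ is taken to be uniform (matching "all interventions and observations are equally likely"), and the names `likelihood` and `self_teaching` are hypothetical.

```python
def likelihood(y, x, g):
    """P(y | x, g) for a hypothetical boundary toy: y = 1 iff x >= g."""
    return 1.0 if y == int(x >= g) else 0.0


def self_teaching(prior, xs, ys):
    """Return (z, p_t): distinctiveness Z(g) and self-teaching P_T(x)."""
    n_h = len(prior)
    p_xy = 1.0 / (len(xs) * len(ys))  # assumed uniform P_T(x, y)

    # posterior[(x, y)][g] = P_L(g | x, y)
    posterior = {}
    for x in xs:
        for y in ys:
            joint = [likelihood(y, x, g) * prior[g] for g in range(n_h)]
            total = sum(joint)
            posterior[(x, y)] = [j / total if total > 0 else 0.0 for j in joint]

    # Z(g) = sum over x and y of P_L(g | x, y) P_T(x, y)
    z = [sum(posterior[(x, y)][g] * p_xy for x in xs for y in ys)
         for g in range(n_h)]

    # P_T(x) = sum over g and y of P_L(g | x, y) P_T(x, y) P_L(g) / Z(g)
    p_t = {
        x: sum(posterior[(x, y)][g] * p_xy * prior[g] / z[g]
               for g in range(n_h) for y in ys if z[g] > 0)
        for x in xs
    }
    return z, p_t


z, p_t = self_teaching([0.25] * 4, xs=[0, 1, 2], ys=[0, 1])
```

In this toy run the two middle hypotheses have the smaller $Z(g)$, which makes them the more distinctive ones by the definition on this slide, and the middle intervention receives the highest self-teaching probability.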


SLIDE 13

How does the Self-Teaching model differ from the most common active learning objective, optimizing for expected information gain? Does the Self-Teaching model capture human active learning behavior?

SLIDE 14

Boundary game

[Figure: the boundary game task]

SLIDE 15

Causal graph learning

[Figure: the causal graph learning task, from Coenen et al. 2015]

SLIDE 16

Coenen, Rehder, & Gureckis (2015). Strategies to intervene on causal systems are adaptively selected. Cognitive Psychology, 79, 102-133.

[Figure: panels labeled "Human choices" and "Self-Teaching model", each paired with "Expected information gain"]

SLIDE 17

Collaborators

Wai Keen Vong, Yue Yu, Patrick Shafto

Yang, Vong, Yu & Shafto (2019). A unifying computational framework for teaching and active learning. Topics in Cognitive Science, 11(2), 316-337.

Conclusions

We derived a Self-Teaching model, a novel form of active learning.
It depends only on the rules of probability, which may have implications for active machine learning.
It unifies teaching and active learning under a single learning mechanism.
It matches human active learning behavior in many cases.