Active Learning
SPiNCOM reading group, Sep. 30th, 2016
Dimitris Berberidis
A toy example: Alien fruits
 Consider alien fruits of various shapes
 Train a classifier to distinguish safe fruits from dangerous ones
 Passive learning: training data are obtained by uniform sampling and labeling
 Our setting:
• Obtaining labels is costly
• Unlabeled instances are easily available
A toy example: Alien fruits
 What if we sample fruits smartly instead of randomly?
 The boundary between safe and dangerous fruits can be identified using far fewer samples
Active learning
 General goal: for a given budget of labeled training data, maximize the learner's accuracy by actively selecting which instances (feature vectors) to label ("query")
 Active learning (AL) scenarios considered:
• Query synthesis: first to be considered, often not applicable
• Selective sampling: ideal for online settings with streaming data
• Pool-based sampling: more general, OUR FOCUS
Roadmap
 Uncertainty sampling
 Searching the hypothesis space
 Query by disagreement
 Query by committee
 Expected error minimization
 Expected error reduction
 Variance reduction
 Batch queries and submodularity
 Cluster-based AL
 AL + semi-supervised learning
 A unified view
 Conclusions
Burr Settles, "Active Learning," Synthesis Lectures on Artificial Intelligence and Machine Learning, 2012.
Uncertainty sampling
 Most popular AL method: intuitive and easy to implement
 Key idea, e.g., for a support vector classifier: the learner is uncertain about points close to the decision boundary
Measures of uncertainty
 Uncertainty of the label $y$ as modeled by the posterior $P_\theta(y \mid x)$ (e.g., for logistic regression), where $\hat{y} = \arg\max_y P_\theta(y \mid x)$
 Least confident: $x^\star = \arg\max_x \left[ 1 - P_\theta(\hat{y} \mid x) \right]$
 Least margin: $x^\star = \arg\min_x \left[ P_\theta(\hat{y}_1 \mid x) - P_\theta(\hat{y}_2 \mid x) \right]$, with $\hat{y}_1, \hat{y}_2$ the two most likely labels (all three scores sketched below)
 Highest entropy: $x^\star = \arg\max_x -\sum_y P_\theta(y \mid x) \log P_\theta(y \mid x)$
 Limitation: utility scores are based on the output of a single (possibly bad) hypothesis
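A minimal sketch of these three utility scores in Python, assuming class posteriors come from any probabilistic classifier; the array convention, function name, and numerical epsilon are illustrative choices, not from the slides:

```python
import numpy as np

def uncertainty_scores(proba):
    """Uncertainty-sampling utilities from class posteriors.

    proba: (n_samples, n_classes) array of P_theta(y|x), e.g. from any
    probabilistic classifier's predict_proba.
    """
    sorted_p = np.sort(proba, axis=1)[:, ::-1]       # posteriors, descending per row
    least_confident = 1.0 - sorted_p[:, 0]           # 1 - P(y_hat | x): larger = query
    margin = sorted_p[:, 0] - sorted_p[:, 1]         # smaller margin = more uncertain
    entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)
    return least_confident, margin, entropy

# Query selection: argmax of least_confident or entropy, argmin of margin.
```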
Searching through the hypothesis space
 Instance points in the input space $\mathcal{X}$ correspond to hyperplanes in the hypothesis space $\mathcal{H}$
 Version space $\mathcal{V}$: subset of all hypotheses consistent with the training data
• Max-margin methods (e.g., SVMs) lead to hypotheses near the center of $\mathcal{V}$
• Labeling an instance close to the decision hyperplane approximately bisects $\mathcal{V}$
• Instances that greatly reduce the volume of $\mathcal{V}$ are of interest
Query by disagreement
 One of the oldest AL algorithms [Cohn et al., '94]
 "Store" the version space implicitly with the following trick: train one hypothesis forced to label a candidate positive and one forced to label it negative; if both remain consistent with the labeled data, the candidate lies in the region of disagreement and is queried (sketched below)
 Limitations: too complex; all controversial instances are treated equally
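As a rough illustration of the trick (not the authors' exact construction), a near-hard-margin linear SVM can stand in for "a consistent hypothesis"; the function name, the C value, and the ±1 label convention are assumptions:

```python
import numpy as np
from sklearn.svm import SVC

def in_disagreement_region(x, X_lab, y_lab):
    """x is controversial iff BOTH labelings (+1 and -1) of x admit a
    hypothesis consistent with all labeled data; then x deserves a query."""
    for y in (+1, -1):
        X_aug = np.vstack([X_lab, [x]])
        y_aug = np.append(y_lab, y)
        clf = SVC(kernel="linear", C=1e6).fit(X_aug, y_aug)  # ~hard margin
        if (clf.predict(X_aug) != y_aug).any():   # labeling y is infeasible
            return False                          # all consistent hypotheses agree on x
    return True
```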
Query by committee
 Independently train a committee $\{\theta^{(1)}, \ldots, \theta^{(C)}\}$ of hypotheses
 Label the instance that is most controversial among committee members (all three measures sketched below)
 Vote entropy: $x^\star = \arg\max_x -\sum_y \frac{V(y)}{C} \log \frac{V(y)}{C}$, where $V(y)$ counts the members voting for label $y$
 Soft vote entropy: $x^\star = \arg\max_x -\sum_y P_C(y \mid x) \log P_C(y \mid x)$, with consensus $P_C(y \mid x) = \frac{1}{C} \sum_c P_{\theta^{(c)}}(y \mid x)$
 KL divergence: $x^\star = \arg\max_x \frac{1}{C} \sum_c D_{KL}\!\left( P_{\theta^{(c)}}(\cdot \mid x) \,\|\, P_C(\cdot \mid x) \right)$
 Key difference: vote entropy uses only hard votes, so it cannot distinguish between cases (a) and (b)
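The three disagreement measures, sketched in Python; the shape convention and epsilon are assumptions, and how the committee is trained (bagging, different model classes, ...) is left open:

```python
import numpy as np

def qbc_scores(committee_proba):
    """committee_proba: (C, n_samples, n_classes) posteriors, one slice
    per committee member. Returns the three disagreement utilities."""
    C, n, k = committee_proba.shape
    consensus = committee_proba.mean(axis=0)             # P_C(y|x)

    # Hard vote entropy: entropy of the empirical vote distribution V(y)/C.
    votes = committee_proba.argmax(axis=2)               # (C, n) hard labels
    vote_dist = np.stack([(votes == y).mean(axis=0) for y in range(k)], axis=1)
    vote_entropy = -np.sum(vote_dist * np.log(vote_dist + 1e-12), axis=1)

    # Soft vote entropy: entropy of the consensus posteriors.
    soft_ve = -np.sum(consensus * np.log(consensus + 1e-12), axis=1)

    # Mean KL divergence of each member from the consensus.
    kl = np.mean(np.sum(committee_proba *
                        np.log((committee_proba + 1e-12) / (consensus + 1e-12)),
                        axis=2), axis=0)
    return vote_entropy, soft_ve, kl
```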
Information-theoretic interpretation
 Ideally, maximize the mutual information between the label r.v. $y$ and the model parameters $\theta$: $x^\star = \arg\max_x I(y; \theta \mid x)$
 The problem can be reformulated in a more convenient form: $I(y; \theta \mid x) = H(y \mid x) - \mathbb{E}_\theta\!\left[ H(y \mid x, \theta) \right]$, where the gap between the two terms measures disagreement
 Uncertainty sampling focuses on maximizing only the first term $H(y \mid x)$
 QBC approximates the second term with the committee average $\frac{1}{C} \sum_c H(y \mid x, \theta^{(c)})$ and the consensus with $P_C(y \mid x)$
 Another alternative formulation (recall KL-based QBC): $I(y; \theta \mid x) = \mathbb{E}_\theta\!\left[ D_{KL}\!\left( P(y \mid x, \theta) \,\|\, P(y \mid x) \right) \right]$, which QBC approximates with the committee average of KL divergences (sketched below)
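Treating the committee as samples from the posterior over $\theta$, the mutual information decomposition above can be estimated directly; a sketch reusing the array convention of the previous snippet:

```python
import numpy as np

def mutual_information_score(committee_proba):
    """I(y; theta | x) ~= H(consensus) - mean over members of H(y|x,theta):
    large when members are individually confident but mutually disagree."""
    consensus = committee_proba.mean(axis=0)
    h_consensus = -np.sum(consensus * np.log(consensus + 1e-12), axis=1)
    h_members = -np.sum(committee_proba * np.log(committee_proba + 1e-12), axis=2)
    return h_consensus - h_members.mean(axis=0)
```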
Bound on label complexity
 Label complexity for passive learning (assume the realizable case): to achieve error $\epsilon$ with probability at least $1 - \delta$, one needs on the order of $m = O\!\left( \frac{1}{\epsilon} \left( d \log \frac{1}{\epsilon} + \log \frac{1}{\delta} \right) \right)$ labeled samples, where $\epsilon$ is the expected error rate and the VC dimension $d$ measures the complexity of the hypothesis class $\mathcal{H}$
 Disagreement coefficient $\theta$: quantifies how fast the region of disagreement shrinks
 QBD achieves logarithmically lower label complexity, on the order of $O\!\left( \theta\, d \log \frac{1}{\epsilon} \right)$ (if $\theta$ does not explode)
Alien fruit example: A problematic case
 Candidate queries A and B both bisect the version space $\mathcal{V}$, so they appear equally informative
 However, the generalization error depends on the (ignored) distribution of the input
 Generally: both uncertainty sampling and QBD may suffer high generalization error
Expected error reduction
 Ideally, select the query by minimizing the expected generalization error of the retrained model: $x^\star = \arg\min_x \sum_y P_\theta(y \mid x)\, \mathrm{Err}\!\left( \theta^{+(x,y)} \right)$, where $\theta^{+(x,y)}$ denotes the model retrained using $\mathcal{L} \cup \{(x, y)\}$
 A less stringent objective: the expected log-loss (entropy) of the retrained model over the unlabeled pool
 (Extremely) high complexity: the model must be retrained for each candidate query and each possible label (see the sketch below)
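A brute-force sketch of expected error reduction with a 0/1-loss proxy, which makes the complexity explicit: O(|pool| × |labels|) retrainings per query. The sklearn-style fit/predict_proba interface is an assumption:

```python
import numpy as np
from copy import deepcopy

def expected_error_reduction(model, X_lab, y_lab, X_pool):
    """Return the pool index minimizing the expected (0/1-proxy) error of
    the model retrained with the candidate query added."""
    proba = model.predict_proba(X_pool)                  # current P_theta(y|x)
    best_i, best_risk = None, np.inf
    for i in range(len(X_pool)):
        risk = 0.0
        for y in range(proba.shape[1]):                  # marginalize the unknown label
            m = deepcopy(model)
            m.fit(np.vstack([X_lab, X_pool[i:i+1]]), np.append(y_lab, y))
            p = m.predict_proba(np.delete(X_pool, i, axis=0))
            risk += proba[i, y] * np.sum(1.0 - p.max(axis=1))
        if risk < best_risk:
            best_i, best_risk = i, risk
    return best_i
```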
Variance reduction
 A learner's expected (squared) error can be decomposed as noise + bias + variance: $\mathbb{E}\!\left[ (\hat{y} - y)^2 \right] = \underbrace{\mathbb{E}\!\left[ (y - \mathbb{E}[y \mid x])^2 \right]}_{\text{noise}} + \underbrace{\left( \mathbb{E}[\hat{y}] - \mathbb{E}[y \mid x] \right)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}\!\left[ (\hat{y} - \mathbb{E}[\hat{y}])^2 \right]}_{\text{variance}}$
 Noise is independent of the training data, and bias is due to the model class (e.g., a linear model)
 Focus on minimizing the variance of the predictions on unlabeled data
 Question: can we minimize variance without retraining?
 Design-of-experiments approach (typically for regression)
Optimal experimental design
 Fisher information matrix (FIM) of the model: $I(\theta) = \mathbb{E}\!\left[ \nabla_\theta \log p(y \mid x; \theta)\, \nabla_\theta \log p(y \mid x; \theta)^{\top} \right]$, the expected outer product of the Fisher score
 Covariance of parameter estimates is lower bounded by $I(\theta)^{-1}$ (Cramér-Rao bound)
 A-optimal design: minimize $\operatorname{tr}\!\left( A\, I(\theta)^{-1} \right)$; the additive property of the FIM means each new query simply adds its own information
 Can easily be adapted to minimize the variance of predictions via the Fisher information ratio $\operatorname{tr}\!\left( I_x(\theta)^{-1} I_U(\theta) \right)$
 The inverse FIM can be efficiently updated using the Woodbury matrix identity (sketched below)
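For concreteness, a sketch of the FIM and its rank-one inverse update for binary logistic regression; the function names are assumptions, and the update is the Woodbury identity specialized to rank one (Sherman-Morrison):

```python
import numpy as np

def fim_logistic(X, theta):
    """FIM of binary logistic regression: sum_i p_i(1-p_i) x_i x_i^T,
    using the additive property over samples."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    w = p * (1.0 - p)                        # per-sample curvature weight
    return (X * w[:, None]).T @ X

def inverse_fim_update(M_inv, x, w):
    """Update I(theta)^{-1} after adding one query x with weight w,
    without re-inverting: (M + w x x^T)^{-1} via Sherman-Morrison."""
    Mx = M_inv @ x
    return M_inv - np.outer(Mx, Mx) * (w / (1.0 + w * (x @ Mx)))
```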
Batch queries and submodularity
 Query a batch of instances
• Not necessarily the individually best ones
• Key is to avoid correlated instances
 Submodularity property for functions over sets (for $A \subseteq B$ and $x \notin B$): $f(A \cup \{x\}) - f(A) \geq f(B \cup \{x\}) - f(B)$
 Greedy maximization of a monotone submodular function guarantees $f(S_{\text{greedy}}) \geq \left( 1 - \frac{1}{e} \right) f(S^\star)$; see the sketch after this list
 Maximizing the variance difference can be submodular
 For linear regression the FIM is independent of the labels (offline computation!)
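A generic greedy selector over the kind of objective mentioned above; `f` is any set function the user supplies (e.g., the variance-reduction objective), and its monotone submodularity is an assumption the guarantee relies on:

```python
def greedy_batch(candidates, f, k):
    """Greedily build a batch S of size k; for monotone submodular f this
    achieves f(S) >= (1 - 1/e) * f(S_optimal) [Nemhauser et al., '78]."""
    S, fS = set(), f(set())
    for _ in range(k):
        # pick the element with the largest marginal gain f(S + x) - f(S)
        gains = {x: f(S | {x}) - fS for x in candidates if x not in S}
        best = max(gains, key=gains.get)
        S.add(best)
        fS += gains[best]
    return S
```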
Density-weighted methods
 Back to classification
 Pathological case: the least confident (most uncertain) instance is an outlier
• B is in fact more informative than A
 Error and variance reduction are less sensitive to outliers, but costly
 Information density heuristic (sketched below): $x^\star = \arg\max_x\; \phi(x) \cdot \left( \frac{1}{U} \sum_{u=1}^{U} \operatorname{sim}\!\left( x, x^{(u)} \right) \right)^{\beta}$, where $\phi(x)$ is an information utility score (e.g., entropy) and $\operatorname{sim}(\cdot, \cdot)$ a similarity measure (e.g., based on Euclidean distance)
 Instances more representative of the input distribution are promoted
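A sketch of the information-density score; the Gaussian similarity on Euclidean distances and the median bandwidth are one reasonable instantiation among many, not prescribed by the slide:

```python
import numpy as np

def information_density(X_pool, utility, beta=1.0):
    """Weight a per-instance utility (e.g. entropy) by how representative
    each instance is of the unlabeled pool."""
    d2 = np.sum((X_pool[:, None, :] - X_pool[None, :, :])**2, axis=2)
    sim = np.exp(-d2 / (2.0 * np.median(d2) + 1e-12))    # Gaussian similarity
    density = sim.mean(axis=1)                           # avg similarity to the pool
    return utility * density**beta                       # promote dense regions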
Hierarchical cluster-based AL
 Assist AL by clustering the input space (see the sketch after this list):
• Obtain data and find an initial coarse clustering
• Query instances from different clusters
• Iteratively refine the clusters so that they become more "pure"
• Focus querying on the more impure clusters
 Working assumption: the cluster structure is correlated with the label structure
 If not, the above algorithm degrades to random sampling
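A rough sketch of the querying step, assuming scikit-learn's agglomerative clustering; the iterative refinement of impure clusters from the slide is omitted:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_based_queries(X_pool, n_queries, seed=0):
    """Cluster the pool and draw one query per cluster, so that queries
    cover distinct regions of the input space."""
    rng = np.random.default_rng(seed)
    labels = AgglomerativeClustering(n_clusters=n_queries).fit_predict(X_pool)
    return [int(rng.choice(np.flatnonzero(labels == c))) for c in range(n_queries)]
```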
Active and semi-supervised learning
 The two approaches are complementary:
• AL minimizes labeling effort by querying the most informative instances
• Semi-supervised learning exploits the latent structure of unlabeled data to improve accuracy
 Self-training is complementary to uncertainty sampling [Yarowsky, '95] (see the sketch below)
 Co-training is complementary to QBD [Blum and Mitchell, '98]
 Entropy regularization is complementary to error reduction with log-loss
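One round of the Yarowsky-style pairing, sketched under assumed names (`oracle`, threshold `tau`) and a sklearn-style model API: query the least confident instance, self-label the most confident ones:

```python
import numpy as np

def al_plus_self_training(model, X_lab, y_lab, X_pool, oracle, tau=0.95):
    """Combine uncertainty sampling (ask the oracle about the least
    confident x) with self-training (pseudo-label x's above tau)."""
    model.fit(X_lab, y_lab)
    proba = model.predict_proba(X_pool)
    conf = proba.max(axis=1)
    q = int(np.argmin(conf))                             # most uncertain -> oracle
    pseudo = np.flatnonzero(conf > tau)                  # most confident -> self-label
    X_new = np.vstack([X_lab, X_pool[[q]], X_pool[pseudo]])
    y_new = np.concatenate([y_lab, [oracle(X_pool[q])], proba[pseudo].argmax(axis=1)])
    return X_new, y_new
```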
Unified view (I)
 Ideal: maximize the total gain in information about the model, $x^\star = \arg\max_x\; H(\theta \mid \mathcal{L}) - H\!\left( \theta \mid \mathcal{L} \cup \{(x, y)\} \right)$
 Since the true label $y$ is unknown, one resorts to its expectation under the current model, $\mathbb{E}_{y \sim P_\theta(y \mid x)}[\cdot]$
 Successive approximations of this objective lead to the uncertainty sampling heuristic
Unified view (II)
 A different approximation: drop the term that depends only on the current state of the learner and is unchanged for all candidate queries
 Log-loss minimization and variance reduction target the resulting measure
 A further approximation is given by density-weighted methods
Overview
Practical considerations
 Real labeling costs: the cost of annotating a specific query vs. the cost of prediction errors
 Multi-task AL (multiple labels per instance)
 Skewed label distributions (class imbalance)
 Unreliable oracles (e.g., labels given by human experts)
 When AL is used, the training data are biased toward the model class
 If unsure about the model, random sampling may be preferable
Conclusions
 AL allows for sample (label) complexity reduction
 Simple heuristics: uncertainty sampling, QBD, QBC, cluster-based AL
 High-complexity, near-optimal methods: expected error/variance reduction
• Encompasses optimal experimental design
 Linked to semi-supervised learning
 Information-theoretic interpretations
 Possible research directions:
• Use of AL methods in learning over graphs (GSP, classification over graphs)
• Use of MCMC and IS to approximate the posterior in complex models (e.g., BMRF)