Sample Complexity Bounds for Active Learning

  1. Sample Complexity Bounds for Active Learning. Paper by Sanjoy Dasgupta. Presenter: Peter Sadowski.

  2. Passive PAC Learning Complexity. Based on VC dimension: to get error < ε with probability ≥ 1 − δ, num samples ≥ Õ( (1/ε) · (VC(H) + log(1/δ)) ). Is there some equivalent for active learning?
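
(Added illustration, not from the slides: a tiny Python sketch of how this passive bound scales. The function name and the dropped constants and log factors are my own; it only shows the shape of Õ((1/ε)(VC(H) + log(1/δ))), e.g. for the 1-D threshold class of the next slide, which has VC dimension 1.)

```python
import math

def passive_pac_bound(vc_dim, eps, delta):
    """Rough passive sample size ~ (1/eps) * (VC(H) + log(1/delta)).

    Constants and log factors hidden by the O-tilde are omitted, so this is
    only meant to show how the bound scales, not to give an exact number.
    """
    return (vc_dim + math.log(1.0 / delta)) / eps

print(passive_pac_bound(vc_dim=1, eps=0.01, delta=0.05))   # roughly 400 samples
```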

  3. Example: Reals in 1-D. P = underlying distribution of points, H = space of possible hypotheses: H = { h_w : w ∈ ℝ }, where h_w(x) = 1 if x ≥ w and h_w(x) = 0 if x < w. O(1/ε) random labeled examples from P are needed to get error rate < ε.

  4. Example: Reals in 1-D (continued). h_w(x) = 1 if x ≥ w, 0 if x < w. Passive learning: O(1/ε) random labeled examples needed from P to get error rate < ε. Active learning (binary search): O(log(1/ε)) examples needed to get error < ε. Active learning gives us an exponential improvement!
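
(Added illustration, not from the slides: a minimal Python sketch of the binary-search active learner for this threshold class, assuming we hold an unlabeled pool drawn from P and may query the label of any pool point. Function names are mine.)

```python
import numpy as np

def active_learn_threshold(pool, query_label):
    """Binary-search active learner for 1-D thresholds h_w(x) = 1 if x >= w.

    pool: unlabeled points drawn from P; query_label(x) returns the true label.
    Uses O(log |pool|) label queries, versus labeling the whole pool passively.
    """
    xs = np.sort(np.asarray(pool))
    lo, hi = 0, len(xs) - 1            # search for the leftmost point labeled 1
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if query_label(xs[mid]) == 1:  # threshold w is at or below xs[mid]
            hi = mid
        else:                          # threshold w is above xs[mid]
            lo = mid + 1
    return xs[lo], queries

# Example: true threshold w = 0.37, pool of size ~1/epsilon from the uniform distribution
rng = np.random.default_rng(0)
pool = rng.uniform(0, 1, size=1000)
w_hat, n_queries = active_learn_threshold(pool, lambda x: int(x >= 0.37))
print(w_hat, n_queries)   # roughly 0.37 after about log2(1000) ~ 10 queries
```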

  5. Example 2: Points on a Circle. P = some density on the circle perimeter. H = linear separators in R². (Slide figure: a circle with three example separators h₁, h₂, h₃.)

  6. Example 2: Points on a Circle (continued). Worst case: the target differs from a hypothesis only on a small slice of the circle of probability mass ε. Passive learning: O(1/ε). Active learning: still needs on the order of 1/ε labels in the worst case; no improvement!

  7. Active Learning Abstracted. Goal: narrow down the version space (the hypotheses that fit the known labels). Idea: think of hypotheses as points; observing the label of a point x makes a cut through the version space, leaving a smaller version space depending on whether x = 0 or x = 1. (Slide figure: the version space, the cut made by observing x, and the new version space for each label.)

  8. Shrinking the Version Space. Define a distance between hypotheses: d(h, h′) = P{ x : h(x) ≠ h′(x) }. Ignore distances less than ε: let Q = H × H and Q_ε = { (h, h′) ∈ Q : d(h, h′) > ε }. (Slide figure: a good cut!)
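
(Added illustration, not from the slides: a small Python sketch of these two definitions, estimating d(h, h′) as the empirical disagreement rate on a sample from P and collecting the edge set Q_ε over a finite pool of hypotheses. All names are mine.)

```python
import numpy as np
from itertools import combinations

def hypothesis_distance(h1, h2, xs):
    """Empirical estimate of d(h, h') = P{ x : h(x) != h'(x) } from a sample xs of P."""
    return np.mean([h1(x) != h2(x) for x in xs])

def long_edges(hypotheses, xs, eps):
    """Q_eps: unordered pairs of hypotheses whose estimated distance exceeds eps."""
    return [(i, j) for i, j in combinations(range(len(hypotheses)), 2)
            if hypothesis_distance(hypotheses[i], hypotheses[j], xs) > eps]

# Example: a few 1-D threshold hypotheses and points drawn from the uniform distribution
rng = np.random.default_rng(0)
xs = rng.uniform(0, 1, size=2000)
H = [lambda x, w=w: int(x >= w) for w in (0.2, 0.21, 0.5, 0.9)]
print(long_edges(H, xs, eps=0.05))   # the pair of thresholds (0.2, 0.21) is not a long edge
```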

  9. Quick Example. What is the best cut? Recall Q_ε = { (h, h′) ∈ Q : d(h, h′) > ε }.

  10. Quick Example (continued). Cutting edges shrinks the version space. After this cut, we have a solution! The hypotheses left differ from each other by less than ε, i.e. only insignificantly.

  11. Quantifying the “Usefulness” of Points. A point x ∈ X is said to ρ-split Q_ε if its label (whichever it turns out to be) reduces the number of edges by a fraction ρ > 0. (Slide figure: examples of a ¼-split, a ¾-split, and a 1-split.)

  12. Quantifying the Difficulty of Problems. Definition: a subset S of hypotheses is (ρ, ε, τ)-splittable if P{ x : x ρ-splits Q_ε } ≥ τ. “At least a fraction τ of samples are ρ-useful in splitting S.” ρ small ⇒ smaller splits; ε small ⇒ small error; τ small ⇒ lots of samples needed to get a good split.
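
(Added illustration, not from the slides: a rough Python sketch of checking (ρ, ε, τ)-splittability empirically, with a sample standing in for P and a finite pool of binary hypotheses standing in for S. The guaranteed split fraction of a point is taken over the worse of its two possible labels, since an edge survives only if both endpoints agree with the observed label. All function names and constants are mine.)

```python
import numpy as np
from itertools import combinations

def split_fraction(x, hypotheses, edges):
    """Fraction of edges in Q_eps that querying x is guaranteed to remove.

    An edge (h, h') survives a query only if both hypotheses agree with the
    observed binary label of x, so we take the worse of the two possible labels.
    """
    survive = {0: 0, 1: 0}
    for i, j in edges:
        yi, yj = hypotheses[i](x), hypotheses[j](x)
        if yi == yj:                     # edge survives when the observed label equals yi
            survive[yi] += 1
    worst_surviving = max(survive.values())
    return 1.0 - worst_surviving / max(len(edges), 1)

def is_splittable(hypotheses, xs, rho, eps, tau):
    """Empirical check of (rho, eps, tau)-splittability on the sample xs."""
    dist = lambda i, j: np.mean([hypotheses[i](x) != hypotheses[j](x) for x in xs])
    edges = [(i, j) for i, j in combinations(range(len(hypotheses)), 2) if dist(i, j) > eps]
    if not edges:
        return True                      # nothing left to split
    useful = np.mean([split_fraction(x, hypotheses, edges) >= rho for x in xs])
    return useful >= tau

# Example: 1-D thresholds again; many sample points rho-split the long edges
rng = np.random.default_rng(1)
xs = rng.uniform(0, 1, size=500)
H = [lambda x, w=w: int(x >= w) for w in np.linspace(0.1, 0.9, 9)]
print(is_splittable(H, xs, rho=0.25, eps=0.05, tau=0.1))
```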

  13. Lower Bound Result. Suppose that for some hypothesis space H there exist hypotheses h₀, h₁, ..., h_N such that d(h₀, h_i) > ε for every i, and the “disagree sets” { x : h₀(x) ≠ h_i(x) } are disjoint. Then for any τ and any ρ > 1/N, Q is not (ρ, ε, τ)-splittable. (Intuition: any single query point lies in at most one disagree set, so it can cut at most one of the N long edges (h₀, h_i), i.e. a fraction of at most 1/N.)

  14. An Interesting Result. There is a constant c > 0 such that for any dimension d ≥ 2, if (1) H is the class of homogeneous linear separators in R^d, and (2) P is the uniform distribution over the surface of the unit sphere, then H is (1/4, ε, cε)-splittable for all ε > 0. ⇒ For any h ∈ H and any ε ≤ 1/(32π√d), the ball B(h, 4ε) (the hypotheses within distance 4ε of h) is (ρ, ε, τ)-splittable with τ on the order of ε/√d.

  15. Conclusions. Active learning is not always much better than passive learning. “Splittability” plays the role for active learning that VC dimension plays for passive learning. We can use this framework to derive bounds for specific problems.
