An Investigation of Why Overparameterization Exacerbates Spurious Correlations
Shiori Sagawa*, Aditi Raghunathan*, Pang Wei Koh*, Percy Liang
Models can latch onto spurious correlations
Spurious correlations are misleading heuristics: they work on most training examples but may not always hold up.
Setup: the input is a bird image and the ML model labels the bird type (waterbird vs. landbird).
Sagawa et al. (2020), Wah et al. (2011), Zhou et al. (2017)
Models can latch onto spurious correlations
Spurious correlation: water background. The model predicts waterbird; the true label is waterbird. ✓
Sagawa et al. (2020), Wah et al. (2011), Zhou et al. (2017)
Models can latch onto spurious correlations
Spurious correlation: land background. The model predicts landbird; the true label is waterbird. ✗
Sagawa et al. (2020), Wah et al. (2011), Zhou et al. (2017)
Models can latch onto spurious correlations
Setup: the input is a face image and the ML model labels hair color (blonde vs. dark hair).
Sagawa et al. (2020), Liu et al. (2015)
Models can latch onto spurious correlations
Spurious correlation: gender. The model predicts dark hair; the true label is blonde hair. ✗
Sagawa et al. (2020), Liu et al. (2015)
Models can latch onto spurious correlations
Groups are defined by the label (waterbird vs. landbird) and the spurious attribute (water vs. land background):

                   waterbird   landbird
water background   majority    minority
land background    minority    majority

Sagawa et al. (2020)
Models perform well on average
Test error by group (label: object; spurious attribute: background):

                   waterbird   landbird
water background   0.05        0.21
land background    0.40        0.004

Average error: 0.03
Sagawa et al. (2020)
But models can have high worst-group error
Test error by group (label: object; spurious attribute: background):

                   waterbird   landbird
water background   0.05        0.21
land background    0.40        0.004

Worst-group error: 0.40
Sagawa et al. (2020)
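The two quantities on the last two slides follow directly from the per-group errors: average error weights each group by its share of the test distribution, while worst-group error takes the maximum over groups. A minimal sketch in Python, using the slide's per-group errors and hypothetical group proportions (the slide does not give the test-split sizes):

```python
import numpy as np

# Per-group test errors from the slide; keys are (label, background).
group_errors = {
    ("waterbird", "water"): 0.05,
    ("landbird",  "water"): 0.21,
    ("waterbird", "land"):  0.40,
    ("landbird",  "land"):  0.004,
}
# Hypothetical group proportions (placeholders; not given on the slide).
group_props = {
    ("waterbird", "water"): 0.22,
    ("landbird",  "water"): 0.04,
    ("waterbird", "land"):  0.01,
    ("landbird",  "land"):  0.73,
}

avg_error = sum(group_errors[g] * group_props[g] for g in group_errors)
worst_group_error = max(group_errors.values())
print(f"average error ~ {avg_error:.2f}, worst-group error = {worst_group_error:.2f}")
```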
Approaches for improving worst-group error fail on high-capacity models
Upweight minority groups:
- Low-capacity models: more robust to the spurious correlation; low worst-group error. ✓
- High-capacity models: rely on the spurious correlation; high worst-group error. ✗
Sagawa et al. (2020)
Overparameterization hurts worst-group error for models trained with the reweighted objective
- Average error: overparameterized is better than underparameterized.
- Worst-group error: overparameterized is worse than underparameterized.
Our work: why does overparameterization exacerbate worst-group error?
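One common form of the reweighted objective mentioned above scales each example's loss by the inverse frequency of its (label, attribute) group, so minority groups contribute as much as majority groups in expectation. The sketch below illustrates that idea; the exact weighting scheme in the paper may differ.

```python
import numpy as np

def group_reweighted_loss(per_example_losses, group_ids):
    """Reweighted training objective: scale each example's loss by the inverse
    frequency of its group so that every group contributes equally on average.
    `group_ids` are integer group indices, one per example."""
    counts = np.bincount(group_ids)
    weights = len(group_ids) / (len(counts) * counts)  # inverse group frequency
    return float(np.mean(weights[group_ids] * per_example_losses))
```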
Overview
1. Empirical results
2. Analytical model and theoretical results
3. Subsampling
Overparameterization exacerbates worst-group error
Observed empirically both for a ResNet10 model and for logistic regression on random features: as model size grows, worst-group test error gets worse.
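A minimal sketch of the kind of experiment behind the "logistic regression on random features" result: map the inputs through a fixed random ReLU projection, fit (nearly) unregularized logistic regression with group-reweighted examples, and record worst-group test error as the number of random features grows. The synthetic data generator below is a stand-in for real image features, and the specific constants are illustrative; whether the qualitative trend appears depends on the data and noise levels, so this only sketches the experimental scaffold, not the authors' exact code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def make_spurious_data(n, d, p_maj, rng):
    """Stand-in data with a core (label-aligned) direction, a stronger
    spurious (attribute-aligned) direction, and background noise."""
    y = rng.choice([-1, 1], size=n)
    a = np.where(rng.random(n) < p_maj, y, -y)           # spurious attribute
    X = rng.normal(size=(n, d))
    X[:, 0] += 2.0 * y                                    # core signal
    X[:, 1] += 4.0 * a                                    # stronger spurious signal
    g = (y == 1).astype(int) * 2 + (a == 1).astype(int)   # group index in {0,1,2,3}
    return X, y, g

def worst_group_error(clf, features, y, g):
    """Maximum classification error over the (label, attribute) groups."""
    return max(np.mean(clf.predict(features[g == grp]) != y[g == grp])
               for grp in np.unique(g))

rng = np.random.default_rng(0)
X_tr, y_tr, g_tr = make_spurious_data(n=500, d=20, p_maj=0.9, rng=rng)
X_te, y_te, g_te = make_spurious_data(n=5000, d=20, p_maj=0.5, rng=rng)

counts = np.bincount(g_tr)
sample_weight = (len(g_tr) / (len(counts) * counts))[g_tr]  # inverse group frequency

for m in [10, 100, 1000, 10000]:                  # sweep from under- to overparameterized
    W = rng.normal(size=(20, m)) / np.sqrt(20)    # fixed random projection (train and test)
    phi = lambda X, W=W: np.maximum(X @ W, 0.0)   # random ReLU features
    clf = LogisticRegression(C=1e6, max_iter=5000)  # large C ~ unregularized
    clf.fit(phi(X_tr), y_tr, sample_weight=sample_weight)
    print(f"m={m:6d}  worst-group error={worst_group_error(clf, phi(X_te), y_te, g_te):.2f}")
```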
Intuition: overparameterized models learn the spurious attribute and memorize minority groups
The overparameterized model fits the majority groups through the spurious attribute (generalizable) and fits the minority groups by memorizing them (non-generalizable).
Overview
1. Empirical results
2. Analytical model and theoretical results
3. Subsampling
Toy example: data
Each example has a label y ∈ {1, −1} and a spurious attribute a ∈ {1, −1}. Points with a = y form the majority groups and points with a = −y form the minority groups; the majority fraction controls how much of the training data falls in the majority groups.
Toy example: data
Inputs contain a core feature (informative about the label y) and a spurious feature (informative about the attribute a). The spurious-to-core information ratio (SCR) measures how informative the spurious feature is relative to the core feature; higher SCR means the spurious feature is less noisy than the core feature.
Toy example: data
Inputs also contain N-dimensional noise features. For large N >> n (the number of training points), the noise features can be used to "memorize" individual training examples.
Toy example: model
- Linear classifier over the core, spurious, and noise features, trained with logistic regression.
- In the overparameterized regime, the unregularized logistic regression solution is equivalent in direction to the max-margin classifier.
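A sketch of this toy setup under illustrative parameter choices (the exact scalings and variances may differ from the paper's): a noisy core feature aligned with the label y, a cleaner spurious feature aligned with the attribute a, and N >> n noise coordinates. Training unregularized logistic regression to convergence drives the solution toward the max-margin direction; inspecting the learned weights shows how much the classifier leans on the spurious coordinate versus the core coordinate.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def make_toy_data(n, N, p_maj, sigma_core=1.0, sigma_spu=0.3, seed=0):
    """Toy data as sketched on the slides (parameter values are illustrative):
    core feature ~ label y plus large noise, spurious feature ~ attribute a
    plus small noise, and N noise coordinates usable for memorization."""
    rng = np.random.default_rng(seed)
    y = rng.choice([-1, 1], size=n)
    a = np.where(rng.random(n) < p_maj, y, -y)        # a = y on the majority groups
    x_core = y + sigma_core * rng.normal(size=n)      # noisy core feature
    x_spu = a + sigma_spu * rng.normal(size=n)        # cleaner spurious feature
    x_noise = rng.normal(size=(n, N)) / np.sqrt(N)    # high-dimensional noise
    return np.column_stack([x_core, x_spu, x_noise]), y, a

# Overparameterized regime: noise dimension N far exceeds the sample size n.
X, y, a = make_toy_data(n=500, N=5000, p_maj=0.9)
clf = LogisticRegression(C=1e6, max_iter=10000).fit(X, y)  # ~unregularized; the solution
# direction approaches the max-margin classifier as training converges
w_core, w_spu = clf.coef_[0, 0], clf.coef_[0, 1]
print(f"weight on core feature: {w_core:.2f}, weight on spurious feature: {w_spu:.2f}")
```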
Worst-group error is provably higher in the overparameterized regime
Theorem (informal). For any sufficiently high SCR and sufficiently high majority fraction, there exists a threshold on the noise dimension N such that for all larger N, with high probability, the overparameterized (max-margin) model has high worst-group error. However, under suitable conditions and in the asymptotic regime with the number of training points growing, the underparameterized model achieves low worst-group error.
Underparameterized models need to learn the core feature to achieve low reweighted loss
- Learning the core feature: low reweighted loss. ✓
- Learning the spurious feature: high reweighted loss. ✗
In the overparameterized regime, the minimum-norm inductive bias favors less memorization
The norm of the interpolating solution scales with the number of points that must be "memorized":
- Learning the spurious feature: only the minority examples are memorized; few examples memorized, low norm. ✓
- Learning the core feature: the noisy outliers are memorized; many examples memorized, high norm. ✗
Intuition: memorize as few examples as possible under the min-norm inductive bias
(Train error is tracked per group, indexed by label y ∈ {1, −1} and attribute a ∈ {1, −1}.)
Learn spurious → memorize minority, low norm
A model that uses the spurious feature has zero train error on the majority groups (a = y) and error 1 on the minority groups (a = −y). Memorizing just the minority points drives train error to zero everywhere, so the interpolating solution has low norm. ✓
Learn core → memorize more, high norm
A model that uses the (noisier) core feature has nonzero train error on every group, so many more points must be memorized to reach zero train error everywhere. The resulting interpolating solution has high norm. ✗
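The "norm scales with the number of points memorized" claim can be seen in a stylized way: take n examples whose only usable coordinates are high-dimensional noise features, and compute the minimum-norm weights that exactly fit the labels of the first k of them. The norm of that interpolating solution grows with k (roughly like sqrt(k) in this setup), so the min-norm bias prefers solutions that memorize fewer points. This is an illustration of the mechanism, not a computation from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 200, 5000                                # n examples, N >> n noise coordinates
X_noise = rng.normal(size=(n, N)) / np.sqrt(N)  # noise features only
y = rng.choice([-1.0, 1.0], size=n)

for k in [10, 50, 200]:
    # Minimum-norm weights that exactly fit ("memorize") the first k labels using
    # only the noise features; for this underdetermined system, np.linalg.lstsq
    # returns the minimum-norm solution.
    w, *_ = np.linalg.lstsq(X_noise[:k], y[:k], rcond=None)
    print(f"memorize {k:3d} points  ->  ||w|| = {np.linalg.norm(w):.1f}")
```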
Overview
1. Empirical results
2. Analytical model and theoretical results
3. Subsampling
Reweighting vs. subsampling
- Upweighting keeps all examples and scales up the weight on minority-group examples.
- Subsampling removes majority-group examples so that all (y, a) groups have the same number of examples.
Subsampling reduces the majority fraction and lowers the memorization cost of learning the core feature.
Chawla et al. (2011)
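A minimal sketch of the two strategies for a binary label y and binary attribute a (both in {−1, 1}); the group encoding and the specific weighting scheme are illustrative choices rather than the paper's exact procedure.

```python
import numpy as np

def group_ids(y, a):
    """Encode each example's (y, a) group as an integer in {0, 1, 2, 3}."""
    return (np.asarray(y) == 1).astype(int) * 2 + (np.asarray(a) == 1).astype(int)

def upweight(y, a):
    """Keep all examples; weight each one inversely to its group's frequency."""
    g = group_ids(y, a)
    counts = np.bincount(g, minlength=4)
    return (len(g) / (4 * counts))[g]

def subsample(y, a, seed=0):
    """Return indices of a balanced subset: every group is kept at the size of
    the smallest group, and the extra majority examples are discarded."""
    rng = np.random.default_rng(seed)
    g = group_ids(y, a)
    smallest = np.bincount(g, minlength=4).min()
    keep = [rng.choice(np.where(g == grp)[0], size=smallest, replace=False)
            for grp in range(4)]
    return np.concatenate(keep)
```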
Subsampling the majority group → overparameterization helps worst-group error
With subsampling (unlike upweighting), increasing model size improves worst-group error. This suggests a potential tension between using all of the data and using large overparameterized models: both help average error, but we cannot have both and still get good worst-group error.
Thanks!
Shiori Sagawa*, Aditi Raghunathan*, Pang Wei Koh*, Percy Liang
Thank you to Yair Carmon, John Duchi, Tatsunori Hashimoto, Ananya Kumar, Yiping Lu, Tengyu Ma, and Jacob Steinhardt.
Funded by an Open Philanthropy Project Award, the Stanford Graduate Fellowship, the Google PhD Fellowship, the Open Philanthropy Project AI Fellowship, and the Facebook Fellowship Program.