SLIDE 1

Learning Fast Requires Good Memory: Time-Space Tradeoff Lower Bounds for Learning

Ran Raz, Princeton University

Based on joint works with: Sumegha Garg, Gillat Kol, Avishay Tal [R16, KRT17, R17, GRT18]

SLIDE 2

This Talk: A line of recent works studies time-space (memory-samples) lower bounds for learning [S14, SVW16, R16, VV16, KRT17, MM17, R17, MM18, BOGY18, GRT18, DS18, AS18, DKS19, SSV19, GRT19, GKR19].

Main Message: For some learning problems, access to a relatively large memory is crucial. In other words, in some cases, learning is infeasible due to memory constraints.

SLIDE 3

Original Motivation: Online Learning Theory: Initiated by [Shamir 2014], [Steinhardt-Valiant-Wager 2015]: Can one prove unconditional lower bounds on the number of samples needed for learning, under memory constraints? (When each sample is viewed only once, also known as online learning.)
SLIDE 4

Example: Parity Learning: x = (x_1, …, x_n) ∈_R {0,1}^n is unknown. A learner gets a stream of random linear equations (mod 2) in x_1, …, x_n, one by one, and tries to learn x.

Formally: The learner gets a stream of samples (a_1, b_1), (a_2, b_2), …, where for every t: a_t ∈_R {0,1}^n and b_t = a_t · x (inner product mod 2). The learner needs to solve the equations and find x (no noise).
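
To make the setup concrete, here is a minimal sketch (mine, not from the talk) of the sample stream the learner receives; `parity_stream` is a hypothetical helper name.

```python
# A minimal sketch of the parity-learning stream: nature draws a secret
# x uniformly from {0,1}^n, then emits samples (a_t, b_t) with a_t uniform
# in {0,1}^n and b_t = a_t . x (mod 2). Names here are illustrative.
import random

def parity_stream(x):
    """Yield an endless stream of samples (a_t, b_t) for the secret x."""
    n = len(x)
    while True:
        a = [random.randrange(2) for _ in range(n)]
        b = sum(ai * xi for ai, xi in zip(a, x)) % 2  # inner product mod 2
        yield a, b

if __name__ == "__main__":
    secret = [random.randrange(2) for _ in range(5)]
    stream = parity_stream(secret)
    for _, (a, b) in zip(range(3), stream):
        print(a, b)   # three random equations a . x = b (mod 2)
```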

SLIDE 5

x = (x_1, x_2, x_3, x_4, x_5) is unknown

Ready to Play?

SLIDE 6

x = (x_1, x_2, x_3, x_4, x_5) is unknown

a_1 = (1, 1, 0, 1, 1): x_1 + x_2 + x_4 + x_5 = b_1 (mod 2)

Ready to Play?

SLIDE 7

x = (x_1, x_2, x_3, x_4, x_5) is unknown

a_2 = (0, 1, 1, 0, 0): x_2 + x_3 = b_2 (mod 2)

Ready to Play?

SLIDE 8

x = (x_1, x_2, x_3, x_4, x_5) is unknown

a_3 = (0, 0, 1, 1, 1): x_3 + x_4 + x_5 = b_3 (mod 2)

Ready to Play?

SLIDE 9

x = (x_1, x_2, x_3, x_4, x_5) is unknown

a_4 = (0, 1, 1, 1, 0): x_2 + x_3 + x_4 = b_4 (mod 2)

Ready to Play?

SLIDE 10

x = (x_1, x_2, x_3, x_4, x_5) is unknown

a_5 = (1, 1, 0, 0, 1): x_1 + x_2 + x_5 = b_5 (mod 2)

Ready to Play?

SLIDE 11

x = (x_1, x_2, x_3, x_4, x_5) is unknown

a_6 = (0, 0, 1, 1, 0): x_3 + x_4 = b_6 (mod 2)

Ready to Play?

SLIDE 12

x = (x_1, x_2, x_3, x_4, x_5) is unknown

a_7 = (0, 1, 0, 1, 1): x_2 + x_4 + x_5 = b_7 (mod 2)

Ready to Play?

SLIDE 13

x = (x_1, x_2, x_3, x_4, x_5) is unknown

a_8 = (1, 0, 0, 0, 1): x_1 + x_5 = b_8 (mod 2)

Ready to Play?

SLIDE 14

x = (x_1, x_2, x_3, x_4, x_5) is unknown

a_9 = (1, 1, 1, 1, 0): x_1 + x_2 + x_3 + x_4 = b_9 (mod 2)

Ready to Play?

SLIDE 15

x = (x_1, x_2, x_3, x_4, x_5) is unknown

a_10 = (0, 1, 1, 1, 1): x_2 + x_3 + x_4 + x_5 = b_10 (mod 2)

Ready to Play?

SLIDE 16

x = (x_1, x_2, x_3, x_4, x_5) is unknown

a_11 = (0, 0, 0, 0, 0), b_11 = 0: 0 = 0 (mod 2)

Ready to Play?

SLIDE 17

Parity Learning: x ∈_R {0,1}^n is unknown. A learner gets a stream of samples (a_1, b_1), (a_2, b_2), …, where for every t: a_t ∈_R {0,1}^n and b_t = a_t · x (mod 2), and needs to solve the equations and find x.

SLIDE 18

Parity Learning: x ∈_R {0,1}^n is unknown. A learner gets a stream of samples (a_1, b_1), (a_2, b_2), …, where for every t: a_t ∈_R {0,1}^n and b_t = a_t · x (mod 2), and needs to solve the equations and find x.

By solving linear equations: O(n) samples, O(n^2) memory bits.

SLIDE 19

Parity Learning: x ∈_R {0,1}^n is unknown. A learner gets a stream of samples (a_1, b_1), (a_2, b_2), …, where for every t: a_t ∈_R {0,1}^n and b_t = a_t · x (mod 2), and needs to solve the equations and find x.

By solving linear equations: O(n) samples, O(n^2) memory bits.
By trying all possibilities: O(n) memory bits, an exponential number of samples.
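
The two baselines above can be sketched as follows (my code, assuming a `stream` that yields samples as on SLIDE 4): the first keeps a row-reduced system, Θ(n^2) memory bits and O(n) equations; the second keeps only a candidate and a counter, O(n) memory bits and roughly 2^n samples.

```python
# Sketch (mine) of the two baseline learners from this slide.

def learn_by_elimination(stream, n):
    """~O(n) samples, O(n^2) memory bits: keep a row-reduced basis over GF(2)."""
    basis = {}                                   # pivot bit -> (row_mask, rhs)
    while len(basis) < n:
        a, b = next(stream)
        row = sum(bit << i for i, bit in enumerate(a))
        for p in sorted(basis, reverse=True):    # reduce by stored pivot rows
            if (row >> p) & 1:
                r, c = basis[p]
                row ^= r
                b ^= c
        if row:                                  # new independent equation
            basis[row.bit_length() - 1] = (row, b)
    x = [0] * n
    for p in sorted(basis):                      # forward substitution
        r, c = basis[p]
        below = sum(x[i] << i for i in range(p))
        x[p] = c ^ (bin(r & below).count("1") & 1)
    return x

def learn_by_enumeration(stream, n, checks=64):
    """O(n) memory bits, ~2^n samples: test one candidate at a time.
    A wrong candidate survives each random equation with probability 1/2."""
    for cand in range(1 << n):                   # memory: candidate + counter
        if all(bin(cand & sum(bit << i for i, bit in enumerate(a))).count("1") & 1 == b
               for a, b in (next(stream) for _ in range(checks))):
            return [(cand >> i) & 1 for i in range(n)]
```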

SLIDE 20

Parity Learning: x ∈_R {0,1}^n is unknown. A learner gets a stream of samples (a_1, b_1), (a_2, b_2), …, where for every t: a_t ∈_R {0,1}^n and b_t = a_t · x (mod 2), and needs to solve the equations and find x.

SLIDE 21

Parity Learning: x ∈_R {0,1}^n is unknown. A learner gets a stream of samples (a_1, b_1), (a_2, b_2), …, where for every t: a_t ∈_R {0,1}^n and b_t = a_t · x (mod 2), and needs to solve the equations and find x.

[R 2016]: Any algorithm for parity learning requires either Ω(n^2) memory bits or an exponential number of samples. (Conjectured by Steinhardt, Valiant and Wager [2015].)

SLIDE 22

Parity Learning: x ∈_R {0,1}^n is unknown. A learner gets a stream of samples (a_1, b_1), (a_2, b_2), …, where for every t: a_t ∈_R {0,1}^n and b_t = a_t · x (mod 2), and needs to solve the equations and find x.

[R 2016]: Any algorithm for parity learning requires either Ω(n^2) memory bits or an exponential number of samples. (Conjectured by Steinhardt, Valiant and Wager [2015].)

Previously: no lower bound on the number of samples was known, even if the memory size is n (for any learning problem).

(For memory of size < n, it is relatively easy to prove lower bounds, since inner product is a good two-source extractor.)

SLIDE 23

Parity Learning: x ∈_R {0,1}^n is unknown. A learner gets a stream of samples (a_1, b_1), (a_2, b_2), …, where for every t: a_t ∈_R {0,1}^n and b_t = a_t · x (mod 2), and needs to solve the equations and find x.

[R 2016]: Any algorithm for parity learning requires either Ω(n^2) memory bits or an exponential number of samples. (Conjectured by Steinhardt, Valiant and Wager [2015].)

Previously: no lower bound on the number of samples was known, even if the memory size is n (for any learning problem).

I will focus on super-linear lower bounds on the memory size.

SLIDE 24

Parity Learning: x ∈_R {0,1}^n is unknown. A learner gets a stream of samples (a_1, b_1), (a_2, b_2), …, where for every t: a_t ∈_R {0,1}^n and b_t = a_t · x (mod 2), and needs to solve the equations and find x.

[R 2016]: Any algorithm for parity learning requires either Ω(n^2) memory bits or an exponential number of samples. (Conjectured by Steinhardt, Valiant and Wager [2015].)

SLIDE 25

Parity Learning: x ∈_R {0,1}^n is unknown. A learner gets a stream of samples (a_1, b_1), (a_2, b_2), …, where for every t: a_t ∈_R {0,1}^n and b_t = a_t · x (mod 2), and needs to solve the equations and find x.

[R 2016]: Any algorithm for parity learning requires either Ω(n^2) memory bits or an exponential number of samples. (Conjectured by Steinhardt, Valiant and Wager [2015].)

Best upper bound on the memory size: ≈ n^2/4 (when the number of samples is sub-exponential).

SLIDE 26

Motivation: Machine Learning Theory: For some online learning problems, access to a relatively large memory is crucial. In some cases, learning is infeasible due to memory constraints (if each sample is viewed only once).

SLIDE 27

Motivation: Machine Learning Theory: For some online learning problems, access to a relatively large memory is crucial. In some cases, learning is infeasible due to memory constraints (if each sample is viewed only once).

It is very interesting to understand how much memory is needed for learning. Our result gives a concept class that can be efficiently learnt if and only if the learner has a quadratic-size memory.

SLIDE 28

Motivation: Machine Learning Theory: For some online learning problems, access to a relatively large memory is crucial. In some cases, learning is infeasible due to memory constraints (if each sample is viewed only once).

It is very interesting to understand how much memory is needed for learning. Our result gives a concept class that can be efficiently learnt if and only if the learner has a quadratic-size memory.

"Good" memory may be crucial in learning processes.

SLIDE 29

Example: Neural Networks: Many learning algorithms try to learn a concept by modeling it as a neural network. The algorithm keeps some neural network in memory and updates its weights when new samples arrive. The memory used is the size of the network.
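
As a concrete illustration (my sketch, not from the talk), here is an online learner whose entire memory is the parameter set of a one-hidden-layer network; each sample is used for a single gradient update and then discarded. `OnlineNet` is a hypothetical name.

```python
# Sketch: an online learner whose memory is exactly the network weights.
import numpy as np

class OnlineNet:
    def __init__(self, n, hidden, seed=0):
        rng = np.random.default_rng(seed)
        # the learner's whole memory: (hidden*n + hidden) floats
        self.W = rng.normal(0.0, 0.1, (hidden, n))
        self.v = rng.normal(0.0, 0.1, hidden)

    def update(self, a, b, lr=0.1):
        """One SGD step on squared loss; the sample (a, b) is then forgotten."""
        a = np.asarray(a, dtype=float)
        h = np.tanh(self.W @ a)                  # forward pass
        err = self.v @ h - b
        grad_v = err * h                          # gradients computed first,
        grad_W = np.outer(err * self.v * (1.0 - h ** 2), a)
        self.v -= lr * grad_v                     # then both weights updated
        self.W -= lr * grad_W
```

For parity learning, the results below say that any such learner needs Ω(n^2) bits of memory unless it sees exponentially many samples, regardless of how the updates are chosen.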

SLIDE 30

Example: Neural Networks: Many learning algorithms try to learn a concept by modeling it as a neural network. The algorithm keeps some neural network in memory and updates its weights when new samples arrive. The memory used is the size of the network.

Conclusion: Such algorithms cannot learn certain concept classes if each sample is viewed only once and the size of the neural network is not sufficiently large.

For example, for learning parities, the memory size must be quadratic.

SLIDE 31

Motivation: Bounded Storage Cryptography [Maurer 92]: [Maurer, CM, AR, ADR, Vadhan, DM, …] In the bounded storage model, Alice and Bob want to interact securely, in the presence of an adversary with bounded memory size.

SLIDE 32

Applications to Bounded Storage Cryptography:
[R16]: Secret-key encryption/decryption protocol.
[Guan-Zhandry 2019]: Key agreement, bit commitment, oblivious transfer.
(All these protocols have advantages over previous works.)

SLIDE 33

Motivation: Complexity Theory: Time-space lower bounds for computing a function g(x_1, …, x_n) have been studied for a long time, in various models [BJS 98, Ajt 99, BSSV 00, For 97, FLvMV 05, Wil 06, …].

SLIDE 34

Motivation: Complexity Theory: Time-space lower bounds for computing a function g(x_1, …, x_n) have been studied for a long time, in various models [BJS 98, Ajt 99, BSSV 00, For 97, FLvMV 05, Wil 06, …].

These results proved lower bounds of at most n^{1+ε} on the time needed, under space constraints. How come we prove exponential bounds on the time, under space constraints?

SLIDE 35

Motivation: Complexity Theory: Time-space lower bounds for computing a function g(x_1, …, x_n) have been studied for a long time, in various models [BJS 98, Ajt 99, BSSV 00, For 97, FLvMV 05, Wil 06, …].

These results proved lower bounds of at most n^{1+ε} on the time needed, under space constraints. How come we prove exponential bounds on the time, under space constraints? The models are different.

SLIDE 36

Motivation: Complexity Theory: Time-space lower bounds for computing a function g(x_1, …, x_n) have been studied for a long time, in various models [BJS 98, Ajt 99, BSSV 00, For 97, FLvMV 05, Wil 06, …].

These results proved lower bounds of at most n^{1+ε} on the time needed, under space constraints. How come we prove exponential bounds on the time, under space constraints? The models are different:

Lower bounds for computing g(x_1, …, x_n) assume that x_1, …, x_n can always be accessed (the inputs are stored for free). In our case, after the learner sees (a_t, b_t), the sample (a_t, b_t) cannot be accessed again (unless stored in memory).

SLIDE 37

Motivation: Pseudorandomness: An alternative way to construct pseudorandom generators for log-space, using polylog(n) truly-random bits (doesn't match the best known generators, which use only O(log^2 n) bits [Nisan 90, INW 94, GR 14]).

SLIDE 38

[Kol-R-Tal 2017]: Learning sparse parities requires either super-linear memory size or a super-polynomial number of samples.

Sparsity-k parity learning: Same as parity learning, but it is known in advance that at most k of the variables x_1, …, x_n are 1.

SLIDE 39

[Kol-R-Tal 2017]: Learning sparse parities requires either super-linear memory size or a super-polynomial number of samples

SLIDE 40

[Kol-R-Tal 2017]: Learning sparse parities requires either super-linear memory size or a super-polynomial number of samples.

For sparsity k < n^{0.99}: Any algorithm requires
1) Ω(n · k) memory bits or 2^{Ω(k)} samples
2) Ω(n · k^{0.99}) memory bits or k^{Ω(k)} samples

SLIDE 41

[Kol-R-Tal 2017]: Learning sparse parities requires either super-linear memory size or a super-polynomial number of samples.

For sparsity k < n^{0.99}: Any algorithm requires
1) Ω(n · k) memory bits or 2^{Ω(k)} samples
2) Ω(n · k^{0.99}) memory bits or k^{Ω(k)} samples

Conclusion: Learning log(n)-sparse parities, linear-size DNFs, linear-size CNFs, linear-size decision trees, and log(n)-size juntas requires super-linear memory size or a super-polynomial number of samples.

SLIDE 42

[R 17, MM 18, BOGY 18, GRT 18]: For a large class of learning problems, any learning algorithm requires quadratic memory size or an exponential number of samples.

[BOGY 18, GRT 18] build on [R 17]. [MM 18] builds on an earlier paper [MM 17] that obtained a similar result, but with a linear lower bound of 1.25 · n on the memory size.

SLIDE 43

[R 17, MM 18, BOGY 18, GRT 18]: For a large class of learning problems, any learning algorithm requires quadratic memory size or an exponential number of samples

SLIDE 44

[R 17, MM 18, BOGY 18, GRT 18]: For a large class of learning problems, any learning algorithm requires quadratic memory size or an exponential number of samples.

I will focus on [R 17, GRT 18]. Additional follow-up works (building on [R 17]) give memory-sample lower bounds for: linear regression with small error [SSV 19]; two-pass learning [GRT 19].

SLIDE 45

A Learning Problem as a Matrix: A, X: finite sets. M: A × X → {-1, 1}: a matrix. x ∈_R X is unknown. A learner tries to learn x from a stream (a_1, b_1), (a_2, b_2), …, where for every t: a_t ∈_R A and b_t = M(a_t, x).

X: the concept class (= {0,1}^n in parity learning). A: the possible samples (= {0,1}^n in parity learning).
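
A sketch (mine) of this matrix view, instantiated for parity learning with the standard ±1 encoding M(a, x) = (-1)^{a·x}:

```python
# The learning problem as a matrix M: A x X -> {-1, 1}; for parity learning
# A = X = {0,1}^n and M(a, x) = (-1)^(a.x mod 2).
import itertools
import random

def M(a, x):
    return (-1) ** (sum(ai * xi for ai, xi in zip(a, x)) % 2)

n = 3
X = list(itertools.product([0, 1], repeat=n))    # the concept class
A = X                                            # the possible samples
matrix = [[M(a, x) for x in X] for a in A]       # the full matrix

def sample(x):
    """One sample of the stream: (a_t, b_t) with b_t = M(a_t, x)."""
    a = random.choice(A)
    return a, M(a, x)
```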

SLIDE 46

Theorem [R 17], [Garg-R-Tal 18]: Assume that any submatrix of M of fraction 2^{-k} × 2^{-ℓ} has bias of at most 2^{-r}. Then, any learning algorithm requires either Ω(k · ℓ) memory bits or 2^{Ω(r)} samples.

In particular, for large classes of learning problems, any learning algorithm requires either memory of size Ω((log |A|) · (log |X|)) or an exponential number of samples.

(A new general proof technique; implies all previous results.) (A related result by Beame, Oveis Gharan, Yang, building on [R 17].)
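
The theorem's condition bounds the bias of every sufficiently large submatrix; the toy code below (mine) only spot-checks random submatrices of the parity matrix at the stated density, as an illustration rather than a verification.

```python
# Estimate the bias of random 2^-k x 2^-l submatrices of the parity matrix.
import itertools
import random

def M(a, x):
    return (-1) ** (sum(ai * xi for ai, xi in zip(a, x)) % 2)

n = 8
vecs = list(itertools.product([0, 1], repeat=n))
mat = [[M(a, x) for x in vecs] for a in vecs]

def bias(rows, cols):
    s = sum(mat[i][j] for i in rows for j in cols)
    return abs(s) / (len(rows) * len(cols))

random.seed(1)
k = l = 2                                        # density 2^-k x 2^-l
size = len(vecs) >> k
worst = max(bias(random.sample(range(len(vecs)), size),
                 random.sample(range(len(vecs)), size))
            for _ in range(20))
print(f"largest bias over 20 random submatrices: {worst:.3f}")
```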

SLIDE 47

Applications (examples): Learning from low-degree equations: A learner tries to learn x = (x_1, …, x_n) ∈_R {0,1}^n from random multilinear polynomial equations of degree at most d (over F_2): requires Ω(n^{d+1}) memory or 2^{Ω(n)} samples.
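
A sketch (mine) of this sample distribution: each sample is a uniformly random multilinear polynomial p of degree at most d over F_2, together with its evaluation p(x).

```python
# One sample: a random multilinear polynomial of degree <= d over F_2,
# given by its coefficients (one per monomial, i.e. per subset of size <= d),
# together with its value at the secret x.
import itertools
import random

def low_degree_sample(x, d):
    n = len(x)
    monomials = [s for k in range(d + 1)
                 for s in itertools.combinations(range(n), k)]
    coeffs = {s: random.randrange(2) for s in monomials}
    value = sum(c for s, c in coeffs.items() if all(x[i] for i in s)) % 2
    return coeffs, value

# Note: a degree-<=d multilinear polynomial has ~O(n^d) coefficients, so the
# Omega(n^{d+1}) memory bound is roughly n times the sample's description length.
```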

SLIDE 48

Applications (examples): Learning from low-degree equations: A learner tries to learn x = (x_1, …, x_n) ∈_R {0,1}^n from random multilinear polynomial equations of degree at most d (over F_2): requires Ω(n^{d+1}) memory or 2^{Ω(n)} samples.

Low-degree polynomials: A learner tries to learn an n-variate multilinear polynomial of degree d over F_2, from random evaluations: requires Ω(n^{d+1}) memory or 2^{Ω(n)} samples.

SLIDE 49

Applications (examples): Learning from low-degree equations: A learner tries to learn x = (x_1, …, x_n) ∈_R {0,1}^n from random multilinear polynomial equations of degree at most d (over F_2): requires Ω(n^{d+1}) memory or 2^{Ω(n)} samples.

Low-degree polynomials: A learner tries to learn an n-variate multilinear polynomial of degree d over F_2, from random evaluations: requires Ω(n^{d+1}) memory or 2^{Ω(n)} samples.

Error correcting codes… Random matrices…

SLIDE 50

Branching Program (length m, width d): (for parity learning) Each layer represents a time step. Each vertex represents a memory state of the learner. Each non-leaf vertex has 2^{n+1} outgoing edges, one for each (a, b) ∈ {0,1}^n × {-1, 1}.

[Figure: a layered branching program of length m and width d, with edges labeled by samples (a, b).]

SLIDE 51

Branching Program (length m, width d): (for parity learning) The samples (a_1, b_1), …, (a_m, b_m) define a computation-path. Each vertex v in the last layer is labeled by some x_v ∈ {0,1}^n. The output is the label x_v of the vertex reached by the path.

SLIDE 52

Branching Program (length m, width d): (for parity learning) Example: BP for parity learning: Any BP with width d ≤ 2^{εn^2} and length m ≤ 2^{εn}, for a sufficiently small constant ε > 0, outputs the correct x with exponentially small probability.

SLIDE 53

Proof Outline (for parity): [R17]. The same proof technique was used in several follow-up works: [BOGY18, GRT18, SSV19, GRT19].

SLIDE 54

Interesting Idea in the Proof: (very high level)

Significant vertices: v s.t. conditioned on the event that the computation-path reaches v, x can be guessed with non-negligible probability. Pr(v) = the probability that the computation-path reaches v.

We want to prove: If v is significant, Pr(v) ≤ 2^{-Ω(n^2)}. Hence, since a successful learner must reach a significant vertex with high probability, there are at least 2^{Ω(n^2)} significant vertices.

C = the event that some "atypical" things happen. We show that Pr(C) ≤ 2^{-Ω(n)} (but much larger than 2^{-Ω(n^2)}). We show that if v is significant, Pr(v | ¬C) ≤ 2^{-Ω(n^2)}.

SLIDE 55

Proof Outline:

U = same as the computation-path, but stops when "atypical" things happen (stopping rules). All definitions are with respect to U. Pr(U stops) is exponentially small (but much larger than 2^{-Ω(n^2)}).

P_{x|v} = the distribution of x, conditioned on the event that the path U reaches v.

Significant vertices: v s.t. ||P_{x|v}||_2 ≥ 2^{εn} · 2^{-n}. Pr(v) = the probability that the path U reaches v.

We prove: If v is significant, Pr(v) ≤ 2^{-Ω(n^2)}. Hence, there are at least 2^{Ω(n^2)} significant vertices.
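
For tiny n one can compute P_{x|v} exactly by Bayes' rule and watch its l_2-norm grow as equations accumulate; a toy sketch (mine), using the expectation norm ||P||_2 = (E_x P(x)^2)^{1/2} as in the slides:

```python
# Track P_{x|v}: the distribution of x given the samples seen so far.
import itertools
import math
import random

n = 4
xs = list(itertools.product([0, 1], repeat=n))
P = {x: 1.0 / len(xs) for x in xs}               # start vertex: uniform

def update(P, a, b):
    """Condition on a new equation a.x = b (mod 2)."""
    keep = {x: p for x, p in P.items()
            if sum(ai * xi for ai, xi in zip(a, x)) % 2 == b}
    total = sum(keep.values())
    return {x: p / total for x, p in keep.items()}

def l2(P):
    return math.sqrt(sum(p * p for p in P.values()) / len(xs))

secret = tuple(random.randrange(2) for _ in range(n))
print(l2(P))                                     # 2^-n: far below significance
for _ in range(3):
    a = [random.randrange(2) for _ in range(n)]
    b = sum(ai * xi for ai, xi in zip(a, secret)) % 2
    P = update(P, a, b)
    print(l2(P))          # grows by ~sqrt(2) per independent equation
```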

SLIDE 56

Proof Outline:

If s is significant, Pr(s) ≤ 2^{-Ω(n^2)}.

Progress Function: For layer L_i, define Z_i = Σ_{v ∈ L_i} Pr(v) · ⟨P_{x|v}, P_{x|s}⟩^{εn}.

1) Z_1 = 2^{-2εn^2}
2) Z_i is very slowly growing: Z_1 ≈ Z_m
3) If s ∈ L_m, then Z_m ≥ Pr(s) · 2^{(εn)^2} · 2^{-2εn^2}

Hence: If s is significant, Pr(s) ≤ 2^{-(εn)^2} = 2^{-Ω(n^2)}. (The hard step is step 2.)

SLIDE 57

How we prove that Z_i is very slowly growing:

Z_i = Σ_{v ∈ L_i} Pr(v) · ⟨P_{x|v}, P_{x|s}⟩^{εn}

Z_{i+1} = Σ_{v ∈ L_{i+1}} Pr(v) · ⟨P_{x|v}, P_{x|s}⟩^{εn}

Z′ = Σ_{e: L_i → L_{i+1}} Pr(e) · ⟨P_{x|e}, P_{x|s}⟩^{εn}  (summing over the edges e from layer i to layer i+1)

By a simple convexity argument, Z_{i+1} ≤ Z′. The hard part is to show that Z′ is only negligibly larger than Z_i.

SLIDE 58

How we prove that Z′ is only negligibly larger than Z_i:

Z_i = Σ_{v ∈ L_i} Pr(v) · ⟨P_{x|v}, P_{x|s}⟩^{εn}

Z′ = Σ_{e: L_i → L_{i+1}} Pr(e) · ⟨P_{x|e}, P_{x|s}⟩^{εn}

We show that, on average over v, Σ_{e: v → L_{i+1}} Pr(e) · ⟨P_{x|e}, P_{x|s}⟩^{εn} is only negligibly larger than Pr(v) · ⟨P_{x|v}, P_{x|s}⟩^{εn}.

SLIDE 59

How we prove that, on average, Σ_{e: v → L_{i+1}} Pr(e) · ⟨P_{x|e}, P_{x|s}⟩^{εn} is only negligibly larger than Pr(v) · ⟨P_{x|v}, P_{x|s}⟩^{εn}:

Roughly speaking, for parity: the inner products ⟨P_{x|e}, P_{x|s}⟩ are close, up to a normalization, to the Fourier coefficients of P_{x|v} · P_{x|s}. By introducing stopping rules for the path U, we are able to bound the l_2-norm of P_{x|s} and the l_∞-norm of P_{x|v}, and hence the l_2-norm of P_{x|v} · P_{x|s}, so that the Fourier coefficients of P_{x|v} · P_{x|s} are small on average. Another stopping rule ensures that the normalization doesn't distort by much.
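
The Fourier coefficients here are over {0,1}^n: fhat(a) = E_x[f(x) · (-1)^{a·x}]. A small sketch (mine) of that transform; conditioning on a new equation a·x = b reweights P_{x|v} by a ±1 character, which is why the coefficient at a appears:

```python
# Fourier coefficient of f: {0,1}^n -> R at frequency a:
#   fhat(a) = E_x[ f(x) * (-1)^(a.x) ].
import itertools

def fourier_coeff(f, a, xs):
    return sum(f[x] * (-1) ** (sum(ai * xi for ai, xi in zip(a, x)) % 2)
               for x in xs) / len(xs)

n = 3
xs = list(itertools.product([0, 1], repeat=n))
f = {x: 1.0 for x in xs}                 # the constant-1 function
print(fourier_coeff(f, (0, 0, 0), xs))   # 1.0: all weight at frequency 0
print(fourier_coeff(f, (1, 0, 1), xs))   # 0.0: no other frequencies
```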

SLIDE 60

Stopping Rules:

Stop on a vertex v if:
1) ||P_{x|v}||_2 is large
2) P_{x|v}(x) is large
3) The next edge corresponds to a large Fourier coefficient of P_{x|v}

The stopping rules do not depend on s. They do depend on x.

SLIDE 61

Summary: For a large class of learning problems, any learning algorithm requires either super-linear memory size or a super-polynomial number of samples.

Main Message: For some learning problems, access to a relatively large memory is crucial. In other words, in some cases, learning is infeasible due to memory constraints.

[S14, SVW16, R16, VV16, KRT17, MM17, R17, MM18, BOGY18, GRT18, DS18, AS18, DKS19, SSV19, GRT19, GKR19]

SLIDE 62

Thank You!