Limits on Representing Functions by Linear Combinations of Simple Functions



  1. Limits on Representing Functions by Linear Combinations of Simple Functions
     Ryan Williams, MIT
     [Figure: can $f : \{0,1\}^n \to \{0,1\}$ be written as a sum ($\sum$) of "simple" functions?]

  2. The $\mathbb{R}$-linear Representation Problem
     Let $\mathcal{C}$ be a class of "simple" functions (they take Boolean inputs, but need not be Boolean-valued).
     Which "interesting" functions $f$ can(not) be represented by "short" $\mathbb{R}$-linear combinations of functions from $\mathcal{C}$?
     [Figure: $f : \{0,1\}^n \to \{0,1\}$ $\equiv$? a $\sum$ gate of poly($n$) "size" over simple functions, with real weights such as $-\pi$, $2$, $\varphi$, $-e$.]
     Call this a $\sum \circ \mathcal{C}$ circuit.
     Note: If $\mathcal{C}$ spans the vector space of all functions $f : \{0,1\}^n \to \mathbb{R}$, then there is always a $\sum \circ \mathcal{C}$ circuit of size $\le 2^n$…

  3. The $\mathbb{R}$-linear Representation Problem
     Which "interesting" functions $f$ can(not) be represented by "short" $\mathbb{R}$-linear combinations of functions from $\mathcal{C}$?
     If $\mathcal{C}$ is the class of $2^n$ AND functions on $n$ variables: $\sum \circ \mathrm{AND} \equiv$ 0/1 polynomials over $\mathbb{R}$.
     If $\mathcal{C}$ is the class of $2^n$ PARITY functions on $n$ variables: $\sum \circ \mathrm{PARITY} \equiv$ $-1$/$1$ polynomials over $\mathbb{R}$ (Fourier analysis of Boolean functions).
     These are well-understood: $\mathcal{C}$ is a basis for the vector space of functions $f : \{0,1\}^n \to \mathbb{R}$ $\Rightarrow$ the $\mathbb{R}$-linear representation of $f$ is unique, so the "shortest" is also the "longest"…
     More interesting cases: representations are not unique.
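
To make the uniqueness point concrete, here is a minimal sketch (not from the slides) that computes the unique AND-basis representation of a Boolean function by Möbius inversion and counts its nonzero terms. The helper name `and_basis_coefficients` and the choice $n = 4$ are illustrative assumptions.

```python
from itertools import product

def and_basis_coefficients(f, n):
    """Return {S: c_S} such that f(x) = sum_S c_S * AND_{i in S} x_i on {0,1}^n.

    Hypothetical helper for illustration; brute force, about 4^n steps.
    """
    coeffs = {}
    for x in product((0, 1), repeat=n):
        S = frozenset(i for i in range(n) if x[i])
        # Moebius inversion: c_S = sum over T subset of S of (-1)^(|S|-|T|) * f(1_T)
        c = 0
        for y in product((0, 1), repeat=n):
            if all(y[i] <= x[i] for i in range(n)):
                c += (-1) ** (sum(x) - sum(y)) * f(y)
        if c != 0:
            coeffs[S] = c
    return coeffs

def and_fn(x):
    return int(all(x))

def parity_fn(x):
    return sum(x) % 2

n = 4
and_coeffs = and_basis_coefficients(and_fn, n)
parity_coeffs = and_basis_coefficients(parity_fn, n)
and_size = len(and_coeffs)        # AND is a single basis function
parity_size = len(parity_coeffs)  # PARITY uses every nonempty subset
```

Because the AND functions form a basis, there is no shorter alternative to search for: the size of the representation is whatever the inversion returns.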

  4. This Paper: Three Simple Classes
     1. Linear Threshold Functions [$\mathrm{LTF}$]
     2. Rectified Linear Units [$\mathrm{ReLU}$]
     3. GF($p$)-Polynomials of Degree $d$ [$\mathrm{POLY}_d[p]$] ($p$ prime and $d \ge 2$)
     For all three classes:
     • There are $\gg 2^n$ functions on $n$ variables, so $\mathbb{R}$-linear representations are not unique: $2^{\Theta(n^2)}$ LTFs, $p^{\Theta(n^d)}$ degree-$d$ polys, $\infty$ ReLU functions.
     • $\mathbb{R}$-linear representations have been studied!
       $\sum \circ \mathrm{LTF}$ = special case of depth-2 threshold circuits
       $\sum \circ \mathrm{ReLU}$ = "Depth-2 Neural Net with ReLU activation"
       $\sum \circ \mathrm{POLY}_d[p]$ = "Higher-Order" Fourier analysis for $d \ge 2$

  5. Sums of Linear Threshold Functions
     Def. $f : \{0,1\}^n \to \{0,1\}$ is an LTF if $\exists\, w_1, \ldots, w_n, t \in \mathbb{R}$ such that $\forall (x_1, \ldots, x_n) \in \{0,1\}^n$, $f(x_1, \ldots, x_n) = 1 \Leftrightarrow \sum_i w_i x_i \ge t$.
     Depth-Two LTF Circuits ($\mathrm{LTF} \circ \mathrm{LTF}$): Major problem to find "nice" functions without $n^k$-gate $\mathrm{LTF} \circ \mathrm{LTF}$ circuits, for all $k$. [Hajnal et al. '91] exp($n$) depth-two lower bounds for small $w_i$'s [Roychowdhury-Orlitsky-Siu '94]
     What about $\sum \circ \mathrm{LTF}$? It is a special case of $\mathrm{LTF} \circ \mathrm{LTF}$: the linear form for the output LTF must always evaluate to 0 or 1.
     Still, no $n^{1.5}$-gate lower bounds were known for $\sum \circ \mathrm{LTF}$!
     We prove:
     Thm. $\forall k$, $\exists f_k \in \mathrm{NP}$ without $n^k$-size $\sum \circ \mathrm{LTF}$.
     Thm. $\exists f \in \mathrm{NTIME}[n^{\log^* n}]$ without poly($n$)-size $\sum \circ \mathrm{LTF}$.
     Note: It is a major open problem to prove $\exists f \in \mathrm{NP}$ without $n^k$-size (unrestricted) circuits.
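
As an illustration of what a $\sum \circ \mathrm{LTF}$ circuit looks like, the sketch below builds LTFs from weights and a threshold and writes PARITY as an alternating sum of $n$ threshold functions, $\mathrm{PARITY}(x) = \sum_{j=1}^{n} (-1)^{j+1}\,[x_1 + \cdots + x_n \ge j]$. This is a standard construction chosen for the example, not necessarily the one the author has in mind; the function names are hypothetical.

```python
from itertools import product

def ltf(weights, threshold):
    """LTF: outputs 1 exactly when sum_i w_i * x_i >= threshold."""
    return lambda x: int(sum(w * xi for w, xi in zip(weights, x)) >= threshold)

def parity_as_sum_of_ltfs(n):
    """Return a list of (coefficient, LTF) pairs whose weighted sum is PARITY."""
    # If s = x_1 + ... + x_n, gates j = 1..s fire, and 1 - 1 + 1 - ... (s terms) is s mod 2.
    return [((-1) ** (j + 1), ltf([1] * n, j)) for j in range(1, n + 1)]

def evaluate(circuit, x):
    """Evaluate a Sum-of-LTFs circuit: a real-weighted sum of gate outputs."""
    return sum(c * g(x) for c, g in circuit)

n = 5
circuit = parity_as_sum_of_ltfs(n)
# The linear form must equal the Boolean function exactly on every input.
exact = all(evaluate(circuit, x) == sum(x) % 2 for x in product((0, 1), repeat=n))
```

Note that the output is the sum itself, with no final threshold: this exactness requirement is what separates $\sum \circ \mathrm{LTF}$ from general $\mathrm{LTF} \circ \mathrm{LTF}$.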

  6. Sums of ReLUs
     Def. $f : \mathbb{R}^n \to \mathbb{R}^+$ is a ReLU if $\exists\, w_1, \ldots, w_n, t \in \mathbb{R}$ such that $\forall (x_1, \ldots, x_n) \in \mathbb{R}^n$, $f(x_1, \ldots, x_n) = \max(0, \sum_i w_i x_i + t)$.
     $\sum \circ \mathrm{ReLU}$ generalizes $\sum \circ \mathrm{LTF}$.
     $\sum \circ \mathrm{ReLU}$ = "Depth-Two Neural Nets with ReLU Activations". Very widely studied, thousands of references.
     Several recent references [see paper] give lower bounds for some "weird" $f : \mathbb{R}^n \to \mathbb{R}$ which vary sharply / are sensitive.
     No lower bounds were known for discrete-domain / Boolean functions (note: the "most sensitive" Boolean function PARITY has $O(n)$-size $\sum \circ \mathrm{LTF}$).
     We can generalize the $\sum \circ \mathrm{LTF}$ limits to $\sum \circ \mathrm{ReLU}$:
     Thm. $\forall k$, $\exists f_k \in \mathrm{NP}$ without $n^k$-size $\sum \circ \mathrm{ReLU}$.
     Thm. $\exists f \in \mathrm{NTIME}[n^{\log^* n}]$ without poly($n$)-size $\sum \circ \mathrm{ReLU}$.
     Again: major open problem to prove $\exists f \in \mathrm{NP}$ without $n^k$-size (unrestricted) circuits.
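
One way to see why sums of ReLUs can simulate sums of LTFs on Boolean inputs: when the weights and threshold are integers, $z = \sum_i w_i x_i$ is an integer, and $[z \ge t] = \max(0, z - t + 1) - \max(0, z - t)$. The sketch below checks this identity by brute force. The integer-weight restriction is an assumption made for the example (the slides do not spell out the simulation), and the names are hypothetical.

```python
from itertools import product

def relu(z):
    return max(0, z)

def ltf(weights, t):
    """LTF: 1 exactly when sum_i w_i * x_i >= t."""
    return lambda x: int(sum(w * xi for w, xi in zip(weights, x)) >= t)

def ltf_as_two_relus(weights, t):
    """Difference of two ReLUs that agrees with the LTF when w and t are integers."""
    def g(x):
        z = sum(w * xi for w, xi in zip(weights, x))
        # z >= t     : (z - t + 1) - (z - t) = 1
        # z <= t - 1 : 0 - 0 = 0
        return relu(z - t + 1) - relu(z - t)
    return g

weights, t = [3, -2, 5, 1], 4  # illustrative integer weights and threshold
f = ltf(weights, t)
g = ltf_as_two_relus(weights, t)
agree = all(f(x) == g(x) for x in product((0, 1), repeat=len(weights)))
```

Under this assumption each LTF gate costs two ReLU gates, so the circuit size grows by at most a factor of two.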

  7. Sums of Low-Degree GF($p$)-Polys
     $\sum \circ \mathrm{POLY}_d[p]$: linear combination of functions $f : \{0,1\}^n \to \{0, 1, \ldots, p-1\}$, where for every $f$ there is a degree-$d$ polynomial $q(x)$ such that $\forall x \in \{0,1\}^n$, $f(x) = q(x) \bmod p$.
     The case of $d = 2$, $p = 2$ is already very interesting!
     Compelling Conjecture ["Degree-Two Uncertainty Principle"]: AND (on $n$ inputs) requires $n^{\omega(1)}$-size $\sum \circ \mathrm{POLY}_2[2]$.
     Known: AND requires $\Omega(2^n)$-size $\sum \circ \mathrm{POLY}_1[2]$; AND has $O(2^{n/2})$-size $\sum \circ \mathrm{POLY}_2[2]$.
     No non-trivial lower bounds were known for $\sum \circ \mathrm{POLY}_2[p]$.
     We prove:
     Thm. $\forall d, k$, $\forall p$ prime, $\exists f_k \in \mathrm{NP}$ without $n^k$-size $\sum \circ \mathrm{POLY}_d[p]$.
     Thm. $\exists f \in \mathrm{NTIME}[n^{\log^* n}]$ without poly($n$)-size $\sum \circ \mathrm{POLY}_d[p]$, for all fixed $d$ and fixed prime $p$.

  8. A Key Theorem
     A new instance of "Circuit Analysis Algorithms $\Rightarrow$ Circuit Lower Bounds".
     Key Theorem: Let $\mathcal{C}$ be a class of functions $f : \{0,1\}^n \to \mathbb{R}$.
     Assume: there is an $\varepsilon > 0$ and an algorithm $A$ so that for any given $f_1, \ldots, f_4 \in \mathcal{C}$, $A$ can compute the "sum-product"
     $$\sum_{a \in \{0,1\}^n} \prod_{i=1}^{4} f_i(a)$$
     in $2^{n(1-\varepsilon)}$ time.
     Then: $\forall k$, $\exists f \in \mathrm{NP}$ without $n^k$-size $\sum \circ \mathcal{C}$, and $\exists f \in \mathrm{NTIME}[n^{\log^* n}]$ without poly($n$)-size $\sum \circ \mathcal{C}$.
     Applies the new Easy Witness Lemma of [Murray-W '18].
     We show how to compute sum-products in $2^{n(1-\varepsilon)}$ time for LTFs, ReLUs, and low-degree polynomials.
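
For reference, the sum-product itself is easy to state in code. The sketch below is the trivial brute-force baseline, which enumerates all $2^n$ inputs; the theorem's hypothesis asks for an algorithm that beats this exponentially. The function name and the four example LTFs are illustrative assumptions.

```python
from itertools import product
from math import prod

def sum_product_bruteforce(fs, n):
    """Compute sum over a in {0,1}^n of prod_i f_i(a) by full enumeration (2^n inputs)."""
    return sum(prod(f(a) for f in fs) for a in product((0, 1), repeat=n))

# Four example LTFs on n = 3 inputs (chosen only for illustration).
fs = [
    lambda a: int(sum(a) >= 2),
    lambda a: int(a[0] + a[1] >= 1),
    lambda a: int(a[2] >= 1),
    lambda a: int(a[0] - a[1] >= 0),
]
total = sum_product_bruteforce(fs, 3)
```

For 0/1-valued functions the sum-product counts the inputs on which all four functions output 1; here those are (1,0,1) and (1,1,1).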

  9. Major Ideas in the Key Theorem
     Assume:
     (1) There is a $2^{n(1-\varepsilon)}$-time sum-product algorithm $A$ for $\mathcal{C}$.
     (2) For some fixed $k$, all $f \in \mathrm{NP}$ have $n^k$-size $\sum \circ \mathcal{C}$.
     Goal: Derive a contradiction.
     (1) and (2) $\Rightarrow$ Given an (unrestricted) circuit $T$ with $n$ inputs and $m$ size, we can guess-and-check an $n^k$-size $\sum \circ \mathcal{C}$ computing $T$, in $2^{n(1-\varepsilon)} \cdot m^{O(1)}$ time.
     Note: to guess, we need that the coefficients in our linear combinations have "small" bit complexity, WLOG.
     (1) $\Rightarrow$ Can solve Circuit-UNSAT in nondeterministic $2^{n(1-\varepsilon)} \cdot m^{O(1)}$ time.
     We can even solve #Circuit-SAT, because we can compute $\sum_{a \in \{0,1\}^n} (\sum \circ \mathcal{C})(a) = \sum \sum_{a} \mathcal{C}(a)$ by solving sum-product $n^k$ times.
     [Murray-W '18] $\Rightarrow$ $\forall k$, $\exists f \in \mathrm{NP}$ without $n^k$-size unrestricted circuits.
     Contradicts (2) when $\sum \circ \mathcal{C}$ can be simulated by Boolean circuits!
     The proof crucially relies on $\sum \circ \mathcal{C}$ computing a circuit exactly.

  10. Sum-Product Algorithm for LTF
     Uses the (old) fact that #Subset-Sum is solvable in $\mathrm{poly}(n) \cdot 2^{n/2}$ time!
     Thm [HS '76]. #Subset-Sum on $n$ numbers is in $\mathrm{poly}(n) \cdot 2^{n/2}$ time.
     Proof. Given $w_1, \ldots, w_n, t$, we want to know the number of $S \subseteq [n]$ such that $\sum_{i \in S} w_i = t$.
     1. Enumerate all $2^{n/2}$ subsets $S$ of $\{w_1, \ldots, w_{n/2}\}$. Make a list $L_1$ of the $2^{n/2}$ subset sums, and SORT all sums in $L_1$.
     2. Enumerate all $2^{n/2}$ subsets $T$ of $\{w_{n/2+1}, \ldots, w_n\}$. For each $T$ summing to a value $v$, BINARY SEARCH for a value $v'$ in $L_1$ such that $v + v' = t$.
     3. To compute the total number of subsets summing to $t$: for each sum value $v'$ appearing in $L_1$, store the number $n_{v'}$ of subsets in $L_1$ which have value $v'$. Later, if value $v'$ is found in the binary search, add $n_{v'}$ to a running sum.
     Takes $\mathrm{poly}(n) \cdot 2^{n/2}$ time in total.
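
The three steps above can be sketched as follows. This is a minimal illustration rather than the authors' code: instead of storing a separate multiplicity table, it reads the multiplicity of each value off the sorted list with two binary searches, which gives the same count. All function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import combinations

def subset_sums(nums):
    """All 2^len(nums) subset sums, with multiplicity."""
    sums = [0]
    for w in nums:
        sums += [s + w for s in sums]
    return sums

def count_subset_sum(weights, t):
    """Count subsets S with sum_{i in S} w_i == t by meet-in-the-middle."""
    half = len(weights) // 2
    left = sorted(subset_sums(weights[:half]))   # step 1: sorted list L_1
    total = 0
    for v in subset_sums(weights[half:]):        # step 2: sums of the second half
        need = t - v
        # step 3: multiplicity of `need` in L_1 via two binary searches
        total += bisect_right(left, need) - bisect_left(left, need)
    return total

def count_bruteforce(weights, t):
    """Reference count over all 2^n subsets, used only to check the fast version."""
    idx = range(len(weights))
    return sum(1 for r in range(len(weights) + 1)
               for S in combinations(idx, r)
               if sum(weights[i] for i in S) == t)

ws = [3, 1, 4, 1, 5, 9, 2, 6]
fast = count_subset_sum(ws, 10)
slow = count_bruteforce(ws, 10)
```

Sorting $2^{n/2}$ sums and doing $2^{n/2}$ binary searches each cost about $n \cdot 2^{n/2}$ comparisons, which is where the $\mathrm{poly}(n) \cdot 2^{n/2}$ bound comes from.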

  11. Sum-Product Algorithm for LTF
     Uses the (old) fact that #Subset-Sum is solvable in $\mathrm{poly}(n) \cdot 2^{n/2}$ time!
     Thm. For any $f_1, \ldots, f_4 \in \mathrm{LTF}$, we can compute $\sum_{a \in \{0,1\}^n} \prod_{i=1}^{4} f_i(a)$ in $\mathrm{poly}(n) \cdot 2^{n/2}$ time.
     Proof. An Exact LTF ($\mathrm{ELTF}$) has the form $g(x) = 1 \Leftrightarrow \sum_i w_i x_i = t$.
     #Subset-Sum in $\mathrm{poly}(n) \cdot 2^{n/2}$ time $\Rightarrow$ $\sum_a g(a)$ in $\mathrm{poly}(n) \cdot 2^{n/2}$ time.
     [HP, CCC '10]: Every LTF on $n$ inputs can be written as a sum of poly($n$) ELTFs.
     So we can write, for ELTFs $g_{i,j}$,
     $$\sum_{a \in \{0,1\}^n} \prod_{i=1}^{4} f_i(a) = \sum_{a \in \{0,1\}^n} \prod_{i=1}^{4} \sum_{j=1}^{\mathrm{poly}(n)} g_{i,j}(a).$$
     Simple algebra:
     $$= \sum_{a \in \{0,1\}^n} \sum_{j'=1}^{\mathrm{poly}(n)} \prod_{i=1}^{4} g_{i,j'}(a) = \sum_{j'=1}^{\mathrm{poly}(n)} \sum_{a \in \{0,1\}^n} \prod_{i=1}^{4} g_{i,j'}(a).$$
     Each $\prod_{i=1}^{4} g_{i,j'}(x) = h(x)$ for some ELTF $h$, so each inner sum is a #Subset-Sum instance.
     Can compute in $\mathrm{poly}(n) \cdot 2^{n/2}$ time!
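
The last step, that a product of ELTFs is again an ELTF, can be illustrated for integer weights: the two equalities $w^{(1)} \cdot x = t_1$ and $w^{(2)} \cdot x = t_2$ hold together exactly when $(w^{(1)} + M w^{(2)}) \cdot x = t_1 + M t_2$, provided $M$ exceeds every possible value of $|w^{(1)} \cdot x - t_1|$. The sketch below checks this by brute force for two ELTFs; repeating the step handles four. The integer-weight assumption and the particular choice of $M$ are mine, since the slides do not state the construction.

```python
from itertools import product

def eltf(weights, t):
    """Exact LTF: 1 exactly when sum_i w_i * x_i == t."""
    return lambda x: int(sum(w * xi for w, xi in zip(weights, x)) == t)

def combine(w1, t1, w2, t2):
    """Merge two integer-weight ELTFs into one ELTF computing their product."""
    # M > |w1.x - t1| for all x in {0,1}^n, so a nonzero multiple of M
    # from the second equation can never be cancelled by the first.
    M = 1 + sum(abs(w) for w in w1) + abs(t1)
    return [a + M * b for a, b in zip(w1, w2)], t1 + M * t2

w1, t1 = [2, -1, 3, 1], 3   # illustrative ELTFs on 4 inputs
w2, t2 = [1, 1, 0, 0], 1
w, t = combine(w1, t1, w2, t2)
g1, g2, h = eltf(w1, t1), eltf(w2, t2), eltf(w, t)

inputs = list(product((0, 1), repeat=4))
same = all(g1(x) * g2(x) == h(x) for x in inputs)
count = sum(h(x) for x in inputs)   # equals the sum-product of g1 and g2
```

Once the product is a single ELTF with weights `w` and target `t`, its sum over all inputs is a #Subset-Sum count, so the meet-in-the-middle bound applies to each term of the outer sum.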
