On the Limitations of Representing Functions on Sets
Edward Wagstaff*, Fabian Fuchs*, Martin Engelcke*, Ingmar Posner, Michael Osborne
Machine Learning Research Group
*Equal contribution
Examples of Permutation-Invariant Problems: Detecting Common Attributes (e.g. Smiling, Blond Hair). CelebA dataset, Liu et al.
The Deep Sets architecture: Input → ϕ → Latent A → + (sum) → Latent B → ρ → Output
The sum-decomposition: f(x_1, …, x_M) = ρ( ϕ(x_1) + … + ϕ(x_M) ). The input set X = {x_1, …, x_M} consists of M real numbers (input space ℝ^M); each element embedding ϕ(x_m) lies in ℝ^N (stacked: ℝ^{N×M}); the summed latent representation Y = Σ_m ϕ(x_m) lies in ℝ^N; the output ρ(Y) lies in ℝ.
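A minimal sketch of this sum-decomposition in PyTorch is given below; the layer sizes, latent dimension, and names (DeepSets, phi, rho) are illustrative assumptions, not the authors' reference implementation.

# Minimal sketch of the sum-decomposition (Deep Sets) architecture described above.
import torch
import torch.nn as nn

class DeepSets(nn.Module):
    def __init__(self, latent_dim: int = 16):
        super().__init__()
        # phi: applied to each set element independently (ℝ → ℝ^N)
        self.phi = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        # rho: applied to the summed latent representation (ℝ^N → ℝ)
        self.rho = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, M, 1), a batch of sets with M scalar elements each
        latent = self.phi(x)        # (batch, M, N)
        pooled = latent.sum(dim=1)  # permutation-invariant sum over the set dimension
        return self.rho(pooled)     # (batch, 1)

# Usage: the output is unchanged under any permutation of the M set elements.
model = DeepSets(latent_dim=16)
sets = torch.randn(8, 7, 1)  # 8 sets of size M = 7
out = model(sets)            # shape (8, 1)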
Theorem 1 (Zaheer et al.): This architecture can successfully model any permutation-invariant function, even for latent dimension N = 1.

Proof sketch: Assume that the neural networks ϕ and ρ are universal function approximators. Find a ϕ such that the mapping Φ from the input set X to the latent representation Y is injective: define an enumeration c(x): ℚ → ℕ and then define ϕ(x) = 2^c(x). Since Y then uniquely identifies the input set, everything can be modelled by ρ.
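To make the injectivity step concrete, here is a toy check in Python; the small explicit enumeration of five rationals stands in for the full enumeration c: ℚ → ℕ used in the proof, so it is an illustrative assumption rather than the proof itself.

# Toy illustration: with phi(x) = 2^c(x) for an enumeration c, the summed latent
# value is a binary encoding of which elements are present, so distinct sets
# always produce distinct sums.
from fractions import Fraction
from itertools import combinations

# A small explicit enumeration standing in for c : Q -> N.
domain = [Fraction(0), Fraction(1, 2), Fraction(-1, 3), Fraction(2), Fraction(5, 7)]
c = {x: i for i, x in enumerate(domain)}
phi = lambda x: 2 ** c[x]

# Every distinct set of size 3 yields a distinct sum, so rho can read off the set exactly.
sums = {sum(phi(x) for x in s) for s in combinations(domain, 3)}
assert len(sums) == sum(1 for _ in combinations(domain, 3))  # injective on these sets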
Role of Continuity: We need to take real numbers into account! The construction above relies on an enumeration of the rationals, and the resulting ϕ(x) = 2^c(x) is highly discontinuous; it does not carry over to continuous functions on real-valued inputs.
Theorem 2: If we want to model all (continuous) permutation-invariant functions, it is sufficient and necessary that the latent dimension N is at least as large as the maximum input set size M.

Sketch of proof for necessity: To prove necessity, we only need one function which can't be decomposed with N < M; we pick max(X). We show that, in order to represent max(X), Φ(X) = Σ_x ϕ(x) needs to be injective, and this is not possible with N < M.
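For the sufficiency direction, one standard construction is the power-sum embedding sketched below in LaTeX; whether this is exactly the construction used in the paper is an assumption here, but it shows how an injective, continuous Φ with latent dimension N = M can be built.

% Hedged sketch: the power-sum embedding, one standard way to obtain an injective,
% continuous sum-decomposition with latent dimension N = M.
\[
  \phi(x) = \bigl(x,\; x^{2},\; \dots,\; x^{M}\bigr) \in \mathbb{R}^{M},
  \qquad
  \Phi(X) = \sum_{m=1}^{M} \phi(x_m)
          = \Bigl(\sum_{m} x_m,\; \sum_{m} x_m^{2},\; \dots,\; \sum_{m} x_m^{M}\Bigr).
\]
% By Newton's identities, these M power sums determine the elementary symmetric
% polynomials of x_1, ..., x_M, and hence the multiset {x_1, ..., x_M} itself, so
% \Phi is injective; a suitable \rho defined on the image of \Phi then yields
% f = \rho \circ \Phi for any continuous permutation-invariant f.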
Illustrative Example: Regressing to the Median. {0.1, 0.6, −0.32, 1.61, 0.5, 0.67, 0.3} → median = 0.5
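A minimal sketch of such a median-regression experiment, reusing the DeepSets module from the architecture sketch above, is given below; the data distribution, set size, optimizer, and training length are all assumptions for illustration.

# Sketch: regress to the median of each set using the DeepSets module defined earlier.
import torch

def make_batch(batch_size: int = 128, set_size: int = 7):
    x = torch.randn(batch_size, set_size, 1)               # random sets of scalars
    y = x.squeeze(-1).median(dim=1).values.unsqueeze(-1)   # target: per-set median
    return x, y

model = DeepSets(latent_dim=16)  # try varying latent_dim relative to set_size
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    x, y = make_batch()
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()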
[Figure: left, RMSE vs. latent dimension N for several input set sizes M; right, critical latent dimension N_c vs. input set size M.]
Thank You