Low-Rank Inducing Norms with Optimality Interpretations
Christian Grussler, Pontus Giselsson, Anders Rantzer
Automatic Control, Lund University
June 15, 2017
Problem

$$\begin{array}{ll} \underset{X \in \mathbb{R}^{m \times n}}{\text{minimize}} & k(\|X\|) + h(X) \\ \text{subject to} & \operatorname{rank}(X) \le r \end{array}$$

1. $k : \mathbb{R}_{\ge 0} \to \mathbb{R}$ is an increasing, convex, proper, closed function
2. $\|\cdot\|$ is a unitarily invariant norm
3. $h : \mathbb{R}^{m \times n} \to \mathbb{R}$ is a closed, proper, convex function

Vector-valued problems are the special case

$$\begin{array}{ll} \underset{x \in \mathbb{R}^{n}}{\text{minimize}} & k(\|\operatorname{diag}(x)\|) + h(x) \\ \text{subject to} & \underbrace{\operatorname{rank}(\operatorname{diag}(x))}_{= \operatorname{card}(x)} \le r \end{array}$$
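A minimal numerical check of the vector specialization (illustrative only; the snippet is not from the talk): the rank of $\operatorname{diag}(x)$ counts the nonzero entries of $x$, so rank constraints on $\operatorname{diag}(x)$ are cardinality constraints on $x$.

```python
# Sanity check: for x in R^n, rank(diag(x)) = card(x), the number of
# nonzero entries, since the singular values of diag(x) are |x_i|.
import numpy as np

x = np.array([0.5, 0.0, -2.0, 0.0])
assert np.linalg.matrix_rank(np.diag(x)) == np.count_nonzero(x) == 2
```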
Example: Bilinear Regression §

Given $Y \in \mathbb{R}^{m \times n}$, $L \in \mathbb{R}^{k \times m}$, $R \in \mathbb{R}^{n \times k}$, $k \le \min\{m, n\}$:

$$\begin{array}{ll} \underset{X \in \mathbb{R}^{k \times k}}{\text{minimize}} & \|Y - L^T X R^T\|_{\ell_2}^2 \\ \text{subject to} & \operatorname{rank}(X) \le r \end{array}$$

where, for $X, Y \in \mathbb{R}^{m \times n}$, $\langle X, Y \rangle = \operatorname{trace}(X^T Y)$ and $\|X\|_{\ell_2} = \sqrt{\langle X, X \rangle} = \sqrt{\sum_i \sigma_i^2(X)}$.

§ I.S. Dhillon '15
By assumption, $\operatorname{rank}(\underbrace{L^T X R^T}_{=:M}) = \operatorname{rank}(X)$, so the problem can be rewritten as

$$\begin{array}{ll} \underset{M}{\text{minimize}} & \underbrace{\|M\|_{\ell_2}^2}_{k(\|M\|)} + \underbrace{-2\langle Y, M \rangle + I_{\{M = L^T X R^T \,:\, X \in \mathbb{R}^{k \times k}\}}(M)}_{h(M)} \\ \text{subject to} & \operatorname{rank}(M) \le r \end{array}$$

Applications:
• Machine Learning: Principal Component Analysis, Multivariate Linear Regression, Data Compression, ...
• Control: Model Reduction, System Identification, ...
Explicit Solution:

$$\operatorname*{argmin}_{\operatorname{rank}(X) \le r} \|Y - L^T X R^T\|_{\ell_2}^2 = \left\{ (L^T)^\dagger Y_r (R^T)^\dagger : Y_r \in \operatorname{svd}_r(Y) \right\}$$

$$\operatorname{svd}_r(Y) := \left\{ \sum_{i=1}^r \sigma_i(Y) u_i v_i^T : Y = \sum_{i=1}^q \sigma_i(Y) u_i v_i^T \text{ is an SVD of } Y \right\}$$

with $\sigma_1(Y) \ge \cdots \ge \sigma_q(Y)$.
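A short NumPy sketch of this closed-form solution (a hypothetical helper written for illustration; it assumes $L$ and $R$ have full rank $k$, as required for $\operatorname{rank}(L^T X R^T) = \operatorname{rank}(X)$ above):

```python
# Sketch: X* = (L^T)^† Y_r (R^T)^†, where Y_r is a best rank-r
# truncation of Y obtained from its SVD.
import numpy as np

def bilinear_rank_r_argmin(Y, L, R, r):
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    Y_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]   # element of svd_r(Y)
    return np.linalg.pinv(L.T) @ Y_r @ np.linalg.pinv(R.T)

# Example: m = n = 6, k = 4, r = 2
rng = np.random.default_rng(0)
Y = rng.standard_normal((6, 6))
L = rng.standard_normal((4, 6))   # L in R^{k x m}
R = rng.standard_normal((6, 4))   # R in R^{n x k}
X_star = bilinear_rank_r_argmin(Y, L, R, r=2)
assert np.linalg.matrix_rank(X_star) <= 2
```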
Problem: Convex structural constraints?

$$\begin{array}{ll} \underset{X}{\text{minimize}} & \|Y - L^T X R^T\|_{\ell_2}^2 + \tilde{h}(X) \\ \text{subject to} & \operatorname{rank}(X) \le r \end{array}$$

Examples:
• Nonnegative approximation: $\tilde{h}(X) = I_{\mathbb{R}^{k \times k}_{\ge 0}}(X)$
• Hankel approximation: $\tilde{h}(X) = I_{\text{Hankel}}(X)$
• Feasibility problems: $Y = 0$ and $\tilde{h}(X) = I_C(X)$

Generally, no closed-form solutions are known!
Nuclear Norm Regularization

Standard approach today: replace the rank constraint by a nuclear norm constraint §

$$\begin{array}{ll} \underset{X}{\text{minimize}} & k(\|X\|) + h(X) \\ \text{subject to} & \|X\|_{\ell_1} \le \lambda \end{array}$$

• $\|X\|_{\ell_1} = \sum_i \sigma_i(X)$ (the nuclear norm)
• $\lambda \ge 0$ is fixed

§ Tibshirani, Chen, Donoho, Fazel, Boyd, ...
Pros:
• Simple and generic heuristic ⇒ no PhD needed!
• Probabilistic success guarantees § for the relaxation below (a sketch follows):

$$\begin{array}{ll} \underset{X}{\text{minimize}} & \operatorname{rank}(X) \\ \text{subject to} & \mathcal{A}(X) = y \end{array} \quad \Longrightarrow \quad \begin{array}{ll} \underset{X}{\text{minimize}} & \|X\|_{\ell_1} \\ \text{subject to} & \mathcal{A}(X) = y \end{array}$$

§ Candès, Tao, Recht, Fazel, Parrilo, Chandrasekaran, ...
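A hedged sketch of the heuristic on the right, using CVXPY (an assumed dependency), with $\mathcal{A}$ modeled as $p$ random trace measurements; with enough generic measurements the minimizer often recovers the low-rank ground truth:

```python
# Nuclear-norm heuristic for affine rank minimization (illustrative setup).
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 8, 8, 40
A = rng.standard_normal((p, m, n))                                   # measurement matrices
X_true = rng.standard_normal((m, 2)) @ rng.standard_normal((2, n))   # rank-2 ground truth
y = np.array([np.sum(A[i] * X_true) for i in range(p)])              # y_i = <A_i, X_true>

X = cp.Variable((m, n))
constraints = [cp.sum(cp.multiply(A[i], X)) == y[i] for i in range(p)]
cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()
print(np.linalg.matrix_rank(X.value, tol=1e-6))  # often recovers rank 2
```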
Baboon Approximation

$$\begin{array}{ll} \underset{X}{\text{minimize}} & \|Y - X\|_{\ell_2}^2 + I_{\mathbb{R}^{m \times n}_{\ge 0}}(X) \\ \text{subject to} & \operatorname{rank}(X) \le r \end{array}$$

[Figure: relative approximation error $\|A - (\cdot)\|_{\ell_2} / \|A\|_{\ell_2}$ plotted against rank (1 to 80).]
$$\underset{X}{\text{minimize}} \ k(\|X\|) + h(X) + \underbrace{\lambda \|X\|_{\ell_1}}_{\text{bias}}$$

Cons:
• Bias ⇒ may not solve the non-convex problem, e.g., low-rank approximation (a shrinkage sketch follows this list)
• No a posteriori check whether the non-convex problem has been solved
• Deterministic structure?
• Requires a sweep over the regularization parameter $\lambda$ ⇒ cross-validation

Goal of this talk: fix this for our problem class!
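The bias is visible already in the proximal operator of $\lambda\|\cdot\|_{\ell_1}$, which soft-thresholds singular values (a standard fact, sketched here for illustration; not code from the talk): even the singular values that survive are shrunk by $\lambda$.

```python
# prox of lambda*||.||_{l1} (nuclear norm): singular value soft-thresholding.
import numpy as np

def prox_nuclear(Y, lam):
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

Y = np.diag([3.0, 2.0, 0.5])
print(np.round(np.linalg.svd(prox_nuclear(Y, 1.0), compute_uv=False), 3))
# -> [2. 1. 0.]: rank is reduced, but the surviving values are biased downward.
```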
Modifications

Replace $\|\cdot\|_{\ell_1}$ with some other convex surrogate $\|\cdot\|_s$ §:

$$\underset{X}{\text{minimize}} \ k(\|X\|) + h(X) + \underbrace{\lambda \|X\|_s}_{\text{bias}}$$

Problem: Nothing really changed!

§ Argyriou, Bach, Chandrasekaran, Eriksson, Mairal, Obozinski, ...
Convex Envelope

$$\min_X f(X) = \min_X f^{**}(X), \qquad f^{**} = (f^*)^*, \qquad f(X) \ge f^{**}(X)$$

Problem: $\big(k(\|\cdot\|) + I_{\operatorname{rank}(\cdot) \le r} + h\big)^{**}$ is unknown!
Old idea §

Replace $k(\|\cdot\|) + I_{\operatorname{rank}(\cdot) \le r}$ with $\big(k(\|\cdot\|) + I_{\operatorname{rank}(\cdot) \le r}\big)^{**}$.

Fact:
$$\big(k(\|\cdot\|) + I_{\operatorname{rank}(\cdot) \le r}\big)^{**} = k \circ \big(\|\cdot\| + I_{\operatorname{rank}(\cdot) \le r}\big)^{**}$$

§ Lemaréchal 1973: $\min_x \sum_i f_i(x_i) \to \min_x \sum_i f_i^{**}(x_i)$
Low-Rank Inducing Norms

$$\|X\|_g := g\big(\sigma_1(X), \ldots, \sigma_{\min\{m,n\}}(X)\big)$$

Examples: $\|X\|_{\ell_2} \longrightarrow g(x) = \|x\|_{\ell_2}$; $\quad \|X\|_{\ell_1} \longrightarrow g(x) = \|x\|_{\ell_1}$
Dual norm

$$\|Y\|_{g^D} := \sup_{\|X\|_g \le 1} \langle X, Y \rangle = g^D\big(\sigma_1(Y), \ldots, \sigma_{\min\{m,n\}}(Y)\big)$$

Examples: $\|Y\|_{\ell_2^D} = \|Y\|_{\ell_2}$; $\quad \|Y\|_{\ell_1^D} = \|Y\|_{\ell_\infty} = \sigma_1(Y)$
Truncated dual norms

$$\|Y\|_{g^D,r} := \sup_{\substack{\|X\|_g \le 1 \\ \operatorname{rank}(X) \le r}} \langle X, Y \rangle = \underbrace{g^D\big(\sigma_1(Y), \ldots, \sigma_r(Y)\big)}_{= g^D(\sigma_1(Y), \ldots, \sigma_r(Y), 0, \ldots, 0)}$$

Examples: $\|Y\|_{\ell_2^D,r} = \sqrt{\sum_{i=1}^r \sigma_i^2(Y)}$; $\quad \|Y\|_{\ell_1^D,r} = \|Y\|_{\ell_\infty}$
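A small NumPy sketch of these quantities for $g = \ell_2$ (illustrative helper names, not from the talk): the dual of the Frobenius norm is itself, and its rank-$r$ truncation keeps only the $r$ largest singular values.

```python
import numpy as np

def schatten2(X):            # ||X||_{l2} (Frobenius norm via singular values)
    return np.linalg.norm(np.linalg.svd(X, compute_uv=False))

def truncated_dual_l2(Y, r): # ||Y||_{l2^D, r} = sqrt(sum of top-r sigma_i^2)
    s = np.linalg.svd(Y, compute_uv=False)
    return np.linalg.norm(s[:r])

Y = np.arange(12.0).reshape(3, 4)
print(schatten2(Y), truncated_dual_l2(Y, r=1), truncated_dual_l2(Y, r=3))
# r = min{m, n} recovers the full dual norm:
assert np.isclose(truncated_dual_l2(Y, 3), schatten2(Y))
```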
Low-rank inducing norms §

$$\|X\|_{g,r*} := \sup_{\|Y\|_{g^D,r} \le 1} \langle X, Y \rangle$$

• If $\|\cdot\|_g$ is SDP representable ⇒ $\|\cdot\|_{g,r*}$ is SDP representable
• If $\operatorname{prox}_{\|\cdot\|_g}$ is computable ⇒ $\operatorname{prox}_{\|\cdot\|_{g,r*}}$ is computable ⇒ $\operatorname{prox}_{I_{\|\cdot\|_{g,r*} \le t}}(\cdot, t)$ is computable, using the epigraph form $k(\|\cdot\|_{g,r*}) = \min_t k(t) + I_{\|\cdot\|_{g,r*} \le t}(\cdot, t)$

Complexity for $g = \ell_2, \ell_\infty$: one SVD + $O(n \log n)$ ($n$ = number of singular values). A naive solver-based sketch follows.

§ Cf. atomic norms, overlapping norms, support norms
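The following is a deliberately naive, solver-based sketch of computing $\|X\|_{\ell_2,r*}$, not the SVD + $O(n \log n)$ method referenced above: by von Neumann's trace inequality the supremum in the definition reduces to ordered singular-value vectors, leaving a small convex program (CVXPY assumed).

```python
# ||X||_{g,r*} = sup { <sigma(X), y> : g^D(y_1..y_r) <= 1, y nonincreasing >= 0 },
# here for g = ell_2, so the constraint is ||y[:r]||_2 <= 1.
import cvxpy as cp
import numpy as np

def low_rank_inducing_frobenius(X, r):
    s = np.linalg.svd(X, compute_uv=False)
    y = cp.Variable(len(s))
    constraints = [y >= 0,
                   y[1:] <= y[:-1],       # valid (ordered) singular values
                   cp.norm(y[:r]) <= 1]   # ||Y||_{l2^D, r} <= 1
    return cp.Problem(cp.Maximize(s @ y), constraints).solve()

X = np.diag([3.0, 2.0, 1.0])
print(low_rank_inducing_frobenius(X, r=3))  # = ||X||_{l2}, ~3.742
print(low_rank_inducing_frobenius(X, r=1))  # = nuclear norm here, 6.0
```

Consistent with the nuclear norm slide later on, $r = 1$ returns the nuclear norm, while $r = \min\{m, n\}$ returns $\|X\|_{\ell_2}$.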
Geometric Interpretation

$$B^1_{g,r*} := \{X \in \mathbb{R}^{m \times n} : \|X\|_{g,r*} \le 1\}$$
$$E_{g,r} := \{X \in \mathbb{R}^{m \times n} : \|X\|_g = 1,\ \operatorname{rank}(X) \le r\}$$

• $B^1_{g,r*} = \operatorname{conv}(E_{g,r})$
• $\|X\|_g \le \|X\|_{g,r*}$
• $\|X\|_g = \|X\|_{g,r*}$ whenever $\operatorname{rank}(X) \le r$
$$\begin{array}{ll} \underset{X}{\text{minimize}} & \|X\|_g \\ \text{subject to} & \mathcal{A}(X) = y, \\ & \operatorname{rank}(X) \le r \end{array} \quad \Longleftrightarrow \quad \begin{array}{ll} \underset{X}{\text{minimize}} & \|X\|_{g,r*} \\ \text{subject to} & \mathcal{A}(X) = y, \\ & \operatorname{rank}(X) \le r \end{array}$$

[Figure: geometric illustration of the feasible set $\mathcal{A}(X) = y$ meeting the norm balls.]
Best Convex Relaxation

$$\min_{\substack{X \in \mathbb{R}^{m \times n} \\ \operatorname{rank}(X) \le r}} \big[k(\|X\|_g) + h(X)\big] \ \ge \ \min_{X \in \mathbb{R}^{m \times n}} \big[k(\|X\|_{g,r*}) + h(X)\big]$$

Best in the sense that:
• $\big(k(\|\cdot\|_g) + I_{\operatorname{rank}(\cdot) \le r}(\cdot) + h\big)^{**}$ is unknown
• There is a simple a posteriori test for optimality (sketched after this list)
• Sweep over the discrete parameter $r$ instead of $\lambda$ ⇒ cross-validation ↔ zero duality gap

The cost function is replaced, not penalized: NO BIAS!
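A sketch of the a posteriori test (hypothetical helper, following the logic above): if the minimizer of the convex relaxation happens to satisfy $\operatorname{rank}(X) \le r$, then $\|X\|_{g,r*} = \|X\|_g$ there, the two optimal values coincide, and the non-convex problem is provably solved.

```python
import numpy as np

def certifies_optimality(X_opt, r, tol=1e-9):
    """Check rank(X_opt) <= r numerically; if True, the relaxation is tight
    and X_opt also solves the rank-constrained problem."""
    s = np.linalg.svd(X_opt, compute_uv=False)
    return bool(np.all(s[r:] <= tol * max(s[0], 1.0)))
```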
Nuclear Norm

Standard interpretation:
$$\|\cdot\|_{\ell_1} = \big(\operatorname{rank}(\cdot) + I_{\|\cdot\|_{\ell_\infty} \le 1}\big)^{**}$$

Our interpretation #1:
$$\|\cdot\|_{\ell_1} = \big(\|\cdot\|_{\ell_1} + I_{\operatorname{rank}(\cdot) \le r}\big)^{**}$$

Our interpretation #2:
$$\|X\|_{\ell_1} = \|X\|_{g,1*} \ge \cdots \ge \|X\|_{g,r*} \ge \cdots \ge \|X\|_{g,q*} = \|X\|_g$$

$$\min_{\substack{X \in \mathbb{R}^{m \times n} \\ \operatorname{rank}(X) \le 1}} \big[k(\|X\|_g) + h(X)\big] \ \ge \ \min_{X \in \mathbb{R}^{m \times n}} \big[k(\|X\|_{\ell_1}) + h(X)\big]$$
Some good news

• Zero duality gap for bilinear regression:

$$\begin{array}{ll} \underset{X \in \mathbb{R}^{k \times k}}{\text{minimize}} & \|Y - L^T X R^T\|_{\ell_2}^2 \\ \text{subject to} & \operatorname{rank}(X) \le r \end{array}$$

• Optimality interpretations, e.g., iterative re-weighting:

$$\min_{\substack{X \in \mathbb{R}^{m \times n} \\ \operatorname{rank}(X) \le r}} \big[k(\|WX\|_g) + h(X)\big] \ \ge \ \min_{X \in \mathbb{R}^{m \times n}} \big[k(\|WX\|_{g,r*}) + h(X)\big]$$