SLIDE 1

Introduction · Algorithms · Experiments · Conclusion

Projection-free Distributed Online Convex Optimization with O(√T) Communication Complexity

Yuanyu Wan¹, Wei-Wei Tu² and Lijun Zhang¹

¹Dept. of Computer Science and Technology, Nanjing University
²4Paradigm Inc., Beijing, China

ICML 2020

http://www.lambda.nju.edu.cn/wanyy · Projection-free Distributed Online Learning

LAMDA (Learning And Mining from DatA), NANJING UNIVERSITY

SLIDE 2

Outline

1. Introduction
   - Background
   - The Problem and Our Contributions
2. Our Algorithms
   - D-BOCG for the Full-Information Setting
   - D-BBCG for the Bandit Setting
3. Experiments
4. Conclusion


SLIDES 5–8

Distributed Online Learning over a Network

Formal definition

1: for t = 1, 2, ..., T do
2:   for each local learner i ∈ [n] do
3:     pick a decision xi(t) ∈ K and receive a convex loss function ft,i(x): K → R
4:     communicate with its neighbors and update xi(t)
5:   end for
6: end for

- the network is defined as a graph G = (V, E) with V = [n]
- each node i ∈ [n] is a local learner
- node i can only communicate with its immediate neighbors Ni = {j ∈ V | (i, j) ∈ E}
- the global loss function is defined as ft(x) = ∑_{j=1}^n ft,j(x)

Regret of local learner i:

RT,i = ∑_{t=1}^T ft(xi(t)) − min_{x∈K} ∑_{t=1}^T ft(x)

Applications
- multi-agent coordination
- distributed tracking in sensor networks
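The regret definition above compares learner i's decisions against the best fixed decision in hindsight, measured with the *global* loss even though each learner only observes its own local loss. A minimal numpy sketch of this bookkeeping, using hypothetical quadratic local losses, a naive fixed decision sequence, and K = [−1, 1] (all constants here are illustrative, not from the paper):

```python
import numpy as np

# Toy setup: n learners, scalar decisions in K = [-1, 1], hypothetical local
# losses f_{t,i}(x) = (x - b_{t,i})^2, and global loss f_t(x) = sum_j f_{t,j}(x).
rng = np.random.default_rng(0)
n, T = 4, 200
targets = rng.uniform(-0.5, 0.5, size=(T, n))   # b_{t,i}

def global_loss(x, t):
    # f_t(x) = sum_{j=1}^n f_{t,j}(x)
    return np.sum((x - targets[t]) ** 2)

# Decisions of learner i (here a naive fixed decision x_i(t) = 0, for illustration).
decisions = np.zeros(T)

# Best fixed comparator, approximated over a grid of K = [-1, 1].
grid = np.linspace(-1.0, 1.0, 201)
cum = np.array([sum(global_loss(x, t) for t in range(T)) for x in grid])
best_fixed = cum.min()

# R_{T,i} = sum_t f_t(x_i(t)) - min_{x in K} sum_t f_t(x)
regret_i = sum(global_loss(decisions[t], t) for t in range(T)) - best_fixed
print(f"regret of learner i after T={T} rounds: {regret_i:.3f}")
```

Since the grid contains the played point x = 0, the computed regret is nonnegative by construction; an actual online algorithm would drive it to grow only sublinearly in T.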

SLIDES 9–10

Projection-based Methods

Distributed Online Dual Averaging (D-ODA) [Hosseini et al., 2013]

1: for each local learner i ∈ [n] do
2:   play xi(t) and compute gi(t) = ∇ft,i(xi(t))
3:   zi(t + 1) = ∑_{j∈Ni} Pij zj(t) + gi(t)
4:   xi(t + 1) = Π^ψ_K(zi(t + 1), α(t))
5: end for

- Pij > 0 only if (i, j) ∈ E; otherwise Pij = 0
- ψ(x): K → R is a proximal function, e.g., ψ(x) = ‖x‖₂²
- projection step: Π^ψ_K(z, α) = argmin_{x∈K} { z⊤x + (1/α) ψ(x) }
- α(t) = O(1/√t) → RT,i = O(√T)

Distributed Online Gradient Descent [Ram et al., 2010] also needs a projection step.
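For ψ(x) = ‖x‖₂² and K a Euclidean ball, the projection step has a closed form: z⊤x + (1/α)‖x‖₂² = (1/α)‖x + (α/2)z‖₂² + const, so Π^ψ_K(z, α) is just the Euclidean projection of −(α/2)z onto the ball. A small numpy sketch of one D-ODA round under that assumption (the 3-node gossip matrix and random "gradients" are hypothetical stand-ins):

```python
import numpy as np

def proj_step(z, alpha, R=1.0):
    # With psi(x) = ||x||_2^2, minimizing z^T x + (1/alpha)||x||_2^2 over the
    # ball {||x||_2 <= R} amounts to projecting -(alpha/2) z onto the ball.
    x = -(alpha / 2.0) * z
    nrm = np.linalg.norm(x)
    return x if nrm <= R else (R / nrm) * x

def doda_round(X, Z, grads, P, alpha):
    # One round of D-ODA: gossip the dual variables z_i, add the fresh local
    # gradient, then apply the projection step to obtain x_i(t+1).
    Z_new = P @ Z + grads                      # row i: sum_j P_ij z_j(t) + g_i(t)
    X_new = np.array([proj_step(z, alpha) for z in Z_new])
    return X_new, Z_new

# Hypothetical symmetric doubly stochastic gossip matrix for 3 nodes.
P = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
rng = np.random.default_rng(1)
X = np.zeros((3, 2)); Z = np.zeros((3, 2))
for t in range(1, 11):
    grads = rng.normal(size=(3, 2))            # stand-ins for local gradients
    X, Z = doda_round(X, Z, grads, P, alpha=1.0 / np.sqrt(t))
print("decisions after 10 rounds:\n", X)
```

Every decision stays inside K by construction, which is exactly the guarantee the projection step buys at the cost of the (potentially expensive) projection itself.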

SLIDES 11–12

Projection-free Methods

Motivation: the projection step could be time-consuming; e.g., if K is a trace norm ball, it requires an SVD of a matrix.

Distributed Online Conditional Gradient (D-OCG) [Zhang et al., 2017]

1: for each local learner i ∈ [n] do
2:   play xi(t) and compute gi(t) = ∇ft,i(xi(t))
3:   vi = argmin_{x∈K} ∇Ft,i(xi(t))⊤x
4:   xi(t + 1) = xi(t) + st(vi − xi(t))
5:   zi(t + 1) = ∑_{j∈Ni} Pij zj(t) + gi(t)
6: end for

where Ft,i(x) = η zi(t)⊤x + ‖x − x1(1)‖₂²

- η = O(T^{−3/4}), st = 1/√t → RT,i = O(T^{3/4})
- only contains a linear optimization step (step 3)
- requires T communication rounds
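The linear optimization step is what makes D-OCG projection-free: over a trace norm ball, argmin_{‖X‖* ≤ τ} ⟨G, X⟩ = −τ u₁v₁⊤ needs only the top singular pair of G, whereas a projection would need the full SVD. A numpy sketch (it calls the full SVD for brevity; at scale one would use a Lanczos-style top-pair solver such as `scipy.sparse.linalg.svds`):

```python
import numpy as np

def linear_opt_trace_ball(G, tau):
    # argmin over {||X||_* <= tau} of <G, X> is -tau * u1 v1^T, where (u1, v1)
    # is the top singular pair of G. Only one singular pair is needed, versus
    # the full SVD a projection onto the trace norm ball would require.
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return -tau * np.outer(U[:, 0], Vt[0, :])

rng = np.random.default_rng(2)
G = rng.normal(size=(5, 4))
tau = 2.0
V = linear_opt_trace_ball(G, tau)

# Sanity checks: V lies on the boundary of the ball and attains the optimal
# value <G, V> = -tau * sigma_1 (by duality of the spectral and trace norms).
sigma1 = np.linalg.svd(G, compute_uv=False)[0]
print(np.sum(np.linalg.svd(V, compute_uv=False)))   # trace norm of V
print(np.sum(G * V), -tau * sigma1)
```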


SLIDES 14–16

Question

Can the O(T) communication complexity of distributed online conditional gradient (D-OCG) be reduced?

An affirmative and non-trivial answer
- distributed block online conditional gradient (D-BOCG)
- communication complexity: reduced from O(T) to O(√T)
- regret bound: O(T^{3/4})

An extension to the bandit setting
- distributed block bandit conditional gradient (D-BBCG)
- communication complexity: O(√T)
- high-probability regret bound: O(T^{3/4})


SLIDES 19–20

Main Idea

Delayed update mechanism
- divide the T rounds into blocks: block 1, ..., block m, ...
- only update at the beginning of each block
- only need √T communication rounds

Iterative linear optimization steps
- recall the update rules of D-OCG:
  vi = argmin_{x∈K} ∇Ft,i(xi(t))⊤x
  xi(t + 1) = xi(t) + st(vi − xi(t))
- delayed update + D-OCG yields a worse regret bound
- instead, perform multiple linear optimization steps for each update

SLIDE 21

Conditional Gradient with Stopping Condition (CGSC)

CGSC [Garber and Kretzu, 2019]

1: Input: feasible set K, ε > 0, L, F(x), xin
2: τ = 0, c1 = xin
3: repeat
4:   τ = τ + 1
5:   vτ ∈ argmin_{x∈K} ∇F(cτ)⊤x
6:   sτ = argmin_{s∈[0,1]} F(cτ + s(vτ − cτ))
7:   cτ+1 = cτ + sτ(vτ − cτ)
8: until ∇F(cτ)⊤(cτ − vτ) ≤ ε or τ = L
9: return xout = cτ

- with appropriate L and ε, the suboptimality of F(xout) is very small
- the underlying conditional gradient method has been widely studied [Frank and Wolfe, 1956, Jaggi, 2013]
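A runnable sketch of CGSC, specialized (as an assumption for this illustration) to a quadratic surrogate F(x) = η z⊤x + ‖x‖₂² over a Euclidean ball, where both the linear oracle and the line search in step 6 have closed forms:

```python
import numpy as np

def cgsc_quadratic(z, eta, x_in, R=1.0, eps=1e-6, L=1000):
    # CGSC for F(x) = eta * z^T x + ||x||_2^2 over K = {x : ||x||_2 <= R};
    # the D-BOCG surrogate F_{m,i} has exactly this quadratic form.
    # grad F(x) = eta*z + 2x, and the exact line search over s in [0,1] is
    # closed-form because F is quadratic along the segment [c, v].
    c = np.array(x_in, dtype=float)
    for _ in range(L):
        grad = eta * z + 2.0 * c
        if np.linalg.norm(grad) < 1e-12:        # already (numerically) optimal
            break
        v = -R * grad / np.linalg.norm(grad)    # linear optimization step over K
        d = v - c
        if grad @ (c - v) <= eps:               # Frank-Wolfe gap stopping condition
            break
        s = np.clip(-(grad @ d) / (2.0 * (d @ d)), 0.0, 1.0)
        c = c + s * d
    return c

rng = np.random.default_rng(3)
z = rng.normal(size=5)
x_out = cgsc_quadratic(z, eta=0.1, x_in=np.zeros(5))

# Here the unconstrained minimizer -eta*z/2 lies inside K, so CGSC should
# return a point close to it.
x_star = -0.1 * z / 2.0
print(np.linalg.norm(x_out - x_star))
```

The stopping condition in step 8 is the Frank-Wolfe gap, which upper-bounds the suboptimality F(cτ) − min_{x∈K} F(x) for convex F, so terminating at gap ≤ ε certifies an ε-accurate output.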

SLIDE 22

The Proposed D-BOCG Algorithm

1: Initialization: choose {xi(1) = 0 ∈ K | i ∈ V} and set {zi(1) = 0 | i ∈ V}
2: for t = 1, ..., T do
3:   mt = ⌈t/K⌉
4:   for each local learner i ∈ V do
5:     if t > 1 and mod(t, K) = 1 then
6:       ḡi(mt − 1) = ∑_{k=t−K}^{t−1} gi(k)
7:       zi(mt) = ∑_{j∈Ni} Pij zj(mt − 1) + ḡi(mt − 1)
8:       define Fmt,i(x) = η zi(mt)⊤x + ‖x‖₂²
9:       xi(mt) = CGSC(K, ε, L, Fmt,i(x), xi(mt − 1))
10:    end if
11:    play xi(mt) and observe gi(t) = ∇ft,i(xi(mt))
12:  end for
13: end for
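The structural point of D-BOCG is the schedule: gradients are accumulated locally within each block of K rounds, and gossip plus the CGSC update happen only when a new block begins, so the number of communication rounds is T/K − 1 = O(√T) rather than T. A self-contained numpy sketch of that schedule; the CGSC call is stubbed with a plain projected step onto the unit ball purely to keep the sketch short (the real algorithm runs CGSC on F_{m,i}), and the complete-graph gossip matrix is a hypothetical choice:

```python
import numpy as np

def ball_step(z, eta, R=1.0):
    # Stand-in for CGSC(K, eps, L, F_{m,i}, x_prev) in this sketch only.
    x = -eta * z
    nrm = np.linalg.norm(x)
    return x if nrm <= R else (R / nrm) * x

n, d, T = 4, 3, 64
K = int(np.sqrt(T))                     # block size K = sqrt(T)
P = np.full((n, n), 1.0 / n)            # hypothetical complete-graph gossip matrix
rng = np.random.default_rng(4)

X = np.zeros((n, d)); Z = np.zeros((n, d))
acc = np.zeros((n, d))                  # per-block gradient accumulators gbar_i
comm_rounds = 0
for t in range(1, T + 1):
    if t > 1 and (t - 1) % K == 0:      # beginning of a new block: communicate
        Z = P @ Z + acc                 # z_i(m) = sum_j P_ij z_j(m-1) + gbar_i(m-1)
        X = np.array([ball_step(z, eta=0.01) for z in Z])
        acc[:] = 0.0
        comm_rounds += 1
    grads = rng.normal(size=(n, d))     # stand-ins for g_i(t) = grad f_{t,i}(x_i(m_t))
    acc += grads

print(f"T={T}, K={K}, communication rounds={comm_rounds}")
```

With T = 64 and K = 8 the loop communicates only 7 times (at the start of blocks 2 through 8), matching the O(√T) communication complexity claimed for D-BOCG.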

SLIDES 23–24

Regret of D-BOCG

Theorem 1. Let η = O(T^{−3/4}), ε = O(T^{−1/2}), K = √T and L = O(√T). For any i ∈ V, D-BOCG has RT,i ≤ O(GRT^{3/4}).

Assumptions
- |ft,i(x) − ft,i(y)| ≤ G‖x − y‖₂ for any x, y ∈ K
- rBd ⊆ K ⊆ RBd, where Bd is the unit Euclidean ball
- P ∈ R^{n×n} is symmetric and doubly stochastic, i.e., P = P⊤, 1⊤P = 1⊤, P1 = 1

Remarks
- regret bound: RT,i = O(T^{3/4})
- #communication rounds: T/K = √T
- #linear optimization steps: LT/K = O(T)


SLIDES 26–28

Standard Technique

Bandit setting
- only the loss value is available to the learners
- the main challenge is the lack of gradient information

One-point Gradient Estimator [Flaxman et al., 2005]
- δ-smoothed version of f(x): f̂δ(x) = E_{u∼Bd}[f(x + δu)]
- let δ > 0 and Sd be the unit sphere; then ∇f̂δ(x) = E_{u∼Sd}[(d/δ) f(x + δu) u]
- only requires observing the single value f(x + δu)

A smaller set Kδ ⊆ K
- Kδ = (1 − δ/r)K = {(1 − δ/r)x | x ∈ K}, 0 < δ ≤ r
- x + δu ∈ K for any x ∈ Kδ, u ∼ Sd
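A quick Monte Carlo check of the one-point identity, under the illustrative choice f(x) = ‖x‖₂² (for this quadratic, f̂δ differs from f only by a constant, so ∇f̂δ(x) = ∇f(x) = 2x exactly):

```python
import numpy as np

# Averaging (d/delta) * f(x + delta*u) * u over uniform unit vectors u
# approximates grad f_delta(x); here it should approach 2x.
rng = np.random.default_rng(5)
d, delta = 3, 0.1
x = np.array([0.3, -0.2, 0.5])

m = 400_000
u = rng.normal(size=(m, d))
u /= np.linalg.norm(u, axis=1, keepdims=True)          # u ~ uniform on the sphere
vals = np.sum((x[None, :] + delta * u) ** 2, axis=1)   # f(x + delta * u)
est = (d / delta) * np.mean(vals[:, None] * u, axis=0)

print("one-point estimate:", est)
print("true gradient     :", 2 * x)
```

Note the d/δ factor: shrinking δ reduces the smoothing bias but inflates the estimator's variance, which is why δ must be tuned (δ = O(T^{−1/4}) in Theorem 2 below).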

SLIDE 29

The Proposed D-BBCG Algorithm

1: Initialization: choose {xi(1) = 0 ∈ Kδ | i ∈ V} and set {zi(1) = 0 | i ∈ V}
2: for t = 1, ..., T do
3:   mt = ⌈t/K⌉
4:   for each local learner i ∈ V do
5:     if t > 1 and mod(t, K) = 1 then
6:       ḡi(mt − 1) = ∑_{k=t−K}^{t−1} gi(k)
7:       zi(mt) = ∑_{j∈Ni} Pij zj(mt − 1) + ḡi(mt − 1)
8:       define Fmt,i(x) = η zi(mt)⊤x + ‖x‖₂²
9:       xi(mt) = CGSC(Kδ, ε, L, Fmt,i(x), xi(mt − 1))
10:    end if
11:    sample ui(t) ∼ Sd
12:    play yi(t) = xi(mt) + δui(t) and observe ft,i(yi(t))
13:    gi(t) = (d/δ) ft,i(yi(t)) ui(t)
14:  end for
15: end for
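D-BBCG optimizes over the shrunk set Kδ precisely so that the perturbed play yi(t) = xi(mt) + δui(t) is always feasible. A quick numerical check of that feasibility guarantee, taking K to be the unit Euclidean ball (so r = R = 1) as an illustrative special case:

```python
import numpy as np

# For x in K_delta = (1 - delta) * (unit ball) and u on the unit sphere,
# ||x + delta*u|| <= (1 - delta) + delta = 1, so the played point stays in K.
rng = np.random.default_rng(6)
d, delta = 4, 0.05
for _ in range(1000):
    x = rng.normal(size=d)
    x = (1 - delta) * x / max(1.0, np.linalg.norm(x))   # an arbitrary point of K_delta
    u = rng.normal(size=d); u /= np.linalg.norm(u)      # u ~ S^d
    y = x + delta * u
    assert np.linalg.norm(y) <= 1.0 + 1e-12             # y in K
print("all perturbed plays remained feasible")
```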

SLIDES 30–31

Regret of D-BBCG

Theorem 2. Let η = O(T^{−3/4}), δ = O(T^{−1/4}), ε = O(T^{−1/2}), K = T^{1/2} and L = O(√T). For any i ∈ V, with high probability, D-BBCG has RT,i ≤ O(T^{3/4}).

Additional Assumption
- all local loss functions are chosen beforehand, i.e., the adversary is oblivious

Remarks
- high-probability regret bound: RT,i = O(T^{3/4})
- #communication rounds: T/K = √T
- #linear optimization steps: LT/K = O(T)


SLIDE 33

Experimental Settings

Distributed multiclass classification [Zhang et al., 2017]

1: for t = 1, 2, ..., T do
2:   for each local learner i ∈ [n] do
3:     receive an example ei(t) ∈ R^k and choose Xi(t) = [x1⊤; x2⊤; ...; xh⊤] ∈ K
4:     receive the true label yi(t) and suffer the multivariate logistic loss
       ft,i(Xi(t)) = log(1 + ∑_{ℓ≠yi(t)} exp(xℓ⊤ei(t) − x_{yi(t)}⊤ei(t)))
5:     communicate with its neighbors and update Xi(t)
6:   end for
7: end for

- K = {X ∈ R^{h×k} | ‖X‖* ≤ τ}, where ‖X‖* denotes the trace norm of X and τ is a constant
- the network is a cycle graph with 9 nodes
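The multivariate logistic loss from the protocol above can be implemented directly; a numerically stable numpy sketch using the standard log-sum-exp shift (the dimensions and random inputs are illustrative):

```python
import numpy as np

def multivariate_logistic_loss(X, e, y):
    # f(X) = log(1 + sum_{l != y} exp(x_l^T e - x_y^T e)),
    # where X is an h x k matrix of per-class predictors (rows), e is the
    # feature vector, and y is the index of the true class.
    scores = X @ e                       # x_l^T e for every class l
    margins = np.delete(scores - scores[y], y)   # x_l^T e - x_y^T e, l != y
    m = max(0.0, margins.max())          # log-sum-exp shift for stability
    # log(1 + sum exp(margins)) = m + log(exp(-m) + sum exp(margins - m))
    return m + np.log(np.exp(-m) + np.sum(np.exp(margins - m)))

rng = np.random.default_rng(7)
h, k = 6, 10
X = rng.normal(size=(h, k)); e = rng.normal(size=k); y = 2
loss = multivariate_logistic_loss(X, e, y)
print(f"loss = {loss:.4f}")
```

The loss is always positive (the argument of the log exceeds 1), and its gradient with respect to X is the matrix whose rows are fed to the linear optimization step over the trace norm ball K.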

SLIDES 34–35

Experimental Results

aloi dataset from the LIBSVM repository [Chang and Lin, 2011]
[figure omitted: plot residue not recoverable]

poker dataset from the LIBSVM repository
[figure omitted: plot residue not recoverable]


SLIDE 37

Conclusion and Future Work

Conclusion
- D-BOCG enjoys an O(T^{3/4}) regret bound with only O(√T) communication rounds
- D-BBCG, for the bandit setting, enjoys a high-probability O(T^{3/4}) regret bound with only O(√T) communication rounds

Future Work
- improve the regret bound of projection-free distributed online learning by utilizing the curvature of functions

SLIDE 38

Reference I

- Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(27):1–27.
- Flaxman, A. D., Kalai, A. T., and McMahan, H. B. (2005). Online convex optimization in the bandit setting: Gradient descent without a gradient. In Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 385–394.
- Frank, M. and Wolfe, P. (1956). An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1–2):95–110.
- Garber, D. and Kretzu, B. (2019). Improved regret bounds for projection-free bandit convex optimization. arXiv:1910.03374.
- Hosseini, S., Chapman, A., and Mesbahi, M. (2013). Online distributed optimization via dual averaging. In 52nd IEEE Conference on Decision and Control, pages 1484–1489.
- Jaggi, M. (2013). Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In Proceedings of the 30th International Conference on Machine Learning, pages 427–435.

Thanks!

SLIDE 39

Reference II

- Ram, S. S., Nedić, A., and Veeravalli, V. V. (2010). Distributed stochastic subgradient projection algorithms for convex optimization. Journal of Optimization Theory and Applications, 147(3):516–545.
- Zhang, W., Zhao, P., Zhu, W., Hoi, S. C. H., and Zhang, T. (2017). Projection-free distributed online learning in networks. In Proceedings of the 34th International Conference on Machine Learning, pages 4054–4062.