Kishan Chakraborty
Research Scholar

Online Convex Optimization 1: Introduction

1. Model Formulation

Let $\mathcal{K} \subseteq \mathbb{R}^n$ be a convex set from which a user chooses actions for $T$ rounds. Each round proceeds as follows:

  1. At iteration $t$, the user chooses $x_t \in \mathcal{K}$.
  2. A bounded convex function $f_t \in \mathcal{F}$, $f_t: \mathcal{K} \rightarrow \mathbb{R}$, is revealed, where $\mathcal{F}$ is a set of bounded convex functions.
  3. The user incurs a loss $f_t(x_t)$.
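The round-by-round interaction above can be sketched in a few lines of Python. This is a minimal sketch of the protocol, not any particular algorithm; the callables `choose_action` (the player) and `reveal_cost` (the adversary) are hypothetical placeholders:

```python
def oco_protocol(choose_action, reveal_cost, T):
    """Run T rounds of the OCO protocol.

    choose_action(history) -> an action x_t in K
    reveal_cost(t)         -> a convex cost function f_t
    Returns the total loss incurred by the player.
    """
    history, total_loss = [], 0.0
    for t in range(T):
        x_t = choose_action(history)   # the player commits to x_t first
        f_t = reveal_cost(t)           # only then is f_t revealed
        total_loss += f_t(x_t)         # the player pays f_t(x_t)
        history.append(f_t)            # f_t joins the observable history
    return total_loss
```

The ordering is the essential point: the player must commit to $x_t$ before seeing $f_t$.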

Let $A$ be an algorithm for OCO, which maps the game history so far to a decision in $\mathcal{K}$:

$$ x_t^{A} = A(f_1, f_2, \cdots, f_{t-1}) \in \mathcal{K} $$

To quantify the performance of the user, we introduce the regret:

$$ \mathrm{regret}_T = \sum_{t=1}^T f_t(x_t) - \min_{x \in \mathcal{K}} \sum_{t=1}^T f_t(x) $$

The worst-case regret is given by:

$$ \mathrm{Regret}_T = \sup_{(f_1, f_2, \cdots, f_T) \in \mathcal{F}^T} \left( \sum_{t=1}^T f_t(x_t) - \min_{x \in \mathcal{K}} \sum_{t=1}^T f_t(x) \right) $$
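Given a concrete sequence of costs and actions, the regret is straightforward to compute. A minimal sketch, with the assumption that the minimum over $\mathcal{K}$ is approximated by a finite grid of candidate points:

```python
def regret(costs, actions, candidates):
    """regret_T = sum_t f_t(x_t) - min_{x in K} sum_t f_t(x).

    costs:      list of cost functions f_1, ..., f_T
    actions:    list of the actions x_1, ..., x_T actually played
    candidates: finite grid over K approximating the min (an assumption
                made here for illustration; in general the min is an
                offline convex program)
    """
    incurred = sum(f(x) for f, x in zip(costs, actions))
    best_fixed = min(sum(f(x) for f in costs) for x in candidates)
    return incurred - best_fixed
```

Note that the comparator is the best *fixed* action in hindsight, not the best per-round action.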

2. First Order Algorithm

The goal of an OCO algorithm is to minimize the regret, rather than the optimization error. There is a relationship between the regret and optimization error but this is out of scope for now.

2.1 Online Gradient Descent (Algorithm)

$$
\begin{array}{l}
\textbf{Algorithm: Online Gradient Descent} \\
\text{1. Input: a convex set } \mathcal{K},\ T,\ x_1 \in \mathcal{K}, \text{ step sizes } \{\eta_t\} \\
\text{2. for } t = 1, 2, \cdots, T \text{ do} \\
\text{3. } \quad \text{Play action } x_t \text{ and observe cost } f_t(x_t) \\
\text{4. } \quad \text{Update and project:} \\
\quad\quad\quad \begin{aligned}
y_{t+1} &= x_t - \eta_t \nabla f_t(x_t) && \leftarrow \text{step towards the direction of least cost} \\
x_{t+1} &= \Pi_{\mathcal{K}}(y_{t+1}) && \leftarrow \text{project } y_{t+1} \text{ back onto } \mathcal{K}
\end{aligned} \\
\text{5. end for}
\end{array}
$$

Line 4 of the algorithm first takes a step opposite to the direction of steepest ascent of the last cost function. Such a step may take the resulting point out of the decision set $\mathcal{K}$, so we project $y_{t+1}$ back onto $\mathcal{K}$ to obtain the next action $x_{t+1}$.
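The two steps in line 4 translate directly into numpy. A minimal sketch, assuming for concreteness that $\mathcal{K}$ is the Euclidean unit ball (whose projection has a closed form); the gradients are supplied as oracles, one per round, mirroring the online setting:

```python
import numpy as np

def project_ball(y, radius=1.0):
    """Euclidean projection onto the L2 ball of the given radius
    (our illustrative choice of K)."""
    norm = np.linalg.norm(y)
    return y if norm <= radius else (radius / norm) * y

def online_gradient_descent(grads, x1, D, G, project=project_ball):
    """Online gradient descent with eta_t = D / (G * sqrt(t)).

    grads: list of gradient oracles; grads[t](x) returns grad f_t(x),
           queried only after x_t has been played.
    Returns the sequence of actions x_1, ..., x_T.
    """
    x, actions = np.asarray(x1, dtype=float), []
    for t, grad in enumerate(grads, start=1):
        actions.append(x.copy())
        eta = D / (G * np.sqrt(t))      # decaying step size
        y = x - eta * grad(x)           # gradient step on the last cost
        x = project(y)                  # project back onto K
    return actions
```

For a different decision set one only swaps out `project`; the gradient step itself is unchanged.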

Even though the next cost function $f_{t+1}$ can be very different from the costs observed thus far, the regret attained by the algorithm is sublinear.

2.2 Analysis of online gradient descent

Theorem 2.1 Online gradient descent with step sizes $\eta_t = \frac{D}{G\sqrt{t}},~~t \in [T]$ guarantees the following for all $T \ge 1$:

$$ \mathrm{Regret}_T = \sup_{(f_1, f_2, \cdots, f_T) \in \mathcal{F}^T} \left( \sum_{t=1}^T f_t(x_t) - \min_{x \in \mathcal{K}} \sum_{t=1}^T f_t(x) \right) \le \frac{3}{2} GD\sqrt{T} $$

where $G > 0$ is an upper bound on the norm of the subgradients of $f$ over $\mathcal{K}$, i.e., $||\nabla f(x)|| \le G~~\forall x \in \mathcal{K}$, and $D$ is an upper bound on the diameter of $\mathcal{K}$, i.e., $||x - y|| \le D~~\forall x, y \in \mathcal{K}$.
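The bound of Theorem 2.1 can be checked empirically on a toy instance. A sketch under assumed choices made here for illustration: $\mathcal{K} = [-1, 1]$ (so $D = 2$), costs $f_t(x) = (x - z_t)^2$ with random targets $z_t \in [-1, 1]$ (so $|\nabla f_t| = 2|x - z_t| \le 4 = G$), and the offline minimum approximated over a fine grid:

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, G = 500, 2.0, 4.0                 # K = [-1, 1]; f_t(x) = (x - z_t)^2
z = rng.uniform(-1.0, 1.0, size=T)

x, losses = 0.0, []
for t in range(1, T + 1):
    losses.append((x - z[t - 1]) ** 2)  # play x_t, pay f_t(x_t)
    eta = D / (G * np.sqrt(t))          # step size from Theorem 2.1
    grad = 2.0 * (x - z[t - 1])         # gradient of the revealed cost
    x = np.clip(x - eta * grad, -1.0, 1.0)   # gradient step + projection

# best fixed action in hindsight, approximated over a grid on K
best = min(np.sum((c - z) ** 2) for c in np.linspace(-1.0, 1.0, 2001))
regret = np.sum(losses) - best
assert regret <= 1.5 * G * D * np.sqrt(T)    # the O(sqrt(T)) guarantee holds
```

On instances like this, with costs drawn i.i.d., the realized regret is typically far below the worst-case bound; the $\sqrt{T}$ rate is what the theorem guarantees against any sequence.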

An interesting observation about online convex optimization is that any algorithm must incur regret of order at least $\sqrt{T}$ in the worst case. More formally:
Theorem 2.2 Any algorithm for online convex optimization incurs $\Omega(DG\sqrt{T})$ regret in the worst case. This is true even when the cost functions are generated from a fixed stationary distribution.

Readers familiar with multi-armed bandits might be wondering about logarithmic regret in OCO problems. Yes, there are certain problems where logarithmic regret is achievable, but this is out of scope for this article.