Online Convex Optimization 1: Introduction
Let K ⊆ R^n be a convex set from which a user takes actions for T rounds.
- At iteration t, the user chooses x_t ∈ K.
- A bounded convex cost function f_t : K → R is then revealed, where f_t belongs to F, a set of bounded convex functions.
- The user incurs the loss f_t(x_t).
Let A be an algorithm for OCO, which maps the game history observed so far to a decision in K:

$$x_t^{A} = A(f_1, f_2, \dots, f_{t-1}) \in K$$
To quantify the performance of the user, define the regret as follows:

$$\mathrm{regret}_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in K} \sum_{t=1}^{T} f_t(x)$$
The worst-case regret is given by:

$$\mathrm{Regret}_T = \sup_{f_1, f_2, \dots, f_T \subseteq F} \left( \sum_{t=1}^{T} f_t(x_t) - \min_{x \in K} \sum_{t=1}^{T} f_t(x) \right)$$
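To make the regret definition concrete, here is a minimal sketch (a toy construction of ours, not from the text) that approximates the best fixed action in hindsight by searching over a finite grid of K:

```python
import numpy as np

def regret(cost_fns, played, K_grid):
    """regret_T = sum_t f_t(x_t) - min_{x in K} sum_t f_t(x).

    cost_fns: list of callables f_t revealed over the T rounds
    played:   list of actions x_t chosen by the user
    K_grid:   finite grid of points approximating the decision set K,
              used to estimate the best fixed action in hindsight
    """
    total_incurred = sum(f(x) for f, x in zip(cost_fns, played))
    best_in_hindsight = min(sum(f(x) for f in cost_fns) for x in K_grid)
    return total_incurred - best_in_hindsight

# Toy example: K = [-1, 1], f_t(x) = (x - c_t)^2, user always plays x_t = 0.
targets = [0.5, -0.2, 0.8]
fns = [lambda x, c=c: (x - c) ** 2 for c in targets]
r = regret(fns, [0.0] * 3, np.linspace(-1, 1, 201))
```

Note that the grid search is only a stand-in for the exact minimization over K, which in general requires solving a convex program.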
2. First-Order Algorithms
The goal of an OCO algorithm is to minimize the regret, rather than the optimization error.
There is a relationship between the regret and optimization error but this is out of scope for now.
2.1 Online Gradient Descent (Algorithm)
Algorithm: Online Gradient Descent
1. Input: convex set K, horizon T, initial point x_1 ∈ K, step sizes {η_t}
2. for t = 1, 2, …, T do
3.   Play action x_t and observe cost f_t(x_t)
4.   Update and project:
       y_{t+1} = x_t − η_t ∇f_t(x_t)    ← step toward the direction of least cost
       x_{t+1} = Π_K(y_{t+1})           ← project y_{t+1} back onto K
5. end for
In line 4 of the above algorithm, we first take a step opposite to the direction of steepest ascent of the last cost function. Such a step may take the resulting point outside the decision set K; therefore, we project y_{t+1}
back onto K to obtain the next action x_{t+1}.
Despite the fact that the next cost function f_{t+1} can be very different from the costs observed thus far, the regret
attained by the algorithm is sublinear.
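The algorithm above can be sketched in NumPy as follows. The setup (unit-ball decision set, shifting quadratic costs, and the particular constants) is our illustration, not from the text:

```python
import numpy as np

def online_gradient_descent(project, grad, x1, T, D, G):
    """Online gradient descent with step size eta_t = D / (G * sqrt(t)).

    project: Euclidean projection onto the decision set K
    grad:    grad(t, x) returns a (sub)gradient of f_t at x
    """
    x = np.asarray(x1, dtype=float)
    actions = []
    for t in range(1, T + 1):
        actions.append(x.copy())           # play x_t
        eta = D / (G * np.sqrt(t))         # decaying step size
        y = x - eta * grad(t, x)           # step against the last gradient
        x = project(y)                     # project y_{t+1} back onto K
    return actions

# Toy problem: K is the unit ball in R^2, f_t(x) = ||x - c_t||^2 with
# shifting centers c_t inside K, revealed only after x_t is played.
rng = np.random.default_rng(0)
centers = rng.uniform(-0.5, 0.5, size=(100, 2))

def project_ball(y):
    n = np.linalg.norm(y)
    return y if n <= 1.0 else y / n

acts = online_gradient_descent(
    project_ball,
    grad=lambda t, x: 2 * (x - centers[t - 1]),
    x1=np.zeros(2), T=100, D=2.0, G=4.0,   # D = ball diameter; G bounds ||2(x - c_t)||
)
```

Only the gradient of the already-revealed cost f_t is used at round t; the algorithm never needs to know f_{t+1} in advance.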
2.2 Analysis of online gradient descent
Theorem 2.1. Online gradient descent with step sizes η_t = D/(G√t), t ∈ [T],
guarantees the following for all T ≥ 1:

$$\mathrm{Regret}_T = \sup_{f_1, f_2, \dots, f_T \subseteq F} \left( \sum_{t=1}^{T} f_t(x_t) - \min_{x \in K} \sum_{t=1}^{T} f_t(x) \right) \le \frac{3}{2} G D \sqrt{T}$$

where G > 0 is an upper bound on the norm of the subgradients of the cost functions over K, i.e., ||∇f(x)|| ≤ G for all x ∈ K,
and D is an upper bound on the diameter of K, i.e., ||x − y|| ≤ D for all x, y ∈ K.
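To make Theorem 2.1 concrete, here is a small numerical sanity check on a toy construction of ours: linear costs f_t(x) = g_t · x on K = [−1, 1] with random signs g_t, so G = 1 and D = 2, and projection reduces to clipping. The realized regret should stay below (3/2)·G·D·√T:

```python
import numpy as np

rng = np.random.default_rng(1)
T, G, D = 1000, 1.0, 2.0
signs = rng.choice([-1.0, 1.0], size=T)   # f_t(x) = signs[t] * x, so ||grad f_t|| = 1 = G

x, played = 0.0, []
for t in range(1, T + 1):
    played.append(x)
    eta = D / (G * np.sqrt(t))                              # step size from Theorem 2.1
    x = float(np.clip(x - eta * signs[t - 1], -1.0, 1.0))   # gradient step, then projection onto [-1, 1]

# For linear costs, the best fixed action in hindsight sits at an endpoint of [-1, 1].
total_regret = sum(g * xt for g, xt in zip(signs, played)) - min(signs.sum() * x for x in (-1.0, 1.0))
assert total_regret <= 1.5 * G * D * np.sqrt(T)             # the (3/2) G D sqrt(T) bound
```

The bound holds deterministically for any cost sequence, so the assertion passes regardless of the random seed.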
An interesting observation about online convex optimization is that no algorithm can avoid regret growing on the order of √T in the worst case. More formally:
Theorem 2.2. Any algorithm for online convex optimization incurs Ω(DG√T) regret in the worst case. This is true even
when the cost functions are generated from a fixed stationary distribution.
Readers familiar with multi-armed bandits might be wondering about logarithmic regret in OCO problems. Logarithmic regret is indeed achievable for certain problem classes (for example, strongly convex losses), but this is out of scope for this article.