2013年6月6日 星期四

[Summary] Online Dictionary Learning for Sparse Coding

Topic: Online Dictionary Learning for Sparse Coding

Author: Mairal

Summary:

Sparse coding—that is, modeling data vectors as sparse linear combination of basis elements. This paper focuses on learning the basis set, also called dictionary, to adapt it to specific data.

Problem Statement
Traditional dictionary learning can be considered as optimizing the empirical cost function

where D is the dictionary, and l is a loss function such that l(x,D) should be small if D is “good” at representing the signal x.

The the loss function l, which is defined as the distance between the original signal and the dictionary representation, plus the regularization term

Online Dictionary Learning
The algorithm is as below:

Assuming the training set composed of i.i.d. samples of a distribution p(x), its inner loop draws one element xt at a time, as in stochastic gradient descent, and alternates classical
sparse coding steps for computing the decomposition alpha t of xt over the dictionary Dt−1 obtained at the previous iteration, with dictionary update steps where the new dictionary
Dt is computed by minimizing cost function.

Dictionary Update

The algorithm for updating the dictionary uses blockcoordinate descent with warm restarts, and one of its main advantages is that it is parameter-free and does not require any learning rate tuning.

Optimizing the Algorithm
  • Handling Fixed-Size Datasets: The size of the training set is often finite. In this situation, the same data points may be examined several times, and it is very common in online algorithms to simulate an i.i.d. sampling of p( x ) by cycling over a randomly permuted training set.
  • Mini-Batch Extension: Improve the convergence speed of algorithm by drawing η>1 signals at each iteration instead of a single one, which is a classical heuristic in stochastic gradient descent algorithm.
  • Purging the Dictionary from Unused Atoms: Some of the dictionary atoms are never used. Replace them during the optimization by elements of the training set.

沒有留言:

張貼留言