Author: Herve Jegou
Summary:
Accuracy, efficiency, and memory usage are three constraints that have to be considered jointly in large-scale image search.
The proposed approach has three main parts, which are optimized jointly for accuracy:
1. The representation: how to aggregate local image descriptors into a vector representation
The proposed representation, VLAD (vector of locally aggregated descriptors), accumulates, for each visual word ci, the differences x - ci of the local descriptors x assigned to ci. It can be seen as a simplification of the Fisher kernel that keeps only the "mean" component. As Figure 1 of the paper shows, similar pictures have similar VLAD descriptors.
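The aggregation step can be sketched in a few lines of NumPy. This is only an illustration, not the authors' code: the function name `vlad`, the brute-force nearest-centroid assignment, and the final L2 normalization are assumptions of this sketch, and the codebook of visual words is assumed to have been learned beforehand (for example with k-means).

```python
import numpy as np

def vlad(descriptors, centroids):
    """Aggregate local descriptors (n x d) into one vector of size k*d.

    descriptors: array of shape (n, d), the local descriptors x of one image.
    centroids:   array of shape (k, d), the visual words ci (assumed given).
    """
    # Squared distance from every descriptor to every visual word.
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Index of the nearest visual word for each descriptor.
    assign = d2.argmin(axis=1)

    k, d = centroids.shape
    v = np.zeros((k, d))
    for i in range(k):
        members = descriptors[assign == i]
        if len(members):
            # Accumulate the residuals x - ci for visual word i.
            v[i] = (members - centroids[i]).sum(axis=0)

    v = v.ravel()
    norm = np.linalg.norm(v)
    # L2-normalize (assumption of this sketch); leave all-zero vectors as is.
    return v / norm if norm > 0 else v
```

For example, with two visual words in 2-D and three descriptors, `vlad(np.array([[1., 0.], [0., 1.], [11., 10.]]), np.array([[0., 0.], [10., 10.]]))` returns a 4-dimensional vector: one 2-D block of accumulated residuals per visual word.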
2. Dimensionality reduction of the vectors
Principal component analysis (PCA) is used for dimensionality reduction: the eigenvectors associated with the D' largest eigenvalues of the covariance matrix form a matrix M that maps a D-dimensional vector x to the D'-dimensional vector x' = Mx. The reduced vector x' is then encoded as q(x') using the ADC (asymmetric distance computation) approach, in which database vectors are quantized while the query vector is not.
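A minimal sketch of this step, assuming a product quantizer as the encoder behind ADC. The function names `pca_matrix`, `pq_encode`, and `adc_distance` and the `(m, k, d_sub)` codebook layout are hypothetical choices for illustration, and the per-subvector codebooks are assumed to have been trained already.

```python
import numpy as np

def pca_matrix(X, d_out):
    """Return M (d_out x D): eigenvectors of cov(X) with the largest eigenvalues."""
    cov = np.cov(X, rowvar=False)            # (D, D) covariance of the rows of X
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:d_out]
    return eigvecs[:, order].T               # rows are orthonormal eigenvectors

def pq_encode(x, codebooks):
    """Encode x as m centroid indices, one per subvector.

    codebooks: array of shape (m, k, d_sub); len(x) must equal m * d_sub.
    """
    m, k, d_sub = codebooks.shape
    parts = x.reshape(m, d_sub)
    return np.array([
        ((codebooks[j] - parts[j]) ** 2).sum(axis=1).argmin()
        for j in range(m)
    ])

def adc_distance(query, code, codebooks):
    """Squared distance between an unquantized query and a quantized vector."""
    m, k, d_sub = codebooks.shape
    parts = query.reshape(m, d_sub)
    # The query stays exact; only the database vector is replaced by centroids.
    return sum(((parts[j] - codebooks[j][code[j]]) ** 2).sum() for j in range(m))
```

In use, a database vector would be stored only as `pq_encode(M @ x, codebooks)`, and a query y would be compared against it with `adc_distance(M @ y, code, codebooks)`. The reshape in `pq_encode` only works when the reduced dimension is a multiple of m.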
3. The indexing algorithm
Datasets: Holidays, UKB, Flickr
- The choice of D' is constrained by the structure of ADC, which requires D' to be a multiple of m, the number of subvectors into which x' is split.
- The optimization is solely based on the mean square error quantization criterion.
- There is a tradeoff in choosing D'. If D' is large, the projection error is small but a large quantization error is introduced; if D' is small, the opposite holds.
- In the reported experiments, the proposed approach significantly outperforms the state of the art.
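The tradeoff on D' listed above can be made explicit with a short derivation. This is a sketch under stated assumptions: M has orthonormal rows (true for PCA eigenvectors), x_p = M^T M x denotes the projection of x onto the retained subspace, and the quantized vector is reconstructed as M^T q(x'). The symbols x_p, e_p, and e_q are notation introduced here for illustration.

```latex
\begin{aligned}
x - M^\top q(x') &= \underbrace{(x - x_p)}_{\text{projection error}}
                  + \underbrace{M^\top\bigl(x' - q(x')\bigr)}_{\text{quantization error}},
\qquad x_p = M^\top M x,\quad x' = M x \\[4pt]
\bigl\| x - M^\top q(x') \bigr\|^2
  &= \| x - x_p \|^2 + \| x' - q(x') \|^2
\end{aligned}
```

The two error terms are orthogonal (the first lies outside the retained subspace, the second inside it), so their squared norms add. Averaged over the data, the mean square error is e(D') = e_p(D') + e_q(D'): increasing D' shrinks e_p but, for a fixed code size, increases e_q.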
Comments:
Simplifying the GMM-based Fisher kernel into a new, lighter method is terrific! However, the paper doesn't explain why the weights and variances of the GMM are not considered. What if the weights and variances were included? Do they really have no impact on the results?