2013年3月21日 星期四

[Summary] Probabilistic Latent Semantic Indexing

Topic: Probabilistic Latent Semantic Indexing

Author: T. Hofmann

Summary:

PLSA is an approach to automated document indexing which is based on a statistical latent class model  for factor analysis of count data. The model is able to deal with synonymy and polyonymy.

Latent Semantic Analysis(LSA) usually takes the high dimensional vector space representation of documents based on TF and applies Singular Value Decomposition (SVD) to the corresponding term/document matrix.

PLSA is based on the likelihood principle and defines a proper generative model of the data. The core of PLSA is a statistical model called aspect model. 
  • unobserved class variable z (z1...zK)
  • word w (w1...wM)
  • document d (d1...dN)
The generative model is defined as follows:
  • select a document d with prob. P(d)
  • pick a latent class z with prob. P(z|d)
  • generate a word w with prob P(w|z)
The aspect model is a statistical mixture model which is based on two assumptions:
  1. Observation pairs (d,w) are assumed to be generated independently.
  2. The conditional independence assumption is made that conditioned on the latent class z, word w are generated independently of the specific document d. 
Following the likelihood principle, one determines P(d), P(z|d), and P(w|z) by maximization of the log-likelihood function

where n(d,w) is the term frequency.

The standard procedure for maximum likelihood estimation is the Expectation Maximization (EM) algorithm. It contains two steps:
  1. Expectation (E): Posterior prob. are computed for the latent variables z, based on the current estimates of the parameters.
  2. Maximization (M): Parameters are updated for given posterior prob. computed in the E-step.
In order to prevent overfitting of the traing algorihm, the author introduces a variable beta. The proposal is called tempered EM (TEM).
  1. Beta equals 1 at first, and performs EM until the performance on held-out data deteriorates.
  2. Decrease beta by a constant factor eta, and perform EM continuely.
  3. Stop when decreasing beta does not yield further improvement.
The comparison between Mixture Decomposition and Singular Value Decomposition:
  • If we define matrices by U=(P(di|zk))i,k, V=(P(wj|zk))j,k, Sigma=diag(P(zk))k. Then, the joint prob. model P can be written as P=U(Sigma)V, which can be compared with the SVD in LSA.
  • The weighted sum over outer products between rows of U and V reflects conditional independence in PLSA.
  • The left/ right eigenvectors in SVD are seen to correspond to the factors P(w|z) and the component distributions P(d|z) of the aspect model.
  • The mixing proportions P(z) in PLSA substitute the singular values of the SVD in LSA.
Comments:
PLSA gives a probability meaning to the proposed model, and it takes advantage of statistical standards methods for model fitting, overfitting control and model combination. The same approach is also used in speech processing in my impression, and maybe we can use it in the social network domain for link prediction, or it can find one's action characteristics by adding a time domain and some adjustment.



沒有留言:

張貼留言