Mean centering of the covariance matrix in PCA

In summary, PCA (principal component analysis) reduces the dimensionality of a dataset by finding a new set of orthogonal variables that best explain the variability in the data. Mean-centering the variables in the original matrix is generally recommended, although the covariance matrix, and hence its eigenvectors, are unchanged by centering, because the covariance calculation subtracts the means internally. Where centering does make a difference is when the components are computed from uncentered second moments; there its overall effect is to decrease the largest eigenvalue. The transformation can also be understood in terms of the singular value decomposition of the data matrix.
  • #1
physical101
Hi all,
I thought I posted this last night, but I have received no notification of it being moved and I can't find it in the list of threads I have started.

I was wondering if you could help me understand a little better how PCA, principal component analysis, works. I have often read that to get the best results using PCA you should mean-centre the variables within your matrix first. I thought, however, that one method of calculating the principal components was the covariance matrix method, where the eigenvalues and eigenvectors give you the directions of greatest variance within the matrix. I also assumed that the elements of the covariance matrix were calculated using the following formula:

Cov(X,Y) = sum[(Xi - Xmean)(Yi - Ymean)] / (N - 1)

If I subtracted the mean from the original data matrix, would it matter? It seems I would get the same result either way using the above calculation.
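That intuition can be checked numerically. The sketch below (NumPy, with made-up Gaussian data) shows that the sample covariance matrix comes out identical whether or not the data are centered first, because the formula above already subtracts the means internally:

```python
# Quick numerical check: the sample covariance matrix is the same for
# raw and mean-centered data, since Cov subtracts the means anyway.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # data with nonzero mean
Xc = X - X.mean(axis=0)                            # mean-centered copy

C_raw = np.cov(X, rowvar=False)        # covariance of raw data
C_centered = np.cov(Xc, rowvar=False)  # covariance of centered data

print(np.allclose(C_raw, C_centered))  # True: centering leaves Cov unchanged
```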

I hope someone can help.

Thanks
 
  • #2
Well, I've been thinking about it all day, and my thought is this: the original transformation of the data, by multiplication with the eigenvectors, carries out a linear transform. In the uncentered case this transform is done with vectors coming from the origin, at zero in all dimensions. In the mean-centered case the vectors come from the centroid of the data, which is now at zero. I can only imagine that this affects the transformation in such a way that it better explains the variance within the dataset, as the covariance matrix won't change and hence neither will the eigenvectors. Do you think I am getting close?
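This geometric picture can be sketched in NumPy (illustrative data assumed): the covariance eigenvectors are identical for raw and centered data, but the projections ("scores") differ by a constant offset, namely the mean projected onto those same eigenvectors:

```python
# The covariance eigenvectors are the same either way; only the origin
# of the projection moves, shifting every score by a constant offset.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(loc=10.0, scale=1.0, size=(200, 2))  # data far from origin
Xc = X - X.mean(axis=0)                             # centroid moved to zero

C = np.cov(X, rowvar=False)           # same matrix for X and Xc
eigvals, eigvecs = np.linalg.eigh(C)  # columns of eigvecs are the PC axes

scores_raw = X @ eigvecs              # projection from the origin
scores_centered = Xc @ eigvecs        # projection from the centroid
offset = X.mean(axis=0) @ eigvecs     # the projected mean vector

print(np.allclose(scores_raw - offset, scores_centered))  # True
```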
 
  • #3
I think mean-subtraction is recommended because Var[X] <= E[X^2]. For the n-dimensional case I find it easier to understand PCA in more general terms using the (reduced) singular value decomposition, where we write the data matrix X as
X = Y+PDQ'
where Y is some predefined matrix (e.g. row-repeated column means), D is square diagonal, and P and Q are rectangular matrices with P'P = I = Q'Q. The columns of P and Q are the eigenvectors corresponding to the non-zero eigenvalues of XX' and X'X respectively, and the diagonal of D holds the square roots of the non-zero eigenvalues. This representation works for both low-dimensional, many-sample and high-dimensional, few-sample problems.

I'm not sure if there is a simple relation between the eigenvectors for Y=0 vs Y=Xbar, but it should be possible to show that the largest eigenvalue decreases.
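Both points can be illustrated with a small NumPy sketch (example data assumed): the reduced SVD of the centered matrix reconstructs X once the column means (the matrix Y above) are added back, and since centering amounts to multiplying X by the projector (I - 11'/n), the largest singular value cannot increase:

```python
# Verify X = Y + P D Q' with Y the row-repeated column means, and check
# that centering does not increase the largest singular value.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(loc=3.0, scale=1.0, size=(50, 4))
Xc = X - X.mean(axis=0)  # equivalent to (I - 11'/n) @ X

# Reduced SVD of the centered data: Xc = U S V'
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
print(np.allclose(X, X.mean(axis=0) + U @ np.diag(S) @ Vt))  # True

# Largest singular value (sqrt of the largest eigenvalue of X'X) shrinks
s_raw = np.linalg.svd(X, compute_uv=False)
print(S[0] <= s_raw[0])  # True
```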
 

Related to Mean centering of the covariance matrix in PCA

1. What is mean centering in PCA?

Mean centering in PCA refers to subtracting the mean value from each variable in a dataset before performing principal component analysis (PCA). This is done so that the analysis captures the variance of the variables rather than their average levels.

2. Why is mean centering important in PCA?

Mean centering is important in PCA because it prevents variables with large mean values from dominating the principal components. It also allows for a better interpretation of the results, as the component scores are then expressed as deviations from the mean of the data.

3. How is mean centering different from standardization in PCA?

Mean centering and standardization are two different methods used to preprocess data for PCA. While mean centering simply subtracts the mean value from each variable, standardization also divides each variable by its standard deviation. This ensures that all variables have the same scale and are equally weighted in the PCA.
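A minimal NumPy sketch of the distinction (example data assumed): after standardization, the covariance matrix of the transformed data equals the correlation matrix of the original data, so both variables contribute on the same scale:

```python
# Centering subtracts the column means; standardization additionally
# divides by the column standard deviations (z-scores).
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(loc=[0.0, 100.0], scale=[1.0, 50.0], size=(1000, 2))

X_centered = X - X.mean(axis=0)
X_standardized = X_centered / X.std(axis=0, ddof=1)

# Covariance of the standardized data equals the correlation matrix of
# the raw data (unit diagonal, scale-free off-diagonal entries).
print(np.allclose(np.cov(X_standardized, rowvar=False),
                  np.corrcoef(X, rowvar=False)))  # True
```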

4. Can you perform PCA without mean centering the data?

Yes, it is possible to perform PCA without mean centering the data. However, this may result in principal components that are heavily influenced by variables with large mean values. Mean centering helps to reduce this bias and results in more meaningful principal components.

5. Are there any downsides to mean centering in PCA?

One potential downside of mean centering in PCA is that the mean is sensitive to outliers, so for heavily skewed or outlier-laden data a more robust centre (such as the median) may be more appropriate. Additionally, mean centering may affect the interpretation of the principal components, as the centered values are not directly comparable to the original data.
