Principal component analysis and greatest variation

In summary, the conversation worked through finding the sample mean and covariance matrix for the given table, and then performing principal component analysis to find a size index that explains the greatest variation. The final steps are to find the eigenvectors and eigenvalues of the covariance matrix and build a feature matrix; the size index corresponds to the eigenvector with the largest eigenvalue, which in this case is the one dominated by the x component.
  • #1
visharad
Problem - Given the following table
x y
15 50
26 46
32 44
48 43
57 40

a) Find the sample mean
b) Find the covariance matrix
c) Perform principal component analysis and find a size index which explains the greatest variation.

My attempt
a) n = 5
xbar = Sum(x)/n = 35.6
ybar = Sum(y)/n = 44.6
Sample mean = [35.6 44.6]

b) I calculated Var(X) = 1/n * Sum[(X - Xbar)^2] = 228.24
Var(Y) = 1/n * Sum[(Y - Ybar)^2] = 11.04
Cov(X, Y) = 1/n * Sum[(X - Xbar)(Y - Ybar)] = -48.16

I made a 2x2 matrix whose principal-diagonal elements are Var(X) and Var(Y), and whose two off-diagonal elements are both Cov(X, Y).

Please see if there is any mistake in my solutions to parts a and b.
I have no idea how to answer part c. Could you help?
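For anyone checking the arithmetic in parts (a) and (b), here is a minimal NumPy sketch; it assumes the same 1/n normalization used in the attempt above (a sample covariance would use 1/(n-1) instead).

```python
import numpy as np

x = np.array([15, 26, 32, 48, 57], dtype=float)
y = np.array([50, 46, 44, 43, 40], dtype=float)
n = len(x)

mean = np.array([x.mean(), y.mean()])        # part (a): [35.6, 44.6]

xc, yc = x - x.mean(), y - y.mean()          # mean-centered data
var_x = (xc ** 2).sum() / n                  # 228.24
var_y = (yc ** 2).sum() / n                  # 11.04
cov_xy = (xc * yc).sum() / n                 # -48.16

C = np.array([[var_x, cov_xy],
              [cov_xy, var_y]])              # part (b): 2x2 covariance matrix
print(mean)
print(C)
```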
 
  • #2
visharad said:
Problem - Given the following table
x y
15 50
26 46
32 44
48 43
57 40

a) Find the sample mean
b) Find the covariance matrix
c) Perform principal component analysis and find a size index which explains the greatest variation.

I have no idea how to answer part c. Could you help?

To get your covariance matrix, I assume you subtracted the means of x and y. Once you have this matrix, you need to find the two eigenvectors and their eigenvalues. The vector with the largest eigenvalue will be your principal component vector. From your data, I can tell which one that is. Can you? Note that the eigenvectors are mutually orthogonal, which is the point of PCA: you want to extract independent vectors that describe the data. The data can be compressed by simply eliminating any vectors whose eigenvalues are too small to make much difference in your analysis. You don't need to do that here IMO.

The next step is to create a feature matrix where your eigenvectors are the row vectors. The transpose of this is your solution. Note that x accounts for over 95% of the total variance in your two component system, so I think your teacher would want you to discard y and retain x in a reduced 1 component system (your size index). I just wanted to take you through the final steps as if you had more than one component.
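As a concrete sketch of those final steps, assuming NumPy and the 1/n covariance matrix from part (b):

```python
import numpy as np

C = np.array([[228.24, -48.16],
              [-48.16,  11.04]])             # covariance matrix from part (b)

eigvals, eigvecs = np.linalg.eigh(C)         # eigh handles symmetric matrices
order = np.argsort(eigvals)[::-1]            # sort eigenvalues in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pc1 = eigvecs[:, 0]                          # eigenvector with the largest eigenvalue
share = eigvals[0] / eigvals.sum()           # fraction of total variance it explains

feature = eigvecs.T                          # feature matrix: eigenvectors as row vectors
data = np.array([[15, 50], [26, 46], [32, 44], [48, 43], [57, 40]], dtype=float)
centered = data - data.mean(axis=0)
scores = centered @ feature.T                # project the centered data onto the new axes
size_index = scores[:, 0]                    # first column: the one-dimensional size index

print(pc1, share)
print(size_index)
```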
 
  • #3
SW VandeCarr said:
To get your covariance matrix, I assume you subtracted the means of x and y. Once you have this matrix, you need to find the two eigenvectors and their eigenvalues. The vector with the largest eigenvalue will be your principal component vector. From your data, I can tell which one that is. Can you? Note that the eigenvectors are mutually orthogonal, which is the point of PCA: you want to extract independent vectors that describe the data. The data can be compressed by simply eliminating any vectors whose eigenvalues are too small to make much difference in your analysis. You don't need to do that here IMO.

The next step is to create a feature matrix where your eigenvectors are the row vectors. The transpose of this is your solution. Note that x accounts for over 95% of the total variance in your two component system, so I think your teacher would want you to discard y and retain x in a reduced 1 component system (your size index). I just wanted to take you through the final steps as if you had more than one component.

Don't you mean the eigenvectors/eigenvalues of MᵀM, where M is the matrix with the x, y entries? Sorry, I am kind of rusty; I have not seen this in a while.
 
  • #5
Bacle2 said:
Don't you mean the eigenvectors/eigenvalues of MᵀM, where M is the matrix with the x, y entries? Sorry, I am kind of rusty; I have not seen this in a while.

It's the variance-covariance matrix (usually just called the covariance matrix). The trace of this matrix is the total variance. What matrices were you referring to? The new matrices are constructed from the eigenvectors obtained from the original covariance matrix.
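A minimal sketch of that connection, assuming M denotes the mean-centered data matrix: with the 1/n convention, MᵀM/n reproduces the covariance matrix exactly.

```python
import numpy as np

data = np.array([[15, 50], [26, 46], [32, 44], [48, 43], [57, 40]], dtype=float)
M = data - data.mean(axis=0)                 # subtract the column means first
C = (M.T @ M) / len(M)                       # equals the variance-covariance matrix
print(C)                                     # [[228.24 -48.16], [-48.16 11.04]]
```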
 
  • #6
Yes, I was referring to the variance-covariance matrix. The way I remember it, we ran m tests on n subjects, computed the means, adjusted/normalized the data, and then built the variance-covariance matrix, to which the rest of the process was applied. Thanks for the refresher.
 

Related to Principal component analysis and greatest variation

What is principal component analysis (PCA)?

Principal component analysis (PCA) is a statistical method commonly used in data analysis to reduce the dimensionality of a large dataset. It does this by finding new derived variables, known as principal components, which are linear combinations of the original variables that explain the maximum amount of variation in the data.

How does PCA help with data analysis?

PCA helps with data analysis by simplifying the dataset and making it easier to interpret. By reducing the number of variables, it allows for a more efficient analysis, as well as the identification of patterns and relationships between variables that may not have been apparent before.

What is the greatest variation in PCA?

The greatest variation in PCA refers to the principal component that explains the most variation in the data. This component is also known as the first principal component and is used to represent the most significant patterns in the data.

How is the greatest variation calculated in PCA?

The greatest variation in PCA is calculated using the eigenvectors and eigenvalues of the covariance matrix of the data. The eigenvector with the highest corresponding eigenvalue represents the first principal component, which explains the most variation in the data.
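A minimal sketch of reading this off directly, assuming scikit-learn is available (the variance shares match the hand calculation above; component signs may differ):

```python
import numpy as np
from sklearn.decomposition import PCA

data = np.array([[15, 50], [26, 46], [32, 44], [48, 43], [57, 40]], dtype=float)
pca = PCA(n_components=2).fit(data)
print(pca.explained_variance_ratio_)         # first entry: share of variation on the first component
print(pca.components_[0])                    # direction of the first principal component
```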

What are some common applications of PCA?

PCA has a wide range of applications in various fields, including finance, biology, psychology, and computer vision. Some common applications include data compression, feature extraction, and pattern recognition.
