Problem with random variables in Matlab's PCA

In summary, the thread discusses how to obtain the uncorrelated random variables associated with a PCA in MATLAB. The poster shares code that builds a Gaussian kernel and runs pca on it, but the histogram of the resulting variable does not look Gaussian. The reply suggests transforming the data using the eigenvectors to obtain the desired result.
  • #1
confused_engineer
TL;DR Summary
I cannot calculate the random variables associated with the eigenfunctions of the principal component analysis.
Hello.

I have designed a Gaussian kernel as:

% 501-by-501 grid on [0, 1] with spacing 0.002
[X,Y] = meshgrid(0:0.002:1, 0:0.002:1);
% Kernel values exp(-|x - y|)
Z = exp(-abs(X - Y));

Now, I calculate PCA:

[coeffG, scoreG, latentG, tsquaredG, explainedG, muG]=pca(Z, 'Centered',false);

I can rebuild the original data properly, as defined in the documentation https://la.mathworks.com/help/stats/pca.html#bth9ibe-coeff:

Zrec=scoreG*coeffG';
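
A quick numerical check of this reconstruction can be sketched as follows. This is a minimal sketch that assumes the Statistics and Machine Learning Toolbox is available; the variable name recErr is introduced here for illustration only.

```matlab
% Rebuild the kernel matrix from the post.
[X, Y] = meshgrid(0:0.002:1, 0:0.002:1);
Z = exp(-abs(X - Y));

% Uncentered PCA, as in the post.
[coeffG, scoreG] = pca(Z, 'Centered', false);

% Reconstruct and measure the relative Frobenius-norm error.
Zrec = scoreG * coeffG';
recErr = norm(Z - Zrec, 'fro') / norm(Z, 'fro');
fprintf('relative reconstruction error: %.3g\n', recErr);
```

The product alone reproduces Z only because centering was switched off; with the default centered PCA, the column means in muG would have to be added back to each row.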

Now, I wish to calculate the uncorrelated random variables which accompany the eigenvectors, columns of scoreG. For that, I do the following:

randvarG=Z*scoreG(:,1);

However, the resulting histogram of randvarG does not look Gaussian at all.

Any insight into what I am doing wrong would be much appreciated.
Thanks for reading.
 
  • #2
To calculate the uncorrelated variables that accompany the eigenvectors, you need to transform the data using the eigenvectors, which are the columns of coeffG (not scoreG). This is done by multiplying the original data matrix (Z) by the eigenvector matrix (coeffG), without transposing it. For example:

transformedData = Z*coeffG;
randvarG = transformedData(:,1);

Because pca was called with 'Centered', false, transformedData equals scoreG up to rounding, so the columns of scoreG are already the uncorrelated components. Note that the histogram can only be expected to look Gaussian if the rows of Z are samples of Gaussian random variables. Here Z is a deterministic kernel matrix, so a Gaussian histogram is not guaranteed.
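
The projection onto the eigenvectors can be checked numerically with the sketch below. It assumes the Statistics and Machine Learning Toolbox, and it relies on the documented behaviour that, with 'Centered', false, the scores returned by pca equal the data times the coefficient matrix (no transpose). The names projErr, G and relOffDiag are illustrative, not from the thread.

```matlab
% Kernel matrix and uncentered PCA, as in the original post.
[X, Y] = meshgrid(0:0.002:1, 0:0.002:1);
Z = exp(-abs(X - Y));
[coeffG, scoreG] = pca(Z, 'Centered', false);

% Project the data onto the eigenvectors (columns of coeffG).
transformedData = Z * coeffG;

% With 'Centered', false this should match scoreG up to rounding.
projErr = norm(transformedData - scoreG, 'fro') / norm(scoreG, 'fro');

% The projected columns are mutually orthogonal, so their Gram matrix
% has (numerically) zero off-diagonal entries.
G = transformedData' * transformedData;
offDiag = G - diag(diag(G));
relOffDiag = max(abs(offDiag(:))) / max(diag(G));

fprintf('projection mismatch: %.3g, off-diagonal ratio: %.3g\n', ...
        projErr, relOffDiag);
```

Calling histogram(transformedData(:,1)) afterwards shows the distribution of the first component. Its shape depends on the data in Z and need not be Gaussian.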
 

Related to Problem with random variables in Matlab's PCA

What is the purpose of using PCA in Matlab?

PCA (Principal Component Analysis) is a statistical technique used for dimensionality reduction. It is commonly used in Matlab to identify patterns in high-dimensional data and to reduce the complexity of the data by identifying the most important features or variables. This helps in data visualization, data compression, and feature selection for machine learning algorithms.

What are random variables in the context of PCA in Matlab?

In PCA, each column of the data matrix is treated as a random variable (a feature), and each row is one observation of those variables. The sample covariance matrix of these variables is the basis for finding the principal components. PCA does not require the variables to be normally distributed, but if they are jointly Gaussian, the principal component scores are, to a good approximation, Gaussian as well.
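
To illustrate the role of random variables, the following minimal sketch draws Gaussian observations whose covariance has the same exp(-|x - y|) form as the kernel in the thread, then checks that the PCA scores are uncorrelated. The coarse 51-point grid, the sample count nSamples, and the use of chol to generate the samples are illustrative assumptions, not part of the original posts. The Statistics and Machine Learning Toolbox is assumed.

```matlab
rng(0);                                  % reproducible draws

% Covariance matrix on a coarse grid (51 points).
[S, T] = meshgrid(0:0.02:1, 0:0.02:1);
K = exp(-abs(S - T));
p = size(K, 1);
nSamples = 2000;

% Each row of samples is one zero-mean Gaussian observation with covariance K.
L = chol(K, 'lower');
samples = (L * randn(p, nSamples))';

% Default (centered) PCA: rows are observations, columns are variables.
[coeff, score, latent] = pca(samples);

% Sample covariance of the scores: diagonal up to rounding,
% with the component variances (latent) on the diagonal.
C = cov(score);
offDiag = C - diag(diag(C));
maxOffDiag = max(abs(offDiag(:)));
fprintf('largest off-diagonal covariance: %.3g\n', maxOffDiag);
```

In this setting histogram(score(:,1)) should look roughly bell-shaped, because the scores are linear combinations of Gaussian observations rather than entries of a fixed kernel matrix.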

What is the problem with random variables in Matlab's PCA?

The main practical problem is that poorly prepared variables can distort the principal components. If the variables are not standardized, the one with the largest variance can dominate the leading components, and outliers or missing values can bias the estimated covariance matrix. Either effect reduces the accuracy of the results.

How can I address the problem of random variables in Matlab's PCA?

To address the problem of random variables in PCA, it is important to preprocess the data by standardizing the variables and handling any outliers or missing values. This helps in reducing the impact of noise and irrelevant information on the principal components. Additionally, it is recommended to perform feature selection before applying PCA to only include the most relevant variables in the analysis.
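
The effect of standardizing can be seen in a small sketch on synthetic data. The data matrix X and its column scales are made up for illustration, and the Statistics and Machine Learning Toolbox (pca, zscore) is assumed.

```matlab
rng(1);                                  % reproducible example data

% Three independent variables on very different scales (synthetic).
X = randn(200, 3) * diag([1 100 0.01]);

% PCA on the raw data: the large-scale variable dominates.
[~, ~, ~, ~, explainedRaw] = pca(X);

% Standardize each column to zero mean and unit variance, then repeat.
Xs = zscore(X);
[~, ~, latentStd, ~, explainedStd] = pca(Xs);

fprintf('raw: %.2f%% of variance in PC1, standardized: %.2f%%\n', ...
        explainedRaw(1), explainedStd(1));
```

For missing values, pca by default uses the 'Rows', 'complete' setting, which drops any observation containing NaN before the decomposition. Whether that is acceptable depends on how much data is lost.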

Are there any alternatives to PCA for dimensionality reduction in Matlab?

Yes, there are several alternatives to PCA for dimensionality reduction in Matlab, such as Factor Analysis, Independent Component Analysis (ICA), and Non-negative Matrix Factorization (NMF). Each technique has its own advantages and limitations, so it is important to choose the most suitable method based on the specific dataset and research goals.
