**2,352 PCA questions.**

I want to cluster a massive dataset for which I have only the pairwise distances. I implemented a k-medoids algorithm, but it's taking too long to run so I would like to start by reducing the ...

Suppose I do a PCA on a data set and get $k$ principal components that explain 100% of the total variance of the data set.
We can say any observation from the data set can be reconstructed by the ...
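The reconstruction claim in this excerpt can be checked numerically. The following is a minimal sketch, assuming synthetic data that lie exactly in a $k$-dimensional affine subspace (the sizes `n`, `d`, `k` and all variable names are illustrative and do not come from the question). It shows that when $k$ components explain 100% of the variance, each observation equals the mean plus a linear combination of those $k$ components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 observations in 6 dimensions that lie in a
# 3-dimensional affine subspace (so 3 PCs carry all of the variance).
n, d, k = 50, 6, 3
latent = rng.normal(size=(n, k))
mixing = rng.normal(size=(k, d))
X = latent @ mixing + rng.normal(size=d)  # same constant offset added to every row

# Center the data; PCA works on deviations from the mean.
mean = X.mean(axis=0)
Xc = X - mean

# PCA via SVD: rows of Vt are the principal directions.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_ratio = s**2 / np.sum(s**2)

components = Vt[:k]             # shape (k, d)
scores = Xc @ components.T      # shape (n, k): coordinates in the PC basis
X_rebuilt = mean + scores @ components

print("variance explained by first k PCs:", explained_ratio[:k].sum())
print("exact reconstruction:", np.allclose(X, X_rebuilt))
```

If the first $k$ components explained less than 100% of the variance, the same formula would give only an approximation of each observation.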

Lanczos/Arnoldi/Ritz/CG-like algorithms share the same core strategy... In each, a little miracle appears: most of the Gram-Schmidt inner products are zero! In other words, a new direction need only ...

I'm trying to understand the difference between eigenvectors and eigenfaces. Are they different names for the same concept?
I ask because I got confused when trying to compute eigenvectors for ...

I have 5 different independent variables (IVs); let's name them 1 to 5. The 3rd IV has 10 sub-variables under it and the 4th IV has 11 sub-variables. The other 3 IVs have just two sub-variables (...

I'm (very) new to PCA and confused about how to use the output of a PCA analysis to construct new variables that will be used as predictors in a regression analysis. I've looked at previous questions (...

Often, a variable is considered to load significantly on a PC if its loading value in the loading table is above a cutoff value (say 0.4 or 0.5 in some published cases). Is there any ...

I was hoping that someone could simply validate or correct my interpretation of Principal Components Analysis. There are a lot of questions on this site about Principal Components analysis--some ...

I conducted a principal component analysis (PCA) with direct oblimin factor rotation in SPSS.
Because at that time I didn't know any better, I used the COMPONENT MATRIX for interpretation. I added ...

According to the Wikipedia page on Naive Bayes:
.. Naive Bayes classifiers are a family of simple "probabilistic
classifiers" based on applying Bayes' theorem with strong (naive)
independence ...

Does it make sense to do PCA before carrying out a Random Forest Classification?
I'm dealing with high dimensional text data, and I want to do feature reduction to help avoid the curse of ...

According to Filzmoser et al. 2009, the best way to conduct a principal component analysis for compositional data with outliers is:
using a robust PCA method
and using the isometric log ratio ...

I have got a dataset that represents around 30 characteristics from a few hundred samples. Some of these characteristics could be condensed into 2 PCs as shown by a PCA. Now I would like to take these ...

I was thinking about using PCA to deal with issues of multicollinearity on my dataset. I was wondering how appropriate it is to run PCA on only subsets of variables that seem to have issues of ...

I am reading the research paper “Eigenfaces for Recognition” (https://www.cs.ucsb.edu/~mturk/Papers/jcn.pdf). In Figure 2, the paper shows seven eigenfaces with white and black spots on them. What ...

I'm using the principal() from the R package psych.
This is my call:
...

I am getting a strange result.
data_scaled = StandardScaler().fit_transform(dat_final)
pca = PCA(.99)
pca.fit(data_scaled)
print(np.cumsum((pca.explained_variance_)))
plt.plot(np.cumsum((pca....
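The excerpt is cut off, so the actual problem is not known. One thing worth noting about the snippet as shown is that it accumulates `explained_variance_` (raw variances) rather than `explained_variance_ratio_` (fractions of the total). The sketch below reproduces the distinction with NumPy only, on made-up data; the names `Xs`, `cum_var`, and `cum_ratio` are illustrative. It assumes the usual conventions: standardization with the population standard deviation, and component variances computed with an $n-1$ divisor.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: 100 samples, 5 correlated features.
n, p = 100, 5
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))

# Standardize each column with the population std (ddof=0).
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# Variance along each principal axis = eigenvalues of the sample
# covariance matrix (divisor n - 1), sorted in descending order.
eigvals = np.sort(np.linalg.eigvalsh(np.cov(Xs, rowvar=False)))[::-1]

cum_var = np.cumsum(eigvals)                    # raw variances: climbs towards ~p
cum_ratio = np.cumsum(eigvals / eigvals.sum())  # fractions: climbs towards 1

print("cumulative variance:", np.round(cum_var, 3))
print("cumulative ratio:   ", np.round(cum_ratio, 3))
```

On standardized data the raw cumulative sum approaches the number of features (times $n/(n-1)$), not 1, so a curve meant to show "fraction of variance explained" would normally be built from the ratios.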

For my PhD thesis I have to do a Principal Component Analysis (PCA). I didn't find it too difficult in Stata and was happy interpreting the results (I know there is a difference between factor and ...

Given a data set where we have different measured features in the same units for each subject. For example, numbers of different cell types (features) in a tumour (subject), where we have n tumours ...

I am trying to optimize a panel regression $G=\beta G+e$, where $G \in \mathbb{R}^{N\times T}$. $\beta\in \mathbb{R}^{N\times N}$ is an unknown coefficient matrix, constrained to $\operatorname{diag}(\beta)=0$ and reduced rank, $\operatorname{rank}(\beta)\leq r$. ...

I have a data set with about 1000 proteins (concentration levels) measured at 3 different time points for 10 different patients performing exercise. I would like to identify proteins that change due ...

I’m using Stata 12.0, and I’ve downloaded the polychoricpca command written by Stas Kolenikov, which I wanted to use with data that includes a mix of categorical ...

Take 20 random points in a 10,000-dimensional space with each coordinate iid from $\mathcal N(0,1)$. Split them into 10 pairs ("couples") and add the average of each pair ("a child") to the dataset. ...

In genome-wide association studies (GWAS):
What are the principal components?
Why are they used?
How are they calculated?
Can a genome-wide association study be done without using PCA?

I am reading a very good (recent) publication in clustering: Kiselev et al., 2017, SC3 - consensus clustering of single-cell RNA-Seq data (if you don't have access, see author PDF).
The algorithm ...

Literature suggests that autoencoders can be effective in dimensionality reduction, like PCA. PCA can be evaluated based on the variance of each principal component generated. How to do the same for ...

My problem: I'm trying to classify data into two groups, A and B, based on 25 observations (data points) and 100 features. I used a Gradient Boosting Machine (GBM) to find out which feature has ...

What are the main differences between performing principal component analysis (PCA) on the correlation matrix and on the covariance matrix? Do they give the same results?
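A small numerical illustration of this question, offered as a sketch on synthetic data rather than a full answer: PCA on the correlation matrix is the same as PCA on the covariance matrix of standardized variables, and it can differ sharply from covariance-based PCA when variables are on very different scales. The data and the helper `leading_axis` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: two positively correlated variables on very different scales.
n = 500
z = rng.normal(size=n)
x1 = z + 0.5 * rng.normal(size=n)
x2 = 1000.0 * (z + 0.5 * rng.normal(size=n))
X = np.column_stack([x1, x2])

def leading_axis(M):
    """Unit eigenvector of symmetric M with the largest eigenvalue (sign-normalized)."""
    vals, vecs = np.linalg.eigh(M)   # eigh returns eigenvalues in ascending order
    v = vecs[:, -1]
    return v if v[np.argmax(np.abs(v))] > 0 else -v

cov = np.cov(X, rowvar=False)
corr = np.corrcoef(X, rowvar=False)

# Standardizing first and then taking the covariance gives the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
cov_of_standardized = np.cov(Z, rowvar=False)

pc1_cov = leading_axis(cov)    # dominated by the large-scale variable x2
pc1_corr = leading_axis(corr)  # weights both variables equally

print("PC1 (covariance): ", np.round(pc1_cov, 4))
print("PC1 (correlation):", np.round(pc1_corr, 4))
print("corr == cov of standardized data:", np.allclose(corr, cov_of_standardized))
```

In this two-variable case the correlation-based first component always has equal absolute weights on both variables, whereas the covariance-based one points almost entirely along the variable with the larger variance.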

I did a survey to know the attitude of customers towards various elements of direct banking channels. I have performed Principal Component Analysis on a set of 70 items and generated five factors. I ...

I am trying to understand some descriptions of PCA (the first two are from Wikipedia), emphasis added:
Principal components are guaranteed to be independent only if the data set is jointly normally ...

I have done a dimensionality reduction of binary labelled data (0,1 labels) from 300 features to 2 features. The plot looks like -
What kind of inferences can I make from this plot? Can I infer -
...

I am reading a document on PCA. I got some idea that PCA is a dimensionality reduction technique. It performs this task by shifting the data points in the new space. The centre of points in the old ...

I have m1 rows (samples) and n columns (variables) in matrix A, and m2 rows and n columns in matrix B (n>m1 and n>m2). Normally, I performed PCA on matrix A and got a low-dimensional representation of ...

It seems that a number of the statistical packages that I use wrap these two concepts together. However, I'm wondering if there are different assumptions or data 'formalities' that must be true to use ...

Here is a quote from Bishop's "Pattern Recognition and Machine Learning" book, section 12.2.4 "Factor analysis":
According to the highlighted part, factor analysis captures the covariance between ...

My aim is to perform PCA since I have 76 variables in my dataset. Problem is that most of my variables are highly skewed as you can see in the histogram below.
These variables are proportions ...

I am studying eigenfaces and am confused about some of the concepts. Initially we have a 255*255 2D array, but then we create 1D vectors, i.e. an N^2 * 1 vector. We can do this for M images....

For example, if I have a 64-dimension problem, and 80% of the variance lies within just 12 components.
Is there some mathematical relationship that says something about the number of components that ...

I have checked several sites and found that eigenfaces are eigenvectors. PCA transforms the faces into a new space such that the hyperplane is in the direction of maximum variance. I have attached ...

I'm trying to figure out how to reproduce in Python some work that I've done in SAS. Using this dataset, where multicollinearity is a problem, I would like to perform principal component analysis in ...

I have a simple problem, which I think must have an easy solution.
I have a vector space say with a 1000 dimensions for each vector.
Now, I have a large number of sample vectors from this vector ...

When we do a textbook PCA decomposition, we get a series of eigenvalues $\lambda$ and eigenvectors $v$ that satisfy:
$ Av= \lambda v $
we can sort these eigenvalues (together with the corresponding eigen ...
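The decompose-then-sort step described in this excerpt can be sketched as follows, assuming $A$ is a sample covariance matrix (the excerpt does not say which matrix is meant, and the data here are synthetic). `np.linalg.eigh` returns eigenvalues in ascending order, so they are reordered together with their eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: 200 samples, 4 features with different spreads.
X = rng.normal(size=(200, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
A = np.cov(X, rowvar=False)   # symmetric matrix being decomposed

# Eigendecomposition of a symmetric matrix (eigenvalues come back ascending).
eigvals, eigvecs = np.linalg.eigh(A)

# Sort eigenvalues in descending order and reorder eigenvector columns to match.
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# Each column v of eigvecs satisfies A v = lambda v.
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)

# Share of the total variance carried by each component.
explained = eigvals / eigvals.sum()
print(np.round(explained, 3))
```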

For a prospective study of parameters affecting students' success in graduate school, I am looking at a population of about 1500 med students. I have performed a cluster analysis (using Gower's ...

I am doing a FAVAR analysis with the two-step PCA method. I am a bit confused about the second step. Once I get the PCs, how should I then estimate the VAR? Just including PCs as other variables and simply ...

I understand that given N dimensional data you can use PCA to construct an N dimensional orthonormal basis that explains 100% of the variance of the original data. However, you can also construct ...

I am working with a dataset that consists of mixed-type purchase data for the whole year of 2017. My aim is to use PCA/FA for dimension reduction since I have many variables in this dataset and then do ...

Suppose that my data are such that a PCA gives a unique solution for the first principal component up to scaling (e.g. my data do not all lie on a circle, or some such weirdness).
Is it the case that ...

I have percentage data and would like to see if these different variables have an effect on certain factors;
i.e., I have different habitats of an area e.g., improved grassland: 40%, arable: 15%, ...

I am reading a paper and the data passed to a data.frame in R.
On R:
X[60x14] = matrix of predictors (without the dependent) R_xx: Correlation Matrix. evalues and vectors of R_xx
Then the author says:...

Where can principal component analysis potentially be used?
Some examples with some explanation would be great.
