PCA questions

2,298 PCA questions.

I am being asked to apply a statistical technique that I do not think is optimal. I have asked before, but was told my question was vague. I have re-worded it appropriately here. I am being asked to ...

I am working on a project where my predecessor has been analyzing a rows-by-columns table of count data. Brands represent the columns, and statements about those brands represent the rows. The cells ...

I would like to perform a PCA on continuous data and binary data. Actually, I want to submit 8 continuous variables and 1 binary variable to a PCA. Is it a problem to include one binary variable? ...

As far as I can see from basic dimensionality techniques like PCA, EFA, ICA etc., the problem is reduced to ($X$ is the dataset with observations as rows and $D$ columns; $A$ has $d<D$ columns): $$X \...

Here is the working code:
# the eigenvectors
eigen(cov(mtcars))$vectors
# the principal components
prcomp(mtcars, scale. = T)
The first five eigenvectors that ...
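
A likely source of the mismatch in the two outputs is that eigen(cov(mtcars)) decomposes the covariance matrix, while prcomp(mtcars, scale. = T) works on standardized data, i.e. the correlation matrix. A minimal R sketch of the usual correspondence (my addition, not part of the question; eigenvector signs are arbitrary, so the comparison is made on absolute values):

# covariance-based PCA: prcomp without scaling matches eigen() of the covariance matrix
ev_cov  <- eigen(cov(mtcars))$vectors
rot_cov <- prcomp(mtcars, scale. = FALSE)$rotation
all.equal(abs(ev_cov), abs(unname(rot_cov)))   # TRUE, up to column signs

# correlation-based PCA: prcomp with scale. = TRUE matches eigen() of the correlation matrix
ev_cor  <- eigen(cor(mtcars))$vectors
rot_cor <- prcomp(mtcars, scale. = TRUE)$rotation
all.equal(abs(ev_cor), abs(unname(rot_cor)))   # TRUE, up to column signs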

From the point of view of Machine Learning, what is the purpose of Factor Analysis (FA)? I used to think that it is a dimensionality reduction, because it is so connected to PCA, but when I read in ...

The Eckart-Young-Mirsky theorem is sometimes stated with rank $\le k$ and sometimes with rank $= k$. Why? More specifically, given a matrix $X \in \mathbb{R}^{n \times d}$, and a natural number $k \...
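
For reference (my addition), a standard statement of the theorem with the rank constraint written as an inequality: if $X = U\Sigma V^T$ is the SVD with singular values $\sigma_1 \ge \sigma_2 \ge \dots$, and $X_k = \sum_{i=1}^{k}\sigma_i u_i v_i^T$ is the truncated SVD, then
$$
\min_{\operatorname{rank}(B)\le k}\|X-B\|_F \;=\; \|X-X_k\|_F \;=\; \Big(\sum_{i>k}\sigma_i^2\Big)^{1/2},
$$
and the same minimizer is optimal in the spectral norm, where the minimum equals $\sigma_{k+1}$. When $\sigma_k>0$ the minimizer $X_k$ has rank exactly $k$, so the "$\le k$" and "$= k$" formulations give the same optimal value in that case, which is why both versions appear in the literature.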

I have done a lot of reading about PCA using the covariance and correlation matrix. I understand that if the variables are measured on different scales (e.g. number of leaves, height in m, width in mm), ...

I have an old dataset, for which I perform PCA and show its significance (non-randomness). Now I have new data (obtained under slightly different conditions), and I want to say that independently from ...

In this blog figure 4 shows that the principal components of a random walk are sinusoidal with increasing frequency for decreasing eigenvalue. Is there an intuitive way of understanding this? If I ...
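
A quick empirical illustration (my own sketch, not from the post): simulate many independent random walks, run prcomp on them, and plot the leading eigenvectors, which come out as roughly sinusoidal with increasing frequency as the eigenvalue decreases:

set.seed(1)
n_walks <- 2000; T <- 200
# rows = independent random-walk realizations, columns = time points
X <- t(apply(matrix(rnorm(n_walks * T), n_walks, T), 1, cumsum))
pc <- prcomp(X, center = TRUE, scale. = FALSE)
# the leading eigenvectors (columns of pc$rotation) look like sinusoids
matplot(pc$rotation[, 1:4], type = "l", lty = 1,
        xlab = "time index", ylab = "eigenvector weight")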

I have run a PCA on genes expressed in cells under different stimulations and retrieved the eigenvectors for a number of components. My question is: can I use these to determine which ...

From here (slide 23) and here (page 5, 4th slide) I understand that it is said that PPCA (probabilistic PCA) is rotationally invariant. It can be written as follows: $$\text{PPCA}(X) = [\mu, W, \sigma^...

So I have a dataset of measurements (lengths, surface areas, volumes...) from 3 species from 2 different environments, with 3 individuals per species. It can be summarised like this: ...

At the end of the PCA algorithm one gets a $D\times d$ matrix $U$ such that $z=U^Tx$ (here $x$ is $D$-dimensional and $z$ is $d$ dimensional with $d\leq D$). In multiple sources on the Web I found ...
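
In prcomp terms, the projection $z = U^T x$ and the usual reconstruction $\hat{x} = Uz + \bar{x}$ look like this (a sketch of my own on the mtcars data, with a hypothetical choice of $d = 3$):

d  <- 3
pc <- prcomp(mtcars, center = TRUE, scale. = FALSE)
U  <- pc$rotation[, 1:d]                      # the D x d matrix of loadings
Xc <- scale(mtcars, center = pc$center, scale = FALSE)
Z  <- Xc %*% U                                # z = U^T x for every row (the scores)
all.equal(unname(Z), unname(pc$x[, 1:d]))     # same as prcomp's own scores
Xhat <- sweep(Z %*% t(U), 2, pc$center, "+")  # rank-d reconstruction of the data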

I am trying to reproduce the results of the PCA with rotations from SPSS in python. But there is some information I didn't find in their documentation. I am trying to do the PCA like in the FACTOR ...

Linear Discriminant Analysis (LDA) is a supervised dimensionality reduction algorithm, so from $d$-dimensional input data we want to obtain $p$ new dimensions, where $d \gg p$. It is the same principle ...

PCA looks for factors in data that maximize explained variance. Canonical correlation analysis (CCA), as far as I understand, is like PCA but looks for factors that maximize cross-covariance ...
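
For reference (my addition), the textbook CCA objective is usually written in terms of correlation rather than covariance: with random vectors $X$ and $Y$,
$$
\max_{a,b}\ \operatorname{corr}(a^TX,\, b^TY) \;=\; \max_{a,b}\ \frac{a^T\Sigma_{XY}b}{\sqrt{a^T\Sigma_{XX}a}\,\sqrt{b^T\Sigma_{YY}b}},
$$
whereas maximizing the cross-covariance $a^T\Sigma_{XY}b$ under unit-norm constraints on $a$ and $b$ is what PLS/SVD-style methods do.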

I have four groups of individuals which have a certain illness and are part of a clinical trial: Taking drug, illness gets better Taking drug, illness is not getting better Taking placebo, illness ...

I have a large collection of documents, and would like to be able to select a subset of them that is representative of the whole. I have searched this question on here, Stack Overflow, and Google, ...

I'm comparing the Support Vector Machines (SVM) formulation of linear PCA with kernel PCA. I know that in linear PCA, the maximum number of principal components is equal to the dimension of the input ...
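
To see where the two differ on the count of components, here is a bare-bones kernel PCA written directly from the definition (my own illustration; the RBF bandwidth gamma = 0.5 is an arbitrary choice): the centered kernel matrix is $n \times n$, so up to about $n$ components can carry nonzero variance, rather than at most $d$ as in linear PCA.

X <- scale(as.matrix(iris[, 1:4]))   # n = 150 observations, d = 4 variables
n <- nrow(X)
# RBF kernel matrix (bandwidth chosen arbitrarily for illustration)
gamma <- 0.5
D2 <- as.matrix(dist(X))^2
K  <- exp(-gamma * D2)
# double-center the kernel matrix
H  <- diag(n) - matrix(1/n, n, n)
Kc <- H %*% K %*% H
eg <- eigen(Kc, symmetric = TRUE)
sum(eg$values > 1e-8)   # far more than d = 4 nonzero components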

When using kernel PCA for denoising handwritten digit images, basically every digit gets denoised very well, and with just 1 PC we have a clean denoised image. Yet the digit 7 gets somewhat an extra ...

The PCA optimization problem is known as $$ \max_{U \in \mathbb{R}^{d\times r}, U^TU = I} tr(U^T\Sigma U), $$ where $\Sigma$ is a covariance matrix of a dataset $\{x_1,\dots,x_n\} \subset \mathbb{R}^d$...
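
For completeness (my addition), the standard solution: write $\Sigma = V\Lambda V^T$ with eigenvalues $\lambda_1 \ge \dots \ge \lambda_d$; then
$$
\max_{U^TU=I_r}\ \operatorname{tr}(U^T\Sigma U) \;=\; \sum_{i=1}^{r}\lambda_i,
$$
attained at $U = [v_1,\dots,v_r]$, the top-$r$ eigenvectors. Any $UR$ with $R$ orthogonal achieves the same value, which is the usual rotational indeterminacy of the principal subspace.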

I am trying to cluster my dataset with mixed features using k-means. As a distance metric, I am using Gower's dissimilarity. I want to ask 2 things: (1) Is k-means an appropriate algorithm that can ...

If I know that a multivariate dataset has a piecewise-linear data generating process with known knots (or breakpoints), then what is the appropriate kernel function to use in Kernel-PCA? For example, ...

According to this, the fundamental difference between PCA and FA can be illustrated via the following image: So, the direction of arrows changes. According to this answer and a few others: ...

Is it possible (or does it make sense) to check for correlation after varimax rotation, since varimax assumes that there is no correlation between factors (or components)?

In PCA and Factor Analysis, there is the term loadings, which refers to factor loadings (onto the original variable). Does the term (original) variable loading (onto the latent factor) exist?

I am using the R implementation of robust PCA here for anomaly detection. I have a vector of time series data, and a vector of dates. The algorithm works fine when the length of the vector is a ...

The psych package in R provides the root mean square of the residuals (RMSR) when using the principal (principal components analysis) or fa (factor analysis) functions. How could I calculate the ...
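
One way to check what principal() or fa() report is to recompute the statistic by hand from the residual correlation matrix. A sketch of the usual definition (root mean square of the off-diagonal residuals), which may differ in detail from psych's internal choice; this is my own illustration on mtcars:

library(psych)
R   <- cor(mtcars)
fit <- principal(R, nfactors = 2, rotate = "none")
L   <- unclass(fit$loadings)          # variables x components loading matrix
resid <- R - L %*% t(L)               # residual correlations
off   <- resid[lower.tri(resid)]      # off-diagonal elements only
sqrt(mean(off^2))                     # hand-computed RMSR, compare with fit$rms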

I have a dataset of 93 records and 45 radiomics variables from various CT scans. I wanted to check if age and sex could be classified by the variables so I made a new variable with both sex and age. I ...

I am trying to construct a financial stress index. I have selected 12 variables that I use as indicators of financial market stress. These are all time series of daily data (VIX, credit spreads, etc.)....

Let's say I use PCA to reduce the dimensionality of my dataset before building a linear regression model. See R example: ...
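
The R example is truncated in this excerpt; a minimal sketch of the general pattern (my own code, not the poster's, using mtcars and a hypothetical choice of k = 3 components):

pc  <- prcomp(mtcars[, -1], scale. = TRUE)    # PCA on the predictors only
k   <- 3                                      # hypothetical number of components kept
dat <- data.frame(mpg = mtcars$mpg, pc$x[, 1:k])
fit <- lm(mpg ~ ., data = dat)                # regression on the retained scores
summary(fit)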

I have a data set of 60 sensors. I wish to decrease the number of sensors used during an experiment, and use the remaining sensor data to predict the removed sensors. If I were to run principal ...

I was wondering if there is precedent for centering to the median and scaling to the median absolute deviation (MAD) (as opposed to arithmetic mean and standard deviation) prior to conducting PCA. I ...
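
Mechanically this is easy to do by pre-transforming the columns and then turning off prcomp's own centering and scaling; a sketch of the mechanics only (my addition, on mtcars):

X  <- as.matrix(mtcars)
Xr <- sweep(X, 2, apply(X, 2, median), "-")       # center each column at its median
Xr <- sweep(Xr, 2, apply(X, 2, mad), "/")         # scale each column by its MAD
pc <- prcomp(Xr, center = FALSE, scale. = FALSE)  # PCA on the robustly standardized data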

I have two matrices a, b of dimensions (100x500), (100x15000) and I am trying to find associations between sets of variables in both matrices. When I perform principal component analysis on matrix a,...

Given a design matrix that consists of N (>100) variables and J (>100) observations (the data, itself, is actual time-series): ...

I'm having an ML problem where my dataset contains 80 features labelled into 3 groups (0, 1, -1). I want to plot the data on a 2D surface to see how "close" (similar) data with ...

I am working on a personal project, and I want to use Statsmodels' PCA on a dataset. The ultimate goal is to then perform a linear regression and evaluate its prediction. I know scikit-learn may be ...

I conducted a principal component analysis (PCA) with direct oblimin factor rotation in SPSS. Because at the time I didn't know any better, I used the COMPONENT MATRIX for interpretation. I added ...

I’m using Stata 12.0, and I’ve downloaded the polychoricpca command written by Stas Kolenikov, which I wanted to use with data that includes a mix of categorical ...

I have a data set that has been partitioned into four clusters by executing a clustering algorithm that used principal components from a principal component analysis (PCA). I then make a contingency ...

The definition of multicollinearity is: Given a set of $N \times 1$ predictors $X = (x_1, x_2, \cdots, x_m)$, if $$x_j = \sum_{i \neq j}a_ix_i$$ then we say there is multicollinearity among the ...

This is a follow-up question from the post: PCA on correlation or covariance? The accepted answer quotes: You tend to use the covariance matrix when the variable scales are similar and the ...

Can anyone help me with this part? I don't understand why the $d$ vector should be an eigenvector of the covariance matrix of $X$, nor the generalization part.

I am using a Neural Networks based classifier to run a classification for my data in n-dimensional. Then I thought it may be a good idea to run dimension reduction like PCA for my data at first, and ...

I'm planning to conduct a study for clustering a set of observations. To begin with, I'm planning to include more than 50 variables in my study. Therefore I must apply a relevant statistical ...

I am reading a documentation from Matlab (https://www.mathworks.com/help/stats/quality-of-life-in-u-s-cities.html#d119e60095) on performing PCA which makes this claim: When all variables are in ...

For a prospective study of parameters affecting students' success in graduate school, I am looking at a population of about 1500 med students. I have performed a cluster analysis (using Gower's ...

The literature suggests that autoencoders can be effective for dimensionality reduction, like PCA. PCA can be evaluated based on the variance of each principal component generated. How to do the same for ...
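
One common workaround (my suggestion, not from the post) is to score any reconstruction-based method, autoencoder or PCA alike, by the fraction of total variance its reconstruction recovers; for PCA this reduces to the usual cumulative explained variance, and for an autoencoder you plug in its reconstructed matrix Xhat. A sketch, checked against PCA on mtcars:

# fraction of variance explained by an arbitrary reconstruction Xhat of X
explained <- function(X, Xhat) {
  Xc <- scale(X, scale = FALSE)                 # center X
  1 - sum((X - Xhat)^2) / sum(Xc^2)
}

# sanity check with PCA: matches the cumulative proportion of variance
pc   <- prcomp(mtcars, center = TRUE, scale. = FALSE)
k    <- 2
Xhat <- sweep(pc$x[, 1:k] %*% t(pc$rotation[, 1:k]), 2, pc$center, "+")
explained(as.matrix(mtcars), Xhat)              # equals the sum of the first k variance proportions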

I have 3 trials each on 87 animals in each of 2 contexts (some missing data; no missing data = 64 animals). Within a context, I have many specific measures (time to enter, number of times returning ...
