**610 matrix questions.**

In this article, can somebody tell me where the heck the Wq, Wk, and Wv matrices in the “Self-Attention in Detail” section come from, a little more intuitively and specifically, since the article doesn't ...

Apologies in advance for the drawn-out question (found at the end), but I want to give a comprehensive picture of the problem.
I have an equation of the following form:
$T_{ij}^{sim}=A_i*O_i^{^{obs}...

Background: Looking for specific/interesting information on the equations within LSTM-Networks, I found the paper LSTM: A Search Space Odyssey. It is frequently mentioned in other articles. To gain a ...

I am trying to sum over specific indices in a matrix.
For example, if I have the matrix
a = {{1, 2, 3, 3}, {4, 5, 6, 6}, {7, 8, 9, 9}, {1, 5, 9, 7}}
I need to sum the entries at indices {2, 1} and {4, 3}, i.e. 4 + ...
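
A minimal sketch of the indexing described in this excerpt, written in plain Python rather than the list notation used in the question. It assumes the index pairs are 1-based (row, column) positions, which matches the excerpt's "4" for {2, 1}; the helper name `sum_at` is hypothetical.

```python
# The matrix from the excerpt, as a list of rows.
a = [[1, 2, 3, 3],
     [4, 5, 6, 6],
     [7, 8, 9, 9],
     [1, 5, 9, 7]]

# Assumed to be 1-based (row, column) positions, as in the excerpt.
positions = [(2, 1), (4, 3)]


def sum_at(matrix, positions):
    """Sum the entries of `matrix` at 1-based (row, column) positions."""
    return sum(matrix[r - 1][c - 1] for r, c in positions)


total = sum_at(a, positions)
print(total)  # entry (2, 1) is 4 and entry (4, 3) is 9, so this prints 13
```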

I would like to know the mathematical concepts behind singular matrices. Matrices that do not have inverses in R throw one of two errors. I have provided some examples of both errors below:
Error in ...

Why is a matrix A not invertible if row 1 + row 2 = row 3?
I'm interested in the intuition, so please elaborate in the answer.
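
One way to see the point numerically is sketched below with NumPy. The matrix is a made-up example (not from the question) whose third row is the sum of the first two. The vector `v = (1, 1, -1)` then satisfies `v @ A = 0`; if an inverse existed, multiplying by it on the right would force `v = 0`, which is a contradiction.

```python
import numpy as np

# Hypothetical 3x3 matrix whose third row equals row 1 + row 2.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [5.0, 7.0, 9.0]])

# v encodes the dependency: 1*row1 + 1*row2 - 1*row3 = 0.
v = np.array([1.0, 1.0, -1.0])
combo = v @ A

# Only two rows are linearly independent, so the rank is below 3.
rank = np.linalg.matrix_rank(A)

print(combo)  # a vector of zeros
print(rank)   # 2
```

Because the rows span only a two-dimensional subspace, the map defined by A collapses three-dimensional space onto a plane, and that collapse cannot be undone.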

I have a proximity matrix from a hierarchical clustering calculation. I have used the Pearson Correlation as my distance metric. Is it valid to test the statistical significance of this value using a ...

I have a bunch of features that I would like to use for classification/machine learning and cluster analysis. Normally I use single point values or transformations of values for features and ...

I am having trouble simulating a matrix which is low rank and sparse (sparse along both rows and columns). One way to simulate a low-rank matrix is by generating a random matrix, then taking SVD and ...

I've noticed lately that a lot of people are developing tensor equivalents of many methods (tensor factorization, tensor kernels, tensors for topic modeling, etc) I'm wondering, why is the world ...

I tried to obtain the score vector (first derivative of the density function w.r.t. the parameters) of the multivariate normal distribution.
Density function of the multivariate normal distribution:
\begin{align*}
f(x)...

Let $X$ be a $D \times N$ matrix. Let $I$ be a $D \times D$ identity matrix. Also let $y$ be an $N \times 1$ column vector. Suppose we are trying to solve $(X X ^T + k I) w = Xy$ for a $D$ dimensional ...
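
As a rough illustration of the system in this excerpt, the sketch below solves $(X X^T + k I) w = Xy$ with NumPy on random data. The sizes, the value of `k`, and the random inputs are assumptions made only for the example, and `y` is stored as a 1-D array rather than an $N \times 1$ column.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, k = 3, 50, 0.1               # hypothetical sizes and ridge parameter
X = rng.standard_normal((D, N))    # D x N, as in the excerpt
y = rng.standard_normal(N)         # length-N vector

# Build both sides of (X X^T + k I) w = X y.
lhs = X @ X.T + k * np.eye(D)
rhs = X @ y

# Solve the D x D linear system directly instead of forming an explicit inverse.
w = np.linalg.solve(lhs, rhs)

print(w.shape)                     # (3,)
print(np.allclose(lhs @ w, rhs))   # True
```

For $k > 0$ the matrix $X X^T + k I$ is symmetric positive definite, so the system has a unique solution.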

For a matrix $M$ in which entries $m_{a,b}$ denote the number of co-occurrences between elements $a,b$ from two distinct sets $A$ and $B$, how do I identify pairs with a significantly high co-...

I have a correlation matrix in R. Many of the correlations are specified, but there are some that are "NA".
eg,
A __ B __ C
A 100% NA 25%
B NA 100% 50%
C 25% 50% ...

I am putting together a wrapper for a quadratic programming library. I am going through the C example here but I don't understand the indexing used for the matrices.
The relevant excerpt is below, ...

The Fisher Information Matrix is positive semi definite. So, it is not necessarily invertible.
By the Multivariate Central Limit Theorem we know that
$\sqrt{n}(\hat{\theta}−\theta)=S_{n}⟹\mathcal{N}(...

Suppose I have a matrix A corrupted with noise and I am looking for a test of the null hypothesis rank(A) = 1 vs. the alternative rank(A) > 1. I checked the literature a little and this paper ...

It is well-known (e.g. in the field of compressive sensing) that the $L_1$ norm is "sparsity-inducing," in the sense that if we minimize the functional (for fixed matrix $A$ and vector $\vec{b}$) $$f_{...

Could I use a mean squared error statistical analysis on a set of 1 x 2 matrices? For example, if I had [123 456] as the actual matrix and [111 222] as the predicted matrix, could I use the mean ...
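
For reference, the arithmetic for the two vectors in this excerpt can be sketched in plain Python. The helper name `mean_squared_error` is chosen for this example only; whether MSE is the appropriate statistic is the question's own topic and is not settled by the code.

```python
actual = [123, 456]
predicted = [111, 222]


def mean_squared_error(actual, predicted):
    """Average of the squared element-wise differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


mse = mean_squared_error(actual, predicted)
print(mse)  # (12**2 + 234**2) / 2 = 27450.0
```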

Is there any difference between a Cholesky decomposition and a log-cholesky decomposition? If yes, what is the difference?
In the paper "An R package for dynamic linear models" by Giovanni Petris ( ...

I am doing some calculations on different matrices (mainly in logistic regression) and I commonly get the error "Matrix is singular", where I have to go back and remove the correlated variables. My ...

I have read that there is an equivalency between a linear autoencoder and performing SVD. SVD can be used in collaborative filtering, for example, factorization of a user-movies matrix $\mathbf{M}$ ...

I try to understand more about the update in multivariate fixed point iteration. I saw the examples where the updates have the same variable (the wrt. variable of partial differentiation) on both ...

I have built a logistic regression model in R. The class that I want to predict is very unbalanced (99 vs 1).
My first finding is that this Logistic model does a better job if I train it on a ...

I have a k x n dataset where k equals the number of variables and n equals the number of observations per variable. I know these data are correlated and I would like to whiten them with the ordinary ...

I want to implement the following formula where $C$ is variance-covariance matrix of variables x, y, and z:
$$C = \begin{bmatrix}cov(x,x)&cov(x,y)&cov(x,z)\\cov(y,x)&cov(y,y)&cov(y,z)\...

Principal component analysis (PCA) is usually explained via an eigen-decomposition of the covariance matrix. However, it can also be performed via singular value decomposition (SVD) of the data matrix ...
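
The link between the two routes can be checked numerically. The sketch below uses random data as a stand-in for a real dataset and assumes rows are observations and columns are variables. It compares the eigenvalues of the sample covariance matrix with the squared singular values of the centered data matrix divided by $n - 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 4                      # hypothetical number of observations and variables
X = rng.standard_normal((n, p))
Xc = X - X.mean(axis=0)            # center each column

# Route 1: eigen-decomposition of the sample covariance matrix.
cov = Xc.T @ Xc / (n - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]   # eigvalsh is ascending; reverse to descending

# Route 2: SVD of the centered data matrix.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_vals = s**2 / (n - 1)                 # singular values come out descending

print(np.allclose(eigvals, svd_vals))     # True
```

The rows of `Vt` are the principal axes; they match the eigenvectors of `cov` up to sign.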

Why is it impossible to do a PCA in R using principal from psych package without warnings with a matrix, which has more columns ...

Suppose that I have two distance matrices for the same set of items.
By a distance matrix I mean a square matrix whose (i,j)th entry holds the distance (in terms of cosine similarity) between ith and ...

I'm new to both linear algebra and numpy, so please bear with me. I'm taking a course on linear regression, where I learned that we can express our hypothesis as $\theta^TX$ where $\theta$ is our ...

Consider a normal vector $Y \sim \mathcal N(\mu, V)$ with $\mu \in\mathbb R^n$ and $V\in\mathbb R^{n\times n}$. I am interested in the expected value of
$$ {1\over n-1} \left( Y'Y - {1\over n} (\...

I have what I understand to be a multivariate multiple predictive regression. The y's are different variables and I am attempting to see if these are led by w at the same time.
I use the standard ...

From my understanding non-negative matrix factorization (NMF) provides a natural way to obtain soft clusters from a non-negative $n$x$m$ data matrix $X$. NMF decomposes $X$ into two non-negative ...

I came across a strange question while experimenting with some convex optimizations. The question is:
Suppose I randomly (say standard normal distribution) generate a $N \times N$ symmetric matrix, (for ...

What model should I use for 2 independent categorical Games, Publications and dependent interval Review Scores? Each score is made by a publication reacting to elements or genera of the game. For ...

Let's say I have a similarity matrix where every subject is compared to every other subject on some similarity measure (e.g., body movement synchrony). These subjects are divided into two groups, say ...

I have two vectors $x \in \{0, 1, 2\}^{n}$ and $y \in \{0, 1, 2\}^{n}$,
and I need to generate a matrix $C\in \mathcal{R}^{3\times3}$, where $C_{i,j}$ equals the number of indices $t$ where $x[t]=i$ ...

Let $X$ be a $n \times p$ matrix ($n \geq p$ like a conventional data matrix), with each column j filled by iid draws from a variable $\mathcal{X}_j$. I would like ...

Let $A$ and $B$ be two constant matrices and let $x$ and $y$ be two random vectors. What is the general formula for $Var(Ax+By)$? I know the formula for when $x$ and $y$ are scalar random variables ...
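
For random vectors the standard identity is $Var(Ax+By) = A\,Var(x)\,A^T + B\,Var(y)\,B^T + A\,Cov(x,y)\,B^T + B\,Cov(y,x)\,A^T$. The sketch below checks it with NumPy on simulated data; the dimensions, the matrices, and the way `y` is made to depend on `x` are all assumptions for illustration. Sample covariances are bilinear, so the two sides agree up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                                   # hypothetical number of draws
x = rng.standard_normal((n, 2))            # draws of a 2-dimensional x
# y is built to be correlated with x (3-dimensional).
y = 0.5 * x @ rng.standard_normal((2, 3)) + rng.standard_normal((n, 3))
A = rng.standard_normal((4, 2))
B = rng.standard_normal((4, 3))

# Joint sample covariance of (x, y), split into blocks.
joint = np.cov(np.hstack([x, y]), rowvar=False)
Sxx, Sxy = joint[:2, :2], joint[:2, 2:]
Syx, Syy = joint[2:, :2], joint[2:, 2:]

# Right-hand side of the identity.
formula = A @ Sxx @ A.T + B @ Syy @ B.T + A @ Sxy @ B.T + B @ Syx @ A.T

# Left-hand side: sample covariance of the draws of Ax + By.
direct = np.cov(x @ A.T + y @ B.T, rowvar=False)

print(np.allclose(formula, direct))        # True
```

When $x$ and $y$ are uncorrelated, the two cross terms vanish and only the first two terms remain.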

I've been working on Linear Discriminant Analysis for the last few weeks, and after reading many articles, some aspects of the problem are still not very clear to me.
The LDA optimization problem is formulated by ...

I am currently reading Appendix C from Gujarati Basic Econometrics 5e.
It deals with the Matrix Approach to Linear Regression Model.
I am unable to decipher how the author went from equation 7.4.19 ...

I have $q$ $n$-dimensional vectors $\vec y_i$ and a matrix $\hat B$ of shape $n\times m$. I'm looking for $q$ $m$-dimensional vectors $\vec x_i$ such that:
$\vec y_i=\hat B \vec x_i$
each vector $\...

Gram matrix
Let $\bf X$ be a n x p dataset with columns (variables) centered. Then p x p $\bf X'X$ is the total scatter matrix ...

The k-means implemented in scikit-learn precomputes distances but I don't know how these distances are used. In its standard version, k-means is known to compute only the distances between the points and ...

I use batch matrix multiply (torch.bmm) very often in my models and I want to write them down in math notation for documentation purposes. Does bmm have a standard ...
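
One common way to write the operation is in index form, $C_{bij} = \sum_k A_{bik} B_{bkj}$, where $b$ runs over the batch. This is offered as one reasonable convention, not as an established standard. The sketch below uses NumPy instead of PyTorch so that it runs without `torch` installed; the shapes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2, 3))   # batch of 5 matrices, each 2 x 3
B = rng.standard_normal((5, 3, 4))   # batch of 5 matrices, each 3 x 4

# Index form: C[b, i, j] = sum over k of A[b, i, k] * B[b, k, j].
C_einsum = np.einsum("bik,bkj->bij", A, B)

# Equivalent: one ordinary matrix product per batch element.
C_matmul = A @ B

print(C_einsum.shape)                      # (5, 2, 4)
print(np.allclose(C_einsum, C_matmul))     # True
```

The einsum subscript string mirrors the index notation term by term, which makes it easy to keep the documentation and the code in step.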

A Gaussian process indexed by $T \subseteq \mathbb{R}^d$ is a collection of random variables $\{ X_t : t \in T\}$, for which each finite subset is distributed as a multivariate Gaussian.
Let $G$ be a ...

Given a kernel, can we represent it as a Gram matrix? For example, a linear kernel can be presented (in Python/MATLAB code) in a ...

I am trying to get back into matrix calculations a little after having successfully ignored them for about 25 years. You're certainly laughing at me for this question. Here it comes:
I thought I'd ...

I am looking for some help in organising some analyses. I will describe what I am trying to do with a fictional example and then talk about some of the things I've thought about already.
Example
I ...

I recently implemented a neural network, with backpropagation in a fully matrix approach, as described here, where the whole dataset is used for each backprop: http://ufldl.stanford.edu/wiki/index.php/...

