I know from previous studies that

$Var(A+B) = Var(A) + Var(B) + 2 Cov (A,B)$

However, I don't understand why that is. I can see that the effect will be to 'push up' the variance when A and B covary highly. It makes sense that when you create a composite from two highly correlated variables, you will tend to be adding the high observations from A to the high observations from B, and the low observations from A to the low observations from B. This will tend to create extreme high and low values in the composite variable, increasing the variance of the composite.

But why does it work to multiply the covariance by *exactly* 2?

byouness 05/15/2018.

**Simple answer:**

The variance involves a square: $$Var(X) = E[(X - E[X])^2]$$

So, your question boils down to the factor 2 in the square identity:

$$(a+b)^2 = a^2 + b^2 + 2ab$$

This can be understood visually as decomposing the area of a square of side $(a+b)$ into the areas of two smaller squares of sides $a$ and $b$, plus **two** rectangles of sides $a$ and $b$.

**More involved answer:**

If you want a mathematically more involved answer: the covariance is a bilinear form, meaning that it is linear in both its first and second arguments. This leads to:

$$\begin{aligned} Var(A+B) &= Cov(A+B, A+B) \\ &= Cov(A, A+B) + Cov(B, A+B) \\ &= Cov(A,A) + Cov(A,B) + Cov(B,A) + Cov(B,B) \\ &= Var(A) + 2 Cov(A,B) + Var(B) \end{aligned}$$

In the last line, I used the fact that the covariance is symmetrical: $$Cov(A,B) = Cov(B,A)$$

**To sum up:**

It is two because you have to account for both $cov(A,B)$ and $cov(B,A)$.
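The identity can also be confirmed numerically. Here is a quick sanity check (a sketch using NumPy with simulated correlated data; the variables and coefficients are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two correlated variables: B shares a component with A
a = rng.normal(size=100_000)
b = 0.6 * a + rng.normal(size=100_000)

lhs = np.var(a + b)
# ddof=0 in np.cov matches the population convention np.var uses by default
rhs = np.var(a) + np.var(b) + 2 * np.cov(a, b, ddof=0)[0, 1]
print(np.isclose(lhs, rhs))  # True
```

With matching `ddof` conventions the two sides agree exactly (up to floating-point error), for any sample, not just in expectation.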

Acccumulation 05/15/2018.

The set of random variables is a vector space, and many of the properties of Euclidean space can be analogized to them. The standard deviation acts much like a length, and the variance like a squared length. Being uncorrelated (which independence implies) corresponds to orthogonality, while perfect correlation corresponds to multiplication by a positive scalar. Thus, the variances of independent variables follow the Pythagorean theorem:

$var(A+B) = var(A)+var(B)$.

If they are perfectly correlated, then

$std(A+B) = std(A)+std(B)$

Note that this is equivalent to

$var(A+B) = var(A)+var(B)+2\sqrt{var(A)var(B)}$
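A small numerical sketch of the perfectly correlated case (NumPy, simulated data, assuming $B$ is a positive scalar multiple of $A$):

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=50_000)
b = 3.0 * a  # perfectly correlated: B is a positive scalar multiple of A

# Standard deviations add, like lengths of parallel vectors ...
print(np.isclose(np.std(a + b), np.std(a) + np.std(b)))  # True
# ... which is equivalent to var(A+B) = var(A) + var(B) + 2*sqrt(var(A)*var(B))
print(np.isclose(np.var(a + b),
                 np.var(a) + np.var(b) + 2 * np.sqrt(np.var(a) * np.var(b))))  # True
```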

If they are not independent, then they follow a law analogous to the law of cosines:

$var(A+B) = var(A)+var(B)+2cov(A,B)$

Note that the general case lies in between complete independence and perfect correlation. If $A$ and $B$ are independent, then $cov(A,B)$ is zero. So in general, $var(A+B)$ always has a $var(A)$ term and a $var(B)$ term, plus some fraction of the $2\sqrt{var(A)var(B)}$ term; the more correlated the variables are, the larger this third term will be. And this is precisely what $2cov(A,B)$ is: it's $2\sqrt{var(A)var(B)}$ times the correlation coefficient $r$ of $A$ and $B$.

$var(A+B) = var(A)+var(B)+MeasureOfCorrelation*PerfectCorrelationTerm$

where $MeasureOfCorrelation = r$ and $PerfectCorrelationTerm=2\sqrt{var(A)var(B)}$

Put in other terms, if $r = correl(A,B)$, then

$\sigma_{A+B}^2 = \sigma_A^2+\sigma_B^2+ 2r\,\sigma_A\sigma_B$

Thus, $r$ is analogous to the $\cos$ in the law of cosines.
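The relationship $2\,cov(A,B) = r \cdot 2\sqrt{var(A)\,var(B)}$ can itself be checked numerically (a NumPy sketch with simulated data; the coefficient 0.5 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.normal(size=100_000)
b = 0.5 * a + rng.normal(size=100_000)

r = np.corrcoef(a, b)[0, 1]            # sample correlation coefficient
cov_term = 2 * np.cov(a, b, ddof=0)[0, 1]
perfect = 2 * np.sqrt(np.var(a) * np.var(b))  # the perfect-correlation term
print(np.isclose(cov_term, r * perfect))  # True
```

So the covariance term really is the perfect-correlation term scaled by $r$, which interpolates between $0$ (uncorrelated) and $\pm 1$ (perfectly correlated).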

Bananin 05/16/2018.

I would add that what you cited is not the *definition* of $Var(A+B)$, but rather a *consequence* of the definitions of $Var$ and $Cov$. So the answer to why that equation holds is the calculation carried out by **byouness**. Your question may really be why that makes sense; informally:

How much $A+B$ will "vary" depends on four factors:

- How much $A$ would vary on its own.
- How much $B$ would vary on its own.
- How much $A$ will vary as $B$ moves around (or varies).
- How much $B$ will vary as $A$ moves around.

Which brings us to $$Var(A+B)=Var(A)+Var(B)+Cov(A,B)+Cov(B,A)$$ $$=Var(A)+Var(B)+2Cov(A,B)$$ because $Cov$ is a symmetric operator.
