This is the repository of the formulas we use in other articles. The purpose of this document is to summarize the mathematical operators and terms, along with their notation and some useful relationships between them. We won't derive them; that is outside the scope of this document, and there are many resources on the web about that.

We use the "⋅" operator to denote multiplication. This way, if we see the term "abxy", in a math expression, we know it represents a single entity and not —for example— the multiplication of the terms "ab" by "xy", in which case we will use "ab⋅xy".

In these formulas, the operator "==" is used mainly to show two alternate ways to express the same thing.

Statistics Formulas

In these formulas, the symbols in bold typeface (e.g. X) represent random variables, and the symbols in regular (non-bold) typeface (e.g. "c") represent non-random variables.

The Expected Value

We use the expression Eva( X ) to denote the Expected Value of the random variable X. The symbol $(\mu_X)$ represents the value resulting from that expression.

$$ Eva({\bf X}) == \mu_X $$

Formula 1. Eva symbols.

The Expected Value of a non-random variable —for example a constant— is the value of that non-random variable itself:

$$ Eva(c) = c $$

Formula 2. Eva of a non-random variable.

The Expected Value of the product of two independent random variables is equal to the product of those variables' Expected Values:

$$ Eva({\bf X} \cdot {\bf Y}) = Eva({\bf X}) \cdot Eva({\bf Y}) $$

Formula 3. Eva of the product of independent variables.

If the variable is scaled by a constant (a non-random variable), the Expected Value gets scaled by that constant. This is a direct consequence of the two previous formulas.

$$ Eva(c \cdot {\bf X}) = c \cdot Eva({\bf X}) $$

Formula 4. Eva of a scaled variable.

If two random variables are correlated, it means the value of one of them, to some degree, determines or influences the value of the other one. The Covariance is a measure of how much those variables are correlated.

For example, smoking is correlated with the probability of having cancer: the more you smoke, the greater the likelihood that you will eventually get cancer.

The Expected Value of the product of two correlated random variables is equal to the product of those variables' Expected Values plus their Covariance:

$$ Eva({\bf X} \cdot {\bf Y}) = Eva({\bf X}) \cdot Eva({\bf Y}) + Cov({\bf X}, {\bf Y}) $$

Formula 5. Eva of the product of two correlated variables.

Notice how Formula 3 is a particular case of the previous formula: when the random variables are independent, the Covariance term is zero and goes away.

The Expected Value of the sum of any random variables is equal to the sum of the Expected Values of those variables. This is true regardless of whether those random variables are independent or not.

$$ Eva({\bf X} + {\bf Y}) = Eva({\bf X}) + Eva({\bf Y}) $$

Formula 6. Eva of the sum of two random variables, regardless of whether they are independent or not.

The consolidation of the previous relationships results in the Linearity property of the Expected Value operator (Linearity of Expectation), which holds regardless of whether the random variables are independent or not.

$$ Eva(a \cdot {\bf X} + b \cdot {\bf Y} + c) = a \cdot Eva({\bf X}) + b \cdot Eva({\bf Y}) + c $$

Formula 7. Linearity of expectation.
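As a quick numeric illustration (an example of ours, not part of the original formulas): if $({\bf X})$ and $({\bf Y})$ are the outcomes of two fair six-sided dice, so that $(Eva({\bf X}) = Eva({\bf Y}) = 3.5)$, then:

\begin{align*} Eva(2 \cdot {\bf X} + 3 \cdot {\bf Y} - 5) &= 2 \cdot Eva({\bf X}) + 3 \cdot Eva({\bf Y}) - 5 \\ &= 2 \cdot 3.5 + 3 \cdot 3.5 - 5 = 12.5 \end{align*}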

The Covariance

We use the expression Cov( X, Y ) to denote the Covariance of the two random variables X and Y.

$$ Cov({\bf X}, {\bf Y}) == \sigma_{X,Y} $$

Formula 8. Covariance symbols.

The Covariance is a measure of how much the value of each of two correlated random variables determines the value of the other. If both variables tend to change in the same direction (in general, when one grows the other also grows), the Covariance is positive; if they change in opposite directions (when one increases the other decreases), it is negative.

If the variables are independent, the Covariance is zero. In practice, however, the Covariance must be judged zero or non-zero from a statistical point of view: we consider it non-zero only when there is evidence that its value cannot be justified on the basis of chance alone.

$$ Cov({\bf X}, {\bf Y}) = \frac{1}{n} \cdot \sum_{i=1}^{n} (x_i - \mu_X) \cdot (y_i - \mu_Y) $$

Formula 9. Evaluation of Covariance from raw data.

If the variables are independent, their Covariance is zero. This can clearly be seen in the following formula: if both variables are independent of each other, both terms of the subtraction take the same value (by Formula 3) and the Covariance becomes zero:

$$ Cov({\bf X}, {\bf Y}) = Eva({\bf X} \cdot {\bf Y}) - Eva({\bf X}) \cdot Eva({\bf Y}) $$

Formula 10. Evaluation of the Covariance from the Expected values.

However, the converse of the previous rule is not always true: if the Covariance is zero, it does not necessarily mean the random variables are independent.

For example, if X is uniformly distributed in [-1, 1], its Expected Value and the Expected Values of all its odd powers (e.g. X³) are zero. Now, if the random variable Y is defined as Y = X², clearly X and Y are dependent (Y is completely determined by X). However, their Covariance is numerically equal to zero:

$$ Cov({\bf X}, {\bf X}^2) = Eva({\bf X}^3) - Eva({\bf X}) \cdot Eva({\bf X}^2) = 0 - 0 \cdot Eva({\bf X}^2) = 0 $$

Formula 11. If (independent variables) ==> (Covariance == 0). However, (Covariance == 0) does not necessarily imply those variables are independent.
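As a quick numerical check of this fact, here is a minimal sketch of ours using NumPy (the sample size and the seed are arbitrary choices, not part of the original text):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=1_000_000)  # X uniformly distributed in [-1, 1]
y = x ** 2                                   # Y = X^2 is completely determined by X

# Covariance computed as in Formula 10: Eva(X*Y) - Eva(X)*Eva(Y)
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
print(cov_xy)  # close to zero (only sampling noise), even though X and Y are dependent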

The Covariance is commutative.

$$ Cov({\bf X}, {\bf Y}) = Cov({\bf Y}, {\bf X}) $$

Formula 12. Covariance commutative property.

The Covariance is invariant to the displacement of one or both variables.

$$ Cov({\bf X} + c, {\bf Y} + d) = Cov({\bf X}, {\bf Y}) $$

Formula 13. The Covariance is invariant to the displacement of the variables.

If X and Y are each scaled by a constant, the Covariance gets scaled by the product of those constants.

$$ Cov(a \cdot {\bf X}, b \cdot {\bf Y}) = a \cdot b \cdot Cov({\bf X}, {\bf Y}) $$

Formula 14. The scale of each variable scales up the Covariance.

Consolidating the last two equations:

$$ Cov(a \cdot {\bf X} + c, b \cdot {\bf Y} + d) = a \cdot b \cdot Cov({\bf X}, {\bf Y}) $$

Formula 15. Covariance non-linearity property.
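As a sketch of why this holds, we can expand the Covariance with Formula 10 and use the linearity of the Expected Value; all the terms involving the displacements c and d cancel out:

\begin{align*} Cov(a \cdot {\bf X} + c,\; b \cdot {\bf Y} + d) &= Eva\big((a \cdot {\bf X} + c) \cdot (b \cdot {\bf Y} + d)\big) - Eva(a \cdot {\bf X} + c) \cdot Eva(b \cdot {\bf Y} + d) \\ &= a \cdot b \cdot Eva({\bf X} \cdot {\bf Y}) + a \cdot d \cdot Eva({\bf X}) + b \cdot c \cdot Eva({\bf Y}) + c \cdot d \\ &\quad - \big( a \cdot b \cdot Eva({\bf X}) \cdot Eva({\bf Y}) + a \cdot d \cdot Eva({\bf X}) + b \cdot c \cdot Eva({\bf Y}) + c \cdot d \big) \\ &= a \cdot b \cdot \big( Eva({\bf X} \cdot {\bf Y}) - Eva({\bf X}) \cdot Eva({\bf Y}) \big) \\ &= a \cdot b \cdot Cov({\bf X}, {\bf Y}) \end{align*}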

The Variance

The Variance is the special case of the Covariance when both variables are the same.

$$ Var({\bf X}) == Cov({\bf X}, {\bf X}) == \sigma_X^2 $$

Formula 16. Variance symbols.

The Variance measures how much the values of a random variable are spread out. In other words, how different they are from one another.

$$ Var({\bf X}) = \frac{1}{n} \cdot \sum_{i=1}^{n} (x_i - \mu_X)^2 $$

Formula 17. Evaluation of Variance from raw data.

We can compute the Variance from the Expected Values of X and its square (X²).

$$ Var({\bf X}) = Eva({\bf X}^2) - Eva^2({\bf X}) $$

Formula 18. Evaluation of Variance from Expected values.
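As a brief sketch of where this comes from: writing the Variance as the Expected Value of the squared deviation from $(\mu_X)$, expanding the square, and applying the linearity of the Expected Value:

\begin{align*} Var({\bf X}) &= Eva\big( ({\bf X} - \mu_X)^2 \big) \\ &= Eva\big( {\bf X}^2 - 2 \cdot \mu_X \cdot {\bf X} + \mu_X^2 \big) \\ &= Eva({\bf X}^2) - 2 \cdot \mu_X \cdot Eva({\bf X}) + \mu_X^2 \\ &= Eva({\bf X}^2) - 2 \cdot Eva^2({\bf X}) + Eva^2({\bf X}) \\ &= Eva({\bf X}^2) - Eva^2({\bf X}) \end{align*}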

The Variance of a non-random variable is zero.

$$ Var(c) = 0 $$

Formula 19. Variance of a non-random variable.

The Variance is invariant to the displacement of the variable.

$$ Var({\bf X} + c) = Var({\bf X}) $$

Formula 20. The displacement of the variable is transparent to the Variance.

If the variable is scaled by a constant (a non-random variable) the Variance gets scaled by the square of that constant.

$$ Var(c \cdot {\bf X}) = c^2 \cdot Var({\bf X}) $$

Formula 21. The Variance is also scaled by the constant, but quadratically.

The Variance of the sum or difference of two correlated random variables is:

$$ Var({\bf X} \pm {\bf Y}) = Var({\bf X}) + Var({\bf Y}) \pm 2 \cdot Cov({\bf X}, {\bf Y}) $$

Formula 22. The Variance of the sum (and the difference) of two random variables.
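As a sketch of where this comes from, apply Formula 18 to $({\bf X} \pm {\bf Y})$ and then recognize Formula 10:

\begin{align*} Var({\bf X} \pm {\bf Y}) &= Eva\big( ({\bf X} \pm {\bf Y})^2 \big) - Eva^2({\bf X} \pm {\bf Y}) \\ &= Eva({\bf X}^2) \pm 2 \cdot Eva({\bf X} \cdot {\bf Y}) + Eva({\bf Y}^2) - \big( Eva^2({\bf X}) \pm 2 \cdot Eva({\bf X}) \cdot Eva({\bf Y}) + Eva^2({\bf Y}) \big) \\ &= Var({\bf X}) + Var({\bf Y}) \pm 2 \cdot \big( Eva({\bf X} \cdot {\bf Y}) - Eva({\bf X}) \cdot Eva({\bf Y}) \big) \\ &= Var({\bf X}) + Var({\bf Y}) \pm 2 \cdot Cov({\bf X}, {\bf Y}) \end{align*}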

If the variables are independent, the Covariance is zero and the Variance of the sum and the Variance of the difference are the same:

$$ Var({\bf X} \pm {\bf Y}) = Var({\bf X}) + Var({\bf Y}) $$

Formula 23. When the Covariance is zero, the Variance of the sum is equal to the Variance of the difference of two random variables.

The Variance of the product of two correlated random variables is:

$$ Var({\bf X} \cdot {\bf Y}) = Cov({\bf X}^2, {\bf Y}^2) + Eva({\bf X}^2) \cdot Eva({\bf Y}^2) - \big( Cov({\bf X}, {\bf Y}) + Eva({\bf X}) \cdot Eva({\bf Y}) \big)^2 $$

Formula 24. Variance of the product of correlated variables.

From the relationship between the Variance and the Expected Values (Formula 18) we can write "Eva(X²) = Var(X) + Eva²(X)". Substituting that into the previous formula we get:

$$ Var({\bf X} \cdot {\bf Y}) = Cov({\bf X}^2, {\bf Y}^2) + \big( Var({\bf X}) + Eva^2({\bf X}) \big) \cdot \big( Var({\bf Y}) + Eva^2({\bf Y}) \big) - \big( Cov({\bf X}, {\bf Y}) + Eva({\bf X}) \cdot Eva({\bf Y}) \big)^2 $$

Formula 25. Variance of the product of correlated variables.

The Variance of the product of two independent random variables comes from the previous formulas, knowing that in such case $(\sigma_{X,Y} = \sigma_{X^2,Y^2} = 0)$:

$$ Var({\bf X} \cdot {\bf Y}) = Var({\bf X}) \cdot Var({\bf Y}) + Var({\bf X}) \cdot Eva^2({\bf Y}) + Var({\bf Y}) \cdot Eva^2({\bf X}) $$

Formula 26. Variance of the product of two independent variables.
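As a sketch of how this form arises: with both Covariance terms at zero, the previous formula reduces to a product that we can expand:

\begin{align*} Var({\bf X} \cdot {\bf Y}) &= \big( Var({\bf X}) + Eva^2({\bf X}) \big) \cdot \big( Var({\bf Y}) + Eva^2({\bf Y}) \big) - Eva^2({\bf X}) \cdot Eva^2({\bf Y}) \\ &= Var({\bf X}) \cdot Var({\bf Y}) + Var({\bf X}) \cdot Eva^2({\bf Y}) + Var({\bf Y}) \cdot Eva^2({\bf X}) \end{align*}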

Another way to get the Variance of the product of two independent random variables is through this beautiful equation:

$$ Var({\bf X} \cdot {\bf Y}) = Eva({\bf X}^2) \cdot Eva({\bf Y}^2) - Eva^2({\bf X}) \cdot Eva^2({\bf Y}) $$

Formula 27. Variance of the product of two independent variables.

The Standard Deviation

We use the expression StdDev(X) to denote the Standard Deviation of the random variable X.

$$ StdDev({\bf X}) == \sigma_X $$

Formula 28. Standard Deviation symbols.

The Standard Deviation is the square root of the Variance.

$$ StdDev({\bf X}) == \sigma_X = \sqrt{Var({\bf X})} $$

Formula 29. The standard deviation is (by definition) the square root of the Variance.

The Correlation Coefficient

The Correlation Coefficient is a normalized measure of the dependence between two correlated random variables.

$$ \rho_{X,Y} = \frac{Cov({\bf X}, {\bf Y})}{\sigma_X \cdot \sigma_Y} $$

Formula 30. The correlation coefficient.

Analytic Relationship between two Numerically Correlated Functions

Sometimes you can measure two output variables from an experiment which are strongly correlated because they come from two functions of a common variable, and what you want to know are those two functions. With the help of data from many experiments, you can build a correlation formula that approximates the relationship between those output variables well. However, the question is: how can this correlation formula shed light on the functions producing those output variables?

Let's suppose we have $(f(x))$ and $(g(x))$, both evaluated at $(x_i)$ in $({x_1, x_2... x_n})$, which means we have the points $((f(x_i), g(x_i)))$. Now we find a correlation over those points. However, that correlation formula relates the values of $(f())$ and $(g())$ directly, with no reference to $(x)$. How can we relate this correlation with $(f(x))$ and $(g(x))$?

To answer this question, let's think of the points we are plotting as $((z, h(z)))$; then clearly:

\begin{align*} (z, h(z)) &= (f(x_i), g(x_i))\\ \\ z &= f(x) \\ h(z) &= g(x) \\ \\ x &= f^{-1}(z) \\ \therefore h(z) &= g(f^{-1}(z)) \\ \end{align*}

Formula 31. Analytic Correlation between two functions.

Linear case

For example, let's see the case when both functions, f() and g(), are linear:

\begin{align*} ({\bf z}, h({\bf z})) &= (f({\bf x}), g({\bf x})) \\ \\ g({\bf x}) &= g_1 \cdot {\bf x} + g_0 \\ f({\bf x}) &= f_1 \cdot {\bf x} + f_0 \\ \\ {\bf x} &= f^{-1}({\bf z}) \\ {\bf x} &= \frac{{\bf z} - f_0}{f_1} \\ \\ h({\bf z}) &= g(f^{-1}({\bf z})) \\ h({\bf z}) &= g\left(\frac{{\bf z} - f_0}{f_1}\right) \\ h({\bf z}) &= g_1 \cdot \left(\frac{{\bf z} - f_0}{f_1}\right) + g_0 \\ \therefore h({\bf z}) &= \frac{g_1}{f_1} \cdot {\bf z} + \left( g_0 - \frac{g_1}{f_1} \cdot f_0 \right) \tag{32} \end{align*}

Formula 32. Analytic Correlation between two linear functions.

This means that when we correlate two linear functions, the resulting curve is also a line, whose slope is the slope of $(g(x))$ divided by the slope of $(f(x))$. We also know the correlated line crosses the $(h)$ axis (when $(z = 0)$) at $(h = g_0 - (g_1/f_1) \cdot f_0)$.

We can "validate" this case very easily. Let's suppose our linear functions are:

\begin{align*} f(x) &= 2 \cdot x + 13 \\ g(x) &= 3 \cdot x + 5 \end{align*}

If we plot these functions for $(x)$ in $({1, 2})$ we get $((f, g) = { (15, 8), (17, 11) })$. Knowing those points belong to a line, we can find the slope as $(\Delta y/ \Delta x)$. If our previous deduction is correct, this slope should be equal to $(g_1/f_1 = 3/2)$:

\begin{equation*} slope = \Delta y/\Delta x = (11 - 8) / (17 - 15) = 3/2 \end{equation*}

Which is exactly what we thought!

Also, according to our last formula, the intercept of the correlated line at $(z = 0)$ should occur when h is:

\begin{align*} h(0) &= (g_1/f_1) \cdot 0 + (g_0 - (g_1/f_1) \cdot f_0) \\ h &= 5 - (3/2) \cdot 13 \\ h &= -14.5 \end{align*}

We can find the intercept at $(z = 0)$ from the correlated line data, knowing it contains the point (15, 8):

\begin{align*} (h - 8) &= (3/2) \cdot (z - 15) \\ (h - 8) &= (3/2) \cdot (0 - 15) \\ h &= 8 - (3/2) \cdot 15 \\ h &= -14.5 \end{align*}

Again the result is consistent with our deduction!

Quadratic case

When $(f())$ is linear but $(g())$ is quadratic, we get:

\begin{alignat*}{2} f({\bf x}) &=& &f_1 \cdot {\bf x} + f_0& \cr g({\bf x}) &=&~~g_2 \cdot {\bf x}^2 ~+~ &g_1 \cdot {\bf x} + g_0& \cr \end{alignat*}

\begin{align*} {\bf z} &= f({\bf x}) \cr {\bf x} &= \frac{{\bf z} - f_0}{f_1} \cr \cr h({\bf z}) &= g_2 \cdot \left(\frac{{\bf z} - f_0}{f_1}\right)^2 + g_1 \cdot \left(\frac{{\bf z} - f_0}{f_1}\right) + g_0 \cr \end{align*}

Expanding the last equation and collecting terms, we finally get:

$$ ({\bf z}, h({\bf z})) = (f({\bf x}), g({\bf x})) $$

\begin{alignat*}{2} f({\bf x}) &=& &f_1 \cdot {\bf x} + f_0& \cr g({\bf x}) &=&~~g_2 \cdot {\bf x}^2 ~+~ &g_1 \cdot {\bf x} + g_0& \cr \end{alignat*}

$$ \begin{align*} \therefore h({\bf z}) &= z_2 \cdot {\bf z}^2 + z_1 \cdot {\bf z} + z_0 \cr \cr z_2 &= \frac{g_2}{f_1^2} \cr z_1 &= \frac{g_1}{f_1} - 2 \cdot \frac{f_0 \cdot g_2}{f_1^2} \cr z_0 &= \frac{g_2}{f_1^2} \cdot f_0^2 - \frac{g_1}{f_1} \cdot f_0 + g_0 \cr \end{align*}\tag{33} $$

Formula 33. Analytic Correlation of a quadratic against a linear function.
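We can also validate this case numerically. The following is a minimal sketch of ours (the coefficients of f() and g() below are example values, not taken from any experiment): we evaluate both functions on a common set of $(x)$ values, fit a quadratic through the points $((f(x_i), g(x_i)))$, and compare the fitted coefficients with $(z_2, z_1, z_0)$ from Formula 33.

import numpy as np

# Example coefficients of ours: f(x) = 2*x + 13 and g(x) = 3*x^2 + 5
f1, f0 = 2.0, 13.0
g2, g1, g0 = 3.0, 0.0, 5.0

x = np.linspace(0.0, 10.0, 50)
z = f1 * x + f0               # first measured output:  f(x_i)
h = g2 * x**2 + g1 * x + g0   # second measured output: g(x_i)

# Quadratic fitted through the correlated points (z_i, h_i)
fitted = np.polyfit(z, h, 2)

# Coefficients predicted by Formula 33
z2 = g2 / f1**2
z1 = g1 / f1 - 2.0 * f0 * g2 / f1**2
z0 = (g2 / f1**2) * f0**2 - (g1 / f1) * f0 + g0

print(fitted)        # approximately [ 0.75, -19.5, 131.75 ]
print([z2, z1, z0])  # exactly      [ 0.75, -19.5, 131.75 ]

Both printed lines should agree, which is consistent with our deduction.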