1 Introduction to Bayesian Inference

Introduction

In the Bayesian paradigm, a model’s parameters are random variables: each possible combination of their values makes it more or less likely that the model produced data distributed like the observed data.

This post develops Bayesian inference concepts through their application in a practical example: solving a simple regression model numerically, in a fully Bayesian way. We will match the theoretical concepts with their practical implementation, helping you grasp their meaning fully.

The eCompo dataset

We will use a dataset named “eCompo” for our running example. These are a series of readings of the “\(y\)” response emitted by an electronic component after being subjected to different stimulus values “\(x\)” in a test. We want to know the two parameters \(\beta\) and \(\sigma_\epsilon\) in the model shown below: \[\begin{align} y &= \beta \cdot x + \epsilon \\ \epsilon &\sim \mathcal{N}(0, \sigma_{\epsilon}^2) \tag{1.1} \\[3pt] \implies y - (\beta \cdot x) &\sim \mathcal{N}(0, \sigma_{\epsilon}^2) \tag{1.2} \end{align}\]

ECOMPO_DATA <- data.frame(
  x = c(23.54, 22.97, 11.78, 18.64, 14.36, 18.93, 19.04, 10.768, 27.39, 
    17.28, 22.63, 14.32, 22.95, 27.82, 28.34, 23.71, 29.73, 25.41, 
    27.60, 15.54 ),
  y = c(247.35, 180.47, 86.78, 178.12, 168.72, 135.10, 160.17, 172.48, 
    274.35, 130.32, 224.71, 144.31, 245.39, 314.81, 256.71, 187.07, 
    251.90, 253.75, 181.58, 179.68)
)

In the model equation (1.2), notice that the expected distribution of the residuals has the residuals’ variance, but its center is fixed at zero. In the following examples, we will estimate the variance \(\sigma_{\epsilon}^2\) with the variance of the residuals in the data.

1.1 Bayesian Inference

1.1.1 The Notion of Joint Probability

In the “Beta vs. Residuals” graph 1.1, the residuals’ expected distribution is shown in green, and their actual distribution \(\epsilon = y - (\beta \cdot x)\) in purple.

With \(\beta = 9.3\) in (1), the residuals approach their expected, zero-centered distribution. In contrast, when \(\beta\) increases to \(15\) in (2), the residuals spread out, and their center moves away from its expected zero value. We say that \(\beta = 15\) “makes it unlikely to observe the data we have.”

The phrase “makes it unlikely to observe the data” is an abbreviated way to say that some parameter values make it more or less probable for the model to produce the observed data; in this case, less likely.


Figure 1.1: Beta vs. Residuals. The residuals expected distribution is colored green. It is centered at zero and has the residuals’ actual variance. The residuals’ real distribution is colored purple. At the moment (1), \(\beta=9.3\) produces residuals whose distribution relatively matches its expected distribution, where the green and purple blobs overlap. At the moment (2), \(\beta=15\) causes more spread and negative residuals. The expected and actual residuals become more different from each other.

This exercise aims to give you the intuition that a model’s parameters have a joint distribution. Changing any of them changes the likelihood of observing the data according to the model.
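You can reproduce this comparison numerically with a small sketch like the one below. The helper residual_summary() is introduced here only for illustration; it is not part of the model code developed later.

residual_summary <- function(beta, data = ECOMPO_DATA) {
  eps <- data$y - beta * data$x  # residuals implied by this candidate beta
  c(beta = beta, center = mean(eps), spread = sd(eps))
}
rbind(residual_summary(9.3), residual_summary(15))

With \(\beta = 9.3\) the residuals’ center should stay near zero, while with \(\beta = 15\) it should move far from it, matching what the graph shows.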

1.2 The Bayesian Inference Components

In our eCompo model, the joint distribution of the parameters \((\beta, \sigma_{\epsilon})\), given the model \(\mathcal{M}\) and the data (\(y, X\)), can be denoted as: \[\begin{align*} \pi(\beta, \sigma_{\epsilon} \mid \mathcal{M}, y, X) \end{align*}\]

Where \(\pi()\) is the distribution, which depends on \(\beta, \sigma_{\epsilon}\) given the model and the data.

1.2.1 Notational Convention

For continuous random variables, we use the term distribution in the same sense as probability density function (PDF), which corresponds to a probability mass function (PMF) for discrete variables. We use \(\pi(\cdot)\) to denote a PDF or a PMF. From the context, it will be evident which is the case.

Since \(\pi(\cdot)\) denotes the distribution of its argument \((\cdot)\), it is a different function for each different \((\cdot)\). Again, it refers to a PDF in the continuous case or a PMF otherwise. On some occasions, to avoid confusion, we use \(Pr(\cdot)\) to denote an event’s probability.

1.2.2 Simplifying the Notation

For simplicity, we will use the vector \(\theta\) to refer to all the model parameters. In turn, \(y\) will represent all the data, so \(X\) will be implicit. Since the model is constant, we will also omit it.

With this simplified notation, the joint distribution of the model’s parameters \(\theta\), given the data, is: \[\begin{align*} \pi(\theta \mid y) \end{align*}\]

By the Bayes rule we know: \[\begin{align} \pi(\theta \mid y) &= \frac{\pi(y, \theta)}{\pi(y)} = \frac{\pi(y \mid \theta) \cdot \pi(\theta)}{\pi(y)} \tag{1.3} \end{align}\]

The equation (1.3) is fundamental in Bayesian inference, and each term has a name that we will refer to frequently.

  • \(\pi(\theta)\) is the prior distribution: the distribution we assume the parameters have before seeing the data.
  • \(\pi(y \mid \theta)\) is the data sampling distribution. When the data \(y\) is considered constant, this is a function only of \(\theta\), and this term is called the model likelihood.
  • \(\pi(\theta \mid y)\) is the posterior distribution: the distribution of the parameters conditioned on the data.
  • \(\pi(y)\) is sometimes called the evidence or the marginal likelihood.

The names prior and posterior refer to the fact that these distributions are modeled before and after looking at the data, respectively.

1.2.3 \(\pi(y)\)

\(\pi(y)\), the denominator in (1.3), is a value that depends only on the data \(y\); it is the normalization constant that makes the distribution \(\pi(\theta \mid y)\) integrate to \(1\) (or sum to \(1\) in the discrete case).

For continuous \(\theta\), \(\pi(y)\) is: \[\begin{align} \pi(y) &= \int \pi(y \mid \theta) \cdot \pi(\theta) \, \text{d}\theta \tag{1.4} \end{align}\]

For the discrete case, the previous equation is equivalent to:

\[ \pi(y) = \sum_{\theta} \pi(y \mid \theta) \cdot \pi(\theta) \]

In these expressions, we are averaging \(\pi(y \mid \theta)\) (the likelihood) over all possible \(\theta\) values. This operation is called marginalization and is the reason why \(\pi(y)\) is called the marginal likelihood.
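As a toy illustration of this averaging, consider a discrete \(\theta\) that can take only two values, with made-up probabilities (the numbers here are purely hypothetical and unrelated to the eCompo model):

prior      <- c(theta1 = 0.3, theta2 = 0.7)    # pi(theta)
likelihood <- c(theta1 = 0.10, theta2 = 0.02)  # pi(y | theta)
sum(likelihood * prior)                        # pi(y) = 0.3*0.10 + 0.7*0.02 = 0.044

The marginal likelihood is simply the likelihood of each candidate \(\theta\), weighted by how probable that \(\theta\) is a priori.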

1.3 Building the Posterior Distribution

We will develop, for the eCompo model, the distributions we just defined in the previous section. Later we will use them to solve that model.

1.3.1 The Prior Distribution

The prior distribution is modeled with previously known information about the model parameters, or with general expectations about them; a prior that only loosely constrains the parameters is called a weakly-informative prior. In our example, we ask the company experts what the domain of the model’s parameters is.

The experts tell us the value of \(\beta\) must be between \(1\) and \(25\), with an approximately Normal distribution. Now we can say that the \(\beta\) prior distribution is Normal, with its center around \((25+1)/2=13\), and a standard deviation of about \((25-1)/4=6\).
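As a quick sanity check (not part of the original elicitation, just an illustration), we can verify that a Normal(13, 6) prior places roughly 95% of its mass inside the experts’ \([1, 25]\) range:

# Prior mass of Normal(13, 6) on the interval [1, 25]: about 0.95
pnorm(25, mean = 13, sd = 6) - pnorm(1, mean = 13, sd = 6)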

The experts also tell us that the Exponential distribution is suitable as a prior for the parameter \(\sigma_\epsilon\), using the reciprocal of the standard deviation of “\(y\)” as its \(\lambda\) parameter, so we adopt as the prior of \(\sigma_\epsilon\) an Exponential distribution with \(\lambda=1/\textrm{sd}(y)\).

PRIOR_BETA_SD <- 6
PRIOR_BETA_MEAN <- 13
PRIOR_SIGMA_LAMBDA <- 1 / sd(ECOMPO_DATA$y)

The prior distribution \(\pi(\theta)\) of our model is:

prior_density <- function(beta, sigma, log = TRUE) {
  log_prior_beta <- dnorm(beta, mean = PRIOR_BETA_MEAN, sd = PRIOR_BETA_SD, log = TRUE)
  log_prior_sigma <- dexp(sigma, rate = PRIOR_SIGMA_LAMBDA, log = TRUE)
  result <- log_prior_beta + log_prior_sigma

  if (log != TRUE) result <- exp(result)
  return(result)
}

Where the beta and sigma function parameters correspond to \(\beta\) and \(\sigma_\epsilon\) in our model (1.2). Since these parameters are independent, their joint prior distribution is the product of these parameters’ individual priors. As expected, this distribution does not depend on the data, only on \(\theta\). The hyper-parameters like PRIOR_BETA_SD or PRIOR_SIGMA_LAMBDA used to model each individual prior are model constants. We use the logarithmic scale for the calculations as a purely technical matter (stability and numerical accuracy).
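For example, we can evaluate the prior at a point (the values beta = 13 and sigma = 30 below are arbitrary, chosen only for illustration):

# Prior density at an arbitrary point, on the log and the linear scale
prior_density(beta = 13, sigma = 30)               # log density
prior_density(beta = 13, sigma = 30, log = FALSE)  # same point, linear scale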


Figure 1.2: Prior Distribution. We have a weakly-informative prior distribution, supporting all plausible model parameter values.

Our weakly-informative prior distribution supports all plausible parameter values. It is shown in the “Prior Distribution” graph 1.2.

1.3.2 The Likelihood Distribution

The likelihood distribution \(\pi(y \mid \theta)\) of our model is:

likelihood_density <- function(beta, sigma, y, x, log = TRUE) {
  result <- 0
  for (i in seq_along(y)) {
    mean <- beta * x[i]
    result <- result + dnorm(y[i], mean = mean, sd = sigma, log = TRUE)
  }

  if (log != TRUE) result <- exp(result)
  return(result)
}

In our model, the equation (1.1) defines the probability of seeing each observation in the data. Since these are independent, their joint probability is the product of their individual likelihood. In the logarithmic scale, it is the sum of the logarithm of their probabilities.
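Since dnorm() is vectorized, the same computation can be written without the explicit loop. This sketch is an equivalent alternative to the likelihood_density() above, not a replacement used later:

# Equivalent vectorized form: dnorm() accepts vectors, so the per-observation
# log densities can be computed and summed in a single call
likelihood_density_vec <- function(beta, sigma, y, x, log = TRUE) {
  result <- sum(dnorm(y, mean = beta * x, sd = sigma, log = TRUE))
  if (log != TRUE) result <- exp(result)
  return(result)
}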

Note that likelihood_density() is not a normalized density with respect to \(\theta\): for it to be one, its integral over \(\theta\) would have to equal \(1\), which is not the case. The prior, being the product of two proper densities, does integrate to \(1\) over \(\theta\).

Consequently, the product of the prior and the likelihood (the numerator in (1.3)) is not normalized either: \(\int \pi(y \mid \theta) \cdot \pi(\theta)~\textrm{d}\theta \ne 1\). We need to divide it by the value of that integral. We will deal with this later.

1.3.3 The Joint Distribution

The name Joint distribution can be confusing because all these distributions (prior, likelihood, posterior) are joint distributions on the model parameters. Here we are referring to \(\pi(y, \theta)\), the numerator in (1.3), which is the joint density of the parameters and the data.

As this function is the product of the prior and likelihood densities \(\pi(y, \theta) = \pi(y \mid \theta) \cdot \pi(\theta)\), this is easy:

joint_density <- function(beta, sigma, data, log = TRUE) {
  result <- prior_density(beta, sigma) + likelihood_density(beta, sigma, data$y, data$x)

  if (log != TRUE) result <- exp(result)
  return(result)
}

Like the likelihood, this is also a non-normalized density in \(\theta\), and we shall correct this in the posterior distribution.
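To get a feel for the scale involved (the parameter values below are just illustrative), we can evaluate the joint density at a point. On the linear scale the values are extremely small, which is one more reason all these functions work in logs:

# Illustrative evaluation at arbitrary parameter values: the log joint density
# is a large negative number, so its linear-scale counterpart is tiny and
# shrinks even further away from the mode
joint_density(beta = 9.3, sigma = 35, data = ECOMPO_DATA)              # log scale
joint_density(beta = 9.3, sigma = 35, data = ECOMPO_DATA, log = FALSE) # linear scale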

1.3.4 The Posterior Distribution

The joint distribution \(\pi(y, \theta) = \pi(y \mid \theta) \cdot \pi(\theta)\) computed by the function joint_density() above, is the non-normalized distribution of \(\pi(\theta \mid y)\), in the sense its integral is not equal to \(1\). We must divide joint_density() by the normalization constant \(\pi(y)\) to get the normalized posterior density \(\pi(\theta \mid y)\) according to (1.3).

We will get \(\pi(y)\) by marginalizing out \(\theta\) from \(\pi(y, \theta)\). Mathematically, this corresponds to integrating \(\pi(y, \theta)\) over \(\theta\). \[\begin{align} \pi(y) &= \int \pi(y , \theta) \text{d}\theta \end{align}\]

In turn, we can replace \(\pi(y, \theta) = \pi(y \mid \theta) \cdot \pi(\theta)\), which gives us the equation (1.4). In our case, \(\theta\) contains two parameters: \[\begin{align} \pi(y) &= \int_0^{+\infty} \int_{-\infty}^{+\infty} \textrm{joint}\_\textrm{density}(\beta, \sigma_\epsilon ,\text{data}) ~\textrm{d}\beta~\textrm{d}\sigma_\epsilon \end{align}\]

The posterior distribution \(\pi(\theta \mid y)\) has the form:

posterior_density <- function(beta, sigma, data, log = TRUE) {
  result <- joint_density(beta, sigma, data, log = TRUE) - log(pi_y)

  if (log != TRUE) result <- exp(result)
  return(result)
}

Where pi_y is the value of \(\pi(y)\), the value we must calculate. In the code above, we are subtracting in the logarithm scale, which is equivalent to dividing in the linear one.

We face the difficulty that, when we integrate with pi_y = 1, the posterior_density() function returns values smaller than \(3.4 \times 10^{-47}\), which makes the integration numerically unstable. To deal with this, we assign a temporary value to pi_y while we integrate, and then adjust it according to the integral.

pi_y <- 3.4e-47
int_pi_y <- integrate(function(sigmas) { # Integrate over sigma
  sapply(sigmas, function(sigma) {
    integrate(function(beta) # Integrate over beta
      posterior_density(beta, sigma, ECOMPO_DATA, log = F), -Inf, +Inf)$value
  })
}, 0, +Inf)$value

pi_y <- pi_y * int_pi_y

In the code above, like in the mathematical expression, the inner integration is the integrand of the outer one. We wrap the inner integration with a call to sapply() to compute the inner integral over beta for many sigmas from the outer integration.

The final value is \(\pi(y)= 2.80 \times 10^{-46}\), and now the posterior_density() function is a proper density distribution.
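As a consistency check (a sketch; by construction the result should now be very close to \(1\)), we can re-run the same nested integration, this time against the corrected posterior_density():

# With pi_y in place, the posterior density should integrate to approximately 1
integrate(function(sigmas) { # Integrate over sigma
  sapply(sigmas, function(sigma) {
    integrate(function(beta) # Integrate over beta
      posterior_density(beta, sigma, ECOMPO_DATA, log = FALSE), -Inf, +Inf)$value
  })
}, 0, +Inf)$value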

Posterior Distribution Graph

The Posterior Distribution graph 1.3 shows the surface of our just-developed posterior_density() function. Notice that it maps each possible set of parameters (the horizontal plane) to its density (the vertical axis), which in turn tells us that some combinations of parameters explain the data better than others.

To build this graph, we compute the dist_mat matrix with the code below. This matrix contains the posterior density for a grid of equidistant points in the beta and sigma plane. The functions row_to_beta() and col_to_sigma() –also in the code below– allow us to obtain the values of beta and sigma from the row and column position of any cell in the dist_mat matrix.

beta_start <- 7.5
sigma_start <- 19

betas <- seq(beta_start, 11, by = 0.02)
sigmas <- seq(sigma_start, 79, by = 0.02)

dist_mat <- outer(
  betas, sigmas,
  function(beta, sigma) posterior_density(beta, sigma, ECOMPO_DATA, log = FALSE)
)
row_to_beta <- function(row_nbr) (row_nbr - 1) * 0.02 + beta_start
col_to_sigma <- function(col_nbr) (col_nbr - 1) * 0.02 + sigma_start
str(dist_mat)

num [1:165, 1:2931] 4.01e-20 8.73e-20 1.88e-19 4.02e-19 8.49e-19 ...

The Posterior Distribution Mode

In the Posterior Distribution graph 1.3, we see it has a prominent mode. Estimation algorithms of the maximum-likelihood type report the position of the likelihood distribution’s mode as the solution for the model parameters. They usually do not have a prior distribution, so their posterior is just the likelihood, hence the name maximum likelihood.


Figure 1.3: Posterior Distribution. Numerically calculated posterior distribution, with some contour lines to better notice its shape, like \(\sigma_\epsilon\) having a longer tail on its right end.

We can estimate the mode position using the dist_mat matrix we computed in the previous section. For this, we find the cell with the maximum value in the matrix, and with that cell position, we get the value of beta and sigma with a precision of \(0.02\), the spacing in the grid.

mode_pos <- which(dist_mat == max(dist_mat), arr.ind = T)
c(row_to_beta(mode_pos[1]), col_to_sigma(mode_pos[2]))

[1]  9.302 35.260

In our example, we can also find it with a numerical optimizer in R. This time we will find the position of the likelihood mode.

nlm(
  function(theta) 
   -likelihood_density(theta[1], theta[2], ECOMPO_DATA$y, ECOMPO_DATA$x), c(7, 11)
)

:
$estimate
[1]  9.303106 35.262023
:

Since the nlm() (Non-Linear Minimization) function finds minima, and we are looking for a maximum, we invert the sign of the likelihood_density() function. Also, because the logarithm is a monotonic function, we can optimize on that scale for numerical convenience.
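The same trick applies to the posterior mode itself (the MAP estimate). A minimal sketch, maximizing the log joint density, which is the non-normalized log posterior; as in the likelihood example, we assume the optimizer stays in the \(\sigma_\epsilon > 0\) region from the starting point c(7, 11):

# Posterior mode (MAP): minimize the negative log joint density.
# If the optimizer strays into sigma <= 0, optimize over log(sigma) instead.
nlm(
  function(theta)
    -joint_density(theta[1], theta[2], ECOMPO_DATA), c(7, 11)
)$estimate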

We can get the posterior mode in non-trivial models by using the optimizing() function in rstan.

🖥️ Get this Article Source Code

Get this article in .pdf format, including the solutions to the exercises (implicit or explicit) left to the reader. Additionally, you will get the supplemental source code, including this article’s code snippets and the other ones used to create the post, such as the code for the 3D graphics or for computing estimands.

1.4 Computing Expected Values

1.4.1 The Expected Value of an Estimand

Suppose we are interested in a scalar estimand that is a function \(g(\theta)\) of the model parameters. Its expected value is, by definition: \[ \mathbb{E}[g(\theta)] = \int g(\theta) \cdot \pi(\theta \mid y) ~\textrm{d}\theta \tag{1.5} \] Where the integration is over all the parameters in \(\theta\); in our example, over \(\beta\) and \(\sigma_{\epsilon}\). On the other hand, \(\pi(\theta \mid y)\) in our model is the posterior_density() function.

1.4.2 The Parameters’ Marginal Distribution

The posterior distribution we have is the parameters’ joint density \(\pi(\beta, \sigma_{\epsilon})\) conditioned on the data. We need values for both \(\beta\) and \(\sigma_{\epsilon}\) to evaluate this distribution. However, we can get the marginal distribution of each parameter by integrating out all the parameters except the one whose marginal we want. For example, the marginal distribution of \(\beta\) is: \[\begin{align} \pi(\beta) &= \int \pi(\beta, \sigma_\epsilon) ~\textrm{d}\sigma_\epsilon \\ \pi(\beta) &= \int \pi(\beta \mid \sigma_\epsilon) \cdot \pi(\sigma_\epsilon) ~\textrm{d}\sigma_\epsilon \end{align}\]

We can see, in the last equation above, that the marginal \(\pi(\beta)\) is the average of \(\pi(\beta \mid \sigma_\epsilon)\) over the possible \(\sigma_\epsilon\) values for the given \(\beta\). For clarity, we omit from the notation that all these distributions are also conditioned on the data \(y\).

beta_marginal <- function(betas) {
  sapply(betas, function(beta) { # Compute the integral for each `beta` in `betas`
    integrate(function(sigma) { # Integrate over sigma
      posterior_density(beta, sigma, ECOMPO_DATA, log = F)
    }, 0, +Inf)$value
  })
}

The code above implements the marginal \(\pi(\beta)\) density. We wrap the integration with a call to sapply() to make the function able to compute, in a single call, the integral for many beta values in the argument betas.


Figure 1.4: Marginal Distribution of Beta and Sigma. These are the marginal distributions of our model parameters, plotted under the same scales. There we can see we have a lot more uncertainty about sigma than about beta.

These marginal distributions will help us build easier-to-understand equations and code.

The Expected Value of a Single Parameter Estimand

For the particular case where we are interested in an estimand \(g(\alpha)\) that depends on a single parameter \(\alpha\), the same equation above can be written using the parameter’s marginal distribution \(\pi(\alpha)\).

\[ \mathbb{E}[g(\alpha)] = \int g(\alpha) \cdot \pi(\alpha) ~\textrm{d}\alpha \tag{1.6} \] This is the case –for example– for the expected value of a parameter or its standard deviation.
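For instance, a minimal sketch of the expected value of \(\beta\), using the beta_marginal() density developed above. The integration limits 7.5 and 11 are an assumption, chosen to cover essentially all of the marginal’s mass:

# Expected value of beta: average beta over its marginal posterior density
integrate(function(beta) beta * beta_marginal(beta), 7.5, 11)$value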

1.5 Solving the eCompo model

Now that we have the posterior distribution, we can learn about the model parameters, which is what we call solving the model. For a more compelling case, dealing with an estimand that involves more than one parameter, we will study the Ratio of beta to sigma \((\beta / \sigma_{\epsilon})\).

Computing the expected value of estimands that depend on a single parameter is pretty easy with equation (1.6) and the marginal distributions we developed in the previous section. We will leave them as exercises for the reader. You can check how we compute expectations for the Ratio of beta to sigma as guidance for solving single-parameter expectations. In the end, you should get:

           mean       sd    median     mad       Q5      Q95
beta    9.30638  0.39920    9.3053  0.3139   8.6547   9.9607 
sigma  38.59072  6.65043   37.7209  5.1139  29.4171  50.7350 

Notice that here we are computing the actual values of those estimands. Later we will use them as their ground-truth values to assess the accuracy of other methods.

1.5.1 The Ratio’s Expected Value

For this estimand we need to use the equation (1.5), with \(g(\beta, \sigma_{\epsilon})= (\beta / \sigma_{\epsilon})\):

\[ \mathbb{E}(\beta / \sigma_{\epsilon}) = \int_0^{+\infty} \int_{-\infty}^{+\infty} (\beta / \sigma_{\epsilon}) \cdot \textrm{posterior}\_\textrm{density}(\beta, \sigma_\epsilon ,\textrm{data}) ~\textrm{d}\beta~\textrm{d}\sigma_\epsilon \]

Which in R code is:

library(magrittr) # provides the %>% pipe used in this snippet
beta_sigma_ratio_ev <- integrate(function(sigmas) { # Integrate over sigma
  sapply(sigmas, function(sigma) {
    integrate(function(beta) # Integrate over beta
      beta/sigma * posterior_density(beta, sigma, ECOMPO_DATA, log = F), -Inf, +Inf)$value
  })
}, 0, +Inf)$value %>% print()

0.2476815

1.5.2 The Ratio’s Standard Deviation

We will use equation (1.5) again, except that this time \(g(\beta, \sigma_{\epsilon})= (\beta / \sigma_{\epsilon} - \mu_r)^2\), where \(\mu_r\) is the Ratio’s expected value computed in the previous section. This integral yields the Ratio’s variance, and its square root is the standard deviation reported below.
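A sketch of this computation, reusing the nested-integration pattern from the previous section (beta_sigma_ratio_ev holds \(\mu_r\)):

# Ratio's variance: integrate the squared deviation from the ratio's expected
# value over the posterior, then take the square root to get the sd
ratio_var <- integrate(function(sigmas) { # Integrate over sigma
  sapply(sigmas, function(sigma) {
    integrate(function(beta) # Integrate over beta
      (beta / sigma - beta_sigma_ratio_ev)^2 *
        posterior_density(beta, sigma, ECOMPO_DATA, log = F), -Inf, +Inf)$value
  })
}, 0, +Inf)$value
sqrt(ratio_var)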

0.04153125

1.5.3 The Parameters’ Median

The Median is a little bit tricky, so we will first compute the Parameters’ Median, to get a better sense of how to calculate it. In the next section, we will add a little bit of complexity to compute the Ratio’s Median.

The median is the point on the parameter distribution’s horizontal axis (graph 1.4) where the area under the curve is split exactly into two halves. This condition is easier to state using the Cumulative Distribution Function (CDF). For example, let beta_cdf() be the CDF of \(\beta\): \[ \textrm{beta}\_\textrm{cdf}(\beta_0) = \textrm{Pr}(\beta<\beta_0) = \int_{-\infty}^{\beta_0} \pi(\beta) \, \textrm{d}\beta \] Then \(\tilde{\beta}\), the median of \(\beta\), is the value that satisfies the condition: \[ \textrm{beta}\_\textrm{cdf}(\tilde{\beta}) = \textrm{Pr}(\beta<\tilde{\beta}) = 0.5 \] In R the beta_cdf() function is:

beta_cdf <- function(beta_limit) {
  integrate(function(beta) beta_marginal(beta), -Inf, beta_limit)$value
}

Therefore, beta_median\(=\tilde{\beta}\) must satisfy beta_cdf(beta_median) = 0.5, which means beta_median = beta_cdf_inverse(0.5). That is, the median is the inverse of the beta CDF evaluated at \(0.5\). We compute beta_cdf_inverse(p) by solving the equation beta_cdf(beta) - p = 0, where p is a given value, like \(0.5\) for the median.

beta_cdf_inverse <- function(p) {
  to_solve <- function(beta) {
    beta_cdf(beta) - p
  }
  uniroot(to_solve, c(0, 11))$root
}

Now it is easy to get the median by evaluating beta_cdf_inverse(0.5).

median_beta <- beta_cdf_inverse(0.5)

        median
beta    9.3053   
sigma  37.7209
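The sigma median in the table above is obtained the same way. A sketch of the analogous functions follows; the uniroot() search interval c(19, 79) is an assumption matching the grid used for the posterior plot:

# Marginal, CDF and inverse CDF for sigma, mirroring the beta versions
sigma_marginal <- function(sigmas) {
  sapply(sigmas, function(sigma) { # Integrate out beta for each sigma
    integrate(function(beta) {
      posterior_density(beta, sigma, ECOMPO_DATA, log = F)
    }, -Inf, +Inf)$value
  })
}
sigma_cdf <- function(sigma_limit) {
  integrate(function(sigma) sigma_marginal(sigma), 0, sigma_limit)$value
}
sigma_cdf_inverse <- function(p) {
  uniroot(function(sigma) sigma_cdf(sigma) - p, c(19, 79))$root
}
median_sigma <- sigma_cdf_inverse(0.5)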

1.5.4 The Ratio’s Median

Similar to what we did to compute the parameters’ medians, we will first compute the Ratio’s Cumulative Distribution Function (CDF), and then we will solve it to get its inverse.

The Ratio’s CDF computes \(Pr(\beta / \sigma_{\epsilon} < r)\), where \(r\) is a given value, which is the same event as \(Pr(\beta < r \cdot \sigma_{\epsilon})\). Therefore, we compute it using: \[ \textrm{ratio}\_\textrm{cdf}(r) = \textrm{Pr}(\beta / \sigma_{\epsilon} < r) = \int_0^{+\infty} \int_{-\infty}^{r \cdot \sigma_{\epsilon}} \textrm{posterior}\_\textrm{density}(\beta, \sigma_\epsilon ,\text{data}) ~\textrm{d}\beta~\textrm{d}\sigma_\epsilon \]

In the code below, to avoid precision issues, we use the limits containing more than 99% of the distribution mass, rather than from/to infinity.

beta_sigma_ratio_cdf <- function(ratio) {
  integrate(
    function(sigmas) { # Integrate over sigma
      sapply(sigmas, function(sigma) {
        integrate(function(beta) { # Integrate over beta
          posterior_density(beta, sigma, ECOMPO_DATA, log = F)
        }, 7.2, sigma * ratio)$value
      })
    }, 18, 79
  )$value
}
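# beta_sigma_ratio_cdf_inverse() is used below but was not shown above; a
# minimal sketch, solved with uniroot() just like beta_cdf_inverse(). The
# search interval c(0.01, 1) is an assumption wide enough to bracket the
# quantiles we need.
beta_sigma_ratio_cdf_inverse <- function(p) {
  uniroot(function(ratio) beta_sigma_ratio_cdf(ratio) - p, c(0.01, 1))$root
}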
c(Q50=beta_sigma_ratio_cdf_inverse(0.50))

      Q50
0.2467930

1.5.5 The Ratio’s Credible Interval

Having the inverse of the Ratio’s CDF, we can easily compute its credible interval at \(90\%\). For example, the lower \(5\%\) limit is beta_sigma_ratio_cdf_inverse(0.05).
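A sketch of that computation, evaluating the inverse CDF at the 5% and 95% quantiles:

# 90% credible interval of the ratio
c(Q5 = beta_sigma_ratio_cdf_inverse(0.05), Q95 = beta_sigma_ratio_cdf_inverse(0.95))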

       Q5       Q95 
0.1815486 0.3182257 


Figure 1.5: Probability density function of the Ratio beta/sigma. We use the Ratio’s CDF to compute this PDF.

We can use the Ratio’s CDF to compute its PDF, shown in the Probability density function of the Ratio beta/sigma graph 1.5.
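One way to do this (a sketch; the step size h is an assumption) is to differentiate the CDF numerically with a central difference:

# Ratio's PDF as the numerical derivative of its CDF
beta_sigma_ratio_pdf <- function(ratio, h = 0.001) {
  (beta_sigma_ratio_cdf(ratio + h) - beta_sigma_ratio_cdf(ratio - h)) / (2 * h)
}
beta_sigma_ratio_pdf(0.25) # density near the ratio's center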

1.5.6 Summary of the Ratio’s Statistics

We have computed the following Ratio’s Statistics.

           mean       sd  median      Q5     Q95
ratio    0.2476  0.04153  0.2467  0.1815  0.3182

🎁️ Get this Article in .pdf format

Get this article beautifully presented in .pdf format, including the solution to exercises (implicit or explicit) left to the reader.

1.6 Conclusion

We have developed the posterior distribution for the eCompo model and then used it to solve the model. We did this to make tangible what exactly the posterior distribution is, and hopefully to refresh or clarify some other concepts.

To solve the model, we computed expectations, which led us to the actual values of estimands for any function of the model parameters, like the Ratio of beta to sigma.

We have also seen how to use R for many different tasks, like integrating, finding a function’s minimum, or solving non-linear equations.

However, it should be clear that this method is not viable for problems with more than just a few parameters. Imagine a twelve-parameter problem: the mathematical acrobatics of evaluating twelve nested integrals, or a posterior-distribution grid with a dozen dimensions. The complexity and computational resources required for such cases are unmanageable.

When dealing with multilevel regressions, it is not unusual to have tens of parameters or more. For example, for the “Junior School Project,” we had more than one thousand parameters.
