In today's big data world, situations where you have a large amount of data and have to "read between the numbers" instead of "read between the lines" arise more and more often. It is a fascinating subject, but sometimes a little frustrating or challenging, because there is always something new to learn.

As I don't have a great memory, any time learning something challenges me, forcing me to do a little "research" or to peruse some books, I write an article in this section, with enough bread crumbs to help me retrace those steps when required.

Latest Data Analysis Articles

Posted: 2021-03-14 16:44:35
Last edited: 2021-05-17 17:16:45

We will estimate the efficacy of several Covid-19 vaccines (Pfizer, Moderna, AstraZeneca, and Sputnik) using the same estimation method for all of them, which lets us compare them through a single lens.

We will also evaluate the efficacy of the two Sinopharm vaccines trialed in Peru, those of the Beijing and Wuhan Institutes, according to the data leaked to the press. Of course, the validity of those results depends on how closely the leaked data matches the final, complete data of the vaccine trial.

We will use the model Pfizer employed to report its trial, and also a more robust method based on a full Bayesian model. Both models are explained in detail.
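As a rough illustration of this kind of estimate, here is a minimal sketch of a beta-binomial efficacy calculation. It assumes a 1:1 trial in which both arms have the same follow-up time, so that the share of cases falling in the vaccine arm, `theta`, determines the efficacy as VE = (1 - 2θ) / (1 - θ). The flat Beta(1, 1) prior, the function names, and the case counts are assumptions made up for this example; they are not the trial data or necessarily the prior used in the article.

```python
import random


def ve_from_theta(theta):
    """Vaccine efficacy implied by theta, the share of cases in the vaccine arm.

    Assumes both arms have the same size and follow-up time.
    """
    return (1.0 - 2.0 * theta) / (1.0 - theta)


def posterior_ve(cases_vaccine, cases_placebo, prior_a=1.0, prior_b=1.0,
                 draws=20000, seed=1):
    """Monte Carlo summary (median, 2.5%, 97.5%) of the efficacy posterior.

    theta gets a Beta(prior_a, prior_b) prior (assumed flat by default),
    so its posterior is Beta(prior_a + cases_vaccine, prior_b + cases_placebo).
    """
    rng = random.Random(seed)
    a = prior_a + cases_vaccine
    b = prior_b + cases_placebo
    samples = sorted(ve_from_theta(rng.betavariate(a, b)) for _ in range(draws))
    median = samples[draws // 2]
    lower = samples[int(0.025 * draws)]
    upper = samples[int(0.975 * draws)]
    return median, lower, upper


# Hypothetical counts, for illustration only: 10 cases among the
# vaccinated and 90 among the placebo group.
median, lower, upper = posterior_ve(10, 90)
print(f"VE median {median:.3f}, 95% interval ({lower:.3f}, {upper:.3f})")
```

Working with the share of cases rather than with the two attack rates keeps the model to a single parameter, which is what makes the same calculation easy to repeat across vaccines.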

Posted: 2021-03-14 17:59:01
Last edited: 2021-03-14 17:59:01

In 1989, Peter Mortimore published "A study of effective junior schools". There he sets out to find how much a student's background influences their ability to learn. But his main focus is to determine whether individual schools can affect students' learning.

We will look at Mortimore's study data, analyzing it as an exercise in multilevel regression using full Bayesian analysis. We will find out how much of a difference schools make.

Posted: 2021-03-13 17:16:45
Last edited: 2021-03-13 17:16:45

In data analysis, we often want to contrast how certain groups in the data affect an outcome variable. For example, the effect of the school on students' learning, of education on voters' political preferences, or of the boss on employee performance.

When this is the case, doing a separate regression for each group is one option. However, the problem then is that each group's model completely ignores the observations of the other groups. Ignoring the general pattern in the data causes overfitting when a group has few observations.

Another alternative is to ignore the groups and do a single regression for all the data. But when the focus of the analysis is to compare each group's effect, this path is not feasible.

A multilevel regression incorporates in a single model both the trend within each group and the general pattern in all the observations, enabling us to find out how much each group affects the outcome variable.
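The compromise between the two extremes can be sketched with the simplest multilevel case, a model with only a varying intercept per group. In the sketch below, each group's estimate is a precision-weighted average of its own mean and the overall mean, so groups with few observations are pulled more strongly toward the general pattern. The standard deviations `sigma_within` and `sigma_between` are taken as known, and the overall mean is just the mean of all observations; both are simplifying assumptions for illustration (a full Bayesian model estimates them jointly), and the data is made up.

```python
from statistics import mean


def partial_pooling(groups, sigma_within, sigma_between):
    """Shrink each group's mean toward the overall mean.

    groups: dict mapping a group name to its list of observations.
    sigma_within: assumed standard deviation of observations inside a group.
    sigma_between: assumed standard deviation of the true group means.
    """
    grand_mean = mean(y for ys in groups.values() for y in ys)
    precision_prior = 1.0 / sigma_between ** 2
    estimates = {}
    for name, ys in groups.items():
        # More observations give the group's own mean more weight.
        precision_data = len(ys) / sigma_within ** 2
        estimates[name] = (
            (precision_data * mean(ys) + precision_prior * grand_mean)
            / (precision_data + precision_prior)
        )
    return estimates


# Hypothetical scores: group A has eight observations, group B only one.
scores = {"A": [10, 12, 11, 13, 12, 11, 10, 12], "B": [20]}
print(partial_pooling(scores, sigma_within=2.0, sigma_between=2.0))
```

With these made-up numbers, the estimate for group A barely moves from its own mean, while the single observation in group B is pulled roughly halfway toward the overall mean.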

Posted: 2021-04-05 18:45:37
Last edited: 2021-04-05 14:59:23

The Bayesian paradigm treats the model's parameters as random variables whose different values make the model produce data that is more or less likely to come from the same distribution as the observed data.

This post develops the Bayesian inference concepts through their application in a practical example, solving a simple regression model numerically, in a fully Bayesian way. We will match the theoretical concepts with their practical implementation, helping you to grasp their meaning entirely.
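To give a flavor of what solving a regression numerically in a Bayesian way can look like, here is a minimal grid-approximation sketch. It is not the code from the post: it assumes a regression through the origin, y = b·x plus Gaussian noise with a known standard deviation `sigma`, a flat prior over the candidate slopes, and a small made-up data set. Each candidate slope is weighted by how likely it makes the observed data, and the weights are normalized into a posterior distribution.

```python
import math


def grid_posterior_slope(xs, ys, sigma, grid):
    """Posterior probability of each candidate slope on the grid.

    Model (assumed): y = b * x + Normal(0, sigma) noise, flat prior on b.
    """
    log_post = []
    for b in grid:
        # Gaussian log-likelihood, dropping constants that do not depend on b.
        sse = sum((y - b * x) ** 2 for x, y in zip(xs, ys))
        log_post.append(-0.5 * sse / sigma ** 2)
    top = max(log_post)  # subtract the maximum to avoid underflow in exp()
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
grid = [i * 0.002 for i in range(2001)]  # candidate slopes from 0 to 4

posterior = grid_posterior_slope(xs, ys, sigma=0.5, grid=grid)
posterior_mean = sum(b * p for b, p in zip(grid, posterior))
print(f"posterior mean of the slope: {posterior_mean:.4f}")
```

Under these assumptions the posterior mean should agree closely with the least-squares slope, which is a handy sanity check before moving to models with more parameters, where a grid quickly becomes impractical.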

Posted: 2020-12-31 16:16:20
Last edited: 2020-12-31 16:16:20

When I was young, I was curious to know how a computer calculates functions such as the cosine, the logarithm, or the inverse tangent, the so-called elementary transcendental functions.

It turns out that they are calculated with the so-called Chebyshev polynomial approximation, where you get each polynomial coefficient by solving an integral.

In some cases, you need a couple dozen coefficients, so there are lots of integrals to solve. It is more complicated when the function you want to approximate appears in those integrals, because you don't yet have a way to calculate that function, and that's precisely why you want the approximation!

With patience and a little luck, I managed to solve the approximation for some of these functions analytically; and here I will tell you how I did it.
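For readers who want to see the shape of the computation, here is a minimal numerical sketch. It is not the analytical approach of the article: instead of solving each coefficient integral by hand, it approximates the integral with a sum over Chebyshev nodes. It also sidesteps the circularity mentioned above by simply calling `math.cos` as the function to approximate, so it only illustrates the formula. The function names and the choice of 64 nodes are assumptions for this example.

```python
import math


def chebyshev_coefficients(f, n_coeffs, n_nodes=64):
    """Approximate the first n_coeffs Chebyshev coefficients of f on [-1, 1].

    Each coefficient c_k = (2/pi) * integral of f(x) T_k(x) / sqrt(1 - x^2)
    is replaced by a Gauss-Chebyshev sum over n_nodes points.
    """
    coeffs = []
    for k in range(n_coeffs):
        total = 0.0
        for j in range(n_nodes):
            theta = math.pi * (j + 0.5) / n_nodes
            total += f(math.cos(theta)) * math.cos(k * theta)
        coeffs.append(2.0 * total / n_nodes)
    return coeffs


def chebyshev_eval(coeffs, x):
    """Evaluate c_0 / 2 + sum of c_k * T_k(x), using T_k(x) = cos(k * acos(x))."""
    theta = math.acos(x)
    return 0.5 * coeffs[0] + sum(
        c * math.cos(k * theta) for k, c in enumerate(coeffs) if k > 0
    )


# Approximate the cosine on [-1, 1] with 12 coefficients.
coeffs = chebyshev_coefficients(math.cos, 12)
max_error = max(
    abs(chebyshev_eval(coeffs, -1 + 2 * i / 200) - math.cos(-1 + 2 * i / 200))
    for i in range(201)
)
print(f"maximum error on [-1, 1]: {max_error:.2e}")
```

For a smooth function such as the cosine the coefficients shrink very quickly, which is why a short polynomial is enough to reach high accuracy on this interval.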

Posted:
Last edited:
<p>Here we want to show the data describing the evolution and current state of the pandemic in Peru, and to provide some key indicators of how much and how fast it spreads, with detail for Peru's departments and with references to other countries to better understand our own data. As always, the best way to understand data is through charts.</p> <p>These charts and indicators will be updated, as far as possible, as soon as new data is available.</p>
Posted:
Last edited:
<p>In a previous article we introduced <em>"A Simple DSLR Camera Sensor Noise Model"</em> and then used this model to analyze the raw noise of a camera sensor in <em>"Use of Linear Regressions to Model Camera Sensor Noise"</em>. Those articles deal with the theoretical and practical basis for analyzing noise from raw image samples.</p> <p>The input to that analysis is a set of samples from photographs taken at different exposure levels, from which we compute the variance and mean of the photosite readings. Until now, we have prepared the samples and computed their variance and mean using <em>Iris Software</em>. But that workflow was very labor intensive and error prone, and it kept us from further analysis, such as studying the covariance between channels in the same sample, because <em>Iris software</em> does not provide that information. Furthermore, <em>Iris</em> is Windows-only software, which prevents users of other operating systems from collecting noise data with this tool.</p> <p>To solve those issues, we have prepared an R package called <code>imgnoiser</code> (from IMaGe NOISE R package). In this article, we introduce its usage. It automates the acquisition and analysis of noise statistics, letting us focus on the study instead of devoting a lot of time to collecting the data or preparing code to study it.</p> <p>You can install the <code>imgnoiser</code> package from github using <code>devtools::install_github('oscardelama/imgnoiser')</code>...</p>
Posted:
Last edited:
<p>This is the repository of the formulas we use throughout other articles.</p> <p>It currently contains formulas for:</p> <ul> <li><p>Statistics</p> <ul> <li>Expected Value</li> <li>Covariance</li> <li>Variance</li> <li>Standard Deviation</li> <li>Correlation Coefficient</li> </ul></li> <li><p>Analytic Relationship of two Numerically Correlated Functions</p> <ul> <li>Linear Case</li> <li>Quadratic Case</li> </ul></li> </ul>
Posted:
Last edited:
<p>Notes about Linear Regression<br /></p> <p>These are my personal notes about how to fit a linear regression and how to interpret the most useful R tests.</p>
Posted:
Last edited:
<p>In the post <em>"Use of Linear Regressions to Model Camera Sensor Noise"</em> we needed to fit two linear regressions. However, it turned out that fitting those regressions was not a trivial task: the data was <em>heteroscedastic</em> (with non-constant variance) and required some analysis, transformation, and cleaning.</p> <p>This was a chance to learn and practice some linear regression techniques using the <em>R Language</em>. This article details all the steps, considerations, and tools we used to fit a valid model to that heteroscedastic data.</p>