Jose Jorge's Threads | Buzz Chronicles

How many times have you heard that your residuals should be independent, normally distributed with mean zero, and with constant variance?

Whether your answer is zero or a million, this thread can help you

It doesn't even matter if you don't know what residuals are👇

First of all: What is a residual?

It is a synonym for error. It is the difference between the expected output and the output of our model:

Y - Ym

Where Ym is the output of our model. So residuals can be positive or negative and we need them to stay close to zero
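
Here is a minimal sketch of that subtraction in Python. The numbers and the names `y`, `y_model`, and `residuals` are made up for illustration; they are not from any real model.

```python
# Hypothetical data: expected outputs Y and model outputs Ym.
y = [3.0, 5.0, 7.0, 9.0]
y_model = [2.5, 5.5, 7.0, 9.5]


def residuals(expected, predicted):
    """Return Y - Ym for each pair of expected and predicted values."""
    return [exp - pred for exp, pred in zip(expected, predicted)]


res = residuals(y, y_model)
print(res)  # [0.5, -0.5, 0.0, -0.5]
```

Some residuals are positive, some are negative, and all of them sit close to zero, which is what we are after.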

1: Independence

We don't want the error for one input to depend on the error for another input

That would mean there is information about the relationship between inputs and outputs that our model is missing and that is present in the residuals
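
One rough way to look for dependence, assuming the inputs have a natural order (for example, time), is the lag-1 autocorrelation: how much each residual resembles the one before it. This is only one possible check, not the only one, and the helper name `lag1_autocorrelation` and the data below are invented for this sketch.

```python
import random
from statistics import mean


def lag1_autocorrelation(res):
    """Correlation between each residual and the previous one."""
    m = mean(res)
    num = sum((a - m) * (b - m) for a, b in zip(res[:-1], res[1:]))
    den = sum((r - m) ** 2 for r in res)
    return num / den


# Hypothetical residuals that drift steadily upward: the model is
# missing a trend, so neighbouring errors look alike.
trending = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]

# Simulated independent noise (fixed seed so the run is repeatable).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]

print(lag1_autocorrelation(trending))  # about 0.57, clearly positive
print(lag1_autocorrelation(noise))     # close to zero
```

A value far from zero suggests the residuals still carry structure the model has not captured.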

2: Mean equals zero

Well, this is an intuitive one, isn't it?

We have said already that we need residuals close to zero. There should be both negative and positive residuals.

But we don't want a single huge positive error and then a lot of small negative ones
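
A small sketch of why a zero mean alone does not tell the whole story. Both lists below are invented, and the helper `describe` is a hypothetical name used only here. They share the same mean, but one of them hides a single huge positive error.

```python
from statistics import mean


def describe(res):
    """Return (mean, share of positive residuals, largest absolute residual)."""
    share_positive = sum(r > 0 for r in res) / len(res)
    largest = max(abs(r) for r in res)
    return mean(res), share_positive, largest


# Hypothetical residuals: small errors on both sides of zero.
balanced = [0.5, -0.5, 0.25, -0.25, 0.5, -0.5]

# Hypothetical residuals: one huge positive error, many small negative ones.
lopsided = [8.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0]

print(describe(balanced))  # mean 0.0, half positive, largest error 0.5
print(describe(lopsided))  # mean 0.0, only 1 in 9 positive, largest error 8.0
```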

Let's continue

3: Normally distributed

The mean of the residuals is zero, but that's not enough

We need about 50% of the errors to be negative and the other half to be positive

Also, we need most of the errors to be close to zero

In other words, we need the errors to be normally distributed
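
As a rough sanity check (not a formal normality test), we can count how many residuals are negative and how many fall near zero. For a normal distribution, about 68% of values lie within one standard deviation of the mean and about 95% within two. The residuals here are simulated, and `shape_summary` is a hypothetical helper written for this sketch.

```python
import random
from statistics import mean, stdev


def shape_summary(res):
    """Summarise the sign balance and how concentrated residuals are near zero."""
    sd = stdev(res)
    n = len(res)
    return {
        "mean": mean(res),
        "share_negative": sum(r < 0 for r in res) / n,
        "share_within_1_sd": sum(abs(r) <= sd for r in res) / n,
        "share_within_2_sd": sum(abs(r) <= 2 * sd for r in res) / n,
    }


# Simulated residuals drawn from a normal distribution (fixed seed).
rng = random.Random(0)
simulated = [rng.gauss(0.0, 1.0) for _ in range(5000)]

summary = shape_summary(simulated)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

If the shares come out far from roughly 50%, 68%, and 95%, the residuals probably do not look normal, and a histogram is a good next step.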

Authors Jose Jorge