One of the most popular activation functions used in deep learning models is ReLU.
I asked: "Is ReLU continuous and differentiable?"
Surprisingly, a lot of people were confused about this.
Let's break this down step by step: ↓
Let's start by defining ReLU:
f(x) = max(0, x)
In English: if x <= 0, the function returns 0; otherwise, it returns x.
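If you prefer code, here's the same definition as a minimal Python sketch (plain Python, no frameworks assumed):

```python
def relu(x):
    # max(0, x): 0 for any x <= 0, x itself otherwise
    return max(0, x)

print(relu(-3))  # 0
print(relu(0))   # 0
print(relu(5))   # 5
```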
If you draw this function, you'll get the attached chart.
Notice there are no discontinuities in the function: both pieces (0 for x <= 0, and x for x > 0) are continuous, and they meet at x = 0.
This is enough to answer half of the original question: the ReLU function is continuous.
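If you want to reproduce a chart like the attached one yourself, here's a quick sketch (assuming numpy and matplotlib are available):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 1000)
y = np.maximum(0, x)  # ReLU applied elementwise

plt.plot(x, y)
plt.title("f(x) = max(0, x)")
plt.xlabel("x")
plt.ylabel("f(x)")
plt.show()
```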
Let's now think about the differentiable part.
A necessary condition for a function to be differentiable: it must be continuous.
ReLU is continuous. That's good, but not enough.
Its derivative must also exist at every individual point.
Here is where things get interesting.
We can compute the derivative of a function using the limit definition:

f'(x) = lim (h → 0) [f(x + h) - f(x)] / h

(I'm not going to explain where this comes from; you can trust me on this one.)
We can use this formula to see whether ReLU is differentiable.
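Here's a minimal numerical sketch of that idea (plain Python; the helper name difference_quotient is just illustrative): evaluate the expression inside the limit for small h and watch what it approaches.

```python
def relu(x):
    return max(0, x)

def difference_quotient(f, x, h):
    # (f(x + h) - f(x)) / h: the expression inside the limit
    return (f(x + h) - f(x)) / h

# Away from 0, the quotient settles on a single value:
print(difference_quotient(relu, 2.0, 1e-6))   # ~1.0
print(difference_quotient(relu, -2.0, 1e-6))  # ~0.0

# At x = 0, approaching from the two sides gives different values,
# so the limit does not exist there:
print(difference_quotient(relu, 0.0, 1e-6))   # ~1.0 (from the right)
print(difference_quotient(relu, 0.0, -1e-6))  # ~0.0 (from the left)
```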