**An amazing new project from @bearpelican was just released: https://t.co/DBov6sZTVS . A beautiful design; you can auto-generate a melody from chords, chords from a melody, and more.**

It's technically brilliant, combining BERT, seq2seq, and Transformer-XL

https://t.co/jF3mO5aXiu


#### More from Data science

Here's the code to generate the data frame. You can get the "raw" data from https://t.co/jcTE5t0uBT
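
The linked code isn't reproduced in this thread, so as a hedged stand-in, a minimal pandas sketch that builds a small data frame from inline CSV text (the column names and values here are invented, standing in for the linked raw file):

```python
import io

import pandas as pd

# Inline stand-in for the linked raw data (columns and values are hypothetical).
raw = io.StringIO(
    """tool,respondents
Python,120
R,90
SQL,75
"""
)

# Parse the CSV text into a data frame.
df = pd.read_csv(raw)
print(df.head())
```

In practice you'd point `pd.read_csv` at the actual raw-data URL or a downloaded file rather than an in-memory string.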

Obligatory stacked bar chart that hides any sense of variation in the data

Obligatory stacked bar chart that shows all the things and yet shows absolutely nothing at the same time

STACKED Donut plot. Who doesn't want a donut? Who wouldn't want a stack of them!?! This took forever to render and looked worse than it should because `coord_polar()` doesn't support `scales="free_x"`.
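
For reference, here's the chart type being criticized, in code. The tweets used ggplot2 (hence `coord_polar`), but as a rough sketch in Python/matplotlib with made-up survey counts (all names and numbers invented):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed

import matplotlib.pyplot as plt
import pandas as pd

# Made-up data; stacking piles the groups on top of each other,
# which is exactly what hides the per-group variation.
df = pd.DataFrame(
    {"Python": [40, 55, 70], "R": [30, 25, 20]},
    index=["2017", "2018", "2019"],
)

ax = df.plot(kind="bar", stacked=True)
ax.set_ylabel("respondents")
plt.tight_layout()
plt.savefig("stacked_bars.png")
```

Reading the R values off this chart requires mentally subtracting bar tops, which is the complaint: grouped bars or small multiples would show the variation directly.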

2/ In this gif, narrow ReLU networks have a high probability of initializing near the zero function (because of the ReLU) and getting stuck, which causes the function distribution to become multi-modal over time. For wide ReLU networks, however, this is not an issue.
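
This isn't the experiment from the gif, but the narrow-width failure mode is easy to illustrate as a toy: with standard normal init, the chance that every hidden ReLU unit is dead for a given input is roughly 0.5 per unit, so it vanishes as width grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def frac_all_dead(width, trials=4000):
    # Fraction of random inits where every hidden ReLU pre-activation
    # is <= 0 for the fixed input x = 1, so the net outputs exactly 0.
    x = 1.0
    dead = 0
    for _ in range(trials):
        w = rng.normal(size=width)
        b = rng.normal(size=width)
        if np.all(w * x + b <= 0.0):
            dead += 1
    return dead / trials

print(frac_all_dead(2))   # ~0.25: narrow nets often start at the zero function
print(frac_all_dead(20))  # effectively 0: wide nets essentially never do
```

Each pre-activation is a symmetric Gaussian, so the all-dead probability is 0.5^width, matching the estimates above.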

3/ This time-evolving GP depends on two kernels: the kernel describing the GP at init, and the kernel describing the linear evolution of this GP. The former is the NNGP kernel, and the latter is the Neural Tangent Kernel (NTK).
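
Concretely (my paraphrase of the standard definitions, with $f_\theta$ the network function and expectations taken over the initialization):

```latex
% NNGP kernel: covariance of the network outputs at initialization
K(x, x') = \mathbb{E}_{\theta \sim \mathrm{init}}\!\left[ f_\theta(x)\, f_\theta(x') \right]

% Neural Tangent Kernel: inner product of parameter gradients
\Theta(x, x') = \mathbb{E}_{\theta \sim \mathrm{init}}\!\left[ \left\langle \nabla_\theta f_\theta(x),\, \nabla_\theta f_\theta(x') \right\rangle \right]
```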

4/ Once we have these two kernels, we can derive the GP mean and covariance at any time t via straightforward linear algebra.
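
A minimal numeric sketch of that linear algebra for the mean (this is the standard linearized-training result for MSE loss with learning rate `lr`; the covariance has an analogous but longer closed form, and `gp_mean_at_t` and the RBF stand-in kernel are my own names):

```python
import numpy as np
from scipy.linalg import expm

def rbf(a, b, ls=1.0):
    # Stand-in kernel so the sketch is runnable; in the thread this
    # role is played by the NTK / NNGP kernels of the architecture.
    a, b = np.atleast_1d(a), np.atleast_1d(b)
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def gp_mean_at_t(x_test, x_train, y_train, t, lr=1.0, kernel=rbf):
    # mu_t(x) = Theta(x, X) Theta(X, X)^{-1} (I - exp(-lr * t * Theta(X, X))) y
    theta_XX = kernel(x_train, x_train) + 1e-8 * np.eye(len(x_train))  # jitter
    theta_xX = kernel(x_test, x_train)
    decay = np.eye(len(x_train)) - expm(-lr * t * theta_XX)
    return theta_xX @ np.linalg.solve(theta_XX, decay @ y_train)
```

At t = 0 the mean is the prior mean (zero here); as t grows the matrix exponential vanishes and the mean interpolates the training targets, matching fully trained gradient flow.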

5/ So it remains to calculate the NNGP kernel and NT kernel for any given architecture. The first is described in https://t.co/cFWfNC5ALC and in this thread