This post comes a while after the paper's publication, but the intervening time has given me an opportunity to reflect.

So this post includes some extra thoughts on more recent NLG developments, such as pretrained models like GPT and GPT-2, and on the suitability of beam search.