1 There's a chasm between an NLP technology that works well in the research lab and one that works in applications real people use. That gap was eye-opening when I started my career, and it still strikes me every time I talk to an NLP engineer at @textio.

2 Research conditions are theoretical and/or idealized. A huge problem for so-called NLP or AI startups with highly credentialed academic founders is that they bring limited knowledge of what it takes to build real products outside the lab.
3 A product is ultimately a thing that people pay for - not just cool technology or user experience. But I’m not even talking about knowledge gaps in go-to-market work. I'm talking about purely technical gaps: how you get from science project to a performant + delightful user experience.
4 Most commoditized NLP packages solve well-understood problems in standard ways that sacrifice either precision or performance. In a research lab, this is not usually a hard trade-off; in general, no one is using what you make, so performance is less important than precision.
5 In software, when you’re making something for real people to use, these tradeoffs are a big deal. Especially if you’re asking those people to pay for what you’ve made (can’t get away from that pesky GTM thinking). They expect quality, which includes precision AND performance.
6 Example: Let’s say you’re trying to do something simple and commoditized, like implement a grammar checker. (I’ll pause while someone argues with me, but I stand by it: grammar checking is a commodity offering, not a commercial one.)
7 Grammar checkers have historically been rule-based, which means someone can sit down + write a dozen, a hundred, or a thousand rules that capture the behavior you want to implement. But not all rules are created equal!
8 You can choose a small number of rules that account for the majority of grammar mistakes people make. By keeping the rule set small, you keep the system fast - it won’t take huge amounts of time to calculate errors and suggestions across an entire document.
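To make that concrete, here's a minimal sketch of what a rule-based checker can look like - the rule names and patterns below are purely illustrative, not anything any particular product actually ships. Each rule is a pattern plus a message, and checking a document means running every rule over the full text:

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    pattern: re.Pattern
    message: str

# A tiny, hypothetical rule set covering a few very common mistakes.
RULES = [
    Rule("repeated_word",
         re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE),
         "Repeated word."),
    Rule("could_of",
         re.compile(r"\b(could|should|would) of\b", re.IGNORECASE),
         "Did you mean 'have'?"),
    Rule("a_before_vowel",
         re.compile(r"\ba [aeiou]\w*", re.IGNORECASE),
         "Consider 'an' before a vowel sound."),
]

def check(text: str, rules=RULES):
    """Run every rule over the whole text; cost grows with rules x text length."""
    findings = []
    for rule in rules:
        for m in rule.pattern.finditer(text):
            findings.append((rule.name, m.start(), rule.message))
    return findings

print(check("She could of picked a apple from the the tree."))
```

With three rules, a pass over a whole document is effectively free. The rest of the thread is about what happens to coverage and latency as that list grows.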
9 But by choosing a small set of grammar rules, you end up with a long tail of mistakes that profoundly erodes user confidence in your system overall. You may catch 80% of the errors with 5% of the rules, but the 20% you miss or mishandle makes the user think your system is trash!
10 By contrast, implementing thousands of rules gets you awesome precision. But how long do all these rules make it take to grammar-check someone's real documents? You may get all the grammar right, but your app's performance erodes user confidence anyway.
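A rough, purely hypothetical back-of-envelope: if a single rule takes on the order of a millisecond to scan a long document, a couple dozen rules come back in tens of milliseconds, but a couple thousand take seconds - an eternity for a checker that re-runs while someone types. Here is a self-contained sketch of that linear growth (absolute numbers are machine-dependent; the shape is the point):

```python
import re
import time

# Hypothetical benchmark: one simple pattern stands in for each of n distinct
# grammar rules, run over a ~20,000-word document. Cost grows linearly with
# the number of rules.
text = "She could of picked a apple from the the tree. " * 2000

def make_rules(n: int):
    return [re.compile(r"\b(\w+)\s+\1\b") for _ in range(n)]

def count_findings(text: str, rules) -> int:
    return sum(len(rule.findall(text)) for rule in rules)

for n in (25, 250, 2500):
    rules = make_rules(n)
    start = time.perf_counter()
    count_findings(text, rules)
    print(f"{n:>5} rules: {time.perf_counter() - start:.3f}s")
```

Real systems claw some of this back with tricks like compiling rules into a single pass or re-checking only the sentences that changed - but that work is exactly the production engineering the lab never forces you to do.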
11 All this is for a "simple," commoditized feature… not so simple, even with rules, and even for something commoditized that everyone expects to "just work." Now let’s say you’re NOT implementing a grammar checker as a be-all, end-all, but as a component of a larger system.
12 The complexity that exists in your grammar checker exists across your whole system. On top of that, all the libraries you use (build, buy, or borrow) have to interact with each other… further slowing the system down and/or compromising the precision of one component in service of another.
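A hypothetical illustration of that compounding (the stage names and timings are invented for the sketch): a document that flows through a tokenizer, then the grammar checker, then a tone analyzer pays the sum of their latencies on every pass, so one slow or chatty component drags the whole experience down.

```python
import time
from typing import Callable, List

# Hypothetical pipeline: each stage is a function with a rough per-document
# cost; time.sleep stands in for real NLP work. End-to-end latency is the sum.
def make_stage(name: str, seconds: float) -> Callable[[str], str]:
    def stage(doc: str) -> str:
        time.sleep(seconds)   # placeholder for the real computation
        return doc            # a real stage would annotate or transform the doc
    stage.__name__ = name
    return stage

PIPELINE: List[Callable[[str], str]] = [
    make_stage("tokenize", 0.02),
    make_stage("grammar_check", 0.15),
    make_stage("tone_analysis", 0.10),
]

def analyze(doc: str) -> float:
    start = time.perf_counter()
    for stage in PIPELINE:
        doc = stage(doc)
    return time.perf_counter() - start

print(f"end-to-end: {analyze('some document text'):.2f}s per pass")
```

Speeding up any one stage only helps until the next slowest one dominates, and every format conversion between mismatched libraries adds its own overhead on top.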
13 You only encounter these issues as a production NLP engineer. They don’t come up in the research lab. Which is why it takes so long for great research to impact real products (which again, are things people pay for). And why so many researchers do not enjoy industry work.
14 Thanks to @kwhumphreys, who got me thinking down this path today and who solves these problems for us every day! 🎉
