Wellll... A few weeks back I started working on a tutorial for our lab's Code Club on how to make shitty graphs. It was too dispiriting and I balked. A twitter workshop with figures and code:

Here's the code to generate the data frame. You can get the "raw" data from https://t.co/jcTE5t0uBT
Obligatory stacked bar chart that hides any sense of variation in the data
Obligatory stacked bar chart that shows all the things and yet shows absolutely nothing at the same time
STACKED Donut plot. Who doesn't want a donut? Who wouldn't want a stack of them!?! This took forever to render and looked worse than it should because coord_polar doesn't do scales="free_x".
More donuts. Let's get rid of all that messy variation in the data
One pie for @surt_lab, one for @watermicrobe, and one waiting to explode for @a2binny
Fine. Here's a pie for those of you that are still watching... This also took forever to render. The numbers are subject IDs
In all seriousness, here's the type of plot that I encourage for showing relative abundance by taxonomic data. Not fully polished, but you get the idea. Here each diagnosis has about 160 samples. With fewer samples, I'd use geom_jitter rather than geom_histogram
I prefer the boxplot/jitter plot because it allows the viewer to directly compare what I think is important. It also shows the variation in the data. Here's a more polished version.
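For anyone following along without the figures: a minimal sketch of the data prep behind that kind of plot, in plain Python rather than the R/ggplot2 the figures were made with. This is not the code from the thread. The sample records, taxon names, diagnosis labels, and function names are all made up for illustration. The idea is to keep one relative-abundance value per sample, grouped by taxon and diagnosis, so that a boxplot or jitter plot can show the spread that a stacked bar hides.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical toy counts: (subject_id, diagnosis, {taxon: read count}).
samples = [
    ("s1", "healthy", {"Bacteroides": 60, "Prevotella": 30, "Other": 10}),
    ("s2", "healthy", {"Bacteroides": 40, "Prevotella": 40, "Other": 20}),
    ("s3", "healthy", {"Bacteroides": 50, "Prevotella": 25, "Other": 25}),
    ("s4", "case", {"Bacteroides": 20, "Prevotella": 60, "Other": 20}),
    ("s5", "case", {"Bacteroides": 10, "Prevotella": 70, "Other": 20}),
    ("s6", "case", {"Bacteroides": 30, "Prevotella": 50, "Other": 20}),
]


def relative_abundance(counts):
    """Convert one sample's raw counts into fractions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def per_sample_points(samples):
    """Collect one value per sample, keyed by (taxon, diagnosis).

    These are the points a jitter plot would draw directly.
    """
    points = defaultdict(list)
    for _subject, diagnosis, counts in samples:
        for taxon, fraction in relative_abundance(counts).items():
            points[(taxon, diagnosis)].append(fraction)
    return points


def box_summary(values):
    """Five-number summary, i.e. roughly what a boxplot draws."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}


points = per_sample_points(samples)
for (taxon, diagnosis), values in sorted(points.items()):
    print(taxon, diagnosis, box_summary(values))
```

With only a handful of samples per group, plotting the raw values in `points` is probably more honest than the summary; the summary starts to earn its keep with larger groups like the ones above.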
You can see how to do this for other taxonomic levels, incorporate statistical analysis to pick levels to show, and how to add a log scale on y-axis at https://t.co/U30ehefQPE. Thanks for attending my twitter workshop.

More from Data science

I have always emphasized the importance of mathematics in machine learning.

Here is a compilation of resources (books, videos & papers) to get you going.

(Note: It's not an exhaustive list but I have carefully curated it based on my experience and observations)

📘 Mathematics for Machine Learning

by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong

https://t.co/zSpp67kJSg

Note: this is probably the place you want to start. Start slowly and work on some examples. Pay close attention to the notation and get comfortable with it.


📘 Pattern Recognition and Machine Learning

by Christopher Bishop

Note: Prior to the book above, this is the book that I used to recommend to get familiar with math-related concepts used in machine learning. A very solid book in my view and it's heavily referenced in academia.


📘 The Elements of Statistical Learning

by Trevor Hastie, Robert Tibshirani, and Jerome H. Friedman

Note: machine learning deals with data and, in turn, uncertainty, which is what statistics teaches. Get comfortable with topics like estimators, statistical significance,...


📘 Probability Theory: The Logic of Science

by E. T. Jaynes

Note: In machine learning, we are interested in building probabilistic models and thus you will come across concepts from probability theory like conditional probability and different probability distributions.

To my JVM friends looking to explore Machine Learning techniques - you don’t necessarily have to learn Python to do that. There are libraries you can use from the comfort of your JVM environment. 🧵👇

https://t.co/EwwOzgfDca : Deep Learning framework in Java that supports the whole cycle: from data loading and preprocessing to building and tuning a variety of deep learning networks.

https://t.co/J4qMzPAZ6u Framework for defining machine learning models, including feature generation and transformations, as directed acyclic graphs (DAGs).

https://t.co/9IgKkSxPCq a machine learning library in Java that provides multi-class classification, regression, clustering, anomaly detection and multi-label classification.

https://t.co/EAqn2YngIE : TensorFlow Java API (experimental)

You May Also Like

Oh my Goodness!!!

I might have a panic attack due to excitement!!

Read this thread to the end...I just had an epiphany and my mind is blown. Actually, more than blown. More like OBLITERATED! This is the thing! This is the thing that will blow the entire thing out of the water!


Has this man been concealing his true identity?

Is this man a supposed 'dead' Seal Team Six soldier?

Witness protection to be kept safe until the right moment when all will be revealed?!

Who ELSE is alive that may have faked their death/gone into witness protection?


Were "golden tickets" inside the envelopes??


Are these "golden tickets" going to lead to their ultimate undoing?

Review crumbs on the board re: 'gold'.


#SEALTeam6 Trump re-tweeted this.

I like this heuristic, and have a few which are similar in intent to it:


Hiring efficiency:

How long does it take, measured from initial expression of interest through offer of employment signed, for a typical candidate cold inbounding to the company?

What is the *theoretical minimum* for *any* candidate?

How long does it take, as a developer newly hired at the company:

* To get a fully credentialed machine issued to you
* To get a fully functional development environment on that machine which could push code to production immediately
* To solo ship one material quantum of work

How long does it take, from first idea floated to "It's on the Internet", to create a piece of marketing collateral?

(For bonus points: break down by ambitiousness / form factor.)

How many people have to say yes to do something which is clearly worth doing, which costs $5,000 / $15,000 / $250,000, and which has never been done before?