Pat Schloss
@PatSchloss 5 years, 9 months ago 1056 views

Wellll... A few weeks back I started working on a tutorial for our lab's Code Club on how to make shitty graphs. It was too dispiriting and I balked. A twitter workshop with figures and code:

When are you doing pie charts?
— #BlackLivesMatter (@surt_lab) October 13, 2020

Here's the code to generate the data frame. You can get the "raw" data from https://t.co/jcTE5t0uBT

Obligatory stacked bar chart that hides any sense of variation in the data

Obligatory stacked bar chart that shows all the things and yet shows absolutely nothing at the same time

STACKED Donut plot. Who doesn't want a donut? Who wouldn't want a stack of them!?! This took forever to render and looked worse than it should because coord_polar doesn't do scales="free_x".

More donuts. Let's get rid of all that messy variation in the data

One pie for @surt_lab, one for @watermicrobe, and one waiting to explode for @a2binny

Fine. Here's a pie for those of you that are still watching... This also took forever to render. The numbers are subject IDs

In all seriousness, here's the type of plot that I encourage for showing relative abundance by taxonomic data. Not fully polished, but you get the idea. Here each diagnosis has about 160 samples. With fewer samples, I'd use geom_jitter rather than geom_histogram

I prefer the boxplot/jitter plot because it allows the viewer to directly compare what I think is important. It also shows the variation in the data. Here's a more polished version.

You can see how to do this for other taxonomic levels, incorporate statistical analysis to pick levels to show, and how to add a log scale on y-axis at https://t.co/U30ehefQPE. Thanks for attending my twitter workshop.

More from Data science

Data Professor
@thedataprof

Cheat sheet that summarizes #DataScience in 10 pages
(Links in the comments below 👇)

2/ Link to the cheatsheet by Maverick

Sunrit Jana 🚀
@JanaSunrise

Pandas is an amazing data analysis and manipulation library for Python. It's really popular for ML, data science, and more.

It has a robust data structure, the DataFrame, for manipulating and analyzing data.

Here are some tips to help you work better with pandas. Let's go! ↓

If you're not familiar with what a DataFrame is, it's an optimized data structure for loading data, analyzing it, manipulating it, and mostly gathering insights.

Its performance-critical parts are written in Cython, which compiles to C for optimized code.

Here's what a DataFrame looks like:

Before we start, make sure you have pandas installed. If you don't, install it before moving ahead!

Here are the tips. Let's go!

1/ Convert a pandas Series to a DataFrame

We have all struggled when dealing with pandas Series. It's often easier to work with DataFrames than with Series. Here is how you can convert a Series to a DataFrame easily.
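The code screenshot from the original tweet isn't reproduced here, so the following is only a minimal sketch of one common way to do the conversion, using `Series.to_frame()` and `Series.reset_index()`. The data and the variable names (`prices`, `prices_df`, `prices_flat`) are made up for illustration.

```python
import pandas as pd

# Hypothetical example data; names and values are illustrative only.
prices = pd.Series([1.5, 2.0, 3.25], index=["a", "b", "c"], name="price")

# to_frame() wraps the Series in a one-column DataFrame and uses the
# Series name as the column label. The index is kept as the index.
prices_df = prices.to_frame()

# reset_index() goes one step further and turns the index into a
# regular column; rename_axis() gives that column a readable name.
prices_flat = prices.rename_axis("item").reset_index()

print(prices_df)
print(prices_flat)
```

Which of the two is more convenient depends on whether the index carries information you want to keep as a column.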

2/ How to create a dummy DataFrame for testing

We often need DataFrames for testing and analysis when we don't have real data ready. Here is how you can use the pandas API to generate different types of data.
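The original tweet's code isn't preserved here, and pandas' own internal test-data helpers are not a stable public API, so this sketch builds a dummy DataFrame from NumPy's random generator instead. The helper name `make_dummy_frame` and its column names are hypothetical.

```python
import numpy as np
import pandas as pd


def make_dummy_frame(n_rows: int = 5, seed: int = 0) -> pd.DataFrame:
    """Build a small mixed-type DataFrame for testing (hypothetical helper)."""
    rng = np.random.default_rng(seed)  # seeded so the data is reproducible
    return pd.DataFrame(
        {
            "id": np.arange(n_rows),                            # integers
            "value": rng.normal(size=n_rows),                   # floats
            "group": rng.choice(["a", "b", "c"], size=n_rows),  # strings
            "date": pd.date_range("2021-01-01", periods=n_rows, freq="D"),
        }
    )


dummy = make_dummy_frame()
print(dummy.dtypes)
```

Passing the same `seed` gives the same frame every time, which keeps tests deterministic.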

Ryan J. Gallagher
@ryanjgallag

Tired of word clouds? Want to do better sentiment analysis? Not sure how to look at the words underneath your measures?

Our long overdue paper on generalized word shift graphs is finally here!
https://t.co/lIBXvbMJWX
https://t.co/vSL1REYT8V

So what are they?

1/n

If we have two texts, there are many ways we can compare them. Weighted averages are a particularly useful measure because they're flexible and interpretable

Proportions, Shannon entropy, the KLD, the JSD, and dictionary methods can all be written as weighted averages

2/n

But weighted avgs are also slippery. When we try to compress complex phenomena like happiness, surprise, divergence, or diversity into a single number, it can be unclear what we're measuring

If the measure goes up, what does that mean? Why did it do that? Can we trust it?

3/n

Very often, that's the end of the line and we're left with an uneasy feeling in the pit of our stomach that our weighted avg is actually picking up a data artifact or some other unintended peculiarity

Word shift graphs help us address those concerns

4/n

First, word shifts look under the hood of weighted averages to see what's going on

All weighted averages are a sum of contributions from individual words. We can pull out those words, and rank which ones contribute the most to the difference between two texts

5/n
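As a rough illustration of the idea in 5/n, here is a sketch of the simplest case only: a dictionary-based average score, where each word's contribution to the difference between two texts is its score times the change in its relative frequency. This is not the full generalized word shift from the paper; the function names, the tiny score dictionary, and the example texts are all made up.

```python
from collections import Counter


def word_probabilities(tokens, scores):
    """Relative frequency of each scored word (unscored words are dropped)."""
    counts = Counter(t for t in tokens if t in scores)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def average_score(tokens, scores):
    """Weighted average: sum over words of score * relative frequency."""
    probs = word_probabilities(tokens, scores)
    return sum(scores[w] * p for w, p in probs.items())


def word_contributions(tokens_a, tokens_b, scores):
    """Per-word contribution to average_score(b) - average_score(a),
    ranked by absolute size."""
    p_a = word_probabilities(tokens_a, scores)
    p_b = word_probabilities(tokens_b, scores)
    contributions = {
        w: scores[w] * (p_b.get(w, 0.0) - p_a.get(w, 0.0))
        for w in set(p_a) | set(p_b)
    }
    return sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical scores and texts, for illustration only.
scores = {"happy": 8.0, "sad": 2.0, "day": 6.0}
text_a = "sad sad day".split()
text_b = "happy happy day sad".split()

ranked = word_contributions(text_a, text_b, scores)
for word, delta in ranked:
    print(f"{word:>6}: {delta:+.3f}")
```

The contributions sum to the difference between the two averages, which is what lets a ranked list of words explain a change in a single number.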

Greg Yang
@TheGregYang

1/ An ∞-wide NN of *any architecture* is a Gaussian process (GP) at init. The NN in fact evolves linearly in function space under SGD, so is a GP at *any time* during training. https://t.co/v1b6kndqCk With Tensor Programs, we can calculate this time-evolving GP w/o training any NN

2/ In this gif, narrow relu networks have high probability of initializing near the 0 function (because of relu) and getting stuck. This causes the function distribution to become multi-modal over time. However, for wide relu networks this is not an issue.

3/ This time-evolving GP depends on two kernels: the kernel describing the GP at init, and the kernel describing the linear evolution of this GP. The former is the NNGP kernel, and the latter is the Neural Tangent Kernel (NTK).

4/ Once we have these two kernels, we can derive the GP mean and covariance at any time t via straightforward linear algebra.
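To make the "straightforward linear algebra" concrete, here is a sketch of the GP mean only. It assumes squared loss and continuous-time gradient descent (gradient flow), neither of which the thread states; under those assumptions the standard result for linearized networks is mu_t(x) = Θ(x, X) Θ(X, X)^{-1} (I − exp(−η t Θ(X, X))) y. The covariance, which also involves the NNGP kernel, is omitted. The toy kernel below is just a positive-definite stand-in, not an actual NTK, and all names are illustrative.

```python
import numpy as np


def gp_mean_at_time(theta_test_train, theta_train_train, y_train, t, lr=1.0):
    """Mean of the time-evolving GP at training time t.

    Assumes squared loss and gradient flow (assumptions, not from the thread):
    mu_t(x) = Theta(x, X) Theta(X, X)^{-1} (I - exp(-lr * t * Theta(X, X))) y
    """
    # Theta(X, X) is symmetric positive definite, so eigh applies.
    eigvals, eigvecs = np.linalg.eigh(theta_train_train)
    # Theta^{-1} (I - exp(-lr t Theta)) shares eigenvectors with Theta,
    # so it can be applied eigenvalue by eigenvalue.
    decay = (1.0 - np.exp(-lr * t * eigvals)) / eigvals
    return theta_test_train @ (eigvecs @ (decay * (eigvecs.T @ y_train)))


def toy_kernel(a, b):
    """Stand-in positive-definite kernel (NOT a real NTK)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)


x_train = np.array([0.0, 1.0, 2.0])
y_train = np.array([1.0, -1.0, 0.5])
theta_train = toy_kernel(x_train, x_train)

# Evaluate the mean at the training points themselves.
mean_start = gp_mean_at_time(theta_train, theta_train, y_train, t=0.0)
mean_late = gp_mean_at_time(theta_train, theta_train, y_train, t=1e4)
print(mean_start, mean_late)
```

At t = 0 the mean is zero, and for large t the mean at the training points approaches the training targets, which is the expected behavior of this formula.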

5/ So it remains to calculate the NNGP kernel and NT kernel for any given architecture. The first is described in https://t.co/cFWfNC5ALC and in this thread

elvis
@omarsar0

I have always emphasized the importance of mathematics in machine learning.

Here is a compilation of resources (books, videos & papers) to get you going.

(Note: It's not an exhaustive list but I have carefully curated it based on my experience and observations)

📘 Mathematics for Machine Learning

by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong

https://t.co/zSpp67kJSg

Note: this is probably the place you want to start. Start slowly and work on some examples. Pay close attention to the notation and get comfortable with it.

📘 Pattern Recognition and Machine Learning

by Christopher Bishop

Note: Prior to the book above, this is the book that I used to recommend to get familiar with math-related concepts used in machine learning. A very solid book in my view and it's heavily referenced in academia.

📘 The Elements of Statistical Learning

by Jerome H. Friedman, Robert Tibshirani, and Trevor Hastie

Note: machine learning deals with data and in turn uncertainty, which is what statistics teaches. Get comfortable with topics like estimators, statistical significance,...

📘 Probability Theory: The Logic of Science

by E. T. Jaynes

Note: In machine learning, we are interested in building probabilistic models and thus you will come across concepts from probability theory like conditional probability and different probability distributions.

You May Also Like

Max Fagin
@MaxFagin

November is here, and that means a massive shift is coming. And by "massive" I am of course referring to the redefinition of the kilogram unit of mass that the world has been building up to for more than 100 years. Let me explain:

1/ I've had an unhealthy fascination with metrology (the study of measurement) ever since my 2nd year as a physics major when I took a class devoted to duplicating historic physics experiments, so please indulge me for going into heavy detail (get it?) about the kilogram.

2/ So what actually *defines* a unit of measurement? If you're American, you probably know a mile is 5280 feet and a foot is 12 inches and an inch is 2.54 centimeters etc. But where does this chain of definitions end? Is it turtles all the way down?
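The chain of definitions in 2/ can be followed with plain arithmetic. The sketch below multiplies the conversion factors quoted in the tweet to express a mile in meters; the constant names are just for illustration, and `Fraction` is used so that 2.54 stays exact.

```python
from fractions import Fraction

# Each unit is defined in terms of the next one down the chain.
FEET_PER_MILE = 5280
INCHES_PER_FOOT = 12
CM_PER_INCH = Fraction(254, 100)  # exactly 2.54
CM_PER_METER = 100

mile_in_meters = FEET_PER_MILE * INCHES_PER_FOOT * CM_PER_INCH / CM_PER_METER
print(float(mile_in_meters))  # 1609.344
```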

3/ It's actually not! For all units (even the imperial units used in America) the answers all end with the Système International (SI) unit definitions established and maintained for over 100 years by the Bureau International des Poids et Mesures (BIPM) in France.

4/ At the base of this tower are the SI base units. Just 7 SI base units define every other unit in existence. They are:

Kilogram, kg (mass)
Meter, m (distance)
Second, s (time)
Kelvin, K (temp)
Ampere, A (electric current)
Candela, cd (luminous intensity)
Mole, mol (quantity)

Kitze @ 🇩🇪️...
@thekitze

To people who are under the impression that you can get rich quickly by working on an app, here are the stats for https://t.co/az8F12pf02

📈 ~12000 visits
☑️ 109 transactions
💰 353€ profit (285 after tax)

I have spent 1.5 months on this app. You can make more $ in 2 days.

🤷‍♂️
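For a sense of scale, the figures above can be turned into a few ratios. This is back-of-the-envelope arithmetic only: the visit count is approximate in the original, and treating 1.5 months as 45 days is an assumption made here.

```python
visits = 12_000             # "~12000" in the tweet, so approximate
transactions = 109
profit_eur = 353
profit_after_tax_eur = 285
days_worked = 45            # assumption: 1.5 months taken as roughly 45 days

conversion_rate = transactions / visits
profit_per_sale = profit_eur / transactions
net_per_day = profit_after_tax_eur / days_worked

print(f"conversion rate: {conversion_rate:.2%}")
print(f"profit per sale: {profit_per_sale:.2f} EUR")
print(f"net per day of work: {net_per_day:.2f} EUR")
```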

I'm still happy that I launched a paid app bcs it involved extra work:

- backend for processing payments (+ permissions, webhooks, etc)
- integration with payment processor
- UI for license activation in Electron
- machine activation limit
- autoupdates
- mailgun emails

etc.

These things seemed super scary at first. I always thought it was way too much work and something would break. But I'm glad I persisted. So far the only problem I have is that mailgun is not delivering the license keys to certain domains like https://t.co/6Bqn0FUYXo etc. 👌

omg I just realized that me . com is an Apple domain, of course something wouldn't work with these dicks

ArmaniTalks 🎙🔥...
@ArmaniTalks

Your Most PRECIOUS Currencies:

-Energy

-Time

-Attention

-Trust

-Love

-Loyalty

-Respect

Only give these currencies to high value people. Never spend these currencies on low value shitheads eg: trolls, weasels, snakes, naysayers etc.

'Wait, I thought money was the most important currency?'

Nah.

Once you adopt an abundance mentality, you realize there is no shortage of money.

Making the dollars your most important currency will have you leading an empty life.

Time to flip your perspective.

👇

▪Energy

When you strip yourself to the core, you are an emotional creature.

Emotions are your internal world's energy.

Harnessing that energy is crucial for leveraging yourself to obtain whatever you want.

You have a finite amount every day, so spend it wisely.

▪Time

A second that is lost will never be returned.

You start valuing the hell out of this currency the more you mature.

As the years start adding up, you realize time is precious.

You must always have a scarcity mindset towards time.

Once you do so, you will not be lazy.

▪Attention

You can be here, but not present.

Attention is completely mental.

Giving someone your attention means you are clearing up mental bandwidth to make room for them.

Only give your attention to people who help you grow.

For the negative ones?

Ignore their existence

Emperor👑
@EmperorBTC

Technical Analysis Masterclass- Part 2

'Entering the trade'

Entering a trade without ascertaining certain things is gambling.

In this masterclass we will learn the pre-requisites to enter a trade.

DON'T ENTER A TRADE WITHOUT DETERMINING THE FOLLOWING.

Please share.

We understand what reward to risk (popularly called risk to reward) is.
It will be denoted by R:R.

We will also try to bust a few myths about R:R and how to avoid losing trades.

Before entering a trade, you need to determine 3 things.

1. Entry trigger
2. Stop loss
3. Target

1. Entry trigger = Reasons for entering a trade. There could be multiple reasons or a single reason for entry.

Generally, a set of reasons, AKA confluence, makes for a higher-probability trade and generally a safer entry.

Example of an entry trigger.

2. Stop Loss.

The price in the opposite direction of the trade where the trade is exited, at a loss.

At this level, the reason for the entry becomes invalidated according to TA and the price can then move in the opposite direction, probabilistically.

3. Target is the possible price level that the asset might touch based on previous trends or confluence AND where a possible reversal could occur.

Target is the next path of least resistance from where the price might reverse.

We will always ONLY use TA to determine all 3.
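The thread names R:R but the excerpt does not spell out a formula. A common reading, assumed here, is reward = distance from entry to target and risk = distance from entry to stop loss. The function name and the prices below are hypothetical.

```python
def reward_to_risk(entry: float, stop_loss: float, target: float) -> float:
    """Reward-to-risk ratio for a planned trade.

    Assumed definition (not stated in the thread):
    reward = |target - entry|, risk = |entry - stop_loss|.
    """
    # The stop loss and the target must sit on opposite sides of the entry,
    # which also rules out a zero-width risk or reward.
    if (target - entry) * (entry - stop_loss) <= 0:
        raise ValueError("stop loss and target must be on opposite sides of entry")
    return abs(target - entry) / abs(entry - stop_loss)


# Hypothetical long trade: buy at 100, stop at 95, target at 115.
long_rr = reward_to_risk(entry=100.0, stop_loss=95.0, target=115.0)

# Hypothetical short trade: sell at 100, stop at 104, target at 92.
short_rr = reward_to_risk(entry=100.0, stop_loss=104.0, target=92.0)

print(long_rr, short_rr)  # 3.0 2.0
```

The same function covers long and short trades because only the distances from the entry matter, once the side check has passed.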