Here's how I got started with my first machine learning project, and how you can too.
Let's take a look.

(This thread will take you from zero to hero in machine learning, trust me)

(1 / 22)
πŸ§΅πŸ‘‡

Getting started with your first machine learning project might actually be much easier than it seems. If I can do it, anyone can.

I did not use:
- Any Math
- An expensive computer
- Complex programming concepts

(2 / 22)
Here's what I did use:

- A free GPU on Google Colab
- Python
- TensorFlow
- Numpy
- Pandas
- Kaggle
- Scikit-Learn
- Google
- Stack Overflow

(3 / 22)
This project is actually a Kaggle challenge based on the MNIST dataset, a collection of 70,000 images of handwritten digits.

You can find the dataset hereπŸ‘‡
πŸ”—//kaggle.com/c/digit-recognizer

(4 / 22)
Before we go over the code for this project, it is highly recommended that you complete this free course on YouTubeπŸ‘‡

Machine Learning foundations course
πŸ”—//youtu.be/_Z9TRANg4c0

CodeπŸ‘‡
πŸ”—//colab.research.google.com/github/PrasoonPratham/Kaggle/blob/main/MNIST.ipynb

(5 / 22)
Now let's look at the code.

We'll first download the dataset for this project using the Kaggle API for Python.
Keep in mind that you'll have to provide an API key for this code to work.
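In the notebook this is a single Colab cell; a minimal sketch (assuming your kaggle.json API key has been uploaded to the Colab session) looks something like this:

```python
# Sketch of a Colab cell: put the uploaded kaggle.json where the Kaggle CLI expects it,
# then download the competition files. Assumes kaggle.json is in the working directory.
import os

os.makedirs("/root/.kaggle", exist_ok=True)
os.replace("kaggle.json", "/root/.kaggle/kaggle.json")
os.chmod("/root/.kaggle/kaggle.json", 0o600)  # keep the key readable only by you (the CLI warns otherwise)

!kaggle competitions download -c digit-recognizer
```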

(6 / 22)
There are some issues with the names of the files, so we'll rename them and then unzip them using Python's zipfile module.
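Something along these lines, assuming the archive downloaded as digit-recognizer.zip (adjust the names to whatever you actually got):

```python
# Sketch: rename the downloaded archive and extract it with Python's zipfile module.
import os
import zipfile

os.rename("digit-recognizer.zip", "mnist.zip")  # hypothetical rename, use your actual file name

with zipfile.ZipFile("mnist.zip", "r") as archive:
    archive.extractall(".")  # produces train.csv, test.csv and sample_submission.csv
```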

(7 / 22)
We'll end up with 3 files. We can discard sample_submission.csv as we won't need it; test.csv and train.csv are what we are interested in.

train.csv will be used for training our neural network, and test.csv will be used for making predictions.
(8 / 22)
The predictions will be sent to Kaggle.

Using pandas we can load both of them as dataframes, which basically converts the .csv file data (Excel-like data) into Python arrays so that we can feed them into our neural network.
We'll also import TensorFlow and NumPy while we're here.
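The loading step is just a few lines, roughly:

```python
# Sketch: imports plus loading both CSVs as pandas DataFrames.
import numpy as np
import pandas as pd
import tensorflow as tf

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
```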

(9 / 22)
Let's look at the data; train.csv is what we're interested in.

The training dataset (train.csv) has 785 columns. The first column, called "label", is the digit that was drawn by the user. The rest of the columns contain the pixel values of the associated image.
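You can check this yourself with a quick peek at the DataFrame we loaded above:

```python
print(train.shape)    # number of images x 785 columns
print(train.columns)  # 'label' followed by the 784 pixel columns
train.head()
```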

(10 / 22)
Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a single value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker.

(11 / 22)
This pixel value is an integer between 0 and 255, inclusive. We will pass the pixel values into our neural net and exclude the label; we don't want it to know what number is in the image! It'll have to learn that on its own.

(12 / 22)
This code "drops" the label column and stores it in the Y_train variable, we will also divide each pixel value by 255 to make it a value between 0-1 as neural networks perform better with these values and now we "reshape" the values which go into our neural net.

(13 / 22)
Remember how I said we'll only use train.csv for training our neural net? We'll split train.csv into 2 parts: one for actual training and the other for validating how well our neural net did at the end of each iteration. This helps improve the accuracy of our model.

(14 / 22)
Note that here I have chosen 10% of our dataset for validation and the rest for training; you can experiment with the split yourself if you wish.
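Since scikit-learn is already in our toolbox, a sketch of the split might look like:

```python
# Sketch: hold out 10% of the training data for validation.
from sklearn.model_selection import train_test_split

X_train, X_val, Y_train, Y_val = train_test_split(
    X_train, Y_train, test_size=0.1, random_state=42
)
```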

(15 / 22)
Here comes the fun part: we'll now define our neural network. The images will pass through its layers and our model will be trained. We'll be using things called "convolutions" and "pooling".
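The full model is in the linked notebook; a minimal sketch of a network with convolution and pooling layers could look like this:

```python
# Sketch: a small convolutional network for 28x28 grayscale digit images.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # one probability per digit 0-9
])

# Labels are plain integers (0-9), so sparse categorical cross-entropy fits here.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```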

(16 / 22)
A convolution, in simple terms, is like applying a filter (like you do on Instagram) to a photo; this brings out certain details in the images and helps improve our neural network's accuracy.

(17 / 22)
Pooling does a similar thing by taking the most prominent pixel in an area and throwing out the others.

I found this amazing thread on convolutions by @aumbark which you should definitely check outπŸ‘‡

https://t.co/CnsQvMpHZE

(18 / 22)
Now we simply pass our data through our neural network 15 times (aka 15 epochs) and validate it after each pass using the validation data we made earlier.
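In code that's a single fit call, roughly:

```python
# Sketch: train for 15 epochs, checking against the validation set after each one.
history = model.fit(
    X_train, Y_train,
    epochs=15,
    validation_data=(X_val, Y_val),
)
```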

(19 / 22)
You'll notice that we get certain metrics in the output; at the end you should see a loss and accuracy similar to the ones in the photo.

(20 / 22)
Congrats! πŸ₯³ You've trained the neural network. Now we can make predictions on the test data and store them in a CSV file which we'll submit to Kaggle. (You can use the file icon on the left to browse through the files.)
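A sketch of that final step (the competition's sample_submission uses ImageId and Label columns):

```python
# Sketch: predict on the test images and write a submission file for Kaggle.
predictions = model.predict(X_test)
predicted_labels = np.argmax(predictions, axis=1)  # pick the most likely digit for each image

submission = pd.DataFrame({
    "ImageId": np.arange(1, len(predicted_labels) + 1),
    "Label": predicted_labels,
})
submission.to_csv("submission.csv", index=False)
```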

(21 / 22)
You've learnt a lot by this point. Be proud of yourself and keep diving deeper into machine learning. Good luck! πŸ™Œ

(22 / 22 πŸŽ‰)
