11 key concepts of Machine Learning.

- Supervised Learning Edition -

🧵👇

😜

Before starting, remember that, if you follow me, one of your enemies will be immediately destroyed (and you'll get to read more of these threads, of course.)

And if you don't follow me, well, you just hurt my feelings.

😜
1. Labels

(Also referred to as "y")

The label is the piece of information that we are predicting.

For example:

- the animal that's shown in a picture
- the price of a house
- whether a message is spam or not

👇
2. Features

(Also referred to as "x")

These are the input variables to our problem. We use these features to predict the "label."

For example:

- pixels of a picture
- number of bedrooms of a house
- square footage of a house

👇
3. Samples

(Also referred to as "examples")

A sample is a particular instance of data: a set of features ("x") that may or may not come with a label. That makes it either "labeled" or "unlabeled."

👇
4. Labeled sample

Labeled samples are used to train and validate the model. These are usually represented as (x, y), where "x" is a vector containing all the features, and "y" is the corresponding label.

For example, a labeled sample could be:

([3, 2, 1500], 350000)
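
In code, that sample could be sketched as a plain Python tuple. The feature order (bedrooms, bathrooms, square footage) is an assumption borrowed from the house examples in this thread; the sample itself doesn't say which number is which.

```python
# A labeled sample is a pair (x, y).
# Assumption: x holds [bedrooms, bathrooms, square footage].
x = [3, 2, 1500]   # features
y = 350000         # label: the price of the house
labeled_sample = (x, y)

# Unpack the pair back into features and label.
features, label = labeled_sample
print(features)  # [3, 2, 1500]
print(label)     # 350000
```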
5. Unlabeled sample

Unlabeled samples contain features, but they don't contain the label: (x, ?)

We usually use a model to predict the labels of unlabeled samples.
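
One way to sketch the difference in code is to use None as the missing label. This is just a convention picked for the example (the name is_labeled is hypothetical too), and the feature values are made up.

```python
from typing import List, Optional, Tuple

# A sample is (x, y), where y is None when we don't know the label yet.
Sample = Tuple[List[float], Optional[float]]

def is_labeled(sample: Sample) -> bool:
    """Return True when the sample carries a label."""
    _, y = sample
    return y is not None

labeled = ([3, 2, 1500], 350000)   # (x, y)
unlabeled = ([4, 3, 2100], None)   # (x, ?)

print(is_labeled(labeled))    # True
print(is_labeled(unlabeled))  # False
```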

👇
6. Model

A model defines the relationship between features and the label.

You can think of a model as a set of rules that, given certain features, determines the corresponding label.

For example, given the # of bedrooms, bathrooms, and square footage, we get the price.
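
Here is a minimal sketch of that "set of rules" idea as a Python function. The function name and the dollar amounts are invented for illustration; a real model would learn its numbers from data instead of having them typed in by hand.

```python
def predict_price(bedrooms: int, bathrooms: int, square_feet: int) -> int:
    """A toy model: hand-written rules that map features to a price.

    The weights below are made up for this example.
    """
    return 20_000 * bedrooms + 10_000 * bathrooms + 180 * square_feet

price = predict_price(3, 2, 1500)
print(price)  # 350000
```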

👇
7. Training

Training is a process that builds a model.

We show the model labeled samples during training and allow the model to gradually learn the relationships between features and the label.
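
To make "gradually learn" concrete, here is a small sketch that uses gradient descent, which is one common way to train a model (not the only one). It assumes a single feature (house size in thousands of square feet), a label in thousands of dollars, and a model of the form price = w * size + b. The data, learning rate, and number of passes are all made up.

```python
def train(samples, learning_rate=0.05, epochs=5000):
    """Fit price = w * size + b by nudging w and b to reduce the squared error."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in samples:
            error = (w * x + b) - y        # how far off the current model is
            grad_w += 2 * error * x / n    # gradient of the mean squared error w.r.t. w
            grad_b += 2 * error / n        # gradient of the mean squared error w.r.t. b
        # Take a small step in the direction that lowers the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Made-up labeled samples: (size in 1000s of sq ft, price in $1000s).
training_samples = [(1.0, 250.0), (1.5, 350.0), (2.0, 450.0), (2.5, 550.0)]

w, b = train(training_samples)
print(round(w, 2), round(b, 2))  # should land close to 200 and 50 for this data
```

The made-up data follows price = 200 * size + 50 exactly, so the learned w and b can be checked against those values.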

👇
8. Validation

Validation is the process that lets us know whether a model is any good.

Usually, we take a set of labeled samples that the model didn't see during training, run them through the model, and compare its predictions against the true labels.
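
A rough sketch of that check: score a model on held-out labeled samples using the mean absolute error. The metric is a choice made for this example (there are many others), and both the stand-in model and the validation data are made up.

```python
def model(x):
    """Stand-in for a trained model. The rule is invented for this example."""
    bedrooms, bathrooms, square_feet = x
    return 20_000 * bedrooms + 10_000 * bathrooms + 180 * square_feet

def mean_absolute_error(model, samples):
    """Average distance between the predictions and the true labels."""
    total = 0
    for x, y in samples:
        total += abs(model(x) - y)
    return total / len(samples)

# Made-up labeled samples that were not used to build the model.
validation_samples = [
    ([3, 2, 1500], 350_000),
    ([2, 1, 900], 220_000),
    ([4, 3, 2200], 500_000),
]

mae = mean_absolute_error(model, validation_samples)
print(round(mae, 2))  # 4666.67
```

A lower error means the predictions sit closer to the real labels. Whether a given number is good enough depends on the problem.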

👇
9. Inference

Inference is the process of applying a trained model to unlabeled samples to obtain the corresponding labels.

In other words, "inference" is the process of making predictions using a model.
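
As a sketch, inference is just calling the model on features that have no label attached. The model below is a made-up stand-in for something that has already been trained, and the houses are invented.

```python
def model(x):
    """Stand-in for a trained model. The rule is invented for this example."""
    bedrooms, bathrooms, square_feet = x
    return 20_000 * bedrooms + 10_000 * bathrooms + 180 * square_feet

# Unlabeled samples: features only, no price.
unlabeled_samples = [
    [2, 1, 900],
    [4, 3, 2200],
]

# Inference: one prediction per unlabeled sample.
predictions = [model(x) for x in unlabeled_samples]
print(predictions)  # [212000, 506000]
```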

👇
10. Regression

A regression model predicts continuous values, for example:

- the value of a house
- the price of a stock
- tomorrow's temperature

👇
11. Classification

A classification model predicts discrete values, for example:

- whether a picture shows a dog or a cat
- whether a message is spam or not
- whether the forecast is sunny or overcast
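
To see what "discrete" means in code, here is a toy spam classifier. It is hand-written instead of trained, and the word list and threshold are invented, but it shows the key point: the output is always one of a fixed set of labels, never a number on a continuous scale.

```python
SPAM_WORDS = {"free", "winner", "prize"}  # made-up list for this example

def classify_message(message: str) -> str:
    """Return one of two discrete labels: "spam" or "not spam"."""
    words = message.lower().split()
    # Count words that match the list, ignoring trailing punctuation.
    hits = sum(1 for word in words if word.strip(".,!?") in SPAM_WORDS)
    return "spam" if hits >= 2 else "not spam"

print(classify_message("You are a WINNER! Claim your free prize"))  # spam
print(classify_message("Lunch at noon tomorrow?"))                  # not spam
```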
