Author: Vladimir Haltakov
Imagine we want to detect all pixels belonging to a traffic light in images from a self-driving car's camera. We train a model that achieves 99.88% accuracy. Pretty cool, right?
Actually, this model is useless ❌
Let me explain 👇

The problem is that the data is severely imbalanced: the ratio between background pixels and traffic light pixels is 800:1.
If we don't take any measures, our model will learn to classify every pixel as background, giving us 99.88% accuracy. But it's useless!
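To see where that 99.88% comes from, here is a quick back-of-the-envelope check in Python (the pixel counts are hypothetical, chosen to match the 800:1 ratio):

```python
# Hypothetical pixel counts matching the 800:1 imbalance
background = 800_000
traffic_light = 1_000

# A degenerate "model" that predicts background for every pixel
# is correct on all background pixels and wrong on every
# traffic light pixel.
correct = background
total = background + traffic_light
accuracy = correct / total
print(f"{accuracy:.2%}")  # 99.88%
```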
What can we do? 👇
Let me tell you about 4 ways of dealing with imbalanced data:
▪️ Choosing the right evaluation metric
▪️ Undersampling your dataset
▪️ Oversampling your dataset
▪️ Adapting the loss
Let's dive in 👇
1️⃣ Evaluation metrics
Looking at the overall accuracy is a very bad idea when dealing with imbalanced data. There are other measures that are much better suited:
▪️ Precision
▪️ Recall
▪️ F1 score
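As a quick illustration of why these metrics expose the problem, here is a minimal pure-Python sketch (the counts are hypothetical) that scores the "always background" model:

```python
# Hypothetical confusion matrix counts for the "always background" model:
tp = 0        # traffic light pixels correctly detected
fp = 0        # background pixels wrongly flagged as traffic light
fn = 1_000    # traffic light pixels missed
tn = 800_000  # background pixels correctly ignored

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Accuracy looks great, but precision, recall and F1 are all 0.
print(f"accuracy={accuracy:.2%}, precision={precision:.2f}, "
      f"recall={recall:.2f}, F1={f1:.2f}")
```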
I wrote a whole thread on this topic:
How to evaluate your ML model? 📏
— Vladimir Haltakov (@haltakov) August 31, 2021
Your accuracy is 97%, so this is pretty good, right? Right? No! ❌
Just looking at the model accuracy is not enough. Let me tell you about some other metrics:
▪️ Recall
▪️ Precision
▪️ F1 score
▪️ Confusion matrix
Let's start 👇
2️⃣ Undersampling
The idea is to throw away samples of the overrepresented classes.
One way to do this is to randomly throw away samples. Ideally, though, we only want to throw away samples that are similar to other samples we keep, so that little information is lost.
Here is a strategy to achieve that 👇
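The similarity-aware variant aside, the simplest baseline is plain random undersampling. A minimal sketch (the function name and ratio are illustrative, not a specific library API):

```python
import random

def undersample(samples, labels, majority_label, keep_ratio, seed=0):
    """Keep all minority samples; keep each majority sample with probability keep_ratio."""
    rng = random.Random(seed)
    kept = [(s, y) for s, y in zip(samples, labels)
            if y != majority_label or rng.random() < keep_ratio]
    return [s for s, _ in kept], [y for _, y in kept]

# 800:1 imbalance: 800 background samples, 1 traffic light sample
X = list(range(801))
y = ["background"] * 800 + ["traffic_light"]

# Keep roughly 1 in 800 background samples to balance the classes
X_bal, y_bal = undersample(X, y, "background", keep_ratio=1 / 800)
```

In practice you would undersample only the training split, never the evaluation data, so your metrics still reflect the real class distribution.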

Check out this thread for short reviews of some interesting Machine Learning and Computer Vision papers. I explain the basic ideas and main takeaways of each paper in a Twitter thread.
I'm adding new reviews all the time!
AlexNet - the paper that started the deep learning revolution in Computer Vision!
It's finally time for some paper review! 📜🔍🧐
— Vladimir Haltakov (@haltakov) September 28, 2020
I promised the other day to start posting threads with summaries of papers that had a big impact on the field of ML and CV.
Here is the first one - the AlexNet paper!
DenseNet - reducing the size and complexity of CNNs by adding dense connections between layers.
ML paper review time - DenseNet! 🕸️
— Vladimir Haltakov (@haltakov) October 15, 2020
This paper won the Best Paper Award at the 2017 Conference on Computer Vision and Pattern Recognition (CVPR), the top conference for computer vision.
It introduces a new CNN architecture where the layers are densely connected.
Playing for data - generating synthetic ground truth from a video game (GTA V) and using it to improve semantic segmentation models.
Time for another ML paper review - generating synthetic ground truth data from video games! 🎮
— Vladimir Haltakov (@haltakov) October 5, 2020
I love this paper because it pushes the boundaries of creating realistic synthetic ground truth data and shows that you can use it for training to improve your model.
Details 👇
Transformers for image recognition - a new paper with the potential to replace convolutions with a transformer.
Another paper review, but a little different this time... 🤷‍♂️
— Vladimir Haltakov (@haltakov) October 5, 2020
The paper is not published yet, but is submitted for review at ICLR 2021. It is getting a lot of attention from the CV/ML community, though, and many speculate that it is the end of CNNs... 👇 https://t.co/bh6wUxYfxu