
10 machine learning questions you should be able to answer.

I started experimenting with a questionnaire before interviewing candidates. Applicants submit a video or written answers to apply for the position.

Thread: Let's look at 10 of these questions.

The questionnaire solves a couple of problems:

1. It weeds out candidates who aren't ready for the job.

2. It significantly cuts down the time we spend talking synchronously.

One problem: It requires extra time from the candidate before the phone interview.

Here are some numbers on a job posting before using the questionnaire:

• 249 applicants
• 13 good applications

That's only 5%.

This sucks.

Numbers after using the questionnaire:

• 44 applicants
• 21 good applications
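
A quick way to check both ratios, as a minimal sketch using only the four counts quoted in this thread (the helper name `good_rate` is mine, not something from the hiring process):

```python
def good_rate(good: int, total: int) -> float:
    """Share of applications that were judged good."""
    return good / total

# Counts quoted above: (good applications, total applicants)
before = good_rate(13, 249)   # job posting without the questionnaire
after = good_rate(21, 44)     # job posting with the questionnaire

print(f"before: {before:.1%}")  # before: 5.2%
print(f"after:  {after:.1%}")   # after:  47.7%
```

Note that the absolute number of good applications also went up (13 to 21) even though far fewer people applied.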

That's about 48%.

This is much better.

I look at it this way:

Sooner or later, I'll ask these questions, and the candidate will have to answer them.

I'd rather give them space and the opportunity to think about their answers without any pressure.

Those who can't or don't want to answer won't apply. That's fine.

I'm not worried about candidates searching for answers or asking someone else.

They will fool the questionnaire, not the interview that will come later.

Remember:

• The goal of the questionnaire and the resume is just to decide who moves to an interview.

Questions must be open-ended. No gotchas.

I want to allow candidates to think creatively and elaborate on their answers.

Preferred: A video response with the answers. Fallback: Written answers if the candidate is not comfortable with video.

There are 10 questions on the questionnaire.

These cover basic principles that we want to ensure candidates understand.

Every candidate receives the same questions.

Let's go over them:

1. Provide a couple of examples where you used Supervised and Unsupervised machine learning techniques.

2. Walk us through the process you follow to select the best machine learning algorithm to solve a problem.

3. Explain the trade-off between bias and variance.

4. How do you decide how you should split your dataset when working on a supervised machine learning problem?

5. How do you handle imbalanced datasets in a classification problem?

6. Explain how you would approach reducing overfitting on a model.

7. Explain three different problems where using accuracy, precision, and recall is the best metric, respectively.

8. What would you expect to see as you vary the batch size when training a neural network?

9. What would you expect to see as you vary the learning rate when training a neural network?

10. What are the advantages of a Convolutional Neural Network over a fully connected network for image classification?

I post threads like this every week. You can find them here: @svpino.

Follow along for a good bunch of practical tips and epic stories about my experience with machine learning.

More from Santiago

You gotta think about this one carefully!

Imagine you go to the doctor and get tested for a rare disease (only 1 in 10,000 people get it.)

The test is 99% effective in detecting both sick and healthy people.

Your test comes back positive.

Are you really sick? Explain below 👇

The most complete answer from every reply so far is from Dr. Lena. Thanks for taking the time and going through


You can get the answer using Bayes' theorem, but let's try to come up with it in a different —maybe more intuitive— way.

👇


Here is what we know:

- Out of 10,000 people, 1 is sick
- Out of 100 sick people, 99 test positive
- Out of 100 healthy people, 99 test negative

Assuming 1 million people take the test (including you):

- 100 of them are sick
- 999,900 of them are healthy

👇

Let's now test both groups, starting with the 100 sick people:

▫️ 99 of them will be diagnosed (correctly) as sick (99%)

▫️ 1 of them is going to be diagnosed (incorrectly) as healthy (1%)
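
The same counting can be carried through for the healthy group. Here is a minimal sketch that uses only the numbers stated above (1 in 10,000 sick, 99% correct on both groups, 1 million people tested); the variable names are mine:

```python
from fractions import Fraction

population = 1_000_000
prevalence = Fraction(1, 10_000)   # 1 in 10,000 people are sick
accuracy = Fraction(99, 100)       # test is right 99% of the time for both groups

sick = population * prevalence     # 100 people
healthy = population - sick        # 999,900 people

# People who receive a positive result, split by whether it is correct
true_positives = sick * accuracy            # sick and diagnosed as sick: 99
false_positives = healthy * (1 - accuracy)  # healthy but diagnosed as sick: 9,999

# Of everyone who tested positive, what share is actually sick?
p_sick_given_positive = true_positives / (true_positives + false_positives)

print(p_sick_given_positive)                  # 1/102
print(f"{float(p_sick_given_positive):.2%}")  # 0.98%
```

So a positive result here means roughly a 1% chance of being sick: the 9,999 healthy people who test positive far outnumber the 99 sick people who do. This is the same number Bayes' theorem gives.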

👇

10 machine learning YouTube videos.

On libraries, algorithms, and tools.

(If you want to start with machine learning, having a comprehensive set of hands-on tutorials you can always refer to is fundamental.)

🧵👇

1⃣ Notebooks are a fantastic way to code, experiment, and communicate your results.

Take a look at @CoreyMSchafer's fantastic 30-minute tutorial on Jupyter Notebooks.

https://t.co/HqE9yt8TkB


2⃣ The Pandas library is the gold standard for manipulating structured data.

Check out @joejamesusa's "Pandas Tutorial. Intro to DataFrames."

https://t.co/aOLh0dcGF5


3⃣ Data visualization is key for anyone practicing machine learning.

Check out @blondiebytes's "Learn Matplotlib in 6 minutes" tutorial.

https://t.co/QxjsODI1HB


4⃣ Another trendy data visualization library is Seaborn.

@NewThinkTank put together "Seaborn Tutorial 2020," which I highly recommend.

https://t.co/eAU5NBucbm

More from Ds

1/

Get a cup of coffee.

In this thread, I'll walk you through 2 probability concepts: Standard Deviation (SD) and Mean Absolute Deviation (MAD).

This will give you insight into Fat Tails -- which are super useful in investing and in many other fields.


2/

Recently, I watched 2 probability "mini-lectures" on YouTube by Nassim Taleb.

One ~10 min lecture covered SD and MAD. The other ~6 min lecture covered Fat Tails.

In these ~16 mins, @nntaleb shared so many useful nuggets that I had to write this thread to unpack them.

3/

For those curious, here are the YouTube links to the lectures:

SD and MAD (~10 min):
https://t.co/0TwubymdE6

Fat Tails (~6 min):

4/

The first thing to understand is the concept of a Random Variable.

In essence, a Random Variable is a number that depends on a random event.

For example, when we roll a die, we get a Random Variable -- a number from the set {1, 2, 3, 4, 5, 6}.

5/

Every Random Variable has a Probability Distribution.

This tells us all the possible values the Random Variable can take, and their respective probabilities.

For example, when we roll a fair die, we get a Random Variable with this Probability Distribution:
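
For a fair die, each of the six faces has probability 1/6. As a minimal sketch (variable names are mine, and MAD is taken here as the mean absolute deviation around the mean), the distribution and the two measures this thread is about can be computed like this:

```python
from fractions import Fraction
from math import sqrt

# Probability Distribution of a fair die: every face is equally likely
distribution = {face: Fraction(1, 6) for face in range(1, 7)}
assert sum(distribution.values()) == 1  # probabilities must add up to 1

# Mean (expected value) of the Random Variable
mean = sum(x * p for x, p in distribution.items())               # 7/2

# Standard Deviation: square root of the average squared deviation
variance = sum(p * (x - mean) ** 2 for x, p in distribution.items())  # 35/12
sd = sqrt(variance)

# Mean Absolute Deviation: average absolute deviation from the mean
mad = sum(p * abs(x - mean) for x, p in distribution.items())    # 3/2

print(f"mean = {float(mean)}")  # mean = 3.5
print(f"SD   = {sd:.4f}")       # SD   = 1.7078
print(f"MAD  = {float(mad)}")   # MAD  = 1.5
```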
