Kumar_praveen96 Authors Simone Scardapane

*Reproducible Deep Learning*

The first two exercises are out!

We start quick and easy, with some simple manipulation of Git branches, scripting, audio classification, and configuration with @Hydra_Framework.

Small thread with all information 🙃 /n


Reproducibility is usually associated with production environments and MLOps, but today it is also a major concern in the research community.

My biased introduction to the issue is here:
https://t.co/PqWH6uL5eT


The local setup is on the repository: https://t.co/9mhtZoJhE9

The use case for the course is a small audio classification model trained on event detection with the awesome @PyTorchLightnin library.

Feel free to check the notebook if you are unfamiliar with the task. /n

I spent some time understanding how to make the course as modular and "reproducible" as possible.

My solution is to split each exercise into a separate Git branch containing all the instructions, and a separate branch with the solution.

Two branches for now (Git and Hydra). /n


How well do you *really* know Git? The more I learn, the more I find it incredible.

I summarized most of the information on a separate set of slides: https://t.co/6dSmK3IfWB

Be sure to check them out before continuing! /n

*Reproducible deep learning*

Lectures 3 and 4 are out!

With code versioning out of the way, it is time to look at data versioning (@DVCorg) and environment isolation (@Docker).

All information in a small thread. 👇 /n


If you know Git, you (almost) know @DVCorg!

A fantastic tool to secure your data in a number of remotes, or to create "data repositories" from which to immediately get folders and artifacts.

My intro to DVC:
https://t.co/2m3cXGAPN6

/n
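
For intuition on why DVC feels so close to Git, here is a minimal sketch of content-addressed storage in plain Python. It is a simplified illustration of the general idea (large files stored under their checksum, with only the small checksum tracked alongside the code), not DVC's actual implementation. The function names and the cache layout are assumptions made up for this example.

```python
import hashlib
import shutil
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def add_to_cache(path: Path, cache_dir: Path) -> str:
    """Copy a file into the cache under its checksum and return the checksum.

    The returned string plays the role of a small pointer that could be
    committed to Git in place of the data itself (hypothetical layout).
    """
    checksum = file_md5(path)
    target = cache_dir / checksum[:2] / checksum[2:]
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical content is stored only once
        shutil.copy2(path, target)
    return checksum


def checkout(checksum: str, cache_dir: Path, destination: Path) -> None:
    """Restore a file from the cache given only its checksum."""
    shutil.copy2(cache_dir / checksum[:2] / checksum[2:], destination)
```

Calling `add_to_cache(Path("data.wav"), Path(".cache"))` on some file returns a checksum, and `checkout` with that checksum recreates the file anywhere, which is roughly the role a remote plays when the cache lives on another machine.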


For the course, I created a simple exercise tasking you with initializing DVC on the repository, and syncing the data locally and remotely.

To simulate an S3-like interface, we use a small https://t.co/91bFj7KSPG server and boto3.

Code: https://t.co/KDSX80aqJs

/n
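
As a rough sketch of what talking to an S3-like server with boto3 can look like: the key point is passing a custom `endpoint_url` so the client targets a local server instead of AWS. The endpoint, credentials, bucket, and helper names below are hypothetical placeholders, and this is not necessarily how the exercise itself uses boto3.

```python
from typing import Dict


def s3_client_kwargs(endpoint_url: str, access_key: str, secret_key: str) -> Dict[str, str]:
    """Build the keyword arguments for a boto3 S3 client aimed at a custom endpoint."""
    return {
        "service_name": "s3",
        "endpoint_url": endpoint_url,  # e.g. a local S3-compatible server
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


def upload_file(local_path: str, bucket: str, key: str, client_kwargs: Dict[str, str]) -> None:
    """Upload one local file to an existing bucket on the S3-compatible server."""
    import boto3  # third-party dependency, imported lazily

    client = boto3.client(**client_kwargs)
    client.upload_file(local_path, bucket, key)


if __name__ == "__main__":
    # Hypothetical local endpoint and credentials; replace with your own setup.
    kwargs = s3_client_kwargs("http://localhost:9000", "my-access-key", "my-secret-key")
    upload_file("data/sample.wav", "my-bucket", "sample.wav", kwargs)
```

The upload assumes the bucket already exists and the server is running; both are outside the scope of this sketch.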


Next up, it is time to "dockerize" your environment!

Docker has become almost a de facto standard, and knowing it is practically indispensable today.

A very quick introduction, glossing over a number of details: https://t.co/XSrUZNhd3g

/n


In the corresponding exercise, you will learn about creating a working environment in Docker, packaging the entire training loop, and pushing/pulling an image from the Hub.

Code is here: