Image by Sharon McCutcheon from Pexels

The dream of democratising ML has never seemed closer! A tutorial on how to get labelled data for your data-hungry models.

Maciej Zalwert


Note: Snorkel offers labelling functions and models that can easily be replaced by a few lines of Python code. The main advantage of using Snorkel is its optimisation and its simple, flexible interface.
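To make the note above concrete, here is a minimal sketch of what "a few lines of Python" could look like. It does not use Snorkel's API: the spam/ham task, the three labelling functions and the `majority_vote` helper are all hypothetical and exist only for illustration. Each labelling function votes for a label or abstains, and the votes are combined by a simple majority, which is much cruder than a learned label model.

```python
from collections import Counter

# Hypothetical label values for an illustrative spam/ham task.
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_contains_link(text):
    # Heuristic: messages with a link are probably spam.
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text):
    # Heuristic: messages that mention a prize are probably spam.
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_message(text):
    # Heuristic: very short messages are probably genuine.
    return HAM if len(text.split()) <= 3 else ABSTAIN


LABELLING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_message]


def majority_vote(text):
    """Combine the labelling functions' votes; abstain on no votes or a tie."""
    votes = [lf(text) for lf in LABELLING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # tie between labels
    return ranked[0][0]


texts = [
    "Claim your prize at http://example.com now",
    "See you soon",
    "Thanks for the detailed report yesterday",
]
labels = [majority_vote(t) for t in texts]
```

Examples on which every function abstains, or on which the votes tie, stay unlabelled in this sketch. Weighing noisy, conflicting heuristics more carefully than a plain vote is the kind of work a dedicated library is meant to take over.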

To build well-tuned and reasonably generalisable models, we need tons of labelled data, which is very expensive and time-consuming to produce. As a result, for many years only large corporations, governments and research institutes could get enough data to train ML models. Fortunately, in some domains public datasets exist that have contributed to groundbreaking advances in ML, but they are usually useful only for academic research and basic business applications.

These times are over.

In this tutorial I will show how you can create labelled data for your supervised model with little effort.

In 2016, a project called Snorkel started at Stanford. Its authors made a simple technical bet: that it would increasingly be the training data, not the models, algorithms, or infrastructure, that decided whether a machine learning project succeeded or failed.

Currently, the project is a complete success and widely used by many…


Written by Maciej Zalwert

Experienced in building data-intensive solutions for diverse industries