Data labelling for machine learning brings its own set of challenges and misconceptions (see our previous blog in the series). We needed a better approach to labelling data, one that values human expertise and manages costs: a machine teaching approach. In this blog, we discuss collaborative processes and tools that can enhance the machine teaching role, with a focus on detecting bias and building trust in machine learning models.

A collaborative process from the start

As a data collection and annotation team, we realized that early identification of bias in datasets was central to machine teaching. Detecting bias is difficult, and bias itself can be hard to quantify, yet…

The language often used to describe building datasets for supervised or semi-supervised machine learning can be reductionist: data is simply “labelled” (i.e., a feature input is matched to a label output). We argue instead that labelling data for machine learning involves much more than annotating and assigning labels. Reframing the language around data labelling matters. The “small” vocabulary change from labelling to teaching goes a long way towards changing systems that currently diminish the contribution and visibility of the humans who teach machine learning models.

As introduced in…

For the past two years, I was part of the team responsible for building, expanding and improving datasets for supervised or semi-supervised learning at Element AI. Supervised and semi-supervised learning require labelled examples to train models, since these techniques first learn from labelled data, then make predictions on unlabelled data. Labelled data is data in which humans have assigned a label to each data point, so that a feature input is matched to a label output. We use supervised or semi-supervised techniques when we already know (some or all of) the target values that we want a model to…
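To make the "learn from labelled data, then predict on unlabelled data" pattern concrete, here is a minimal sketch in plain Python. It uses a nearest-centroid classifier purely as a simple stand-in for a supervised model; it is not the approach used by the team, and the feature values, label names and function names are all hypothetical.

```python
from collections import defaultdict


def fit_centroids(features, labels):
    """Learn one centroid (mean feature vector) per label from labelled examples."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


# Hypothetical labelled data: each feature input is matched to a label output.
labelled_features = [(1.0, 1.2), (1.4, 0.8), (5.0, 4.6), (5.4, 5.2)]
labelled_targets = ["small", "small", "large", "large"]

# Step 1: learn from the labelled examples.
centroids = fit_centroids(labelled_features, labelled_targets)

# Step 2: predict labels for data points that nobody has annotated.
unlabelled = [(1.1, 1.0), (4.9, 5.1)]
predictions = [predict(centroids, x) for x in unlabelled]
print(predictions)
```

Everything the model knows comes from the human-assigned labels in `labelled_targets`, which is why the quality of those labels, and any bias in them, carries straight through to the predictions.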

Ross Young

Plant enthusiast and data strategist

