Speechmatics: How to Make the Most of Data Surplus


Have you been wondering what to do with all your data? For most of the history of machine learning, data has been a precious commodity.

By necessity, the field has spent much of its time developing techniques that make the best possible use of small amounts of data. Lately, however, large amounts of data have become increasingly available.

On a global scale, some commentators describe data as following its own version of Moore's law, with the amount of raw data doubling roughly every two years. This is great, right? With the explosion in the use of deep learning, which is even more data-hungry than traditional machine learning methods, more data will help us learn better, more nuanced models! Well, yes, but only up to a point.