Our interview with data scientist Christian, a journey of discovery through the world of Artificial Intelligence and Machine Learning, has now reached its final episode! In the previous posts of this series, we discussed two different approaches to teaching a device how to carry out a task.
Exactly! On the one hand, we have seen how the more general field of Artificial Intelligence can teach a machine to “make decisions” on the basis of rules that were hard-coded into it in advance, so that the machine has no room for any initiative of its own and “just” has to follow mechanically what it was instructed to do. In this scenario, the most prominent example is Deep Blue, the computer that, in 1997, defeated the then World Chess Champion.
On the other hand, we have discovered how a device can undergo an actual process of learning during which, starting from some inputs (images, for instance), the machine gradually recognizes and learns the “rules” by itself without being explicitly told what they are. This approach is what is called Machine Learning (ML) and it is nowadays the standard for tasks like computer vision.
At the end of the last post you stressed that, although human intervention plays a smaller role in Machine Learning, its development actually took longer than that of the brute-force approach behind Deep Blue. Why? Which issues slowed ML down, given that, compared to the brute-force way, programmers have less to code because, apparently, a larger part of the job is left to the machine itself?
To answer this question, let’s go back to the metaphor of the previous post, where we described the software architecture as a dashboard and the parameters as a series of knobs that are adjusted during the training process. Before any machine can find the optimal setting of the knobs on the dashboard it is given, someone first has to engineer and build the dashboard itself. Leaving the metaphor aside, one of the reasons why the development of image processing took longer is simply that it took researchers years of work to find the mathematical framework (the architecture, as we called it before) that could properly handle pictures. Only afterwards was it possible to tell the computer: “Ok, all the knobs are finally there and their overall arrangement on the dashboard is the right one: now, go ahead and turn each knob appropriately so that, together, they allow you to carry out the classification task in the optimal way”. For the record, when it comes to image processing, model parameters are usually organized into a Convolutional Neural Network (CNN). Other application domains of Machine Learning (including, among many others, speech recognition, financial market analysis, medical diagnosis and telecommunications) usually call for different architectures, each still equipped with parameters whose optimal setting is determined by the same learning logic described above.
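To make the knob metaphor concrete, here is a minimal sketch in Python. It is a toy, not how a real image model is trained: the "dashboard" is fixed in advance as the straight line y = w*x + b, and the machine only has to turn the two knobs w and b. The data points, the learning rate and the function name `train` are all assumptions made up for this illustration, and the knob-turning rule used here is plain gradient descent on the mean squared error.

```python
# Toy "dashboard": the architecture is fixed as y = w * x + b.
# The two knobs are w and b. Data, learning rate and step count
# are illustrative assumptions, not values from the interview.

def train(xs, ys, lr=0.05, steps=2000):
    """Turn the knobs w and b step by step to reduce the mean squared error."""
    w, b = 0.0, 0.0  # initial knob positions
    n = len(xs)
    for _ in range(steps):
        # How the error changes when each knob is nudged (the gradient).
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Turn each knob a little in the direction that lowers the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated by the hidden rule y = 2x + 1
w, b = train(xs, ys)
print(f"learned knobs: w = {w:.3f}, b = {b:.3f}")
```

The machine is never told the rule y = 2x + 1; it only sees the examples, yet the knobs end up very close to 2 and 1. A CNN follows the same logic, but with a far more elaborate dashboard and vastly more knobs.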
You mentioned Convolutional Neural Networks. Can you give us some more details?
Coming back to why image processing took longer than Deep Blue playing chess, it is worth stressing that the parameters within a CNN typically number in the millions and that their interactions are highly complex. To carry out the training process, it is essential to have a very large amount of input data available (pictures in digital form, in the present case). Only the massive digitization of recent years has made the necessary heaps of data available to machines; previously, there was not enough data or, in any case, it was not easily readable by computers (think of analog images, for example).
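To give a feeling for how quickly the knobs add up, the sketch below counts the parameters of a small, entirely hypothetical CNN. The layer sizes are assumptions chosen for illustration and do not describe any specific published network. The counting rules themselves are standard: a convolutional layer has one weight per kernel cell and input channel, plus one bias, for each output channel, and a fully connected layer has one weight per input, plus one bias, for each output.

```python
# Counting the "knobs" of a small, hypothetical CNN.
# All layer sizes below are illustrative assumptions.

def conv_params(kernel_size, channels_in, channels_out):
    """Weights plus one bias per output channel of a square convolution."""
    return (kernel_size * kernel_size * channels_in + 1) * channels_out

def dense_params(inputs, outputs):
    """Weights plus one bias per output of a fully connected layer."""
    return (inputs + 1) * outputs

layers = {
    "conv 3x3, 3 -> 32 channels": conv_params(3, 3, 32),
    "conv 3x3, 32 -> 64 channels": conv_params(3, 32, 64),
    "dense, 8*8*64 -> 256": dense_params(8 * 8 * 64, 256),
    "dense, 256 -> 10 classes": dense_params(256, 10),
}

total = sum(layers.values())
for name, count in layers.items():
    print(f"{name}: {count:,}")
print(f"total parameters: {total:,}")
```

Even this four-layer toy ends up with just over one million parameters, most of them in the first fully connected layer, which is why training needs so many example pictures.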
I would imagine that the development of computer vision was initially also slowed down by some technological limitations. Can you say a few words about this?
You are right! In fact, besides producing plenty of data in digital form, one also needs facilities to store it (i.e., lots of computer memory and storage space) and tools to retrieve it rapidly from those facilities (otherwise the training would be too time-consuming). The need for computer memory is made even greater by the extremely large number of parameters whose values have to be updated and eventually stored for later use. Moreover, for some rather technical reasons, the updating process itself is a poor fit for the architecture of a Central Processing Unit (CPU), which until some years ago was the most widely used processor within the scientific community. Overcoming all these technical problems has become possible only thanks to recent technological advances in both computer and software architecture. For example, CNNs are now typically run on Graphics Processing Units (GPUs) rather than CPUs: GPUs were initially developed for graphics applications (video games, most notably), but it later turned out that their technical features are also very well suited to the parameter updates within CNNs.
Thanks, Christian. This interview has given us a clear picture of a topic that is not easy to understand, as it is often described in very technical terms. Thanks for the practical examples and metaphors you have given us; it’s all clearer now!