It has already been three years (wow, time really flies!) since U-Hopper started offering consulting services focused on data science and machine learning (ML). I have had the chance to work on various projects in different vertical domains, and I have tried to collect a few insights, key ones in my opinion, that should always be considered when tackling such a topic.
Obviously, machine learning is not trivial but, if approached and handled with the necessary precautions, it allows us to create value out of data (which is, by the way, the core mission of U-Hopper) while simplifying and automating processes in various domains. A couple of examples taken from my own experience? Estimating the number of people in a particular, well-defined environment and providing predictive maintenance have been, for sure, two of the most challenging and captivating use cases.
However, machine learning is not for everyone! There are situations in which it cannot help. Not because it is the wrong solution to use, but just because the right conditions for making it work properly are missing. I can now see a question mark over your heads, asking “So, when should machine learning be applied?” The answer is simple and should never be forgotten:
Only when there is enough meaningful data available.
But let’s try to be clearer.
The training of a machine learning model requires access to a large enough amount of data – and unfortunately, this data is not always easy to get! When available, such data means that the rules necessary for identifying a particular situation do not need to be defined and set by engineers or, more generally, by domain specialists. Instead, it is sufficient for data scientists to correctly set all the parameters of that extremely complicated mathematical formula that – luckily – no one even needs to know (I don’t want to dive into the technicalities here, but if you are interested in discovering how ML works, you cannot miss an article written by our data scientist Christian!).
Yes, the data science work is definitely complex. But I am getting more and more convinced that the most difficult step when organising and carrying out a project is actually making sure that prospects and clients truly understand how important data availability is. And, let’s not take this for granted, that data should describe in detail the use case we are trying to cover.
An example I often propose is the following: if we are trying to take advantage of machine learning for predicting possible malfunctions in a production plant, having a lot of data collected while the process is running smoothly is not going to help much. The model will definitely need data describing the malfunctioning cases – how else is it going to be able to identify and predict such cases?
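To make the point concrete, here is a minimal sketch of the kind of sanity check one could run before starting such a project. The label names (`"normal"`, `"failure"`), the function name, and the threshold of 50 failure examples are purely illustrative assumptions, not figures from a real project; the right threshold depends on the use case.

```python
from collections import Counter


def has_enough_failure_examples(labels, min_failures=50):
    """Return True if the labelled history contains at least
    `min_failures` examples of the malfunctioning case.

    `min_failures` is an arbitrary illustrative threshold.
    """
    counts = Counter(labels)
    return counts.get("failure", 0) >= min_failures


# Hypothetical plant history: 10,000 readings, only 3 of them
# recorded while the plant was actually malfunctioning.
labels = ["normal"] * 9997 + ["failure"] * 3
print(has_enough_failure_examples(labels))
```

With a history like the one above, the check fails even though the dataset looks large: almost all of it describes the plant running smoothly.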
This is the reason why there are situations in which machine learning could – and probably should – be used and others that definitely require the application of an alternative solution.
To make your life easier, here’s when a machine learning solution should NOT be applied:
In all such cases, the accuracy offered by the trained model is not going to be acceptable. Actually, the proposed result is probably going to resemble a random choice – definitely not a good solution. The smartest choice would be to apply an alternative to machine learning right away; it is going to save not only time but also, and especially, money. Software engineers and domain experts should easily be able to come up with everything needed to set up a rule engine that is going to work well enough.
On the contrary, if there is enough data describing in detail the use case we want to cover, we are in the perfect position to apply machine learning. Some patience is going to be required – we may also call it hammering – but once ready, the solution is surely going to automate and greatly simplify everyday work.
Because yes, the objective of machine learning is to provide tools able to predict and anticipate issues while supporting decision making. So if you have the chance, give it a try, while keeping in mind the key importance that data is going to have for the end result.
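For readers wondering what such a rule engine might look like, below is a very small sketch under stated assumptions. The sensor names (`temperature_c`, `vibration_mm_s`) and the thresholds are hypothetical; in a real project they would come from the domain experts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str
    condition: Callable[[Dict[str, float]], bool]


# Hypothetical rules; names and thresholds are illustrative only.
RULES = [
    Rule("overheating", lambda r: r["temperature_c"] > 90.0),
    Rule("excess vibration", lambda r: r["vibration_mm_s"] > 7.0),
]


def triggered_rules(reading: Dict[str, float]) -> List[str]:
    """Return the names of all rules whose condition holds for a reading."""
    return [rule.name for rule in RULES if rule.condition(reading)]


print(triggered_rules({"temperature_c": 95.0, "vibration_mm_s": 2.0}))
```

No training data is needed here: the knowledge lives in the rules themselves, which is exactly why this approach fits the cases where data is scarce.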