How do machine learning algorithms differ from traditional algorithms?
In this ask the expert, ParallelM CTO Nisha Talagala lays out the similarities and differences between traditional software engineering and machine learning.
How is machine learning like and unlike software engineering? It's a question that seems to be growing in popularity these days.
Perhaps that's because the bones of machine learning algorithms and traditional algorithms are the same -- they're both code. That's one of the points Nisha Talagala made when we posed the question to her. Talagala is the CTO and vice president of engineering at ParallelM Inc., a software startup that builds enterprise software for operationalizing machine learning and data science. Prior to joining ParallelM, she was a fellow at SanDisk, a flash memory product manufacturer; a fellow at Fusion-io, a flash memory technology company; and a technology lead for server flash at Intel. She holds more than four dozen patents.
In this ask the expert, which has been edited for brevity and clarity, Talagala describes where machine learning and software engineering overlap. She also explains how machine learning algorithms are pushing beyond the constraints of software engineering and posing new challenges for the enterprise.
From your vantage point as a software development expert, what do you see as the key similarities and differences between machine learning algorithms and traditional algorithms?
Nisha Talagala: At the most basic level, machine learning programs are code. So, they're code written in Python or Java or some programming language. And many people have chosen to put their code in source control repositories like Git. So, at that level, it is similar.
Additionally, they share some stages of code development. For example, in typical code development, there's a development stage, then a QA-like stage, then some preproduction staging and then production.
We are seeing customers apply similar staging to machine learning as well, where you have some experimental development, followed by some tests, followed by early stage sandbox-like trials, followed by full-scale production.
For those kinds of things, at a structural level, they are similar. But then I think the similarities end about there. Everything else is quite different. I'll give some examples.
One of the fundamental differences is that machine learning can have a range of outcomes that are all valid but cannot necessarily be determined upfront. For example, if you ask a transactional system for a bank record, there is only one correct answer to that question. The system either gets the answer right or gets it wrong. But if you ask a machine learning system to predict what you should buy, there are multiple answers to that question -- some of them are reasonable and some of them are not.
A machine learning algorithm's behavior is determined by what it learned during its training cycle and how that compares to what it's seeing in real life -- in production. That characteristic is very different from most common algorithms, and it requires that companies be able to assess model performance in ways that are unique to machine learning algorithms.
For example, it's possible for a machine learning program running in production to display no errors at all and look completely healthy by any system metric, such as CPU utilization, but still be turning out bad predictions. That's a very basic example of how machine learning is different and how those differences manifest in production.
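To make the "healthy but wrong" case concrete, here is a minimal sketch of one way to watch a model's outputs rather than its CPU or error logs. It compares the distribution of scores seen in production against a training-time baseline using the population stability index (PSI). The function name, the bin count and the 0.25 alert threshold are illustrative assumptions, not something described in the interview; 0.25 is only a commonly quoted rule of thumb.

```python
import math
from typing import Sequence


def population_stability_index(
    baseline: Sequence[float], live: Sequence[float], bins: int = 10
) -> float:
    """Return the PSI between a baseline sample and a live sample.

    Bins are equal-width over the baseline's range; live values that fall
    outside that range are counted in the first or last bin.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    p = proportions(baseline)
    q = proportions(live)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical data: scores recorded at training time vs. in production.
training_scores = [i / 100 for i in range(100)]
production_scores = [0.9 + i / 1000 for i in range(100)]

ALERT_THRESHOLD = 0.25  # assumed rule-of-thumb cutoff, tune per model
drifted = population_stability_index(training_scores, production_scores) > ALERT_THRESHOLD
```

A check like this can raise an alert even when every infrastructure dashboard is green, which is the gap the example above describes. It says nothing about whether individual predictions are correct; it only flags that the model is now seeing or producing something different from what it was trained on.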
A second fundamental challenge that's unique to machine learning algorithms is that the code is itself the expression of an algorithm or a mathematical function, and frequently the only people who understand it and its behavior are the data scientists.
So, in order to put machine learning into production and manage it effectively, you need collaboration between data science, data engineering and operations. Examples of this exist in other fields. In database administration, for example, [database administrators] exist to combine the expertise of databases with the expertise of production. Similarly, you have practical issues in machine learning where you need to combine the expertise of data scientists with the expertise of operations.
Another challenge is that because machine learning algorithms are increasingly starting to impact the lives of everyday individuals -- on everything from healthcare recommendations to showing you what to buy to deciding your credit score -- people increasingly want to understand how the algorithms behave. And how an algorithm behaves is something that, in production, you have to be able to control, because it's a complex function of what the algorithm learned, what data set it used, what decisions the data scientists made to tune it and so on.
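One way to keep track of the factors listed above is to store a small lineage record next to every model that goes to production. The sketch below is an illustration under assumptions: the `ModelLineage` class, its field names and the sample values are hypothetical and do not come from the interview or from any particular product.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying the exact training data used."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class ModelLineage:
    """Hypothetical record of what shaped a deployed model's behavior."""

    model_name: str
    code_commit: str          # source control revision of the training code
    dataset_sha256: str       # fingerprint of the training data set
    hyperparameters: dict = field(default_factory=dict)  # tuning decisions

    def to_json(self) -> str:
        # Sorted keys make the record stable and easy to diff across releases.
        return json.dumps(asdict(self), sort_keys=True)


# Illustrative values only.
training_data = b"customer_id,amount\n1,20.5\n2,13.0\n"
record = ModelLineage(
    model_name="purchase-recommender",
    code_commit="abc1234",
    dataset_sha256=fingerprint(training_data),
    hyperparameters={"learning_rate": 0.01, "max_depth": 6},
)
serialized = record.to_json()
```

Keeping a record like this with each deployed model gives operations and compliance teams a starting point when someone asks why a model behaved the way it did.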
What we're seeing emerge is a series of compliance and regulatory challenges around these algorithms that are unique to machine learning, and that businesses putting them into production have to be aware of so that they can manage their machine learning optimally and safeguard against risk.