Deep learning's role in the evolution of machine learning
Machine learning has continued to evolve since its beginnings some seven decades ago. Learn how deep learning has catalyzed a new phase in the evolution of machine learning.
Machine learning had a rich history long before deep learning reached fever pitch. Researchers and vendors were using machine learning algorithms to develop a variety of models for improving statistics, recognizing speech, predicting risk and other applications.
While many of the machine learning algorithms developed over the decades are still in use today, deep learning -- a form of machine learning based on multilayered neural networks -- catalyzed a renewed interest in AI and inspired the development of better tools, processes and infrastructure for all types of machine learning.
Here, we trace the significance of deep learning in the evolution of machine learning, as interpreted by people active in the field today.
The birth of machine learning
The story of machine learning starts in 1943 when neurophysiologist Warren McCulloch and mathematician Walter Pitts introduced a mathematical model of a neural network. The field gathered steam in 1956 at a summer conference on the campus of Dartmouth College. There, 10 researchers came together for six weeks to lay the ground for a new field that involved neural networks, automata theory and symbolic reasoning.
The distinguished group, many of whom would go on to make seminal contributions to this new field, gave it the name artificial intelligence to distinguish it from Cybernetics, a competing area of research focused on control systems. In some ways these two fields are now starting to converge with the growth of IoT, but that is a topic for another day.
Early neural networks were not particularly useful -- nor deep. Perceptrons, the single-layered neural networks in use then, could only learn linearly separable patterns. Interest in them waned after Marvin Minsky and Seymour Papert published the book Perceptrons in 1969, highlighting the limitations of existing neural network algorithms and causing the emphasis in AI research to shift.
"There was a massive focus on symbolic systems through the '70s, perhaps because of the idea that perceptrons were limited in what they could learn," said Sanmay Das, associate professor of computer science and engineering at Washington University in St. Louis and chair of the Association for Computing Machinery's special interest group on AI.
The 1973 publication of Pattern Classification and Scene Analysis by Richard Duda and Peter Hart introduced other types of machine learning algorithms, reinforcing the shift away from neural nets. A decade later, Machine Learning: An Artificial Intelligence Approach by Ryszard S. Michalski, Jaime G. Carbonell and Tom M. Mitchell further defined machine learning as a domain driven largely by the symbolic approach.
"That catalyzed a whole field of more symbolic approaches to [machine learning] that helped frame the field. This led to many Ph.D. theses, new journals in machine learning, a new academic conference, and even helped to create new laboratories like the NASA Ames AI Research branch, where I was deputy chief in the 1990s," said Monte Zweben, CEO of Splice Machine, a scale-out SQL platform.
In the 1990s, the evolution of machine learning made a turn. Driven by the rise of the internet and increase in the availability of usable data, the field began to shift from a knowledge-driven approach to a data-driven approach, paving the way for the machine learning models that we see today.
The beginnings of deep learning
The turn toward data-driven machine learning in the 1990s was built on research done by Geoffrey Hinton at the University of Toronto in the mid-1980s. Hinton and his team demonstrated the ability to use backpropagation to build deeper neural networks.
"This was a major breakthrough enabling new kinds of pattern recognition that were previously not feasible with neural nets," Zweben said. This added new layers to the networks and a way to strengthen or weaken connections back across many layers in the network, leading to the term deep learning.
Although possible in a lab setting, deep learning did not immediately find its way into practical applications, and progress stalled.
"Through the '90s and '00s, a joke used to be that 'neural networks are the second-best learning algorithm for any problem,'" Washington University's Das said.
Meanwhile, commercial interest in AI was starting to wane because the hype around developing an AI on par with human intelligence had gotten ahead of results, leading to an AI winter, which lasted through the 1980s. What did gain momentum was a type of machine learning using kernel methods and decision trees that enabled practical commercial applications.
Still, the field of deep learning was not completely in retreat. In addition to the ascendancy of the internet and increase in available data, another factor proved to be an accelerant for neural nets, according to Zweben: namely, distributed computing.
Democratization and the rise of deep learning
Machine learning requires a lot of compute. In the early days, researchers had to keep their problems small or gain access to expensive supercomputers, Zweben said. The democratization of distributed computing in the early 2000s enabled researchers to run calculations across clusters of relatively low-cost commodity computers.
"Now, it is relatively cheap and easy to experiment with hundreds of models to find the best combination of data features, parameters and algorithms," Zweben said. The industry is pushing this democratization even further with practices and associated tools for machine learning operations that bring DevOps principles to machine learning deployment, he added.
Machine learning is also only as good as the data it is trained on, and if data sets are small, it is harder for the models to infer patterns. As the data created by mobile, social media, IoT and digital customer interactions grew, it provided the training material deep learning techniques needed to mature.
By 2012, deep learning attained star status after Hinton's team won ImageNet, a popular data science challenge, for their work on classifying images using neural networks. Things really accelerated after Google subsequently demonstrated an approach to scaling up deep learning across clusters of distributed computers.
"The last decade has been the decade of neural networks, largely because of the confluence of the data and computational power necessary for good training and the adaptation of algorithms and architectures necessary to make things work," Das said.
5 ways deep learning is changing the field of machine learning
Even when deep neural networks are not used directly, they indirectly drove -- and continue to drive -- fundamental changes in the field of machine learning, including the following:
Framing a problem
Deep learning's predictive power has inspired data scientists to think about different ways of framing problems that come up in other types of machine learning.
"There are many problems that we didn't think of as prediction problems that people have reformulated as prediction problems -- language, vision, etc. -- and many of the gains in those tasks have been possible because of this reformulation," said Nicholas Mattei, assistant professor of computer science at Tulane University and vice chair of the Association for Computing Machinery's special interest group on AI.
In language processing, for example, a lot of the focus has moved toward predicting what comes next in the text. In computer vision as well, many problems have been reformulated so that, instead of trying to understand geometry, the algorithms are predicting labels of different parts of an image.
Automation of human insight
The power of big data and deep learning is changing how models are built. Human analysis and insights are being replaced by raw compute power.
"Now, it seems that a lot of the time we have substituted big databases, lots of GPUs, and lots and lots of machine time to replace the deep problem introspection needed to craft features for more classic machine learning methods, such as SVM [support vector machine] and Bayes," Mattei said, referring to the Bayesian networks used for modeling the probabilities between observations and outcomes.
The art of crafting a machine learning problem has been taken over by advanced algorithms and the millions of hours of CPU time baked into pretrained models so data scientists can focus on other projects or spend more time on customizing models.
More efficient use of data
Deep learning is also helping data scientists solve problems with smaller data sets and to solve problems in cases where the data has not been labeled.
"One of the most relevant developments in recent times has been the improved use of data, whether in the form of self-supervised learning, improved data augmentation, generalization of pretraining tasks or contrastive learning," said Juan José López Murphy, AI and big data tech director lead at Globant, an IT consultancy.
These techniques reduce the need for manually tagged and processed data. This is enabling researchers to build large models that can capture complex relationships representing the nature of the data and not just the relationships representing the task at hand. López Murphy is starting to see transfer learning being adopted as a baseline approach, where researchers can start with a pretrained model that only requires a small amount of customization to provide good performance on many common tasks.
Better context
There are specific fields where deep learning provides a lot of value, in image, speech and natural language processing, for example, as well as time series forecasting.
"The broader field of machine learning is enhanced by deep learning and its ability to bring context to intelligence. Deep learning also improves [machine learning's] ability to learn nonlinear relationships and manage dimensionality with systems like autoencoders," said Luke Taylor, founder and COO at TrafficGuard, an ad fraud protection service.
For example, deep learning can find more efficient ways to auto encode the raw text of characters and words into vectors representing the similarity and differences of words, which can improve the efficiency of the machine learning algorithms used to process it. Deep learning algorithms that can recognize people in pictures make it easier to use other algorithms that find associations between people.
More recently, there have been significant jumps using deep learning to improve the use of image, text and speech processing through common interfaces. People are accustomed to speaking to virtual assistants on their smartphones and using facial recognition to unlock devices and identify friends in social media.
"This broader adoption creates more data, enables more machine learning refinement and increases the utility of machine learning even further, pushing even further adoption of this tech into people's lives," Taylor said.
Democratizing the tools
Early machine learning research required expensive software licenses. But deep learning pioneers began open sourcing some of the most powerful tools, which has set a precedent for all types of machine learning.
"Earlier, machine learning algorithms were bundled and sold under a licensed tool. But, nowadays, open source libraries are available for any type of AI applications, which makes the learning curve easy," said Sachin Vyas, vice president of data, AI and automation products at LTI, an IT consultancy.
Another factor in democratizing access to machine learning tools has been the rise of Python.
"The wave of open source frameworks for deep learning cemented the prevalence of Python and its data ecosystem for research, development and even production," Globant's López Murphy said.
Many of the different commercial and free options got replaced, integrated or connected to a Python layer for widespread use. As a result, Python has become the de facto lingua franca for machine learning development.
Deep learning has also inspired the open source community to automate and simplify other aspects of the machine learning development lifecycle. "Thanks to things like graphical user interfaces and [automated machine learning], creating working machine learning models is no longer limited to Ph.D. data scientists," Carmen Fontana, IEEE member and cloud and emerging tech practice lead at Centric Consulting, said.
Conclusion
For machine learning to keep evolving, enterprises will need to find a balance between developing better applications and respecting privacy.
Data scientists will need to be more proactive in understanding where their data comes from and the biases that may inadvertently be baked into it, as well as develop algorithms that are transparent and interpretable. They also need to keep pace with new machine learning protocols and the different ways these can be woven together with various data sources to improve applications and decisions.
"Machine learning provides more innovative applications for end users, but unless we're choosing the right data sets and advancing deep learning protocols, machine learning will never make the transition from computing a few results to providing actual intelligence," said Justin Richie, director of data science at Nerdery, an IT consultancy.
"It will be interesting to see how this plays out in different industries and if this progress will continue even as data privacy becomes more stringent," Richie said.