Explore the foundations of artificial neural network modeling
Dive into Giuseppe Bonaccorso's recent book 'Mastering Machine Learning Algorithms' with a chapter excerpt on modeling neural networks.
Deep learning neural networks are rife with challenges. For all their layered capabilities, the algorithms themselves are hard to create and even harder to manage. From the demand for millions of data points for model training to the black box decision-making process, data scientists are fighting an uphill battle right from the start of neural network creation. However, with bigger risk comes bigger reward: Deep learning artificial neural networks can produce state-of-the-art performance in regression, image classification and business applications.
This anxiety around training methods and the limitations of the algorithms is not lost on Giuseppe Bonaccorso, who recently wrote a 700-page how-to manual. For enterprises that want to take a dive into artificial neural network modeling, Bonaccorso, who is the global head of innovative data science at Bayer, offers his take on common issues, training strategies and why building a model from scratch can pay off.
What are the benefits of creating an artificial neural network from scratch, especially when there are so many prepackaged vendor offerings?
Giuseppe Bonaccorso: The rationale behind the choice of a new model or an existing one should be rooted in the nature of the problem. For example, in image recognition, there are several high-performance networks that can be adapted to specific roles, but there are also problems that require more customized solutions. In these cases, building a model from scratch is likely to be the optimal strategy.
Neural networks are very flexible models. There are cases when existing architectures can simplify the work, as well as pretrained models in which only some layers are retrained to meet specific requirements -- an approach known as transfer learning. Start with simple networks. If the results are poor, it's possible to increase complexity. However, the simplest model that guarantees both accuracy and generalization ability [is the right one].
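As an illustration of the transfer learning approach Bonaccorso describes, here is a minimal sketch using TensorFlow/Keras (his stated framework of choice later in the interview). The pretrained MobileNetV2 base, the image size and the three-class head are illustrative assumptions, not details from the book.

```python
# Minimal transfer learning sketch: freeze a pretrained base, retrain only a new head.
import tensorflow as tf

# Load a pretrained image network without its classification head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the pretrained layers

# Add a small task-specific head that is trained from scratch.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),  # e.g. 3 target classes (assumed)
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)
```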
Which toolkits are best suited to model and create a neural network?
Bonaccorso: My primary choice is TensorFlow 2, which now includes Keras as a high-level module. Using TensorFlow, the data scientist can easily start with Keras models based on predefined layer structures and, if necessary, switch to more advanced features. There are also other frameworks, like PyTorch. I believe there are no silver bullets, but it's important that, once a framework is chosen, all its features are thoroughly studied and evaluated. Even if not immediately useful, some features can become essential to solving certain problems in the most effective way.
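A minimal sketch of that workflow in TensorFlow 2 with the bundled Keras API: a model built from predefined layers, followed by a custom training step using GradientTape as one example of the "more advanced features." The layer sizes and input shape are illustrative assumptions.

```python
import tensorflow as tf

# Start simple: a model built entirely from predefined Keras layers.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# When more control is needed, the same model can be trained with a
# custom loop built on GradientTape instead of model.fit().
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        probs = model(x, training=True)
        loss = loss_fn(y, probs)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```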
Neural networks are famous for being difficult and hard to manage. What are the most common problems when modeling a deep learning network?
Bonaccorso: Deep neural networks are extremely complex models with tens of millions of parameters. Training them means finding the optimal set of parameters to achieve a predefined goal -- and the training can easily remain stuck in suboptimal solutions. To mitigate this problem, several optimization algorithms have been proposed. The role of a data scientist is to pick the most appropriate algorithm and tune its hyperparameters to maximize both the training speed and the final accuracy. Moreover, these kinds of models have an intrinsically large capacity; the more parameters you introduce, the more complex the system becomes and, consequently, its ability to learn the training set increases very quickly.
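One practical way to act on this advice is to compare a few optimizers and learning rates on the same architecture and keep the combination that converges fastest to the best validation accuracy. The sketch below uses Keras; the specific optimizers, learning rates and model are illustrative assumptions.

```python
import tensorflow as tf

def build_model():
    # Small illustrative network; the real architecture depends on the problem.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(32,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

# Candidate optimizers and learning rates to evaluate (assumed values).
candidates = {
    "sgd_momentum": tf.keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9),
    "adam": tf.keras.optimizers.Adam(learning_rate=1e-3),
    "rmsprop": tf.keras.optimizers.RMSprop(learning_rate=1e-3),
}

for name, opt in candidates.items():
    model = build_model()
    model.compile(optimizer=opt,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # history = model.fit(x_train, y_train, validation_split=0.2, epochs=10)
    # Compare validation curves to pick the pairing that trains fastest
    # without getting stuck in poor solutions.
```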
When small data sets are employed, deep learning models can easily overfit and learn to associate each training input with the correct label but lose the ability to generalize. Generalizing is a key concept in learning because we'd like to model systems that can abstract from some examples to derive a generic 'concept' representing a specific class.
Unfortunately, when working with deep neural networks, overfitting is a very common issue. However, data scientists can employ regularization, dropout and batch normalization techniques to correct these issues.
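A minimal Keras sketch of the three techniques mentioned here: an L2 weight penalty as regularization, dropout and batch normalization. The layer sizes and coefficients are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    layers.Dense(256, activation="relu", input_shape=(100,),
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 weight penalty
    layers.BatchNormalization(),   # normalize activations across the batch
    layers.Dropout(0.5),           # randomly silence units to reduce co-adaptation
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```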
How can data scientists keep their models accurate, fast and optimized over time with a model that is hard to retrain?
Bonaccorso: Once a model is properly trained, it remains stable with respect to its underlying data-generating process. However, many models are based on training sets that represent processes that change over time.
In fact, one common problem when retraining networks is that they tend to forget past knowledge when new data is submitted. To avoid this problem, the new training sets must contain data sampled from the complete, updated data-generating process. For example, if we have trained a model to distinguish between cats and dogs and we want to extend it to tigers as well, we cannot simply create a tiger data set -- we need to create a new set containing all three classes so the network learns to distinguish among all of their features.
Current learning algorithms are very sensitive to drastic changes in the training sets, so it's important to keep this concept in mind when you need to update or retrain an algorithm. A more complex problem arises when the current architecture doesn't have enough capacity to learn more classes. In this case, the model would underfit, showing very low accuracy, and the data scientist would have to consider a deeper or more complex architecture, along with a larger data set to avoid overfitting the model.
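A hedged Keras sketch of the cat/dog/tiger scenario: the retraining set mixes the old classes with the new one instead of containing only tigers. The directory layout, image size and the small classifier are illustrative assumptions, not code from the book.

```python
import tensorflow as tf

# Old classes (cats, dogs) and the new class (tigers) are sampled together,
# so the network is not retrained on tigers alone and does not forget the rest.
combined_ds = tf.keras.utils.image_dataset_from_directory(
    "data/cats_dogs_tigers",          # assumed subfolders: cats/, dogs/, tigers/
    image_size=(224, 224),
    batch_size=32)

# Small illustrative classifier with a 3-class output head.
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),   # cats, dogs, tigers
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(combined_ds, epochs=5)
```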
Images or volumetric data contain features that can be extracted efficiently using special operators (like convolutions), which exploit the geometric structure of the samples. This intuition is based on direct observations of biological vision systems, where successive layers are responsible for extracting more and more detailed features.
Convolutional deep neural networks are the starting point for every image-related problem, and, given the advancements in neural computation, their complexity is becoming easier to manage. Of course, convolutional layers alone are not enough to solve every problem. Other helpful layers (like pooling, padding and up/down-sampling) are necessary to achieve specific goals. Aspiring data scientists should study these layers and learn how to apply each of them in the most accurate and reasonable way.
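As a rough illustration of how these layer types fit together, here is a tiny Keras encoder-decoder combining convolutions with explicit padding, pooling (down-sampling) and up-sampling; the shapes and filter counts are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(64, 64, 3))

# Encoder: convolutions extract local features; pooling reduces resolution.
x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(2)(x)                 # 64x64 -> 32x32
x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D(2)(x)                 # 32x32 -> 16x16

# Decoder: up-sampling restores resolution, e.g. for segmentation-style outputs.
x = layers.UpSampling2D(2)(x)                 # 16x16 -> 32x32
x = layers.Conv2D(16, 3, padding="same", activation="relu")(x)
x = layers.UpSampling2D(2)(x)                 # 32x32 -> 64x64
outputs = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)

model = tf.keras.Model(inputs, outputs)
model.summary()
```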
Dive into Mastering Machine Learning Algorithms
Click here to read Chapter 17, "Modeling Neural Networks," of Bonaccorso's book.