Introduction to using machine learning
The first part of our machine learning series, excerpted from training materials for Arcitura's Machine Learning Specialist certification, introduces algorithms, models and model training.
This article is excerpted from the course "Fundamental Machine Learning," part of the Machine Learning Specialist certification program from Arcitura Education. It is the first part of the 13-part series, "Using machine learning algorithms, practices and patterns."
Machine learning is a field of study focused on enabling computers to learn without being explicitly programmed. Machine learning software lets computers learn from data and from feedback, in the form of measured results of the tasks they perform.
As data grows exponentially, it becomes increasingly difficult for humans to analyze it while also improving the speed and accuracy with which it is processed. Finding patterns in the world's data and making predictive decisions from it cannot realistically be done by humans alone, because the more data there is, the longer it takes to process. Machine learning offers a way to process large data sets in less time and so produce potential value for individuals and organizations.
At the heart of a machine learning system is its ability to "learn." This is accomplished through intelligent programs (referred to as algorithms), which help shape the logic behind different machine learning approaches (referred to as models).
Specifically, machine learning relies on a library of different algorithms that enable computers to automatically learn from data to:
- Identify rules and patterns;
- Identify commonalities and formulate predictions; and
- Classify data.
Once an analyst identifies a problem to solve, they choose an algorithm and a model. The algorithm is used to expose the model to historical data relevant to the problem. Repeated exposure lets the model learn about the nature of the data, which optimizes the model for solving the problem. Once the model has been sufficiently trained, it can be deployed and used to solve the problem with new data as its input (Figure 1).
With traditional data analysis and business intelligence approaches, analysts have a set of input data and a pre-defined analytical process they use to produce a set of results, generally distributed as reports to decision-makers.
With machine learning, analysts first subject a model to training data (old or historical data relevant to the problem) so that the model can be trained, refined and optimized in support of the problem (Figure 3). The training of the model is what makes machine learning a reality.
The analyst first chooses historical training data and uses a machine learning tool to run this data through a preliminary model. This produces results that are assessed to help improve and evolve the model. Once the model has been sufficiently "trained," it is deployed for use with the new input data relevant to solving the problem. The analyst keeps an eye on these results to see if the model can be further refined in the future.
The basics of machine learning models
A machine learning model exists as a mathematical equation that accepts input data and produces a result. A key objective of machine learning is to train a model so that it becomes as refined and optimized as possible in relation to the data it is responsible for processing.
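To make the idea of a model as an equation concrete, here is a minimal sketch in Python. It assumes the simplest possible case, a straight-line equation with two parameters. The class name `LinearModel` and the parameter values are hypothetical and chosen only for illustration; real models usually have many more parameters.

```python
from dataclasses import dataclass


@dataclass
class LinearModel:
    """A model expressed as an equation: result = weight * x + bias."""

    weight: float
    bias: float

    def predict(self, x: float) -> float:
        # Accept input data and produce a result by evaluating the equation.
        return self.weight * x + self.bias


# Hypothetical parameter values; training is what would normally set these.
model = LinearModel(weight=2.0, bias=1.0)
prediction = model.predict(3.0)  # evaluates 2.0 * 3.0 + 1.0
print(prediction)
```

In this sketch, training the model would mean finding values of `weight` and `bias` that fit the data well.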
The model being trained has an equation that continues to evolve as it learns more by processing increasing amounts of data. Each data record processed by a model gives the model an opportunity to learn more about the overall data set and, in turn, for its equation to be further refined. A model in training will produce output used for learning purposes. A model that has already been trained, or a trained model, will produce output used to make decisions.
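One common way an equation can be refined record by record is a small gradient step after each record. The sketch below assumes a straight-line model and a squared-error measure; the function name `update`, the learning rate and the sample records are all hypothetical and are not taken from the course material.

```python
def update(weight, bias, x, y, learning_rate=0.05):
    """Refine the equation's parameters using a single record (x, y)."""
    # How far the current equation's output is from the known answer.
    error = (weight * x + bias) - y
    # Nudge each parameter in the direction that reduces the squared error.
    weight -= learning_rate * error * x
    bias -= learning_rate * error
    return weight, bias


# Hypothetical records that follow y = 2x + 1.
records = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

weight, bias = 0.0, 0.0
history = [(weight, bias)]  # parameters before any record is seen
for x, y in records:
    weight, bias = update(weight, bias, x, y)
    history.append((weight, bias))  # the equation after each record

for step, (w, b) in enumerate(history):
    print(f"after {step} records: result = {w:.3f} * x + {b:.3f}")
```

Each entry in `history` is a slightly different equation, which is what it means for the model to evolve as it processes more data.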
The basics of machine learning algorithms
A machine learning algorithm is a procedure designed to process data in support of a specific type of machine learning analysis. Machine learning algorithms are used to train models until they are ready to be deployed. Algorithm logic is executed by machine learning tools to process training data that helps models become refined and trained.
Different algorithms are used for solving different types of problems. An analyst needs to assess a given problem and then choose an algorithm that best suits the problem. There are many different machine learning algorithms available to choose from, and sometimes selecting the right algorithm can be confusing and difficult. It is common to try out different algorithms before identifying the most suitable one.
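Trying out different algorithms can be as simple as training each candidate on the same data and comparing their errors on records held back from training. The sketch below assumes two deliberately simple candidates, an average-value baseline and a least-squares straight line. The data values and function names are hypothetical.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Candidate 1: always predict the average of the training targets."""
    average = mean(ys)
    return lambda x: average


def fit_line(xs, ys):
    """Candidate 2: least-squares straight line through the training data."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def mean_absolute_error(predict, xs, ys):
    """Average size of the gap between predictions and known answers."""
    return mean(abs(predict(x) - y) for x, y in zip(xs, ys))


# Hypothetical historical data, split into training and held-back records.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [2.1, 3.9, 6.2, 7.8, 10.1]
valid_x = [6.0, 7.0]
valid_y = [12.0, 14.1]

candidates = {"mean baseline": fit_mean, "straight line": fit_line}
scores = {
    name: mean_absolute_error(fit(train_x, train_y), valid_x, valid_y)
    for name, fit in candidates.items()
}
best = min(scores, key=scores.get)  # candidate with the lowest error
print(scores)
print("most suitable candidate:", best)
```

Scoring on held-back records rather than on the training data shows how each candidate handles data it has not seen before.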
Training the machine learning model
As previously mentioned, a key feature of machine learning (and the reason it's referred to as machine learning) is the ability for a model to learn by processing large amounts of historical data, referred to as training data. As the model repeatedly processes training data, it continues to learn and can be refined and optimized (Figure 4) until it reaches a stage where it is considered a trained model.
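The repeated processing described above can be sketched as a loop that passes over the training data again and again until the model's error falls below a chosen threshold. This is a minimal illustration, assuming a straight-line model trained with per-record gradient steps. The threshold, learning rate, records and the name `train` are hypothetical.

```python
def train(records, learning_rate=0.02, target_error=0.01, max_passes=5000):
    """Pass over the training data repeatedly until the error is small."""
    weight, bias = 0.0, 0.0
    errors = []  # mean squared error after each full pass
    for passes in range(1, max_passes + 1):
        for x, y in records:
            error = (weight * x + bias) - y
            weight -= learning_rate * error * x
            bias -= learning_rate * error
        mse = sum(((weight * x + bias) - y) ** 2 for x, y in records) / len(records)
        errors.append(mse)
        if mse < target_error:
            break  # the model is now treated as "trained"
    return weight, bias, passes, errors


# Hypothetical training data that follows y = 2x + 1.
records = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
weight, bias, passes, errors = train(records)

# Use the trained model on a new input that was not in the training data.
new_input = 5.0
prediction = weight * new_input + bias
print(f"passes: {passes}, final error: {errors[-1]:.4f}")
print(f"prediction for {new_input}: {prediction:.2f}")
```

The stopping threshold is a design choice: a lower `target_error` demands more passes before the model is treated as trained.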
Understanding machine learning by example
To help illustrate and distinguish the different types of algorithms, the next two articles in this series will include a set of scenarios based on a simple example: a toy company that wants to use different types of data to better reach its customers.
The toy company sells its toy product lines online and via retail outlets. It provides a wide variety of toy products for different age groups and activities and across different international regions. The toy company has been operational for many years and has, during that time, accumulated a large amount of quantitative and qualitative data from internal and external sources. Internal data includes customer data, marketing data, transaction data and customer comment data (star rankings and natural language comments). External data includes social media data referencing the company and its products from one or more social media platforms.
Upcoming machine learning topics
There are many different algorithms, methods and techniques that comprise the machine learning field. Choosing the correct algorithm and approach for a given machine learning problem is critical and may require experimentation.
The next two articles in this series provide an overview of common machine learning algorithm types and related practices. The remaining articles in the series then delve into a collection of common machine learning practices (documented as patterns), several of which drill down into the application of these algorithms.
View the full series
This lesson is one in a 13-part series on using machine learning algorithms, practices and patterns. The full list of lessons follows.
Course overview
Lesson 1: Introduction to using machine learning
Lesson 2: The supervised approach to machine learning
Lesson 3: Unsupervised machine learning: Dealing with unknown data
Lesson 4: Common ML patterns: central tendency and variability
Lesson 5: Associativity, graphical summary computations aid ML insights
Lesson 6: How feature selection, extraction improve ML predictions
Lesson 7: 2 data-wrangling techniques for better machine learning
Lesson 8: Wrangling data with feature discretization, standardization
Lesson 9: 2 supervised learning techniques that aid value predictions
Lesson 10: Discover 2 unsupervised techniques that help categorize data
Lesson 11: ML model optimization with ensemble learning, retraining
Lesson 12: 3 ways to evaluate and improve machine learning models
Lesson 13: Model optimization methods to cut latency, adapt to new data