March Madness analytics, AI help data scientist fill bracket
To create his March Madness bracket predictions, a director of data science at DataRobot uses a host of machine learning algorithms and predictive analytics.
Zach Deane-Mayer, director of data science at DataRobot Inc., has been participating in an annual March Madness pool since he was about 10 years old. It's a family thing for him. For more than two decades, he's vied in pools against his father and his brothers, who, as Deane-Mayer puts it, "have traditionally been much better at it."
However, that underdog status appears to be changing. Over the last few years, Deane-Mayer has been applying machine learning algorithms to March Madness analytics data, giving his picks a boost and himself a more viable chance in the annual competition.
Making brackets with DataRobot
Staying loyal to his employer, Deane-Mayer has been creating his algorithms using DataRobot's automated machine learning tools since starting at the Boston-based startup in 2015. He used different platforms before that, and they worked well, he said, but they were less intuitive to use and not as automated.
"For me, the automation that DataRobot provides for fitting a lot of different models is incredibly valuable," Deane-Mayer said.
The automated machine learning platform lets him spend less time setting up his models and more time searching for new data sources to power them, a task that he said is not only fun but also essential to doing well in the competition.
Deane-Mayer said he uses three primary types of March Madness analytics data for his models: power ratings from various analysts, ranking discrepancies between seeds, and betting odds and lines from different games.
What ultimately goes into the DataRobot platform is a mix of current and historical data sets going back about 10 years. Deane-Mayer said he uses a variety of models and DataRobot AI tools to make his predictions, but over the years he has found that the Eureqa engine works best for the task.
Making models
Created by Nutonian, one of the startups DataRobot has acquired over the last few years, Eureqa is an AI-powered modeling engine that enables users to generate and update models automatically.
For Deane-Mayer, who generates between 20 and 50 models to make his predictions, relatively easy-to-use model automation is essential. With the automated tools, "[it] probably takes less than an hour to run autopilot and get predictions for one model," he said.
Factor in the number of models he builds and the time he spends gathering new data, though, and the entire process takes hours.
In the end, it's worth it. His models add a layer of fun and competition to the yearly family pool. And, Deane-Mayer said with a laugh, his family "absolutely does not" think the models give him an unfair advantage.
"I'm looking at all the same data they are," he said. "I'm just using a different approach to synthesizing that data."
Machine learning Grammy predictions
DataRobot employees are no strangers to using analytics and machine learning to predict contest winners. Earlier this year, Taylor Larkin, a data science evangelist at DataRobot, used the DataRobot platform to help predict Song of the Year at the 2019 Grammy Awards.
According to a DataRobot blog post, Larkin used Spotify tools to extract information including the total word count, the beats per minute, the amount of profanity and the genre of each nominated song.
Using the DataRobot platform, he automatically generated more than 100 models, which predicted that Childish Gambino's "This Is America" would win. As Grammy-watchers know, the song did go on to win Song of the Year.
As for Deane-Mayer, this year his March Madness analytics and machine learning models picked Duke, a longtime NCAA tournament favorite, to win it all. However, the team was knocked out of the tournament in the Elite Eight round, a reminder that no prediction is perfect.