Databricks AI in next-generation transportation
Virgin Hyperloop turned to Databricks to horizontally scale out and speed up its processes as it works to create a new form of transportation.
Virgin Hyperloop, a next-age transportation company, isn't aiming to just build a car. Instead, it's building a new way of travel, and all the policies and infrastructure that go along with it.
The startup, founded in 2014, envisions a future of hyperloops -- a form of transportation that shoots pods carrying cargo or passengers through low-pressure tubes at high speeds. Elon Musk first publicly proposed the concept in 2012, and various companies, including Virgin Hyperloop, have since created prototype versions of the transportation system.
"It's something not exactly like trains, planes, cars," said Jerome Wei, senior director of machine intelligence and analytics at the company. Virgin Hyperloop maintains a 500-meter test track in Las Vegas, where it has run over 400 tests, hitting top speeds of 240 mph.
To help turn its vision into reality, the startup is using Databricks AI and analytics technology.
The Los Angeles-based transportation company is among many Databricks users and partners presenting during the Spark + AI Summit 2020 conference, held virtually this year from June 22 to June 26.
The developers who originally created Apache Spark went on to found Databricks, the data science and engineering platform vendor that sponsors the conference.
Needing compute
"We're trying to take on these very big questions and very ambitious activities," Wei said. To that end, he added, Virgin Hyperloop needs as many compute resources as possible.
A few years ago, the startup evaluated available options. It used open source programs and built on them to create some of its own tools.
Virgin Hyperloop could have created an entire on-premises system to handle its substantial compute needs, Wei said. But he added that the company does things in a "bursty" way, meaning it uses massive amounts of compute in short amounts of time. His team tends to create an update, do a design experiment, absorb the results and then move to the next iteration, Wei said.
An on-premises system would not only take up a lot of space, but would also remain underutilized most of the time.
Figuring a cloud-based system would work better, Virgin Hyperloop evaluated vendors with cloud platforms that could handle its AI and analytics workloads. Databricks, Wei found, connected well with the open source software Virgin Hyperloop was using.
Many open source tools are integrated into Databricks. The vendor automatically maintains and updates those tools, along with the rest of the Databricks platform -- capabilities Wei said he was looking for.
The Databricks platform also runs on Apache Spark, a cluster-computing framework that fit Virgin Hyperloop's need for speed and memory in its main analytics engine.
Databricks AI
So Virgin Hyperloop decided to use Databricks for its AI and analytics needs. The initial setup about a year ago was relatively painless, according to Wei.
Virgin Hyperloop had a lot of code written in pandas (Python Data Analysis Library), an open source data analysis and manipulation tool for Python, but as the company's needs and data expanded, the team needed to scale horizontally. Pandas, which does not scale well to big data, presented a problem.
Wei's team considered learning Scala, but that would have cost too much time and money. Then Databricks released Koalas, an open source tool that lets developers run pandas-style code on Spark.
With minimal code changes, Virgin Hyperloop scaled its pandas code on Spark, speeding up processes and reducing its data processing time by as much as 95%, Wei said.
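As a rough illustration of what "minimal code changes" can look like, the sketch below swaps only the import and leaves the analysis code untouched. It is not Virgin Hyperloop's code: the data, the column names and the mean_speed_by_track helper are hypothetical, and the pandas fallback is there only so the example runs without a Spark cluster.

```python
# Hypothetical sketch: swap the import, keep the analysis code unchanged.
try:
    import databricks.koalas as pd  # Koalas: pandas-style API backed by Spark
except ImportError:
    import pandas as pd  # fallback so the sketch runs without Spark installed

# Hypothetical test-run records; the column names are illustrative only.
runs = pd.DataFrame({
    "track": ["A", "A", "B"],
    "speed_mph": [100.0, 140.0, 200.0],
})


def mean_speed_by_track(df):
    """Average speed per track, written once against the pandas-style API."""
    return df.groupby("track")["speed_mph"].mean().sort_index()


result = mean_speed_by_track(runs)
print(result)
```

The same function body works against either library because Koalas mirrors the pandas DataFrame API; on Spark, the grouping and aggregation are distributed across the cluster instead of running on a single machine.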
Databricks' MLflow, an open source machine learning management tool, also fit Virgin Hyperloop's needs and use cases well. Virgin Hyperloop used MLflow to track experiments and assess the outputs of its hyperloop simulation runs.
A team from Virgin Hyperloop will present in a panel session June 25 titled "Generative Hyperloop Design: Managing Massively Scaled Simulations Focused on Quick-Insights Analytics and Demand Modeling."