Oracle Data Science efforts advance with new services

Oracle adds new services to its cloud infrastructure platform in a bid to provide data scientists, as well as data analysts, with data management and query functionality.

Oracle has introduced several new data services that broaden what is available on its cloud platform.

The marquee new service from the software giant is the Oracle Cloud Infrastructure Data Science offering -- an evolved version of the DataScience.com platform that Oracle acquired in 2018.

The Oracle Data Science service provides an automated workflow for machine learning and data analysis. Oracle is also launching a new data catalog service that helps users organize data for analysis. Another new capability is the Cloud SQL service, which enables users to query cloud data stores, while the Data Flow service enables users to run Apache Spark big data analysis as a service.

Oracle is playing to its strength in data with the new services, unveiled Feb. 12, according to Nucleus Research analyst Daniel Elman.

"Oracle made its name on database technology and remains to this day a preeminent leader in the space," Elman said. "With these services, it's leveraging this expertise with data management and offering its thousands of database customers a natural route to enabling data science initiatives without having to migrate data or learn new specialized tools."

Oracle Data Science positioned for ease of use

Oracle is marketing the Data Science service as a way for teams of data scientists to collaborate on building machine learning models and then apply them to production applications.

Oracle Cloud Infrastructure Data Science notebook environment

The Data Science service has a project environment that sets up the infrastructure and networking needed to access data assets and provides the tools needed for data science, explained Greg Pavlik, senior vice president of product development, data and AI services at Oracle. Among the tools is an automated machine learning feature that handles common data science tasks such as algorithm selection.
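
Algorithm selection, the task Pavlik names, can be pictured as fitting several candidate models, scoring each on held-out data and keeping the one with the lowest error. The sketch below is a toy illustration in plain Python and is not Oracle's automated machine learning feature; the two candidate models, the `select_algorithm` helper and the sample data are all hypothetical.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Candidate: ordinary least-squares line through the training points."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def select_algorithm(candidates, train, holdout):
    """Fit every candidate on train, score it on holdout, return the best name and all scores."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(*train)
        scores[name] = mse(model, *holdout)
    best = min(scores, key=scores.get)
    return best, scores


# Made-up data that roughly follows y = 2x + 1.
train = ([0, 1, 2, 3, 4], [1.0, 3.1, 4.9, 7.2, 9.0])
holdout = ([5, 6], [11.1, 12.8])

candidates = {"mean_baseline": fit_mean, "linear": fit_linear}
best, scores = select_algorithm(candidates, train, holdout)
print(best, scores)
```

A production system would search over many more model families and hyperparameters, and would typically use cross-validation rather than a single holdout split.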

Oracle getting into the data catalog market

Alongside the Oracle Data Science service, the vendor launched a new data catalog to help organizations track all the data sets that come into a cloud deployment.

"Say you're setting up a data warehouse, we can introspect the data warehouse model, and allow users -- it could be data scientists, it could be data stewards, it can be analysts -- to find out what data is available, who owns it and what it's meant to be used for," Pavlik said.

The Oracle data catalog also provides tagging capabilities that enable administrators to define taxonomies and start to organize data sets hierarchically.
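
To make the idea of hierarchical tagging concrete, the following is a minimal sketch of a taxonomy in which a tag is a slash-separated path and a lookup on a parent path also returns data sets tagged beneath it. It does not use or represent the Oracle data catalog API; the `TagCatalog` class, its methods and the data set names are hypothetical.

```python
from collections import defaultdict


class TagCatalog:
    """Toy catalog that maps hierarchical tag paths to data set names."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    @staticmethod
    def _split(path):
        # "finance/revenue" -> ("finance", "revenue")
        return tuple(path.strip("/").split("/"))

    def tag(self, dataset, path):
        """Attach a taxonomy path such as 'finance/revenue' to a data set."""
        self._by_tag[self._split(path)].add(dataset)

    def find(self, path):
        """Return data sets tagged at the path or anywhere beneath it."""
        prefix = self._split(path)
        found = set()
        for tag_path, datasets in self._by_tag.items():
            if tag_path[:len(prefix)] == prefix:
                found |= datasets
        return sorted(found)


catalog = TagCatalog()
catalog.tag("q4_sales", "finance/revenue")
catalog.tag("payroll_2019", "finance/costs")
catalog.tag("web_clicks", "marketing/web")

print(catalog.find("finance"))          # both finance data sets
print(catalog.find("finance/revenue"))  # only the revenue data set
```

Storing each tag as a tuple of path segments keeps the prefix comparison exact, so a tag such as `finance_old` would not be matched by a search for `finance`.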

Data Flow service enables Apache Spark big data analysis

The new Data Flow service meets a different need, enabling users to run Apache Spark jobs as a service in the Oracle cloud. One of the challenges some organizations face with Spark analytics jobs is that they often run on top of Hadoop clusters, which introduces additional complexity, Pavlik noted.

All that's needed to run a big data workload in the Data Flow service is to upload the script, click on an application that is sort of the pointer to the script, and then specify how many CPUs the job should run on, Pavlik explained.

"We will synthesize the job on the fly in a totally serverless architecture, executing in tens of seconds," he said. "We really think about this as a big generational leapfrog in terms of how to make big data workloads consumable by the enterprise."

Oracle is also expanding the ability of users to query data in the cloud with the new Oracle Cloud SQL offering. Users can use the SQL capability to query against cloud-based object stores.

"So you can reach out into a cloud-based data lake and apply the full semantic richness of the Oracle database," Pavlik said.

Data integration service is coming

In addition to the Oracle Data Science services, the vendor has more data services in the works. Among them is a data integration service that will provide data preparation and ETL capabilities, Pavlik said.

"It figures out where's the most cost-effective way to run elements of the flow so that it's filtering data and minimizing data movement," Pavlik said. "It's also filled with a data immersive view, so you can really drill down, understand your datasets and manipulate the data."
