Tools manage performance for big data cloud applications
Tools such as Unravel and Pepperdata offer ways to measure the performance of big data applications, which may help companies weighed down by on-premises configuration issues plan cloud migrations.
As application performance management vendors introduce new capabilities for users moving big data applications to the cloud, their focus is often on establishing baselines that set the stage for efficient data migrations.
One APM vendor, Unravel Data, on March 26 unveiled a platform update that supports IaaS deployments on AWS, Microsoft Azure and Google Cloud Platform, as well as PaaS deployments on Azure HDInsight and Amazon EMR. The Unravel 4.5 platform provides a view of existing big data configurations that may never have been truly inventoried.
New in Unravel 4.5 are reporting capabilities for on-premises-to-cloud application mapping, cluster discovery and out-of-memory errors in Apache Hive applications.
The Unravel platform uses predictive analytics and machine learning to create baseline views of on-premises big data application activity. The software then measures the performance of the cloud versions of those applications so users can compare them against the on-premises results.
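In rough terms, that baseline-and-compare step can be pictured as collecting run statistics on premises and flagging cloud runs that stray from them. The following Python sketch is purely illustrative -- the durations and the two-standard-deviation threshold are hypothetical, not Unravel's actual method:

```python
from statistics import mean, stdev

# Hypothetical on-premises baseline: job durations in seconds,
# collected over repeated runs of the same big data application.
onprem_runs = [412, 398, 430, 405, 421]

# Durations observed after moving the same job to the cloud.
cloud_runs = [455, 470, 448]

baseline_mean = mean(onprem_runs)
baseline_sd = stdev(onprem_runs)

for duration in cloud_runs:
    # Flag cloud runs that fall more than two standard deviations
    # outside the on-premises baseline -- an arbitrary cutoff here.
    z = (duration - baseline_mean) / baseline_sd
    status = "REGRESSION" if z > 2 else "within baseline"
    print(f"cloud run {duration}s: z={z:.1f} -> {status}")
```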
These are important steps as teams try to move nascent big data prototypes to the cloud, where they are meant to dependably drive regular operations of the organization, according to Bernd Harzog, CEO at APM Experts, an Atlanta-based consultancy focused on application performance management.
The difficulty of the journey from skunkworks prototype to truly managed application is familiar in technology history, he indicated, but there are new twists.
Mixing big data cloud applications
To support service-level agreements (SLAs), Unravel software monitors big data applications and traces work throughout the software stack. Unravel, Harzog said, can also tune running applications in real time.
"The AI in the Unravel product allows it to learn what's going on, and the data science behind it allows it to translate that automatically into actions to optimize the performance of the environment," he said.
The larger issue the software is intended to address is that advanced technology can be easier to prototype than to put into daily operations. It is not the first time this has happened, Harzog said.
"We as an industry are absolutely excellent at one thing -- that is, putting innovations into production without having any clue that they actually work," Harzog said.
That pattern has played out again with clustered big data environments, especially as they begin to incorporate AI and machine learning approaches.
"When you mix AI workloads to support business decisions, it becomes a competitive advantage for your organization, and it has to work well," Harzog said.
Know your SLAs
In targeting the move to the cloud, the latest Unravel release aims at one of the key trends of the day.
Administrators of big data applications are finding it difficult to determine future cloud infrastructure requirements; in some cases, that is because they don't have sufficient baseline information on performance of the big data apps they already have running on premises.
"Problems occur that cause these apps to slow down or fail, which happens frequently," said Kunal Agarwal, CEO and co-founder at Unravel Data. He cited end-to-end pipeline performance issues such as data skewing, bad configurations and incorrect container sizing.
Identifying and fixing such flaws helps organizations establish rational SLAs for big data cloud applications, Agarwal said.
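Data skew, one of the failure modes Agarwal cited, is the kind of flaw such tooling looks for. As a simplified illustration -- not Unravel's implementation -- a PySpark job can check whether a handful of keys dominate a data set before an expensive join; the path, column name and 10x cutoff below are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("skew-check").getOrCreate()

# Hypothetical input: an event table keyed by customer_id.
df = spark.read.parquet("s3://example-bucket/events/")  # illustrative path

# Count rows per key; a heavily skewed key concentrates work in
# one partition and can slow down or crash the whole job.
key_counts = df.groupBy("customer_id").count()
stats = key_counts.agg(
    F.max("count").alias("max_rows"),
    F.avg("count").alias("avg_rows"),
).first()

# A max/avg ratio far above 1 suggests skew worth repartitioning
# or key salting before joins; 10x is an arbitrary cutoff.
if stats["max_rows"] > 10 * stats["avg_rows"]:
    print("Possible data skew: hottest key has "
          f"{stats['max_rows']} rows vs avg {stats['avg_rows']:.0f}")
```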
Big data stack performance
For many IT shops, cloud deployments are a favorable alternative to hosting big data applications such as Hadoop and Spark in their own data centers. Configuration issues alone are daunting for beleaguered administrative staff working on on-premises bare-metal setups.
"First-generation big data applications were all run on premises on bare metal, but now people are discovering some of their big data workloads can run on the public cloud," Harzog said. "Running on the cloud is not more expensive, but it is a lot less aggravating."
It continues to be difficult for companies to find people who can run big data software stacks that achieve acceptable application performance, he added. "That is the primary driver behind people moving big data applications to the cloud."
Once data managers decide to move to the cloud, they have to decide which applications to move first, and Unravel performance measures can help in this, according to Harzog.
Cloud data migrations
In its efforts to monitor and manage big data applications, Unravel's platform competes with software from vendors such as Cloudera, Dynatrace, New Relic, Pepperdata and others.
For its part, Pepperdata also introduced software targeted at big data cloud migrations.
According to John Armstrong, head of product marketing at Pepperdata, the company has begun working with analytics application migration specialist Cloudwick Inc. on services for moving big data to AWS and other clouds.
Pepperdata's goal is to establish baseline views of on-premises performance, to map workloads to cloud architecture and to assess performance in order to meet SLAs.
Armstrong said Pepperdata's platform profiles CPU and memory requirements as they exist on premises, then maps them to the configuration options available in the cloud, so users can make informed decisions about which applications are the most immediate fits for migration.
On-premises configurations are mapped to cloud computing instances, which cloud providers offer in a wide -- and sometimes dizzying -- variety of options. Armstrong said Pepperdata software guides users through decisions based on whether they wish to opt for general-purpose, CPU-optimized or memory-optimized options.
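The kind of mapping Armstrong describes can be pictured as matching a workload's memory-to-CPU ratio against instance families. The sketch below is illustrative only -- the thresholds, workload profiles and instance names are assumptions, not Pepperdata's actual rules:

```python
def suggest_instance_family(vcpus: int, memory_gb: float) -> str:
    """Pick a cloud instance family from an on-premises profile.

    Thresholds are illustrative: under roughly 2 GB/vCPU suggests
    CPU-optimized, over 8 GB/vCPU suggests memory-optimized.
    """
    ratio = memory_gb / vcpus
    if ratio < 2:
        return "compute-optimized (e.g., AWS c5)"
    if ratio > 8:
        return "memory-optimized (e.g., AWS r5)"
    return "general-purpose (e.g., AWS m5)"

# Hypothetical on-premises workload profiles measured at peak.
workloads = {
    "etl-nightly": (16, 30.0),      # vCPUs, GB of memory
    "ml-training": (32, 128.0),
    "hive-reporting": (8, 96.0),
}

for name, (cpu, mem) in workloads.items():
    print(f"{name}: {suggest_instance_family(cpu, mem)}")
```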