BigQuery vs. Redshift debate centers around cost, management

Google BigQuery and Amazon Redshift remain two popular options for a data warehouse in the cloud. But, to choose between them, enterprises need to factor in some distinct differences.

While the public cloud market, in general, houses a lot of competition around big data, two services, in particular, are pitted against each other more than others: Amazon Redshift and Google BigQuery.

These data warehouse services share broad similarities that appeal to big data users, but their differences, particularly around cost and management, require careful consideration before adoption. Enterprises must also assess whether their IT staff can handle the infrastructure and resource requirements needed to support these big data services.

Cost, management differences

The BigQuery vs. Redshift debate isn't exactly new, according to Mike Leone, senior analyst at Enterprise Strategy Group Inc. While Redshift remains ahead in terms of user adoption, BigQuery is catching up. Cost and management have been two major factors in Google's ability to close the gap.

This is likely because, while Redshift and BigQuery both fall under the broad platform-as-a-service umbrella, BigQuery's serverless architecture takes the infrastructure guesswork out of the equation, Leone said. Administrators don't have to worry about tasks such as instance type selection or provisioning additional capacity to support usage bursts, as BigQuery abstracts away these resource management tasks. This can also lead to potential cost savings, as it enables IT to focus on other projects that drive more business value.

"The simplicity factor that BigQuery provides is a twofold advantage -- it's perceived as being easier for both IT admins and end users to use the platform, [and] it also provides the potential for higher opex savings," Leone said.

In contrast, Redshift requires skilled and trained IT administrators to manage the underlying infrastructure, understand resource requirements, maintain those resources and respond to end users when something goes wrong.

Redshift does, however, provide more granular control of the underlying infrastructure, which lets IT teams make configurations based on workload requirements, according to Leone. "IT administrators with the skill set to understand all aspects of the data warehouse and workload gain the flexibility to optimize the infrastructure to best meet performance requirements," he said.

This means that, with optimal configuration, Redshift can outperform BigQuery in some use cases, although these performance differences are usually minor. "We're not talking about Redshift taking minutes and BigQuery taking hours to complete queries; it's mostly a seconds-to-minutes comparison," Leone said.

IT shops that have traditionally managed, and prefer to manage, their own CPU and storage resources for a data warehouse will likely gravitate toward Redshift, echoed Michael Fauscette, chief research officer at G2 Crowd, a company that compiles user reviews on a range of IT products.

What's more, Redshift's pricing often comes across as more predictable, Fauscette said. Redshift, for example, uses an hourly approach that's more consistent and attractive to larger companies. BigQuery, on the other hand, lets you switch back and forth between fixed pricing and pay-per-use. Ultimately, though, the cost of each service will vary, depending on how an enterprise uses it.

"My theory is that Redshift is probably a little more expensive, but it depends on specifics," Fauscette said.
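To make the "it depends on specifics" point concrete, the two pricing styles described above can be compared with some back-of-the-envelope arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the rates used in the example are placeholder assumptions, not published AWS or Google prices.

```python
# Rough comparison of an hourly (provisioned) pricing model with a
# pay-per-use (on-demand) model. All names are hypothetical, and any
# rates passed in are assumptions supplied by the caller.

HOURS_PER_MONTH = 730  # average number of hours in a month


def provisioned_monthly_cost(nodes, rate_per_node_hour, hours=HOURS_PER_MONTH):
    """Hourly model: pay for every node for every hour the cluster runs."""
    return nodes * rate_per_node_hour * hours


def on_demand_monthly_cost(tb_scanned, rate_per_tb):
    """Pay-per-use model: pay only for the data that queries scan."""
    return tb_scanned * rate_per_tb


def break_even_tb(nodes, rate_per_node_hour, rate_per_tb, hours=HOURS_PER_MONTH):
    """Monthly TB scanned at which both models cost the same."""
    return provisioned_monthly_cost(nodes, rate_per_node_hour, hours) / rate_per_tb


if __name__ == "__main__":
    # Placeholder figures chosen only to make the arithmetic easy to follow.
    cluster = provisioned_monthly_cost(nodes=4, rate_per_node_hour=0.25)
    queries = on_demand_monthly_cost(tb_scanned=100, rate_per_tb=5.0)
    print(f"Provisioned cluster: ${cluster:,.2f} per month")
    print(f"Pay-per-use queries: ${queries:,.2f} per month")
    print(f"Break-even volume:   {break_even_tb(4, 0.25, 5.0):.0f} TB scanned")
```

Under these assumed figures, a shop that scans less than the break-even volume each month would pay less with pay-per-use, while a heavier query load would favor the fixed hourly cluster. Real comparisons would also need to account for storage, reserved or flat-rate discounts, and staff time.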

Other factors in the BigQuery vs. Redshift debate

In addition to cost and resource management, Redshift and BigQuery differ in terms of their data load processes.

In Redshift, you can copy data directly from S3, and also stream data with Amazon Kinesis. BigQuery, on the other hand, supports bulk loads from CSV or JSON files, with some limitations. BigQuery also supports streaming data, but that tends to drive up your costs, Fauscette said. What's more, AWS' dominance in the public cloud market, as a whole, might give the vendor a bit of a leg up in the Redshift vs. BigQuery race.

"Amazon has definitely built out their ecosystem faster and has been the leader," he said. For the many companies that already host some type of application or workload on AWS infrastructure, adopting Redshift as a data warehouse would be a natural move.
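For a sense of what the bulk load paths mentioned earlier look like in practice, the sketch below builds one load statement for each service as plain text. It is a minimal illustration under stated assumptions: the helper names, table names, bucket paths and IAM role ARN are all hypothetical, the inputs are not escaped or validated, and the statements cover only the most basic CSV case.

```python
# Builds example bulk-load SQL for each service as strings.
# Helper names, tables, buckets and the role ARN are hypothetical.
# Inputs are not escaped, so this is for illustration only.


def build_redshift_copy(table, s3_uri, iam_role_arn, ignore_header=1):
    """Return a Redshift COPY statement that loads CSV files from S3."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS CSV\n"
        f"IGNOREHEADER {ignore_header};"
    )


def build_bigquery_load(table, gcs_uris):
    """Return a BigQuery LOAD DATA statement that loads CSV files."""
    uri_list = ", ".join(f"'{uri}'" for uri in gcs_uris)
    return (
        f"LOAD DATA INTO {table}\n"
        f"FROM FILES (\n"
        f"  format = 'CSV',\n"
        f"  uris = [{uri_list}]\n"
        f");"
    )


if __name__ == "__main__":
    print(build_redshift_copy(
        "sales",
        "s3://example-bucket/sales/",
        "arn:aws:iam::123456789012:role/ExampleLoadRole",
    ))
    print()
    print(build_bigquery_load(
        "analytics.sales",
        ["gs://example-bucket/sales/*.csv"],
    ))
```

Either statement would still need to be run through the service's own client or console, with credentials and schemas set up beforehand. Streaming ingestion, which both services also support, follows a different path and is not shown here.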

A broader market

Of course, to choose the best data warehouse for their big data needs, enterprises shouldn't just focus on the BigQuery vs. Redshift debate, but assess other potential options as well.

After all, it isn't just a two-horse race, said Greg Schulz, senior advisory analyst at StorageIO. Microsoft, for example, also has its Azure SQL Data Warehouse. And, just as users who have already invested in AWS might flock to Redshift, enterprises with a big Azure footprint might do the same with SQL Data Warehouse.

"If you are a Microsoft shop, there is something to be said [about] looking at Google and AWS, but talking to Azure first," Schulz said.
