Definition

What is big data analytics?

Big data analytics is the process of examining big data to uncover information -- such as hidden patterns, correlations, market trends and customer preferences -- that can help organizations make informed business decisions.

On a broad scale, data analytics technologies and techniques enable organizations to analyze data sets and gather new information. Big data analytics is a form of advanced analytics that involves more complex methods that include elements such as predictive models, statistical algorithms and what-if analysis powered by analytics systems.

An example of big data analytics can be found in the healthcare industry, where millions of patient records, medical claims, clinical results, care management records and other data must be collected, aggregated, processed and analyzed. Traditional data analysis methods can't support this level of complexity at scale, leading to the need for big data analytics systems.

Big data analytics is also commonly used for accounting, decision-making, predictive analytics and many other purposes. The data found in big data analytics varies greatly in type, quality and accessibility, presenting significant challenges but also offering tremendous benefits.

Why is big data analytics important?

Organizations can use big data analytics systems and software to make data-driven decisions to improve business outcomes. The benefits can include more effective marketing, new revenue opportunities, customer personalization and improved operational efficiency. With an effective strategy, these benefits can provide advantages over competitors.

How does big data analytics work?

Data analysts, data scientists, predictive modelers, statisticians and other analytics professionals collect, process, clean and analyze growing volumes of structured transaction data and other forms of data not used by conventional business intelligence (BI) and analytics programs.

The following is an overview of the four steps of the big data analytics process:

1. Data professionals collect data from a variety of different sources. Often, it's a mix of semistructured and unstructured data. While each organization uses different data streams, some common sources include the following:

  • Internet clickstream data.
  • Web server logs.
  • Cloud applications.
  • Mobile applications.
  • Social media content.
  • Text from customer emails and survey responses.
  • Mobile phone records.
  • Machine data captured by sensors connected to the internet of things (IoT).

2. Data is prepared and processed. After data is collected and stored in a data warehouse or data lake, data professionals must organize, configure and partition the data properly for analytical queries. Thorough data preparation and processing result in higher performance from analytical queries. Sometimes this processing is batch processing, where large data sets are analyzed over time; other times, it takes the form of stream processing, where small data sets are analyzed in near real time, which can increase the speed of analysis.

3. Data is cleansed to improve its quality. Data professionals scrub the data using scripting tools or data quality software. They look for any errors or inconsistencies, such as duplications or formatting mistakes, and organize and tidy the data.

4. The collected, processed and cleaned data is analyzed using analytics software. This includes using tools for the following:

  • Data mining, which sifts through data sets in search of patterns and relationships.
  • Predictive analytics, which builds models to forecast customer behavior and other future actions, scenarios and trends.
  • Machine learning, which taps various algorithms to analyze large data sets.
  • Deep learning, which is a more advanced offshoot of machine learning.
  • Text mining and statistical analysis software.
  • Artificial intelligence.
  • Mainstream BI software.
  • Data visualization tools.
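
The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a production pipeline: the clickstream sample, the column names and the function names (collect, prepare, cleanse, analyze) are all hypothetical, and an in-memory CSV stands in for a data lake or data warehouse.

```python
import csv
import io
from collections import Counter

# Hypothetical clickstream export containing a duplicate row and a formatting error.
RAW_CSV = """user_id,page,seconds_on_page
u1,/home,12
u2,/pricing,40
u2,/pricing,40
u3, /Pricing ,n/a
u1,/checkout,95
"""


def collect(raw_text):
    """Step 1: read records from a source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def prepare(records):
    """Step 2: organize records into one consistent shape for querying."""
    return [
        {
            "user_id": r["user_id"].strip(),
            "page": r["page"].strip().lower(),
            "seconds": r["seconds_on_page"].strip(),
        }
        for r in records
    ]


def cleanse(records):
    """Step 3: drop exact duplicates and rows with formatting mistakes."""
    seen = set()
    clean = []
    for r in records:
        if not r["seconds"].isdigit():
            continue  # formatting mistake: duration is not a whole number
        key = (r["user_id"], r["page"], r["seconds"])
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        clean.append({**r, "seconds": int(r["seconds"])})
    return clean


def analyze(records):
    """Step 4: a simple pattern query, counting views per page."""
    return Counter(r["page"] for r in records)


clean_records = cleanse(prepare(collect(RAW_CSV)))
page_views = analyze(clean_records)
print(page_views.most_common())
```

In this sample, the cleansing step discards the duplicated u2 row and the u3 row whose duration is not a number, so only three records reach the analysis step. Real systems typically delegate each step to dedicated tools, but the order of operations is the same.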

The 5V's of big data analytics

The 5V's are a set of characteristics of big data that defines the opportunities and challenges of big data analytics. These include the following:

  1. Volume. This refers to the massive amounts of data generated from different sources. For example, this can consist of data from IoT devices, sensors, transaction logs and social media.
  2. Velocity. Velocity refers to the speed at which this data is generated and how fast it's processed and analyzed. If data is needed quickly, real-time or near-real-time data processing might be needed.
  3. Variety. This refers to the data types, including structured, semistructured and unstructured data. It also refers to the data's format, such as text, videos or images. The variety in data means that organizations must have a flexible data management system to handle, integrate and analyze different data types.
  4. Veracity. Veracity refers to the accuracy and quality of data. The data must be reliable and should contain minimal noise or anomalies. This is why tools that can clean, validate and verify data are important.
  5. Value. Value refers to the overall worth that big data analytics should provide. Large data sets should be processed and analyzed to provide real-world meaningful insights that can positively affect an organization's decisions.

There's also sometimes a sixth V added to the list, variability. This V refers to the sometimes inconsistent flow of data, where the data's meaning or structure can change rapidly. This typically makes the data more difficult to analyze.

Types of big data analytics

There are several types of big data analytics, each with its own application within the enterprise.

  • Descriptive analytics. This is the simplest form of analytics, where data is analyzed for general assessment and summarization. For example, an organization can use such data in sales reporting to analyze marketing efficiency.
  • Diagnostic analytics. This refers to analytics that determines why a problem occurred. For example, this could include gathering and studying competitor pricing data to determine whether a product's sales fell off because a competitor undercut it with a price drop.
  • Predictive analytics. This refers to analysis that predicts what comes next. For example, this could include monitoring the performance of machines in a factory and comparing that data to historical data to determine when a machine is likely to break down or require maintenance or replacement.
  • Prescriptive analytics. This form of analysis follows diagnostics and predictions. After identifying an issue, it recommends what can be done about it. For example, this could include addressing supply chain inconsistencies that are causing pricing problems by identifying suppliers whose performance is unreliable and suggesting their replacement.
  • Real-time analytics. This refers to the processing and analyzing of data as it's generated. Real-time analytics is useful in settings where large amounts of data are generated and quick decisions need to be made based on that data. For example, this would be useful in fraud detection systems.
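
To make the distinction between these types concrete, the following Python sketch applies descriptive, predictive and prescriptive steps to the factory machine example. Everything in it is an assumption made for illustration: the vibration readings, the failure threshold, the seven-day lead time and the function names are hypothetical, and a straight-line trend stands in for whatever predictive model an organization would actually use.

```python
from statistics import mean

# Hypothetical daily vibration readings (mm/s) for one factory machine.
readings = [2.0, 2.2, 2.4, 2.6, 2.8, 3.0]
FAILURE_THRESHOLD = 4.0  # assumed level at which maintenance is required


def describe(values):
    """Descriptive analytics: summarize what has happened so far."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_trend(values):
    """Predictive analytics: least-squares line through (day, reading) pairs."""
    days = range(len(values))
    x_bar, y_bar = mean(days), mean(values)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(days, values))
    variance = sum((x - x_bar) ** 2 for x in days)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


def days_until_threshold(values, threshold):
    """Estimate days after the last reading until the trend reaches the threshold."""
    slope, intercept = fit_trend(values)
    if slope <= 0:
        return None  # readings are not rising, so no crossing is predicted
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - (len(values) - 1))


def recommend(days_left, lead_time_days=7):
    """Prescriptive analytics: turn the prediction into a suggested action."""
    if days_left is None:
        return "no action"
    if days_left <= lead_time_days:
        return "schedule maintenance now"
    return "keep monitoring"


summary = describe(readings)
days_left = days_until_threshold(readings, FAILURE_THRESHOLD)
print(summary, days_left, recommend(days_left))
```

With these sample readings, which rise by 0.2 mm/s per day, the fitted trend reaches the assumed threshold roughly five days after the last reading, so the recommendation is to schedule maintenance. Diagnostic analytics would sit between the descriptive and predictive steps, asking why the readings are rising in the first place.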

What's the difference between big data and traditional data?

Several characteristics distinguish big data analytics from traditional data analytics. Some of the largest differences are in scale, the types of data handled and how the data is managed.

Traditional data analytics typically deals with structured data measured in gigabytes and terabytes. Because of its limited size, the data can be stored in a database on a limited number of servers. It is typically managed using a conventional relational database system that is queried with Structured Query Language (SQL).

Big data analytics, on the other hand, typically deals with a mix of structured, semistructured and unstructured data formats measured at the petabyte level and above. The data is commonly managed in a distributed computing system that spreads large data volumes across multiple servers, or in cloud storage. Big data analytics also relies on more advanced tools with machine learning and data mining features to analyze data in or near real time.

A chart showing the features big data tools offer.
Big data tools analyze, manage and store large volumes of data that are typically too big for traditional databases.

Key big data analytics technologies and tools

The following tools and technologies are used to support big data analytics processes:

  • Hadoop is an open source framework for storing and processing big data sets. Hadoop can handle large amounts of structured and unstructured data.
  • Predictive analytics hardware and software process large amounts of complex data and use machine learning and statistical algorithms to predict future event outcomes. Organizations use predictive analytics tools for fraud detection, marketing, risk assessment and operations.
  • Stream analytics tools filter, aggregate and analyze big data that might be stored in different formats or platforms.
  • Distributed storage replicates data, generally on a nonrelational database. Replication can serve as a measure against independent node failures and lost or corrupted big data, or it can provide low-latency access.
  • NoSQL databases are nonrelational data management systems that are useful when working with large sets of distributed data. NoSQL databases don't require a fixed schema, which makes them ideal for raw and unstructured data.
  • A data lake is a large data storage repository that holds native-format raw data until it's needed. Data lakes use a flat architecture.
  • A data warehouse is a repository that stores large amounts of data collected by different sources. Data warehouses typically store data using predefined schemas.
  • Knowledge discovery and big data mining tools help businesses mine large amounts of structured and unstructured big data.
  • In-memory data fabric distributes large amounts of data across system memory resources. This helps provide low latency for data access and processing.
  • Data virtualization enables access to data without requiring it to be moved or copied, regardless of its format or where it is stored.
  • Data integration software enables big data to be streamlined across different platforms, including Apache Hadoop, MongoDB and Amazon Elastic MapReduce.
  • Data quality software cleanses and enriches large data sets.
  • Data preprocessing software prepares data for further analysis. Data is formatted and unstructured data is cleansed.
  • Apache Spark is an open source cluster computing framework used for batch and stream data processing.
  • End-to-end analytics platforms such as Microsoft Power BI and Tableau provide big data analytics capabilities to the desktop and deliver insights through dashboards, with full suites of tools for analysis and reporting.
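
The contrast in the list above between predefined schemas and schema-free storage can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: SQLite from the standard library stands in for a relational database or warehouse table, a list of JSON strings stands in for a NoSQL document store or data lake, and the orders table and its fields are hypothetical.

```python
import json
import sqlite3

# Predefined schema: every row must match the declared columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, "
    "customer TEXT NOT NULL, "
    "total REAL NOT NULL)"
)
conn.execute("INSERT INTO orders VALUES (1, 'acme', 120.5)")

# A row missing a required column is rejected when it is written.
try:
    conn.execute("INSERT INTO orders (order_id, customer) VALUES (2, 'globex')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

sql_total = conn.execute("SELECT SUM(total) FROM orders").fetchone()[0]

# No fixed schema: raw documents are stored as-is, whatever fields they carry.
raw_documents = [
    '{"order_id": 1, "customer": "acme", "total": 120.5}',
    '{"order_id": 2, "customer": "globex"}',
    '{"order_id": 3, "customer": "initech", "total": 80.0, "coupon": "SPRING"}',
]

# Structure is interpreted only when the data is read for analysis.
documents = [json.loads(d) for d in raw_documents]
total_revenue = sum(doc.get("total", 0.0) for doc in documents)

print(rejected, sql_total, total_revenue)
```

The relational table refuses the incomplete order at write time, while the document list accepts it and leaves the handling of the missing field to the query. That trade-off is why schema-free stores suit raw and unstructured data, and why they tend to shift data quality work to the analysis stage.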

Big data analytics applications often include data from both internal systems and external sources, such as weather data or demographic data on consumers compiled by third-party information service providers. In addition, streaming analytics applications are becoming more common in big data environments as users perform real-time analytics on data fed into Hadoop systems through stream processing engines, such as Spark, Flink and Storm.
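
The difference between analyzing a stored data set in one pass and analyzing events as they arrive can be sketched in plain Python. The example below does not use Spark, Flink or Storm; the SlidingWindowAverage class, the batch_average function and the event values are hypothetical stand-ins that only illustrate the idea of updating a result incrementally as each event is fed in.

```python
from collections import deque


def batch_average(values):
    """Batch style: wait for the full data set, then compute once."""
    return sum(values) / len(values)


class SlidingWindowAverage:
    """Stream style: keep a running average over the most recent events."""

    def __init__(self, size):
        self.window = deque(maxlen=size)
        self.total = 0.0

    def update(self, value):
        # Subtract the value that is about to fall out of a full window.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)


events = [10.0, 20.0, 30.0, 40.0]

stream = SlidingWindowAverage(size=2)
rolling = [stream.update(v) for v in events]  # one result per incoming event

print(batch_average(events))
print(rolling)
```

The batch function produces a single answer after all events are available, while the windowed version emits an updated answer for every event and only ever holds the last two values in memory. Stream processing engines apply the same principle at much larger scale and across many machines.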

Early big data systems were mostly deployed on premises, particularly in large organizations that collected, organized and analyzed massive amounts of data. But cloud platform vendors, such as Amazon Web Services (AWS), Google and Microsoft, have made it easier to set up and manage Hadoop clusters in the cloud. The same goes for Hadoop suppliers such as Cloudera, which supports the distribution of the big data framework on AWS, Google and Microsoft Azure clouds. Users can spin up clusters in the cloud, run them for as long as they need and then take them offline with usage-based pricing that doesn't require ongoing software licenses.

Big data has become increasingly beneficial in supply chain analytics. Big supply chain analytics uses big data and quantitative methods to enhance decision-making processes across the supply chain. Specifically, big supply chain analytics expands data sets for increased analysis that goes beyond the traditional internal data found on enterprise resource planning and supply chain management systems. Also, big supply chain analytics implements highly effective statistical methods on new and existing data sources.

A diagram showing uses for big data.
Big data can be used for various purposes, depending on an organization's mission and goals.

Big data analytics uses and examples

The following are some examples of how big data analytics can be used to help organizations:

  • Customer acquisition and retention. Consumer data can inform companies' marketing efforts, helping them act on trends to increase customer satisfaction. For example, personalization engines for Amazon, Netflix and Spotify can improve customer experiences and create customer loyalty.
  • Healthcare. The healthcare field requires the storage, management and analysis of millions of patient records, insurance plans, prescriptions and related data, which is typically a combination of structured and unstructured data.
  • Targeted ads. Personalization data from sources such as past purchases, interaction patterns and product page viewing histories can help generate compelling, targeted ad campaigns for users on the individual level and on a larger scale.
  • Product development. Big data analytics can provide insights that inform organizations about product viability, development decisions and progress measurement, and that steer improvements to best fit customer needs.
  • Price optimization. Retailers can opt for pricing models that use and model data from a variety of data sources to maximize revenues.
  • Supply chain and channel analytics. Predictive analytical models can help with preemptive replenishment, business-to-business supplier networks, inventory management, route optimizations and notifying customers of potential delivery delays.
  • Risk management. Big data analytics can identify new risks from data patterns for effective risk management strategies.
  • Improved decision-making. Insights business users extract from relevant data can help organizations make quicker and better decisions.

Big data analytics benefits

The benefits of using big data analytics include the following:

  • Better customer engagement. A better understanding of customer needs, behavior and sentiment can lead to better marketing insights and provide information for product development.
  • Cost savings. Organizations can save money from new business process efficiencies and optimizations.
  • Improved decision-making. Effective strategizing can benefit and improve the supply chain, operations and other areas of strategic decision-making.
  • Increased market insight. Tracking purchasing behavior at scale and conducting predictive analytics can increase an organization's awareness of its market.
  • Optimized risk management strategies. Big data analytics improves risk management strategies by enabling organizations to address threats in real time.
  • Real-time analytics. Organizations can quickly analyze large amounts of real-time data from different sources, in many different formats and types.

Big data analytics challenges

Despite the wide-reaching benefits that come with using big data analytics, its use also comes with the following challenges:

  • Data accessibility. With larger amounts of data, storage and processing become more complicated. Big data should be stored and maintained properly to ensure it can be used by less experienced data scientists and analysts.
  • Data quality maintenance. With high volumes of data coming in from various sources and in different formats, data quality management for big data requires significant time, effort and resources to properly maintain it.
  • Data security. The complexity of big data systems presents unique security challenges. Properly addressing security concerns within such a complicated big data ecosystem can be a complex undertaking.
  • Choosing the right tools. Selecting from the vast array of big data analytics tools and platforms available on the market can be confusing, so organizations must know how to pick the best tool that aligns with users' needs and infrastructure.
  • Talent shortages. With a potential lack of internal analytics skills and the high cost of hiring experienced data scientists and engineers, some organizations are finding it hard to fill the talent gaps.

History and growth of big data analytics

The term big data was first used to refer to increasing data volumes in the mid-1990s. In 2001, Doug Laney, then an analyst at consultancy Meta Group Inc., expanded the definition of big data. This expansion described the increase of three of the five V's -- volume, velocity and variety. Those three factors became known as the 3V's of big data. Gartner popularized this concept in 2005 after acquiring Meta Group and hiring Laney. Over time, the 3V's became the 5V's.

Another significant development in the history of big data was the launch of the Hadoop distributed processing framework. Hadoop was launched in 2006 as an Apache open source project. This planted the seeds for a clustered platform built on top of commodity hardware that could run big data applications. The Hadoop framework of software tools is widely used for managing big data.

By 2011, big data analytics began to take a firm hold in organizations and the public eye, along with Hadoop and various related big data technologies.

Initially, as the Hadoop ecosystem took shape and started to mature, big data applications were primarily used by large internet and e-commerce companies such as Yahoo, Google and Meta, as well as analytics and marketing services providers.

More recently, many users have embraced big data analytics as a key technology driving digital transformation. Users include retailers, financial services firms, insurers, healthcare organizations, manufacturers, energy companies and other enterprises. A number of other trends have also started to appear, such as pairing generative AI with big data analytics.


This was last updated in December 2024
