Elastic Stack (ELK Stack)
What is the Elastic Stack?
The Elastic Stack is a group of open source products from Elastic designed to help users take data from any type of source and in any format, and search, analyze and visualize that data in real time. The product group was formerly known as the ELK Stack for the core products in the group -- Elasticsearch, Logstash and Kibana -- but has been rebranded as the Elastic Stack. A fourth product, Beats, was subsequently added to the stack. The Elastic Stack can be deployed on premises or made available as software as a service (SaaS). Elasticsearch supports Amazon Web Services (AWS), Google Cloud Platform and Microsoft Azure.
What are the core products of the Elastic Stack?
The company Elastic was founded in Amsterdam in 2012 to support the development of Elasticsearch and related commercial products and services.
The following are the core products of the Elastic Stack along with their functionalities:
- Elasticsearch is a RESTful distributed search engine built on top of Apache Lucene and originally released under the Apache 2.0 license. It is Java-based and can ingest, index and search documents in diverse formats.
- Logstash is a data collection engine that unifies data from multiple sources, normalizes it and distributes it to one or more destinations. The product was originally optimized for log data, but its scope has expanded to take data from all sources.
- Kibana is an open source data visualization and exploration tool that is specialized for large volumes of streaming and real-time data. The software makes complex data streams more easily and quickly understandable through graphic representation.
- Beats are data shippers that are installed on servers as agents used to send different types of operational data to Elasticsearch either directly or through Logstash, where the data might be enhanced or archived.
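To make the "RESTful" part of the list above concrete, here is a minimal sketch of the two most basic HTTP calls: storing a JSON document and running a full-text match query. It uses only the Python standard library and builds the requests without sending them. The base URL (an unsecured local node on port 9200), the index name app-logs and the field names are assumptions chosen for illustration.

```python
import json
from urllib.request import Request

# Assumption: a local, unsecured development node on the default HTTP port.
BASE_URL = "http://localhost:9200"


def build_index_request(index: str, doc_id: str, document: dict) -> Request:
    """Build a PUT request that stores one JSON document in an index."""
    return Request(
        url=f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_search_request(index: str, field: str, text: str) -> Request:
    """Build a POST request that runs a full-text match query on one field."""
    query = {"query": {"match": {field: text}}}
    return Request(
        url=f"{BASE_URL}/{index}/_search",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical index and document, for illustration only.
    req = build_index_request("app-logs", "1", {"level": "error", "message": "disk full"})
    print(req.get_method(), req.full_url)
    # Sending it requires a running node: urllib.request.urlopen(req)
```

In practice, one of the official language clients would normally replace hand-built requests like these; the sketch only shows what travels over the wire.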
Who uses the Elastic Stack and why?
The Elastic Stack presents a steeper learning curve than some comparable products, as well as more setup, owing in part to its open source nature.
The following are popular use cases of the Elastic Stack:
- Big data. Companies with large volumes of unstructured, semistructured and structured data sets can use the Elastic Stack to run their data operations. Netflix, Facebook and LinkedIn are examples of successful companies using the stack.
- Applications with complex search requirements. Any application with complex search requirements can benefit greatly by using the Elastic Stack as the underlying engine for advanced searches.
- Other prominent use cases. The Elastic Stack is used in infrastructure metrics and container monitoring, logging and log analytics, application performance monitoring, geospatial data analysis and visualization, security and business analytics, and scraping and combining public data.
The following are important reasons organizations might consider integrating the Elastic Stack into their daily operations:
- Free and open source. One of the biggest advantages of using the ELK Stack is that it is open source and free to use. Companies don't have to pay any upfront or ongoing software licensing fees to take advantage of the stack.
- Centralized logging. The ELK Stack offers centralized logging capabilities to aggregate server logs from complex cloud environments into a single searchable index. This helps with security monitoring and root cause analysis, as data can be correlated from multiple sources.
- Multiple hosting options. The ELK Stack provides a range of hosting options. An organization with the right resources can choose to install it on a local server and manage it in house. Companies can also choose to deploy the ELK Stack as a managed service, either with products such as Amazon OpenSearch Service or by partnering with a specialist managed service provider.
- Real-time data visualization and analysis. The Elastic Stack's Kibana enables users to interpret and understand complex structures in real time by converting data into visual representations, such as graphs and histograms.
- Scalability. The ELK Stack deploys at scale and works across all types of infrastructures, including SaaS, containers or bare metal, private cloud and public cloud. For example, Elasticsearch is a distributed document store that saves complex data as JavaScript Object Notation documents. This makes it easy to scale and implement in any large organization.
- Clients in multiple programming languages. Companies can query Elasticsearch from whichever languages their codebase already uses. Elastic has released official clients for at least 12 programming languages, including JavaScript, Python, .NET and Perl. Elastic also maintains these clients with support and bug fixes and provides support for queries.
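The centralized logging point above can be sketched in code. Elasticsearch accepts many documents in one call through its _bulk endpoint, whose body is newline-delimited JSON: an action line naming the target index, followed by the document itself. The helper below builds such a body for log events from several servers. The index name and the event fields (host, message) are hypothetical.

```python
import json
from typing import Iterable, Mapping


def to_bulk_body(index: str, documents: Iterable[Mapping]) -> str:
    """Serialize documents into a newline-delimited body for the _bulk endpoint.

    Each document is preceded by an action line that names the target index.
    The body must end with a newline character.
    """
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(dict(doc)))
    if not lines:
        return ""  # nothing to send
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical events from two servers, written to one searchable index.
    events = [
        {"host": "web-1", "message": "GET /health 200"},
        {"host": "db-1", "message": "slow query detected"},
    ]
    print(to_bulk_body("central-logs", events), end="")
```

The resulting string would be sent as the body of a POST to the _bulk endpoint with the Content-Type header set to application/x-ndjson.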
How to use the Elastic Stack
To use the Elastic Stack, users should first download the three open source software products -- Elasticsearch, Logstash and Kibana -- from their respective links on the Elastic website. After the files are unzipped, users can set up these programs on their local system.
Once installed, the components can be deployed together to aggregate, transform, index and search log data and to produce data visualizations.
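The transform step is the least intuitive part of that pipeline, so here is a small Python stand-in for it. This is not Logstash configuration. It only illustrates the idea of turning a raw log line into a structured document that can then be indexed and searched by field. The line format and field names are assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical line format: "<ISO timestamp> <LEVEL> <service> <message>"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<service>\S+)\s+(?P<message>.*)$"
)


def parse_log_line(line: str) -> Optional[dict]:
    """Turn one raw log line into a structured document, or None if it does not match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    doc = match.groupdict()
    doc["level"] = doc["level"].lower()  # normalize casing so searches are consistent
    return doc


if __name__ == "__main__":
    raw = "2024-05-01T12:00:00Z ERROR payment-service Timeout contacting gateway"
    print(parse_log_line(raw))
```

Once a line has been split into fields like these, a query can filter on level or service instead of scanning free text.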
How are successful organizations using the Elastic Stack?
The following are real-world examples of a few successful organizations and how they use the stack:
- Netflix. A popular movie and content streaming service, Netflix greatly depends on the Elastic Stack to monitor and analyze customer service operations and security-related logs. The company also uses ELK for its automatic replication, flexible schema and numerous plugins.
- LinkedIn. This popular social networking platform for professionals uses the Elastic Stack with Apache Kafka to monitor performance and security and to ingest and process its data streams in real time. LinkedIn's ELK operations include more than 100 clusters across more than 20 teams and six data centers.
- SoundCloud. An online audio streaming and distribution platform, SoundCloud uses Elasticsearch for its real-time search and analytics engine that serves millions of users worldwide.
- Lyft. This ride-sharing app that connects passengers with drivers successfully incorporates Elasticsearch for its operational log analytics.
- GitHub. The world's largest repository for developers to store and manage their code, GitHub uses Elasticsearch to index more than 8 million code repositories as well as critical event data sources.
- Tripwire. Focusing on security and compliance automation, Tripwire uses the Elastic Stack to perform information packet log analysis.
Elastic Stack challenges and fixes
While the ELK Stack offers substantial benefits to organizations, issues and challenges sometimes crop up. The following are a few known challenges with the Elastic Stack and some fixes:
Limited storage capacity. Vast amounts of data can be generated if an ELK Stack is deployed in a multisystem and application environment. If a company does not filter, analyze and discard the noncritical logs efficiently, storage space and costs can spiral out of control. The issue commonly occurs in on-premises ELK Stack deployments where a large number of log files might be stored on traditional disk storage, leaving insufficient storage capacity for the ELK outputs. This is also true for mission-critical log files that would first need to be backed up and then stored separately in an isolated environment, further reducing the storage capacity.
- Fixes. Cloud-based storage is a good option, as it offers more flexibility in handling log files along with the scalability of on-demand resources. Cloud-based storage can also be cheaper than traditional disk storage, but companies typically still need in-house experts to manage the underlying cloud infrastructure. For instance, if a company decides to go with the AWS web-based cloud storage service Amazon Simple Storage Service, it can assess and select from the various Amazon S3 storage classes based on performance needs and storage requirements.
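A back-of-the-envelope calculation helps show how quickly log storage grows and how much filtering noncritical logs can save. The function below is a rough sketch under simplifying assumptions: on-disk size equals raw size (compression and index overhead are ignored), and every replica is a full copy of the data. All figures are illustrative.

```python
def estimate_storage_gb(
    daily_ingest_gb: float,
    retention_days: int,
    replicas: int = 1,
    discarded_fraction: float = 0.0,
) -> float:
    """Rough disk estimate: kept daily volume x retention days x number of copies."""
    if not 0.0 <= discarded_fraction <= 1.0:
        raise ValueError("discarded_fraction must be between 0 and 1")
    if retention_days < 0 or replicas < 0 or daily_ingest_gb < 0:
        raise ValueError("inputs must not be negative")
    kept_per_day = daily_ingest_gb * (1.0 - discarded_fraction)
    copies = 1 + replicas  # one primary copy plus each replica copy
    return kept_per_day * retention_days * copies


if __name__ == "__main__":
    # Illustrative figures only.
    print(estimate_storage_gb(50, 30))
    print(estimate_storage_gb(50, 30, discarded_fraction=0.4))
```

With these illustrative figures, 50 GB per day kept for 30 days with one replica comes to roughly 3,000 GB. Discarding 40% of the volume before it is indexed brings that down to roughly 1,800 GB.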
Poor indexing. The data indexed in Elasticsearch and the ELK Stack is stored in one or more indices. These indices are responsible for both data distribution and separation, but sometimes this can cause complications. Because the components of the ELK Stack are interconnected, upgrading one of them can affect how the others write to or read from existing indices. A known example is upgrading to Beats 7.x, which renders all indices created by earlier versions of Beats incompatible with Kibana and can cause other performance issues.
- Fixes. To avoid the problem, Elastic recommends fully upgrading Elasticsearch and Kibana to version 7.0 before upgrading Beats. To keep the ELK Stack healthy, options such as adding multiple shards, configuring throttling and increasing the indexing buffer size should be considered.
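As an illustration of the "adding multiple shards" option, the sketch below builds the JSON body for a create-index request with explicit shard and replica counts. The numbers are placeholders, not recommendations. Throttling and buffer tuning are not shown here.

```python
import json


def build_create_index_body(primary_shards: int, replicas: int = 1) -> str:
    """Build the JSON body for creating an index with explicit shard settings.

    The number of primary shards is fixed when the index is created,
    so it has to be chosen up front. The replica count can be changed later.
    """
    if primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("replicas must not be negative")
    body = {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    # Placeholder values; size shards for the actual data volume.
    print(build_create_index_body(primary_shards=3, replicas=1))
```

The output would be sent as the body of a PUT request to the name of the new index.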
Networking problems. Specific networking rules are applied to an ELK Stack, and any networking issue can affect the entire stack. For example, if Logstash is hosted on the ELK server, the client servers could disconnect or time out.
- Fixes. To avoid networking issues with the stack, proper routing rules should be implemented. If the network problems are still occurring after the evaluation of the routing rules, then the firewall rules and port configurations should be inspected.
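A quick first check when inspecting firewall rules and port configurations is whether each service port accepts TCP connections from the client machine. The sketch below does that with the Python standard library. The port numbers are the commonly used defaults (9200 for Elasticsearch, 5601 for Kibana, and 5044 as the conventional Beats input port on Logstash); the host name in the usage example is hypothetical, and a given deployment may use different ports.

```python
import socket

# Commonly used ports. Assumption: the deployment has not changed them.
DEFAULT_PORTS = {
    "elasticsearch": 9200,
    "logstash-beats-input": 5044,
    "kibana": 5601,
}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name resolution failed
        return False


if __name__ == "__main__":
    host = "elk.example.internal"  # hypothetical host name
    for service, port in DEFAULT_PORTS.items():
        state = "reachable" if is_port_open(host, port) else "NOT reachable"
        print(f"{service:22} {host}:{port} {state}")
```

A port that is not reachable points to a firewall, routing or binding problem. A reachable port only shows that something is listening, not that the service behind it is healthy.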
Noisy logs. Applications can produce millions of low-priority log entries that clutter the index. If not managed correctly, these logs force ELK Stack users to wade through irrelevant data in their queries. This can affect productivity, increasing the time required to track down a bug or gain new business insights.
- Fixes. Organizations can use a machine learning-powered logging product, such as Splunk or Coralogix, to help them avoid the adverse effects of these logs.
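The products named above rely on machine learning. A much simpler, rule-based stand-in for the same idea is to drop low-severity events before they are shipped, as in the sketch below. The severity ranking and the level field name are assumptions for illustration. Events with an unknown or missing level are kept so that nothing unexpected is silently lost.

```python
from typing import Iterable, List, Mapping

# Assumed severity ranking; adjust to the levels the applications actually emit.
SEVERITY = {"debug": 10, "info": 20, "warning": 30, "error": 40, "critical": 50}


def drop_noisy(events: Iterable[Mapping], min_level: str = "warning") -> List[Mapping]:
    """Keep events at or above min_level, plus events whose level is unknown."""
    threshold = SEVERITY[min_level]
    kept = []
    for event in events:
        level = str(event.get("level", "")).lower()
        # Unknown or missing levels default to the threshold, so they are kept.
        if SEVERITY.get(level, threshold) >= threshold:
            kept.append(event)
    return kept


if __name__ == "__main__":
    sample = [
        {"level": "DEBUG", "message": "cache hit"},
        {"level": "error", "message": "payment failed"},
        {"message": "no level field"},
    ]
    print(drop_noisy(sample))
```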
Besides providing complete end-to-end log management, the Elastic Stack is also useful for detecting security loopholes.