Manage Elasticsearch documents with indices and shards
Before they can put the search capabilities of Elasticsearch to use for their organization, admins need to properly organize and manage documents via indices, shards, replicas and mapping.
Elasticsearch is a NoSQL JSON document database that provides search functionality for diverse endeavors, such as IT systems management and monitoring or customer behavior analysis.
Elasticsearch provides data storage and retrieval capabilities and supports diverse search types. Organizations can use Elasticsearch to find specific files and relationships in information.
Before standing up Elasticsearch, understand the concept of indices for data and document organization; shards -- which are subdivisions of an index -- for workload distribution; and document mapping to define how documents are indexed and stored. Follow this tutorial to manage Elasticsearch documents.
Editor's note: Check out the author's companion articles to connect Elasticsearch nodes to a cluster, and dive deeper into the use of shards for workload distribution.
Indices and shards
An index is a collection of documents, and a shard is a subset of an index. Elasticsearch hashes a routing value, which by default is the document ID, to decide which shard holds each document, and it distributes the shards across the nodes in a cluster.
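The routing idea can be sketched in a few lines of Python. This is an illustration only: Elasticsearch uses its own Murmur3-based hash of the routing value, whereas the sketch below substitutes CRC32 from the standard library, and the function name pick_shard is hypothetical. The point is that the same ID always maps to the same shard, and that the result depends on the number of primary shards.

```python
import zlib


def pick_shard(routing_value: str, number_of_primary_shards: int) -> int:
    """Map a routing value (by default the document ID) to a shard number.

    Assumption: CRC32 stands in for Elasticsearch's internal hash function,
    so the shard numbers here will not match a real cluster.
    """
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % number_of_primary_shards


if __name__ == "__main__":
    # Spread ten hypothetical document IDs over five primary shards.
    for doc_id in map(str, range(1, 11)):
        print(doc_id, "->", "shard", pick_shard(doc_id, 5))
```

Because the shard number depends on the shard count, the number of primary shards is normally fixed when the index is created.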
Indices are named in a URL, such as http://localhost:9200/index. For example, a customer index URL could be http://localhost:9200/customers, and an index for internal employees could be http://localhost:9200/employees. To view a list of all indices in Elasticsearch, use curl -XGET http://localhost:9200/_cat/indices.
The index type is a logical partition to store different document types within a single index. While it is an obsolete concept in Elasticsearch, the tool still supports backward compatibility for it, so administrators managing Elasticsearch documents should be familiar with how it works. An index type is the field following the index marker in the curl path -- http://localhost:9200/index/type -- and, by convention, it has the value _doc.
To use this function, write a document to the URL http://localhost:9200/index/_doc. Any attempt to write data to a different type in the same index -- represented by xxx in the example below -- will result in an error:
illegal_argument_exception","reason":"Rejecting mapping update to [customers] as the final mapping would have more than 1 type: [_doc, xxx]"}]
Work with Elasticsearch documents
Each document in Elasticsearch has an _id field, which is known as the document ID. To set the document ID yourself, post to the URL http://localhost:9200/index/_doc/_id, where _id is the ID value. The _id is then stored with the document:
{
"_index" : "customers",
"_type" : "_doc",
"_id" : "1",
To create a document, use the POST command (-XPOST with curl) with the -d option, which supplies the request body, and declare the JSON content type in a header. The following command posts a document that catalogs customer Fred, who has ordered socks and shoes, to the customers index with document ID 1.
curl -XPOST --header 'Content-Type: application/json' http://localhost:9200/customers/_doc/1 -d '{
"customers": {
"name": "Fred",
"age": 42,
"orders": [{
"order number": 1,
"item": "socks"
},
{
"order number": 2,
"item": "shoes"
}
]
}
}'
Use the wildcard query q=* to list all documents in the index. Quote the URL so that the shell does not interpret the & and * characters:
curl -XGET 'http://localhost:9200/customers/_search?pretty=true&q=*'
This command will return the following results, since Fred is the only customer whose information is currently documented in the customers index:
{
"_index" : "customers",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"customers" : {
"name" : "Fred",
"age" : 42,
"orders" : [
{
"order number" : 1,
"item" : "socks"
},
{
"order number" : 2,
"item" : "shoes"
}
]
}
}
}
]
}
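Search results arrive as JSON with the matching documents under hits.hits, each carrying its _id and the original document in _source. A minimal Python sketch of reading that structure follows; the sample response is abbreviated from the output above, and the helper name summarize_hits is hypothetical.

```python
import json

# Abbreviated sample modeled on the search output shown in this article.
SAMPLE_RESPONSE = """
{
  "hits": {
    "hits": [
      {
        "_index": "customers",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "customers": {
            "name": "Fred",
            "age": 42,
            "orders": [
              {"order number": 1, "item": "socks"},
              {"order number": 2, "item": "shoes"}
            ]
          }
        }
      }
    ]
  }
}
"""


def summarize_hits(response: dict) -> list:
    """Return (document ID, customer name, ordered items) for each hit."""
    summary = []
    for hit in response.get("hits", {}).get("hits", []):
        customer = hit["_source"]["customers"]
        items = [order["item"] for order in customer.get("orders", [])]
        summary.append((hit["_id"], customer["name"], items))
    return summary


if __name__ == "__main__":
    for doc_id, name, items in summarize_hits(json.loads(SAMPLE_RESPONSE)):
        print(doc_id, name, ", ".join(items))
```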
Mapping
Elasticsearch document mapping is essentially a schema. When you retrieve an index, Elasticsearch returns its mapping together with index settings such as the number of shards and the number of replicas, which are copies of shards.
Elasticsearch creates mapping automatically, as documents are added to an index, but admins can also define mapping themselves. Manual mapping is useful to call out a structure that Elasticsearch's automated approach wouldn't detect or for a more granular level of control over the index.
There are multiple commands to view the index's mapping, including these two examples that direct Elasticsearch to pretty print the JSON-formatted information, which should make it easy for a person to read:
curl -XGET http://localhost:9200/customers?pretty
curl -XGET http://localhost:9200/customers/_mapping/_doc?pretty
There are important elements to accurate mapping, such as:
- The "properties" element lists the fields of a JSON document. A flat document has a single "properties" block; additional "properties" blocks inside it indicate nested JSON objects.
- Each field has a "type", which determines how Elasticsearch indexes and queries it. By default, string fields are mapped as "text" for full-text search, with a "keyword" subfield for exact matching, sorting and aggregations.
- The index settings are listed at the bottom of the output. They hold default values here because the index was created automatically when the first document was written, before any mapping was defined. To create mapping prior to a document, use curl -XPUT http://localhost:9200/index alongside the necessary JSON definition.
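As a sketch of that last point, the Python below builds, but does not send, a PUT request that creates an index with explicit settings and mapping before any document exists. It assumes the Elasticsearch 6.x request format reflected in this article's output, where field definitions sit under the _doc type; Elasticsearch 7.0 and later omit that type level. The index name customers_v2, the chosen field types and the helper name are illustrative assumptions.

```python
import json
import urllib.request

# Assumption: a local single-node cluster, as in the curl examples.
BASE_URL = "http://localhost:9200"


def build_create_index_request(index: str) -> urllib.request.Request:
    """Build a PUT request that defines settings and mapping up front."""
    body = {
        "settings": {"number_of_shards": 5, "number_of_replicas": 1},
        "mappings": {
            "_doc": {  # 6.x style; drop this level on 7.0 and later
                "properties": {
                    "customers": {
                        "properties": {
                            "name": {"type": "keyword"},
                            "age": {"type": "integer"},
                        }
                    }
                }
            }
        },
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    request = build_create_index_request("customers_v2")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To send it against a running cluster:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```

Mapping name as "keyword" only, rather than the default "text" with a "keyword" subfield, is the kind of choice that automatic mapping would not make on its own.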
The document mapping shows the hierarchy of information in an index. For example, the output below shows an index named customers and, within it, a customers object whose properties are each customer's name, age and orders:
{
"customers" : {
"aliases" : { },
"mappings" : {
"_doc" : {
"properties" : {
"customers" : {
"properties" : {
"age" : {
"type" : "long"
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"orders" : {
"properties" : {
"item" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"order number" : {
"type" : "long"
}
}
}
}
}
}
}
},
"settings" : {
"index" : {
"creation_date" : "1550335409699",
"number_of_shards" : "5",
"number_of_replicas" : "1",
"uuid" : "MobWrunWRUW7y2pbtcjTfQ",
"version" : {
"created" : "6040299"
},
"provided_name" : "customers"
}
}
}
}
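To make that hierarchy easier to see, the nested "properties" blocks can be flattened into dotted field paths. The Python sketch below does this for a trimmed copy of the mapping above; the function name flatten_mapping is hypothetical, and the 6.x layout with a _doc type level is assumed.

```python
def flatten_mapping(properties: dict, prefix: str = "") -> dict:
    """Return {dotted field path: type} for a mapping's "properties" block."""
    fields = {}
    for name, definition in properties.items():
        path = f"{prefix}{name}"
        if "properties" in definition:
            # An object field: recurse into its own "properties" block.
            fields.update(flatten_mapping(definition["properties"], path + "."))
        else:
            fields[path] = definition.get("type", "object")
    return fields


# Trimmed copy of the mapping shown above ("fields" subentries omitted).
MAPPING = {
    "customers": {
        "mappings": {
            "_doc": {
                "properties": {
                    "customers": {
                        "properties": {
                            "age": {"type": "long"},
                            "name": {"type": "text"},
                            "orders": {
                                "properties": {
                                    "item": {"type": "text"},
                                    "order number": {"type": "long"},
                                }
                            },
                        }
                    }
                }
            }
        }
    }
}

if __name__ == "__main__":
    top = MAPPING["customers"]["mappings"]["_doc"]["properties"]
    for path, field_type in flatten_mapping(top).items():
        print(f"{path}: {field_type}")
```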