What is taxonomy in computing?
Taxonomy is the science of classification according to a predetermined system, with the resulting catalog used to provide a conceptual framework for discussion, analysis or information retrieval. The word is rooted in two Greek words: taxis meaning arrangement or division, and nomos meaning law.
Taxonomy is a methodology that systematically classifies elements in a defined hierarchical form. The concept is common in life sciences such as botany and zoology, where taxonomies are used to classify living things, including plants and animals. Figure 1 depicts the taxonomy structure used in biology and related sciences.

In these areas, taxonomists identify, describe and arrange different plant or animal species in hierarchical groups, including superior and subordinate elements. Creating a taxonomy helps classify living organisms based on their characteristics and simplifies information retrieval and cross-referencing for zoologists and botanists.
One of the best-known and most commonly used taxonomies in biology is the one devised by the Swedish scientist Carl Linnaeus. In the Linnaean system of binomial nomenclature, every organism is named by its genus and species. The nomenclature is called binomial because each name consists of two Latinized terms. For example, the American robin is classified in the Linnaean system as Turdus migratorius, while modern humans are known as Homo sapiens.
Biologists use a classification system to describe biological diversity. Classification of organisms results in the scientific names of existing and new species.
In addition to biology, taxonomies are also created in other real-world areas, such as computing and business.
Taxonomy in computing and web design
In web portal design, taxonomies describe the categories and subcategories of topics found on the website. The categorization of terms on Informa TechTarget's WhatIs site follows the same pattern as any web portal taxonomy.
In website design, a taxonomy systematically classifies the various content elements that will go into the site. It shows how things are -- or will be -- organized on the site and classifies them as pages, sections or categories, for example. The elements are organized logically to simplify navigation for the site's users, allowing them to understand its setup and purpose.
A website's taxonomic structure organizes its content and URLs. URL structure refers to how URLs, including domains and subdomains, are organized to reflect the content in each webpage. Typically, the structure includes all the directories and subdirectories within the main domain.
As the content on each page becomes more specific, subdirectories and URL slugs change while the main domain remains the same. Thus, TechTarget's main domain will always be www.techtarget.com. However, for the page titled "Taxonomy," the URL will change to www.techtarget.com/searchcontentmanagement/definition/taxonomy to reflect this page's unique content. Figure 2 discusses the considerations when developing a website taxonomy.
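The relationship between a taxonomy path and a URL can be sketched in a few lines of Python. This is a minimal illustration, not how any particular site builds its URLs: the `slugify` and `taxonomy_url` helpers are hypothetical names, and the rule of lowercasing labels and joining them with slashes is an assumption chosen to reproduce the example URL above.

```python
import re


def slugify(label: str) -> str:
    """Lowercase a label and replace runs of non-alphanumerics with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def taxonomy_url(domain: str, path: list[str]) -> str:
    """Join a domain and a taxonomy path (broadest category first) into a URL."""
    return domain + "/" + "/".join(slugify(part) for part in path)


# The domain stays fixed; only the directories and the slug change.
print(taxonomy_url("www.techtarget.com",
                   ["searchContentManagement", "Definition", "Taxonomy"]))
# prints: www.techtarget.com/searchcontentmanagement/definition/taxonomy
```

Each list element corresponds to one level of the site hierarchy, so moving a page to a different category changes only the middle of the URL.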

Taxonomies for digital content
Digital content is expanding at a rapid pace. To maintain its usability and usefulness, it must be organized into a proper structure with multiple interrelated components. A taxonomy helps create this structure by grouping information into logical chunks. The taxonomy can include metadata, which is data that describes the content stored in the taxonomy. That structure and metadata make it easy to understand what each piece of content is about, simplifying content management, retrieval and use.
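One way to picture a taxonomy that carries metadata is as a tree whose nodes each hold a name, a metadata dictionary and a list of children. The sketch below is illustrative only; the `TaxonomyNode` class, its fields and the sample categories are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class TaxonomyNode:
    name: str
    metadata: dict = field(default_factory=dict)   # data describing this node
    children: list = field(default_factory=list)   # subordinate nodes

    def add(self, child: "TaxonomyNode") -> "TaxonomyNode":
        """Attach a child node and return it so calls can be chained."""
        self.children.append(child)
        return child

    def find(self, name: str, trail=()):
        """Return the list of names from this node down to `name`, or None."""
        trail = trail + (self.name,)
        if self.name == name:
            return list(trail)
        for child in self.children:
            found = child.find(name, trail)
            if found:
                return found
        return None


root = TaxonomyNode("Content")
policies = root.add(TaxonomyNode("Policies", {"owner": "Legal"}))
policies.add(TaxonomyNode("Travel policy",
                          {"format": "PDF", "keywords": ["travel", "expenses"]}))

print(root.find("Travel policy"))
# prints: ['Content', 'Policies', 'Travel policy']
```

The path returned by `find` is the hierarchical context for a piece of content, and the metadata on each node tells a user or a program what that content is about.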
Advances in natural language processing (NLP) technology, for example, have made it possible for NLP applications to combine with content taxonomies to tag, classify, organize and even summarize natural language text. Programs with taxonomic capabilities -- programmatic taxonomy -- provide hierarchical context to digital content so that it becomes discoverable, not only for human users but also for machines.
Programmatic taxonomy tools also automate document classification, information extraction and speech-to-text conversion by identifying keywords and tagging niche terms. These capabilities are particularly useful in fields where the content generated is often complex, jargony and impossible for traditional speech-to-text applications to understand. Examples include medical and legal documents. Figure 3 shows how Microsoft evolved its taxonomy for threat organizations.
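Keyword-based tagging, the simplest form of the classification described above, can be sketched as follows. Real programmatic taxonomy tools rely on far more sophisticated NLP; this example only matches whole words against a small, hypothetical `TERM_KEYWORDS` table, and the taxonomy terms and keywords in it are invented for illustration.

```python
import re

# Hypothetical taxonomy terms mapped to the keywords that signal them.
TERM_KEYWORDS = {
    "Legal/Contracts": {"indemnify", "liability", "contract"},
    "Medical/Cardiology": {"arrhythmia", "stent", "cardiac"},
}


def tag_document(text: str) -> list[str]:
    """Return every taxonomy term with at least one keyword in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(term for term, keywords in TERM_KEYWORDS.items()
                  if words & keywords)


print(tag_document("The patient received a cardiac stent."))
# prints: ['Medical/Cardiology']
```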

Qualities of a good taxonomy
A good taxonomic classification takes into account the importance of separating elements of a group, or taxon, into subgroups, or taxa, that are mutually exclusive and unambiguous and -- taken together -- include all known possibilities. These possibilities should help disseminate knowledge about the topic of the taxonomy.
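The two requirements above, that subgroups do not overlap and that together they cover everything, can be checked mechanically. The following sketch assumes each subgroup is represented as a set of member names; the `check_partition` function and the sample data are hypothetical.

```python
def check_partition(items: set, subgroups: dict) -> dict:
    """Report items placed in more than one subgroup and items placed in none."""
    seen, overlapping = set(), set()
    for members in subgroups.values():
        overlapping |= seen & members   # already claimed by an earlier subgroup
        seen |= members
    return {"overlapping": overlapping, "unclassified": items - seen}


animals = {"robin", "salmon", "frog", "bat"}
subgroups = {
    "birds": {"robin", "bat"},   # "bat" is deliberately misfiled here
    "mammals": {"bat"},
    "fish": {"salmon"},
}

report = check_partition(animals, subgroups)
print(report)
# prints: {'overlapping': {'bat'}, 'unclassified': {'frog'}}
```

An empty `overlapping` set means the subgroups are mutually exclusive, and an empty `unclassified` set means they include all known possibilities.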
The taxonomy itself should be simple, easy to remember and easy to use. It should follow a hierarchical format and use a uniform and easily understood nomenclature to enable information understanding and retrieval.
Regardless of the subject or domain, the taxonomy should be based on rules that help classify and categorize the objects or elements in that domain. Moreover, the rules should be easy to understand and should be implemented consistently to maintain the taxonomy's rigor and integrity.
Ideally, a taxonomy should be rigorous enough to ensure all newly discovered objects can fit into a category. Each object can inherit the properties of its superior class -- the class above it -- and also have its own additional or unique properties. Figure 4 presents steps for classifying data.
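Property inheritance from a superior class can be illustrated with a small parent-linked structure. This is a sketch under assumptions: the `Category` class and the document-related properties are invented for the example, and letting a category's own value override an inherited one is a design choice that not every taxonomy makes.

```python
class Category:
    def __init__(self, name, parent=None, **own):
        self.name = name
        self.parent = parent   # the superior class, or None at the top
        self.own = own         # properties defined at this level only

    def properties(self) -> dict:
        """Merge inherited properties with this category's own; own values win."""
        inherited = self.parent.properties() if self.parent else {}
        return {**inherited, **self.own}


document = Category("Document", retention_years=7, searchable=True)
contract = Category("Contract", parent=document, requires_signature=True)
nda = Category("NDA", parent=contract, retention_years=10)

print(nda.properties())
# prints: {'retention_years': 10, 'searchable': True, 'requires_signature': True}
```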

Taxonomies used in business
Taxonomies are also important in business. Finding the right information quickly is an important daily business activity. A well-organized and administered content taxonomy and content management system (CMS) can simplify the search process and increase employee efficiency.
Metadata -- data about data -- helps users locate information. However, without taxonomic planning, search engines might have limited functionality and value. This means that metadata tags, such as keywords and phrases, must be properly applied to content. Training and a taxonomy for identifying and tagging content make this much easier.
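One common way to keep tags properly applied is to check them against a controlled vocabulary drawn from the taxonomy. The sketch below assumes a tiny, hypothetical vocabulary and a simple normalization rule (trim whitespace and lowercase); a real CMS would typically enforce this at the point of entry.

```python
# Hypothetical controlled vocabulary taken from a content taxonomy.
CONTROLLED_VOCABULARY = {"invoice", "contract", "onboarding", "travel"}


def validate_tags(tags):
    """Normalize tags, then split them into accepted and rejected lists."""
    normalized = [tag.strip().lower() for tag in tags]
    accepted = [tag for tag in normalized if tag in CONTROLLED_VOCABULARY]
    rejected = [tag for tag in normalized if tag not in CONTROLLED_VOCABULARY]
    return accepted, rejected


accepted, rejected = validate_tags([" Invoice", "Q3 stuff", "travel"])
print(accepted)   # prints: ['invoice', 'travel']
print(rejected)   # prints: ['q3 stuff']
```

Rejected tags are candidates either for correction by the author or for review as possible additions to the taxonomy.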
A content strategy with associated procedures can help mitigate enterprise search challenges, thus helping users get the information they need. Such a strategy can improve search and content administration features of content management systems, workflows and business intelligence platforms.
Examples of business systems that can use content taxonomies include knowledge management systems, document management systems, enterprise resource planning (ERP) systems and customer relationship management (CRM) systems.
Creating a content taxonomy
When developing a content taxonomy, the first step is to capture as much information as possible from users about their data and content requirements. Interview representatives from key departments to help define multiple data hierarchies and enterprise-wide search strategies. In time, users can search on single words, phrases or even full text to find what they need. Training helps users understand the taxonomy so they can use the search function efficiently.
The structure of content taxonomies must accurately reflect not only user needs but also enterprise characteristics. Identify application suites that are key candidates for a taxonomy, such as human resources, finance, legal, and research and development. Capture relevant metadata, keywords and phrases that search engines can use. For global enterprises, it might be necessary to identify where the content resides, such as in specific countries. Then, the taxonomy can be structured by location, division, department, office function and type of content, for example.
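A taxonomy structured by location, division, department, office function and type of content can be modeled as a fixed set of fields on each content item. The sketch below is one hypothetical representation: the `ContentPath` type, the sample documents and the `search_by_facets` helper are all assumptions made for illustration.

```python
from typing import NamedTuple


class ContentPath(NamedTuple):
    location: str
    division: str
    department: str
    function: str
    content_type: str


# Hypothetical documents, each classified along all five levels.
DOCUMENTS = {
    "Expense form": ContentPath("Germany", "EMEA", "Finance", "Accounts payable", "Form"),
    "Hiring guide": ContentPath("US", "Americas", "Human resources", "Recruiting", "Guide"),
    "Audit checklist": ContentPath("Germany", "EMEA", "Finance", "Audit", "Checklist"),
}


def search_by_facets(**criteria) -> list[str]:
    """Return titles whose classification matches every given field."""
    return sorted(
        title for title, path in DOCUMENTS.items()
        if all(getattr(path, name) == value for name, value in criteria.items())
    )


print(search_by_facets(location="Germany", department="Finance"))
# prints: ['Audit checklist', 'Expense form']
```

Because every item is classified on the same fields, a search can narrow by any combination of them without knowing where the content physically resides.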
Use the following steps when launching a content taxonomy initiative:
- Identify current methods and technology for identifying and tagging data. This means understanding how metadata and other content tools are used and determining how they can be efficiently used in searches.
- Reuse existing keywords and phrases for search tags. When using a content management system, it helps to use existing tags to formulate a formal taxonomy structure.
- Capture important metadata and search tags from employees. As noted earlier, this is one of the most important parts of the process, as employees are the best source of taxonomy data.
- Position specific tags within relevant systems. This means locating search and content tags within specific systems, such as those in human resources and finance. These can be in addition to an enterprise-level content tagging taxonomy.
- Link keywords and related search elements to each other. This strategy recognizes that a search can originate from different places yet use a single search engine. If multiple systems can use content, ensure that links are embedded to support content mapping.
- Perform initial and refresher training. This ensures that new and existing employees use search functions efficiently.
- Review content taxonomies periodically. Assuming that employees might create, modify or delete tags, ensure that content administrators can update the taxonomy to keep it current and usable.
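The step of linking keywords to each other can be sketched as a synonym table that maps the words employees actually type to a single canonical tag. The tables and the `search_by_keyword` function below are hypothetical and stand in for whatever the organization's search engine and CMS provide.

```python
# Hypothetical synonym table: terms users type mapped to canonical tags.
SYNONYMS = {
    "pto": "leave",
    "vacation": "leave",
    "time off": "leave",
    "payslip": "payroll",
    "salary": "payroll",
}

# Hypothetical index: canonical tag mapped to content in different systems.
INDEX = {
    "leave": ["HR/Leave policy"],
    "payroll": ["Finance/Payroll calendar", "HR/Pay FAQ"],
}


def search_by_keyword(query: str) -> list[str]:
    """Resolve a query to its canonical tag, then look up tagged content."""
    term = query.strip().lower()
    canonical = SYNONYMS.get(term, term)
    return INDEX.get(canonical, [])


print(search_by_keyword("PTO"))       # prints: ['HR/Leave policy']
print(search_by_keyword("salary"))    # prints: ['Finance/Payroll calendar', 'HR/Pay FAQ']
```

Note that the payroll tag returns content from both a finance system and an HR system, which is the kind of cross-system mapping the linking step is meant to support.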
Software to facilitate taxonomies
When developing an enterprise-level search capability, it may be useful to consider established third-party search engines. Here is an unranked, alphabetical list of some enterprise search software tools:
- Algolia.
- AlphaSense.
- Apache Lucene.
- Elasticsearch.
- Glean.
- Guru.
- IBM Watson Discovery.
- Lucidworks Platform.
- Luigi's Box.
- Searchspring.
Cloud-based search-as-a-service tools are also available, such as Microsoft Azure AI Search and Amazon CloudSearch.
Enterprise search software offers the following benefits:
- Better user experience and satisfaction through quickly locating information across multiple systems.
- Reduced risk, because the authoritative version can be identified when multiple versions of the same information exist.
- Enhanced business intelligence through capturing and displaying structured and unstructured content together.
Benefits of taxonomies in the digital age
Any digital application that generates or uses content can benefit from the content structure and nomenclature provided by a taxonomy. Digital taxonomies group, categorize and organize content to make information easier for users or applications to find.
A systematic hierarchical taxonomy also does the following:
- Helps users to find related information and correlations.
- Simplifies navigation, thus improving the overall user experience.
- Reduces the time and effort needed to track, manage or update content.
- Improves content quality and usefulness.
- Organizes content metadata and improves its quality.
- Enables content owners to govern and protect data assets.
- Facilitates data analysis and reporting.
- Supports data governance and compliance requirements.
Disadvantages of taxonomies
While a well-organized and automated taxonomy offers many important benefits, a few downsides include the costs, time and resources needed to develop a taxonomy; the need for periodic reviews by subject matter and domain experts; and the need to keep the process focused to minimize unnecessary changes or reclassifications.
The impact of AI on the future of taxonomy
Many search engine software products and cloud-based services have artificial intelligence (AI) capabilities and are available worldwide. AI can strengthen the overall search process in several ways. It can use natural language processing to support searches in conversational language, personalize search results based on a user profile, provide enhanced search analytics and improve security by filtering out suspicious content. As AI-supported search engines spread, taxonomies will remain in use and are likely to be enhanced through AI.
Developing an enterprise taxonomy helps users locate content more efficiently when searching through files in a content management system.