Data and data management

Definitions of terms related to data, including data warehousing concepts and words and phrases about data management.
  • 3 V's (volume, velocity and variety) - The 3 V's (volume, velocity and variety) are three defining properties or dimensions of big data.
  • 5V's of big data - The 5 V's of big data -- velocity, volume, value, variety and veracity -- are the five main and innate characteristics of big data.
  • 99.999 (Five nines or Five 9s) - In computers, 99.999 -- often called "five 9s" -- refers to a system or service that is available 99.999% of the time, which amounts to roughly five minutes of downtime per year.
  • ACID (atomicity, consistency, isolation, and durability) - In transaction processing, ACID (atomicity, consistency, isolation, and durability) is an acronym and mnemonic device used to refer to the four essential properties a transaction should possess to ensure the integrity and reliability of the data involved in the transaction.
  • address space - Address space is the amount of memory allocated for all possible addresses for a computational entity -- for example, a device, a file, a server or a networked computer.
  • Allscripts - Allscripts is a vendor of electronic health record systems for physician practices, hospitals and healthcare systems.
  • alternate data stream (ADS) - An alternate data stream (ADS) is a feature of Windows New Technology File System (NTFS) that contains metadata for locating a specific file by author or title.
  • Amazon Simple Storage Service (Amazon S3) - Amazon Simple Storage Service (Amazon S3) is a scalable, high-speed, web-based cloud storage service.
  • Anaplan - Anaplan is a web-based enterprise platform for business planning.
  • Apache Solr - Apache Solr is an open source search platform built upon a Java library called Lucene.
  • Apple User Enrollment - Apple User Enrollment (UE) is a form of mobile device management (MDM) for Apple products that supports iOS 13 and macOS Catalina.
  • atomic data - In a data warehouse, atomic data is the lowest level of detail.
  • availability bias - In psychology, the availability bias is the human tendency to rely on information that comes readily to mind when evaluating situations or making decisions.
  • Azure Data Studio (formerly SQL Operations Studio) - Azure Data Studio is a Microsoft tool, originally named SQL Operations Studio, for managing SQL Server databases and cloud-based Azure SQL Database and Azure SQL Data Warehouse systems.
  • big data - Big data is a combination of structured, semi-structured and unstructured data that organizations collect, analyze and mine for information and insights.
  • big data analytics - Big data analytics is the often complex process of examining big data to uncover information -- such as hidden patterns, correlations, market trends and customer preferences -- that can help organizations make informed business decisions.
  • big data as a service (BDaaS) - Big data as a service (BDaaS) is the delivery of data platforms and tools by a cloud provider to help organizations process, manage and analyze large data sets so they can generate insights to improve business operations and gain a competitive advantage.
  • big data engineer - A big data engineer is an information technology (IT) professional who is responsible for designing, building, testing and maintaining complex data processing systems that work with large data sets.
  • big data management - Big data management is the organization, administration and governance of large volumes of both structured and unstructured data.
  • big data storage - Big data storage is a compute-and-storage architecture that collects and manages large data sets and enables real-time data analytics.
  • block diagram - A block diagram is a visual representation of a system that uses simple, labeled blocks that represent single or multiple items, entities or concepts, connected by lines to show relationships between them.
  • blockchain storage - Blockchain storage is a way of saving data in a decentralized network, which utilizes the unused hard disk space of users across the world to store files.
  • box plot - A box plot is a graphical rendition of statistical data based on the minimum, first quartile, median, third quartile, and maximum.
  • brontobyte - A brontobyte is an unofficial measure of memory or data storage that is equal to 10 to the 27th power (10^27) bytes.
  • business analytics - Business analytics (BA) is a set of disciplines and technologies for solving business problems using data analysis, statistical models and other quantitative methods.
  • business continuity - Business continuity is an organization's ability to maintain critical business functions during and after a disaster has occurred.
  • capacity management - Capacity management is the broad term describing a variety of IT monitoring, administration and planning actions that ensure that a computing infrastructure has adequate resources to handle current data processing requirements, as well as the capacity to accommodate future loads.
  • chatbot - A chatbot is a software or computer program that simulates human conversation or "chatter" through text or voice interactions.
  • chief data officer (CDO) - A chief data officer (CDO) is a C-level executive responsible for an organization's strategic data management, with the goal of deriving maximum business value from the data available to the enterprise.
  • CICS (Customer Information Control System) - CICS (Customer Information Control System) is middleware that sits between the z/OS IBM mainframe operating system and business applications.
  • clickstream data (clickstream analytics) - Clickstream data and clickstream analytics are the processes involved in collecting, analyzing and reporting aggregate data about which pages a website visitor visits -- and in what order.
  • clinical data analyst - A clinical data analyst -- also referred to as a 'healthcare data analyst' -- is a healthcare information professional who verifies the validity of scientific experiments and data gathered from research.
  • clinical decision support system (CDSS) - A clinical decision support system (CDSS) is an application that analyzes data to help healthcare providers make decisions and improve patient care.
  • cloud audit - A cloud audit is an assessment of a cloud computing environment and its services, based on a specific set of controls and best practices.
  • Cloud Data Management Interface (CDMI) - The Cloud Data Management Interface (CDMI) is an international standard that defines a functional interface that applications use to create, retrieve, update and delete data elements from cloud storage.
  • cloud SLA (cloud service-level agreement) - A cloud SLA (cloud service-level agreement) is an agreement between a cloud service provider and a customer that ensures a minimum level of service is maintained.
  • cloud storage - Cloud storage is a service model in which data is transmitted and stored on remote storage systems, where it is maintained, managed, backed up and made available to users over a network (typically the internet).
  • cloud storage API - A cloud storage API is an application programming interface that connects a locally based application to a cloud-based storage system so that a user can send data to it and access and work with data stored in it.
  • cloud storage service - A cloud storage service is a business that maintains and manages its customers' data and makes that data accessible over a network, usually the internet.
  • cluster quorum disk - A cluster quorum disk is the storage medium on which the configuration database is stored for a cluster computing network.
  • cold backup (offline backup) - A cold backup is a backup of an offline database.
  • complex event processing (CEP) - Complex event processing (CEP) is the use of technology to analyze streams of low-level event data in real time and identify or predict higher-level events from patterns in that data.
  • compliance as a service (CaaS) - Compliance as a service (CaaS) is a cloud service that specifies how a managed service provider (MSP) helps an organization meet its regulatory compliance mandates.
  • conflict-free replicated data type (CRDT) - A conflict-free replicated data type (CRDT) is a data structure that lets multiple people or applications make changes to the same piece of data.
  • conformed dimension - In data warehousing, a conformed dimension is a dimension that has the same meaning to every fact with which it relates.
  • consumer data - Consumer data is the information that organizations collect from individuals who use internet-connected platforms, including websites, social media networks, mobile apps, text messaging apps or email systems.
  • containers (container-based virtualization or containerization) - Containers are a type of software that can virtually package and isolate applications for deployment.
  • content personalization - Content personalization is a branding and marketing strategy in which webpages, email and other forms of content are tailored to match the characteristics, preferences or behaviors of individual users.
  • Continuity of Care Document (CCD) - A Continuity of Care Document (CCD) is an electronic, patient-specific document detailing a patient's medical history.
  • Continuity of Care Record (CCR) - The Continuity of Care Record, or CCR, provides a standardized way to create electronic snapshots about a patient's health information.
  • core banking system - A core banking system is the software that banks use to manage their most critical processes, such as customer accounts, transactions and risk management.
  • correlation - Correlation is a statistical measure that indicates the extent to which two or more variables fluctuate in relation to each other.
  • correlation coefficient - A correlation coefficient is a statistical measure of the degree to which changes to the value of one variable predict change to the value of another.
  • CRM (customer relationship management) analytics - CRM (customer relationship management) analytics comprises all of the programming that analyzes data about customers and presents it to an organization to help facilitate and streamline better business decisions.
  • CRUD cycle (Create, Read, Update and Delete Cycle) - The CRUD cycle describes the elemental functions of a persistent database in a computer.
  • cryptographic nonce - A nonce is a random or semi-random number that is generated for a specific use.
  • curation - Curation is a field of endeavor concerned with assembling, managing and presenting some type of collection.
  • customer data integration (CDI) - Customer data integration (CDI) is the process of defining, consolidating and managing customer information across an organization's business units and systems to achieve a "single version of the truth" for customer data.
  • data abstraction - Data abstraction is the reduction of a particular body of data to a simplified representation of the whole.
  • data analytics (DA) - Data analytics (DA) is the process of examining data sets to find trends and draw conclusions about the information they contain.
  • data anonymization - Data anonymization describes various techniques to remove or block data containing personally identifiable information (PII).
  • data availability - Data availability is a term used by computer storage manufacturers and storage service providers to describe how data should be available at a required level of performance in situations ranging from normal through disastrous.
  • data breach - A data breach is a cyber attack in which sensitive, confidential or otherwise protected data has been accessed or disclosed in an unauthorized fashion.
  • data catalog - A data catalog is a software application that creates an inventory of an organization's data assets to help data professionals and business users find relevant data for analytics uses.
  • data center chiller - A data center chiller is a cooling system used in a data center to remove heat from one element and deposit it into another element.
  • data center services - Data center services provide the supporting components necessary to the proper operation of a data center.
  • data citizen - A data citizen is an employee who relies on data to make decisions and perform job responsibilities.
  • data classification - Data classification is the process of organizing data into categories that make it easy to retrieve, sort and store for future use.
  • data clean room - A data clean room is a technology service that helps content platforms keep first-party user data private when interacting with advertising providers.
  • data cleansing (data cleaning, data scrubbing) - Data cleansing, also referred to as data cleaning or data scrubbing, is the process of fixing incorrect, incomplete, duplicate or otherwise erroneous data in a data set.
  • data collection - Data collection is the process of gathering data for use in business decision-making, strategic planning, research and other purposes.
  • data curation - Data curation is the process of creating, organizing and maintaining data sets so they can be accessed and used by people looking for information.
  • data de-identification - Data de-identification is the process of decoupling or masking data to prevent certain data elements from being associated with an individual.
  • data destruction - Data destruction is the process of destroying data stored on tapes, hard disks and other forms of electronic media so that it's completely unreadable and can't be accessed or used for unauthorized purposes.
  • data dignity - Data dignity, also known as data as labor, is a theory positing that people should be compensated for the data they have created.
  • data dredging (data fishing) - Data dredging -- sometimes referred to as data fishing -- is a data mining practice in which large data volumes are analyzed to find any possible relationships between them.
  • data engineer - A data engineer is an IT professional whose primary job is to prepare data for analytical or operational uses.
  • data exploration - Data exploration is the first step in data analysis involving the use of data visualization tools and statistical techniques to uncover data set characteristics and initial patterns.
  • data feed - A data feed is an ongoing stream of structured data that provides users with updates of current information from one or more sources.
  • data governance policy - A data governance policy is a documented set of guidelines for ensuring that an organization's data and information assets are managed consistently and used properly.
  • data gravity - Data gravity is the ability of a body of data to attract applications, services and other data.
  • data historian - A data historian is a software program that records the data created by processes running in a computer system.
  • data in motion - Data in motion, also referred to as data in transit or data in flight, is digital information while it is being transported between locations, either within or between computer systems.
  • data in use - Data in use is data that is currently being updated, processed, accessed and read by a system.
  • data ingestion - Data ingestion is the process of obtaining and importing data for immediate use or storage in a database.
  • data integration - Data integration is the process of combining data from multiple source systems to create unified sets of information for both operational and analytical uses.
  • data integrity - Data integrity is the assurance that digital information is uncorrupted and can only be accessed or modified by those authorized to do so.
  • data lake - A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed for analytics applications.
  • data lakehouse - A data lakehouse is a data management architecture that combines the key features and the benefits of a data lake and a data warehouse.
  • data lifecycle management (DLM) - Data lifecycle management (DLM) is a policy-based approach to managing the flow of an information system's data throughout its lifecycle: from creation and initial storage to when it becomes obsolete and is deleted.
  • data literacy - Data literacy is the ability to derive meaningful information from data, just as literacy in general is the ability to derive information from the written word.
  • data loss - Data loss is the intentional or unintentional destruction of information.
  • data management platform (DMP) - A data management platform (DMP), also referred to as a unified data management platform (UDMP), is a centralized system for collecting and analyzing large sets of data originating from disparate sources.
  • data marketplace (data market) - A data marketplace, or data market, is an online store where people can buy data.
  • data masking - Data masking is a method of creating a structurally similar but inauthentic version of an organization's data that can be used for purposes such as software testing and user training.
  • data mesh - Data mesh is a decentralized data management architecture for analytics and data science.
  • data migration - Data migration is the process of transferring data between data storage systems, data formats or computer systems.
  • data minimization - Data minimization is the practice of limiting data collection to only the information necessary for a specific purpose.
  • data mining - Data mining is the process of sorting through large data sets to identify patterns and relationships that can help solve business problems through data analysis.
  • data modeling - Data modeling is the process of creating a simplified visual diagram of a software system and the data elements it contains, using text and symbols to represent the data and how it flows.
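The correlation coefficient entry above can be made concrete with a short sketch. This is a minimal, stdlib-only implementation of Pearson's r, the most common correlation coefficient; the function name and sample data are illustrative, not from any particular library.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Returns a value in [-1, 1]: 1 means the variables move together
    perfectly, -1 means they move in perfect opposition, 0 means no
    linear relationship.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance term (numerator) and the two standard-deviation terms.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# ys is exactly 2 * xs, so the two variables are perfectly correlated.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # → 1.0
```

In practice one would reach for `statistics.correlation` (Python 3.10+) or a library such as NumPy rather than hand-rolling this, but the arithmetic above is what those functions compute.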
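The CRUD cycle entry can likewise be illustrated in a few lines. This is a minimal sketch against an in-memory SQLite database; the table and column names are made up for the example, and the final commit ties in to the durability property from the ACID entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

# Create: insert a new row.
conn.execute("INSERT INTO customers (name) VALUES (?)", ("Ada",))
# Read: fetch the row back.
row = conn.execute("SELECT name FROM customers WHERE id = 1").fetchone()
# Update: modify the stored value.
conn.execute("UPDATE customers SET name = ? WHERE id = 1", ("Ada Lovelace",))
updated = conn.execute("SELECT name FROM customers WHERE id = 1").fetchone()
# Delete: remove the row, completing the cycle.
conn.execute("DELETE FROM customers WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

conn.commit()  # committing makes the transaction durable (the D in ACID)
conn.close()
```

The same four operations underlie most persistent-storage APIs, whether exposed as SQL statements, REST verbs (POST, GET, PUT, DELETE) or object-store calls.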