Data and data management

Terms related to data, including definitions about data warehousing and words and phrases about data management.
  • data processing - Data processing refers to essential operations executed on raw data to transform the information into a useful format or structure that provides valuable insights to a user or organization.
  • data profiling - Data profiling refers to the process of examining, analyzing, reviewing and summarizing data sets to gain insight into the quality of data.
  • data protection as a service (DPaaS) - Data protection as a service (DPaaS) involves managed services that safeguard an organization's data.
  • data protection authorities - Data protection authorities (DPAs) are public authorities responsible for enforcing data protection laws and regulations within a specific jurisdiction.
  • data protection management (DPM) - Data protection management (DPM) is the administration, monitoring and management of backup processes to ensure backup tasks run on schedule and data is securely backed up and recoverable.
  • data quality - Data quality is a measure of a data set's condition based on factors such as accuracy, completeness, consistency, reliability and validity.
  • data residency - Data residency refers to the physical or geographic location of an organization's data or information.
  • data retention policy - In business settings, data retention encompasses all the processes for storing and preserving data, as well as the policies businesses enforce that determine how, and for how long, data should be retained.
  • data scientist - A data scientist is an analytics professional who is responsible for collecting, analyzing and interpreting data to help drive decision-making in an organization.
  • data set - A data set, also spelled 'dataset,' is a collection of related data that's usually organized in a standardized format.
  • data source name (DSN) - A data source name (DSN) is a data structure containing information about a specific database to which an Open Database Connectivity (ODBC) driver needs to connect.
  • data splitting - Data splitting is the practice of dividing a data set into two or more subsets -- commonly a training set and a test set in machine learning work (see the Python sketch after this list).
  • data stewardship - Data stewardship is the management and oversight of an organization's data assets to help provide business users with high-quality data that is easily accessible in a consistent manner.
  • data streaming - Data streaming is the continuous transfer of data from one or more sources at a steady, high speed for processing into specific outputs.
  • data structure - A data structure is a specialized format for organizing, processing, retrieving and storing data.
  • Data Transfer Project (DTP) - Data Transfer Project (DTP) is an open source initiative to facilitate customer-controlled data transfers between two online services.
  • data virtualization - Data virtualization is an umbrella term used to describe an approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data.
  • data warehouse - A data warehouse is a repository of data from an organization's operational systems and other sources that supports analytics applications to help drive business decision-making.
  • data warehouse as a service (DWaaS) - Data warehouse as a service (DWaaS) is an outsourcing model in which a cloud service provider configures and manages the hardware and software resources a data warehouse requires, and the customer provides the data and pays for the managed service.
  • database (DB) - A database is a collection of information that is organized so that it can be easily accessed, managed and updated.
  • database management system (DBMS) - A database management system (DBMS) is a software system for creating and managing databases.
  • database marketing - Database marketing is a systematic approach to the gathering, consolidation and processing of consumer data.
  • database replication - Database replication is the frequent electronic copying of data from a database in one computer or server to a database in another -- so that all users share the same level of information.
  • DataOps - DataOps is an Agile approach to designing, implementing and maintaining a distributed data architecture that will support a wide range of open source tools and frameworks in production.
  • Db2 - Db2 is a family of database management system (DBMS) products from IBM that serve a number of different operating system (OS) platforms.
  • decision-making process - A decision-making process is a series of steps one or more individuals take to determine the best option or course of action to address a specific problem or situation.
  • deep analytics - Deep analytics is the application of sophisticated data processing techniques to yield information from large and typically multi-source data sets made up of both unstructured and semi-structured data.
  • demand planning - Demand planning is the process of forecasting the demand for a product or service so it can be produced and delivered more efficiently and to the satisfaction of customers.
  • descriptive analytics - Descriptive analytics is a type of data analytics that looks at past data to give an account of what has happened.
  • digital twin - A digital twin is a virtual representation of a real-world entity or process.
  • digital wallet - In general, a digital wallet is a software application, usually for a smartphone, that serves as an electronic version of a physical wallet.
  • dimension - In data warehousing, a dimension is a collection of reference information that supports a measurable event, such as a customer transaction.
  • dimension table - In data warehousing, a dimension table is a database table that stores attributes describing the facts in a fact table.
  • disambiguation - Disambiguation is the process of determining a word's meaning -- or sense -- within its specific context.
  • disaster recovery (DR) - Disaster recovery (DR) is an organization's ability to respond to and recover from an event that negatively affects business operations.
  • distributed database - A distributed database is a database that consists of two or more files located in different sites either on the same network or on entirely different networks.
  • distributed ledger technology (DLT) - Distributed ledger technology (DLT) is a digital system for recording the transaction of assets in which the transactions and their details are recorded in multiple places at the same time.
  • document - A document is a form of information that might be useful to a user or set of users.
  • Dublin Core - Dublin Core is an international metadata standard formally known as the Dublin Core Metadata Element Set and includes 15 metadata (data that describes data) terms.
  • ebXML (Electronic Business XML) - ebXML (Electronic Business XML or e-business XML) is a project to use the Extensible Markup Language (XML) to standardize the secure exchange of business data.
  • Eclipse (Eclipse Foundation) - Eclipse is a free, Java-based development platform known for its plugins that allow developers to develop and test code written in other programming languages.
  • edge analytics - Edge analytics is an approach to data collection and analysis in which an automated analytical computation is performed on data at a sensor, network switch or other device instead of waiting for the data to be sent back to a centralized data store.
  • empirical analysis - Empirical analysis is an evidence-based approach to the study and interpretation of information.
  • empiricism - Empiricism is a philosophical theory applicable in many disciplines, including science and software development, that human knowledge comes predominantly from experiences gathered through the five senses.
  • encoding and decoding - Encoding is the process of converting data from one form into another for transmission or storage, and decoding is the reverse process; both are used in many forms of communications, including computing, data communications, programming, digital electronics and human communications (see the Python sketch after this list).
  • encryption key management - Encryption key management is the practice of generating, organizing, protecting, storing, backing up and distributing encryption keys.
  • enterprise search - Enterprise search is a type of software that lets users find data spread across organizations' internal repositories, such as content management systems, knowledge bases and customer relationship management (CRM) systems.
  • entity - An entity is a single thing with a distinct separate existence.
  • entity relationship diagram (ERD) - An entity relationship diagram (ERD), also known as an 'entity relationship model,' is a graphical representation that depicts relationships among people, objects, places, concepts or events in an information technology (IT) system.
  • Epic Systems - Epic Systems, also known simply as Epic, is one of the largest providers of health information technology, used primarily by large U.S. hospitals and health systems.
  • erasure coding (EC) - Erasure coding (EC) is a method of data protection in which data is broken into fragments, expanded and encoded with redundant data pieces, and stored across a set of different locations or storage media (a minimal XOR-parity sketch appears after this list).
  • exabyte (EB) - An exabyte (EB) is a large unit of computer data storage: 10^18 (one quintillion) bytes in decimal terms, while the closely related binary unit, the exbibyte, is two to the sixtieth power bytes.
  • Excel - Excel is a spreadsheet program from Microsoft and a component of its Office product group for business applications.
  • exponential function - An exponential function is a mathematical function of the form f(x) = a * b^x, used to calculate the exponential growth or decay of a given set of data (see the Python sketch after this list).
  • extension - In computing, an extension typically refers to a file name extension, the suffix that indicates a file's format.
  • facial recognition - Facial recognition is a category of biometric software that maps an individual's facial features to confirm their identity.
  • fact table - In data warehousing, a fact table is a database table in a dimensional model.
  • failover - Failover is a backup operational mode in which the functions of a system component are assumed by a secondary component when the primary becomes unavailable.
  • file extension (file format) - In computing, a file extension is a suffix added to the name of a file to indicate the file's layout, in terms of how the data within the file is organized.
  • file synchronization (file sync) - File synchronization (file sync) is a method of keeping files that are stored in several different physical locations up to date.
  • firmographic data - Firmographic data consists of the types of information that can be used to categorize organizations, such as location, name, number of clients, industry and so on.
  • FIX protocol (Financial Information Exchange protocol) - The Financial Information Exchange (FIX) protocol is an open specification intended to streamline electronic communications in the financial securities industry.
  • foreign key - A foreign key is a column or columns of data in one table that refers to the unique data values -- often the primary key data -- in another table (see the Python sketch after this list).
  • garbage in, garbage out (GIGO) - Garbage in, garbage out, or GIGO, refers to the idea that in any system, the quality of output is determined by the quality of the input.
  • Google BigQuery - Google BigQuery is a cloud-based big data analytics web service for processing very large read-only data sets.
  • GPS coordinates - GPS coordinates are a unique identifier of a precise geographic location on the earth, usually expressed as a pair of latitude and longitude values.
  • gradient descent - Gradient descent is an optimization algorithm that refines a machine learning (ML) model's parameters to create a more accurate model (see the Python sketch after this list).
  • grid computing - Grid computing is a system for connecting a large number of computer nodes into a distributed architecture that delivers the compute resources necessary to solve complex problems.
  • gzip (GNU zip) - Gzip (GNU zip) is a free and open source file compression program and format based on the DEFLATE algorithm (see the Python sketch after this list).
  • Hadoop - Hadoop is an open source distributed processing framework that manages data processing and storage for big data applications in scalable clusters of computer servers.
  • Hadoop Distributed File System (HDFS) - The Hadoop Distributed File System (HDFS) is the primary data storage system Hadoop applications use.
  • hashing - Hashing is the process of transforming a given key or string of characters into another, typically fixed-length, value that represents the original input (see the Python sketch after this list).
  • health informatics - Health informatics is the practice of acquiring, studying and managing health data and applying medical concepts in conjunction with health information technology systems to help clinicians provide better healthcare.
  • Health IT (health information technology) - Health IT (health information technology) is the area of IT involving the design, development, creation, use and maintenance of information systems for the healthcare industry.
  • heartbeat (computing) - In computing, a heartbeat is a periodic signal generated by hardware or software to indicate that a system or component is operating normally, or to synchronize the parts of a distributed system.
  • heat map (heatmap) - A heat map is a two-dimensional representation of data in which various values are represented by colors.
  • hierarchy - Generally speaking, hierarchy refers to an organizational structure in which items are ranked in a specific manner, usually according to levels of importance.
  • histogram - A histogram is a type of chart that shows the frequency distribution of data points across a continuous range of numerical values (see the Python sketch after this list).
  • historical data - Historical data, in a broad context, is data collected about past events and circumstances pertaining to a particular subject.
  • IBM IMS (Information Management System) - IBM IMS (Information Management System) is a database and transaction management system introduced by IBM in 1968.
  • ICD-10-CM (Clinical Modification) - The ICD-10-CM (International Classification of Diseases, 10th Revision, Clinical Modification) is a system used by physicians and other healthcare providers to classify and code all diagnoses, symptoms and procedures related to inpatient and outpatient medical care in the United States.
  • IDoc (intermediate document) - IDoc (intermediate document) is a standard data structure used in SAP applications to transfer data between SAP systems and external systems.
  • in-memory analytics - In-memory analytics is an approach to querying data residing in a computer's random access memory (RAM) as opposed to querying data stored on physical drives.
  • in-memory database - An in-memory database is a database that stores its data primarily in a computer's main memory (RAM) rather than on disk, streamlining the work involved in processing queries.
  • infographic - An infographic (information graphic) is a representation of information in a graphic format designed to make the data easily understandable at a glance.
  • information - Information is the output that results from analyzing, contextualizing, structuring, interpreting or in other ways processing data.
  • information asset - An information asset is a collection of knowledge or data that is organized, managed and valuable.
  • information assurance (IA) - Information assurance (IA) is the practice of protecting physical and digital information and the systems that support the information.
  • information governance - Information governance is a holistic approach to managing corporate information by implementing processes, roles, controls and metrics that treat information as a valuable business asset.
  • information lifecycle management (ILM) - Information lifecycle management (ILM) is a comprehensive approach to managing an organization's data and associated metadata, starting with its creation and acquisition through when it becomes obsolete and is deleted.
  • IT incident management - IT incident management is a component of IT service management (ITSM) that aims to rapidly restore services to normal following an incident while minimizing adverse effects on the business.
  • Java Database Connectivity (JDBC) - Java Database Connectivity (JDBC) is an API packaged with Java Platform, Standard Edition (Java SE) that makes it possible to connect from a Java Runtime Environment (JRE) to external, relational database systems.
  • job - In certain computer operating systems, a job is the unit of work that a computer operator -- or a program called a job scheduler -- gives to the OS.
  • job scheduler - A job scheduler is a computer program that enables an enterprise to schedule and, in some cases, monitor computer 'batch' jobs (units of work).
  • job step - In certain computer operating systems, a job step is part of a job, a unit of work that a computer operator (or a program called a job scheduler) gives to the operating system.
  • JOLAP (Java Online Analytical Processing) - JOLAP (Java Online Analytical Processing) is a Java application programming interface (API) for the Java 2 Platform, Enterprise Edition (J2EE) environment that supports the creation, storage, access and management of data in an online analytical processing (OLAP) application.
  • key-value pair (KVP) - A key-value pair (KVP) is a set of two linked data items: a key, which is a unique identifier for some item of data, and the value, which is either the data that is identified or a pointer to the location of that data (see the Python sketch after this list).
  • knowledge base - In general, a knowledge base is a centralized repository of information.
  • knowledge management (KM) - Knowledge management is the process an enterprise uses to gather, organize, share and analyze its knowledge in a way that's easily accessible to employees.
  • laboratory information system (LIS) - A laboratory information system (LIS) is computer software that processes, stores and manages data from patient medical processes and tests.
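
Code sketches for selected terms follow. All are minimal Python illustrations; every function name, table name and value in them is hypothetical, chosen only to make the concept concrete.

For data splitting, a sketch of a random holdout split; the 80/20 ratio, the fixed seed and the helper name train_test_split are illustrative choices, not a standard API:

    import random

    def train_test_split(records, test_fraction=0.2, seed=42):
        """Shuffle records, then divide them into training and test subsets."""
        shuffled = records[:]                  # copy so the input is untouched
        random.Random(seed).shuffle(shuffled)  # seeded shuffle for repeatability
        cut = int(len(shuffled) * (1 - test_fraction))
        return shuffled[:cut], shuffled[cut:]

    data = list(range(100))
    train, test = train_test_split(data)
    print(len(train), len(test))  # 80 20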
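
For encoding and decoding, one common computing case -- text encoded into UTF-8 bytes for storage or transmission, then decoded back; the sample string is arbitrary:

    text = "data café"                # non-ASCII character forces a multi-byte encoding
    encoded = text.encode("utf-8")    # bytes: b'data caf\xc3\xa9'
    decoded = encoded.decode("utf-8")
    assert decoded == text
    print(encoded)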
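
For erasure coding, a deliberately tiny 2+1 scheme: the data is split into two fragments, a third redundant fragment holds their XOR, and any one lost fragment can be rebuilt from the other two. Real EC deployments use more fragments and Reed-Solomon codes rather than a single XOR parity:

    def xor_bytes(a, b):
        """Bytewise XOR of two equal-length byte strings."""
        return bytes(x ^ y for x, y in zip(a, b))

    data = b"ERASURECODE!"             # 12 bytes, splits evenly in two
    frag1, frag2 = data[:6], data[6:]  # two data fragments
    parity = xor_bytes(frag1, frag2)   # redundant parity fragment

    # Simulate losing frag1: rebuild it from the surviving fragment and parity.
    recovered = xor_bytes(frag2, parity)
    assert recovered == frag1
    print(recovered + frag2)           # b'ERASURECODE!'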
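
For the exponential function, the f(t) = a * b**t form; the starting amount of 100 is assumed for illustration, with b > 1 modeling growth and 0 < b < 1 modeling decay:

    def exponential(a, b, t):
        """f(t) = a * b**t: a is the starting amount, b the per-step factor."""
        return a * b ** t

    print(exponential(100, 2, 3))    # 800: doubling three times
    print(exponential(100, 0.5, 3))  # 12.5: halving three times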
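
For foreign keys, a sketch using Python's built-in sqlite3 module; the customers and orders tables are made up, and note that SQLite enforces foreign keys only once the pragma is enabled:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("""CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),  -- the foreign key
        total REAL)""")
    conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
    conn.execute("INSERT INTO orders VALUES (10, 1, 99.50)")       # valid reference
    try:
        conn.execute("INSERT INTO orders VALUES (11, 999, 5.00)")  # no customer 999
    except sqlite3.IntegrityError as e:
        print("rejected:", e)  # FOREIGN KEY constraint failed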
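
For gradient descent, minimizing the one-parameter function f(x) = (x - 3)**2, whose gradient is 2 * (x - 3); the learning rate and iteration count are arbitrary illustrative values:

    x = 0.0              # initial parameter guess
    learning_rate = 0.1
    for _ in range(100):
        gradient = 2 * (x - 3)         # derivative of (x - 3)**2
        x -= learning_rate * gradient  # step against the gradient
    print(round(x, 4))   # ~3.0, the minimum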
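
For gzip, compressing and restoring a byte string with Python's standard gzip module; the payload is contrived to be repetitive so it compresses well:

    import gzip

    original = b"repetitive data " * 1000
    compressed = gzip.compress(original)   # DEFLATE-compressed, gzip-framed
    restored = gzip.decompress(compressed)
    assert restored == original
    print(len(original), "->", len(compressed))  # 16000 -> roughly a hundred bytes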
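
For hashing, the standard library's SHA-256 turning an arbitrary key into a fixed-length digest; the key string is invented:

    import hashlib

    key = "customer-42"
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    print(digest)       # the same 64-hex-character value every time for this input
    print(len(digest))  # 64, regardless of how long the input was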
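
For histograms, counting values into fixed-width bins with collections.Counter and printing a text-mode chart; the sample values and the 1.0 bin width are arbitrary:

    from collections import Counter

    values = [1.2, 1.9, 2.3, 2.4, 2.8, 3.1, 3.3, 4.7]
    bin_width = 1.0
    bins = Counter(int(v // bin_width) for v in values)
    for b in sorted(bins):
        lo = b * bin_width
        print(f"{lo:.0f}-{lo + bin_width:.0f}: {'#' * bins[b]}")
    # 1-2: ##    2-3: ###    3-4: ##    4-5: #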
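
For key-value pairs, Python's dict as a canonical container of them; the user IDs and profile fields are invented:

    profiles = {"u1001": {"name": "Ada", "plan": "pro"}}   # one key-value pair
    profiles["u1002"] = {"name": "Grace", "plan": "free"}  # add another pair
    print(profiles["u1001"]["name"])  # look up a value by its unique key -> Ada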