Data and data management

Terms related to data, including data warehousing definitions and words and phrases about data management.
  • inline deduplication - Inline deduplication is the removal of redundancies from data before or as it is being written to a backup device.
  • IT incident management - IT incident management is a component of IT service management (ITSM) that aims to rapidly restore services to normal following an incident while minimizing adverse effects on the business.
  • Java Database Connectivity (JDBC) - Java Database Connectivity (JDBC) is an API packaged with Java Platform, Standard Edition (Java SE) that makes it possible to connect from a Java Runtime Environment (JRE) to external relational database systems.
  • job - In certain computer operating systems, a job is the unit of work that a computer operator -- or a program called a job scheduler -- gives to the OS.
  • job scheduler - A job scheduler is a computer program that enables an enterprise to schedule and, in some cases, monitor computer 'batch' jobs (units of work).
  • job step - In certain computer operating systems, a job step is part of a job, a unit of work that a computer operator (or a program called a job scheduler) gives to the operating system.
  • JOLAP (Java Online Analytical Processing) - JOLAP (Java Online Analytical Processing) is a Java application programming interface (API) for the Java 2 Platform, Enterprise Edition (J2EE) environment that supports the creation, storage, access and management of data in an online analytical processing (OLAP) application.
  • key-value pair (KVP) - A key-value pair (KVP) is a set of two linked data items: a key, which is a unique identifier for some item of data, and the value, which is either the data that is identified or a pointer to the location of that data.
  • knowledge base - In general, a knowledge base is a centralized repository of information.
  • knowledge management (KM) - Knowledge management is the process an enterprise uses to gather, organize, share and analyze its knowledge in a way that's easily accessible to employees.
  • knowledge-based systems (KBSes) - Knowledge-based systems (KBSes) are computer programs that use a centralized repository of data known as a knowledge base to provide a method for problem-solving.
  • laboratory information system (LIS) - A laboratory information system (LIS) is computer software that processes, stores and manages data from patient medical processes and tests.
  • Lambda architecture - Lambda architecture is an approach to big data management that combines batch processing and near real-time stream processing in a single hybrid design.
  • legal health record (LHR) - A legal health record (LHR) refers to documentation about a patient's personal health information that is created by a healthcare organization or provider.
  • Lisp (programming language) - Lisp, an acronym for list processing, is a functional programming language that was designed for easy manipulation of lists and symbolic data.
  • LTO-8 (Linear Tape-Open 8) - LTO-8, or Linear Tape-Open 8, is a tape format from the Linear Tape-Open Consortium released in late 2017.
  • MariaDB - MariaDB is an open source relational database management system (RDBMS) that is a compatible drop-in replacement for the widely used MySQL database technology.
  • Massachusetts data protection law - The Massachusetts data protection law is legislation that stipulates security requirements for organizations that handle the personal information of Massachusetts residents.
  • master data - Master data is the core data that is essential to operations in a specific business or business unit.
  • medical scribe - A medical scribe is a professional who specializes in documenting patient encounters in real time under the direction of a physician.
  • metadata - Often referred to as data that describes other data, metadata is structured reference data that helps to sort and identify attributes of the information it describes.
  • Microsoft Azure - Microsoft Azure, formerly known as Windows Azure, is Microsoft's public cloud computing platform.
  • Microsoft Azure Data Lake - Microsoft Azure Data Lake is a highly scalable public cloud service that allows developers, scientists, business professionals and other Microsoft customers to gain insight from large, complex data sets.
  • Microsoft MyAnalytics - Microsoft MyAnalytics is a personal analytics application in Office 365 that enables employees to gain insights into how they spend their time at work and how they can work smarter.
  • Microsoft Office SharePoint Server (MOSS) - Microsoft Office SharePoint Server (MOSS) is the full version of a portal-based platform for collaboratively creating, managing and sharing documents and Web services.
  • Microsoft System Center - Microsoft System Center is a suite of software products designed to simplify the deployment, configuration and management of IT infrastructure and virtualized software-defined data centers.
  • middleware - Middleware is software that bridges the gap between applications and operating systems by providing a method for communication and data management.
  • Monte Carlo simulation - A Monte Carlo simulation is a mathematical technique that simulates the range of possible outcomes for an uncertain event.
  • MPP database (massively parallel processing database) - An MPP database is a database optimized so that many operations can be processed in parallel by many processing units at a time.
  • multidimensional database (MDB) - A multidimensional database (MDB) is a type of database that is optimized for data warehouse and online analytical processing (OLAP) applications.
  • national identity card - A national identity card is a portable document, typically a plasticized card with digitally embedded information, that is used to verify aspects of a person's identity.
  • noisy data - Noisy data is a data set that contains extra meaningless data.
  • normal distribution - A normal distribution is a type of continuous probability distribution in which most data points cluster toward the middle of the range, while the rest taper off symmetrically toward either extreme.
  • object-oriented database management system (OODBMS) - An object-oriented database management system (OODBMS), sometimes shortened to ODBMS for object database management system, is a database management system (DBMS) that supports the modeling and creation of data as objects.
  • OLAP (online analytical processing) - OLAP (online analytical processing) is a computing method that enables users to easily and selectively extract and query data in order to analyze it from different points of view.
  • Open Database Connectivity (ODBC) - Open Database Connectivity (ODBC) is an open standard application programming interface (API) that allows application programmers to easily access data stored in a database.
  • operational data store (ODS) - An operational data store (ODS) is a type of database that's often used as an interim logical area for a data warehouse.
  • operational efficiency - Operational efficiency refers to an organization's ability to reduce waste of time, effort and material while still producing a high-quality service or product.
  • operational intelligence (OI) - Operational intelligence (OI) is an approach to data analysis that enables decisions and actions in business operations to be based on real-time data as it's generated or collected by companies.
  • pandemic plan - A pandemic plan is a documented strategy for business continuity in the event of a widespread outbreak of a dangerous infectious disease.
  • parallel file system - A parallel file system is a software component designed to store data across multiple networked servers.
  • pebibyte (PiB) - A pebibyte (PiB) is a unit of measure that describes data capacity; one pebibyte equals 2^50 bytes (1,125,899,906,842,624 bytes).
  • performance and accountability reporting (PAR) - Performance and accountability reporting (PAR) is the process of compiling and documenting factors that quantify an organization's achievements, efficiency and adherence to budget, comparing actual results against previously articulated goals.
  • personal health record (PHR) - A personal health record (PHR) is an electronic summary of health information that the patient, rather than the healthcare provider, maintains and controls.
  • PL/SQL (procedural language extension to Structured Query Language) - In Oracle database management, PL/SQL is a procedural language extension to Structured Query Language (SQL).
  • precision agriculture - Precision agriculture (PA) is a farming management concept based on observing, measuring and responding to inter- and intra-field variability in crops.
  • predictive modeling - Predictive modeling is a mathematical process used to predict future events or outcomes by analyzing patterns in a given set of input data.
  • primary key (primary keyword) - A primary key, also called a primary keyword, is a column (or set of columns) in a relational database table whose value is unique for each record.
  • product data management (PDM) - Product data management (PDM) is the process of capturing and managing the electronic information related to a product so it can be reused in business processes such as design, production, distribution and marketing.
  • public data - Public data is information that can be shared, used, reused and redistributed without restriction.
  • radiology information system (RIS) - A radiology information system (RIS) is a networked software system for managing medical imagery and associated data.
  • raw data (source data or atomic data) - Raw data is the data originally generated by a system, device or operation that has not been processed or changed in any way.
  • RDBMS (relational database management system) - A relational database management system (RDBMS) is a collection of programs and capabilities that enable IT teams and others to create, update, administer and otherwise interact with a relational database.
  • real-time analytics - Real-time analytics is the use of data and related resources for analysis as soon as it enters the system.
  • record - In computer data processing, a record is a collection of data items arranged for processing by a program.
  • records information management (RIM) - Records information management (RIM) is a corporate area of endeavor involving the administration of all business records through their life cycle.
  • redundancy - Redundancy is a system design in which a component is duplicated so that, if it fails, a backup is available.
  • refactoring - Refactoring is the process of restructuring existing code without changing its original functionality.
  • registered health information technician (RHIT) - A registered health information technician (RHIT) is a certified professional who stores and verifies the accuracy and completeness of electronic health records.
  • relational database - A relational database is a type of database that organizes data points with defined relationships for easy access.
  • Report on Compliance (ROC) - A Report on Compliance (ROC) is a form that must be completed by all Level 1 Visa merchants undergoing a PCI DSS (Payment Card Industry Data Security Standard) audit.
  • restore point - A system restore point is a backup copy of important Windows operating system (OS) files and settings that can be used to recover the system to an earlier point in time in the event of system failure or instability.
  • RFM analysis (recency, frequency, monetary) - RFM analysis is a marketing technique used to quantitatively rank and group customers based on the recency, frequency and monetary total of their recent transactions to identify the best customers and perform targeted marketing campaigns.
  • SAP BW (Business Warehouse) - SAP Business Warehouse (BW) is a model-driven data warehousing product based on the SAP NetWeaver ABAP platform.
  • schema - In computer programming, a schema (pronounced SKEE-mah) is the organization or structure for a database, while in artificial intelligence (AI), a schema is a formal expression of an inference rule.
  • security information management (SIM) - Security information management (SIM) is the practice of collecting, monitoring and analyzing security-related data from computer logs and various other data sources.
  • self-driving car (autonomous car or driverless car) - A self-driving car -- sometimes called an autonomous car or driverless car -- is a vehicle that uses a combination of sensors, cameras, radar and artificial intelligence (AI) to travel between destinations without a human operator.
  • self-service analytics - Self-service analytics is a type of business intelligence (BI) that enables business users to access, manipulate, analyze and visualize data, as well as generate reports based on their discoveries.
  • semantic network (knowledge graph) - A semantic network is a knowledge structure that depicts how concepts are related to one another and how they interconnect.
  • sensitive information - Sensitive information is data that must be protected from unauthorized access to safeguard the privacy or security of an individual or organization.
  • SequenceFile - A SequenceFile is a flat, binary file type that stores data as key-value pairs and serves as a container for data used in Hadoop distributed computing projects.
  • server-based storage - Server-based storage is a re-emerging class of data storage that removes cost and complexity by housing storage media inside servers rather than in dedicated and custom-engineered storage arrays.
  • serverless database - A serverless database is a type of cloud database that is fully managed for an organization by a cloud service provider and runs on demand as needed to support applications.
  • SNOMED CT (Systematized Nomenclature of Medicine -- Clinical Terms) - SNOMED CT (Systematized Nomenclature of Medicine -- Clinical Terms) is a standardized, multilingual vocabulary of clinical terminology that is used by physicians and other health care providers for the electronic exchange of health information.
  • snowflaking (snowflake schema) - In data warehousing, snowflaking is a form of dimensional modeling in which dimensions are stored in multiple related dimension tables.
  • software-defined storage (SDS) - Software-defined storage (SDS) is a software program that manages data storage resources and functionality and has no dependencies on the underlying physical storage hardware.
  • spatial data - Spatial data is any type of data that directly or indirectly references a specific geographical area or location.
  • standard business reporting (SBR) - Standard business reporting (SBR) is a group of frameworks adopted by governments to promote standardization in reporting business data.
  • star schema - A star schema is a database organizational structure, optimized for use in a data warehouse or business intelligence system, that uses a single large fact table to store transactional or measured data and one or more smaller dimension tables to store attributes about the data.
  • statistical analysis - Statistical analysis is the collection and interpretation of data in order to uncover patterns and trends.
  • storage class memory (SCM) - Storage class memory (SCM) is a type of physical computer memory that combines dynamic random access memory (DRAM), NAND flash memory and a power source for data persistence.
  • stored procedure - A stored procedure is a named group of SQL statements stored inside a database, such as MySQL or Oracle, so that it can be called and reused.
  • stream processing - Stream processing is a data management technique that involves ingesting a continuous data stream to quickly analyze, filter, transform or enhance the data in real time.
  • streaming data architecture - A streaming data architecture is an information technology framework that puts the focus on processing data in motion and treats extract-transform-load (ETL) batch processing as just one more event in a continuous stream of events.
  • structured data - Structured data is data that has been organized into a formatted repository, typically a database.
  • supply chain planning (SCP) - Supply chain planning (SCP) is the process of anticipating the demand for products and planning their materials and components, production, marketing, distribution and sale.
  • support vector machine (SVM) - A support vector machine (SVM) is a type of supervised learning algorithm used in machine learning to solve classification and regression tasks.
  • syslog - Syslog is a standard protocol for computer logging and log collection, specified by the IETF in RFC 5424, that is widely used in Unix-like systems as well as servers, networking equipment and IoT devices.
  • system of record (SOR) - A system of record (SOR) is an information storage and retrieval system that stores valuable data on an organizational system or process.
  • System Restore (Windows) - System Restore is a Microsoft Windows utility designed to protect and revert the operating system (OS) to a previous state.
  • T-SQL (Transact-SQL) - T-SQL (Transact-SQL) is a set of programming extensions from Sybase and Microsoft that add several features to the Structured Query Language (SQL), including transaction control, exception and error handling, row processing and declared variables.
  • table - A table in computer programming is a data structure used to organize information, just as it is on paper.
  • taxonomy - Taxonomy is the science of classification according to a predetermined system, with the resulting catalog used to provide a conceptual framework for discussion, analysis or information retrieval.
  • text mining (text analytics) - Text mining is the process of exploring and analyzing large amounts of unstructured text data aided by software that can identify concepts, patterns, topics, keywords and other attributes in the data.
  • text tagging - Text tagging is the process of manually or automatically adding tags or annotation to various components of unstructured data as one step in the process of preparing such data for analysis.
  • timeline - A timeline is a visual representation of a chronological sequence of events along a drawn line that helps a viewer understand time relationships.
  • transactional data - In computing, transactional data is the information collected from transactions.
  • transcription error - A transcription error is a type of data entry error commonly made by human operators or by optical character recognition (OCR) programs.
  • transportation management system (TMS) - A transportation management system (TMS) is specialized software for planning, executing and optimizing the shipment of goods.
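  • tree structure - A tree data structure is a hierarchical way of organizing data in which each node has one parent (except the root) and zero or more children; in databases, trees are used to place and locate records (or keys).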
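The inline deduplication entry describes removing redundancies before data is written to a backup device. A minimal sketch of that idea follows, assuming fixed-size 4 KiB chunks and SHA-256 fingerprints; the names `dedupe_write` and `restore`, and the in-memory `store` dict standing in for the backup target, are hypothetical. Real products often use variable-size chunking instead.

```python
import hashlib


def dedupe_write(data: bytes, store: dict, chunk_size: int = 4096) -> list:
    """Split data into fixed-size chunks, write only chunks not already in
    the store, and return the ordered list of fingerprints (the 'recipe')."""
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in store:
            # New content: physically write it once.
            store[fingerprint] = chunk
        # Duplicate or not, the file only keeps a reference to the chunk.
        recipe.append(fingerprint)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original data from its recipe of fingerprints."""
    return b"".join(store[fp] for fp in recipe)
```

Writing 8 KiB of repeated `A` bytes followed by 4 KiB of `B` bytes produces a three-entry recipe but only two stored chunks, because the second `A` chunk is recognized as a duplicate before it is written.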
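The Lambda architecture entry describes combining batch and near real-time processing. One way to picture the hybrid is a batch view recomputed from the full history plus a speed layer that counts only recent events, merged at query time. This is a toy, single-process sketch; the page-view counting use case and the names `build_batch_view`, `SpeedLayer` and `serve` are hypothetical.

```python
from collections import Counter
from typing import Iterable


def build_batch_view(master_dataset: Iterable[str]) -> Counter:
    """Batch layer: recompute page-view counts from the full event history."""
    return Counter(master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count events that arrived after the last batch run."""

    def __init__(self) -> None:
        self.recent = Counter()

    def ingest(self, page: str) -> None:
        self.recent[page] += 1


def serve(page: str, batch_view: Counter, speed: SpeedLayer) -> int:
    """Serving layer: merge the precomputed batch result with the real-time delta."""
    # Counter returns 0 for missing keys, so unseen pages are handled naturally.
    return batch_view[page] + speed.recent[page]
```

When the next batch run absorbs the recent events, the speed layer's counts would be reset, so each event is counted exactly once.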
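The Monte Carlo simulation entry describes simulating the range of possible outcomes for an uncertain event. A small illustration, chosen because its exact answer is known: estimate the probability that two fair dice sum to 10 or more (exactly 6/36, or 1/6) by repeated random trials. The function name and the fixed seed are illustrative assumptions.

```python
import random


def estimate_probability(trials: int, seed: int = 42) -> float:
    """Estimate P(sum of two dice >= 10) by random sampling."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = 0
    for _ in range(trials):
        total = rng.randint(1, 6) + rng.randint(1, 6)
        if total >= 10:
            hits += 1
    return hits / trials
```

With 100,000 trials the estimate typically lands within about 0.01 of 1/6; more trials narrow the spread further.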
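The normal distribution entry says most data points cluster toward the middle and taper off symmetrically. Python's standard-library `statistics.NormalDist` can quantify that clustering; the helper `share_within` below is a hypothetical name that returns the fraction of values lying within `k` standard deviations of the mean.

```python
from statistics import NormalDist


def share_within(k: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)
```

About 68.3% of values fall within one standard deviation and about 95.4% within two, regardless of the mean and standard deviation chosen.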
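For the pebibyte entry, the arithmetic below shows how the binary (IEC) pebibyte compares with the decimal (SI) petabyte of 10^15 bytes. The constant and function names are illustrative.

```python
PIB = 2 ** 50   # 1 pebibyte (binary prefix): 1125899906842624 bytes
PB = 10 ** 15   # 1 petabyte (decimal prefix), for comparison


def bytes_to_pib(n_bytes: int) -> float:
    """Convert a raw byte count to pebibytes."""
    return n_bytes / PIB
```

Because a pebibyte is about 12.6% larger than a petabyte, one petabyte converts to roughly 0.888 PiB.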
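The primary key entry says the key column holds a distinct value for each record. The sketch below uses Python's built-in `sqlite3` module with an in-memory database to show the database enforcing that rule; the `customers` table and its columns are hypothetical examples.

```python
import sqlite3


def primary_key_demo() -> tuple:
    """Create a table with a primary key, then try to insert a duplicate key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO customers (customer_id, name) VALUES (?, ?)", (1, "Ada"))
    try:
        # Same key value as the existing row: the database should refuse it.
        conn.execute("INSERT INTO customers (customer_id, name) VALUES (?, ?)", (1, "Grace"))
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
    conn.close()
    return rejected, count
```

The function returns `(True, 1)`: the duplicate insert raises `sqlite3.IntegrityError`, and only the first record remains in the table.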
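The RFM analysis entry describes ranking customers by recency, frequency and monetary value. A minimal sketch follows, assuming transactions arrive as `(customer_id, purchase_date, amount)` tuples and that each dimension is scored 1 (worst) to 3 (best) by rank. The function names and the three-bin scheme are hypothetical, and ties are broken arbitrarily, which a production scoring scheme would handle more carefully.

```python
from datetime import date


def rfm_table(transactions, as_of: date) -> dict:
    """Return {customer: (days_since_last_purchase, purchase_count, total_spent)}."""
    stats = {}
    for customer, day, amount in transactions:
        last, count, total = stats.get(customer, (None, 0, 0.0))
        if last is None or day > last:
            last = day
        stats[customer] = (last, count + 1, total + amount)
    return {c: ((as_of - last).days, n, t) for c, (last, n, t) in stats.items()}


def rank_scores(values: dict, higher_is_better: bool, bins: int = 3) -> dict:
    """Sort customers best-first and map rank position to a score from bins down to 1."""
    order = sorted(values, key=values.get, reverse=higher_is_better)
    n = len(order)
    return {c: bins - (i * bins // n) for i, c in enumerate(order)}


def rfm_scores(transactions, as_of: date, bins: int = 3) -> dict:
    """Combine the three scores into an 'RFM' code such as '333'."""
    table = rfm_table(transactions, as_of)
    r = rank_scores({c: v[0] for c, v in table.items()}, False, bins)  # fewer days is better
    f = rank_scores({c: v[1] for c, v in table.items()}, True, bins)
    m = rank_scores({c: v[2] for c, v in table.items()}, True, bins)
    return {c: f"{r[c]}{f[c]}{m[c]}" for c in table}
```

A customer scored `333` bought recently, buys often and spends the most, which makes them a natural target for the kind of marketing campaign the entry mentions.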
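The stream processing entry describes ingesting a continuous stream and filtering, transforming or enhancing it as data arrives. Python generators give a small in-process picture of that pipeline, handling one event at a time rather than a stored batch. The `"sensor,value"` event format, the valid range and the stage names are assumptions for illustration; distributed stream processors apply the same idea across many machines.

```python
from typing import Iterable, Iterator, Tuple


def parse(lines: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Transform: turn raw 'sensor,value' strings into typed readings."""
    for line in lines:
        sensor, value = line.split(",")
        yield sensor, float(value)


def drop_invalid(readings, low: float, high: float):
    """Filter: discard readings outside a plausible range."""
    for sensor, value in readings:
        if low <= value <= high:
            yield sensor, value


def running_average(readings):
    """Enhance: attach the running mean of all valid readings seen so far."""
    count, total = 0, 0.0
    for sensor, value in readings:
        count += 1
        total += value
        yield sensor, value, total / count
```

Chaining the stages as `running_average(drop_invalid(parse(events), -50, 150))` processes each event as soon as it is pulled from the source, so the pipeline works the same on an endless stream as on a short list.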
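The tree structure entry describes using a tree to place and locate records by key. The sketch below uses a plain binary search tree because it is the simplest tree that shows the idea; database indexes usually rely on B-trees or similar balanced variants instead. The `Node` class and the `insert`, `search` and `in_order` functions are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: int
    record: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int, record: str) -> Node:
    """Place a record: smaller keys go left, larger keys go right."""
    if root is None:
        return Node(key, record)
    if key < root.key:
        root.left = insert(root.left, key, record)
    elif key > root.key:
        root.right = insert(root.right, key, record)
    else:
        root.record = record  # same key: overwrite the stored record
    return root


def search(root: Optional[Node], key: int) -> Optional[str]:
    """Locate a record by walking one branch per level."""
    node = root
    while node is not None:
        if key == node.key:
            return node.record
        node = node.left if key < node.key else node.right
    return None


def in_order(root: Optional[Node]) -> List[int]:
    """Visit keys in sorted order (left subtree, node, right subtree)."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)
```

Each comparison discards one subtree, so a search in a reasonably balanced tree touches only a few nodes even when it holds many records.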