Data and data management

Terms related to data, including definitions about data warehousing and words and phrases about data management.
  • U-SQL - U-SQL is a Microsoft query language that combines a declarative SQL-like syntax with C# programming, enabling it to be used to process both structured and unstructured data in big data environments.
  • unstructured text - The unstructured text collected from social media activities plays a key role in predictive analytics for the enterprise because it is a prime source for sentiment analysis to determine the general attitude of consumers toward a brand or idea.
  • utility storage - Utility storage is a service model in which a provider makes storage capacity available to an individual, organization or business unit on a pay-per-use basis.
  • virtual desktop - A virtual desktop is a computer operating system that does not run directly on the endpoint hardware from which a user accesses it.
  • volatile memory - Volatile memory is a type of memory that maintains its data only while the device is powered.
  • web analytics - Web analytics is the process of analyzing the behavior of visitors to a website by tracking, reviewing and reporting the data generated by their use of the site and its components, such as its webpages, images and videos.
  • web services - Web services are a type of internet software that uses standardized messaging protocols and is made available from an application service provider's web server for a client or other web-based programs to use.
  • WebLogic - Oracle WebLogic Server is a leading e-commerce online transaction processing (OLTP) platform, developed to connect users in distributed computing production environments and to facilitate the integration of mainframe applications with distributed corporate data and applications.
  • What are data silos and what problems do they cause? - A data silo is a repository of data that's controlled by one department or business unit and isolated from the rest of an organization, much like grass and grain in a farm silo are closed off from outside elements.
  • What are graph neural networks (GNNs)? - Graph neural networks (GNNs) are a type of neural network architecture and deep learning method that can help users analyze graphs, enabling them to make predictions based on the data described by a graph's nodes and edges.
  • What are spreadsheets and how do they work? - A spreadsheet is a computer program that can capture, display and manipulate data arranged in rows and columns.
  • What is a 3-tier application architecture? - A three-tier application architecture is a modular client-server architecture that consists of a presentation tier, an application tier and a data tier.
  • What is a backpropagation algorithm? - A backpropagation algorithm, or backward propagation of errors, is an algorithm that's used to help train neural network models.
  • What is a business intelligence dashboard (BI dashboard)? - A business intelligence dashboard, or BI dashboard, is a data visualization and analysis tool that displays on one screen the status of key performance indicators (KPIs) and other important business metrics and data points for an organization, department, team or process.
  • What is a consensus algorithm? - A consensus algorithm is a process in computer science used to achieve agreement on a single data value among distributed processes or systems.
  • What is a data architect? - A data architect is an IT professional responsible for defining the policies, procedures, models and technologies used in collecting, organizing, storing and accessing company information.
  • What is a data flow diagram (DFD)? - A data flow diagram (DFD) is a graphical or visual representation that uses a standardized set of symbols and notations to describe a business's operations through data movement.
  • What is a data mart (datamart)? - A data mart is a repository of data that is designed to serve a particular community of knowledge workers.
  • What is a framework? - In general, a framework is a real or conceptual structure intended to serve as a support or guide for the building of something that expands the structure into something useful.
  • What is a pivot table? - A pivot table is a statistics tool that summarizes and reorganizes selected columns and rows of data in a spreadsheet or database table to obtain a desired report.
  • What is a private cloud? - Private cloud is a type of cloud computing that delivers similar advantages to public cloud, including scalability and self-service, but through a proprietary architecture.
  • What is a records retention schedule? - A records retention schedule is a policy that defines how long paper and electronic content must be kept and provides disposal guidelines for how those items should be discarded.
  • What is a validation set? How is it different from test and training data sets? - A validation set is a set of data held out from training and used to evaluate and tune artificial intelligence (AI) models, with the goal of finding and optimizing the best model to solve a given problem.
  • What is a vector database? - A vector database is a type of database technology that's used to store, manage and search vector embeddings, numerical representations of unstructured data that are also referred to simply as vectors.
  • What is actionable intelligence? - Actionable intelligence is information that can be immediately used or acted upon, either tactically in direct response to an evolving situation, or strategically as the result of data analytics or some other assessment.
  • What is AI ethics? - AI ethics is a system of moral principles and techniques intended to inform the development and responsible use of artificial intelligence technology.
  • What is an API endpoint? - An API endpoint is a point at which an application programming interface -- the code that enables two software programs to communicate with each other -- connects with the software program.
  • What is an inductive argument? - An inductive argument is an assertion that uses specific premises or observations to make a broader generalization.
  • What is an information system (IS)? - An information system (IS) is an interconnected set of components used to collect, store, process and transmit data and digital information.
  • What is an NVDIMM (non-volatile dual in-line memory module)? - An NVDIMM (non-volatile dual in-line memory module) is hybrid computer memory that retains data when power is lost, such as during a service outage.
  • What is anomaly detection? An overview and explanation - Anomaly detection is the process of identifying data points, entities or events that fall outside the normal range.
  • What is bit rot? - Bit rot is the slow deterioration in the performance and integrity of data stored on storage media.
  • What is Change Healthcare? - Change Healthcare is a healthcare technology provider specializing in revenue cycle management, payment management and health information exchange solutions.
  • What is corporate performance management (CPM)? - Corporate performance management (CPM) encompasses the processes and methodologies used to align an organization's strategies and goals to its plans and actions as a business.
  • What is Current Procedural Terminology (CPT) code? - Current Procedural Terminology (CPT) is a medical code set that enables physicians and other healthcare providers to describe and report the medical, surgical, and diagnostic procedures and services they perform to government and private payers, researchers and other interested parties.
  • What is customer intelligence (CI) and how does it help business? - Customer intelligence (CI) is the process of collecting and analyzing detailed customer data from internal and external sources to gain insights about customer needs, motivations and behaviors.
  • What is customer segmentation? - Customer segmentation is the practice of dividing a customer base into groups of individuals that have similar characteristics relevant to marketing, such as age, gender, interests and spending habits.
  • What is dark data? - Dark data is digital information an organization collects, processes and stores that is not currently being used for business purposes.
  • What is data activation? - Data activation is a marketing approach that uses consumer information and data analytics to help companies gain real-time insight into target audience behavior and plan for future marketing initiatives.
  • What is data aggregation? - Data aggregation is any process whereby data is gathered and expressed in a summary form.
  • What is data architecture? A data management blueprint - Data architecture is a discipline that documents an organization's data assets, maps how data flows through IT systems and provides a blueprint for managing data, as this guide explains.
  • What is Data as a Service (DaaS)? - Data as a Service (DaaS) is an information provision and distribution model in which data files (including text, images, sounds and videos) are made available to customers over a network, typically the internet.
  • What is data democratization? - Data democratization makes information in a digital format accessible to the average end user.
  • What is data egress? How it works and how to manage costs - Data egress is when data leaves a closed or private network and is transferred to an external location.
  • What is data governance and why does it matter? - Data governance is the process of managing the availability, usability, integrity and security of the data in enterprise systems, based on internal standards and policies that also control data usage.
  • What is data labeling? - Data labeling is the process of identifying and tagging data samples commonly used in the context of training machine learning (ML) models.
  • What is data lifecycle? - A data lifecycle is the sequence of stages that a unit of data goes through from its initial generation or capture to its archiving or deletion at the end of its useful life.
  • What is data loss prevention (DLP)? - Data loss prevention (DLP) -- sometimes referred to as 'data leak prevention,' 'information loss prevention' or 'extrusion prevention' -- is a strategy to mitigate threats to critical data.
  • What is data management and why is it important? Full guide - Data management is the process of ingesting, storing, organizing and maintaining the data created and collected by an organization, as explained in this in-depth guide.
  • What is data management as a service (DMaaS)? - Data management as a service (DMaaS) is a type of cloud service that provides enterprises with centralized storage for disparate data sources.
  • What is data preparation? An in-depth guide - Data preparation is the process of gathering, combining, structuring and organizing data for use in business intelligence, analytics and data science applications, as explained in this guide.
  • What is data science as a service (DSaaS)? - Data science as a service (DSaaS) is a form of outsourcing that involves the delivery of information gleaned from advanced analytics applications run by data scientists at an outside company to corporate clients for their business use.
  • What is data science? The ultimate guide - Data science is the process of using advanced analytics techniques and scientific principles to analyze data and extract valuable information for business decision-making, strategic planning and other uses.
  • What is data storytelling? - Data storytelling is the process of translating complex data analyses into understandable terms to inform a business decision or action.
  • What is data validation? - Data validation is the practice of checking the integrity, accuracy and structure of data before it is used for or by one or more business operations.
  • What is data? - In computing, data is information translated into a form that is efficient for movement or processing.
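  • What is database normalization? - Database normalization is the process of organizing the tables and columns of a relational database to reduce redundancy and improve data integrity; it is intrinsic to most relational database schemes.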
  • What is denormalization and how does it work? - Denormalization is the process of adding precomputed redundant data to an otherwise normalized relational database to improve read performance.
  • What is deterministic/probabilistic data? - Deterministic and probabilistic are opposing terms that can be used to describe customer data and how it is collected.
  • What is digital health (digital healthcare)? - Digital health, also known as digital healthcare, is the use of digital technologies in healthcare.
  • What is dimensionality reduction? - Dimensionality reduction is a process and technique to reduce the number of dimensions -- or features -- in a data set.
  • What is electronic data processing (EDP)? - Electronic data processing (EDP) refers to the gathering of data using electronic devices, such as computers, servers and internet of things (IoT) technologies.
  • What is employee self-service (ESS)? - Employee self-service (ESS) is a widely used human resources technology that enables employees to perform many job-related functions that were once largely paper-based, or otherwise maintained by management, administrative or HR staff.
  • What is enterprise content management? Guide to ECM - Enterprise content management is a set of defined processes, strategies and tools that enables a business to obtain, organize, store and deliver critical information to its employees, business stakeholders and customers.
  • What is Extract, Load, Transform (ELT)? - Extract, Load, Transform (ELT) is a data integration process for transferring raw data from a source server to a data system (such as a data warehouse or data lake) on a target server and then preparing the information for downstream uses.
  • What is GDPR? Compliance and conditions explained - The General Data Protection Regulation (GDPR) is legislation that updated and unified data privacy laws across the European Union (EU).
  • What is information rights management (IRM)? - Information rights management (IRM) is a discipline that involves managing, controlling and securing content from unwanted access.
  • What is master data management (MDM)? - Master data management (MDM) is a process that creates a uniform set of data on customers, products, suppliers and other business entities from different IT systems.
  • What is Microsoft Power BI? Uses, features and guide - Microsoft Power BI is a business intelligence (BI) platform that provides nontechnical business users with tools for aggregating, analyzing, visualizing and sharing data.
  • What is Microsoft Visual FoxPro (VFP)? - Microsoft Visual FoxPro (VFP) is an object-oriented programming (OOP) environment with a built-in relational database engine.
  • What is NoSQL (Not Only SQL database)? - NoSQL is an approach to database management that can accommodate a wide variety of data models, including key-value, document, columnar and graph formats.
  • What is Oracle? - Oracle is one of the largest vendors in the enterprise IT market and the shorthand name of its flagship product, a relational database management system (RDBMS) that's formally called Oracle Database.
  • What is PaaS? Platform as a service definition and guide - Platform as a service (PaaS) is a cloud computing model where a third-party provider delivers hardware and software tools to users over the internet.
  • What is picture archiving and communication system (PACS)? - Picture archiving and communication system (PACS) is a medical imaging technology used primarily in healthcare organizations to securely store and digitally transmit electronic images and clinically relevant reports.
  • What is qualitative data? - Qualitative data is descriptive information that focuses on concepts and characteristics, rather than numbers and statistics.
  • What is records management? - Records management is the supervision and administration of digital or paper records, regardless of format.
  • What is Salesforce Wave Analytics (Salesforce CRM Analytics)? - Salesforce Wave Analytics, now known as CRM Analytics, is a business intelligence (BI) and analytics platform from Salesforce that provides native analytics, visual insights and predictions powered by artificial intelligence (AI) for Salesforce CRM to help businesses make better, data-driven decisions.
  • What is SAP Basis? - SAP Basis is the technical foundation that enables SAP applications to function.
  • What is SAP Data Services? - SAP Data Services is a data integration and transformation software application that enables organizations to capture more meaning and value from their structured and unstructured data.
  • What is secure multiparty computation (SMPC)? - Secure multiparty computation (SMPC) is a form of confidential computing that protects the privacy and security of systems and data sources, while maintaining the data's integrity.
  • What is self-service business intelligence (self-service BI)? - Self-service business intelligence (BI) is an approach to data analytics that enables nontechnical business users to access and explore data sets.
  • What is semantic technology? - Semantic technology is a set of methods and tools that provide advanced means for categorizing and processing data, as well as for discovering relationships within varied data sets.
  • What is sentiment analysis? - Sentiment analysis, also referred to as 'opinion mining,' is an approach to natural language processing (NLP) that identifies the emotional tone behind a body of text.
  • What is Software as a Service (SaaS)? - Software as a service (SaaS) is a software distribution model in which a third-party provider hosts applications and makes them available to customers over the internet.
  • What is SPSS (Statistical Package for the Social Sciences)? - SPSS (Statistical Package for the Social Sciences), also known as IBM SPSS Statistics since 2009, is a user-friendly software package used for the analysis of statistical data and to make data-driven decisions.
  • What is Structured Query Language (SQL)? - Structured Query Language (SQL) is a standardized programming language that is used to manage relational databases and perform various operations on the data in them.
  • What is the Coalition for Secure AI (CoSAI)? - The Coalition for Secure AI (CoSAI) is an open source initiative to enhance the security of artificial intelligence.
  • What is the Driver's Privacy Protection Act (DPPA)? - The Driver's Privacy Protection Act (DPPA) is a United States federal law designed to protect the personally identifiable information of licensed drivers from improper use or disclosure.
  • What is transfer learning? - Transfer learning is a machine learning (ML) technique where an already developed ML model is reused in another task.
  • What is user acceptance testing (UAT)? - User acceptance testing (UAT), also called application testing or end-user testing, is a phase of software development in which the software is tested in the real world by its intended audience.
  • What is user behavior analytics (UBA)? - User behavior analytics (UBA) is the tracking, collecting and assessing of user data and activities using monitoring systems.
  • wipe - Wipe, in a computing context, means to erase all data on a hard drive to render it unreadable.
  • workload - In computing, a workload is typically any program or application that runs on a computer.
  • WORM (write once, read many) - In computer media, write once, read many, or WORM, is a data storage technology that allows data to be written to a storage medium a single time and prevents the data from being erased or modified.
  • XML Schema Definition (XSD) - XML Schema Definition or XSD is a recommendation by the World Wide Web Consortium (W3C) to describe and validate the structure and content of an XML document.
  • YAML (YAML Ain't Markup Language) - YAML (YAML Ain't Markup Language) is a data serialization language used as the input format for diverse software applications.
  • yobibyte (YiB) - A yobibyte (YiB) is a unit of measure used to describe data capacity as part of the binary system of measuring computing and storage capacity.