Data Glossary

B

Big Data: Datasets so large or complex that they require specialized techniques and tools to process before meaningful insights can be drawn from them.

D

Data Extraction: The process of retrieving data from one or more sources so that it can be transformed and analyzed.

Data Transformation: The process of converting extracted data into the formats or structures required for further analysis.

Data Load (Loading): The step in which the transformed data is loaded into a target system such as a database, warehouse, or other type of repository.

Data Warehouse: A repository of information collected from multiple sources for analysis purposes. It is typically populated by ETL processes that combine the different sources and transform the data before storage.

Data Lake: A storage repository that holds a vast amount of raw data in its native format until it is needed. Unlike a traditional relational database, a data lake can store structured and unstructured data from different sources without prior transformation.

Data Mart: A subset of a data warehouse that contains only a portion of the entire dataset. It is often used to provide data specific to a certain department or project within an organization.

Data Governance: A set of processes, policies, and guidelines that establish how an organization manages its data over time in order to ensure accuracy, consistency, and quality throughout its operations.

Data Engineer: The role responsible for designing, developing, and maintaining an organization's data infrastructure. This includes designing databases, data warehouses, and other systems that store and manage data, as well as optimizing the queries that access that data.

E

ETL Process: Abbreviation for Extract, Transform, and Load; a data warehousing process that extracts data from multiple sources, transforms it into a common format, and loads it into a target system such as a data warehouse.

ETL Tool: Standalone software designed to perform the extraction, transformation, and loading steps so that data can be processed efficiently.

ERP: Enterprise Resource Planning; software solutions that help companies manage business processes such as accounting, human resources, procurement, and customer relationships.
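
The extract, transform, and load steps defined in this glossary can be illustrated with a small sketch. It uses only the Python standard library; the CSV text, the `orders` table, and the function names are hypothetical and chosen for illustration, and an in-memory SQLite database stands in for a real target system.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read from files, APIs, or databases.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,20.00
3,alice,n/a
"""


def extract(text):
    """Extract: read raw rows from the CSV source without changing them."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize customer names, parse amounts, skip unparseable rows."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount is not a number
        cleaned.append((int(row["order_id"]), row["customer"].strip().lower(), amount))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text=RAW_CSV):
    """Run the three steps in order against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(text)))
    return conn


if __name__ == "__main__":
    conn = run_etl()
    print(conn.execute("SELECT customer, amount FROM orders ORDER BY order_id").fetchall())
```

Keeping each step in its own function mirrors the way the three terms are defined separately: the extract step does not alter the data, and all cleaning decisions live in the transform step.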

M

Metadata: Information about a particular dataset, such as field names, descriptions, and source details. It is also referred to as ‘data about data’.

O

OLTP: Online Transaction Processing; databases optimized for short online transactions.

OLAP: Online Analytical Processing; databases optimized for complex analytical queries over large volumes of data.
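
The difference between the two workloads can be sketched with a single table. This is only an illustration of the query patterns: SQLite is used here for convenience, whereas real OLTP and OLAP systems are usually separate products with different storage designs. The `sales` table and the function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")


def record_sale(conn, region, amount):
    """OLTP-style work: one short transaction that touches a single row."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO sales (region, amount) VALUES (?, ?)", (region, amount)
        )


def revenue_by_region(conn):
    """OLAP-style work: an aggregate query that reads many rows at once."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()


# Many small writes, then one analytical read over all of them.
for region, amount in [("north", 100.0), ("south", 50.0), ("north", 25.0)]:
    record_sale(conn, region, amount)

if __name__ == "__main__":
    print(revenue_by_region(conn))
```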

Q

Query Optimization: The use of techniques that improve the efficiency of a database query. This may involve restructuring the query, rewriting subqueries, or choosing appropriate indexes or optimizer hints to speed up execution.
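
As one possible illustration of index selection, the sketch below asks SQLite for its query plan before and after an index is created. The `events` table, the index name, and the `plan` helper are hypothetical; `EXPLAIN QUERY PLAN` is a SQLite feature, and the exact wording of its output varies between SQLite versions, so the plans are printed rather than assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT)")
conn.executemany(
    "INSERT INTO events (user_id, kind) VALUES (?, ?)",
    [(i % 100, "click") for i in range(1000)],
)

QUERY = "SELECT COUNT(*) FROM events WHERE user_id = 42"


def plan(conn, sql):
    """Return SQLite's plan description for a query (last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Without an index on user_id, the filter has to scan the table.
before = plan(conn, QUERY)

# After the index exists, the planner can look up matching rows through it.
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
after = plan(conn, QUERY)

if __name__ == "__main__":
    print("before:", before)
    print("after: ", after)
```

The query text is identical in both runs; only the available access path changes, which is why comparing plans is a common first step before rewriting a query.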

R

Relational Database: A collection of data organized into tables of rows and columns, with relationships between tables expressed through shared key values. Normalization rules are usually applied to reduce redundancy.
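
A minimal sketch of a relationship between two tables, assuming a hypothetical `customers` and `orders` schema in SQLite. Customer names are stored once and referenced by key from `orders`, which is the kind of redundancy reduction that normalization aims for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id), "
    "total REAL NOT NULL)"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)", [(1, 30.0), (1, 12.5)]
)


def orders_with_names(conn):
    """Follow the relationship with a join on the shared key."""
    return conn.execute(
        "SELECT c.name, o.total FROM orders AS o "
        "JOIN customers AS c ON c.id = o.customer_id ORDER BY o.id"
    ).fetchall()


def insert_orphan_order(conn):
    """Try to add an order for a customer that does not exist; return True if accepted."""
    try:
        conn.execute("INSERT INTO orders (customer_id, total) VALUES (999, 1.0)")
    except sqlite3.IntegrityError:
        return False  # the foreign key constraint rejected the row
    return True


if __name__ == "__main__":
    print(orders_with_names(conn))
    print(insert_orphan_order(conn))
```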