Skip to main content

Data Warehouse

A data warehouse (DWH) is a centralized repository optimized for analytical queries across multiple source systems — ERP, CRM, e-commerce, marketing, finance — rather than the transactional patterns of operational databases. Data warehousing has been the foundation of business intelligence since the 1990s, with cloud-native warehouses (Snowflake, Google BigQuery, Amazon Redshift, Databricks) dominating new deployments since the late 2010s.

Architecture and modeling

Two dominant modeling paradigms. Kimball dimensional modeling: facts (measurements like sales, order quantities) plus dimensions (descriptive attributes like time, customer, product) organized in star schemas. Optimized for query performance and business-user readability. Inmon corporate information factory: normalized data warehouse with data marts derived for specific analytical needs. More rigid but cleaner enterprise data structure. Modern cloud-warehouse practice often combines both: raw layer (operational source data preserved), staging layer (cleaned and conformed), presentation layer (dimensional for end-user consumption). Tools like dbt (data build tool) have standardized the SQL-based transformation patterns that build these layers.

Cloud data warehouses

Snowflake — cloud-native, multi-cloud (AWS, Azure, Google), strong presence across US mid-market and enterprise. Google BigQuery — serverless, pay-per-query, deep ML and AI integration. Amazon Redshift — AWS-native, more traditional warehouse model. Microsoft Azure Synapse Analytics (formerly SQL Data Warehouse) and Microsoft Fabric — integrated with the Microsoft data stack, common in organizations standardized on Microsoft 365. Databricks — data-lakehouse platform combining warehouse and data-lake capabilities. SAP Datasphere (formerly Data Warehouse Cloud) — SAP-native with direct S/4HANA integration. Open source: Apache Druid, ClickHouse, DuckDB (single-machine analytics). For US mid-market companies starting fresh, Snowflake and Microsoft Fabric are the most-commonly evaluated options.

Data lake versus lakehouse

Adjacent categories worth distinguishing. Data lake: raw-data storage in flexible formats (parquet, JSON, CSV) on object storage (S3, ADLS, GCS), supporting unstructured and semi-structured data. Lower cost than a warehouse, less optimized for SQL queries. Data lakehouse: combines lake-style storage with warehouse-style ACID transactions and SQL performance. Apache Iceberg, Delta Lake and Apache Hudi are the open table formats enabling lakehouse architectures; Databricks and Snowflake both support lakehouse patterns. For ERP-centric analytics in the US mid-market, the classical data-warehouse pattern suits most use cases; lakehouse becomes valuable when unstructured data (logs, IoT, documents) joins the analytics scope.

ERP-to-DWH integration

ERP data flows into the warehouse through ETL or ELT pipelines (see ETL). Common patterns. (1) Replication via CDC: tools like SLT (SAP), Debezium or Fivetran capture every ERP database change and stream to the warehouse with minute-level latency. (2) Scheduled batch extracts: nightly or hourly batch pulls of changed records via OData or vendor APIs. (3) Vendor data hubs: SAP Datasphere, Oracle Autonomous Data Warehouse, Microsoft Dataverse provide ERP-vendor-managed data layers that simplify integration. US mid-market companies typically combine tools: Fivetran or Airbyte for SaaS sources, vendor connectors for SAP and Oracle, custom Python for legacy on-premises ERPs. Downstream BI tools (Power BI, Tableau, Qlik) consume the warehouse for analytics and dashboards.

Related Topics

erp-software.org · the independent ERP comparison for the mid-market in Germany, Switzerland and Austria
All mentioned brand, product and company names are property of their respective owners. References are made solely for identification and comparison purposes (no indication of commercial or partnership relationships). Note pursuant to §5b German UWG (Unfair Competition Act): user reviews are manually plausibility-checked before publication – we cannot, however, determine with absolute certainty whether reviews originate exclusively from actual users. Some links on erp-software.org may lead to advertising partnerships or lead-referrals; editorial assessments are made independently of these.