
Data Pipeline Architecture Design

Design a modern data pipeline architecture for reliable and scalable analytics.

📊 Data & Analytics · Advanced · Data Engineer · ✓ Free

The Prompt

Design a modern data pipeline architecture for [Company] with [describe data sources and volume]. Include:

1. Data source inventory — map all data sources (production databases, SaaS tools, event streams, files, APIs), volume estimates, freshness requirements, schema stability.
2. Ingestion layer — CDC vs. batch vs. streaming for each source, tool selection (Fivetran, Airbyte, Debezium, custom), scheduling strategy, error handling.
3. Storage architecture — data lakehouse design (Delta Lake, Iceberg), raw/bronze/silver/gold layer definitions, partitioning strategy, file format selection (Parquet, Delta).
4. Transformation layer — dbt project structure, modeling methodology (staging → intermediate → marts), testing strategy (schema, data, freshness), documentation requirements.
5. Orchestration — Airflow/Dagster/Prefect comparison, DAG design principles, dependency management, alerting and monitoring.
6. Data quality — automated testing at each layer, anomaly detection, data freshness monitoring, SLA definition and tracking.
7. Serving layer — analytics warehouse (Snowflake/BigQuery/Redshift), semantic layer, API layer for applications, caching strategy.
8. Governance and security — access control, PII handling pipeline, data lineage tracking, cost management.
9. Team structure and ownership — data engineer responsibilities, analytics engineer responsibilities, on-call rotation for data pipelines.
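To make item 6 concrete, the freshness-monitoring and SLA-tracking ideas can be sketched as a minimal check in plain Python. The source names and SLA thresholds below are hypothetical placeholders, not part of the prompt; a real pipeline would read last-load timestamps from warehouse metadata or an orchestrator rather than a dict.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness SLAs per source (maximum allowed staleness).
SLAS = {
    "orders_db": timedelta(hours=1),    # CDC stream, near-real-time expectation
    "salesforce": timedelta(hours=24),  # daily batch sync
}

def check_freshness(last_loaded: dict, now: datetime) -> dict:
    """Return {source: 'ok' | 'stale'} by comparing each source's
    last successful load time against its SLA threshold."""
    status = {}
    for source, sla in SLAS.items():
        age = now - last_loaded[source]
        status[source] = "ok" if age <= sla else "stale"
    return status

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
loaded = {
    "orders_db": now - timedelta(minutes=30),   # within 1h SLA
    "salesforce": now - timedelta(hours=30),    # breaches 24h SLA
}
print(check_freshness(loaded, now))  # {'orders_db': 'ok', 'salesforce': 'stale'}
```

A check like this typically runs on a schedule from the orchestrator, with "stale" results routed to the alerting channel defined in the orchestration layer.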

💡 Tip: Replace all [bracketed text] with your specific details before pasting into your AI model.
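For the partitioning strategy in item 3, a common convention in lakehouse bronze layers is Hive-style date partitioning of Parquet files. A minimal sketch, assuming that convention (the layer and table names are illustrative):

```python
from datetime import date

def partition_path(layer: str, table: str, event_date: date) -> str:
    """Build a Hive-style partition path, e.g. 'bronze/orders/dt=2024-01-15/'.

    Partitioning by event date lets query engines prune files by the
    dt= key instead of scanning the whole table."""
    return f"{layer}/{table}/dt={event_date.isoformat()}/"

print(partition_path("bronze", "orders", date(2024, 1, 15)))
# bronze/orders/dt=2024-01-15/
```

Choosing the partition key (event date vs. ingestion date, daily vs. hourly granularity) is one of the trade-offs the prompt asks the model to work through.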

AI Model Compatibility

ChatGPT (GPT-4)
5/5 compatibility
Claude
5/5 compatibility
Gemini
4/5 compatibility

Tags

data pipeline · data engineering · ETL · data architecture