
ETL Data Quality Testing Framework

Create a testing framework for data pipelines and ETL processes.

📊 Data & Analytics · Advanced · Data Engineer · ✓ Free

The Prompt

You are a data quality engineer. Create a comprehensive data quality testing framework for the stack described below.

Data stack: [DESCRIBE]
Pipelines: [COUNT]
Data sources: [LIST]
Current testing: [DESCRIBE OR NONE]

1. Testing Layers:
   - Source validation: schema changes, data freshness, completeness
   - Transform testing: logic validation, edge cases, null handling
   - Output testing: row counts, aggregation checks, referential integrity
   - Cross-system: source-to-target reconciliation
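The cross-system layer above can be sketched as a simple reconciliation check. This is a minimal illustration, not a production implementation; the table names and tolerance threshold are assumptions for the example.

```python
# Sketch of source-to-target reconciliation: flag any table whose
# target row count drifts from the source beyond a tolerance.
def reconcile(source_counts: dict, target_counts: dict,
              tolerance: float = 0.0) -> list:
    """Return tables whose target row count deviates from the
    source by more than `tolerance` (as a fraction of source)."""
    failures = []
    for table, src in source_counts.items():
        tgt = target_counts.get(table, 0)
        # Guard against division by zero for empty source tables.
        drift = abs(tgt - src) / src if src else (1.0 if tgt else 0.0)
        if drift > tolerance:
            failures.append(table)
    return failures

# Example: orders loaded cleanly, payments lost rows in transit.
print(reconcile({"orders": 100, "payments": 50},
                {"orders": 100, "payments": 45}))
```

In practice the counts would come from queries against the source and target systems, and you would typically reconcile checksums or column aggregates as well as row counts.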

2. Test Types:
   - Schema tests: column existence, data types, not null, unique, foreign key
   - Data tests: accepted values, ranges, regex patterns, custom SQL
   - Freshness tests: last update time, expected frequency
   - Volume tests: row count trends, anomaly detection
   - Business logic tests: calculated fields, aggregations, derived values
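Several of the test types above (accepted values, ranges, regex patterns) reduce to per-column rules evaluated over rows. A minimal sketch, with illustrative column names and rules:

```python
import re

# Each rule is a predicate per column; the runner reports every
# (row index, column) pair that violates its rule.
RULES = {
    "status": lambda v: v in {"new", "paid", "refunded"},        # accepted values
    "amount": lambda v: 0 <= v <= 10_000,                        # range
    "email":  lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
}

def run_data_tests(rows):
    """Return (row_index, column) pairs that violate a rule."""
    failures = []
    for i, row in enumerate(rows):
        for col, rule in RULES.items():
            if col in row and not rule(row[col]):
                failures.append((i, col))
    return failures

rows = [
    {"status": "paid", "amount": 42, "email": "a@example.com"},
    {"status": "void", "amount": -5, "email": "not-an-email"},
]
print(run_data_tests(rows))  # row 1 fails all three rules
```

Tools like dbt and Great Expectations express the same idea declaratively, but the underlying mechanics are rules evaluated against data, as shown here.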

3. Implementation:
   - dbt tests: built-in and custom, packages (dbt_expectations, dbt_utils)
   - Great Expectations: expectation suites, data docs, checkpoints
   - Custom: Python testing framework, SQL-based checks
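The "custom SQL-based checks" option can be as simple as a set of queries that must each return zero rows. A self-contained sketch using SQLite as a stand-in warehouse; the table and test names are illustrative:

```python
import sqlite3

# Each test is a SQL query selecting violating rows; the test
# passes when the query returns nothing.
TESTS = {
    "orders_no_null_customer":
        "SELECT id FROM orders WHERE customer_id IS NULL",
    "orders_unique_id":
        "SELECT id FROM orders GROUP BY id HAVING COUNT(*) > 1",
}

def run_sql_tests(conn):
    """Return {test_name: passed} for every registered test."""
    return {name: len(conn.execute(sql).fetchall()) == 0
            for name, sql in TESTS.items()}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10), (2, None), (2, 11)])
print(run_sql_tests(conn))  # both tests fail on this data
```

This zero-rows-means-pass convention is the same one dbt uses for its singular tests, which makes migrating custom checks into dbt straightforward.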

4. Alerting:
   - Severity levels: critical (pipeline halt), warning (investigate), info (monitor)
   - Notification: channels, escalation, on-call
   - Response playbook: investigation, root cause, fix, prevention
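The severity levels and notification routing above can be sketched as a small dispatch table. The channel names and halt behavior are assumptions for illustration:

```python
from enum import Enum

class Severity(Enum):
    CRITICAL = 1   # halt the pipeline, page on-call
    WARNING = 2    # investigate during business hours
    INFO = 3       # monitor only

# Map each severity to its notification channels (illustrative names).
ROUTES = {
    Severity.CRITICAL: ["pagerduty", "#data-incidents"],
    Severity.WARNING:  ["#data-quality"],
    Severity.INFO:     ["dashboard"],
}

def handle_failure(test_name, severity):
    """Return the channels to notify and whether to halt the pipeline."""
    return {"test": test_name,
            "channels": ROUTES[severity],
            "halt_pipeline": severity is Severity.CRITICAL}

print(handle_failure("orders_row_count", Severity.CRITICAL))
```

Keeping the routing in data rather than branching logic makes it easy to audit and to adjust escalation without touching the test runner.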

5. Data Contracts: definition, schema registry, versioning, breaking change process
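The breaking-change process for data contracts hinges on classifying schema diffs. A minimal sketch, assuming a flat column-to-type mapping (real registries track nullability, defaults, and nesting too):

```python
# Removing a column or changing its type is breaking; adding a new
# column is treated as non-breaking here. Layout is an assumption.
def breaking_changes(old: dict, new: dict) -> list:
    """Compare two {column: type} schemas and list breaking changes."""
    issues = []
    for col, dtype in old.items():
        if col not in new:
            issues.append(f"removed column: {col}")
        elif new[col] != dtype:
            issues.append(f"type change: {col} {dtype} -> {new[col]}")
    return issues

old = {"id": "int", "email": "str"}
new = {"id": "int", "email": "int", "created_at": "timestamp"}
print(breaking_changes(old, new))  # ['type change: email str -> int']
```

A check like this can gate contract version bumps in CI: an empty result allows a minor version, any breaking change forces a major version and the agreed deprecation process.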
6. Monitoring Dashboard: pipeline health, test results, data freshness, quality scores
7. Best Practices: test coverage goals, naming conventions, documentation

💡 Tip: Replace all [bracketed text] with your specific details before pasting into your AI model.

AI Model Compatibility

ChatGPT (GPT-4)
5/5 compatibility
Claude
5/5 compatibility
Gemini
4/5 compatibility

Tags

data quality, testing, etl, data engineering, validation