AE Consultancy Delivery¶

Consultancy craft principles for Analytics Engineering — dbt practices, cloud platform choices, privacy & ethics, client engagement, and knowledge transfer. Grounded in dbt Labs, Snowflake well-architected framework, GDPR, and Gartner maturity models.

251|## 5. Analytics Engineering Principles 252| 253|> "Analytics engineering is the data transformation layer between raw data and business decisions." — dbt Labs 254| 255|Analytics engineering (AE) is where data architecture, software engineering, and domain knowledge converge. These principles define how AE practitioners should work, particularly with dbt. 256| 257|### 5.1 dbt Model Organisation 258| 259|Principle: Models are organised by function and domain, not by source or team. 260| 261|- Three-tier structure (raw → staging → mart): See §3.4. Never skip layers. Each layer has a defined purpose and constraint. 262|- Domain-oriented marts: Organise mart models by business domain (finance, marketing, operations), not by source system. Domains are what business users understand. 263|- One thing per model: A model does one logical transformation. If a model does three joins and an aggregation, consider splitting into intermediate + mart models. 264|- Naming conventions: 265| - Staging: stg_[source]_[entity] (e.g., stg_stripe__charges) 266| - Intermediate: int_[domain]_[description] (e.g., int_marketing__campaign_performance) 267| - Mart: [domain]_[entity]_[grain] (e.g., fct_orders, dim_customers) 268| 269|### 5.2 dbt Testing 270| 271|Principle: Testing is the primary mechanism for ensuring data quality at the transformation layer. 272| 273|- Minimum test coverage: 274| - Every model: unique on primary key, not_null on primary key and critical business fields 275| - Every foreign key: relationships test 276| - Every status/boolean field: accepted_values test 277|- Custom generic tests for domain rules: Write reusable tests for common business patterns (positive revenue, valid date ranges, referential integrity with conditions). 278|- Store test severity: Not all tests are equal. Use error_if and warn_if to differentiate hard failures from soft warnings. 279|- Test results are part of the data product: Publish test results. Consumers should be able to see data quality scores alongside the data itself. 280| 281|### 5.3 dbt Documentation 282| 283|Principle: If a model isn't documented, it isn't ready for production. 284| 285|- Every model and column described: Use description fields in properties.yml. Business descriptions, not technical ones. "Amount paid by customer in GBP" not "numeric(18,2) column." 286|- Documentation as code, not wiki: Docs live in YAML alongside models. Not in Confluence. Not in READMEs that no one reads. Generate static docs with dbt docs generate. 287|- Document assumptions and caveits: If a model excludes certain edge cases (e.g., "excludes test accounts"), document it. Future-you will thank present-you. 288| 289|### 5.4 dbt Macros and Packages 290| 291|Principle: Macros and packages reduce duplication, enforce consistency, and encode institutional knowledge. 292| 293|- DRY principle (Don't Repeat Yourself): If you write the same SQL logic more than twice, it should be a macro. Common patterns: date truncation, surrogate key generation, SCD logic. 294|- Local macros before packages: If a macro is domain-specific, keep it in the project. If it's broadly useful, consider open-sourcing it. Only use packages for truly universal patterns. 295|- Evaluate packages critically: dbt_utils and dbt_expectations are near-essential. Beyond these, justify each package. Every dependency is a maintenance burden. 296|- Pin package versions: Always specify exact versions in packages.yml. Unpinned dependencies will break your project. 297| 298|### 5.5 Metric Layer and Semantic Layer 299| 300|Principle: Business metrics are defined once, computed consistently, and served everywhere. 301| 302|- The metric layer problem: Without a central metric definition, "revenue" means different things in different dashboards. This is the root cause of most "whose number is right?" disputes. 303|- dbt Semantic Layer / metrics: Use dbt's metric definitions (in YAML) as the canonical source. Metrics should have clear descriptions, dimensions, filters, and calculation methods. 304|- Metrics as code: Metric definitions are version-controlled, tested, and reviewed like any other code. They are the contract between AE and the business. 305|- From semantic layer to consumption: The semantic layer feeds BI tools, ad-hoc queries, and operational systems. It is the single point of metric truth. 306| 307|--- 308| 309|## 6. Cloud Data Platform Principles 310| 311|> "Cloud-native data platforms maximise the separation of compute and storage, the elasticity of resources, and the governance of access." — Snowflake Architecture Principles 312| 313|These principles guide the selection and architecture of cloud data platforms, with a focus on the modern warehouse-first approach exemplified by Snowflake, BigQuery, and Databricks. 314| 315|### 6.1 Cloud-Native 316| 317|Principle: Design for the cloud's strengths — elasticity, managed services, and consumption-based pricing — not by replicating on-premises patterns. 318| 319|- Serverless where sensible: Use serverless compute (Snowflake auto-suspend, BigQuery on-demand) for variable workloads. Reserve provisioned compute for predictable, high-throughput pipelines. 320|- Managed services over self-hosted: Prefer managed Airflow (MWAA), managed dbt (dbt Cloud), and managed observability over self-hosted equivalents. Reduce operational overhead, not just cost. 321|- Resist the lift-and-shift temptation: Moving on-premises SSIS packages to EC2 is not cloud-native. Redesign pipelines for ELT patterns and cloud-native services. 322| 323|### 6.2 Warehouse-First Architecture 324| 325|Principle: The data warehouse is the central processing engine; the cloud provides storage, compute, and services around it. 326| 327|- Warehouse as transformation engine: All transformation logic (joins, aggregations, business rules) executes in the warehouse using SQL. Do not transform in Python scripts, Airflow tasks, or custom services. 328|- ELT, not ETL: Extract raw data, load it into the warehouse, then transform with dbt. The warehouse's distributed compute is far more powerful than any mid-pipeline transformation layer. 329|- Warehouse governs access: Role-based access, row-level security, and column masking are enforced in the warehouse. Not in the BI tool, not in the API layer, not in application logic. 330| 331|### 6.3 Compute-Storage Separation 332| 333|Principle: Storage and compute scale independently; never over-provision one to accommodate the other. 334| 335|- Snowflake virtual warehouses: Size warehouses to the workload. XS for dbt models, XL for large aggregations, auto-suspend after idle periods. One warehouse does not fit all. 336|- Storage is cheap, compute is expensive: Never optimise storage at the cost of compute complexity. Store data in its most usable form (wide, denormalised marts) even if it uses more disk. 337|- Multi-cluster for concurrency: Use Snowflake's multi-cluster warehouses for concurrent BI query patterns. Don't queue queries unnecessarily. 338| 339|### 6.4 Role-Based Access Control (RBAC) 340| 341|Principle: Access is granted to roles, not individuals. Roles follow the principle of least privilege and align with organisational structure. 342| 343|- Functional roles: DATA_ENGINEER, ANALYST, READONLY_REPORTER. Not JANE_DOE. 344|- Warehouse-role binding: Analysts should not run queries on the engineering warehouse. Bind roles to appropriately sized warehouses. 345|- Data classification drives access: PII data is restricted to PII_READER roles. Financial data to FINANCE_ANALYST. Public data to READONLY. Classify first, grant second. 346|- Row-level security for multi-tenant data: Use dynamic data masking and row-access policies (Snowflake) to restrict visibility without duplicating tables. 347| 348|### 6.5 Cost Optimisation 349| 350|Principle: Cloud data costs are variable and controllable; treat cost as a first-class architectural concern. 351| 352|- Monitor and attribute costs: Tag queries by team, project, and cost centre. Snowflake's QUERY_HISTORY and resource monitors make this possible. 353|- Right-size compute: Start small (XS), measure, then scale up. Over-provisioned warehouses waste more than under-provisioned ones (which just run slower). 354|- Caching is free performance: Clustering keys and result caching in Snowflake are free. Use them before scaling up compute. 355|- Credit budgets, not unlimited spending: Set resource monitors with hard and soft limits. Alert at 75%, suspend at 100%. No surprise bills. 356|- Optimise the top 10 queries: In most platforms, 10% of queries cause 90% of cost. Identify and optimise these first. 357| 358|--- 359| 360|## 7. Privacy & Ethics Principles 361| 362|> "Privacy is not the opposite of analytics — poorly designed analytics is the opposite of good analytics." — Adapted from GDPR Recital 78 363| 364|Privacy and ethics are not regulatory afterthoughts — they are design constraints that, when properly integrated, produce better, more trustworthy data products. 365| 366|### 7.1 GDPR and Regulation by Design 367| 368|Principle: Data protection is embedded into the design and operation of data systems from the outset, not bolted on at the end. 369| 370|- Privacy by design (Article 25, GDPR): Collect only what's needed, process only what's justified, retain only as long as required. Build these constraints into dbt models (filter PII in staging where possible). 371|- Data Protection Impact Assessments (DPIAs): Before building a new data product that processes personal data, conduct a DPIA. Document the purpose, necessity, and proportionality. 372|- Lawful bases matter: Know your lawful basis for each processing activity (consent, contract, legitimate interest, etc.) and document it in your data catalogue. 373|- Regulation varies by jurisdiction: UK GDPR, EU GDPR, CCPA, and others have different requirements. Design for the strictest applicable regulation and localise where necessary. 374| 375|### 7.2 Data Minimisation 376| 377|Principle: Collect and retain only the data that is strictly necessary for the stated purpose. 378| 379|- Minimise at ingestion: When loading from source systems, exclude columns you don't need. If you'll never use the middle_name column, don't load it. 380|- Minimise at transformation: Mart models consumed by analysts should contain only the columns needed for analysis. A customer dimension for marketing doesn't need national_insurance_number. 381|- Aggregation as privacy tool: Aggregate data where individual-level records aren't needed. A daily sales summary is often sufficient and inherently less risky. 382|- Document minimisation decisions: If you exclude PII from a model, note it in the model description. This is evidence of responsible design. 383| 384|### 7.3 Right to Erasure 385| 386|Principle: Data subjects have the right to request deletion of their personal data; systems must be able to comply efficiently. 387| 388|- Design for deletability: If a system cannot efficiently delete an individual's data, it was not designed for compliance. Ensure every PII-bearing model can be filtered or rebuilt for a specific subject. 389|- Cascading deletes: When personal data is deleted from a source, the deletion must propagate through staging, intermediate, and mart models. Re-running dbt after source deletion achieves this. 390|- Retention policies: Define and enforce data retention periods. Archive or anonymise data that exceeds its retention window. Automate this with dbt models or warehouse tasks. 391|- Audit trail of erasure: Log erasure requests and completions. Regulators and data subjects may ask for proof. 392| 393|### 7.4 Consent Management 394| 395|Principle: Where consent is the lawful basis, it must be freely given, specific, informed, and unambiguous — and withdrawable. 396| 397|- Consent as metadata: Store consent flags with the data they govern. A marketing consent flag should travel with the customer record, not be a separate lookup. 398|- Consent withdrawal must cascade: If a subject withdraws consent for marketing, their data must be excluded from marketing models immediately. Build this as a filter in staging, not a manual process. 399|- Don't process without consent metadata: If you cannot verify consent status for a record, treat it as unconsented. Err on the side of non-processing. 400| 401|--- 402| 403|## 8. Consultancy Delivery Principles 404| 405|> "A consultancy delivers value through expertise transfer, not expertise dependency." 406| 407|These principles govern how an Analytics Engineering consultancy engages clients, delivers work, and ensures lasting impact. 408| 409|### 8.1 Client Engagement 410| 411|Principle: Engagement begins with understanding the client's business context, not their technology stack. 412| 413|- Listen before architecting: Understand the business problem, the decision landscape, and the organisational culture before discussing Snowflake vs. BigQuery. 414|- Stakeholder mapping: Identify data producers, data consumers, executive sponsors, and potential blockers. The technical solution is often easier than the political one. 415|- Language matters: Speak the client's language. Finance teams think in margins and variances, not in fact and dimension tables. Translate accordingly. 416|- Set expectations early: Be clear about what AE can deliver, what requires upstream changes, and what is genuinely a people problem disguised as a data problem. 417| 418|### 8.2 Maturity Assessment 419| 420|Principle: Before proposing solutions, assess where the client sits on the data maturity spectrum. 421| 422|- Five maturity levels: 423| 1. Initial: Ad-hoc, no processes, hero-dependent. 424| 2. Developing: Some tools, limited standardisation, quality is inconsistent. 425| 3. Defined: Standard processes exist, but compliance varies. 426| 4. Managed: Processes are measured, monitored, and enforced. 427| 5. Optimised: Continuous improvement, automated governance, data-driven culture. 428|- Assess all dimensions: Technology alone is not maturity. Assess people (skills, culture), process (governance, workflows), data (quality, architecture), and technology (tooling, infrastructure). 429|- Diagnose the bottleneck: Most clients have one critical constraint. Identify it. A dbt implementation doesn't help if the constraint is "we have no source data pipeline." 430| 431|### 8.3 Phased Delivery 432| 433|Principle: Deliver in small, valuable increments — not in a big bang at the end. 434| 435|- Phase 0 — Discovery (1–2 weeks): Stakeholder interviews, current-state assessment, architecture review, maturity assessment. Deliverable: discovery report and roadmap. 436|- Phase 1 — Foundation (4–6 weeks): Source integration, raw layer, basic dbt project, CI/CD pipeline, initial tests. Deliverable: working end-to-end pipeline with 1–2 critical data products. 437|- Phase 2 — Acceleration (4–8 weeks): Expand marts, implement governance, add monitoring, train the client team. Deliverable: production-ready platform with documented processes. 438|- Phase 3 — Optimisation (ongoing): Performance tuning, advanced features, self-service enablement, knowledge transfer. Deliverable: client independence. 439|- Never ship without value: Each phase must deliver something the client can use. No "we spent six weeks and here's a framework." 440| 441|### 8.4 Knowledge Transfer 442| 443|Principle: The consultancy's success is measured by how little the client needs the consultancy after engagement. 444| 445|- Pair programming over documentation alone: Sit with client engineers and analysts. Walk through code together. Knowledge transfer is a social process, not a document. 446|- Train the trainers: Identify internal champions and invest disproportionately in them. They will propagate knowledge after you leave. 447|- Gradual responsibility transfer: Phase from "consultant does, client watches" to "client does, consultant watches" to "client does, consultant reviews" to "client is independent." 448| 449|### 8.5 Documentation as Deliverable 450| 451|Principle: Documentation is not an afterthought — it is a first-class deliverable, as important as the code. 452| 453|- Four documentation types: 454| 1. Architecture Decision Records (ADRs): Why we chose X over Y, with context and consequences. 455| 2. Runbooks: How to operate the platform. Incident response, pipeline restarts, monitoring interpretation. 456| 3. Data Dictionary: What each model means, its grain, freshness SLA, and known issues. 457| 4. Onboarding Guide: How a new team member gets started. Environment setup, access requests, where things live. 458|- Documentation as code: dbt YAML, README.md, and ADRs in Git. Not Word documents in a shared drive. Version-controlled documentation ages slower. 459|- Client signs off on documentation: Make documentation acceptance an explicit milestone. If the client doesn't accept the docs, the engagement isn't complete. 460| 461|---

501|

Part of [[Data-Principles-for-Analytics-Engineering]]. See also: [[Data-Quality-and-Governance]], [[Data-Architecture]], [[DataOps-and-Data-Engineering]].

Sub-pages¶

[[AE-Consultancy-Delivery — Sub-pages]]