
Redefining Data Engineering in the Age of AI

MIT Technology Review Insights report sponsored by Snowflake. Main synthesis: [[AI-Ready-Analytics-Foundations-2026]].

Source metadata

  • Title: Redefining data engineering in the age of AI
  • Publisher: MIT Technology Review Insights
  • Sponsor / partner: Snowflake
  • Author: Denis McCauley
  • Editor: Virginia Wilson
  • Methodology: June 2025 survey by MIT Technology Review Insights with Snowflake.
  • Sample: 400 CIOs, CTOs, CDOs/CAOs, CAIOs, and senior data/technology executives.
  • Organisations: $500M+ annual revenue, seven industries, ten countries.
  • Note: Treat Snowflake partner sections as vendor-framed even where the report states editorial independence.

Core thesis

AI is moving data engineering from back-office pipeline work into strategic business capability. As enterprises depend on generative, multimodal, and agentic AI, data engineers become central to AI feasibility, data governance, architecture, tool choice, and business decision-making.

The durable line: most AI projects are data engineering projects in disguise.

Key facts

  • 72% of surveyed technology leaders say data engineers are integral to business success.
  • Among organisations with more than $10B in annual revenue, this rises to 86%.
  • Data engineers’ average time on AI projects rose from 19% in 2023 to 37% in 2025.
  • Respondents expect data engineers to spend 61% of their time on AI projects within two years.
  • 77% say data engineering workloads are becoming increasingly heavy.
  • 74% report that AI improved the quantity of data engineering output over the past two years.
  • 77% report that AI improved the quality of data engineering work.
  • 83% have begun deploying AI-based data engineering tools.
  • 73% have begun deploying generative AI; another 21% expect to within 12 months.
  • 20% have begun deploying agentic AI; 54% expect to within 12 months.
  • Top expected agentic benefits: pipeline debugging/optimisation (42%), data integration (38%), orchestration (34%), governance/compliance (33%).
  • Top advanced-AI challenges: data security/privacy (55%), real-time pipelines (37%), synthetic data quality (32%), unstructured data growth (28%), bias reduction (27%).

Durable ideas

  • “No AI without data” is the operating principle: AI initiatives require reliable, governed, high-quality data foundations.
  • Data engineering is converging with architecture, platform strategy, and AI feasibility assessment.
  • Software engineering practices are now table stakes for data teams: version control, CI/CD, infrastructure as code, modularity, testing.
  • Unstructured and multimodal data are increasing data-engineering scope beyond clean warehouse tables.
  • Agentic AI shifts data engineers from hand-coding every pipeline toward managing rules, tests, budgets, orchestration, governance, and architectural constraints.
  • AI tools bring productivity gains and added complexity at the same time: more tool sprawl, integration burden, governance risk, lock-in, and cost uncertainty.
  • Business fluency differentiates senior data engineers: the role now requires translating architecture into business outcomes.

Adam implications

  • Consulting angle: AI readiness through data foundations.
  • Move language beyond ETL and dashboards toward AI feasibility, semantic context, platform governance, and trusted decision systems.
  • Potential service: AI data-readiness audit covering data quality, lineage, access controls, warehouse/lakehouse maturity, unstructured data readiness, real-time needs, and AI-tool fragmentation.
  • Agentic workflow wedges: pipeline debugging agents, dbt/model review agents, documentation agents, lineage QA, semantic-layer validation, data-contract checks.
  • Strong executive line: “Governance is an AI accelerator, not bureaucracy.”

Related notes

  • [[AI-Ready-Analytics-Foundations-2026]]
  • [[AI-Agents-in-Data-Engineering]]
  • [[Agentic-Analytics-Engineering]]
  • [[DataOps-and-Data-Engineering]]
  • [[Context-Layer-for-Enterprise-AI]]
  • [[AE-Consultancy-Delivery]]