Skip to content

Snowflake BI Estate Diagnostic PRD

One-line thesis

Before a company migrates dashboards or experiments with Streamlit in Snowflake, it should first measure its BI estate: which reports are actually used, what they cost, what business logic they contain, and which assets are worth modernising into Snowflake-native data products.

Problem statement

Many Snowflake customers still rely heavily on legacy BI tools such as Power BI, Tableau, Qlik, SSRS, and Excel exports. These environments often accumulate dashboard sprawl, duplicated semantic logic, expensive extracts, slow live queries, and unclear ownership.

At the same time, Snowflake is increasingly positioning itself as more than a warehouse: Streamlit in Snowflake, Cortex Analyst, Semantic Views, Dynamic Tables, Snowflake Native Apps, and governed data products all create new consumption patterns. But most companies do not know which existing BI assets should move, stay, be retired, be optimised, or become Snowflake-native applications.

The result is a modernisation gap: companies want better analytics products, lower cost, and AI-ready semantic foundations, but lack an evidence-based way to prioritise the BI estate.

Product / service concept

Snowflake BI Estate Diagnostic is a repeatable consulting playbook and eventual lightweight diagnostic tool that analyses a company's legacy BI estate and Snowflake telemetry to produce a ranked modernisation roadmap.

It connects:

  • Snowflake query history, warehouse usage, object access, users, roles, and lineage signals.
  • Power BI metadata, usage events, semantic models, refresh history, and report inventory.
  • Tableau metadata, repository/API usage data, workbooks, datasources, extracts, and user interactions.
  • Optional business context from interviews, ownership docs, meeting-critical reports, and executive dashboards.

The output is not a blind migration plan. It is a classification and prioritisation layer that tells the client:

  • Which reports should stay in BI.
  • Which should be optimised in place.
  • Which should be retired or consolidated.
  • Which are candidates for Streamlit/Snowflake-native data products.
  • Which are candidates for Cortex Analyst / governed semantic access.
  • Which assets should be piloted first.

Target customer

Companies that:

  • Already use Snowflake as a major warehouse or analytics platform.
  • Still have significant reporting estates in Power BI, Tableau, Qlik, SSRS, or Excel.
  • Have not yet meaningfully adopted Streamlit in Snowflake, Cortex Analyst, Semantic Views, or Snowflake-native data products.
  • Suspect dashboard/report sprawl but lack hard evidence.
  • Have duplicated definitions, slow dashboards, expensive refreshes, unclear ownership, or low trust in metrics.
  • Are considering analytics modernisation but do not know where to start.

Primary buyer personas

Head of Data / Data Platform Lead

Wants to reduce platform waste, improve analytics reliability, and create a defensible roadmap for Snowflake-native capabilities.

Analytics Engineering Manager

Wants to rationalise duplicated metrics, move logic into governed models, reduce downstream chaos, and identify where semantic modelling work has the highest leverage.

BI Manager / Reporting Lead

Wants to understand which dashboards are used, which are stale, and where legacy BI tooling remains the best surface.

CFO / RevOps / Operations Leader

Wants better operational insight, less manual reporting, and clearer return on existing Snowflake and BI platform spend.

Snowflake Platform Owner

Wants to identify costly BI-driven workloads, right-size warehouses, reduce runaway queries, and justify platform optimisation work.

Strategic positioning

This should not be positioned as:

We convert Power BI/Tableau dashboards into Streamlit apps.

That sounds like a brittle tool replacement and will trigger justified resistance from BI teams.

It should be positioned as:

We measure your BI estate and identify the highest-value opportunities to optimise, retire, consolidate, or modernise analytics assets into Snowflake-native data products.

Core phrase:

Before you migrate BI, measure the BI estate.

Goals

  1. Create a repeatable methodology for assessing BI estates connected to Snowflake.
  2. Produce an evidence-based ranking of reports, dashboards, semantic models, and datasources.
  3. Identify cost, usage, complexity, and modernisation opportunities.
  4. Classify assets into clear recommendation buckets.
  5. Define a practical 2–4 week consultancy diagnostic offer.
  6. Lay the foundation for a lightweight metadata/query-history prototype.
  7. Build a career-relevant thought-leadership asset around Snowflake-native analytics modernisation.

Non-goals

  • Rebuild every Power BI/Tableau dashboard in Streamlit.
  • Claim Streamlit is a like-for-like BI replacement.
  • Automate migration without human review.
  • Replace governance, stakeholder interviews, or business context.
  • Build a full SaaS product before validating the consulting methodology.
  • Optimise Snowflake workloads without understanding business value.

Core insight

Power BI and Tableau are not just visualisation tools. They often contain semantic logic, ownership patterns, consumption habits, refresh strategies, and business-critical workflows.

Streamlit in Snowflake is not a universal replacement for that. Its best use is as a Snowflake-native application surface for operational, guided, parameterised, or workflow-heavy data products.

The real opportunity is the serving-layer and classification problem:

  • What semantic logic should move into Snowflake/dbt/semantic views?
  • What workloads need dynamic tables, materialized aggregates, or query contracts?
  • What assets belong in BI?
  • What assets are actually internal applications trapped inside dashboards?
  • What questions should be served through Cortex Analyst or governed natural language?

Inputs

Snowflake inputs

  • SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
  • SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY
  • WAREHOUSE_METERING_HISTORY
  • LOGIN_HISTORY
  • USERS
  • ROLES
  • GRANTS_TO_USERS
  • GRANTS_TO_ROLES
  • Object metadata for databases, schemas, tables, views, materialized views, dynamic tables.
  • Query tags where available.
  • Warehouse naming conventions.
  • BI service account names.
  • Known dashboard/report warehouse patterns.

Power BI inputs

  • Admin Activity Events.
  • Workspaces.
  • Reports.
  • Datasets / semantic models.
  • Dataset refresh history.
  • Dataset lineage.
  • Gateway/data source metadata.
  • Scanner API metadata.
  • XMLA/TOM metadata where available.
  • DAX measures and model definitions where accessible.

Tableau inputs

  • Tableau Server Repository/Postgres where available.
  • Tableau REST API.
  • Tableau Metadata API.
  • Workbooks.
  • Views.
  • Datasources.
  • Extract refresh history.
  • User interactions.
  • Subscriptions/favourites.
  • Background jobs.

Human/contextual inputs

  • Known executive dashboards.
  • Known operational workflows.
  • Known report owners.
  • Pain points from BI/data teams.
  • Business-critical recurring meetings.
  • Existing governance docs.
  • Current Snowflake cost concerns.
  • Current analytics modernisation goals.

Assessment dimensions

Each BI asset should receive a score across five dimensions.

1. Usage value

Measures whether the asset is actually consumed and by whom.

Signals:

  • View frequency.
  • Unique users.
  • Recency of use.
  • Subscriptions/favourites.
  • Embedded usage.
  • Export/download behaviour.
  • Executive or high-value stakeholder usage.
  • Recurring meeting dependency.

Indicative score:

  • 0 = unused/stale.
  • 1 = occasional use.
  • 2 = team-level regular use.
  • 3 = business-critical or executive-visible.

2. Cost / performance pain

Measures whether the asset creates Snowflake or BI platform cost/performance issues.

Signals:

  • Query volume.
  • Warehouse credits consumed.
  • Large table scans.
  • Repeated query patterns.
  • Slow dashboard load times.
  • Expensive scheduled refreshes.
  • Extract failures.
  • Warehouse scaling caused by BI workloads.

Indicative score:

  • 0 = cheap / low pain.
  • 1 = moderate.
  • 2 = expensive or slow.
  • 3 = severe cost/performance issue.

3. Semantic complexity

Measures how much business logic is embedded in the BI asset.

Signals:

  • Number and complexity of measures.
  • DAX/Tableau calculations.
  • Joins/relationships.
  • Role-level security rules.
  • Distinct counts, ratios, percentiles, semi-additive metrics.
  • Business-critical definitions.
  • Duplicated or conflicting metrics.

Indicative score:

  • 0 = basic reporting.
  • 1 = some business logic.
  • 2 = meaningful semantic layer.
  • 3 = complex/high-risk semantic asset.

4. Snowflake-native data product suitability

Measures whether the asset is actually a workflow or application disguised as a dashboard.

Signals:

  • Parameter entry.
  • Scenario modelling.
  • Writeback or approval needs.
  • Guided workflows.
  • Operational decision support.
  • Repeated manual exports/manipulation.
  • Users asking for actions, not just visuals.
  • Need for app-like UX.

Indicative score:

  • 0 = classic BI is appropriate.
  • 1 = weak candidate.
  • 2 = strong candidate.
  • 3 = obvious Snowflake-native app/data product candidate.

5. Migration / change risk

Measures how careful any intervention needs to be.

Signals:

  • Executive visibility.
  • Regulatory/audit implications.
  • Fragile undocumented calculations.
  • Unclear ownership.
  • High stakeholder sensitivity.
  • Strong dependence on visual polish.
  • Complex row-level security.

Indicative score:

  • 0 = low risk.
  • 1 = moderate.
  • 2 = high.
  • 3 = do not touch without governance.

Recommendation buckets

Keep in BI

For highly polished, heavily used, visually exploratory dashboards where Power BI/Tableau remains the right consumption surface.

Recommended action:

  • Keep the frontend.
  • Improve ownership, semantics, or backend performance if needed.

Optimise in place

For valuable dashboards with expensive or slow Snowflake workloads.

Recommended actions:

  • Rewrite expensive SQL.
  • Add dynamic tables or materialized aggregates.
  • Improve clustering/partitioning strategy.
  • Move repeated transformations into dbt/Snowflake models.
  • Standardise query contracts.
  • Right-size warehouses.

Retire / archive

For stale, unused, ownerless, or duplicated reports.

Recommended actions:

  • Owner review.
  • Usage-based retirement notice.
  • Archive or remove after agreed window.

Consolidate

For duplicated reports, duplicate datasets, or conflicting metrics.

Recommended actions:

  • Create governed semantic model.
  • Consolidate duplicate workbooks/reports.
  • Standardise definitions.
  • Decommission redundant variants.

Snowflake-native data product candidate

For operational workflows, internal tools, guided analysis, parameterised workflows, or app-like use cases.

Recommended surfaces:

  • Streamlit in Snowflake.
  • Snowflake Native App.
  • Embedded React/internal app backed by Snowflake.
  • Snowflake tasks/dynamic tables/API endpoints as serving layer.

Cortex / semantic candidate

For repeated business questions and governed metrics that are better served through a semantic layer and natural-language interaction.

Recommended surfaces:

  • Cortex Analyst.
  • Snowflake Semantic Views.
  • Governed metric layer.
  • Controlled NLQ experience.

Proposed diagnostic workflow

Phase 1: Discovery and access

Objective: establish scope and gather access.

Activities:

  • Identify BI platforms in scope.
  • Identify Snowflake accounts/warehouses in scope.
  • Gather BI admin/API access.
  • Identify BI service accounts and query tags.
  • Gather known critical dashboards and stakeholder pain points.

Deliverables:

  • Scope document.
  • Access checklist.
  • Known critical asset list.

Phase 2: Metadata and telemetry extraction

Objective: build a joined telemetry model.

Activities:

  • Extract Snowflake query and warehouse usage history.
  • Extract BI report/workbook/dataset metadata.
  • Extract BI activity/usage logs.
  • Extract refresh/extract history.
  • Join BI assets to Snowflake users, warehouses, queries, and objects where possible.

Deliverables:

  • Raw telemetry extracts.
  • Normalised asset inventory.
  • Initial lineage/cost mapping.

Phase 3: Scoring and classification

Objective: rank assets by value, cost, complexity, suitability, and risk.

Activities:

  • Apply scoring model.
  • Flag high-cost/high-use assets.
  • Flag low-use/high-cost assets.
  • Flag candidate operational data products.
  • Flag semantic consolidation opportunities.
  • Flag retirement candidates.

Deliverables:

  • Scored BI asset table.
  • Recommendation bucket for each asset.
  • Top candidate shortlists.

Phase 4: Expert review

Objective: avoid false positives and validate context.

Activities:

  • Review top candidates with data/BI/platform owners.
  • Validate business criticality.
  • Confirm ownership.
  • Identify political or operational constraints.
  • Select pilot candidates.

Deliverables:

  • Validated recommendation list.
  • Pilot shortlist.
  • Risk notes.

Phase 5: Executive recommendation pack

Objective: produce a decision-ready roadmap.

Deliverables:

  • Executive summary.
  • BI estate inventory summary.
  • Usage/cost leaderboard.
  • Retirement/consolidation candidates.
  • Optimisation candidates.
  • Snowflake-native data product candidates.
  • Cortex/semantic candidates.
  • Recommended 4-week pilot.
  • Business case and estimated opportunity.
  • Risks and next steps.

Phase 6: Pilot implementation

Objective: prove the recommendation with 1–3 high-signal assets.

Possible pilot mix:

  • One optimise-in-place dashboard.
  • One retirement/consolidation case.
  • One Snowflake-native data product prototype.

Deliverables:

  • Before/after performance or cost comparison.
  • Prototype or improved asset.
  • Lessons learned.
  • Next-phase roadmap.

Minimum viable prototype

The first prototype should not attempt full Power BI/Tableau integration. It should prove the scoring concept.

MVP input options

  • CSV export of Snowflake query history.
  • CSV/export/API dump of BI assets and usage.
  • Manually curated mapping of BI assets to Snowflake users/warehouses/query tags.

MVP processing

  • Normalise BI assets into a standard table.
  • Aggregate Snowflake query cost/volume by user, warehouse, object, and query pattern.
  • Heuristically map BI assets to Snowflake usage.
  • Apply scoring dimensions.
  • Produce ranked recommendations.

MVP output

  • Scored asset table.
  • Recommendation bucket.
  • Top 10 opportunities.
  • Cost/usage charts.
  • Markdown/HTML executive summary.

Data model sketch

bi_assets

  • asset_id
  • platform β€” Power BI, Tableau, Qlik, etc.
  • asset_type β€” report, dashboard, workbook, view, dataset, datasource, semantic_model.
  • asset_name
  • workspace_or_project
  • owner
  • created_at
  • updated_at
  • last_viewed_at
  • url
  • is_certified_or_promoted

bi_usage_events

  • event_id
  • asset_id
  • user_id
  • event_type
  • event_timestamp
  • session_id

bi_refresh_events

  • refresh_id
  • asset_id
  • start_time
  • end_time
  • status
  • duration_seconds
  • failure_reason

snowflake_queries

  • query_id
  • query_text
  • user_name
  • role_name
  • warehouse_name
  • start_time
  • end_time
  • execution_time
  • bytes_scanned
  • credits_estimated
  • database_name
  • schema_name
  • query_tag

asset_query_mapping

  • asset_id
  • query_id
  • mapping_method β€” query_tag, service_account, warehouse, SQL fingerprint, datasource metadata, manual.
  • confidence_score

asset_scores

  • asset_id
  • usage_value_score
  • cost_pain_score
  • semantic_complexity_score
  • data_product_suitability_score
  • migration_risk_score
  • recommendation_bucket
  • rationale

Success metrics

For the diagnostic offer

  • Can produce a useful ranked asset list within 2–4 weeks.
  • Client agrees that the top 10 recommendations reflect real business pain.
  • At least one pilot candidate is selected from the output.
  • Identifies measurable cost/performance or consolidation opportunities.
  • Creates a credible roadmap without requiring a full migration commitment.

For a prototype tool

  • Ingests at least one Snowflake query history export and one BI metadata/usage export.
  • Produces deterministic scoring output.
  • Handles incomplete mappings with confidence scores.
  • Generates a readable executive summary.
  • Makes it easy to review and override recommendations.

For Adam's career/positioning

  • Demonstrates high-agency analytics engineering thinking.
  • Frames Snowflake expertise around business outcomes, not just tooling.
  • Creates a reusable consulting-style playbook.
  • Connects legacy BI modernisation to Snowflake-native apps, AI analytics, semantic layers, and cost governance.

Risks and mitigations

Risk: Streamlit is perceived as a BI replacement

Mitigation: explicitly position Streamlit as one possible consumption surface for data products, not the default recommendation.

Risk: BI usage data is incomplete

Mitigation: use multiple signals: activity logs, refreshes, subscriptions, Snowflake query patterns, stakeholder interviews, and manual validation.

Risk: Mapping BI assets to Snowflake queries is fuzzy

Mitigation: use confidence-scored mapping methods: query tags first, service accounts/warehouses second, datasource metadata third, SQL fingerprinting fourth, manual review last.

Risk: semantic logic is trapped in DAX/Tableau calculations

Mitigation: classify high semantic complexity as higher risk; do not auto-migrate; recommend semantic extraction and validation first.

Risk: governance and politics block retirement

Mitigation: treat retirement as a managed workflow with owner review, notice periods, and audit trails.

Risk: Snowflake costs are hard to attribute precisely

Mitigation: provide directional estimates with assumptions; separate exact metering from opportunity sizing.

Open questions

  1. Should this be positioned first as a consultancy diagnostic, an internal career playbook, or a future SaaS/tooling idea?
  2. Which initial platform should be prioritised for prototype support: Power BI or Tableau?
  3. Should Cortex Analyst and Snowflake Semantic Views be a first-class recommendation path from day one?
  4. What minimum access would a client need to provide for a credible diagnostic?
  5. Should the first prototype use exported CSVs only to avoid enterprise auth complexity?
  6. What would a compelling 4-week pilot package look like?
  7. How much should the playbook focus on cost reduction versus data product innovation?

Create a companion playbook that turns this PRD into a practical operating manual:

  1. Discovery questionnaire.
  2. Access checklist.
  3. Snowflake SQL extraction queries.
  4. Power BI metadata extraction checklist.
  5. Tableau metadata extraction checklist.
  6. Scoring rubric spreadsheet/table.
  7. Executive readout template.
  8. Pilot selection framework.

The PRD defines what should exist. The playbook should define how to run it with a real client or employer.