Snowflake BI Estate Diagnostic PRD¶

One-line thesis¶

Before a company migrates dashboards or experiments with Streamlit in Snowflake, it should first measure its BI estate: which reports are actually used, what they cost, what business logic they contain, and which assets are worth modernising into Snowflake-native data products.

Problem statement¶

Many Snowflake customers still rely heavily on legacy BI tools such as Power BI, Tableau, Qlik, SSRS, and Excel exports. These environments often accumulate dashboard sprawl, duplicated semantic logic, expensive extracts, slow live queries, and unclear ownership.

At the same time, Snowflake is increasingly positioning itself as more than a warehouse: Streamlit in Snowflake, Cortex Analyst, Semantic Views, Dynamic Tables, Snowflake Native Apps, and governed data products all create new consumption patterns. But most companies do not know which existing BI assets should move, stay, be retired, be optimised, or become Snowflake-native applications.

The result is a modernisation gap: companies want better analytics products, lower cost, and AI-ready semantic foundations, but lack an evidence-based way to prioritise the BI estate.

Product / service concept¶

Snowflake BI Estate Diagnostic is a repeatable consulting playbook and eventual lightweight diagnostic tool that analyses a company's legacy BI estate and Snowflake telemetry to produce a ranked modernisation roadmap.

It connects:

Snowflake query history, warehouse usage, object access, users, roles, and lineage signals.
Power BI metadata, usage events, semantic models, refresh history, and report inventory.
Tableau metadata, repository/API usage data, workbooks, datasources, extracts, and user interactions.
Optional business context from interviews, ownership docs, meeting-critical reports, and executive dashboards.

The output is not a blind migration plan. It is a classification and prioritisation layer that tells the client:

Which reports should stay in BI.
Which should be optimised in place.
Which should be retired or consolidated.
Which are candidates for Streamlit/Snowflake-native data products.
Which are candidates for Cortex Analyst / governed semantic access.
Which assets should be piloted first.

Target customer¶

Companies that:

Already use Snowflake as a major warehouse or analytics platform.
Still have significant reporting estates in Power BI, Tableau, Qlik, SSRS, or Excel.
Have not yet meaningfully adopted Streamlit in Snowflake, Cortex Analyst, Semantic Views, or Snowflake-native data products.
Suspect dashboard/report sprawl but lack hard evidence.
Have duplicated definitions, slow dashboards, expensive refreshes, unclear ownership, or low trust in metrics.
Are considering analytics modernisation but do not know where to start.

Primary buyer personas¶

Head of Data / Data Platform Lead¶

Wants to reduce platform waste, improve analytics reliability, and create a defensible roadmap for Snowflake-native capabilities.

Analytics Engineering Manager¶

Wants to rationalise duplicated metrics, move logic into governed models, reduce downstream chaos, and identify where semantic modelling work has the highest leverage.

BI Manager / Reporting Lead¶

Wants to understand which dashboards are used, which are stale, and where legacy BI tooling remains the best surface.

CFO / RevOps / Operations Leader¶

Wants better operational insight, less manual reporting, and clearer return on existing Snowflake and BI platform spend.

Snowflake Platform Owner¶

Wants to identify costly BI-driven workloads, right-size warehouses, reduce runaway queries, and justify platform optimisation work.

Strategic positioning¶

This should not be positioned as:

We convert Power BI/Tableau dashboards into Streamlit apps.

That sounds like a brittle tool replacement and will trigger justified resistance from BI teams.

It should be positioned as:

We measure your BI estate and identify the highest-value opportunities to optimise, retire, consolidate, or modernise analytics assets into Snowflake-native data products.

Core phrase:

Before you migrate BI, measure the BI estate.

Goals¶

Create a repeatable methodology for assessing BI estates connected to Snowflake.
Produce an evidence-based ranking of reports, dashboards, semantic models, and datasources.
Identify cost, usage, complexity, and modernisation opportunities.
Classify assets into clear recommendation buckets.
Define a practical 2–4 week consultancy diagnostic offer.
Lay the foundation for a lightweight metadata/query-history prototype.
Build a career-relevant thought-leadership asset around Snowflake-native analytics modernisation.

Non-goals¶

Rebuild every Power BI/Tableau dashboard in Streamlit.
Claim Streamlit is a like-for-like BI replacement.
Automate migration without human review.
Replace governance, stakeholder interviews, or business context.
Build a full SaaS product before validating the consulting methodology.
Optimise Snowflake workloads without understanding business value.

Core insight¶

Power BI and Tableau are not just visualisation tools. They often contain semantic logic, ownership patterns, consumption habits, refresh strategies, and business-critical workflows.

Streamlit in Snowflake is not a universal replacement for that. Its best use is as a Snowflake-native application surface for operational, guided, parameterised, or workflow-heavy data products.

The real opportunity is the serving-layer and classification problem:

What semantic logic should move into Snowflake/dbt/semantic views?
What workloads need dynamic tables, materialized aggregates, or query contracts?
What assets belong in BI?
What assets are actually internal applications trapped inside dashboards?
What questions should be served through Cortex Analyst or governed natural language?

Inputs¶

Snowflake inputs¶

SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY
WAREHOUSE_METERING_HISTORY
LOGIN_HISTORY
USERS
ROLES
GRANTS_TO_USERS
GRANTS_TO_ROLES
Object metadata for databases, schemas, tables, views, materialized views, dynamic tables.
Query tags where available.
Warehouse naming conventions.
BI service account names.
Known dashboard/report warehouse patterns.

Power BI inputs¶

Admin Activity Events.
Workspaces.
Reports.
Datasets / semantic models.
Dataset refresh history.
Dataset lineage.
Gateway/data source metadata.
Scanner API metadata.
XMLA/TOM metadata where available.
DAX measures and model definitions where accessible.

Tableau inputs¶

Tableau Server Repository/Postgres where available.
Tableau REST API.
Tableau Metadata API.
Workbooks.
Views.
Datasources.
Extract refresh history.
User interactions.
Subscriptions/favourites.
Background jobs.

Human/contextual inputs¶

Known executive dashboards.
Known operational workflows.
Known report owners.
Pain points from BI/data teams.
Business-critical recurring meetings.
Existing governance docs.
Current Snowflake cost concerns.
Current analytics modernisation goals.

Assessment dimensions¶

Each BI asset should receive a score across five dimensions.

1. Usage value¶

Measures whether the asset is actually consumed and by whom.

Signals:

View frequency.
Unique users.
Recency of use.
Subscriptions/favourites.
Embedded usage.
Export/download behaviour.
Executive or high-value stakeholder usage.
Recurring meeting dependency.

Indicative score:

0 = unused/stale.
1 = occasional use.
2 = team-level regular use.
3 = business-critical or executive-visible.

2. Cost / performance pain¶

Measures whether the asset creates Snowflake or BI platform cost/performance issues.

Signals:

Query volume.
Warehouse credits consumed.
Large table scans.
Repeated query patterns.
Slow dashboard load times.
Expensive scheduled refreshes.
Extract failures.
Warehouse scaling caused by BI workloads.

Indicative score:

0 = cheap / low pain.
1 = moderate.
2 = expensive or slow.
3 = severe cost/performance issue.

3. Semantic complexity¶

Measures how much business logic is embedded in the BI asset.

Signals:

Number and complexity of measures.
DAX/Tableau calculations.
Joins/relationships.
Role-level security rules.
Distinct counts, ratios, percentiles, semi-additive metrics.
Business-critical definitions.
Duplicated or conflicting metrics.

Indicative score:

0 = basic reporting.
1 = some business logic.
2 = meaningful semantic layer.
3 = complex/high-risk semantic asset.

4. Snowflake-native data product suitability¶

Measures whether the asset is actually a workflow or application disguised as a dashboard.

Signals:

Parameter entry.
Scenario modelling.
Writeback or approval needs.
Guided workflows.
Operational decision support.
Repeated manual exports/manipulation.
Users asking for actions, not just visuals.
Need for app-like UX.

Indicative score:

0 = classic BI is appropriate.
1 = weak candidate.
2 = strong candidate.
3 = obvious Snowflake-native app/data product candidate.

5. Migration / change risk¶

Measures how careful any intervention needs to be.

Signals:

Executive visibility.
Regulatory/audit implications.
Fragile undocumented calculations.
Unclear ownership.
High stakeholder sensitivity.
Strong dependence on visual polish.
Complex row-level security.

Indicative score:

0 = low risk.
1 = moderate.
2 = high.
3 = do not touch without governance.

Recommendation buckets¶

Keep in BI¶

For highly polished, heavily used, visually exploratory dashboards where Power BI/Tableau remains the right consumption surface.

Recommended action:

Keep the frontend.
Improve ownership, semantics, or backend performance if needed.

Optimise in place¶

For valuable dashboards with expensive or slow Snowflake workloads.

Recommended actions:

Rewrite expensive SQL.
Add dynamic tables or materialized aggregates.
Improve clustering/partitioning strategy.
Move repeated transformations into dbt/Snowflake models.
Standardise query contracts.
Right-size warehouses.

Retire / archive¶

For stale, unused, ownerless, or duplicated reports.

Recommended actions:

Owner review.
Usage-based retirement notice.
Archive or remove after agreed window.

Consolidate¶

For duplicated reports, duplicate datasets, or conflicting metrics.

Recommended actions:

Create governed semantic model.
Consolidate duplicate workbooks/reports.
Standardise definitions.
Decommission redundant variants.

Snowflake-native data product candidate¶

For operational workflows, internal tools, guided analysis, parameterised workflows, or app-like use cases.

Recommended surfaces:

Streamlit in Snowflake.
Snowflake Native App.
Embedded React/internal app backed by Snowflake.
Snowflake tasks/dynamic tables/API endpoints as serving layer.

Cortex / semantic candidate¶

For repeated business questions and governed metrics that are better served through a semantic layer and natural-language interaction.

Recommended surfaces:

Cortex Analyst.
Snowflake Semantic Views.
Governed metric layer.
Controlled NLQ experience.

Proposed diagnostic workflow¶

Phase 1: Discovery and access¶

Objective: establish scope and gather access.

Activities:

Identify BI platforms in scope.
Identify Snowflake accounts/warehouses in scope.
Gather BI admin/API access.
Identify BI service accounts and query tags.
Gather known critical dashboards and stakeholder pain points.

Deliverables:

Scope document.
Access checklist.
Known critical asset list.

Phase 2: Metadata and telemetry extraction¶

Objective: build a joined telemetry model.

Activities:

Extract Snowflake query and warehouse usage history.
Extract BI report/workbook/dataset metadata.
Extract BI activity/usage logs.
Extract refresh/extract history.
Join BI assets to Snowflake users, warehouses, queries, and objects where possible.

Deliverables:

Raw telemetry extracts.
Normalised asset inventory.
Initial lineage/cost mapping.

Phase 3: Scoring and classification¶

Objective: rank assets by value, cost, complexity, suitability, and risk.

Activities:

Apply scoring model.
Flag high-cost/high-use assets.
Flag low-use/high-cost assets.
Flag candidate operational data products.
Flag semantic consolidation opportunities.
Flag retirement candidates.

Deliverables:

Scored BI asset table.
Recommendation bucket for each asset.
Top candidate shortlists.

Phase 4: Expert review¶

Objective: avoid false positives and validate context.

Activities:

Review top candidates with data/BI/platform owners.
Validate business criticality.
Confirm ownership.
Identify political or operational constraints.
Select pilot candidates.

Deliverables:

Validated recommendation list.
Pilot shortlist.
Risk notes.

Phase 5: Executive recommendation pack¶

Objective: produce a decision-ready roadmap.

Deliverables:

Executive summary.
BI estate inventory summary.
Usage/cost leaderboard.
Retirement/consolidation candidates.
Optimisation candidates.
Snowflake-native data product candidates.
Cortex/semantic candidates.
Recommended 4-week pilot.
Business case and estimated opportunity.
Risks and next steps.

Phase 6: Pilot implementation¶

Objective: prove the recommendation with 1–3 high-signal assets.

Possible pilot mix:

One optimise-in-place dashboard.
One retirement/consolidation case.
One Snowflake-native data product prototype.

Deliverables:

Before/after performance or cost comparison.
Prototype or improved asset.
Lessons learned.
Next-phase roadmap.

Minimum viable prototype¶

The first prototype should not attempt full Power BI/Tableau integration. It should prove the scoring concept.

MVP input options¶

CSV export of Snowflake query history.
CSV/export/API dump of BI assets and usage.
Manually curated mapping of BI assets to Snowflake users/warehouses/query tags.

MVP processing¶

Normalise BI assets into a standard table.
Aggregate Snowflake query cost/volume by user, warehouse, object, and query pattern.
Heuristically map BI assets to Snowflake usage.
Apply scoring dimensions.
Produce ranked recommendations.

MVP output¶

Scored asset table.
Recommendation bucket.
Top 10 opportunities.
Cost/usage charts.
Markdown/HTML executive summary.

Data model sketch¶

`bi_assets`¶

asset_id
platform — Power BI, Tableau, Qlik, etc.
asset_type — report, dashboard, workbook, view, dataset, datasource, semantic_model.
asset_name
workspace_or_project
owner
created_at
updated_at
last_viewed_at
url
is_certified_or_promoted

`bi_usage_events`¶

event_id
asset_id
user_id
event_type
event_timestamp
session_id

`bi_refresh_events`¶

refresh_id
asset_id
start_time
end_time
status
duration_seconds
failure_reason

`snowflake_queries`¶

query_id
query_text
user_name
role_name
warehouse_name
start_time
end_time
execution_time
bytes_scanned
credits_estimated
database_name
schema_name
query_tag

`asset_query_mapping`¶

asset_id
query_id
mapping_method — query_tag, service_account, warehouse, SQL fingerprint, datasource metadata, manual.
confidence_score

`asset_scores`¶

asset_id
usage_value_score
cost_pain_score
semantic_complexity_score
data_product_suitability_score
migration_risk_score
recommendation_bucket
rationale

Success metrics¶

For the diagnostic offer¶

Can produce a useful ranked asset list within 2–4 weeks.
Client agrees that the top 10 recommendations reflect real business pain.
At least one pilot candidate is selected from the output.
Identifies measurable cost/performance or consolidation opportunities.
Creates a credible roadmap without requiring a full migration commitment.

For a prototype tool¶

Ingests at least one Snowflake query history export and one BI metadata/usage export.
Produces deterministic scoring output.
Handles incomplete mappings with confidence scores.
Generates a readable executive summary.
Makes it easy to review and override recommendations.

For Adam's career/positioning¶

Demonstrates high-agency analytics engineering thinking.
Frames Snowflake expertise around business outcomes, not just tooling.
Creates a reusable consulting-style playbook.
Connects legacy BI modernisation to Snowflake-native apps, AI analytics, semantic layers, and cost governance.

Risks and mitigations¶

Risk: Streamlit is perceived as a BI replacement¶

Mitigation: explicitly position Streamlit as one possible consumption surface for data products, not the default recommendation.

Risk: BI usage data is incomplete¶

Mitigation: use multiple signals: activity logs, refreshes, subscriptions, Snowflake query patterns, stakeholder interviews, and manual validation.

Risk: Mapping BI assets to Snowflake queries is fuzzy¶

Mitigation: use confidence-scored mapping methods: query tags first, service accounts/warehouses second, datasource metadata third, SQL fingerprinting fourth, manual review last.

Risk: semantic logic is trapped in DAX/Tableau calculations¶

Mitigation: classify high semantic complexity as higher risk; do not auto-migrate; recommend semantic extraction and validation first.

Risk: governance and politics block retirement¶

Mitigation: treat retirement as a managed workflow with owner review, notice periods, and audit trails.

Risk: Snowflake costs are hard to attribute precisely¶

Mitigation: provide directional estimates with assumptions; separate exact metering from opportunity sizing.

Open questions¶

Should this be positioned first as a consultancy diagnostic, an internal career playbook, or a future SaaS/tooling idea?
Which initial platform should be prioritised for prototype support: Power BI or Tableau?
Should Cortex Analyst and Snowflake Semantic Views be a first-class recommendation path from day one?
What minimum access would a client need to provide for a credible diagnostic?
Should the first prototype use exported CSVs only to avoid enterprise auth complexity?
What would a compelling 4-week pilot package look like?
How much should the playbook focus on cost reduction versus data product innovation?

Recommended next step¶

Create a companion playbook that turns this PRD into a practical operating manual:

Discovery questionnaire.
Access checklist.
Snowflake SQL extraction queries.
Power BI metadata extraction checklist.
Tableau metadata extraction checklist.
Scoring rubric spreadsheet/table.
Executive readout template.
Pilot selection framework.

The PRD defines what should exist. The playbook should define how to run it with a real client or employer.

Snowflake BI Estate Diagnostic PRD¶

One-line thesis¶

Problem statement¶

Product / service concept¶

Target customer¶

Primary buyer personas¶

Head of Data / Data Platform Lead¶

Analytics Engineering Manager¶

BI Manager / Reporting Lead¶

CFO / RevOps / Operations Leader¶

Snowflake Platform Owner¶

Strategic positioning¶

Goals¶

Non-goals¶

Core insight¶

Inputs¶

Snowflake inputs¶

Power BI inputs¶

Tableau inputs¶

Human/contextual inputs¶

Assessment dimensions¶

1. Usage value¶

2. Cost / performance pain¶

3. Semantic complexity¶

4. Snowflake-native data product suitability¶

5. Migration / change risk¶

Recommendation buckets¶

Keep in BI¶

Optimise in place¶

Retire / archive¶

Consolidate¶

Snowflake-native data product candidate¶

Cortex / semantic candidate¶

Proposed diagnostic workflow¶

Phase 1: Discovery and access¶

Phase 2: Metadata and telemetry extraction¶

Phase 3: Scoring and classification¶

Phase 4: Expert review¶

Phase 5: Executive recommendation pack¶

Phase 6: Pilot implementation¶

Minimum viable prototype¶

MVP input options¶

MVP processing¶

MVP output¶

Data model sketch¶

bi_assets¶

bi_usage_events¶

bi_refresh_events¶

snowflake_queries¶

asset_query_mapping¶

asset_scores¶

Success metrics¶

For the diagnostic offer¶

For a prototype tool¶

For Adam's career/positioning¶

Risks and mitigations¶

Risk: Streamlit is perceived as a BI replacement¶

Risk: BI usage data is incomplete¶

Risk: Mapping BI assets to Snowflake queries is fuzzy¶

Risk: semantic logic is trapped in DAX/Tableau calculations¶

Risk: governance and politics block retirement¶

Risk: Snowflake costs are hard to attribute precisely¶

Open questions¶

Recommended next step¶

`bi_assets`¶

`bi_usage_events`¶

`bi_refresh_events`¶

`snowflake_queries`¶

`asset_query_mapping`¶

`asset_scores`¶