Skip to content

Snowflake BI Estate Diagnostic — Operating Playbook

Companion to: [[Snowflake BI Estate Diagnostic PRD]]

This playbook is a practical field guide for delivering the diagnostic on a real engagement. It contains templates, SQL, checklists, and step-by-step instructions.


How to use this playbook

  1. Copy the relevant sections into your client project workspace.
  2. Run through phases in order. Do not skip Phase 1 (access) — incomplete access wrecks every later phase.
  3. Adapt the SQL and templates to the client’s Snowflake edition, BI platform version, and governance posture.
  4. The scoring is designed to be opinionated but overridable. Use your judgment on edge cases.
  5. This playbook assumes a 2–4 week timeline. If the client wants a two-day version, reduce to Snowflake-only analysis, skip BI metadata deep-dives, and focus on the cost/usage leaderboard.

Phase 0: Pre-Engagement Discovery Questionnaire

Send this to the client before Week 1 kickoff. The goal is to surface scope, pain, and access blockers early.

Part A: Context (5–10 minutes)

  1. Which BI platforms are you actively using? (Check all that apply.)
  2. Power BI (Service / Report Server / Fabric)
  3. Tableau Server / Tableau Cloud
  4. Qlik Sense / Qlik Cloud
  5. Looker / Looker Studio
  6. SSRS / Paginated Reports
  7. Excel / Google Sheets exports as primary reporting
  8. Other: __

  9. How long have you been on Snowflake?

  10. < 6 months
  11. 6–18 months
  12. 18–36 months
  13. 3+ years

  14. Do you have a current Snowflake cost or performance concern?

  15. Yes — we know BI workloads drive significant cost.
  16. Yes — but we don’t know which workloads.
  17. Somewhat.
  18. No.

  19. Have you explored Streamlit in Snowflake, Cortex Analyst, or Snowflake-native data products?

  20. Yes — actively building / evaluating.
  21. Yes — explored but paused.
  22. Heard of it, never evaluated.
  23. No.

  24. What is your primary analytics modernisation pain?

  25. Dashboard/report sprawl — too many reports, unclear value.
  26. Expensive Snowflake queries from BI tools.
  27. Duplicated metrics across tools.
  28. Slow dashboard/extract refresh times.
  29. Lack of trust in reported numbers.
  30. Want AI/natural-language analytics but data is messy.
  31. Other: __

Part B: Scope and Access (5–10 minutes)

  1. Which Snowflake account(s) and region(s) should be in scope?
  2. Account/Region: ______
  3. Optional secondary: ______

  4. Can you provide a user with the following Snowflake roles?

  5. ACCOUNTADMIN or ORGADMIN
  6. SECURITYADMIN (for user/role metadata)
  7. Read-only role on SNOWFLAKE database ACCOUNT_USAGE schemas
  8. We need to confirm — which IT/data platform contact can approve? : __

  9. For Power BI, which of the following can you provide?

  10. Power BI Admin portal access.
  11. Microsoft Entra (Azure AD) tenant-level admin.
  12. Power BI Service Principal app registration (for API access).
  13. Power BI Fabric/Scanner API access.
  14. We do not currently have admin-level access; owner is: __
  15. Not in scope.

  16. For Tableau, which of the following can you provide?

  17. Tableau Server admin (including PostgreSQL repository access).
  18. Tableau Cloud admin.
  19. Tableau REST API credentials/token.
  20. Tableau Metadata API access.
  21. We do not currently have admin-level access; owner is: __
  22. Not in scope.

  23. Are there any reports or dashboards the business considers untouchable?

    • Please list critical/executive dashboards: __
  24. Who should review recommendations before they are presented to leadership?

    • BI lead / manager: __
    • Data platform lead: __
    • Analytics engineering lead: __
    • CFO / Finance lead: __
    • Other: __

Part C: Known Pain Signals (Optional — helps prioritise)

  1. Name 3–5 dashboards or reports you suspect are expensive, unused, duplicated, or problematic:






  2. Are there functional areas where BI usage is especially high or especially painful?

    • E.g., Finance, Sales, Marketing, Operations, Product: __

Phase 1: Access and Scope Checklist

Run this checklist in Week 1, Day 1.

Snowflake access

  • Verify SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY is readable.
  • Verify SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY is readable.
  • Verify SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY is readable (Enterprise edition or higher).
  • Verify SNOWFLAKE.ACCOUNT_USAGE.LOGIN_HISTORY is readable.
  • Verify SNOWFLAKE.ACCOUNT_USAGE.USERS is readable.
  • Verify SNOWFLAKE.ACCOUNT_USAGE.ROLES is readable.
  • Verify SNOWFLAKE.ACCOUNT_USAGE.GRANTS_TO_USERS is readable.
  • Verify SNOWFLAKE.ACCOUNT_USAGE.GRANTS_TO_ROLES is readable.
  • Confirm retention period for QUERY_HISTORY (365 days standard).
  • Note any PII/governance restrictions on sharing query text.
  • Identify BI service account names and warehouse naming conventions.
  • Confirm whether query tags are used by BI tools.

Power BI access

  • Power BI Admin Activity Events accessible via REST API.
  • Workspaces, reports, datasets listable.
  • Dataset refresh history available.
  • Gateway/data source metadata available.
  • Fabric/Scanner API available (if on Fabric).
  • XMLA read access confirmed (optional, useful for semantic models).
  • Service Principal token created and tested.

Tableau access

  • Tableau Server repository/PostgreSQL accessible (if Server).
  • Tableau REST API token created and tested.
  • Tableau Metadata API available.
  • Workbook, view, datasource listable.
  • Extract refresh history readable.
  • Background job logs accessible.

Human/contextual inputs

  • Schedule stakeholder interviews (BI lead, data platform lead, business users).
  • Gather list of known executive dashboards.
  • Gather list of known operational/manual reporting pain points.
  • Confirm any regulatory/audit constraints on reporting changes.

Output

  • Access blockers documented.
  • Data retention and governance constraints noted.
  • Scope confirmed.
  • Rough asset count estimate obtained.

Phase 2: Snowflake Extraction SQL

Run these queries in a Snowflake worksheet, Python connector, or dbt model. Save outputs to CSV or parquet for downstream analysis.

2.1 BI-driven query volume and cost

-- Query volume, cost, and performance by user/warehouse/asset patterns
-- Last 90 days, filter to BI-related users if known

SELECT
    DATE_TRUNC('day', start_time) AS query_day,
    user_name,
    role_name,
    warehouse_name,
    database_name,
    schema_name,
    query_tag,
    COUNT(*) AS query_count,
    SUM(total_elapsed_time) AS total_elapsed_ms,
    SUM(credits_used_cloud_services) AS credits_cloud,
    AVG(total_elapsed_time) AS avg_elapsed_ms,
    AVG(bytes_scanned) AS avg_bytes_scanned,
    MAX(bytes_scanned) AS max_bytes_scanned,
    SUM(bytes_scanned) AS total_bytes_scanned,
    SUM(partitions_scanned) AS partitions_scanned
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(day, -90, CURRENT_DATE())
  AND warehouse_name NOT IN ('LOAD_WH', 'DBT_WH') -- adjust for your non-BI warehouses
  AND user_name NOT IN ('SYSADMIN', 'LOADER') -- adjust for your non-BI users
GROUP BY 1, 2, 3, 4, 5, 6, 7
ORDER BY total_elapsed_ms DESC;

2.2 Top BI queries by cost

-- Top individual queries by cost (useful for identifying expensive queries from BI tools)
SELECT
    query_id,
    start_time,
    user_name,
    warehouse_name,
    database_name,
    schema_name,
    query_tag,
    LEFT(query_text, 2000) AS query_text_snippet,
    total_elapsed_time,
    bytes_scanned,
    credits_used_cloud_services,
    tables_modified
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(day, -90, CURRENT_DATE())
ORDER BY total_elapsed_time DESC
LIMIT 500;

2.3 Object access heatmap

-- Which database objects are accessed most by BI users/warehouses?
SELECT
    database_name,
    schema_name,
    table_name,
    COUNT(DISTINCT query_id) AS distinct_queries,
    COUNT(DISTINCT user_name) AS distinct_users,
    SUM(total_elapsed_time) AS total_elapsed_ms,
    SUM(bytes_scanned) AS total_bytes_scanned
FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY
WHERE start_time >= DATEADD(day, -90, CURRENT_DATE())
  AND query_start_time IS NOT NULL
GROUP BY 1, 2, 3
ORDER BY distinct_queries DESC
LIMIT 500;

2.4 Warehouse metering by day and user

-- Daily warehouse cost by user — helps spot expensive BI users
SELECT
    DATE_TRUNC('day', start_time) AS usage_day,
    warehouse_name,
    user_name,
    SUM(credits_used_compute) AS compute_credits_used,
    SUM(credits_used_cloud_services) AS cloud_services_credits_used
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD(day, -90, CURRENT_DATE())
GROUP BY 1, 2, 3
ORDER BY compute_credits_used DESC;

2.5 BI service account query patterns

-- If you know service account names, profile their query behaviour
SELECT
    user_name,
    DATE_TRUNC('week', start_time) AS query_week,
    warehouse_name,
    query_tag,
    COUNT(*) AS query_count,
    SUM(total_elapsed_time) AS total_elapsed_ms,
    MEDIAN(total_elapsed_time) AS median_elapsed_ms,
    AVG(bytes_scanned) AS avg_bytes_scanned,
    APPROX_COUNT_DISTINCT(query_id) AS approx_unique_queries
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(day, -90, CURRENT_DATE())
  AND user_name IN (
      'TABLEAU_SERVICE',      -- adjust
      'POWERBI_SERVICE',      -- adjust
      'LOOKER_SERVICE'        -- adjust
  )
GROUP BY 1, 2, 3, 4
ORDER BY total_elapsed_ms DESC;

2.6 Query text fingerprinting for BI patterns

-- Heuristic: group queries by similar patterns to identify repeated BI query shapes
-- Snowflake does not natively support EXPLAIN PLAN storage, so use LEFT() + REGEXP

SELECT
    APPROX_COUNT_DISTINCT(query_id) AS approx_query_count,
    APPROX_COUNT_DISTINCT(user_name) AS approx_distinct_users,
    SUM(total_elapsed_time) AS total_elapsed_ms,
    LEFT(query_text, 300) AS query_signature,
    query_tag
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(day, -90, CURRENT_DATE())
  AND query_text ILIKE '%SELECT%'
GROUP BY LEFT(query_text, 300), query_tag
HAVING approx_query_count > 10
ORDER BY total_elapsed_ms DESC
LIMIT 200;

2.7 User and role mappings

-- Active users and roles, helpful for identifying BI tool users
SELECT
    u.name AS user_name,
    u.email,
    u.created_on,
    u.last_success_login,
    ARRAY_AGG(DISTINCT g.role) AS roles
FROM SNOWFLAKE.ACCOUNT_USAGE.USERS u
LEFT JOIN SNOWFLAKE.ACCOUNT_USAGE.GRANTS_TO_USERS g
    ON u.name = g.grantee_name
WHERE u.deleted_on IS NULL
GROUP BY 1, 2, 3, 4
ORDER BY u.last_success_login DESC NULLS LAST;

2.8 Dynamic tables and materialized views

-- Which BI-queried objects are already pre-aggregated?
-- Useful for recommending which dashboards could be served faster

SELECT
    database_name,
    schema_name,
    name AS object_name,
    'DYNAMIC_TABLE' AS object_type,
    scheduling_state,
    last_suspended_on,
    bytes,
    rows
FROM SNOWFLAKE.ACCOUNT_USAGE.DYNAMIC_TABLES
UNION ALL
SELECT
    database_name,
    schema_name,
    name,
    'MATERIALIZED_VIEW',
    NULL,
    NULL,
    bytes,
    rows
FROM SNOWFLAKE.ACCOUNT_USAGE.MATERIALIZED_VIEWS
ORDER BY bytes DESC;

2.9 Extract outputs

Export all of the above to CSV or write them into a single analysis schema using CTAS for faster downstream joins.


Phase 3: Power BI Metadata Extraction Checklist

3.1 Activity events

Use the Power BI Admin REST API:

GET https://api.powerbi.com/v1.0/myorg/admin/activityevents
    ?startDateTime={start}
    &endDateTime={end}

Requires: Power BI Admin or delegated audit permissions.

Fields to capture for scoring:

  • Id — event ID.
  • CreationTime — event time.
  • Operation — e.g., ViewReport, ExportReport, RefreshDataset.
  • Workload — e.g., PowerBI, ASAzure.
  • UserId — user email.
  • WorkspaceName, WorkspaceId.
  • ReportName, ReportId, ObjectId.
  • DatasetName, DatasetId.
  • IsSuccess.
  • DataflowName (if applicable).

Key operations to count:

Operation Scoring signal
ViewReport Usage value
ExportReport Usage value / manual workaround signal
RefreshDataset Cost/performance pain
CreateReport Sprawl indicator
DeleteReport Churn
UpdateApp Governance signal
GenerateDataInsights Advanced usage

3.2 Reports, datasets, and workspaces

GET https://api.powerbi.com/v1.0/myorg/admin/reports
GET https://api.powerbi.com/v1.0/myorg/admin/datasets
GET https://api.powerbi.com/v1.0/myorg/groups  -- workspaces

Fields to capture:

  • name, id, workspaceId, datasetId, webUrl.
  • createdDateTime, modifiedDateTime.
  • reportType, endorsementDetails.

3.3 Refresh history

GET https://api.powerbi.com/v1.0/myorg/datasets/{datasetId}/refreshes

Capture:

  • startTime, endTime, status, refreshType, serviceExceptionJson.
  • Calculate: success rate, avg duration, failure reasons.

3.4 Dataset data sources (gateway mapping)

GET https://api.powerbi.com/v1.0/myorg/datasets/{datasetId}/datasources

Capture:

  • datasourceType — confirm Snowflake datasets.
  • connectionDetails — server/database granularity.
  • gatewayId — on-prem vs cloud gateway.

3.5 Scanner API / Fabric metadata (if available)

The Scanner API is the most powerful but requires Fabric capacity:

POST https://api.powerbi.com/v1.0/myorg/admin/workspaces/getInfo

Returns:

  • Reports, datasets, dashboards, dataflows, apps.
  • Dataset schema, measures, columns, relationships (TOM-level).
  • Lineage info.

If you have Scanner API access:

  • Extract measure definitions (DAX) for semantic complexity scoring.
  • Extract relationships and cardinality.
  • Identify duplicate measures across datasets.

3.6 XMLA endpoint (semantic model deep-dive)

If available, connect to the workspace XMLA endpoint:

Server: powerbi://api.powerbi.com/v1.0/myorg/{workspace-name}

Query via DAX/SSAS tools to extract:

  • Measures and expressions.
  • Calculated columns.
  • Relationships and key columns.
  • Row-level security roles.

3.7 Manual inventory fallback

If API access is blocked or limited, the fallback is manual inventory:

  • Export workspace report list from Power BI Service UI.
  • Survey known report owners for usage/criticality.
  • Map known Snowflake service accounts to known datasets.

Phase 4: Tableau Metadata Extraction Checklist

4.1 Tableau Server Repository (PostgreSQL)

If you have Tableau Server and repository access, this is the richest source.

Connection details (from Tableau Server admin):

  • Host: Tableau Server node running the repository.
  • Port: 8060 (default for readonly account).
  • Database: workgroup.
  • User: readonly (or custom).

Key tables/views:

Table/View Purpose Fields to extract
_views Dashboard usage name, id, workbook_id, site_id, created_at, updated_at
_workbooks Workbook inventory name, id, project_id, owner_id, created_at, updated_at, size
_datasources Datasource inventory name, id, type, created_at, owner_id, extract_refreshed_at
_users User mapping name, id, site_id, email, last_login
views_stats View event history view_id, user_id, timestamp, nviews
hist_views Historical view counts view_id, timestamp, nviews
_sessions User sessions user_id, session_id, created_at, updated_at
background_jobs Extract/refresh jobs target_id, target_type, created_at, completed_at, status, notes
data_connections Connection info datasource_id, server, port, dbclass, dbname

Key query patterns:

-- Most viewed workbooks
SELECT
    w.name AS workbook_name,
    COUNT(v.id) AS view_count,
    MAX(vs.timestamp) AS last_viewed_at
FROM _workbooks w
JOIN _views v ON w.id = v.workbook_id
LEFT JOIN views_stats vs ON v.id = vs.view_id
WHERE vs.timestamp >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY w.name
ORDER BY view_count DESC;

-- Extract refresh performance
SELECT
    d.name AS datasource_name,
    COUNT(bj.target_id) AS refresh_count,
    AVG(EXTRACT(EPOCH FROM (bj.completed_at - bj.created_at))) AS avg_refresh_seconds,
    SUM(CASE WHEN bj.status = 0 THEN 1 ELSE 0 END) AS failure_count
FROM background_jobs bj
JOIN _datasources d ON bj.target_id = d.id
WHERE bj.created_at >= CURRENT_DATE - INTERVAL '90 days'
  AND bj.target_type = 'Datasource'
GROUP BY d.name
ORDER BY refresh_count DESC;

-- Snowflake-connected datasources
SELECT
    d.name AS datasource_name,
    w.name AS workbook_name,
    dc.server,
    dc.dbname
FROM _datasources d
JOIN data_connections dc ON d.id = dc.datasource_id
LEFT JOIN _workbooks w ON d.id = w.id -- adjust join logic
WHERE dc.dbclass = 'snowflake';

4.2 Tableau REST API

Tableau Cloud or Server without repository access:

Base URL:

  • Server: https://{server}/api/{version}/
  • Cloud: https://{pod}.online.tableau.com/api/{version}/

Endpoints to hit:

Endpoint Purpose
GET /api/{version}/sites/{siteId}/workbooks List workbooks
GET /api/{version}/sites/{siteId}/views List views
GET /api/{version}/sites/{siteId}/datasources List datasources
GET /api/{version}/sites/{siteId}/workbooks/{id} Workbook details
GET /api/{version}/sites/{siteId}/views/{id}/usageStatistics View usage counts
GET /api/{version}/sites/{siteId}/datasources/{id}/connections Datasource connections
GET /api/{version}/sites/{siteId}/tasks/extractRefreshes Extract refresh schedules
GET /api/{version}/sites/{siteId}/tasks/extractRefreshes/{taskId} Extract refresh details

Authentication:

  • Personal Access Token (PAT): preferred for API automation.
  • Username/password: fallback if PAT is blocked.

Key fields to extract:

  • Workbook/View/Datasource: name, id, createdAt, updatedAt, owner, project.
  • Usage: totalViewCount, totalViewCount (snapshot — not history).
  • Connections: serverAddress, serverPort, userName, dbclass.

4.3 Tableau Metadata API (GraphQL)

More powerful than REST for lineage and field-level detail:

query workbooks {
  workbooks {
    id
    name
    projectName
    owner {
      username
      email
    }
    sheets {
      id
      name
    }
    upstreamDatabases {
      id
      name
      connectionType
    }
    embeddedExtracts {
      id
      name
    }
  }
}

Useful for:

  • Field-level lineage.
  • Column references.
  • Snowflake database connections.
  • Embedded vs live connections.
  • Extract dependencies.

Phase 5: Scoring and Classification

5.1 Scoring rubric

Use a 0–3 scale for each dimension. Do not overthink. The goal is ranking, not perfect precision.

Usage Value (U)

Score Criteria
0 No views in 90 days; no owner confirmation; stale metadata.
1 < 5 unique users; viewed < biweekly; no subscriptions.
2 5–20 unique users; viewed weekly; some subscriptions/favourites.
3 > 20 unique users OR executive/leadership use OR embedded in recurring meetings OR heavily subscribed.

Cost / Performance Pain (C)

Score Criteria
0 < 1 compute credit per week; scan < 1 GB; refresh < 1 min.
1 1–5 credits/week; scan 1–50 GB; refresh 1–5 min.
2 5–20 credits/week; scan 50–500 GB; refresh 5–15 min; occasional timeouts.
3 > 20 credits/week OR scan > 500 GB OR refresh > 15 min OR frequent failures OR causes warehouse scaling.

Semantic Complexity (S)

Score Criteria
0 Basic table/chart with filters; no calculated fields.
1 Some calculated columns/fields; basic aggregations.
2 Multiple measures; some DAX/LOD calcs; relationships; role-level security.
3 Complex DAX/Tableau LOD; semi-additive measures; many-to-many relationships; intricate row-level security.

Data Product Suitability (D)

Score Criteria
0 Purely visual/exploratory; no parameter input; no action needed.
1 Some parameters or basic drilling; occasional export for further work.
2 Regular parameter use; users export and manipulate; what-if questions; some writeback need.
3 Operational workflow in disguise; approval chains; scenario modelling; data quality review; writeback desirable; users asking for UI actions.

Migration / Change Risk (R)

Score Criteria
0 Internal team tool; owner confirmed; low visibility; simple logic.
1 Team tool with some exec interest; owner exists but busy.
2 Executive-facing; metrics used in financial reporting; owner unclear; heavy reliance on visual layout.
3 Board/audit/regulatory; fragile undocumented logic; no owner; multiple stakeholder dependencies.

5.2 Classification rules

After scoring, classify each asset using this flow. Do not be rigid — override with expert judgment.

IF R == 3 AND (U >= 2 OR C >= 2):
    → FLAG: High governance risk — do not touch without formal review
    ELSE IF R == 3:
    → CLASSIFY: Keep in BI (governance)

ELSE IF S >= 2 AND D <= 1:
    → Consider: Optimise in place OR Consolidate
    (semantic richness lives in BI; modernise the backend)

ELSE IF U == 0 AND C <= 1:
    → CLASSIFY: Retire / Archive

ELSE IF U <= 1 AND C >= 2:
    → CLASSIFY: Retire / Archive (unused but expensive)

ELSE IF D >= 2 AND S >= 2:
    → CLASSIFY: Snowflake-native data product candidate
    (high complexity + workflow potential = prime candidate)

ELSE IF D >= 2 AND S <= 1:
    → CLASSIFY: Snowflake-native data product candidate
    (simple logic but workflow fit = quick win)

ELSE IF S >= 2 AND D <= 1 AND U >= 2:
    → CLASSIFY: Cortex / Semantic candidate
    (governed metrics, repeated questions)

ELSE IF C >= 2 AND U >= 2:
    → CLASSIFY: Optimise in place
    (keep BI surface, fix backend/cost)

ELSE IF U >= 2 AND C <= 1 AND S <= 1 AND D <= 1:
    → CLASSIFY: Keep in BI
    (low cost, reasonable usage, simple BI fit)

ELSE IF duplicate metric/dataset confirmed:
    → CLASSIFY: Consolidate

ELSE:
    → FLAG: Human review needed

5.3 Composite priority score

For internal sorting, use a weighted composite:

priority_score =
    (U * 2.0) +
    (C * 1.5) +
    (D * 1.5) -
    (R * 1.0) -
    (S * 0.5)

Higher score = bigger opportunity to intervene.

Rationale:

  • U matters most: intervention on unused assets is waste.
  • C and D signal value potential.
  • R penalises risky assets.
  • S slightly penalises because complex BI semantic models are expensive to migrate.

Sort descending. The top 20–50 assets are your highest-leverage recommendations.


Phase 6: Executive Readout Template

6.1 Executive Summary (1 page)

Title: BI Estate Modernisation Assessment — [Client Name]

One-paragraph summary:

We analysed [X] BI assets across [Power BI / Tableau] and [Y] days of Snowflake query telemetry. We identified [N] high-priority opportunities spanning cost reduction, consolidation, and Snowflake-native data product creation. We recommend a 4-week pilot targeting 3 high-signal assets.

Key numbers:

Metric Value
Total BI assets in scope [X]
Total estimated weekly Snowflake BI cost [$ or credits]
Assets with zero usage in 90 days [N]
Assets flagged for retirement [N]
Assets flagged for optimisation [N]
Assets flagged for data product migration [N]
Assets flagged for Cortex/semantic layer [N]
Top 3 cost drivers [Names]
Top 3 usage drivers [Names]
Recommended pilot count 3

6.2 Cost/Usage Leaderboard (2–3 pages)

Include a table or chart of:

  • Top 20 assets by estimated cost (Snowflake credits).
  • Top 20 assets by usage events.
  • Bottom 20 assets by usage (retirement candidates).
  • Scatter: Usage Value vs Cost Pain (quadrants).

6.3 Recommendation Pages (1 page per bucket)

For each classification bucket, include:

  • Keep in BI — count + rationale + examples.
  • Optimise in place — count + examples + actions.
  • Retire / Archive — count + risk + suggested timeline.
  • Consolidate — duplicate clusters identified.
  • Snowflake-native data product candidates — count + why + suggested surface (Streamlit, Native App, embedded app).
  • Cortex / Semantic candidates — count + suggested semantic layer approach.

Pick 3 assets:

  1. Optimise in place — high cost, high usage, simple fix.
  2. Retire/Consolidate — unused or duplicated, low risk, quick win.
  3. Snowflake-native data product — strong workflow-fit, moderate risk, moderate complexity.

For each:

  • Asset name.
  • Current pain.
  • Proposed action.
  • Expected outcome.
  • Estimated effort.
  • Owner.
  • Risk.

6.5 Business Case

  • Current annual estimated BI-driven Snowflake cost.
  • Estimated savings from retirement + consolidation.
  • Estimated cost reduction from optimisation.
  • Opportunity value from faster/more reliable analytics.
  • Estimated implementation cost (if proceeding).

6.6 Risks and Next Steps

  • Governance/audit risk.
  • Ownership gaps.
  • Technical dependency risk.
  • Recommended next actions with owners and dates.

Phase 7: Pilot Selection Framework

7.1 Selection criteria

A good pilot asset should score well on:

  1. Speed to proof — Can we show a result in 2–4 weeks?
  2. Signal clarity — Is the before/after unambiguous?
  3. Stakeholder interest — Does someone important care?
  4. Low blast radius — If we break something, is it recoverable?
  5. Low dependency blockers — Can we access the data/logic without months of governance?

Avoid for pilot:

  • Anything with R == 3.
  • Anything used in board/audit reporting.
  • Anything with > 10 downstream dependencies.
  • Anything where the owner is unavailable.

7.2 Pilot mix

Every diagnostic should propose at least one pilot from three categories:

Category Why it matters Example
Optimise in place Proves backend modernisation is faster and cheaper than migration Rewrite expensive query → dynamic table; show cost reduction
Retire / Consolidate Proves governance reduces sprawl Remove 5 unused reports; merge 3 duplicate dashboards; show cleaner estate
Data product candidate Proves Snowflake-native UX for workflows Build Streamlit workflow replacing a manual export/approval loop

7.3 Pilot charter template

## Pilot: [Asset Name]

### Category
[Optimise / Retire / Data Product]

### Current state
- Tool: [Power BI / Tableau]
- Cost per week: [X credits / $]
- Users: [N]
- Pain: [description]

### Desired state
[What changes in 4 weeks]

### Success criteria
- [ ] Measurable cost or performance improvement.
- [ ] Stakeholder confirms the new state is acceptable.
- [ ] No regression in critical downstream workflows.

### Scope
- In: [list]
- Out: [list]

### Effort estimate
- Data/analytics engineering: [X] days
- BI/platform support: [Y] days
- Business stakeholder time: [Z] hours

### Owner
- Data/Analytics: [Name]
- BI/Platform: [Name]
- Business: [Name]

### Risk and rollback
- Risk: [describe]
- Rollback plan: [describe]

### Timeline
- Week 1: [activity]
- Week 2: [activity]
- Week 3: [activity]
- Week 4: [review and demo]

Appendix A: Data Model Quick Reference

This is the lightweight normalised model you should build from extracted data.

bi_assets

Column Type Notes
asset_id string UUID or deterministic platform ID
platform enum powerbi, tableau, qlik, looker, ssrs, excel, other
asset_type enum report, dashboard, workbook, view, dataset, datasource, semantic_model
asset_name string Human-readable name
workspace_or_project string Workspace / project / folder
owner string Owner email or name
created_at timestamp
updated_at timestamp
last_viewed_at timestamp
url string Deep link if available
is_certified boolean Promoted/certified in platform
snowflake_connected boolean Confirmed Snowflake datasource
snowflake_database string If known
snowflake_schema string If known

bi_usage_events

Column Type Notes
event_id string UUID
asset_id FK
user_id string Email or ID
event_type enum view, export, refresh, edit, share, subscribe
event_timestamp timestamp
session_id string Optional

bi_refresh_events

Column Type Notes
refresh_id string
asset_id FK
start_time timestamp
end_time timestamp
status enum success, failure, cancelled
duration_seconds float
failure_reason string

snowflake_queries

Column Type Notes
query_id string Snowflake query ID
query_text text Truncated at 2000 chars
user_name string
role_name string
warehouse_name string
start_time timestamp
end_time timestamp
execution_time_ms int
bytes_scanned int
credits_estimated float
database_name string
schema_name string
query_tag string

asset_query_mapping

Column Type Notes
asset_id FK
query_id FK Snowflake query
mapping_method enum query_tag, service_account, warehouse, sql_fingerprint, datasource_metadata, manual
confidence_score float 0.0–1.0

asset_scores

Column Type Notes
asset_id FK
usage_value_score int 0–3
cost_pain_score int 0–3
semantic_complexity_score int 0–3
data_product_suitability_score int 0–3
migration_risk_score int 0–3
composite_priority_score float Calculated
recommendation_bucket enum keep, optimise, retire, consolidate, data_product, cortex, review
rationale text
reviewed_by string Human reviewer
final_override enum Optional expert override

Appendix B: Common Pitfalls and Gotchas

Pitfall 1: Query text is truncated or redacted

Some organisations redact or truncate QUERY_TEXT for PII. If so:

  • Rely more on query tags, warehouse names, and service account patterns.
  • Use OBJECTS_MODIFIED or QUERY_HISTORY.QUERY_TEXT with QUERY_TAG.
  • Consider requesting an exemption or PII-safe export with query hashes instead of text.

Pitfall 2: BI service accounts map many assets to one user

If Power BI/Tableau uses one Snowflake service account for all queries, query-to-asset mapping is hard.

Mitigation:

  • Encourage the client to use per-workbook query tags.
  • Map by warehouse (if different BI assets use different warehouses).
  • Map by SQL fingerprint (query text patterns).
  • Map by scheduled refresh times.
  • Accept lower confidence and flag for manual review.

Pitfall 3: Power BI/Tableau activity events are 30-day rolling

Power BI Admin Activity Events and some Tableau logs have rolling retention.

Mitigation:

  • Collect continuously during the engagement.
  • Request a one-off historical export if available.
  • Use what you have; 30 days is usually enough to tell used vs unused.

Pitfall 4: Semantic complexity is hidden in DAX/Tableau LOD

You may not have access to all DAX or LOD expressions.

Mitigation:

  • Use measure count, relationship count, and dataset size as proxies.
  • Flag datasets with many measures or calculated columns for manual review.
  • If Scanner API or XMLA is available, use it to extract semantic depth.

Pitfall 5: Executives overstate importance of unused dashboards

A dashboard used in one annual board meeting may score zero on telemetry but be politically critical.

Mitigation:

  • Always combine telemetry with stakeholder interviews.
  • Use the R (risk) dimension as a safety valve.
  • Never auto-retire without owner and stakeholder confirmation.
  • Flag low-usage, high-risk assets separately.

Pitfall 6: Snowflake costs are not perfectly attributable

Warehouse costs include all queries, not just BI.

Mitigation:

  • Use query tags, user names, and warehouse patterns to estimate BI-attributable cost.
  • Be explicit about assumptions in the executive readout.
  • Focus on directional opportunity, not exact accounting.

Pitfall 7: The client wants Streamlit for everything

If the client is excited about Streamlit, they may want you to migrate everything.

Mitigation:

  • Be firm: Streamlit is a data product surface, not a BI replacement.
  • Use the scoring rubric to deflect unsuitable candidates.
  • Show them a candidate that is genuinely a workflow, not a dashboard.

Resource Purpose
SNOWFLAKE.ACCOUNT_USAGE docs Reference for all Snowflake usage views
Power BI REST API docs Admin, datasets, reports, refreshes, activity events
Power BI Scanner API docs Dataset-level metadata, measures, lineage
Tableau Server Repository docs PostgreSQL schema reference
Tableau REST API docs Workbook, view, datasource, task API
Tableau Metadata API docs GraphQL for lineage and field-level detail
dbt docs For modelling recommendations
Snowflake Cortex Analyst docs For Cortex/semantic layer recommendations
Snowflake Streamlit docs For data product prototyping
Snowflake Dynamic Tables docs For backend optimisation

Appendix D: Week-by-Week Example Timeline

Week 1: Kickoff + Access + Discovery

  • Day 1: Kickoff, access checklist, pre-engagement questionnaire review.
  • Day 2–3: Resolve access blockers, get credentials.
  • Day 4–5: Snowflake extraction (initial queries), stakeholder interviews.

Week 2: Extraction + Inventory

  • Day 1–2: Power BI / Tableau metadata extraction.
  • Day 3–4: Build telemetry model; heuristically map BI assets to Snowflake queries.
  • Day 5: Run scoring model; draft initial rankings.

Week 3: Scoring + Review

  • Day 1–2: Expert review with BI/data platform/business stakeholders.
  • Day 3: Apply overrides; finalise recommendations.
  • Day 4–5: Build executive readout; draft pilot charters.

Week 4: Readout + Pilot Planning

  • Day 1: Internal review of readout.
  • Day 2: Present to leadership.
  • Day 3–4: Confirm pilot scope and owners.
  • Day 5: Final deliverable; next-phase planning.

Checklist: Before You Finish

  • All Snowflake queries run successfully.
  • BI metadata extracted or documented fallback plan.
  • Scoring model applied to all in-scope assets.
  • At least one asset in each recommendation bucket.
  • Top 20 candidates reviewed with at least one stakeholder.
  • 3 pilot candidates selected with charter templates.
  • Executive readout reviewed internally.
  • Readout presented to leadership.
  • Access credentials revoked/handled per client policy.
  • Lessons learned captured.