Snowflake BI Estate Diagnostic — Operating Playbook¶
Companion to: [[Snowflake BI Estate Diagnostic PRD]]
This playbook is a practical field guide for delivering the diagnostic on a real engagement. It contains templates, SQL, checklists, and step-by-step instructions.
How to use this playbook¶
- Copy the relevant sections into your client project workspace.
- Run through phases in order. Do not skip Phase 1 (access) — incomplete access wrecks every later phase.
- Adapt the SQL and templates to the client’s Snowflake edition, BI platform version, and governance posture.
- The scoring is designed to be opinionated but overridable. Use your judgment on edge cases.
- This playbook assumes a 2–4 week timeline. If the client wants a two-day version, reduce to Snowflake-only analysis, skip BI metadata deep-dives, and focus on the cost/usage leaderboard.
Phase 0: Pre-Engagement Discovery Questionnaire¶
Send this to the client before Week 1 kickoff. The goal is to surface scope, pain, and access blockers early.
Part A: Context (5–10 minutes)¶
- Which BI platforms are you actively using? (Check all that apply.)
- Power BI (Service / Report Server / Fabric)
- Tableau Server / Tableau Cloud
- Qlik Sense / Qlik Cloud
- Looker / Looker Studio
- SSRS / Paginated Reports
- Excel / Google Sheets exports as primary reporting
-
Other: __
-
How long have you been on Snowflake?
- < 6 months
- 6–18 months
- 18–36 months
-
3+ years
-
Do you have a current Snowflake cost or performance concern?
- Yes — we know BI workloads drive significant cost.
- Yes — but we don’t know which workloads.
- Somewhat.
-
No.
-
Have you explored Streamlit in Snowflake, Cortex Analyst, or Snowflake-native data products?
- Yes — actively building / evaluating.
- Yes — explored but paused.
- Heard of it, never evaluated.
-
No.
-
What is your primary analytics modernisation pain?
- Dashboard/report sprawl — too many reports, unclear value.
- Expensive Snowflake queries from BI tools.
- Duplicated metrics across tools.
- Slow dashboard/extract refresh times.
- Lack of trust in reported numbers.
- Want AI/natural-language analytics but data is messy.
- Other: __
Part B: Scope and Access (5–10 minutes)¶
- Which Snowflake account(s) and region(s) should be in scope?
- Account/Region: ______
-
Optional secondary: ______
-
Can you provide a user with the following Snowflake roles?
-
ACCOUNTADMINorORGADMIN -
SECURITYADMIN(for user/role metadata) - Read-only role on
SNOWFLAKEdatabaseACCOUNT_USAGEschemas -
We need to confirm — which IT/data platform contact can approve? : __
-
For Power BI, which of the following can you provide?
- Power BI Admin portal access.
- Microsoft Entra (Azure AD) tenant-level admin.
- Power BI Service Principal app registration (for API access).
- Power BI Fabric/Scanner API access.
- We do not currently have admin-level access; owner is: __
-
Not in scope.
-
For Tableau, which of the following can you provide?
- Tableau Server admin (including PostgreSQL repository access).
- Tableau Cloud admin.
- Tableau REST API credentials/token.
- Tableau Metadata API access.
- We do not currently have admin-level access; owner is: __
-
Not in scope.
-
Are there any reports or dashboards the business considers untouchable?
- Please list critical/executive dashboards: __
-
Who should review recommendations before they are presented to leadership?
- BI lead / manager: __
- Data platform lead: __
- Analytics engineering lead: __
- CFO / Finance lead: __
- Other: __
Part C: Known Pain Signals (Optional — helps prioritise)¶
-
Name 3–5 dashboards or reports you suspect are expensive, unused, duplicated, or problematic:
-
-
Are there functional areas where BI usage is especially high or especially painful?
- E.g., Finance, Sales, Marketing, Operations, Product: __
Phase 1: Access and Scope Checklist¶
Run this checklist in Week 1, Day 1.
Snowflake access¶
- Verify
SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORYis readable. - Verify
SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORYis readable. - Verify
SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORYis readable (Enterprise edition or higher). - Verify
SNOWFLAKE.ACCOUNT_USAGE.LOGIN_HISTORYis readable. - Verify
SNOWFLAKE.ACCOUNT_USAGE.USERSis readable. - Verify
SNOWFLAKE.ACCOUNT_USAGE.ROLESis readable. - Verify
SNOWFLAKE.ACCOUNT_USAGE.GRANTS_TO_USERSis readable. - Verify
SNOWFLAKE.ACCOUNT_USAGE.GRANTS_TO_ROLESis readable. - Confirm retention period for
QUERY_HISTORY(365 days standard). - Note any PII/governance restrictions on sharing query text.
- Identify BI service account names and warehouse naming conventions.
- Confirm whether query tags are used by BI tools.
Power BI access¶
- Power BI Admin Activity Events accessible via REST API.
- Workspaces, reports, datasets listable.
- Dataset refresh history available.
- Gateway/data source metadata available.
- Fabric/Scanner API available (if on Fabric).
- XMLA read access confirmed (optional, useful for semantic models).
- Service Principal token created and tested.
Tableau access¶
- Tableau Server repository/PostgreSQL accessible (if Server).
- Tableau REST API token created and tested.
- Tableau Metadata API available.
- Workbook, view, datasource listable.
- Extract refresh history readable.
- Background job logs accessible.
Human/contextual inputs¶
- Schedule stakeholder interviews (BI lead, data platform lead, business users).
- Gather list of known executive dashboards.
- Gather list of known operational/manual reporting pain points.
- Confirm any regulatory/audit constraints on reporting changes.
Output¶
- Access blockers documented.
- Data retention and governance constraints noted.
- Scope confirmed.
- Rough asset count estimate obtained.
Phase 2: Snowflake Extraction SQL¶
Run these queries in a Snowflake worksheet, Python connector, or dbt model. Save outputs to CSV or parquet for downstream analysis.
2.1 BI-driven query volume and cost¶
-- Query volume, cost, and performance by user/warehouse/asset patterns
-- Last 90 days, filter to BI-related users if known
SELECT
DATE_TRUNC('day', start_time) AS query_day,
user_name,
role_name,
warehouse_name,
database_name,
schema_name,
query_tag,
COUNT(*) AS query_count,
SUM(total_elapsed_time) AS total_elapsed_ms,
SUM(credits_used_cloud_services) AS credits_cloud,
AVG(total_elapsed_time) AS avg_elapsed_ms,
AVG(bytes_scanned) AS avg_bytes_scanned,
MAX(bytes_scanned) AS max_bytes_scanned,
SUM(bytes_scanned) AS total_bytes_scanned,
SUM(partitions_scanned) AS partitions_scanned
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(day, -90, CURRENT_DATE())
AND warehouse_name NOT IN ('LOAD_WH', 'DBT_WH') -- adjust for your non-BI warehouses
AND user_name NOT IN ('SYSADMIN', 'LOADER') -- adjust for your non-BI users
GROUP BY 1, 2, 3, 4, 5, 6, 7
ORDER BY total_elapsed_ms DESC;
2.2 Top BI queries by cost¶
-- Top individual queries by cost (useful for identifying expensive queries from BI tools)
SELECT
query_id,
start_time,
user_name,
warehouse_name,
database_name,
schema_name,
query_tag,
LEFT(query_text, 2000) AS query_text_snippet,
total_elapsed_time,
bytes_scanned,
credits_used_cloud_services,
tables_modified
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(day, -90, CURRENT_DATE())
ORDER BY total_elapsed_time DESC
LIMIT 500;
2.3 Object access heatmap¶
-- Which database objects are accessed most by BI users/warehouses?
SELECT
database_name,
schema_name,
table_name,
COUNT(DISTINCT query_id) AS distinct_queries,
COUNT(DISTINCT user_name) AS distinct_users,
SUM(total_elapsed_time) AS total_elapsed_ms,
SUM(bytes_scanned) AS total_bytes_scanned
FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY
WHERE start_time >= DATEADD(day, -90, CURRENT_DATE())
AND query_start_time IS NOT NULL
GROUP BY 1, 2, 3
ORDER BY distinct_queries DESC
LIMIT 500;
2.4 Warehouse metering by day and user¶
-- Daily warehouse cost by user — helps spot expensive BI users
SELECT
DATE_TRUNC('day', start_time) AS usage_day,
warehouse_name,
user_name,
SUM(credits_used_compute) AS compute_credits_used,
SUM(credits_used_cloud_services) AS cloud_services_credits_used
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD(day, -90, CURRENT_DATE())
GROUP BY 1, 2, 3
ORDER BY compute_credits_used DESC;
2.5 BI service account query patterns¶
-- If you know service account names, profile their query behaviour
SELECT
user_name,
DATE_TRUNC('week', start_time) AS query_week,
warehouse_name,
query_tag,
COUNT(*) AS query_count,
SUM(total_elapsed_time) AS total_elapsed_ms,
MEDIAN(total_elapsed_time) AS median_elapsed_ms,
AVG(bytes_scanned) AS avg_bytes_scanned,
APPROX_COUNT_DISTINCT(query_id) AS approx_unique_queries
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(day, -90, CURRENT_DATE())
AND user_name IN (
'TABLEAU_SERVICE', -- adjust
'POWERBI_SERVICE', -- adjust
'LOOKER_SERVICE' -- adjust
)
GROUP BY 1, 2, 3, 4
ORDER BY total_elapsed_ms DESC;
2.6 Query text fingerprinting for BI patterns¶
-- Heuristic: group queries by similar patterns to identify repeated BI query shapes
-- Snowflake does not natively support EXPLAIN PLAN storage, so use LEFT() + REGEXP
SELECT
APPROX_COUNT_DISTINCT(query_id) AS approx_query_count,
APPROX_COUNT_DISTINCT(user_name) AS approx_distinct_users,
SUM(total_elapsed_time) AS total_elapsed_ms,
LEFT(query_text, 300) AS query_signature,
query_tag
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(day, -90, CURRENT_DATE())
AND query_text ILIKE '%SELECT%'
GROUP BY LEFT(query_text, 300), query_tag
HAVING approx_query_count > 10
ORDER BY total_elapsed_ms DESC
LIMIT 200;
2.7 User and role mappings¶
-- Active users and roles, helpful for identifying BI tool users
SELECT
u.name AS user_name,
u.email,
u.created_on,
u.last_success_login,
ARRAY_AGG(DISTINCT g.role) AS roles
FROM SNOWFLAKE.ACCOUNT_USAGE.USERS u
LEFT JOIN SNOWFLAKE.ACCOUNT_USAGE.GRANTS_TO_USERS g
ON u.name = g.grantee_name
WHERE u.deleted_on IS NULL
GROUP BY 1, 2, 3, 4
ORDER BY u.last_success_login DESC NULLS LAST;
2.8 Dynamic tables and materialized views¶
-- Which BI-queried objects are already pre-aggregated?
-- Useful for recommending which dashboards could be served faster
SELECT
database_name,
schema_name,
name AS object_name,
'DYNAMIC_TABLE' AS object_type,
scheduling_state,
last_suspended_on,
bytes,
rows
FROM SNOWFLAKE.ACCOUNT_USAGE.DYNAMIC_TABLES
UNION ALL
SELECT
database_name,
schema_name,
name,
'MATERIALIZED_VIEW',
NULL,
NULL,
bytes,
rows
FROM SNOWFLAKE.ACCOUNT_USAGE.MATERIALIZED_VIEWS
ORDER BY bytes DESC;
2.9 Extract outputs¶
Export all of the above to CSV or write them into a single analysis schema using CTAS for faster downstream joins.
Phase 3: Power BI Metadata Extraction Checklist¶
3.1 Activity events¶
Use the Power BI Admin REST API:
GET https://api.powerbi.com/v1.0/myorg/admin/activityevents
?startDateTime={start}
&endDateTime={end}
Requires: Power BI Admin or delegated audit permissions.
Fields to capture for scoring:
Id— event ID.CreationTime— event time.Operation— e.g.,ViewReport,ExportReport,RefreshDataset.Workload— e.g.,PowerBI,ASAzure.UserId— user email.WorkspaceName,WorkspaceId.ReportName,ReportId,ObjectId.DatasetName,DatasetId.IsSuccess.DataflowName(if applicable).
Key operations to count:
| Operation | Scoring signal |
|---|---|
ViewReport |
Usage value |
ExportReport |
Usage value / manual workaround signal |
RefreshDataset |
Cost/performance pain |
CreateReport |
Sprawl indicator |
DeleteReport |
Churn |
UpdateApp |
Governance signal |
GenerateDataInsights |
Advanced usage |
3.2 Reports, datasets, and workspaces¶
GET https://api.powerbi.com/v1.0/myorg/admin/reports
GET https://api.powerbi.com/v1.0/myorg/admin/datasets
GET https://api.powerbi.com/v1.0/myorg/groups -- workspaces
Fields to capture:
name,id,workspaceId,datasetId,webUrl.createdDateTime,modifiedDateTime.reportType,endorsementDetails.
3.3 Refresh history¶
GET https://api.powerbi.com/v1.0/myorg/datasets/{datasetId}/refreshes
Capture:
startTime,endTime,status,refreshType,serviceExceptionJson.- Calculate: success rate, avg duration, failure reasons.
3.4 Dataset data sources (gateway mapping)¶
GET https://api.powerbi.com/v1.0/myorg/datasets/{datasetId}/datasources
Capture:
datasourceType— confirm Snowflake datasets.connectionDetails— server/database granularity.gatewayId— on-prem vs cloud gateway.
3.5 Scanner API / Fabric metadata (if available)¶
The Scanner API is the most powerful but requires Fabric capacity:
POST https://api.powerbi.com/v1.0/myorg/admin/workspaces/getInfo
Returns:
- Reports, datasets, dashboards, dataflows, apps.
- Dataset schema, measures, columns, relationships (TOM-level).
- Lineage info.
If you have Scanner API access:
- Extract measure definitions (DAX) for semantic complexity scoring.
- Extract relationships and cardinality.
- Identify duplicate measures across datasets.
3.6 XMLA endpoint (semantic model deep-dive)¶
If available, connect to the workspace XMLA endpoint:
Server: powerbi://api.powerbi.com/v1.0/myorg/{workspace-name}
Query via DAX/SSAS tools to extract:
- Measures and expressions.
- Calculated columns.
- Relationships and key columns.
- Row-level security roles.
3.7 Manual inventory fallback¶
If API access is blocked or limited, the fallback is manual inventory:
- Export workspace report list from Power BI Service UI.
- Survey known report owners for usage/criticality.
- Map known Snowflake service accounts to known datasets.
Phase 4: Tableau Metadata Extraction Checklist¶
4.1 Tableau Server Repository (PostgreSQL)¶
If you have Tableau Server and repository access, this is the richest source.
Connection details (from Tableau Server admin):
- Host: Tableau Server node running the repository.
- Port:
8060(default for readonly account). - Database:
workgroup. - User:
readonly(or custom).
Key tables/views:
| Table/View | Purpose | Fields to extract |
|---|---|---|
_views |
Dashboard usage | name, id, workbook_id, site_id, created_at, updated_at |
_workbooks |
Workbook inventory | name, id, project_id, owner_id, created_at, updated_at, size |
_datasources |
Datasource inventory | name, id, type, created_at, owner_id, extract_refreshed_at |
_users |
User mapping | name, id, site_id, email, last_login |
views_stats |
View event history | view_id, user_id, timestamp, nviews |
hist_views |
Historical view counts | view_id, timestamp, nviews |
_sessions |
User sessions | user_id, session_id, created_at, updated_at |
background_jobs |
Extract/refresh jobs | target_id, target_type, created_at, completed_at, status, notes |
data_connections |
Connection info | datasource_id, server, port, dbclass, dbname |
Key query patterns:
-- Most viewed workbooks
SELECT
w.name AS workbook_name,
COUNT(v.id) AS view_count,
MAX(vs.timestamp) AS last_viewed_at
FROM _workbooks w
JOIN _views v ON w.id = v.workbook_id
LEFT JOIN views_stats vs ON v.id = vs.view_id
WHERE vs.timestamp >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY w.name
ORDER BY view_count DESC;
-- Extract refresh performance
SELECT
d.name AS datasource_name,
COUNT(bj.target_id) AS refresh_count,
AVG(EXTRACT(EPOCH FROM (bj.completed_at - bj.created_at))) AS avg_refresh_seconds,
SUM(CASE WHEN bj.status = 0 THEN 1 ELSE 0 END) AS failure_count
FROM background_jobs bj
JOIN _datasources d ON bj.target_id = d.id
WHERE bj.created_at >= CURRENT_DATE - INTERVAL '90 days'
AND bj.target_type = 'Datasource'
GROUP BY d.name
ORDER BY refresh_count DESC;
-- Snowflake-connected datasources
SELECT
d.name AS datasource_name,
w.name AS workbook_name,
dc.server,
dc.dbname
FROM _datasources d
JOIN data_connections dc ON d.id = dc.datasource_id
LEFT JOIN _workbooks w ON d.id = w.id -- adjust join logic
WHERE dc.dbclass = 'snowflake';
4.2 Tableau REST API¶
Tableau Cloud or Server without repository access:
Base URL:
- Server:
https://{server}/api/{version}/ - Cloud:
https://{pod}.online.tableau.com/api/{version}/
Endpoints to hit:
| Endpoint | Purpose |
|---|---|
GET /api/{version}/sites/{siteId}/workbooks |
List workbooks |
GET /api/{version}/sites/{siteId}/views |
List views |
GET /api/{version}/sites/{siteId}/datasources |
List datasources |
GET /api/{version}/sites/{siteId}/workbooks/{id} |
Workbook details |
GET /api/{version}/sites/{siteId}/views/{id}/usageStatistics |
View usage counts |
GET /api/{version}/sites/{siteId}/datasources/{id}/connections |
Datasource connections |
GET /api/{version}/sites/{siteId}/tasks/extractRefreshes |
Extract refresh schedules |
GET /api/{version}/sites/{siteId}/tasks/extractRefreshes/{taskId} |
Extract refresh details |
Authentication:
- Personal Access Token (PAT): preferred for API automation.
- Username/password: fallback if PAT is blocked.
Key fields to extract:
- Workbook/View/Datasource:
name,id,createdAt,updatedAt,owner,project. - Usage:
totalViewCount,totalViewCount(snapshot — not history). - Connections:
serverAddress,serverPort,userName,dbclass.
4.3 Tableau Metadata API (GraphQL)¶
More powerful than REST for lineage and field-level detail:
query workbooks {
workbooks {
id
name
projectName
owner {
username
email
}
sheets {
id
name
}
upstreamDatabases {
id
name
connectionType
}
embeddedExtracts {
id
name
}
}
}
Useful for:
- Field-level lineage.
- Column references.
- Snowflake database connections.
- Embedded vs live connections.
- Extract dependencies.
Phase 5: Scoring and Classification¶
5.1 Scoring rubric¶
Use a 0–3 scale for each dimension. Do not overthink. The goal is ranking, not perfect precision.
Usage Value (U)¶
| Score | Criteria |
|---|---|
| 0 | No views in 90 days; no owner confirmation; stale metadata. |
| 1 | < 5 unique users; viewed < biweekly; no subscriptions. |
| 2 | 5–20 unique users; viewed weekly; some subscriptions/favourites. |
| 3 | > 20 unique users OR executive/leadership use OR embedded in recurring meetings OR heavily subscribed. |
Cost / Performance Pain (C)¶
| Score | Criteria |
|---|---|
| 0 | < 1 compute credit per week; scan < 1 GB; refresh < 1 min. |
| 1 | 1–5 credits/week; scan 1–50 GB; refresh 1–5 min. |
| 2 | 5–20 credits/week; scan 50–500 GB; refresh 5–15 min; occasional timeouts. |
| 3 | > 20 credits/week OR scan > 500 GB OR refresh > 15 min OR frequent failures OR causes warehouse scaling. |
Semantic Complexity (S)¶
| Score | Criteria |
|---|---|
| 0 | Basic table/chart with filters; no calculated fields. |
| 1 | Some calculated columns/fields; basic aggregations. |
| 2 | Multiple measures; some DAX/LOD calcs; relationships; role-level security. |
| 3 | Complex DAX/Tableau LOD; semi-additive measures; many-to-many relationships; intricate row-level security. |
Data Product Suitability (D)¶
| Score | Criteria |
|---|---|
| 0 | Purely visual/exploratory; no parameter input; no action needed. |
| 1 | Some parameters or basic drilling; occasional export for further work. |
| 2 | Regular parameter use; users export and manipulate; what-if questions; some writeback need. |
| 3 | Operational workflow in disguise; approval chains; scenario modelling; data quality review; writeback desirable; users asking for UI actions. |
Migration / Change Risk (R)¶
| Score | Criteria |
|---|---|
| 0 | Internal team tool; owner confirmed; low visibility; simple logic. |
| 1 | Team tool with some exec interest; owner exists but busy. |
| 2 | Executive-facing; metrics used in financial reporting; owner unclear; heavy reliance on visual layout. |
| 3 | Board/audit/regulatory; fragile undocumented logic; no owner; multiple stakeholder dependencies. |
5.2 Classification rules¶
After scoring, classify each asset using this flow. Do not be rigid — override with expert judgment.
IF R == 3 AND (U >= 2 OR C >= 2):
→ FLAG: High governance risk — do not touch without formal review
ELSE IF R == 3:
→ CLASSIFY: Keep in BI (governance)
ELSE IF S >= 2 AND D <= 1:
→ Consider: Optimise in place OR Consolidate
(semantic richness lives in BI; modernise the backend)
ELSE IF U == 0 AND C <= 1:
→ CLASSIFY: Retire / Archive
ELSE IF U <= 1 AND C >= 2:
→ CLASSIFY: Retire / Archive (unused but expensive)
ELSE IF D >= 2 AND S >= 2:
→ CLASSIFY: Snowflake-native data product candidate
(high complexity + workflow potential = prime candidate)
ELSE IF D >= 2 AND S <= 1:
→ CLASSIFY: Snowflake-native data product candidate
(simple logic but workflow fit = quick win)
ELSE IF S >= 2 AND D <= 1 AND U >= 2:
→ CLASSIFY: Cortex / Semantic candidate
(governed metrics, repeated questions)
ELSE IF C >= 2 AND U >= 2:
→ CLASSIFY: Optimise in place
(keep BI surface, fix backend/cost)
ELSE IF U >= 2 AND C <= 1 AND S <= 1 AND D <= 1:
→ CLASSIFY: Keep in BI
(low cost, reasonable usage, simple BI fit)
ELSE IF duplicate metric/dataset confirmed:
→ CLASSIFY: Consolidate
ELSE:
→ FLAG: Human review needed
5.3 Composite priority score¶
For internal sorting, use a weighted composite:
priority_score =
(U * 2.0) +
(C * 1.5) +
(D * 1.5) -
(R * 1.0) -
(S * 0.5)
Higher score = bigger opportunity to intervene.
Rationale:
Umatters most: intervention on unused assets is waste.CandDsignal value potential.Rpenalises risky assets.Sslightly penalises because complex BI semantic models are expensive to migrate.
Sort descending. The top 20–50 assets are your highest-leverage recommendations.
Phase 6: Executive Readout Template¶
6.1 Executive Summary (1 page)¶
Title: BI Estate Modernisation Assessment — [Client Name]
One-paragraph summary:
We analysed [X] BI assets across [Power BI / Tableau] and [Y] days of Snowflake query telemetry. We identified [N] high-priority opportunities spanning cost reduction, consolidation, and Snowflake-native data product creation. We recommend a 4-week pilot targeting 3 high-signal assets.
Key numbers:
| Metric | Value |
|---|---|
| Total BI assets in scope | [X] |
| Total estimated weekly Snowflake BI cost | [$ or credits] |
| Assets with zero usage in 90 days | [N] |
| Assets flagged for retirement | [N] |
| Assets flagged for optimisation | [N] |
| Assets flagged for data product migration | [N] |
| Assets flagged for Cortex/semantic layer | [N] |
| Top 3 cost drivers | [Names] |
| Top 3 usage drivers | [Names] |
| Recommended pilot count | 3 |
6.2 Cost/Usage Leaderboard (2–3 pages)¶
Include a table or chart of:
- Top 20 assets by estimated cost (Snowflake credits).
- Top 20 assets by usage events.
- Bottom 20 assets by usage (retirement candidates).
- Scatter: Usage Value vs Cost Pain (quadrants).
6.3 Recommendation Pages (1 page per bucket)¶
For each classification bucket, include:
- Keep in BI — count + rationale + examples.
- Optimise in place — count + examples + actions.
- Retire / Archive — count + risk + suggested timeline.
- Consolidate — duplicate clusters identified.
- Snowflake-native data product candidates — count + why + suggested surface (Streamlit, Native App, embedded app).
- Cortex / Semantic candidates — count + suggested semantic layer approach.
6.4 Recommended 4-Week Pilot¶
Pick 3 assets:
- Optimise in place — high cost, high usage, simple fix.
- Retire/Consolidate — unused or duplicated, low risk, quick win.
- Snowflake-native data product — strong workflow-fit, moderate risk, moderate complexity.
For each:
- Asset name.
- Current pain.
- Proposed action.
- Expected outcome.
- Estimated effort.
- Owner.
- Risk.
6.5 Business Case¶
- Current annual estimated BI-driven Snowflake cost.
- Estimated savings from retirement + consolidation.
- Estimated cost reduction from optimisation.
- Opportunity value from faster/more reliable analytics.
- Estimated implementation cost (if proceeding).
6.6 Risks and Next Steps¶
- Governance/audit risk.
- Ownership gaps.
- Technical dependency risk.
- Recommended next actions with owners and dates.
Phase 7: Pilot Selection Framework¶
7.1 Selection criteria¶
A good pilot asset should score well on:
- Speed to proof — Can we show a result in 2–4 weeks?
- Signal clarity — Is the before/after unambiguous?
- Stakeholder interest — Does someone important care?
- Low blast radius — If we break something, is it recoverable?
- Low dependency blockers — Can we access the data/logic without months of governance?
Avoid for pilot:
- Anything with
R == 3. - Anything used in board/audit reporting.
- Anything with > 10 downstream dependencies.
- Anything where the owner is unavailable.
7.2 Pilot mix¶
Every diagnostic should propose at least one pilot from three categories:
| Category | Why it matters | Example |
|---|---|---|
| Optimise in place | Proves backend modernisation is faster and cheaper than migration | Rewrite expensive query → dynamic table; show cost reduction |
| Retire / Consolidate | Proves governance reduces sprawl | Remove 5 unused reports; merge 3 duplicate dashboards; show cleaner estate |
| Data product candidate | Proves Snowflake-native UX for workflows | Build Streamlit workflow replacing a manual export/approval loop |
7.3 Pilot charter template¶
## Pilot: [Asset Name]
### Category
[Optimise / Retire / Data Product]
### Current state
- Tool: [Power BI / Tableau]
- Cost per week: [X credits / $]
- Users: [N]
- Pain: [description]
### Desired state
[What changes in 4 weeks]
### Success criteria
- [ ] Measurable cost or performance improvement.
- [ ] Stakeholder confirms the new state is acceptable.
- [ ] No regression in critical downstream workflows.
### Scope
- In: [list]
- Out: [list]
### Effort estimate
- Data/analytics engineering: [X] days
- BI/platform support: [Y] days
- Business stakeholder time: [Z] hours
### Owner
- Data/Analytics: [Name]
- BI/Platform: [Name]
- Business: [Name]
### Risk and rollback
- Risk: [describe]
- Rollback plan: [describe]
### Timeline
- Week 1: [activity]
- Week 2: [activity]
- Week 3: [activity]
- Week 4: [review and demo]
Appendix A: Data Model Quick Reference¶
This is the lightweight normalised model you should build from extracted data.
bi_assets¶
| Column | Type | Notes |
|---|---|---|
asset_id |
string | UUID or deterministic platform ID |
platform |
enum | powerbi, tableau, qlik, looker, ssrs, excel, other |
asset_type |
enum | report, dashboard, workbook, view, dataset, datasource, semantic_model |
asset_name |
string | Human-readable name |
workspace_or_project |
string | Workspace / project / folder |
owner |
string | Owner email or name |
created_at |
timestamp | |
updated_at |
timestamp | |
last_viewed_at |
timestamp | |
url |
string | Deep link if available |
is_certified |
boolean | Promoted/certified in platform |
snowflake_connected |
boolean | Confirmed Snowflake datasource |
snowflake_database |
string | If known |
snowflake_schema |
string | If known |
bi_usage_events¶
| Column | Type | Notes |
|---|---|---|
event_id |
string | UUID |
asset_id |
FK | |
user_id |
string | Email or ID |
event_type |
enum | view, export, refresh, edit, share, subscribe |
event_timestamp |
timestamp | |
session_id |
string | Optional |
bi_refresh_events¶
| Column | Type | Notes |
|---|---|---|
refresh_id |
string | |
asset_id |
FK | |
start_time |
timestamp | |
end_time |
timestamp | |
status |
enum | success, failure, cancelled |
duration_seconds |
float | |
failure_reason |
string |
snowflake_queries¶
| Column | Type | Notes |
|---|---|---|
query_id |
string | Snowflake query ID |
query_text |
text | Truncated at 2000 chars |
user_name |
string | |
role_name |
string | |
warehouse_name |
string | |
start_time |
timestamp | |
end_time |
timestamp | |
execution_time_ms |
int | |
bytes_scanned |
int | |
credits_estimated |
float | |
database_name |
string | |
schema_name |
string | |
query_tag |
string |
asset_query_mapping¶
| Column | Type | Notes |
|---|---|---|
asset_id |
FK | |
query_id |
FK | Snowflake query |
mapping_method |
enum | query_tag, service_account, warehouse, sql_fingerprint, datasource_metadata, manual |
confidence_score |
float | 0.0–1.0 |
asset_scores¶
| Column | Type | Notes |
|---|---|---|
asset_id |
FK | |
usage_value_score |
int | 0–3 |
cost_pain_score |
int | 0–3 |
semantic_complexity_score |
int | 0–3 |
data_product_suitability_score |
int | 0–3 |
migration_risk_score |
int | 0–3 |
composite_priority_score |
float | Calculated |
recommendation_bucket |
enum | keep, optimise, retire, consolidate, data_product, cortex, review |
rationale |
text | |
reviewed_by |
string | Human reviewer |
final_override |
enum | Optional expert override |
Appendix B: Common Pitfalls and Gotchas¶
Pitfall 1: Query text is truncated or redacted¶
Some organisations redact or truncate QUERY_TEXT for PII. If so:
- Rely more on query tags, warehouse names, and service account patterns.
- Use
OBJECTS_MODIFIEDorQUERY_HISTORY.QUERY_TEXTwithQUERY_TAG. - Consider requesting an exemption or PII-safe export with query hashes instead of text.
Pitfall 2: BI service accounts map many assets to one user¶
If Power BI/Tableau uses one Snowflake service account for all queries, query-to-asset mapping is hard.
Mitigation:
- Encourage the client to use per-workbook query tags.
- Map by warehouse (if different BI assets use different warehouses).
- Map by SQL fingerprint (query text patterns).
- Map by scheduled refresh times.
- Accept lower confidence and flag for manual review.
Pitfall 3: Power BI/Tableau activity events are 30-day rolling¶
Power BI Admin Activity Events and some Tableau logs have rolling retention.
Mitigation:
- Collect continuously during the engagement.
- Request a one-off historical export if available.
- Use what you have; 30 days is usually enough to tell used vs unused.
Pitfall 4: Semantic complexity is hidden in DAX/Tableau LOD¶
You may not have access to all DAX or LOD expressions.
Mitigation:
- Use measure count, relationship count, and dataset size as proxies.
- Flag datasets with many measures or calculated columns for manual review.
- If Scanner API or XMLA is available, use it to extract semantic depth.
Pitfall 5: Executives overstate importance of unused dashboards¶
A dashboard used in one annual board meeting may score zero on telemetry but be politically critical.
Mitigation:
- Always combine telemetry with stakeholder interviews.
- Use the
R(risk) dimension as a safety valve. - Never auto-retire without owner and stakeholder confirmation.
- Flag low-usage, high-risk assets separately.
Pitfall 6: Snowflake costs are not perfectly attributable¶
Warehouse costs include all queries, not just BI.
Mitigation:
- Use query tags, user names, and warehouse patterns to estimate BI-attributable cost.
- Be explicit about assumptions in the executive readout.
- Focus on directional opportunity, not exact accounting.
Pitfall 7: The client wants Streamlit for everything¶
If the client is excited about Streamlit, they may want you to migrate everything.
Mitigation:
- Be firm: Streamlit is a data product surface, not a BI replacement.
- Use the scoring rubric to deflect unsuitable candidates.
- Show them a candidate that is genuinely a workflow, not a dashboard.
Appendix C: Recommended External Resources¶
| Resource | Purpose |
|---|---|
SNOWFLAKE.ACCOUNT_USAGE docs |
Reference for all Snowflake usage views |
| Power BI REST API docs | Admin, datasets, reports, refreshes, activity events |
| Power BI Scanner API docs | Dataset-level metadata, measures, lineage |
| Tableau Server Repository docs | PostgreSQL schema reference |
| Tableau REST API docs | Workbook, view, datasource, task API |
| Tableau Metadata API docs | GraphQL for lineage and field-level detail |
| dbt docs | For modelling recommendations |
| Snowflake Cortex Analyst docs | For Cortex/semantic layer recommendations |
| Snowflake Streamlit docs | For data product prototyping |
| Snowflake Dynamic Tables docs | For backend optimisation |
Appendix D: Week-by-Week Example Timeline¶
Week 1: Kickoff + Access + Discovery¶
- Day 1: Kickoff, access checklist, pre-engagement questionnaire review.
- Day 2–3: Resolve access blockers, get credentials.
- Day 4–5: Snowflake extraction (initial queries), stakeholder interviews.
Week 2: Extraction + Inventory¶
- Day 1–2: Power BI / Tableau metadata extraction.
- Day 3–4: Build telemetry model; heuristically map BI assets to Snowflake queries.
- Day 5: Run scoring model; draft initial rankings.
Week 3: Scoring + Review¶
- Day 1–2: Expert review with BI/data platform/business stakeholders.
- Day 3: Apply overrides; finalise recommendations.
- Day 4–5: Build executive readout; draft pilot charters.
Week 4: Readout + Pilot Planning¶
- Day 1: Internal review of readout.
- Day 2: Present to leadership.
- Day 3–4: Confirm pilot scope and owners.
- Day 5: Final deliverable; next-phase planning.
Checklist: Before You Finish¶
- All Snowflake queries run successfully.
- BI metadata extracted or documented fallback plan.
- Scoring model applied to all in-scope assets.
- At least one asset in each recommendation bucket.
- Top 20 candidates reviewed with at least one stakeholder.
- 3 pilot candidates selected with charter templates.
- Executive readout reviewed internally.
- Readout presented to leadership.
- Access credentials revoked/handled per client policy.
- Lessons learned captured.