Snowflake BI Estate Diagnostic — Operating Playbook¶

Companion to: [[Snowflake BI Estate Diagnostic PRD]]

This playbook is a practical field guide for delivering the diagnostic on a real engagement. It contains templates, SQL, checklists, and step-by-step instructions.

How to use this playbook¶

Copy the relevant sections into your client project workspace.
Run through phases in order. Do not skip Phase 1 (access) — incomplete access wrecks every later phase.
Adapt the SQL and templates to the client’s Snowflake edition, BI platform version, and governance posture.
The scoring is designed to be opinionated but overridable. Use your judgment on edge cases.
This playbook assumes a 2–4 week timeline. If the client wants a two-day version, reduce to Snowflake-only analysis, skip BI metadata deep-dives, and focus on the cost/usage leaderboard.

Phase 0: Pre-Engagement Discovery Questionnaire¶

Send this to the client before Week 1 kickoff. The goal is to surface scope, pain, and access blockers early.

Part A: Context (5–10 minutes)¶

Part B: Scope and Access (5–10 minutes)¶

Part C: Known Pain Signals (Optional — helps prioritise)¶

Name 3–5 dashboards or reports you suspect are expensive, unused, duplicated, or problematic:
Are there functional areas where BI usage is especially high or especially painful?
- E.g., Finance, Sales, Marketing, Operations, Product: __

Phase 1: Access and Scope Checklist¶

Run this checklist in Week 1, Day 1.

Snowflake access¶

Power BI access¶

Power BI Admin Activity Events accessible via REST API.
Workspaces, reports, datasets listable.
Dataset refresh history available.
Gateway/data source metadata available.
Fabric/Scanner API available (if on Fabric).
XMLA read access confirmed (optional, useful for semantic models).
Service Principal token created and tested.

Tableau access¶

Tableau Server repository/PostgreSQL accessible (if Server).
Tableau REST API token created and tested.
Tableau Metadata API available.
Workbook, view, datasource listable.
Extract refresh history readable.
Background job logs accessible.

Human/contextual inputs¶

Schedule stakeholder interviews (BI lead, data platform lead, business users).
Gather list of known executive dashboards.
Gather list of known operational/manual reporting pain points.
Confirm any regulatory/audit constraints on reporting changes.

Output¶

Access blockers documented.
Data retention and governance constraints noted.
Scope confirmed.
Rough asset count estimate obtained.

Phase 2: Snowflake Extraction SQL¶

Run these queries in a Snowflake worksheet, Python connector, or dbt model. Save outputs to CSV or parquet for downstream analysis.

2.1 BI-driven query volume and cost¶

-- Query volume, cost, and performance by user/warehouse/asset patterns
-- Last 90 days, filter to BI-related users if known

SELECT
    DATE_TRUNC('day', start_time) AS query_day,
    user_name,
    role_name,
    warehouse_name,
    database_name,
    schema_name,
    query_tag,
    COUNT(*) AS query_count,
    SUM(total_elapsed_time) AS total_elapsed_ms,
    SUM(credits_used_cloud_services) AS credits_cloud,
    AVG(total_elapsed_time) AS avg_elapsed_ms,
    AVG(bytes_scanned) AS avg_bytes_scanned,
    MAX(bytes_scanned) AS max_bytes_scanned,
    SUM(bytes_scanned) AS total_bytes_scanned,
    SUM(partitions_scanned) AS partitions_scanned
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(day, -90, CURRENT_DATE())
  AND warehouse_name NOT IN ('LOAD_WH', 'DBT_WH') -- adjust for your non-BI warehouses
  AND user_name NOT IN ('SYSADMIN', 'LOADER') -- adjust for your non-BI users
GROUP BY 1, 2, 3, 4, 5, 6, 7
ORDER BY total_elapsed_ms DESC;

2.2 Top BI queries by cost¶

-- Top individual queries by cost (useful for identifying expensive queries from BI tools)
SELECT
    query_id,
    start_time,
    user_name,
    warehouse_name,
    database_name,
    schema_name,
    query_tag,
    LEFT(query_text, 2000) AS query_text_snippet,
    total_elapsed_time,
    bytes_scanned,
    credits_used_cloud_services,
    tables_modified
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(day, -90, CURRENT_DATE())
ORDER BY total_elapsed_time DESC
LIMIT 500;

2.3 Object access heatmap¶

-- Which database objects are accessed most by BI users/warehouses?
SELECT
    database_name,
    schema_name,
    table_name,
    COUNT(DISTINCT query_id) AS distinct_queries,
    COUNT(DISTINCT user_name) AS distinct_users,
    SUM(total_elapsed_time) AS total_elapsed_ms,
    SUM(bytes_scanned) AS total_bytes_scanned
FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY
WHERE start_time >= DATEADD(day, -90, CURRENT_DATE())
  AND query_start_time IS NOT NULL
GROUP BY 1, 2, 3
ORDER BY distinct_queries DESC
LIMIT 500;

2.4 Warehouse metering by day and user¶

-- Daily warehouse cost by user — helps spot expensive BI users
SELECT
    DATE_TRUNC('day', start_time) AS usage_day,
    warehouse_name,
    user_name,
    SUM(credits_used_compute) AS compute_credits_used,
    SUM(credits_used_cloud_services) AS cloud_services_credits_used
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD(day, -90, CURRENT_DATE())
GROUP BY 1, 2, 3
ORDER BY compute_credits_used DESC;

2.5 BI service account query patterns¶

-- If you know service account names, profile their query behaviour
SELECT
    user_name,
    DATE_TRUNC('week', start_time) AS query_week,
    warehouse_name,
    query_tag,
    COUNT(*) AS query_count,
    SUM(total_elapsed_time) AS total_elapsed_ms,
    MEDIAN(total_elapsed_time) AS median_elapsed_ms,
    AVG(bytes_scanned) AS avg_bytes_scanned,
    APPROX_COUNT_DISTINCT(query_id) AS approx_unique_queries
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(day, -90, CURRENT_DATE())
  AND user_name IN (
      'TABLEAU_SERVICE',      -- adjust
      'POWERBI_SERVICE',      -- adjust
      'LOOKER_SERVICE'        -- adjust
  )
GROUP BY 1, 2, 3, 4
ORDER BY total_elapsed_ms DESC;

2.6 Query text fingerprinting for BI patterns¶

-- Heuristic: group queries by similar patterns to identify repeated BI query shapes
-- Snowflake does not natively support EXPLAIN PLAN storage, so use LEFT() + REGEXP

SELECT
    APPROX_COUNT_DISTINCT(query_id) AS approx_query_count,
    APPROX_COUNT_DISTINCT(user_name) AS approx_distinct_users,
    SUM(total_elapsed_time) AS total_elapsed_ms,
    LEFT(query_text, 300) AS query_signature,
    query_tag
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(day, -90, CURRENT_DATE())
  AND query_text ILIKE '%SELECT%'
GROUP BY LEFT(query_text, 300), query_tag
HAVING approx_query_count > 10
ORDER BY total_elapsed_ms DESC
LIMIT 200;

2.7 User and role mappings¶

-- Active users and roles, helpful for identifying BI tool users
SELECT
    u.name AS user_name,
    u.email,
    u.created_on,
    u.last_success_login,
    ARRAY_AGG(DISTINCT g.role) AS roles
FROM SNOWFLAKE.ACCOUNT_USAGE.USERS u
LEFT JOIN SNOWFLAKE.ACCOUNT_USAGE.GRANTS_TO_USERS g
    ON u.name = g.grantee_name
WHERE u.deleted_on IS NULL
GROUP BY 1, 2, 3, 4
ORDER BY u.last_success_login DESC NULLS LAST;

2.8 Dynamic tables and materialized views¶

-- Which BI-queried objects are already pre-aggregated?
-- Useful for recommending which dashboards could be served faster

SELECT
    database_name,
    schema_name,
    name AS object_name,
    'DYNAMIC_TABLE' AS object_type,
    scheduling_state,
    last_suspended_on,
    bytes,
    rows
FROM SNOWFLAKE.ACCOUNT_USAGE.DYNAMIC_TABLES
UNION ALL
SELECT
    database_name,
    schema_name,
    name,
    'MATERIALIZED_VIEW',
    NULL,
    NULL,
    bytes,
    rows
FROM SNOWFLAKE.ACCOUNT_USAGE.MATERIALIZED_VIEWS
ORDER BY bytes DESC;

2.9 Extract outputs¶

Export all of the above to CSV or write them into a single analysis schema using CTAS for faster downstream joins.

Phase 3: Power BI Metadata Extraction Checklist¶

3.1 Activity events¶

Use the Power BI Admin REST API:

GET https://api.powerbi.com/v1.0/myorg/admin/activityevents
    ?startDateTime={start}
    &endDateTime={end}

Requires: Power BI Admin or delegated audit permissions.

Fields to capture for scoring:

Id — event ID.
CreationTime — event time.
Operation — e.g., ViewReport, ExportReport, RefreshDataset.
Workload — e.g., PowerBI, ASAzure.
UserId — user email.
WorkspaceName, WorkspaceId.
ReportName, ReportId, ObjectId.
DatasetName, DatasetId.
IsSuccess.
DataflowName (if applicable).

Key operations to count:

Operation	Scoring signal
`ViewReport`	Usage value
`ExportReport`	Usage value / manual workaround signal
`RefreshDataset`	Cost/performance pain
`CreateReport`	Sprawl indicator
`DeleteReport`	Churn
`UpdateApp`	Governance signal
`GenerateDataInsights`	Advanced usage

3.2 Reports, datasets, and workspaces¶

GET https://api.powerbi.com/v1.0/myorg/admin/reports
GET https://api.powerbi.com/v1.0/myorg/admin/datasets
GET https://api.powerbi.com/v1.0/myorg/groups  -- workspaces

Fields to capture:

name, id, workspaceId, datasetId, webUrl.
createdDateTime, modifiedDateTime.
reportType, endorsementDetails.

3.3 Refresh history¶

GET https://api.powerbi.com/v1.0/myorg/datasets/{datasetId}/refreshes

Capture:

startTime, endTime, status, refreshType, serviceExceptionJson.
Calculate: success rate, avg duration, failure reasons.

3.4 Dataset data sources (gateway mapping)¶

GET https://api.powerbi.com/v1.0/myorg/datasets/{datasetId}/datasources

Capture:

datasourceType — confirm Snowflake datasets.
connectionDetails — server/database granularity.
gatewayId — on-prem vs cloud gateway.

3.5 Scanner API / Fabric metadata (if available)¶

The Scanner API is the most powerful but requires Fabric capacity:

POST https://api.powerbi.com/v1.0/myorg/admin/workspaces/getInfo

Returns:

Reports, datasets, dashboards, dataflows, apps.
Dataset schema, measures, columns, relationships (TOM-level).
Lineage info.

If you have Scanner API access:

Extract measure definitions (DAX) for semantic complexity scoring.
Extract relationships and cardinality.
Identify duplicate measures across datasets.

3.6 XMLA endpoint (semantic model deep-dive)¶

If available, connect to the workspace XMLA endpoint:

Server: powerbi://api.powerbi.com/v1.0/myorg/{workspace-name}

Query via DAX/SSAS tools to extract:

Measures and expressions.
Calculated columns.
Relationships and key columns.
Row-level security roles.

3.7 Manual inventory fallback¶

If API access is blocked or limited, the fallback is manual inventory:

Export workspace report list from Power BI Service UI.
Survey known report owners for usage/criticality.
Map known Snowflake service accounts to known datasets.

Phase 4: Tableau Metadata Extraction Checklist¶

4.1 Tableau Server Repository (PostgreSQL)¶

If you have Tableau Server and repository access, this is the richest source.

Connection details (from Tableau Server admin):

Host: Tableau Server node running the repository.
Port: 8060 (default for readonly account).
Database: workgroup.
User: readonly (or custom).

Key tables/views:

Table/View	Purpose	Fields to extract
`_views`	Dashboard usage	`name`, `id`, `workbook_id`, `site_id`, `created_at`, `updated_at`
`_workbooks`	Workbook inventory	`name`, `id`, `project_id`, `owner_id`, `created_at`, `updated_at`, `size`
`_datasources`	Datasource inventory	`name`, `id`, `type`, `created_at`, `owner_id`, `extract_refreshed_at`
`_users`	User mapping	`name`, `id`, `site_id`, `email`, `last_login`
`views_stats`	View event history	`view_id`, `user_id`, `timestamp`, `nviews`
`hist_views`	Historical view counts	`view_id`, `timestamp`, `nviews`
`_sessions`	User sessions	`user_id`, `session_id`, `created_at`, `updated_at`
`background_jobs`	Extract/refresh jobs	`target_id`, `target_type`, `created_at`, `completed_at`, `status`, `notes`
`data_connections`	Connection info	`datasource_id`, `server`, `port`, `dbclass`, `dbname`

Key query patterns:

-- Most viewed workbooks
SELECT
    w.name AS workbook_name,
    COUNT(v.id) AS view_count,
    MAX(vs.timestamp) AS last_viewed_at
FROM _workbooks w
JOIN _views v ON w.id = v.workbook_id
LEFT JOIN views_stats vs ON v.id = vs.view_id
WHERE vs.timestamp >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY w.name
ORDER BY view_count DESC;

-- Extract refresh performance
SELECT
    d.name AS datasource_name,
    COUNT(bj.target_id) AS refresh_count,
    AVG(EXTRACT(EPOCH FROM (bj.completed_at - bj.created_at))) AS avg_refresh_seconds,
    SUM(CASE WHEN bj.status = 0 THEN 1 ELSE 0 END) AS failure_count
FROM background_jobs bj
JOIN _datasources d ON bj.target_id = d.id
WHERE bj.created_at >= CURRENT_DATE - INTERVAL '90 days'
  AND bj.target_type = 'Datasource'
GROUP BY d.name
ORDER BY refresh_count DESC;

-- Snowflake-connected datasources
SELECT
    d.name AS datasource_name,
    w.name AS workbook_name,
    dc.server,
    dc.dbname
FROM _datasources d
JOIN data_connections dc ON d.id = dc.datasource_id
LEFT JOIN _workbooks w ON d.id = w.id -- adjust join logic
WHERE dc.dbclass = 'snowflake';

4.2 Tableau REST API¶

Tableau Cloud or Server without repository access:

Base URL:

Server: https://{server}/api/{version}/
Cloud: https://{pod}.online.tableau.com/api/{version}/

Endpoints to hit:

Endpoint	Purpose
`GET /api/{version}/sites/{siteId}/workbooks`	List workbooks
`GET /api/{version}/sites/{siteId}/views`	List views
`GET /api/{version}/sites/{siteId}/datasources`	List datasources
`GET /api/{version}/sites/{siteId}/workbooks/{id}`	Workbook details
`GET /api/{version}/sites/{siteId}/views/{id}/usageStatistics`	View usage counts
`GET /api/{version}/sites/{siteId}/datasources/{id}/connections`	Datasource connections
`GET /api/{version}/sites/{siteId}/tasks/extractRefreshes`	Extract refresh schedules
`GET /api/{version}/sites/{siteId}/tasks/extractRefreshes/{taskId}`	Extract refresh details

Authentication:

Personal Access Token (PAT): preferred for API automation.
Username/password: fallback if PAT is blocked.

Key fields to extract:

Workbook/View/Datasource: name, id, createdAt, updatedAt, owner, project.
Usage: totalViewCount, totalViewCount (snapshot — not history).
Connections: serverAddress, serverPort, userName, dbclass.

4.3 Tableau Metadata API (GraphQL)¶

More powerful than REST for lineage and field-level detail:

query workbooks {
  workbooks {
    id
    name
    projectName
    owner {
      username
      email
    }
    sheets {
      id
      name
    }
    upstreamDatabases {
      id
      name
      connectionType
    }
    embeddedExtracts {
      id
      name
    }
  }
}

Useful for:

Field-level lineage.
Column references.
Snowflake database connections.
Embedded vs live connections.
Extract dependencies.

Phase 5: Scoring and Classification¶

5.1 Scoring rubric¶

Use a 0–3 scale for each dimension. Do not overthink. The goal is ranking, not perfect precision.

Usage Value (U)¶

Score	Criteria
0	No views in 90 days; no owner confirmation; stale metadata.
1	< 5 unique users; viewed < biweekly; no subscriptions.
2	5–20 unique users; viewed weekly; some subscriptions/favourites.
3	> 20 unique users OR executive/leadership use OR embedded in recurring meetings OR heavily subscribed.

Cost / Performance Pain (C)¶

Score	Criteria
0	< 1 compute credit per week; scan < 1 GB; refresh < 1 min.
1	1–5 credits/week; scan 1–50 GB; refresh 1–5 min.
2	5–20 credits/week; scan 50–500 GB; refresh 5–15 min; occasional timeouts.
3	> 20 credits/week OR scan > 500 GB OR refresh > 15 min OR frequent failures OR causes warehouse scaling.

Semantic Complexity (S)¶

Score	Criteria
0	Basic table/chart with filters; no calculated fields.
1	Some calculated columns/fields; basic aggregations.
2	Multiple measures; some DAX/LOD calcs; relationships; role-level security.
3	Complex DAX/Tableau LOD; semi-additive measures; many-to-many relationships; intricate row-level security.

Data Product Suitability (D)¶

Score	Criteria
0	Purely visual/exploratory; no parameter input; no action needed.
1	Some parameters or basic drilling; occasional export for further work.
2	Regular parameter use; users export and manipulate; what-if questions; some writeback need.
3	Operational workflow in disguise; approval chains; scenario modelling; data quality review; writeback desirable; users asking for UI actions.

Migration / Change Risk (R)¶

Score	Criteria
0	Internal team tool; owner confirmed; low visibility; simple logic.
1	Team tool with some exec interest; owner exists but busy.
2	Executive-facing; metrics used in financial reporting; owner unclear; heavy reliance on visual layout.
3	Board/audit/regulatory; fragile undocumented logic; no owner; multiple stakeholder dependencies.

5.2 Classification rules¶

After scoring, classify each asset using this flow. Do not be rigid — override with expert judgment.

IF R == 3 AND (U >= 2 OR C >= 2):
    → FLAG: High governance risk — do not touch without formal review
    ELSE IF R == 3:
    → CLASSIFY: Keep in BI (governance)

ELSE IF S >= 2 AND D <= 1:
    → Consider: Optimise in place OR Consolidate
    (semantic richness lives in BI; modernise the backend)

ELSE IF U == 0 AND C <= 1:
    → CLASSIFY: Retire / Archive

ELSE IF U <= 1 AND C >= 2:
    → CLASSIFY: Retire / Archive (unused but expensive)

ELSE IF D >= 2 AND S >= 2:
    → CLASSIFY: Snowflake-native data product candidate
    (high complexity + workflow potential = prime candidate)

ELSE IF D >= 2 AND S <= 1:
    → CLASSIFY: Snowflake-native data product candidate
    (simple logic but workflow fit = quick win)

ELSE IF S >= 2 AND D <= 1 AND U >= 2:
    → CLASSIFY: Cortex / Semantic candidate
    (governed metrics, repeated questions)

ELSE IF C >= 2 AND U >= 2:
    → CLASSIFY: Optimise in place
    (keep BI surface, fix backend/cost)

ELSE IF U >= 2 AND C <= 1 AND S <= 1 AND D <= 1:
    → CLASSIFY: Keep in BI
    (low cost, reasonable usage, simple BI fit)

ELSE IF duplicate metric/dataset confirmed:
    → CLASSIFY: Consolidate

ELSE:
    → FLAG: Human review needed

5.3 Composite priority score¶

For internal sorting, use a weighted composite:

priority_score =
    (U * 2.0) +
    (C * 1.5) +
    (D * 1.5) -
    (R * 1.0) -
    (S * 0.5)

Higher score = bigger opportunity to intervene.

Rationale:

U matters most: intervention on unused assets is waste.
C and D signal value potential.
R penalises risky assets.
S slightly penalises because complex BI semantic models are expensive to migrate.

Sort descending. The top 20–50 assets are your highest-leverage recommendations.

Phase 6: Executive Readout Template¶

6.1 Executive Summary (1 page)¶

Title: BI Estate Modernisation Assessment — [Client Name]

One-paragraph summary:

We analysed [X] BI assets across [Power BI / Tableau] and [Y] days of Snowflake query telemetry. We identified [N] high-priority opportunities spanning cost reduction, consolidation, and Snowflake-native data product creation. We recommend a 4-week pilot targeting 3 high-signal assets.

Key numbers:

Metric	Value
Total BI assets in scope	[X]
Total estimated weekly Snowflake BI cost	[$ or credits]
Assets with zero usage in 90 days	[N]
Assets flagged for retirement	[N]
Assets flagged for optimisation	[N]
Assets flagged for data product migration	[N]
Assets flagged for Cortex/semantic layer	[N]
Top 3 cost drivers	[Names]
Top 3 usage drivers	[Names]
Recommended pilot count	3

6.2 Cost/Usage Leaderboard (2–3 pages)¶

Include a table or chart of:

Top 20 assets by estimated cost (Snowflake credits).
Top 20 assets by usage events.
Bottom 20 assets by usage (retirement candidates).
Scatter: Usage Value vs Cost Pain (quadrants).

6.3 Recommendation Pages (1 page per bucket)¶

For each classification bucket, include:

Keep in BI — count + rationale + examples.
Optimise in place — count + examples + actions.
Retire / Archive — count + risk + suggested timeline.
Consolidate — duplicate clusters identified.
Snowflake-native data product candidates — count + why + suggested surface (Streamlit, Native App, embedded app).
Cortex / Semantic candidates — count + suggested semantic layer approach.

6.4 Recommended 4-Week Pilot¶

Pick 3 assets:

Optimise in place — high cost, high usage, simple fix.
Retire/Consolidate — unused or duplicated, low risk, quick win.
Snowflake-native data product — strong workflow-fit, moderate risk, moderate complexity.

For each:

Asset name.
Current pain.
Proposed action.
Expected outcome.
Estimated effort.
Owner.
Risk.

6.5 Business Case¶

Current annual estimated BI-driven Snowflake cost.
Estimated savings from retirement + consolidation.
Estimated cost reduction from optimisation.
Opportunity value from faster/more reliable analytics.
Estimated implementation cost (if proceeding).

6.6 Risks and Next Steps¶

Governance/audit risk.
Ownership gaps.
Technical dependency risk.
Recommended next actions with owners and dates.

Phase 7: Pilot Selection Framework¶

7.1 Selection criteria¶

A good pilot asset should score well on:

Speed to proof — Can we show a result in 2–4 weeks?
Signal clarity — Is the before/after unambiguous?
Stakeholder interest — Does someone important care?
Low blast radius — If we break something, is it recoverable?
Low dependency blockers — Can we access the data/logic without months of governance?

Avoid for pilot:

Anything with R == 3.
Anything used in board/audit reporting.
Anything with > 10 downstream dependencies.
Anything where the owner is unavailable.

7.2 Pilot mix¶

Every diagnostic should propose at least one pilot from three categories:

Category	Why it matters	Example
Optimise in place	Proves backend modernisation is faster and cheaper than migration	Rewrite expensive query → dynamic table; show cost reduction
Retire / Consolidate	Proves governance reduces sprawl	Remove 5 unused reports; merge 3 duplicate dashboards; show cleaner estate
Data product candidate	Proves Snowflake-native UX for workflows	Build Streamlit workflow replacing a manual export/approval loop

7.3 Pilot charter template¶

## Pilot: [Asset Name]

### Category
[Optimise / Retire / Data Product]

### Current state
- Tool: [Power BI / Tableau]
- Cost per week: [X credits / $]
- Users: [N]
- Pain: [description]

### Desired state
[What changes in 4 weeks]

### Success criteria
- [ ] Measurable cost or performance improvement.
- [ ] Stakeholder confirms the new state is acceptable.
- [ ] No regression in critical downstream workflows.

### Scope
- In: [list]
- Out: [list]

### Effort estimate
- Data/analytics engineering: [X] days
- BI/platform support: [Y] days
- Business stakeholder time: [Z] hours

### Owner
- Data/Analytics: [Name]
- BI/Platform: [Name]
- Business: [Name]

### Risk and rollback
- Risk: [describe]
- Rollback plan: [describe]

### Timeline
- Week 1: [activity]
- Week 2: [activity]
- Week 3: [activity]
- Week 4: [review and demo]

Appendix A: Data Model Quick Reference¶

This is the lightweight normalised model you should build from extracted data.

`bi_assets`¶

Column	Type	Notes
`asset_id`	string	UUID or deterministic platform ID
`platform`	enum	`powerbi`, `tableau`, `qlik`, `looker`, `ssrs`, `excel`, `other`
`asset_type`	enum	`report`, `dashboard`, `workbook`, `view`, `dataset`, `datasource`, `semantic_model`
`asset_name`	string	Human-readable name
`workspace_or_project`	string	Workspace / project / folder
`owner`	string	Owner email or name
`created_at`	timestamp
`updated_at`	timestamp
`last_viewed_at`	timestamp
`url`	string	Deep link if available
`is_certified`	boolean	Promoted/certified in platform
`snowflake_connected`	boolean	Confirmed Snowflake datasource
`snowflake_database`	string	If known
`snowflake_schema`	string	If known

`bi_usage_events`¶

Column	Type	Notes
`event_id`	string	UUID
`asset_id`	FK
`user_id`	string	Email or ID
`event_type`	enum	`view`, `export`, `refresh`, `edit`, `share`, `subscribe`
`event_timestamp`	timestamp
`session_id`	string	Optional

`bi_refresh_events`¶

Column	Type	Notes
`refresh_id`	string
`asset_id`	FK
`start_time`	timestamp
`end_time`	timestamp
`status`	enum	`success`, `failure`, `cancelled`
`duration_seconds`	float
`failure_reason`	string

`snowflake_queries`¶

Column	Type	Notes
`query_id`	string	Snowflake query ID
`query_text`	text	Truncated at 2000 chars
`user_name`	string
`role_name`	string
`warehouse_name`	string
`start_time`	timestamp
`end_time`	timestamp
`execution_time_ms`	int
`bytes_scanned`	int
`credits_estimated`	float
`database_name`	string
`schema_name`	string
`query_tag`	string

`asset_query_mapping`¶

Column	Type	Notes
`asset_id`	FK
`query_id`	FK	Snowflake query
`mapping_method`	enum	`query_tag`, `service_account`, `warehouse`, `sql_fingerprint`, `datasource_metadata`, `manual`
`confidence_score`	float	0.0–1.0

`asset_scores`¶

Column	Type	Notes
`asset_id`	FK
`usage_value_score`	int	0–3
`cost_pain_score`	int	0–3
`semantic_complexity_score`	int	0–3
`data_product_suitability_score`	int	0–3
`migration_risk_score`	int	0–3
`composite_priority_score`	float	Calculated
`recommendation_bucket`	enum	`keep`, `optimise`, `retire`, `consolidate`, `data_product`, `cortex`, `review`
`rationale`	text
`reviewed_by`	string	Human reviewer
`final_override`	enum	Optional expert override

Appendix B: Common Pitfalls and Gotchas¶

Pitfall 1: Query text is truncated or redacted¶

Some organisations redact or truncate QUERY_TEXT for PII. If so:

Rely more on query tags, warehouse names, and service account patterns.
Use OBJECTS_MODIFIED or QUERY_HISTORY.QUERY_TEXT with QUERY_TAG.
Consider requesting an exemption or PII-safe export with query hashes instead of text.

Pitfall 2: BI service accounts map many assets to one user¶

If Power BI/Tableau uses one Snowflake service account for all queries, query-to-asset mapping is hard.

Mitigation:

Encourage the client to use per-workbook query tags.
Map by warehouse (if different BI assets use different warehouses).
Map by SQL fingerprint (query text patterns).
Map by scheduled refresh times.
Accept lower confidence and flag for manual review.

Pitfall 3: Power BI/Tableau activity events are 30-day rolling¶

Power BI Admin Activity Events and some Tableau logs have rolling retention.

Mitigation:

Collect continuously during the engagement.
Request a one-off historical export if available.
Use what you have; 30 days is usually enough to tell used vs unused.

Pitfall 4: Semantic complexity is hidden in DAX/Tableau LOD¶

You may not have access to all DAX or LOD expressions.

Mitigation:

Use measure count, relationship count, and dataset size as proxies.
Flag datasets with many measures or calculated columns for manual review.
If Scanner API or XMLA is available, use it to extract semantic depth.

Pitfall 5: Executives overstate importance of unused dashboards¶

A dashboard used in one annual board meeting may score zero on telemetry but be politically critical.

Mitigation:

Always combine telemetry with stakeholder interviews.
Use the R (risk) dimension as a safety valve.
Never auto-retire without owner and stakeholder confirmation.
Flag low-usage, high-risk assets separately.

Pitfall 6: Snowflake costs are not perfectly attributable¶

Warehouse costs include all queries, not just BI.

Mitigation:

Use query tags, user names, and warehouse patterns to estimate BI-attributable cost.
Be explicit about assumptions in the executive readout.
Focus on directional opportunity, not exact accounting.

Pitfall 7: The client wants Streamlit for everything¶

If the client is excited about Streamlit, they may want you to migrate everything.

Mitigation:

Be firm: Streamlit is a data product surface, not a BI replacement.
Use the scoring rubric to deflect unsuitable candidates.
Show them a candidate that is genuinely a workflow, not a dashboard.

Appendix C: Recommended External Resources¶

Resource	Purpose
`SNOWFLAKE.ACCOUNT_USAGE` docs	Reference for all Snowflake usage views
Power BI REST API docs	Admin, datasets, reports, refreshes, activity events
Power BI Scanner API docs	Dataset-level metadata, measures, lineage
Tableau Server Repository docs	PostgreSQL schema reference
Tableau REST API docs	Workbook, view, datasource, task API
Tableau Metadata API docs	GraphQL for lineage and field-level detail
dbt docs	For modelling recommendations
Snowflake Cortex Analyst docs	For Cortex/semantic layer recommendations
Snowflake Streamlit docs	For data product prototyping
Snowflake Dynamic Tables docs	For backend optimisation

Appendix D: Week-by-Week Example Timeline¶

Week 1: Kickoff + Access + Discovery¶

Day 1: Kickoff, access checklist, pre-engagement questionnaire review.
Day 2–3: Resolve access blockers, get credentials.
Day 4–5: Snowflake extraction (initial queries), stakeholder interviews.

Week 2: Extraction + Inventory¶

Day 1–2: Power BI / Tableau metadata extraction.
Day 3–4: Build telemetry model; heuristically map BI assets to Snowflake queries.
Day 5: Run scoring model; draft initial rankings.

Snowflake BI Estate Diagnostic — Operating Playbook¶

How to use this playbook¶

Phase 0: Pre-Engagement Discovery Questionnaire¶

Part A: Context (5–10 minutes)¶

Part B: Scope and Access (5–10 minutes)¶

Part C: Known Pain Signals (Optional — helps prioritise)¶

Phase 1: Access and Scope Checklist¶

Snowflake access¶

Power BI access¶

Tableau access¶

Human/contextual inputs¶

Output¶

Phase 2: Snowflake Extraction SQL¶

2.1 BI-driven query volume and cost¶

2.2 Top BI queries by cost¶

2.3 Object access heatmap¶

2.4 Warehouse metering by day and user¶

2.5 BI service account query patterns¶

2.6 Query text fingerprinting for BI patterns¶

2.7 User and role mappings¶

2.8 Dynamic tables and materialized views¶

2.9 Extract outputs¶

Phase 3: Power BI Metadata Extraction Checklist¶

3.1 Activity events¶

3.2 Reports, datasets, and workspaces¶

3.3 Refresh history¶

3.4 Dataset data sources (gateway mapping)¶

3.5 Scanner API / Fabric metadata (if available)¶

3.6 XMLA endpoint (semantic model deep-dive)¶

3.7 Manual inventory fallback¶

Phase 4: Tableau Metadata Extraction Checklist¶

4.1 Tableau Server Repository (PostgreSQL)¶

4.2 Tableau REST API¶

4.3 Tableau Metadata API (GraphQL)¶

Phase 5: Scoring and Classification¶

5.1 Scoring rubric¶

Usage Value (U)¶

Cost / Performance Pain (C)¶

Semantic Complexity (S)¶

Data Product Suitability (D)¶

Migration / Change Risk (R)¶

5.2 Classification rules¶

5.3 Composite priority score¶

Phase 6: Executive Readout Template¶

6.1 Executive Summary (1 page)¶

6.2 Cost/Usage Leaderboard (2–3 pages)¶

6.3 Recommendation Pages (1 page per bucket)¶

6.4 Recommended 4-Week Pilot¶

6.5 Business Case¶

6.6 Risks and Next Steps¶

Phase 7: Pilot Selection Framework¶

7.1 Selection criteria¶

7.2 Pilot mix¶

7.3 Pilot charter template¶

Appendix A: Data Model Quick Reference¶

bi_assets¶

bi_usage_events¶

bi_refresh_events¶

snowflake_queries¶

asset_query_mapping¶

asset_scores¶

Appendix B: Common Pitfalls and Gotchas¶

Pitfall 1: Query text is truncated or redacted¶

Pitfall 2: BI service accounts map many assets to one user¶

Pitfall 3: Power BI/Tableau activity events are 30-day rolling¶

Pitfall 4: Semantic complexity is hidden in DAX/Tableau LOD¶

Pitfall 5: Executives overstate importance of unused dashboards¶

Pitfall 6: Snowflake costs are not perfectly attributable¶

Pitfall 7: The client wants Streamlit for everything¶

Appendix C: Recommended External Resources¶

Appendix D: Week-by-Week Example Timeline¶

Week 1: Kickoff + Access + Discovery¶

Week 2: Extraction + Inventory¶

Week 3: Scoring + Review¶

Week 4: Readout + Pilot Planning¶

Checklist: Before You Finish¶

`bi_assets`¶

`bi_usage_events`¶

`bi_refresh_events`¶

`snowflake_queries`¶

`asset_query_mapping`¶

`asset_scores`¶