# Start Here
## What this repository is
This is an open-source reference monorepo that demonstrates how to build production-grade MLOps on Snowflake. It uses a hub-spoke architecture: a shared hub manages platform infrastructure, while each project spoke owns its own feature stores, training pipelines, and inference pipelines.
The reference project, PUDO (Pick-Up / Drop-Off), predicts parcel capacity utilisation across a network of package drop-off and collection points.
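The hub-spoke split described above might be laid out something like the following. This tree is purely illustrative: the directory names are hypothetical, and the actual repository structure may differ.

```
hub/          # shared platform infrastructure (databases, warehouses, roles, compute pools)
spokes/
  pudo/       # the PUDO reference project
    feature_store/
    training/
    inference/
```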
## What this repository is not
- It is not a Snowflake SDK tutorial.
- It is not a general-purpose ML framework.
- It does not require a specific CI/CD platform to run locally.
## Prerequisites
Before starting the tutorials, you will need:
| Requirement | Why |
|---|---|
| A Snowflake account with ACCOUNTADMIN access (for initial bootstrap) | The hub component creates databases, warehouses, roles, and compute pools. |
| Python 3.10 | All components pin Python 3.10 for Snowflake runtime compatibility. |
| uv | The package manager used across all components. |
| Git | For cloning the repository and for the environment-selection mechanism. |
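You can sanity-check the local tooling from this table before starting. The snippet below is a minimal sketch, not part of the repository: the `check_prerequisites` helper is hypothetical, and it only checks the items listed above (the Snowflake account must still be verified manually).

```python
import shutil
import sys

# Pinned interpreter version from the prerequisites table above.
REQUIRED_PYTHON = (3, 10)


def check_prerequisites():
    """Return a dict mapping each local requirement to whether it is satisfied.

    Checks only what the table lists: the Python 3.10 pin, and that the
    `uv` and `git` executables are on PATH.
    """
    return {
        "python 3.10": sys.version_info[:2] == REQUIRED_PYTHON,
        "uv on PATH": shutil.which("uv") is not None,
        "git on PATH": shutil.which("git") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```

Run it with the interpreter you intend to use for the tutorials; any `MISSING` line points at the table row to revisit.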
## Two ways to use this documentation
### Guided path (recommended for newcomers)
Follow the tutorials in order:
1. Prerequisites & Bootstrap
2. Repo Mental Model
3. Seed Shared Data
4. Deploy Schema & Feature Store
5. Deploy & Run Training
6. Deploy & Run Inference
7. Simulate, Evaluate & Alert
8. Change Promotion & ML Lifecycle
### Concepts and reference (for experienced users)
If you already know Snowflake ML basics and want to understand the architecture or look up a specific command:
- Concepts: architecture, lifecycle, feature stores, orchestration, and environment promotion.
- Guides: practical how-to pages.
- Reference: command reference, component map, and glossary.
## Next step
Read the PUDO Capacity Prediction use case to understand the business problem, then start with Tutorial 1: Prerequisites & Bootstrap.