Start Here

What this repository is

This is an open-source reference monorepo that demonstrates how to build production-grade MLOps on Snowflake. It uses a hub-spoke architecture where a shared hub manages platform infrastructure and individual project spokes own their own feature stores, training pipelines, and inference pipelines.
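As a rough illustration of the hub-spoke split (directory names here are assumptions for illustration, not the repository's actual tree):

```
repo/
├── hub/                # shared platform infrastructure: databases, warehouses, roles, compute pools
└── spokes/
    └── pudo/           # reference project spoke
        ├── feature_store/
        ├── training/
        └── inference/
```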

The reference project, PUDO (Pick-Up / Drop-Off), predicts parcel capacity utilisation across a network of package drop-off and collection points.

What this repository is not

  • It is not a Snowflake SDK tutorial.
  • It is not a general-purpose ML framework.
  • It does not require a specific CI/CD platform to run locally.

Prerequisites

Before starting the tutorials, you will need:

  • A Snowflake account with ACCOUNTADMIN access (for initial bootstrap): the hub component creates databases, warehouses, roles, and compute pools.
  • Python 3.10: all components pin Python 3.10 for Snowflake runtime compatibility.
  • uv: the package manager used across all components.
  • Git: for cloning the repository and for the environment-selection mechanism.
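Before starting Tutorial 1, it can help to confirm these tools are on your PATH. A minimal sketch (the `python3.10` executable name is an assumption; on your system it may be `python3` or `python`):

```shell
# Check that each required tool is installed (a sketch; adjust tool names to your setup).
missing=""
for tool in python3.10 uv git; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -n "$missing" ]; then
  echo "Missing tools:$missing"
else
  echo "All prerequisites found."
fi
```

`command -v` is POSIX-portable, so the same check works in bash, zsh, and CI shells.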

Two ways to use this documentation

Tutorials (for newcomers)

Follow the tutorials in order:

  1. Prerequisites & Bootstrap
  2. Repo Mental Model
  3. Seed Shared Data
  4. Deploy Schema & Feature Store
  5. Deploy & Run Training
  6. Deploy & Run Inference
  7. Simulate, Evaluate & Alert
  8. Change Promotion & ML Lifecycle

Concepts and reference (for experienced users)

If you already know Snowflake ML basics and want to understand the architecture or look up a specific command:

  • Concepts: architecture, lifecycle, feature stores, orchestration, and environment promotion.
  • Guides: practical how-to pages.
  • Reference: command reference, component map, and glossary.

Next step

Read the PUDO Capacity Prediction use case to understand the business problem, then start with Tutorial 1: Prerequisites & Bootstrap.