Tutorial 8: Change Promotion & ML Lifecycle

This tutorial connects the operational workflows you have been running with the broader ML lifecycle: how code changes flow through Git, how they map to Snowflake environments, and how the full lifecycle is managed.

What you will learn

  • How the repository uses Git branches to select Snowflake environments.
  • How a code change promotes from development through staging to production.
  • What happens at each lifecycle stage: feature engineering, training, inference, evaluation.

The Git-to-Snowflake mapping

The repository currently uses the Git branch name to determine which Snowflake environment to target:

| Git branch | Snowflake environment | Config overlay |
|---|---|---|
| dev or feature/* | DEV | dev.override.yaml |
| staging or release/* | STAGING | staging.override.yaml |
| main | PROD | prod.override.yaml |

When you run make -C projects/pudo deploy-schema, the script reads the current Git branch, resolves the environment, and merges the appropriate configuration overlay.
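
The resolution step can be sketched as follows. This is an illustrative reconstruction of the mapping described above, not the repository's actual script; the function and constant names are made up:

```python
import fnmatch

# Branch patterns in priority order; each maps to a Snowflake environment
# and its configuration overlay. Names here mirror the table above.
BRANCH_MAP = [
    ("main", "PROD", "prod.override.yaml"),
    ("staging", "STAGING", "staging.override.yaml"),
    ("release/*", "STAGING", "staging.override.yaml"),
    ("dev", "DEV", "dev.override.yaml"),
    ("feature/*", "DEV", "dev.override.yaml"),
]

def resolve_environment(branch: str) -> tuple[str, str]:
    """Map a Git branch name to (Snowflake environment, config overlay)."""
    for pattern, env, overlay in BRANCH_MAP:
        if fnmatch.fnmatch(branch, pattern):
            return env, overlay
    raise ValueError(f"No environment mapping for branch {branch!r}")
```

For example, resolve_environment("feature/new-fv") yields the DEV environment with dev.override.yaml, matching the table.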

The ML lifecycle in this repository

The Snowflake ML lifecycle has several stages, each backed by repository components:

```mermaid
graph TD
    subgraph S1["1. Feature Engineering"]
        S1A["Define entities and feature views"]
        S1B["Deploy to Feature Store"]
    end

    subgraph S2["2. Dataset Generation"]
        S2A["Build spine with temporal splits"]
        S2B["ASOF JOIN for point-in-time correctness"]
    end

    subgraph S3["3. Model Training"]
        S3A["Distributed XGBoost via Container Services"]
        S3B["Evaluate on validation set"]
        S3C["Register in Model Registry"]
    end

    subgraph S4["4. Batch Inference"]
        S4A["Load model from registry"]
        S4B["Generate features for target date"]
        S4C["Write predictions"]
    end

    subgraph S5["5. Evaluation & Monitoring"]
        S5A["Compare predictions to actuals"]
        S5B["Compute drift metrics"]
        S5C["Trigger alerts on threshold breaches"]
    end

    subgraph S6["6. Retraining Decision"]
        S6A["Scheduled: daily/weekly"]
        S6B["Triggered by drift detection"]
        S6C["Returns to step 2"]
    end

    S1 --> S2 --> S3 --> S4 --> S5 --> S6
    S6 -->|"Returns to"| S2
```

How code promotes through environments

A typical promotion flow:

Development (feature branch)

  1. Engineer creates a feature branch.
  2. Makes changes to feature views, training code, or configuration.
  3. Runs make -C projects/pudo deploy-* from the branch.
  4. Changes are deployed to the DEV Snowflake environment.
  5. Training and inference run against dev data.

Staging (release branch)

  1. Feature branch is merged into a release branch.
  2. Engineer runs the deploy targets from the release branch.
  3. Changes are deployed to the STAGING environment.
  4. Training runs against staging data (larger, more realistic).
  5. Model metrics are compared to the production baseline.

Production (main branch)

  1. Release branch is merged into main.
  2. Engineer runs the deploy targets from main.
  3. Changes are deployed to the PROD environment.
  4. Production training DAG runs on schedule.
  5. Inference DAG runs daily.
  6. Monitoring and alerting are active.

This repository does not ship an automated CI/CD pipeline. Promotion today is manual: the same make targets are run from the appropriate branch, and the branch name selects the target environment.
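
Because promotion is manual, a simple guard before each deploy can catch the most common mistake: running a deploy target from a branch that maps to a different environment than intended. The sketch below is a hypothetical helper, not something the repository ships:

```python
import fnmatch
import subprocess

# Same branch-to-environment mapping as the table earlier in this tutorial.
ENV_BRANCH_PATTERNS = {
    "DEV": ["dev", "feature/*"],
    "STAGING": ["staging", "release/*"],
    "PROD": ["main"],
}

def current_branch() -> str:
    """Return the name of the currently checked-out Git branch."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

def assert_branch_for_env(branch: str, intended_env: str) -> None:
    """Raise if the branch does not map to the intended environment."""
    patterns = ENV_BRANCH_PATTERNS[intended_env]
    if not any(fnmatch.fnmatch(branch, p) for p in patterns):
        raise RuntimeError(
            f"Branch {branch!r} does not target {intended_env}; aborting deploy."
        )
```

For instance, assert_branch_for_env("feature/x", "PROD") raises, while assert_branch_for_env("main", "PROD") passes silently.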

The configuration overlay system

At each stage, different configuration overlays are applied:

```yaml
# config/training/base.yaml (shared defaults)
train_days: 90
n_estimators: 500
learning_rate: 0.05

# config/training/dev.override.yaml (fast iteration)
train_days: 30
n_estimators: 50

# config/training/prod.override.yaml (full training)
train_days: 365
n_estimators: 1000
```

This means the same code runs in all environments, but with parameters appropriate for each stage.
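
The merge itself can be pictured as a shallow dictionary update: overlay keys win, and anything the overlay omits falls through to the base. A minimal sketch (the function name is illustrative; the actual merge logic in the repository may differ):

```python
def apply_overlay(base: dict, overlay: dict) -> dict:
    """Shallow-merge an environment overlay onto shared defaults."""
    merged = dict(base)      # start from base.yaml values
    merged.update(overlay)   # overlay keys take precedence
    return merged

base = {"train_days": 90, "n_estimators": 500, "learning_rate": 0.05}
dev_overlay = {"train_days": 30, "n_estimators": 50}

dev_config = apply_overlay(base, dev_overlay)
# dev shrinks the training window and tree count for fast iteration,
# but inherits learning_rate unchanged from the base config.
```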

What this means for your workflow

When you make a change:

  1. Test locally against DEV using a feature branch.
  2. Validate against STAGING using a release branch.
  3. Deploy to PROD by merging to main.
  4. Monitor production metrics and alerts.
  5. Retrain when scheduled or when drift is detected.

Connecting back to the tutorials

| Tutorial | ML lifecycle stage |
|---|---|
| 1: Bootstrap | Platform setup |
| 2: Mental Model | Architecture understanding |
| 3: Seed Data | Data foundation |
| 4: Feature Store | Feature engineering |
| 5: Training | Training + registration |
| 6: Inference | Batch inference |
| 7: Evaluate & Alert | Evaluation + monitoring |
| This tutorial | Lifecycle + branch-based promotion |

Where to go from here

  • Read the Concepts section for deeper explanations of individual topics.
  • Use the Guides for specific operational tasks.
  • Refer to the Command Reference for Makefile target details.