PUDO Capacity Prediction
The business problem
PUDO stands for Pick-Up / Drop-Off: a network of locations where customers collect or return parcels. Each PUDO location has a finite daily capacity determined by its physical size, staffing, and operating hours.
The central operational question is:
> How much capacity will each PUDO location need tomorrow?
Underestimating capacity leads to overflow, missed deliveries, and dissatisfied customers. Overestimating ties up staff and space that go unused, which increases cost.
Why this is an ML problem
- Capacity demand depends on many factors: location type, nearby competing PUDOs, geographic density, day of week, seasonal patterns, and historical trends.
- Simple heuristics (e.g., last week's average) fail when conditions change.
- The relationship between features and demand is non-linear, a setting where gradient-boosted tree models tend to perform well.
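To make the second point concrete, here is a minimal sketch of the "last week's average" heuristic for a single location. The function name `last_week_average` and the date-to-count dictionary it takes are hypothetical illustrations, not code from the repository. The usage example assumes a nearby PUDO closes and its demand shifts to this location, a change the rolling average only catches up with a week later.

```python
from datetime import date, timedelta
from statistics import mean


def last_week_average(daily_counts: dict[date, int], target_day: date) -> float:
    """Naive baseline: mean parcel count over the 7 days before target_day.

    daily_counts maps a calendar date to the parcel count observed at one
    PUDO location (a hypothetical input shape, not the repository's schema).
    """
    window = [target_day - timedelta(days=d) for d in range(1, 8)]
    missing = [d for d in window if d not in daily_counts]
    if missing:
        raise ValueError(f"missing history for {missing}")
    return float(mean(daily_counts[d] for d in window))


# Hypothetical history: a steady 100 parcels/day, then a nearby PUDO closes on
# 2024-03-08 and this location absorbs its demand (160 parcels/day from then on).
start = date(2024, 3, 1)
history = {start + timedelta(days=i): (100 if i < 7 else 160) for i in range(14)}

forecast = last_week_average(history, date(2024, 3, 9))
print(round(forecast, 1))  # 108.6, although actual demand is now 160
```

Because the 7-day window also averages across weekdays, the same heuristic flattens day-of-week effects, which is another reason a feature-based model is worth the extra machinery.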
The data
The repository includes a mock data generator (`mock_data/`) that simulates a realistic PUDO network:
| Data domain | Examples |
|---|---|
| PUDO locations | Geographic coordinates, location type, capacity, operating hours |
| Parcel volumes | Daily parcel counts, delivery attempts, occupancy rates |
| Temporal patterns | Day-of-week effects, seasonal trends, growth trajectories |
| Geospatial context | Nearby competing PUDOs, regional demand density |
The mock data is seeded into a shared Snowflake schema (`SHARED_DATA`) and consumed by the feature store and downstream ML pipelines.
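The generator in `mock_data/` is not reproduced here. As an illustration only, the sketch below shows one way to combine the temporal patterns listed in the table (day-of-week effects, a seasonal cycle, and a growth trajectory) into daily parcel counts for one location. The function `simulate_daily_parcels`, the `DOW_FACTOR` values, and every default parameter are assumptions chosen for readability, and loading the rows into `SHARED_DATA` is out of scope.

```python
import math
import random
from datetime import date, timedelta

# Hypothetical day-of-week multipliers (Mon=0 .. Sun=6); illustrative values,
# not taken from the repository's mock_data/ configuration.
DOW_FACTOR = [1.0, 1.05, 1.05, 1.1, 1.25, 0.9, 0.6]


def simulate_daily_parcels(
    base: float,
    start: date,
    days: int,
    growth_per_year: float = 0.10,
    seasonal_amplitude: float = 0.2,
    seed: int = 42,
) -> list[tuple[date, int]]:
    """Generate daily parcel counts for one hypothetical PUDO location.

    Multiplies a base volume by a day-of-week factor, a yearly sine-shaped
    seasonal cycle, a compound growth trend, and Gaussian noise around 1.0.
    """
    rng = random.Random(seed)  # seeded so repeated runs give identical data
    rows = []
    for i in range(days):
        day = start + timedelta(days=i)
        day_of_year = day.timetuple().tm_yday
        season = 1 + seasonal_amplitude * math.sin(2 * math.pi * day_of_year / 365.25)
        growth = (1 + growth_per_year) ** (i / 365.25)
        noise = rng.gauss(1.0, 0.05)
        expected = base * DOW_FACTOR[day.weekday()] * season * growth * noise
        rows.append((day, max(0, round(expected))))  # counts are non-negative ints
    return rows


rows = simulate_daily_parcels(base=120, start=date(2024, 1, 1), days=28)
for day, count in rows[:7]:
    print(day, day.strftime("%a"), count)
```

Seeding the random generator matters for a reference project: every environment that runs the generator gets the same data, so feature, training, and evaluation results stay comparable.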
Why this is a good MLOps reference
The PUDO problem exercises the full MLOps lifecycle:
- Feature engineering: geospatial features, temporal aggregations, point-in-time correct feature views.
- Dataset generation: spine construction, temporal train/val/test splits, ASOF joins for point-in-time correctness.
- Model training: distributed XGBoost training via Snowpark Container Services, model evaluation, and registration in the Snowflake Model Registry.
- Batch inference: automated feature generation, model loading, and prediction writing.
- Evaluation and monitoring: prediction vs. actual comparison, alerting on threshold breaches, drift detection.
- Lifecycle management: feature versioning, model versioning, environment promotion, and configuration overlays.
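Dataset generation is where leakage bugs usually creep in. The pipeline described above relies on ASOF joins for this; the sketch below reproduces the same semantics in plain, in-memory Python so the idea is visible without Snowflake. Each spine row (a location and an "as of" date) receives the latest feature value dated on or before that date, never a later one, and rows are then split by date rather than at random. The names `asof_lookup` and `temporal_split` and the tuple layouts are hypothetical, not the repository's API.

```python
from bisect import bisect_right
from datetime import date


def asof_lookup(feature_rows, spine):
    """Point-in-time join.

    feature_rows: (location_id, feature_date, value) tuples.
    spine: (location_id, as_of) tuples, one per training example.
    Returns (location_id, as_of, value) where value is the latest feature
    dated <= as_of, or None if no such value exists yet.
    """
    by_loc = {}
    for loc, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        dates, values = by_loc.setdefault(loc, ([], []))
        dates.append(ts)
        values.append(value)
    joined = []
    for loc, as_of in spine:
        dates, values = by_loc.get(loc, ([], []))
        idx = bisect_right(dates, as_of) - 1  # last feature date <= as_of
        joined.append((loc, as_of, values[idx] if idx >= 0 else None))
    return joined


def temporal_split(rows, train_end, val_end):
    """Split on the as_of date: train < train_end <= val < val_end <= test."""
    train = [r for r in rows if r[1] < train_end]
    val = [r for r in rows if train_end <= r[1] < val_end]
    test = [r for r in rows if r[1] >= val_end]
    return train, val, test


features = [
    ("P1", date(2024, 1, 1), 10.0),
    ("P1", date(2024, 1, 8), 12.0),
    ("P2", date(2024, 1, 5), 7.0),
]
spine = [
    ("P1", date(2024, 1, 7)),
    ("P1", date(2024, 1, 8)),
    ("P2", date(2024, 1, 4)),  # P2's only feature is dated later, so it gets None
    ("P1", date(2024, 1, 20)),
]
joined = asof_lookup(features, spine)
train_set, val_set, test_set = temporal_split(
    joined, train_end=date(2024, 1, 8), val_end=date(2024, 1, 15)
)
print(joined)
```

A plain equality join on the date would drop the 2024-01-07 and 2024-01-20 rows, and a naive "latest value" lookup would leak the 2024-01-08 feature into the 2024-01-07 example; the as-of rule avoids both.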
Next step
Now that you understand the business problem, read the Repo Mental Model to learn how the codebase is organised, or start with Tutorial 1: Prerequisites & Bootstrap if you are ready to set up your environment.