Predicting Vilaine River Floods with Deep Learning

The problem: anticipating rising water levels

The Vilaine, the main river in Brittany (France), flows through a watershed where winter floods are a recurring risk. In January 2025, a major flood once again highlighted the vulnerability of towns in the upper basin, from Vitré to Châteaubourg.

Traditional hydrological models (conceptual or physics-based) are valuable, but they remain complex to implement and hard to adapt to real-time use. The goal of this project is to build an operational machine learning model that provides reliable predictions of water level (H) and flow rate (Q) across 11 stations simultaneously, at horizons of 1 to 24 hours, with confidence intervals to quantify uncertainty.

11 Hydrometric stations

24h Prediction horizon

25 yrs Training history

0.971 NSE at t+24h (Châteaubourg)

Upper Vilaine basin — Hydrographic network

The basin covers 4 rivers — the Vilaine (main axis), the Valière, the Cantache and the Veuvre/Chevré — with 3 dams whose releases directly influence downstream flood dynamics. The propagation time between the upstream end (Bourgon) and downstream (Cesson-Sévigné) is 11 to 27 hours depending on flow rate, which aligns well with our 24-hour prediction horizon.

The data: combining hydrology and weather

A good model starts with good data. I collected and combined two complementary sources, covering the period 2000 – 2026:

Hydrological data — Hydro EauFrance API

For each of the 11 stations in the basin (from upstream to downstream), I retrieved hourly time series of water level (H) and flow rate (Q) via the public Hydro EauFrance API. The API only returns one year of data per request: the collection script automatically splits the range into annual segments, handles retries, and operates incrementally — re-running the collection only fetches new data.
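The splitting and incremental logic described above can be sketched as follows. This is a minimal illustration, not the project's collection script: the function names are hypothetical, timestamps are assumed to be naive datetimes, and the sketch assumes that a window which never crosses a calendar-year boundary stays within the API's one-year limit.

```python
from datetime import datetime

def yearly_segments(start, end):
    """Split [start, end) into consecutive windows that never cross a calendar-year boundary."""
    segments = []
    cursor = start
    while cursor < end:
        next_year = datetime(cursor.year + 1, 1, 1)
        stop = min(next_year, end)
        segments.append((cursor, stop))
        cursor = stop
    return segments

def plan_requests(last_collected, now, first_year=2000):
    """Incremental mode: resume from the last stored timestamp, or from the start of the archive."""
    start = last_collected if last_collected is not None else datetime(first_year, 1, 1)
    return yearly_segments(start, now)
```

On a first run `plan_requests(None, now)` yields one window per year since 2000; on later runs it yields a single short window starting at the last stored timestamp.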

Weather data — Open-Meteo

For each station, I collected hourly precipitation (mm/h) and soil moisture (layers 0-7 cm and 7-28 cm) from the free Open-Meteo API. Soil moisture is a valuable indicator: already-saturated soil significantly amplifies runoff during a rainfall event.

💡 Incremental collection

All collection scripts are designed to run daily in production. They automatically detect the last collected timestamp and only download new data, making the pipeline robust and API-efficient.

Feature engineering

Raw data goes through a 7-step transformation pipeline to produce a training-ready dataset. Each station is characterized by 7 variables at each time step:

📥 Merge
11 stations × 3 sources → common hourly index

🔧 Interpolation
Gaps ≤ 6h filled linearly

📐 Derivatives
dH/dt, dQ/dt via central difference

🚰 Dams
Release detection (release feature)

⚖️ Normalization
Robust P1/P99
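As an illustration of the interpolation step, here is a minimal sketch that fills short gaps in an hourly series. It assumes missing samples are represented as `None` and that "≤ 6h" means at most 6 consecutive missing hourly values; the function name is hypothetical.

```python
def fill_short_gaps(values, max_gap=6):
    """Linearly interpolate runs of None no longer than max_gap.

    Longer runs, and runs touching either end of the series, are left as None.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        gap = i - start
        # A gap is only filled when it is short and has a valid neighbor on both sides
        if start == 0 or i == n or gap > max_gap:
            continue
        left, right = filled[start - 1], filled[i]
        for k in range(gap):
            filled[start + k] = left + (right - left) * (k + 1) / (gap + 1)
    return filled
```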

The derivative features (dH/dt and dQ/dt) capture the dynamics — is the level rising or falling, and how fast? For the 3 stations located downstream of dams, I added a "release" feature that detects water releases: a drop in water level without associated rainfall is a characteristic signal of a valve opening.
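A minimal sketch of these two features, under stated assumptions: the series is hourly, the endpoints fall back to one-sided differences, and the release thresholds (`drop_mm_per_h`, `dry_mm`, `window_h`) are placeholder values chosen for illustration, not the ones used in the project.

```python
def central_difference(series, dt_hours=1.0):
    """Central differences inside the series, one-sided differences at the two ends."""
    n = len(series)
    if n < 2:
        return [0.0] * n
    deriv = [0.0] * n
    deriv[0] = (series[1] - series[0]) / dt_hours
    deriv[-1] = (series[-1] - series[-2]) / dt_hours
    for t in range(1, n - 1):
        deriv[t] = (series[t + 1] - series[t - 1]) / (2 * dt_hours)
    return deriv

def release_flags(level_mm, rain_mm, drop_mm_per_h=-10.0, dry_mm=0.1, window_h=6):
    """Flag hours where the level drops quickly although the last window_h hours were dry."""
    dh = central_difference(level_mm)
    flags = []
    for t in range(len(level_mm)):
        recent_rain = sum(rain_mm[max(0, t - window_h + 1): t + 1])
        flags.append(1 if dh[t] <= drop_mm_per_h and recent_rain <= dry_mm else 0)
    return flags
```

Note that a central difference at time t uses the value at t+1, which is available in historical data but not for the most recent observation at inference time; the one-sided difference at the end of the series covers that case.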

Normalization uses P1/P99 percentiles rather than min/max, making it robust to extreme values while preserving flood information.
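To make the idea concrete, here is a small sketch of percentile-based scaling. It assumes linear interpolation between ranks and assumes values outside [P1, P99] are not clipped, so a flood peak maps above 1 instead of being flattened; the project's exact formula may differ, and the function names are illustrative.

```python
def percentile(sorted_values, p):
    """Linearly interpolated percentile (p in [0, 100]) of an already sorted list."""
    rank = (len(sorted_values) - 1) * p / 100.0
    low = int(rank)
    high = min(low + 1, len(sorted_values) - 1)
    return sorted_values[low] + (sorted_values[high] - sorted_values[low]) * (rank - low)

def fit_robust_scaler(train_values):
    """Return (P1, P99) computed on the training data only."""
    ordered = sorted(train_values)
    return percentile(ordered, 1), percentile(ordered, 99)

def robust_scale(value, p1, p99):
    """Map P1 to 0 and P99 to 1; extremes land outside [0, 1] rather than being clipped."""
    return (value - p1) / (p99 - p1)
```

With min/max scaling, a single record flood would compress every ordinary value toward 0; anchoring on P1 and P99 keeps the bulk of the data spread over the unit range.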

Model architecture

The architecture I designed, called Station-Attention, combines three key ideas: treat each station as a distinct entity, let stations "talk to each other" via attention, and natively quantify uncertainty.

🌊 Past sequence
72h × 7 variables × 11 stations

🌧️ Future rainfall
24h forecasts per station

↓

Shared LSTM (per-station)
Each station encoded independently → 1 embedding / station

↓

Cross-Station Multi-Head Attention (×3 layers)
8 heads · Stations exchange spatial information

↓

H Decoder
Water level

Q Decoder
Flow rate

↓

1,296 outputs
11 stations × 24 horizons × 3 quantiles (q10, q50, q90)

Why a shared LSTM?

A single LSTM network processes all 11 stations, forcing the model to learn generic hydrological patterns (flood rise, recession, rainfall response) rather than local peculiarities. Each station enters with its 7 variables and outputs an embedding vector that summarizes its recent dynamics.

Multi-station attention: the key component

The cross-station attention block (Multi-Head Attention, 3 layers × 8 heads) is what gives the model its power. Concretely, it lets each station consult the state of the others to refine its own prediction. If an upstream station shows a rapid rise, downstream stations "know" a flood wave is approaching, even if nothing has happened locally yet. This mechanism naturally captures the spatial and temporal propagation of floods along the basin.
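The mechanism can be illustrated with a deliberately stripped-down sketch: a single attention head over station embeddings, written in plain Python with no learned query/key/value projections. The real model uses 8 heads, 3 layers, and learned weights in PyTorch; this only shows how each station's new representation becomes a weighted mix of all stations.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def station_attention(embeddings):
    """Single-head scaled dot-product self-attention across stations.

    embeddings: one vector per station. Returns (mixed, weights), where
    weights[i][j] is how much station i attends to station j.
    """
    dim = len(embeddings[0])
    mixed, all_weights = [], []
    for query in embeddings:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in embeddings]
        weights = softmax(scores)
        mixed.append([sum(w * value[d] for w, value in zip(weights, embeddings)) for d in range(dim)])
        all_weights.append(weights)
    return mixed, all_weights
```

In this toy version a station attends most to stations whose embeddings resemble its own; with learned projections, the model is free to learn, for example, that a downstream station should weight its upstream neighbors heavily.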

Quantile regression: measuring uncertainty

Rather than a single predicted value, the model provides three quantiles for each prediction: q10 (lower bound), q50 (median prediction), and q90 (upper bound). The band between q10 and q90 should contain the observed value about 80% of the time, which gives an 80% confidence interval directly usable for risk management.

[Figure: prediction example with confidence intervals. q10 (best case), q50 (median prediction), q90 (worst case); the band from q10 to q90 is the 80% confidence interval.]

Training: the details that matter

The model is implemented in PyTorch and trained on an NVIDIA DGX Spark equipped with a Blackwell GB10 GPU with 128 GB of unified memory — a welcome comfort for handling 25 years of data across 11 stations.

Asymmetric loss function

I use a pinball loss (the standard quantile regression loss) with a specific asymmetric penalty: during floods, underestimating the water level is far more dangerous than overestimating it. When the level at Châteaubourg exceeds 800 mm, the underestimation penalty is doubled. This is as much a domain decision as a technical one.
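A scalar sketch of this loss, under stated assumptions: the 800 mm threshold is tested against the observed level, levels are expressed in millimetres rather than normalized units, and the doubling applies to any quantile whenever the prediction is below the observation. The project's PyTorch implementation may differ on each of these points.

```python
def pinball_loss(y_true, y_pred, quantile):
    """Standard quantile (pinball) loss for one prediction."""
    error = y_true - y_pred
    return max(quantile * error, (quantile - 1.0) * error)

def flood_weighted_pinball(y_true_mm, y_pred_mm, quantile, flood_level_mm=800.0, penalty=2.0):
    """Pinball loss whose underestimation branch is multiplied by `penalty` during floods."""
    loss = pinball_loss(y_true_mm, y_pred_mm, quantile)
    underestimated = y_pred_mm < y_true_mm
    if underestimated and y_true_mm > flood_level_mm:
        loss *= penalty
    return loss
```

For q90, an underestimate already costs 9 times more than an overestimate of the same size; the flood penalty stacks on top of that asymmetry.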

Flood oversampling

Floods are inherently rare events in the data. Without correction, the model would be excellent under normal conditions but mediocre during floods — precisely when accuracy matters most. I implemented progressive oversampling of flood periods in the training set:

🌊 Progressive oversampling

H ≥ 800 mm (moderate flood) → ×2 | H ≥ 1,200 mm (major flood) → ×4 | H ≥ 1,500 mm (severe flood) → ×8
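The tiers above translate directly into a repetition factor. The sketch below assumes each training window is tagged with a single level in millimetres (for instance its peak, which is an assumption) and that oversampling is done by repeating windows; a weighted sampler would be an equivalent alternative.

```python
def oversampling_factor(level_mm):
    """Repetition factor for a training window, following the three flood tiers."""
    if level_mm >= 1500:
        return 8  # severe flood
    if level_mm >= 1200:
        return 4  # major flood
    if level_mm >= 800:
        return 2  # moderate flood
    return 1      # normal conditions

def oversample(windows):
    """windows: list of (window_id, level_mm). Returns ids repeated by their factor."""
    repeated = []
    for window_id, level_mm in windows:
        repeated.extend([window_id] * oversampling_factor(level_mm))
    return repeated
```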

Training strategy

Optimization uses AdamW with weight decay and a linear warmup over 5 epochs, followed by a ReduceLROnPlateau scheduler. Gradients are clipped at 1.0 to stabilize training. Early stopping on validation loss, with a patience of 20 epochs, guards against overfitting.
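Two of these pieces are easy to show without any framework. The sketch below is illustrative only: it assumes the warmup multiplies a base learning rate and reaches full value on the fifth epoch, and that "patience" counts consecutive epochs without a new best validation loss. The optimizer, the plateau scheduler, and gradient clipping are left to PyTorch.

```python
def warmup_factor(epoch, warmup_epochs=5):
    """Multiplier for the base learning rate; ramps linearly to 1.0 (epoch is 0-based)."""
    return min(1.0, (epoch + 1) / warmup_epochs)

class EarlyStopping:
    """Stop when validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=20):
        self.patience = patience
        self.best = float("inf")
        self.stale_epochs = 0

    def step(self, val_loss):
        """Record one validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience
```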

The data split is strictly chronological to prevent information leakage: 25.5 years of training (2000 – June 2025), 6 months of validation (July – December 2025), and winter 2025-2026 for testing.
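A chronological split reduces to comparing timestamps against two cut-off dates. In this sketch the validation boundary follows the dates given above, while the start of the test period on 1 January 2026 is an assumption, since the text only says "winter 2025-2026".

```python
from datetime import datetime

TRAIN_END = datetime(2025, 7, 1)  # training covers 2000 through June 2025
VAL_END = datetime(2026, 1, 1)    # validation covers July through December 2025 (assumed test start)

def assign_split(timestamp):
    """Assign a sample to a split purely by date, never at random."""
    if timestamp < TRAIN_END:
        return "train"
    if timestamp < VAL_END:
        return "validation"
    return "test"
```

Because each sample spans a 72-hour input and a 24-hour target, windows that straddle a cut-off date need a rule of their own (for example, dropping them); the text does not say which rule the project uses.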

Results

The metrics below are measured on the test set (winter 2025-2026, data never seen during training), at the Châteaubourg station — the basin's reference station. The prediction is the median quantile (q50).

Real-time predictions — Châteaubourg, February 2026

Water level (blue) and flow rate (orange) observed and predicted at Châteaubourg. The colored bands represent the confidence interval [q10 – q90]. The dashed lines indicate historical reference flood levels.

Horizon	NSE	RMSE	Interpretation
t + 1h	0.9999	6 mm	Near-perfect: the river's strong natural inertia makes the next hour highly predictable
t + 6h	0.9967	36 mm	Excellent — well above the operational utility threshold
t + 12h	0.9887	67 mm	Very good — uncertainty increases but remains controlled
t + 24h	0.9710	108 mm	Good — an NSE > 0.97 at D+1 is a solid result

📊 Quantile calibration

The confidence intervals are well calibrated: P(y < q10) ≈ 8% and P(y < q90) ≈ 92%, for respective targets of 10% and 90%. This means the model's uncertainty intervals are reliable and usable in real situations.
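Calibration figures of this kind come from counting how often observations fall below each predicted quantile. A minimal sketch, with hypothetical function names:

```python
def fraction_below(observations, quantile_predictions):
    """Share of observations strictly below the predicted quantile."""
    below = sum(1 for y, q in zip(observations, quantile_predictions) if y < q)
    return below / len(observations)

def interval_coverage(observations, q10, q90):
    """Share of observations that fall inside [q10, q90]."""
    inside = sum(1 for y, lo, hi in zip(observations, q10, q90) if lo <= y <= hi)
    return inside / len(observations)
```

For a perfectly calibrated model, `fraction_below` returns about 0.10 for q10 and 0.90 for q90. The reported 8% and 92% imply that the band contains roughly 84% of observations, slightly wider than the nominal 80%.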

NSE (Nash-Sutcliffe Efficiency) is the reference metric in hydrology. An NSE of 1 means a perfect prediction; an NSE of 0 means the model does no better than always predicting the mean of the observed values. In practice, an NSE above 0.75 is considered "good" and above 0.90 "excellent". Our results are well beyond these thresholds, even at the 24-hour horizon.
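Both metrics in the table follow from their standard definitions. In the sketch below, the mean in the NSE denominator is the mean of the observations over the evaluation period:

```python
import math

def nse(observed, predicted):
    """Nash-Sutcliffe Efficiency: 1 minus squared error over variance of observations."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance

def rmse(observed, predicted):
    """Root mean squared error, in the same unit as the inputs (mm for water level)."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))
```

NSE can also be negative: a model that is worse than predicting the observed mean scores below 0.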

Production deployment

A model that only runs in a notebook has limited value. The pipeline is designed end-to-end for real production deployment:

The PyTorch model is exported to ONNX format, a standard format that enables inference from any language. In this case, the model runs in a Node.js backend via onnxruntime-node, eliminating the Python dependency in production. Normalization parameters and metadata are also exported to ensure that inference preprocessing is identical to training.
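One way to keep preprocessing identical on both sides is to ship a small JSON file next to the ONNX model. The sketch below is hypothetical: the key names, the function, and the structure are illustrative and not the project's actual export format.

```python
import json

def export_preprocessing(scalers, feature_order, history_hours=72, horizon_hours=24):
    """Serialize what an inference backend needs to reproduce training-time preprocessing.

    scalers: {feature_name: (p1, p99)} as computed on the training set.
    """
    payload = {
        "feature_order": feature_order,
        "history_hours": history_hours,
        "horizon_hours": horizon_hours,
        "scalers": {name: {"p1": p1, "p99": p99} for name, (p1, p99) in scalers.items()},
    }
    return json.dumps(payload, indent=2, sort_keys=True)
```

A Node.js backend can then read the same file with `JSON.parse` and apply `(value - p1) / (p99 - p1)` before calling the model, so no scaling constant is duplicated in code.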

Data collection runs continuously with the same scripts used for training — in incremental mode — ensuring consistency between training and production data.

Tech stack

# Framework & model
PyTorch        → LSTM + Multi-Head Attention + Quantile Regression
ONNX Runtime  → Export & production inference (Node.js)

# Data
Hydro EauFrance  → Water levels (H) & flow rates (Q), 11 stations
Open-Meteo        → Precipitation & soil moisture

# Infrastructure
NVIDIA DGX Spark  → Blackwell GB10, 128 GB unified memory, CUDA 13.0

# Production
Node.js           → Inference backend (onnxruntime-node)
    

What this project demonstrates

Beyond the numbers, this project illustrates a complete approach to applied machine learning — from raw data collection to production deployment. Designing a high-performing model also means building a robust pipeline, choosing an architecture suited to the business problem, and making the right engineering decisions (asymmetric loss, oversampling, ONNX export...).

This is exactly the type of engagement I deliver as a freelancer: turning a business problem into an operational, reliable, and maintainable machine learning system.

See the project: vilaine-amont.haruni.net · Source code on GitHub · Model on Hugging Face

🤝 Need a similar project?

I'm available for machine learning engineering engagements — modeling, data pipelines, production deployment. Feel free to get in touch.