Prediction Model

Overview

Two separate models run per race weekend: one for the grand prix and, on sprint weekends only, one for the sprint. Both score each driver as a weighted sum of normalized features, then convert the scores to win probabilities with a softmax. The features and weights differ because sprint races are a fundamentally different format (~17 laps, no pit stops, track position dominates).

[Screenshot: Prediction page (prediction standings)]

[Screenshot: Prediction detail page (race prediction detail)]


Where the data comes from

Every feature score ultimately originates from one of three places:

1. FastF1 (live F1 session data, Python library)

FastF1 is the sole external data source. It provides telemetry and session data from the official F1 timing feed. The Python ingest jobs pull this data after each session and store it in raw tables.

| FastF1 session | Ingest job | Populates table(s) |
| --- | --- | --- |
| Race ('R') | ingest_race | race_results: finish pos, grid pos, status, points |
| Qualifying ('Q') | ingest_qualifying | qualifying_results: Q1/Q2/Q3 lap times, grid pos, sector times |
| FP2 ('FP2') | ingest_fp2 | fp2_long_run_times: stint median lap times on each compound |
| Race lap data | ingest_race | lap_times: per-lap time, sector times, tyre compound, tyre life, speed |
| Sprint ('S') | ingest_sprint | sprint_results: sprint finish pos, points |
| Sprint Qualifying ('SQ') | ingest_sprint_qualifying | sprint_results.sq1/sq2/sq3_time_ms |
| Sprint lap data | ingest_sprint | sprint_lap_times: per-lap sprint data |
| Weather | every ingest | races.weather, races.sprint_weather: rainfall flag from session telemetry |

The FastF1 HeadshotUrl field supplies driver photos (available from 2019 onward). FastF1 does not provide team logos; those are stored as static files in web/public/teams/.

2. Computed aggregates (run after each race)

compute_season_stats reads from the raw tables above and computes season-wide summaries. These are the inputs for most driver/team-level features.

| Aggregate table | Computed from | Key columns used by features |
| --- | --- | --- |
| driver_season_stats | race_results, qualifying_results | total_points, wins, races_entered, dnf_rate, avg_position_gain, teammate_quali_delta |
| team_season_stats | race_results | car_performance_score (normalized avg finish), reliability_score (1 − DNF rate) |
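
To make the aggregation step concrete, here is a minimal sketch of how a few of the driver_season_stats columns could be derived from one driver's race results. The `RaceResult` class, its `finished` flag, and the `season_summary` helper are hypothetical, and the definition of avg_position_gain as mean(grid position − finish position), with DNFs included, is an assumption rather than the confirmed logic of compute_season_stats.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RaceResult:
    """Hypothetical, simplified view of one race_results row for a driver."""
    grid_pos: int
    finish_pos: int
    finished: bool  # assumed to be derived from the `status` column
    points: float


def season_summary(results: List[RaceResult]) -> Dict[str, float]:
    """Summarize one driver's season (sketch, not the real compute_season_stats)."""
    n = len(results)
    if n == 0:
        # Assumption: a driver with no entries gets all-zero stats.
        return {"total_points": 0.0, "wins": 0, "races_entered": 0,
                "dnf_rate": 0.0, "avg_position_gain": 0.0}
    return {
        "total_points": sum(r.points for r in results),
        "wins": sum(1 for r in results if r.finished and r.finish_pos == 1),
        "races_entered": n,
        # Share of entered races that ended in a retirement.
        "dnf_rate": sum(1 for r in results if not r.finished) / n,
        # Positive values mean the driver tends to finish ahead of the grid slot.
        "avg_position_gain": sum(r.grid_pos - r.finish_pos for r in results) / n,
    }


stats = season_summary([
    RaceResult(grid_pos=3, finish_pos=1, finished=True, points=25),
    RaceResult(grid_pos=2, finish_pos=18, finished=False, points=0),
])
```

Under the same assumption, a team-level reliability_score would simply be 1 minus the team's DNF rate, as stated in the table.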

3. Static / hardcoded

Some inputs never change and are seeded once into the database or hardcoded in Python.

| Value | Where stored | Changes? |
| --- | --- | --- |
| Circuit overtake rate | circuits.overtake_rate | Never; seeded once per track |
| Circuit SC probability | circuits.sc_probability | Never; seeded once per track |
| Model weights | compute_features.py WEIGHTS dict | Only when model is intentionally updated |
| Sprint model weights | compute_sprint_features.py WEIGHTS dict | Only when model is intentionally updated |
| Softmax temperature T=0.3 | compute_predictions.py, compute_sprint_predictions.py | Only when model is intentionally updated |

Data pipeline — end-to-end flow

FastF1 session

    ▼  (ingest jobs)
raw tables: race_results, qualifying_results, lap_times,
            sprint_results, sprint_lap_times, fp2_long_run_times

    ▼  (compute_season_stats — after each race)
driver_season_stats, team_season_stats

    ├── + qualifying_results (sector times, grid position)
    ├── + lap_times (tyre degradation, long run pace)
    ├── + fp2_long_run_times (FP2 stint data)
    ├── + circuits.overtake_rate  ← static
    └── + circuits.sc_probability ← static

         ▼  (compute_features / compute_sprint_features)
    driver_prediction_features / driver_sprint_features

         ▼  (compute_predictions / compute_sprint_predictions — softmax)
    race_predictions / sprint_predictions

         ▼  (Hono API reads)
    /prediction page — displays win probabilities + feature breakdown

Race Status Flow

conventional weekend:
  scheduled → qualifying_done → completed
                   │                │
          compute_features    ingest_race
          compute_predictions compute_season_stats

sprint weekend:
  scheduled → sprint_qualifying_done → sprint_done → qualifying_done → completed
                       │                   │               │
              compute_sprint_features  ingest_sprint  compute_features
              compute_sprint_predictions               compute_predictions

Main race predictions are computed at qualifying_done. Sprint predictions are computed at sprint_qualifying_done (after SQ session).
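
The two flows above can be read as a small lookup structure. The following is a minimal sketch only: the `JOBS_AT_STATUS` mapping and the `next_status` and `jobs_for` helpers are hypothetical names for illustration, while the status and job names are taken from the diagrams.

```python
from typing import Dict, List, Optional

# Status sequences, copied from the two diagrams.
CONVENTIONAL_FLOW = ["scheduled", "qualifying_done", "completed"]
SPRINT_FLOW = ["scheduled", "sprint_qualifying_done", "sprint_done",
               "qualifying_done", "completed"]

# Jobs shown under each status in the diagrams (hypothetical structure).
JOBS_AT_STATUS: Dict[str, List[str]] = {
    "sprint_qualifying_done": ["compute_sprint_features", "compute_sprint_predictions"],
    "sprint_done": ["ingest_sprint"],
    "qualifying_done": ["compute_features", "compute_predictions"],
    "completed": ["ingest_race", "compute_season_stats"],
}


def next_status(current: str, is_sprint_weekend: bool) -> Optional[str]:
    """Return the status after `current`, or None once the weekend is completed."""
    flow = SPRINT_FLOW if is_sprint_weekend else CONVENTIONAL_FLOW
    idx = flow.index(current)  # raises ValueError for a status not in this flow
    return flow[idx + 1] if idx + 1 < len(flow) else None


def jobs_for(status: str) -> List[str]:
    """Return the jobs associated with a status (empty for 'scheduled')."""
    return JOBS_AT_STATUS.get(status, [])
```

For example, `next_status("scheduled", is_sprint_weekend=True)` gives `"sprint_qualifying_done"`, and `jobs_for("qualifying_done")` lists the two main race jobs.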


Grand Prix Model — weighted-v3 — 12 Features

| Feature | Weight | Source |
| --- | --- | --- |
| Car Performance | 20% | team_season_stats.car_performance_score: normalized avg finish position across the field |
| Long Run Pace | 15% | Primary: FP2 MEDIUM-normalized median stint lap time from fp2_long_run_times; fallback: historical circuit median from lap_times (last 6 visits) |
| Tyre Degradation | 8% | REGR_SLOPE(lap_time_ms, tyre_life) over last 4 races at circuit; lower slope = better management = higher score; cross-season via driver code |
| Reliability | 8% | Blend: team reliability score (70%) + driver personal DNF rate (30%) |
| Qualifying Delta | 8% | Recency-weighted (5→1) mean of teammate quali gap across last 5 races; cross-season via driver code |
| Driver Rating | 8% | driver_season_stats.total_points / races_entered / 25, capped at 1.0 |
| Win Rate | 8% | Bayesian-smoothed: (wins + 0.5) / (races + 2) |
| Luck Factor | 7% | Rolling 5-race delta between actual finish and expected (car rank + grid average); cross-season |
| Sector Strength | 6% | Best sector time advantage vs field in qualifying, averaged across S1/S2/S3 |
| Circuit-Adj. Starting Position | 7% | Grid position scaled by overtake rate and SC probability: start_pos × (1 + (1−overtake_rate)) × (1 − 0.3×sc_prob); the overtake term weights grid position ×1.95 at Monaco vs ×1.15 at Monza |
| Circuit-Adj. Position Gain | 3% | Position gain scaled by overtake rate: avg_gain × overtake_rate; near zero at Monaco, meaningful at Monza |
| Weather Impact | 2% | Historical avg wet-race finish (cross-season, ≥1 wet race); neutral (0.5) for dry races |
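
Several of the formulas in the table are simple enough to sketch directly. The function names below are hypothetical. Two points are assumptions rather than confirmed behavior: the driver term in the reliability blend is taken as (1 − DNF rate), and "recency-weighted (5→1)" is read as weights 5, 4, 3, 2, 1 from the most recent race to the oldest.

```python
from typing import List


def driver_rating(total_points: float, races_entered: int) -> float:
    """Points per race as a fraction of the 25-point maximum, capped at 1.0."""
    if races_entered == 0:
        return 0.0  # assumption: no entries means no rating signal
    return min(total_points / races_entered / 25, 1.0)


def win_rate(wins: int, races: int) -> float:
    """Bayesian-smoothed win rate: (wins + 0.5) / (races + 2)."""
    return (wins + 0.5) / (races + 2)


def reliability(team_reliability: float, driver_dnf_rate: float) -> float:
    """70/30 blend; assumes the driver term enters as (1 - DNF rate)."""
    return 0.7 * team_reliability + 0.3 * (1.0 - driver_dnf_rate)


def recency_weighted_mean(values_newest_first: List[float]) -> float:
    """Weighted mean over up to 5 values, weights 5..1 from newest to oldest."""
    recent = values_newest_first[:5]
    if not recent:
        return 0.0  # assumption: no history yields a zero delta
    weights = [5, 4, 3, 2, 1][: len(recent)]
    return sum(w * v for w, v in zip(weights, recent)) / sum(weights)
```

The smoothing in `win_rate` keeps small samples from producing extreme values: a driver with no starts gets 0.5 / 2 = 0.25 rather than an undefined ratio, and a driver with one win from one start gets 1.5 / 3 = 0.5 rather than 1.0.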

All feature scores are normalized to [0.0, 1.0] using min-max normalization across the driver field for that race.
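
As an illustration of the normalization step, here is a minimal min-max sketch. The `min_max_normalize` helper and its `invert` flag are hypothetical, as is the choice to return a neutral 0.5 when every driver has the same value.

```python
from typing import List


def min_max_normalize(values: List[float], invert: bool = False) -> List[float]:
    """Scale values to [0.0, 1.0] across the field.

    Set invert=True for "lower is better" inputs so that the smallest
    raw value maps to the highest score.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: identical values carry no signal, so use a neutral 0.5.
        return [0.5 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if invert else scaled


# Example: tyre degradation slopes in ms per lap of tyre life (made-up numbers).
slopes = [40.0, 55.0, 70.0]
tyre_deg_scores = min_max_normalize(slopes, invert=True)
```

With these made-up slopes, the driver with the lowest degradation (40.0) scores 1.0 and the driver with the highest (70.0) scores 0.0, matching the "lower slope = higher score" rule for Tyre Degradation.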

Weighted Score

raw = (
    car_perf                  * 0.20 +
    long_run                  * 0.15 +
    tyre_deg                  * 0.08 +
    reliability               * 0.08 +
    quali_delta               * 0.08 +
    driver_rating             * 0.08 +
    win_rate                  * 0.08 +
    luck                      * 0.07 +
    sector_strength           * 0.06 +
    circuit_adj_start_pos     * 0.07 +
    circuit_adj_position_gain * 0.03 +
    weather_score             * 0.02
)

track_overtake_score is no longer a standalone additive feature — its value is baked into the circuit-adjusted starting position and position gain multipliers.
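
The weighted sum can also be written against a weights dictionary. This is a sketch only: the key names are assumed to match the variable names in the expression above, and the actual WEIGHTS dict in compute_features.py may be laid out differently. The `weighted_score` helper is hypothetical.

```python
from typing import Mapping

# Weights copied from the feature table; keys are assumed names.
WEIGHTS = {
    "car_perf": 0.20,
    "long_run": 0.15,
    "tyre_deg": 0.08,
    "reliability": 0.08,
    "quali_delta": 0.08,
    "driver_rating": 0.08,
    "win_rate": 0.08,
    "luck": 0.07,
    "sector_strength": 0.06,
    "circuit_adj_start_pos": 0.07,
    "circuit_adj_position_gain": 0.03,
    "weather_score": 0.02,
}

# Guard against weights drifting away from 100% when the model is updated.
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9


def weighted_score(features: Mapping[str, float],
                   weights: Mapping[str, float] = WEIGHTS) -> float:
    """Return the weighted sum of normalized feature scores for one driver."""
    missing = set(weights) - set(features)
    if missing:
        raise KeyError(f"missing feature scores: {sorted(missing)}")
    return sum(weights[name] * features[name] for name in weights)


# A driver who is exactly mid-field on every feature scores 0.5.
neutral = weighted_score({name: 0.5 for name in WEIGHTS})
```

Because the weights sum to 1.0 and every feature is normalized to [0.0, 1.0], the raw score also stays within [0.0, 1.0].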


Sprint Model — sprint-v2 — 8 Features

The sprint model is intentionally different from the main race model. Sprint races are ~17 laps with no pit stops, so track position and car dominance matter far more than long-run strategy or reliability.

| Feature | Weight | Source |
| --- | --- | --- |
| Car Performance | 25% | team_season_stats.car_performance_score |
| Circuit-Adj. Starting Position | 25% | Same multiplier as GP: start_pos × (1 + (1−overtake_rate)) × (1 − 0.3×sc_prob) |
| Short Run Pace | 10% | Best SQ lap time (sq1/sq2/sq3_time_ms); falls back to main qualifying if SQ not yet ingested |
| Driver Rating | 10% | Sprint-specific (sprint_total_points / sprint_races / 8) when ≥3 sprint races recorded; falls back to main race rating |
| Weather Impact | 8% | Based on races.sprint_weather; cross-season historical wet performance |
| Win Rate | 8% | Sprint-specific when ≥3 sprint races recorded; falls back to main race win rate |
| Luck Factor | 8% | Rolling 5-race delta from recent grand prix results; cross-season |
| SQ Qualifying Delta | 6% | Recency-weighted (5→1) mean of SQ teammate gap across last 5 sprint weekends; cross-season |
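
The circuit-adjusted starting position shared by both models can be sketched as a single function. The `circuit_adjusted_start` name is hypothetical, and the overtake rates used in the example (0.05 and 0.85) are back-calculated from the ×1.95 and ×1.15 multipliers quoted in the Grand Prix table, not read from the circuits table.

```python
def circuit_adjusted_start(start_pos: int, overtake_rate: float, sc_prob: float) -> float:
    """start_pos × (1 + (1 − overtake_rate)) × (1 − 0.3 × sc_prob)."""
    overtake_term = 1 + (1 - overtake_rate)  # grows as overtaking gets harder
    sc_term = 1 - 0.3 * sc_prob              # shrinks as safety cars get likelier
    return start_pos * overtake_term * sc_term


# Same grid slot (P5), same assumed SC probability, two illustrative circuits.
hard_to_pass = circuit_adjusted_start(5, overtake_rate=0.05, sc_prob=0.5)
easy_to_pass = circuit_adjusted_start(5, overtake_rate=0.85, sc_prob=0.5)
```

Here `hard_to_pass` is 5 × 1.95 × 0.85 = 8.2875 and `easy_to_pass` is 5 × 1.15 × 0.85 = 4.8875, so the same grid slot is amplified much more at the circuit where overtaking is difficult.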

Weighted Score

raw = (
    car_perf              * 0.25 +
    circuit_adj_start_pos * 0.25 +
    short_run             * 0.10 +
    driver_rating         * 0.10 +
    weather_score         * 0.08 +
    win_rate              * 0.08 +
    luck                  * 0.08 +
    sq_delta              * 0.06
)

Sprint driver rating and win rate use sprint-specific history when ≥3 sprint races are recorded, otherwise fall back to main race stats (scaled to match the 8-point sprint maximum vs 25-point race maximum).
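
The fallback rule can be sketched as follows. The `sprint_driver_rating` function and the `MIN_SPRINT_RACES` constant are hypothetical names, and capping the sprint-specific rating at 1.0 is an assumption carried over from the main race rating.

```python
MIN_SPRINT_RACES = 3  # threshold from the text: sprint-specific stats need >= 3 sprints


def sprint_driver_rating(sprint_total_points: float, sprint_races: int,
                         race_total_points: float, races_entered: int) -> float:
    """Use sprint history when there is enough of it, else the main race rating."""
    if sprint_races >= MIN_SPRINT_RACES:
        # Points per sprint as a fraction of the 8-point sprint maximum.
        return min(sprint_total_points / sprint_races / 8, 1.0)
    if races_entered == 0:
        return 0.0  # assumption: no history at all means no rating signal
    # Fallback: points per race as a fraction of the 25-point race maximum.
    return min(race_total_points / races_entered / 25, 1.0)
```

Dividing each points-per-race figure by its own maximum (8 or 25) puts both branches on the same 0.0 to 1.0 scale, so a driver can move from the fallback to the sprint-specific rating without a jump in units.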


Softmax → Win Probability

exp_s = np.exp(scores / T)
win_probability = exp_s / exp_s.sum()

Temperature T = 0.3. Lower temperature makes the model more decisive — small score differences produce larger probability gaps. Do not increase T.

Probabilities sum to 1.0 across all drivers in a race. Same temperature applies to both the sprint and main race models.
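
To show the effect of the temperature, here is a dependency-free sketch of the same softmax using only the standard library. The `softmax_win_probabilities` name is hypothetical. Subtracting the maximum before exponentiating is a standard numerical-stability step that leaves the result unchanged.

```python
import math
from typing import List


def softmax_win_probabilities(scores: List[float], temperature: float = 0.3) -> List[float]:
    """Convert weighted scores to win probabilities with a temperature softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    # exp(x - peak) avoids overflow and cancels out in the ratio below.
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


scores = [0.8, 0.7, 0.5]  # made-up weighted scores for three drivers
decisive = softmax_win_probabilities(scores, temperature=0.3)
flat = softmax_win_probabilities(scores, temperature=1.0)
```

With these made-up scores, the leader gets roughly 0.48 at T = 0.3 but only about 0.38 at T = 1.0, which illustrates why a lower temperature makes the model more decisive.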


Predicted Position

Drivers are ranked by win_probability descending. Rank 1 = predicted winner, stored in race_predictions.predicted_winner_id (or sprint_predictions.predicted_winner_id for sprints). All driver positions are stored in driver_prediction_features.predicted_position (or driver_sprint_features.predicted_position).
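
The ranking step can be sketched as a sort on win probability. The `rank_drivers` helper and the driver codes are illustrative only, and breaking ties alphabetically by driver code is an assumption.

```python
from typing import Dict


def rank_drivers(win_probability: Dict[str, float]) -> Dict[str, int]:
    """Map each driver to a predicted position (1 = predicted winner)."""
    # Sort by probability descending; ties fall back to driver code (assumption).
    ordered = sorted(win_probability, key=lambda d: (-win_probability[d], d))
    return {driver: pos for pos, driver in enumerate(ordered, start=1)}


positions = rank_drivers({"AAA": 0.18, "BBB": 0.48, "CCC": 0.34})
predicted_winner = min(positions, key=positions.get)
```

In this example `positions` is `{"BBB": 1, "CCC": 2, "AAA": 3}` and `predicted_winner` is `"BBB"`.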


Data Availability by Era

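| Years | Qualifying data | Lap times | Sprint data |
| --- | --- | --- | --- |
| 2018–present | Full Q1/Q2/Q3 + sector times | Full per-lap telemetry | Full SQ times + sprint lap times (sprint weekends only; sprint format started 2021) |
| 2006–2017 | Q1/Q2/Q3 times | None | N/A (sprint format started 2021) |
| 2000–2005 | Single best lap time only | None | N/A |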

For races without qualifying data, compute_features cannot run — those races have no predictions.