# ADR 0001: Sidecar Storage Schema (SQLite)
- Status: accepted
- Date: 2025-11-15
- Deciders: @mac-reichelt
- Issue: #8
## Context
The tuning-coach Rust sidecar ingests Forza Motorsport telemetry over UDP at
~60 Hz, runs heuristics over rolling windows, emits tuning recommendations to
the SimHub overlay over WebSocket, accepts hotkey events from SimHub, and
persists per-car setup state. Phase 1 of docs/PLAN.md requires a
durable local store for:
- Sessions — one bounded period of driving on a given car/track.
- Laps — segmented from telemetry; tagged valid/dirty/pit/reset.
- Telemetry snapshots — high-rate parsed packets, ~60 Hz.
- Recommendations — structured engineer output (symptom, mechanism, adjustment, confidence) emitted live and aggregated post-session.
- Car setups — per-car known/locked/current parameter values.
- User preferences — overlay + coach config (style override, snooze, etc.).
- Hotkey events — discrete inputs from SimHub Global Hotkeys (mark dirty / pit / snooze / request feedback / reset).
### Constraints
- Single-user desktop app. No multi-tenant, no auth, no network DB.
- Pre-1.0. Schema can change, but every change must ship a migration.
- `rusqlite` with the `bundled` feature — pin the SQLite version to what we test against; avoid platform SQLite drift on Windows.
- WAL mode. Concurrent reader (overlay query thread) + single writer (telemetry ingest) without contention.
- File path: `<data_dir>/tuning-coach.db` (Windows: `%LOCALAPPDATA%\tuning-coach\tuning-coach.db`).
- Backup-friendly. Copying the `.db` file (and `.db-wal`, `.db-shm`) while the app is closed must produce a usable backup.
### Read patterns
| Reader | Query shape | Latency budget |
|---|---|---|
| Live overlay HUD | “give me the most recent N recommendations for current session” | < 50 ms |
| Heuristics engine | “give me last K seconds of telemetry for current lap” | < 10 ms (in-memory ring is primary; DB is fallback for replay) |
| Post-session report | “all laps + aggregated recommendations for session X” | < 1 s |
| Recorded session replay (Phase 9) | “stream all snapshots for session X in t_ms order” | sequential scan, throughput-bound |
| Setup form | “current setup + locked params for car ordinal C” | < 50 ms |
### Write patterns
| Writer | Rate | Burst |
|---|---|---|
| Telemetry ingest | ~60 rows/s during sessions | bounded by packet rate |
| Recommendations | < 1 row/s typical | bursts at lap boundaries |
| Hotkey events | event-driven; tens per session | n/a |
| Car setups / preferences | manual; rare | n/a |
The dominant cost is telemetry. Everything else is rounding error.
## Decision
We will use SQLite (via `rusqlite` with the `bundled` feature) as the
canonical persistence layer, opened in WAL mode, with the schema below.

For `telemetry_snapshots` we will use a hybrid layout: store the raw Forza
packet as a `BLOB`, alongside a small set of indexed/denormalized columns
needed for the most common live and analysis queries (`lap_id`, `t_ms`,
`speed_mps`, `rpm`, `throttle`, `brake`, `gear`, `is_race_on`). All other
fields are recovered by re-parsing the blob on demand. This pins our storage
cost close to the raw-blob lower bound while preserving fast lap/time-range
scans and basic filtering without a parse step.
A `_migrations` table tracks applied schema versions. Migration files live in
`sidecar/migrations/NNNN_<name>.sql` and are applied in numeric order at
startup, with each migration executed in its own transaction; each migration
is recorded with its SHA-256 to detect tampering.
## Schema
All timestamps are UTC ISO-8601 strings (TEXT) for human readability
unless they’re sub-second deltas, in which case they are INTEGER milliseconds.
Wall-clock fields are display-only; elapsed-time fields use monotonic deltas
captured by the ingest loop.
### `_migrations`
Tracks which migration files have been applied.
| Column | Type | Notes |
|---|---|---|
| `version` | `INTEGER PRIMARY KEY` | matches `NNNN` prefix of migration file |
| `name` | `TEXT NOT NULL` | filename minus prefix and extension |
| `sha256` | `TEXT NOT NULL` | hex digest of the migration file at apply time |
| `applied_at` | `TEXT NOT NULL` | UTC ISO-8601 |
Behavior:
- On startup, migrations are sorted by `version`, applied in a single transaction each, and recorded on success.
- If a previously applied migration's on-disk SHA differs, the runner aborts with a clear error (do not silently re-run).
- The runner is idempotent: re-running with no new files is a no-op.
### `sessions`
| Column | Type | Notes |
|---|---|---|
| `id` | `INTEGER PRIMARY KEY AUTOINCREMENT` | |
| `started_at` | `TEXT NOT NULL` | UTC ISO-8601 |
| `ended_at` | `TEXT` | NULL while in progress |
| `car_ordinal` | `INTEGER` | from packet; NULL if unknown at start |
| `car_class` | `INTEGER` | 0=D … 7=X |
| `car_pi` | `INTEGER` | performance index |
| `drivetrain` | `INTEGER` | 0=FWD, 1=RWD, 2=AWD |
| `track_id` | `TEXT` | resolved name from SimHub list (Phase 11); NULL until then |
| `game` | `TEXT NOT NULL DEFAULT 'forza_motorsport_2023'` | discriminator for future games |
| `sidecar_version` | `TEXT NOT NULL` | semver of producing build, for replay forward-compat |
| `notes` | `TEXT` | user-editable, optional |
Indexes:
- `idx_sessions_started_at (started_at DESC)` — list view sort
- `idx_sessions_car_track (car_ordinal, track_id, started_at DESC)` — trend queries
### `laps`
| Column | Type | Notes |
|---|---|---|
| `id` | `INTEGER PRIMARY KEY AUTOINCREMENT` | |
| `session_id` | `INTEGER NOT NULL REFERENCES sessions(id) ON DELETE CASCADE` | |
| `lap_number` | `INTEGER NOT NULL` | from packet (`LapNumber`); 0 = out lap |
| `started_t_ms` | `INTEGER NOT NULL` | session-relative monotonic ms |
| `ended_t_ms` | `INTEGER` | NULL until lap closes |
| `time_s` | `REAL` | final lap time in seconds; NULL until close |
| `valid` | `INTEGER NOT NULL DEFAULT 1` | boolean (0/1) |
| `dirty_reason` | `TEXT` | enum string: off_track \| contact \| grip_anomaly \| manual \| pit \| reset \| NULL |
| `is_pit` | `INTEGER NOT NULL DEFAULT 0` | boolean |
| `is_reset` | `INTEGER NOT NULL DEFAULT 0` | boolean (rewind detected) |
| `best_split_s` | `REAL` | optional aggregate, populated post-lap |
Indexes:
- `idx_laps_session_lap (session_id, lap_number)` UNIQUE — guard double-insert
- `idx_laps_session_started (session_id, started_t_ms)` — time-range joins
### `telemetry_snapshots`
The hot table. Hybrid layout: blob + a thin index of common-query columns.
| Column | Type | Notes |
|---|---|---|
| `id` | `INTEGER PRIMARY KEY AUTOINCREMENT` | |
| `session_id` | `INTEGER NOT NULL REFERENCES sessions(id) ON DELETE CASCADE` | denormalized for session-scoped scans without join |
| `lap_id` | `INTEGER REFERENCES laps(id) ON DELETE SET NULL` | NULL for pre-lap-close packets, backfilled on lap close |
| `t_ms` | `INTEGER NOT NULL` | session-relative monotonic ms; primary time axis |
| `is_race_on` | `INTEGER NOT NULL` | boolean from packet |
| `speed_mps` | `REAL NOT NULL` | from packet |
| `rpm` | `REAL NOT NULL` | `CurrentEngineRpm` |
| `throttle` | `INTEGER NOT NULL` | 0..255 raw |
| `brake` | `INTEGER NOT NULL` | 0..255 raw |
| `gear` | `INTEGER NOT NULL` | 0=R, 1..N |
| `packet` | `BLOB NOT NULL` | raw Forza Dash packet (311 or 331 bytes), little-endian |
| `packet_format` | `INTEGER NOT NULL DEFAULT 1` | discriminator: 1=forza_dash_v2_311b, 2=forza_dash_fm2023_331b |
Indexes:
- `idx_telemetry_session_t (session_id, t_ms)` — replay & windowed scans
- `idx_telemetry_lap_t (lap_id, t_ms) WHERE lap_id IS NOT NULL` (partial index) — per-lap analysis
Notes:
- The blob is the source of truth. The denormalized columns are a cache; any disagreement is a parser bug. Tests will assert round-trip equality on a sampling basis.
- We do not index `t_ms` alone — all time queries are scoped by session or lap.
- Future packet formats (Sled-only, other games) are stored as new `packet_format` values with their own raw bytes.
### `recommendations`
| Column | Type | Notes |
|---|---|---|
| `id` | `INTEGER PRIMARY KEY AUTOINCREMENT` | |
| `session_id` | `INTEGER NOT NULL REFERENCES sessions(id) ON DELETE CASCADE` | |
| `lap_id` | `INTEGER REFERENCES laps(id) ON DELETE SET NULL` | NULL = session-level (e.g., post-session) |
| `created_at` | `TEXT NOT NULL` | UTC ISO-8601 |
| `category` | `TEXT NOT NULL` | enum: tires \| gearing \| alignment \| anti_roll \| springs \| damping \| aero \| brakes \| differential |
| `parameter` | `TEXT` | specific param key, e.g. `spring_rate_front`; NULL for multi-param advice |
| `confidence` | `TEXT NOT NULL` | enum: high \| medium \| low |
| `delivered` | `INTEGER NOT NULL DEFAULT 0` | 1 once shown in HUD |
| `dismissed` | `INTEGER NOT NULL DEFAULT 0` | user dismissed in HUD |
| `payload_json` | `TEXT NOT NULL` | full structured rec (see race-engineer.agent.md format): symptom, mechanism, adjustment, expected outcome, caveats, style/locked-param notes |
| `schema_version` | `INTEGER NOT NULL DEFAULT 1` | of the JSON payload, independent of DB schema |
Indexes:
- `idx_recs_session_created (session_id, created_at DESC)` — list view
- `idx_recs_lap (lap_id)` — per-lap rollup
- `idx_recs_category (session_id, category)` — “show me all springs advice”
### `car_setups`
One row per car ordinal — current setup state, with snapshot history kept in
`setup_json` (small enough that we don’t need a separate history table yet).
| Column | Type | Notes |
|---|---|---|
| `car_ordinal` | `INTEGER PRIMARY KEY` | from packet |
| `car_label` | `TEXT` | display name; resolved later from SimHub list |
| `setup_json` | `TEXT NOT NULL` | full tune object, keyed by parameter |
| `locked_params_json` | `TEXT NOT NULL` | array of param keys the car can’t adjust |
| `upgrades_json` | `TEXT NOT NULL DEFAULT '{}'` | installed upgrades affecting locked set |
| `source` | `TEXT NOT NULL` | enum: manual \| ocr \| imported |
| `updated_at` | `TEXT NOT NULL` | UTC ISO-8601 |
| `schema_version` | `INTEGER NOT NULL DEFAULT 1` | of the JSON shape |
No additional indexes — single-row lookup by PK.
### `user_preferences`
Free-form key-value store for overlay + coach config. Typed keys are validated in code, not in the schema, so adding a preference doesn’t require a migration.
| Column | Type | Notes |
|---|---|---|
| `key` | `TEXT PRIMARY KEY` | `dot.separated.key` |
| `value_json` | `TEXT NOT NULL` | JSON-encoded scalar/object |
| `updated_at` | `TEXT NOT NULL` | UTC ISO-8601 |
### `hotkey_events`
| Column | Type | Notes |
|---|---|---|
| `id` | `INTEGER PRIMARY KEY AUTOINCREMENT` | |
| `session_id` | `INTEGER REFERENCES sessions(id) ON DELETE CASCADE` | NULL if no active session |
| `received_at` | `TEXT NOT NULL` | UTC ISO-8601 (display) |
| `t_ms` | `INTEGER` | session-relative monotonic ms (NULL if no session) |
| `action` | `TEXT NOT NULL` | enum: mark_dirty \| pit_in \| pit_out \| snooze \| request_feedback \| reset_session |
| `payload_json` | `TEXT` | optional extra context |
Indexes:
- `idx_hotkey_session_t (session_id, t_ms)` — correlate with telemetry
## Connection management
- Single connection used for writes, opened with `journal_mode=WAL`, `synchronous=NORMAL`, `foreign_keys=ON`, `busy_timeout=5000`.
- Read-only connections via a small pool (`r2d2_sqlite`, size 4) for the WS query path. Reads see WAL snapshots, so they don’t block the writer.
- All schema state is created/upgraded by the migration runner before any reader/writer is exposed.
### Pragmas applied at startup

```sql
PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;
PRAGMA foreign_keys = ON;
PRAGMA busy_timeout = 5000;
PRAGMA temp_store = MEMORY;
```
## Consequences

### Positive
- One file = one backup. Users can copy `tuning-coach.db` (with `-wal`/`-shm` siblings if open) and have everything.
- Hybrid telemetry layout keeps storage close to the raw-bytes minimum while the few hot columns make the live overlay and lap analysis fast without a parse step.
- Blob is source of truth — adding a new derived field never requires a destructive migration; we just re-parse historical packets.
- Migration runner with SHA pinning catches the classic “someone edited an applied migration” footgun loudly instead of silently corrupting state.
- Foreign keys + cascades make session deletion (a user-facing feature later) a single statement.
- JSON columns for recommendations / setups / preferences absorb iteration during the heuristics phase without DB churn; we promote fields out of JSON if they become query targets.
### Negative
- Storage cost. A 1-hour active session is ~71.5 MB of raw packets (~331 B × 60 Hz × 3600 s for FM 2023; ~67.2 MB for legacy 311-byte packets) plus ~10–15 MB for the row overhead and denormalized columns and indexes — call it ~80–87 MB/hour active. A heavy weekend (~10 hours) is ~0.8–0.9 GB. Acceptable for a desktop app; we’ll add a retention setting before 1.0 (open question, deferred).
- Querying non-indexed packet fields requires reparsing. Acceptable: those queries are post-session/replay paths, not hot.
- JSON-in-SQL loses static schema guarantees. Mitigated with strict serde-typed structs in Rust + `schema_version` columns so we can migrate payloads in code.
- Two sources of truth for the indexed columns vs the blob. Mitigated by a single write path (parser → row builder → insert) and a sampled round-trip test in CI.
### Neutral
- All timestamps are TEXT ISO-8601 except monotonic deltas, which are INTEGER ms. Mixing types is intentional — display vs math.
- We use `INTEGER` (0/1) for booleans because that’s what SQLite gives us.
- Car/track resolution to human names is a Phase 11 concern; the schema already has the columns, they just sit NULL until then.
## Alternatives Considered
### A. Telemetry as raw blob only (no denormalized columns)
- Pros: smallest on-disk footprint; one source of truth; no risk of column drift; simplest insert path.
- Cons: every “show me speed over last 30 s” query has to read + parse N blobs. Live overlay would either need a parallel in-memory mirror (which we have anyway, but it has finite history) or pay the parse cost on every request.
- Why not chosen: the parse cost isn’t catastrophic, but it forces every consumer to depend on the parser crate. Hybrid layout costs us ~30% extra storage in exchange for SQL-native queries on the fields that 90% of callers want.
### B. Telemetry fully denormalized (one column per packet field, no blob)
- Pros: fully queryable; no parser dependency on the read path; trivial joins.
- Cons: ~80–100 columns per row; ~3–4× storage of the blob layout; every new packet field is a destructive migration; replaying historical sessions through an updated parser is impossible because we threw the bytes away.
- Why not chosen: loses our forward-compat escape hatch. Heuristics will iterate fast; we will want to re-derive new fields from old sessions.
### C. DuckDB instead of SQLite
- Pros: columnar, vectorized, fast aggregates over telemetry; great for the post-session report and replay use cases.
- Cons: less mature on Windows desktop; weaker concurrent reader-while-writing story; embedding via a `rusqlite`-style bundled build is less battle-tested; bigger binary; team familiarity with SQLite is higher.
- Why not chosen: boring tech wins. If aggregate query perf becomes a bottleneck, we can ETL session blobs into Parquet for offline analytics without changing the canonical store.
### D. Flat Parquet files per session
- Pros: ideal for replay scans and external tooling (DuckDB, pandas); excellent compression on telemetry.
- Cons: mutable state (recommendations, setups, preferences) doesn’t fit cleanly; concurrent live append + read requires extra machinery; lose ACID; backup story is per-file not per-DB.
- Why not chosen: wrong shape for the transactional parts of the app. Parquet is on the table as an export format (the existing “on-demand JSON export” decision in PLAN.md can grow a Parquet sibling) but not as the primary store.
### E. SQLite with one row per field (EAV)
- Pros: infinitely flexible; no migrations.
- Cons: unusable. ~80× row count, no type safety, no useful indexes.
- Why not chosen: classic anti-pattern.
## Migration & evolution plan
- Migrations are files: `sidecar/migrations/0001_initial.sql`, `0002_<change>.sql`, etc. The numeric prefix is monotonic and gap-free.
- Every change to the schema (including adding/removing indexes) ships a new migration file; we never edit a previously-applied migration.
- Payload-only changes (`recommendations.payload_json`, `car_setups.setup_json`) bump the row’s `schema_version` and are migrated lazily in code on read.
- Pre-1.0: if a migration would be too painful to write, we may ship a one-shot “drop & recreate the dev DB” path behind a feature flag. After 1.0, every change is a real migration.
- Rolling back is not supported; the user keeps the previous binary if they want the previous schema. (Single-user desktop app — acceptable.)
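Lazy payload migration is a pure function applied on read. The v1→v2 transform below — a field rename done by string replacement — is purely illustrative; real code would deserialize with serde and transform typed structs:

```rust
/// Upgrade a recommendation payload to the current JSON shape based on
/// the row's schema_version. The v1 rename here is a made-up example.
fn upgrade_payload(schema_version: i64, payload_json: &str) -> String {
    match schema_version {
        // Hypothetical v1 -> v2 change: `adjustment` became `adjustments`.
        1 => payload_json.replace("\"adjustment\"", "\"adjustments\""),
        // Current version (and anything newer): pass through untouched.
        _ => payload_json.to_string(),
    }
}
```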
## Backup story
- App-shutdown backup: copy `tuning-coach.db`. WAL has been checkpointed at graceful shutdown; no sidecar files needed.
- Live backup: copy `tuning-coach.db`, `tuning-coach.db-wal`, and `tuning-coach.db-shm` together, or use `VACUUM INTO 'backup.db'` from a read connection. Both produce a consistent snapshot.
- `docs/getting-started.md` will document the file path and the shutdown-then-copy flow as the recommended approach.
## Storage estimates
Assumptions: ~331 B per FM 2023 Forza Dash packet (311 B for legacy Dash), ~60 Hz, and ~60 B per row of denormalized columns, row overhead, and index entries (≈13 MB/h on top of the raw packet bytes).
| Workload | Raw bytes | With indexes | Per session (~30 min) | Per heavy week (~10 h) |
|---|---|---|---|---|
| Telemetry only | ~71.5 MB/h | ~85 MB/h | ~43 MB | ~850 MB |
| + recommendations + laps + hotkeys | +<1 MB/h | +<1 MB/h | ~44 MB | ~860 MB |
Order of magnitude: a serious user generates GB-scale data per year. We will introduce a configurable retention policy (open question in PLAN.md) before 1.0, e.g. “keep last N sessions in full; older sessions keep recommendations + laps but drop telemetry.”
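The raw-bytes column in the table above can be sanity-checked with a few constants:

```rust
// Back-of-envelope check of raw telemetry bytes per active hour.
const FM2023_PACKET_BYTES: u64 = 331; // FM 2023 Dash packet
const LEGACY_PACKET_BYTES: u64 = 311; // legacy Dash packet
const HZ: u64 = 60;                   // packet rate
const SECONDS_PER_HOUR: u64 = 3600;

/// Raw blob bytes written per active hour, in MB (decimal).
fn raw_mb_per_hour(packet_bytes: u64) -> f64 {
    (packet_bytes * HZ * SECONDS_PER_HOUR) as f64 / 1_000_000.0
}
```

331 B × 60 Hz × 3600 s ≈ 71.5 MB/h and 311 B ≈ 67.2 MB/h, matching the figures above.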
## Observability
- Migration runner emits structured INFO logs per applied version (`migration.applied version=NNNN sha=…`).
- Telemetry insert path exposes a counter (`telemetry_rows_written_total`) and a gauge for write-batch latency (`telemetry_write_batch_ms`). DB file size is a periodic gauge.
- Errors are typed (`StorageError::{Migrate, Open, Write, Read, Schema}`), mapped to user-facing strings only at the WS / CLI boundary.
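A dependency-free sketch of the error type; the real implementation would likely derive `thiserror::Error` instead of hand-writing `Display`:

```rust
use std::fmt;

/// Typed storage errors. User-facing strings are produced only at the
/// WS / CLI boundary, via Display.
#[derive(Debug)]
enum StorageError {
    Migrate(String),
    Open(String),
    Write(String),
    Read(String),
    Schema(String),
}

impl fmt::Display for StorageError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StorageError::Migrate(m) => write!(f, "migration failed: {m}"),
            StorageError::Open(m) => write!(f, "could not open database: {m}"),
            StorageError::Write(m) => write!(f, "write failed: {m}"),
            StorageError::Read(m) => write!(f, "read failed: {m}"),
            StorageError::Schema(m) => write!(f, "schema mismatch: {m}"),
        }
    }
}

impl std::error::Error for StorageError {}
```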
## References
- Issue #8 — feat(sidecar): SQLite schema
- docs/PLAN.md — Phase 1 sidecar foundation
- .github/agents/telemetry-expert.agent.md — Forza Dash packet schema (232 B Sled / 311 B legacy Dash / 331 B FM 2023 Dash)
- .github/agents/race-engineer.agent.md — recommendation payload shape
- SQLite WAL: https://www.sqlite.org/wal.html
- `rusqlite` bundled feature: https://docs.rs/rusqlite/latest/rusqlite/#bundled