Understand. Patterns from your own data, on your own phone.

Correlations between symptoms, sleep, vitals, weather, cycle, and triggers — computed with classical statistics, on-device, framed descriptively. No predictions, no “optimal ranges,” and no large language model anywhere in Leo.

Math · classical statistics, on-device

AI · none in the app, ever

Output · descriptive, not directive

The math behind the engines

The principle

The math is yours, the answer is yours.

Leo's pattern detection layer doesn't ship a model and run it over you. Every correlation, trend, and cycle-phase cluster is computed on your phone from your own data using non-parametric statistics — Spearman rank correlation, Wilson 95% confidence intervals, Fisher's exact test for small-N contingency tables, Rayleigh for circular clustering, Mann-Kendall + Sen's slope for trends. The result is descriptive: “Migraines happened on 9 of 14 days you logged poor sleep, vs. 3 of 47 other days” — not a black-box score.

The framing matters as much as the math. Pattern cards are written without action language. They describe what your log shows; they don't tell you what to do, suggest a treatment, define an “optimal” range, or predict a future event. Strict sample-size and significance floors (n ≥ 10, p < 0.05) mean a card surfaces only once it clears both thresholds. The next move is yours and your clinician's.

↳Methods · Spearman ρ (correlations) · Wilson 95% CI (proportions) · Fisher's exact (contingency tables) · Rayleigh (cycle clustering) · Mann-Kendall + Sen's slope (trends) · LOESS (smoothing). See /engines for the worked examples.

What it shows

Five kinds of pattern, each in plain language.

Patterns surface in two places: a dedicated My Patterns feed on the home tab (with a 90-day burden heatmap, four pattern types, and per-symptom frequency counts), and inline inside each condition hub — sleep correlations on the sleep hub, trigger lifts on the condition hub, weather correlations on the head-pain hub. The math lives next to the data it's about, but you can also see everything in one place.

Every card uses the same layout: a plain-English headline (“Migraines happened on 9 of 14 days you logged poor sleep”), the underlying counts, a 95% confidence interval, and an expandable “Why this pattern?” row that walks through the math. No verdicts, no recommendations.

Sleep ↔ symptoms

Days following short sleep vs adequate sleep, compared on symptom counts you’ve logged. Reported as a ratio with the sample size visible.

Weather ↔ flares

How pressure drops, humidity swings, and temperature changes line up with the flares you've logged. Weather is tagged at log time, so we don’t guess about the past.

Trigger frequency

Which triggers you’re tagging most often this month. A simple count; useful as a question for your clinician, not a verdict.

Per-trigger lift

How much more often symptoms appear in the 24h after a trigger than on days without it. Small samples are flagged rather than presented as firm results.
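To make the lift concrete, here is a minimal Python sketch of the ratio. It is an illustration only: Leo's own code runs in Swift and isn't reproduced here, and the function name `trigger_lift` and its parameters are hypothetical.

```python
from typing import Optional


def trigger_lift(symptom_with: int, days_with: int,
                 symptom_without: int, days_without: int) -> Optional[float]:
    """Ratio of symptom rates: days with the trigger vs days without it."""
    if days_with == 0 or days_without == 0 or symptom_without == 0:
        # The ratio is undefined here; showing nothing beats showing noise.
        return None
    rate_with = symptom_with / days_with
    rate_without = symptom_without / days_without
    return rate_with / rate_without


# Counts borrowed from the example headline: 9 of 14 days vs 3 of 47 days.
print(trigger_lift(9, 14, 3, 47))
```

A ratio on its own hides how many days it rests on, which is why the counts travel with it on the card.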

Vitals around events

Heart rate / SpO₂ / glucose distributions in the 24h around your bigger logged events. Pulled from HealthKit, never re-measured.

The math

Classical statistics, not a model in a box.

Every number Leo shows can be traced back to a textbook method that someone with a stats minor can reproduce. The /engines page walks through the exact equations; the summary below is what each one does and where we're honest about its limits.

◦ Where the calculation lives

◦ On your phone

All insights run here

Your symptom + vital logs

Spearman, Wilson intervals, Fisher's exact test

Rolling windows · sample-size gates

Renders the insight card

no crossing

◦ Firebase (server)

Storage only · never analyzed

Encrypted at rest (AES-256)

Per-user keys for sensitive fields

Caregiver sync (with your permission)

Cross-device replication of your record

No model trained on your data, ever — by Leo or a vendor.

No LLM anywhere in the user-facing app — for any purpose.

No third-party analytics on your symptom data — period.

Sample size required

A pattern only surfaces after at least 10 logged days (and many cards wait until 30). Below threshold, the card reads 'not enough data yet' instead of showing a noisy result.

Significance floor

Patterns are only surfaced when the underlying test returns p < 0.05. Otherwise the card stays hidden: we'd rather miss a borderline finding than spotlight a false one.
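The two gates above can be sketched as a single check. This is a Python illustration under stated assumptions, not Leo's Swift code: the thresholds come from the text, while the function name `card_state` and the state labels "hidden" and "shown" are hypothetical.

```python
from typing import Optional

ALPHA = 0.05  # significance floor from the text


def card_state(n_days: int, p_value: Optional[float], min_days: int = 10) -> str:
    """Decide what a pattern card displays. min_days may be raised (e.g. 30) per card."""
    if n_days < min_days:
        return "not enough data yet"
    if p_value is None or p_value >= ALPHA:
        # Borderline or untestable results stay out of sight.
        return "hidden"
    return "shown"


print(card_state(9, 0.001))   # too few logged days
print(card_state(28, 0.20))   # enough days, but not significant
print(card_state(28, 0.01))   # clears both gates
```

Checking the sample size first means a small log never reaches the significance test at all.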

Spearman rank correlation

For symptom × factor associations. Non-parametric, so it doesn't assume your data are normally distributed, and tie-corrected for users who pick the same severity over and over.
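As a rough illustration of what "tie-corrected" means, the Python sketch below gives tied values their average rank and then takes the Pearson correlation of the ranks. It is a sketch only, and an assumption about the approach rather than a copy of Leo's Swift code; the names `average_ranks` and `spearman_rho` are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Spearman correlation: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        return None
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return None  # one variable never changes, so there is nothing to correlate
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: the same severity (2) logged twice.
print(spearman_rho([1, 2, 2, 3], [1, 2, 3, 4]))
```

Returning nothing when one variable is constant covers the case of a user who logs the same severity every single day.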

Wilson 95% confidence interval

Every reported percentage carries a Wilson score interval, which stays accurate even at 0/n or n/n, where the normal approximation breaks. So '60% of poor-sleep nights had a migraine' shows as '60% (95% CI 36–80%, n=15)'.
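For anyone who wants to reproduce an interval by hand, here is a minimal Python sketch of the standard Wilson score formula. It is illustrative only (Leo's own code is Swift and isn't shown here), and the function name `wilson_interval` is hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def wilson_interval(successes: int, n: int,
                    confidence: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for a proportion successes / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp so rounding error never pushes a bound outside [0, 1].
    return max(0.0, center - half), min(1.0, center + half)


# Counts borrowed from the example headline: 9 of 14 poor-sleep days.
low, high = wilson_interval(9, 14)
print(f"{9 / 14:.0%} (95% CI {low:.0%} to {high:.0%}, n=14)")
```

For 9 of 14 the interval runs from roughly 39% to 84%. The width is the point: with only 14 days, the honest range is wide.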

Fisher's exact test

Two-sided p-value on the 2×2 contingency table (symptom × factor). Exact at small N where chi-square is unreliable; matches R's fisher.test.
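A minimal Python sketch of a two-sided Fisher's exact test follows. It sums the probability of every table that is no more likely than the observed one, which is a common two-sided convention; whether Leo's Swift code uses exactly this rule is an assumption, and the name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every possible table that is no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Rows: poor-sleep days vs other days. Columns: migraine vs no migraine.
# Counts borrowed from the example headline (9 of 14 vs 3 of 47).
print(fisher_exact_two_sided(9, 5, 3, 44))
```

Because every probability is computed from exact integer binomial coefficients, nothing here depends on a large-sample approximation.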

Rayleigh test

Cycle-phase clustering. Asks 'do these symptom days actually cluster around a phase of the cycle, or are they spread evenly?' Reports the center of the cluster (the mean phase) and the mean resultant length, a 0-to-1 measure of how tight the cluster is.
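The idea can be sketched in Python by placing each symptom day on a circle. This is an illustration under assumptions, not Leo's Swift code: cycle days are counted from 0 at cycle start, the p-value uses a widely used closed-form approximation, and the name `rayleigh_test` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def rayleigh_test(cycle_days: Sequence[float],
                  cycle_length: float) -> Tuple[float, float, float]:
    """Return (mean_day, resultant_length, p_value) for days on a cycle."""
    n = len(cycle_days)
    if n == 0:
        raise ValueError("need at least one day")
    # Map each cycle day to an angle on the circle.
    angles = [2 * math.pi * (d % cycle_length) / cycle_length for d in cycle_days]
    c = sum(math.cos(a) for a in angles) / n
    s = sum(math.sin(a) for a in angles) / n
    r = math.hypot(c, s)  # 0 = spread evenly, 1 = every symptom on the same day
    mean_angle = math.atan2(s, c) % (2 * math.pi)
    mean_day = mean_angle / (2 * math.pi) * cycle_length
    big_r = n * r
    # Closed-form approximation to the Rayleigh p-value (assumed for this sketch).
    p = math.exp(math.sqrt(1 + 4 * n + 4 * (n * n - big_r * big_r)) - (1 + 2 * n))
    return mean_day, r, min(1.0, p)


# Hypothetical log: symptom days bunched around day 14 of a 28-day cycle.
print(rayleigh_test([12, 13, 14, 14, 15, 16, 13, 15], 28))
```

Working with angles rather than raw day numbers means a cluster that straddles the end of one cycle and the start of the next is still recognized as one cluster.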

Mann-Kendall + Sen's slope

Monotonic trend test that doesn't assume linearity. Sen's slope (median of pairwise slopes) gives a robust magnitude estimate even with outliers.
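Both pieces fit in one short Python sketch. It assumes equally spaced observations (for example one value per day), uses the usual tie-corrected variance with a normal approximation, and is not Leo's Swift code; the name `mann_kendall_sen` is hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist, median
from typing import Sequence, Tuple


def mann_kendall_sen(values: Sequence[float]) -> Tuple[int, float, float]:
    """Return (S, two_sided_p, sens_slope) for equally spaced observations."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    s = 0
    slopes = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)   # +1 for a rise, -1 for a fall
            slopes.append(diff / (j - i))  # pairwise slope per time step
    # Variance of S, reduced for groups of tied values.
    ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values())
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18
    if var_s == 0:
        return s, 1.0, median(slopes)  # every value identical: no trend to test
    z = (s - (s > 0) + (s < 0)) / math.sqrt(var_s)  # continuity correction
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, p, median(slopes)


# Hypothetical weekly symptom counts with one outlier week.
print(mann_kendall_sen([2, 3, 3, 4, 12, 5, 6, 6, 7, 8]))
```

Taking the median of the pairwise slopes is what keeps a single outlier week from dragging the reported trend.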

LOESS smoothing

Local polynomial smoothing for the trend overlay on detail charts. Span is tuned per chart so the smooth follows real shape without imposing a straight line.
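For a sense of what the smoother does, here is a small Python sketch of local linear regression with tricube weights, one common form of LOESS. It is an assumption about the general technique rather than Leo's Swift implementation; the name `loess` and the default `span` are hypothetical.

```python
import math
from typing import List, Sequence


def loess(x: Sequence[float], y: Sequence[float], span: float = 0.5) -> List[float]:
    """Local linear fit with tricube weights, evaluated at each x."""
    n = len(x)
    if n < 2:
        return list(y)
    k = max(2, min(n, math.ceil(span * n)))  # points in each local window
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        # Bandwidth = distance to the k-th nearest point (1.0 if all coincide).
        h = dists[k - 1] or 1.0
        # Tricube weights: near points count most, points at or past h count zero.
        w = [max(0.0, 1 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0  # fall back to a weighted mean
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Hypothetical daily severity scores.
days = list(range(10))
severity = [3, 4, 2, 5, 6, 5, 7, 6, 8, 7]
print([round(v, 2) for v in loess(days, severity, span=0.6)])
```

A larger `span` pulls in more neighbors and gives a smoother curve; a smaller one follows local wiggles more closely.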

On-device computation

All of the above runs in Swift on your phone. Nothing about your symptom log is sent to a server to compute a pattern.

↳Privacy · because the math runs on-device, no symptom log has to leave your phone for an insight to render. Firestore stores your data encrypted at rest; the analysis doesn't require server-side access to it.

Reading an insight card

Each card shows its work.

Every insight in Leo lays out the headline, the sample size, the direction, the window, and a link to the raw data. We don't show numbers without the context that makes them readable.

◦ Insight · sleep ↔ migraine

Past tense

Migraines were 2.4× more common on days following a sleep window under 6 hours, over the last 30 days.

Sample

n = 28 days

Direction

↑ up

Window

Last 30 days

Migraine rate per day

After short sleep

0.62

After ≥ 7h sleep

0.26

From your sleep + migraine logs · See raw data

◦ Anatomy of an insight

Past tense headline

Describes what happened. Never 'will' or 'should'.

Sample size, always shown

n = 28 days. If it's small, the card flags it. No theatrical precision.

Direction, not score

Up / down / no change. Not 'good' or 'bad' — your body isn't being graded.

Explicit window

Last 30 days. The window can't silently move when convenient.

Tap-through to raw data

Every claim is one tap away from the underlying logs that built it.

Headline

A descriptive sentence — 'Migraines were 2.4× more common on days following poor sleep this month.' Past tense, no recommendation.

Sample size

How many data points the number is built from. 'n = 47 days' is always visible — small samples are flagged.

Direction

Up / down / no change. Not 'good / bad' — Leo doesn’t score your body.

Time window

Every card states its window (last 14 days, last 30 days, year-to-date). Insights don’t silently move the goalposts.

Where it’s from

Tap-through to the underlying logs / vitals / events so you can see the raw data.

What it isn't

The things our insights layer refuses to do.

Health insights are where most apps cross the line from “tracker” into “medical device,” sometimes accidentally. The list below is the line we're holding: descriptive, bounded, and patient-owned.

◦ Not

Forecasts / predictions

Leo doesn’t predict tomorrow’s migraine, next week’s flare, or your future glucose. Forward-looking numbers were removed during our engine-liability work to keep the surface descriptive.

◦ Not

'Optimal' or 'recommended' ranges

No card frames a number as your personal optimum, target, or recommendation. That framing is reserved for clinicians.

◦ Not

Action prompts ('drink more water', 'go to bed earlier')

Descriptive insights only. No directive language. Generic wellness suggestions belong in /resources or a clinician's mouth — not in your hub.

◦ Not

Any LLM or generative AI inside Leo

Leo doesn't use a large language model anywhere — not for insights, not for chat, not for summaries, not for anything in the user-facing app. Every number on a card came out of classical statistics on your phone.

◦ Not

Cross-user benchmarking

Your numbers are never compared against an aggregate of other Leo users to imply you're 'above average' or 'concerning' relative to a population.

◦ Not

Severity interpretation

Insights don’t say a number is 'normal', 'high', 'concerning', or 'borderline'. They describe what your log shows and let you and your clinician interpret.

◦ Not

Provider-facing 'clinical' framing

The PDF export is your work — a summary of your data. It isn't a clinical assessment, and Leo doesn't sign off as a provider.

↳Leo does not diagnose, does not predict, does not score, does not benchmark against other users, and is not a replacement for a clinician. The insights layer describes — that's the whole job.

Understand

The patterns from your own data — with their work showing.

Classical statistics on-device, descriptive language by policy, and a full breakdown of the math on the /engines page. Bring the result to your clinician.

