Validity of the Apple Watch

🔬 Wearables in science: How accurate is the Apple Watch really?

A new living systematic review & meta-analysis (82 studies, >430,000 participants) provides one of the most comprehensive evaluations to date.

Here is the bottom line for researchers and physically active individuals:


📊 Key findings

❤️ Heart rate
→ High agreement with criterion methods
→ Small mean bias (≈ −0.27 bpm) but meaningful variability (≈ ±7 bpm)
→ Accuracy decreases with movement complexity and intensity

🫁 Blood oxygen saturation (SpO₂)
→ Low average error
→ BUT wide limits of agreement (≈ ±4%)
→ Reduced validity in hypoxic ranges

🫀 Atrial fibrillation detection
→ High specificity (0.91)
→ Moderate sensitivity (0.79)
→ Frequent inconclusive readings → a non-negligible limitation in practice

🔥 Energy expenditure
→ Poor validity
→ Errors frequently >20%
→ Not suitable for precise quantification

😴 Sleep
→ Good sleep/wake detection
→ Weak differentiation of sleep stages

👣 Steps & activity metrics
→ Moderate accuracy
→ Context-dependent error


🧠 Interpretation for practice and research

⚙️ Metric matters
Direct physiological signals (e.g., HR via PPG) outperform derived metrics (e.g., energy expenditure).

🏃 Context matters
Accuracy declines with motion artefacts, intensity, and environmental factors.

🧬 Individual matters
Individual factors (e.g., perfusion, skin contact) systematically influence measurement error.


📌 Practical implications

✔️ Useful for longitudinal monitoring and trends
✔️ Suitable for population-level research
⚠️ Of limited use for clinical decision-making without further validation
❌ Not appropriate for precise energy expenditure or VO₂max assessment


📉 Take-home message

Wearables are not inherently valid or invalid.
They are metric-specific tools with context-dependent accuracy.


💬 Discussion point

Where should we currently draw the line between
👉 “good enough for practice”
vs
👉 “valid for science or clinical use”?

Full article: https://pubmed.ncbi.nlm.nih.gov/41513748/