Our Estimation Methodology
When restaurants don't publish nutritional data, we estimate it — and we've measured exactly how accurate those estimates are.
We ran a controlled blind experiment on 60 real dishes from Pret A Manger, Itsu, and Farmer J, each with complete restaurant-reported nutritional data. We asked our AI to estimate those same values, then compared the outputs against the restaurant-reported figures. The result: on calorie error, our single-shot estimation pipeline outperforms every commercial nutrition app in the independent published benchmark we compare against.
Sourced data first. Always.
Health Freak isn't an estimation-only platform. Our data hierarchy is:
- Restaurant-published data — If a restaurant reports nutritional values, we use those. Always the most accurate source.
- AI-estimated data — For restaurants that publish nothing or only partial data, we use LLM-based estimation. Every estimated value is clearly labelled.
- Transparent provenance — Every data point carries a tag showing whether it was sourced or estimated.
We estimate because the alternative — leaving users with no nutritional information at all — is worse. But we're honest about what's sourced and what's estimated.
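The hierarchy above can be sketched as a small data structure. This is an illustrative sketch only, not our production code: the names `Provenance`, `NutrientValue`, and `resolve` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Provenance(Enum):
    SOURCED = "sourced"      # value published by the restaurant
    ESTIMATED = "estimated"  # value produced by LLM-based estimation


@dataclass(frozen=True)
class NutrientValue:
    name: str
    amount: float
    unit: str
    provenance: Provenance   # every data point carries its tag


def resolve(name: str, unit: str,
            published: Optional[float],
            estimated: Optional[float]) -> Optional[NutrientValue]:
    """Prefer the restaurant-published value; fall back to the estimate."""
    if published is not None:
        return NutrientValue(name, published, unit, Provenance.SOURCED)
    if estimated is not None:
        return NutrientValue(name, estimated, unit, Provenance.ESTIMATED)
    return None  # nothing to show for this field
```

With this shape, a dish that publishes calories but not salt would carry a sourced calorie value next to an estimated salt value, each labelled separately.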
How we tested accuracy
We selected 20 dishes from each of Pret A Manger, Itsu, and Farmer J — chosen because they represent different cuisine types and complexity levels, and all had complete nutritional data to test against. Each dish was run through four scenarios, giving the AI progressively more information:
| Scenario | What the AI was given | What it had to estimate |
|---|---|---|
| S1: Image only | Photo + dish name | Everything |
| S2: + Description | S1 + menu description | Everything |
| S3: + Calories | S2 + reported calories | Macros, salt, sugar, sat fat |
| S4: + Macros | S3 + protein, fat, carbs, fibre | Salt, sugar, sat fat only |
All estimates were generated by GPT-4.1 (OpenAI). A second round using Google's Gemini 2.5 Flash is in progress.
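The four scenarios can be written down as a small configuration. This is a sketch under assumptions: the field names are hypothetical, and we assume "Everything" means the eight fields that appear in the scenario table.

```python
# Nutrient fields named in the scenario table (identifiers are illustrative).
ALL_FIELDS = ["calories", "protein", "fat", "carbs", "fibre",
              "salt", "sugar", "sat_fat"]

# Restaurant-reported values handed to the model in each scenario.
# S1 and S2 differ only in non-nutrient context (S2 adds the menu description).
SCENARIO_GIVEN = {
    "S1": [],
    "S2": [],
    "S3": ["calories"],
    "S4": ["calories", "protein", "fat", "carbs", "fibre"],
}


def fields_to_estimate(scenario: str) -> list:
    """Return the fields the model still has to estimate in a scenario."""
    given = set(SCENARIO_GIVEN[scenario])
    return [f for f in ALL_FIELDS if f not in given]


for name in SCENARIO_GIVEN:
    print(name, fields_to_estimate(name))
```

For S4 this leaves only salt, sugar, and saturated fat, matching the last row of the table.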
Key findings
Calorie estimation: 84 kcal average error from a photo alone
For a typical restaurant dish of 300–700 kcal, an 84 kcal error is 12–28% of the dish's calories, which is enough to reliably distinguish a light salad from a calorie-dense bowl.
| Restaurant | MAE (kcal) | MAPE |
|---|---|---|
| Pret A Manger | 68 | 23% |
| Itsu | 63 | 27% |
| Farmer J | 121 | 36% |
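For readers less familiar with the two metrics, here is a minimal sketch of how MAE and MAPE are conventionally computed. The dish values in the example are made up for illustration and are not taken from our benchmark; the function names are our own for this sketch.

```python
def mae(actual, predicted):
    """Mean absolute error, in the same unit as the inputs (here kcal)."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, relative to the reported value."""
    return 100 * sum(abs(p - a) / a for a, p in zip(actual, predicted)) / len(actual)


# Illustrative values only: two dishes, reported vs. estimated kcal.
reported = [400, 600]
estimated = [450, 540]
print(mae(reported, estimated))   # average of 50 and 60 kcal
print(mape(reported, estimated))  # average of 12.5% and 10%

# Consistency check on the table: with 20 dishes per restaurant,
# the overall MAE is the plain mean of the three per-restaurant MAEs.
overall_mae = (68 + 63 + 121) / 3
print(overall_mae)

# The same 84 kcal expressed relative to a 700 kcal and a 300 kcal dish.
print(100 * 84 / 700, 100 * 84 / 300)
```

MAE weights every dish equally in kcal, while MAPE penalises the same kcal miss more heavily on a small dish, which is why the two columns do not rank the restaurants identically.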
Adding the menu description doesn't help
Providing the menu description (S1 → S2) produced no meaningful accuracy improvement. The AI's visual analysis of the image, combined with the dish name and restaurant context, already captures whatever nutritional signal the description provides.
Knowing calories unlocks accurate macro estimation
The single biggest accuracy improvement comes from providing restaurant-reported calories. Many UK chains are legally required to publish these.
| Nutrient | Without calories (S1) | With calories (S3) | Improvement |
|---|---|---|---|
| Protein | 3.8g MAE | 2.5g MAE | 34% better |
| Fat | 6.7g MAE | 4.2g MAE | 37% better |
| Carbs | 10.5g MAE | 8.9g MAE | 15% better |
| Fibre | 1.9g MAE | 1.4g MAE | 26% better |
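The "Improvement" column is the relative reduction in MAE between S1 and S3. A short sketch that reproduces it from the figures in the table (the helper name `improvement_pct` is ours for this example):

```python
def improvement_pct(before: float, after: float) -> float:
    """Relative reduction in error, as a percentage of the starting error."""
    return 100 * (before - after) / before


# MAE in grams, taken from the table: (S1 without calories, S3 with calories).
mae_by_nutrient = {
    "Protein": (3.8, 2.5),
    "Fat": (6.7, 4.2),
    "Carbs": (10.5, 8.9),
    "Fibre": (1.9, 1.4),
}

improvements = {
    nutrient: round(improvement_pct(s1, s3))
    for nutrient, (s1, s3) in mae_by_nutrient.items()
}
print(improvements)
```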
Salt, sugars, and saturated fat: an industry-wide challenge
These three fields remain difficult regardless of how much information the model has. Salt MAPE stays around 50% across all scenarios. We see this as a limitation of estimating from visual and textual cues alone rather than a failure specific to our pipeline. To our knowledge, no commercial nutrition app currently attempts to estimate salt from images.
How we compare to other systems
vs. Commercial nutrition apps
Most comprehensive independent benchmark: Yan et al. (2025), Communications Medicine
| System | Calorie MAE | Notes |
|---|---|---|
| Health Freak | 84 kcal | Image + dish name, GPT-4.1 |
| DietAI24 (research) | 48 kcal | Multi-stage RAG, not commercially available |
| Foodvisor | 168 kcal | Commercial app |
| SnapCalorie | 169 kcal | Uses LiDAR depth sensors |
| ViT baseline | 199 kcal | Trained Vision Transformer |
| Calorie Mama | 277 kcal | Commercial app |
Our calorie MAE is half that of the best-performing commercial app in this benchmark (84 kcal vs. 168 kcal). We achieve this with a general-purpose LLM and a well-designed prompt, without custom-trained models, depth sensors, or proprietary food image datasets.
vs. Direct LLM benchmarks
Benchmark: Fridolfsson et al. (2025), Current Developments in Nutrition
| System | Calorie MAPE |
|---|---|
| Health Freak | 29% |
| ChatGPT-4o | 36% |
| Claude 3.5 Sonnet | 36% |
| Gemini 1.5 Pro | 64–110% |
vs. Human estimation
Validation studies using doubly-labelled water show that untrained humans underreport energy intake by 20–50%. Even trained nutrition professionals miss portion-based calorie estimates by approximately 41%. Our 29% MAPE from a standard photograph, without depth sensing, sits comfortably within this range.
What this means for you
- When we show sourced data, it's direct from the restaurant — as accurate as the restaurant's own measurements.
- When we show estimated data, it's clearly labelled, and our calorie estimates are more accurate than those of any commercial nutrition app in the published benchmark we compared against.
- For less reliable fields (particularly salt, sugars, saturated fat), our scoring algorithms degrade gracefully — weighting sourced data more heavily and applying wider confidence bands.
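To make the last point concrete, here is a purely hypothetical sketch of what "degrading gracefully" could look like. It is not our actual scoring algorithm: the function names, the weighting rule, and the band formula are assumptions made for illustration. The only figure taken from this page is the roughly 50% salt MAPE.

```python
def confidence_band(value: float, provenance: str, relative_error: float):
    """Return (low, high). Sourced values get a zero-width band (assumption)."""
    if provenance == "sourced":
        return (value, value)
    return (value * (1 - relative_error), value * (1 + relative_error))


def field_weight(provenance: str, relative_error: float) -> float:
    """Full weight for sourced data; estimated fields lose weight as error grows.

    The linear rule below is an illustrative assumption, not a published formula.
    """
    if provenance == "sourced":
        return 1.0
    return max(0.0, 1.0 - relative_error)


# Example: an estimated 2.0 g of salt with ~50% MAPE.
print(confidence_band(2.0, "estimated", 0.5))
print(field_weight("estimated", 0.5))
print(field_weight("sourced", 0.5))
```

Under these assumptions, an estimated salt value contributes half as much to a score as a sourced one and is shown with a band as wide as the value itself.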
We continue to expand our benchmark as new AI models emerge. A second model comparison (Google Gemini 2.5 Flash) is currently in progress, and we are investigating multi-stage estimation pipelines with food composition database integration.
References
- Yan, R. et al. (2025). "DietAI24 as a framework for comprehensive nutrition estimation using multimodal large language models." Communications Medicine, 5, 458.
- Fridolfsson, J. et al. (2025). "Performance Evaluation of 3 Large Language Models for Nutritional Content Estimation from Food Images." Current Developments in Nutrition, 9(10), 107556.
- Chotwanvirat, P. et al. (2024). "Advancements in Using AI for Dietary Assessment Based on Food Images: Scoping Review." Journal of Medical Internet Research, 26, e51432.
- Li, X. et al. (2024). "Evaluating the Quality and Comparative Validity of Manual Food Logging and AI-Enabled Food Image Recognition in Apps for Nutrition Care." Nutrients, 16(15), 2573.
- Azimi, I. et al. (2025). "Evaluation of LLMs accuracy and consistency in the registered dietitian exam through prompt engineering and knowledge retrieval." Scientific Reports, 15, 1506.
Download the full white paper
Complete methodology, per-dish results, statistical tables, and full literature references.
Download Estimation Methodology (PDF)