"Skill for investigating food product data using the Open Food Facts database. Use this skill when the user wants to analyse food products at scale — Nutri-Score distributions, NOVA ultra-processed food rates, additive frequencies, brand comparisons, or European cross-country nutrition patterns. SHOWCASE TRIGGER: when the user says 'showcase [brand]' or 'show me [brand]' generate a full HTML brand profile page. Triggers on: Open Food Facts, OFF, food.parquet, DuckDB food data, Nutri-Score, NOVA groups, ultra-processed foods, food labelling, E-numbers, food additives, European food data, food product databases, food transparency, Dataharvest workshop, food journalism, showcase, brand profile, brand dashboard. Primary method: DuckDB queries on a local food.parquet file (4.5M products, sub-second queries). Secondary: live API for individual product lookups."
Install
Install via the SkillsCat registry:
npx skillscat add linksmith/openfoodfacts-skill
Purpose
Turn an AI agent into a food data investigative partner that analyses the
Open Food Facts database at scale — distribution patterns across millions
of products, brand comparisons, additive inventories, and cross-country
nutrition stories — all using a pre-downloaded local parquet file and
DuckDB for sub-second queries.
Open Food Facts is the world's largest open food product database:
4.5 million products, 150 countries, licensed under the Open Database
License (ODbL). Every output you produce must include attribution.
Parquet file
The full Open Food Facts parquet lives in data/ in the project root
(not inside the skill directory).
| File | Contents | Approx. size |
|---|---|---|
| data/food.parquet | Full dataset (4.5M+ products, 150 countries) | ~7.5 GB |
Loading data:
from off_parquet import OFFParquet
off = OFFParquet("data/food.parquet")
# ✓ Connected: food.parquet
# ✓ 4,490,000 products · struct nutriments
# ✓ Data: Open Food Facts (openfoodfacts.org), ODbL v1.0
Country filtering is done at query time using DuckDB's list_contains:
# All French products
off.query("SELECT * FROM food WHERE list_contains(countries_tags, 'en:france') LIMIT 10")
# Built-in methods also accept a country filter
df = off.nova_distribution(country="en:france")
df = off.top_additives(country="en:netherlands", category="en:breakfast-cereals")
Available country-split data files
In addition to the full food.parquet, the dataset is split into smaller
per-country files hosted on Hetzner Object Storage and available locally
on workshop VMs at ~/data/openfoodfacts/.
Use a country-specific file when the workshop focuses on a single country
or a small set of countries — queries run faster and require less RAM.
Always check data density before running analysis. Countries with
fewer than 5,000 products have limited Open Food Facts coverage; results
may not be statistically representative. Flag this in any published output.
File registry
Row counts below are populated after running scripts/split_parquet.py.
Until then, use off.info() on each file to check coverage.
The manifest.json file in the same directory has exact counts and sizes.
| File | Region | Label | Rows (approx.) | Notes |
|---|---|---|---|---|
| food.parquet | Global | Full dataset | ~4.5M | All countries |
| food_eu_all.parquet | EU combined | EU-27 combined | ~3–4M | Deduped union of all EU countries |
| food_us.parquet | Americas | United States | ~200K | |
| food_france.parquet | EU | France | ~1.5–2M | Largest OFF dataset |
| food_germany.parquet | EU | Germany | ~200–400K | |
| food_united-kingdom.parquet | Non-EU Europe | United Kingdom | ~150–300K | |
| food_spain.parquet | EU | Spain | ~100–200K | |
| food_italy.parquet | EU | Italy | ~100–200K | |
| food_netherlands.parquet | EU | Netherlands | ~80–150K | |
| food_belgium.parquet | EU | Belgium | ~60–120K | |
| food_switzerland.parquet | Non-EU Europe | Switzerland | ~50–100K | |
| food_sweden.parquet | EU | Sweden | ~30–80K | |
| food_austria.parquet | EU | Austria | ~30–60K | |
| food_poland.parquet | EU | Poland | ~30–60K | |
| food_denmark.parquet | EU | Denmark | ~20–50K | |
| food_portugal.parquet | EU | Portugal | ~20–50K | |
| food_norway.parquet | Non-EU Europe | Norway | ~15–40K | |
| food_czech-republic.parquet | EU | Czech Republic | ~15–30K | |
| food_greece.parquet | EU | Greece | ~10–25K | |
| food_romania.parquet | EU | Romania | ~10–20K | |
| food_hungary.parquet | EU | Hungary | ~10–20K | |
| food_ireland.parquet | EU | Ireland | ~8–20K | |
| food_turkey.parquet | Non-EU Europe | Turkey | ~8–15K | |
| food_finland.parquet | EU | Finland | ~8–15K | |
| food_russia.parquet | Non-EU Europe | Russia | ~5–10K | |
| food_ukraine.parquet | Non-EU Europe | Ukraine | ~5–10K | |
| food_slovakia.parquet | EU | Slovakia | ~5–10K | |
| food_croatia.parquet | EU | Croatia | ~5–10K | |
| food_bulgaria.parquet | EU | Bulgaria | ~5–10K | |
| food_serbia.parquet | Non-EU Europe | Serbia | ~3–8K | ⚠ may be sparse |
| food_slovenia.parquet | EU | Slovenia | ~3–8K | ⚠ may be sparse |
| food_lithuania.parquet | EU | Lithuania | ~3–8K | ⚠ may be sparse |
| food_latvia.parquet | EU | Latvia | ~2–6K | ⚠ sparse |
| food_estonia.parquet | EU | Estonia | ~2–5K | ⚠ sparse |
| food_belarus.parquet | Non-EU Europe | Belarus | ~2–5K | ⚠ sparse |
| food_iceland.parquet | Non-EU Europe | Iceland | ~1–4K | ⚠ sparse |
| food_luxembourg.parquet | EU | Luxembourg | ~1–4K | ⚠ sparse |
| food_cyprus.parquet | EU | Cyprus | ~1–3K | ⚠ sparse |
| food_malta.parquet | EU | Malta | ~500–2K | ⚠ very sparse |
| food_albania.parquet | Non-EU Europe | Albania | <1K | ⚠ very sparse |
| food_moldova.parquet | Non-EU Europe | Moldova | <1K | ⚠ very sparse |
| food_north-macedonia.parquet | Non-EU Europe | North Macedonia | <1K | ⚠ very sparse |
| food_bosnia-and-herzegovina.parquet | Non-EU Europe | Bosnia and Herzegovina | <1K | ⚠ very sparse |
| food_liechtenstein.parquet | Non-EU Europe | Liechtenstein | <500 | ⚠ very sparse |
| food_andorra.parquet | Non-EU Europe | Andorra | <500 | ⚠ very sparse |
| food_monaco.parquet | Non-EU Europe | Monaco | <500 | ⚠ very sparse |
| food_san-marino.parquet | Non-EU Europe | San Marino | <100 | ⚠ nearly empty |
| food_montenegro.parquet | Non-EU Europe | Montenegro | <100 | ⚠ nearly empty |
| food_kosovo.parquet | Non-EU Europe | Kosovo | <100 | ⚠ nearly empty |
| food_armenia.parquet | Non-EU Europe | Armenia | <100 | ⚠ nearly empty |
| food_azerbaijan.parquet | Non-EU Europe | Azerbaijan | <100 | ⚠ nearly empty |
| food_georgia.parquet | Non-EU Europe | Georgia | <100 | ⚠ nearly empty |
| food_vatican-city.parquet | Non-EU Europe | Vatican City | ~0 | ⚠ no data |
Using country-specific files
from off_parquet import OFFParquet, COUNTRY_FILES, load_manifest
# Load a country file directly
off_fr = OFFParquet("data/food_france.parquet")
# Check available files and their row counts
manifest = load_manifest("data") # reads data/manifest.json
for slug, info in manifest.items():
print(f"{slug}: {info['rows']:,} rows ({info['size_bytes']/1e6:.1f} MB)")
# Warn if data coverage is low (< 5,000 rows)
off_mt = OFFParquet("data/food_malta.parquet")
off_mt.warn_if_thin("malta", "data")
# ⚠ WARNING: 'malta' has only 847 products in Open Food Facts ...
# Look up a file path by country slug
france_file = COUNTRY_FILES["france"] # → "food_france.parquet"
uk_file = COUNTRY_FILES["united-kingdom"] # → "food_united-kingdom.parquet"
eu_file = COUNTRY_FILES["eu_all"] # → "food_eu_all.parquet"
Low-data guideline
Before running any country-specific analysis:
- Check the row count in the manifest table above
- If rows < 5,000: tell the user the count and explain that results may not be representative
- Suggest food_eu_all.parquet or food.parquet for cross-EU context
- Always include the row count as a caveat in the output
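The guideline above can be sketched as a small pure-Python check. This is only an illustration: `coverage_caveat` is a hypothetical helper (it is not part of off_parquet), and the France row count in the sample manifest is made up for the demo.

```python
LOW_DATA_THRESHOLD = 5_000  # threshold stated in the low-data guideline


def coverage_caveat(country: str, rows: int, threshold: int = LOW_DATA_THRESHOLD) -> str:
    """Return a caveat line that always states the row count."""
    base = f"{country}: {rows:,} products in Open Food Facts"
    if rows < threshold:
        return (
            f"WARNING: {base} (fewer than {threshold:,}); results may not be "
            "representative. Consider food_eu_all.parquet or food.parquet for context."
        )
    return base + "."


# Hypothetical manifest excerpt, shaped like the entries printed from manifest.json
manifest = {"malta": {"rows": 847}, "france": {"rows": 1_800_000}}
caveats = {slug: coverage_caveat(slug, info["rows"]) for slug, info in manifest.items()}
for line in caveats.values():
    print(line)
```

The returned string can be appended to any finding summary so the row count always travels with the result.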
S3 public download URLs
Files are publicly readable (no authentication required):
{OFF_S3_ENDPOINT}/{OFF_S3_BUCKET}/food_france.parquet
{OFF_S3_ENDPOINT}/{OFF_S3_BUCKET}/food_eu_all.parquet
{OFF_S3_ENDPOINT}/{OFF_S3_BUCKET}/manifest.json
Example (with default bucket):
https://nbg1.your-objectstorage.com/openfoodfacts-dataharvest-2026/food_france.parquet
The OFF_S3_ENDPOINT and OFF_S3_BUCKET environment variables are set
on workshop VMs via cloud-config and available in ~/.config/dev/env.sh.
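A minimal sketch of assembling these URLs from the two environment variables. The helper name `s3_url` is an assumption for illustration; the fallback values are copied from the default-bucket example above and only apply when the variables are not already set.

```python
import os


def s3_url(filename: str) -> str:
    """Build {OFF_S3_ENDPOINT}/{OFF_S3_BUCKET}/{filename} from the environment."""
    endpoint = os.environ["OFF_S3_ENDPOINT"].rstrip("/")
    bucket = os.environ["OFF_S3_BUCKET"].strip("/")
    return f"{endpoint}/{bucket}/{filename}"


# Fallbacks for machines where the workshop environment is not loaded
os.environ.setdefault("OFF_S3_ENDPOINT", "https://nbg1.your-objectstorage.com")
os.environ.setdefault("OFF_S3_BUCKET", "openfoodfacts-dataharvest-2026")

print(s3_url("food_france.parquet"))
```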
🎯 Showcase trigger — fast-track brand analysis
Trigger pattern: any prompt containing showcase + a food brand name,
or variations like "show me [brand]", "brand profile for [brand]",
"demo [brand]", "create a dashboard for [brand]".
When you detect this pattern, follow these steps exactly and do not
ask clarifying questions unless the brand name is completely ambiguous:
Step-by-step
1. Extract the brand name from the prompt.
2. Choose the parquet file. Use data/food.parquet (the full dataset).
Filter by country at query time using list_contains(countries_tags, 'en:france').
3. Run the showcase generator:
import sys
sys.path.insert(0, "path/to/skill/scripts") # adjust to actual skill location
from brand_showcase import generate_brand_showcase
path = generate_brand_showcase(
brand="[brand name from prompt]",
parquet_path="data/food.parquet",
output_dir="output",
open_browser=True, # opens in browser automatically
)
print(f"Saved to: {path}")
4. Report the key findings to the user in this format:
✅ Brand showcase generated: output/[slug]_showcase.html
Key findings for [Brand]:
• [X] products in the [eu/full] dataset
• [Y]% of products are ultra-processed (NOVA 4)
• Most common Nutri-Score: [grade] ([ns_coverage]% of products scored)
• Avg. sugar: [X]g/100g vs. [Y]g/100g for [category] average
• [N] products contain additives of concern: [list top 1-2]
⚠️ Caveat: [brief data completeness note]
5. If the brand returns 0 products, try:
- A shorter brand name variant (e.g. "Kellogg" instead of "Kellogg's")
- The "full" parquet if using a subset
- Suggest similar brand names
What the showcase generates
The HTML page includes 8 visualisations and tables:
| Section | What it shows |
|---|---|
| Hero stats | Product count · countries · % ultra-processed · avg sugar |
| Nutri-Score distribution | Stacked bar A→E with official colours |
| NOVA distribution | Doughnut chart, NOVA 4 highlighted red |
| Nutrition vs. category avg | Grouped bar: brand vs. category average (sugars, fat, salt, proteins) |
| Top categories | Horizontal bar — what kinds of products the brand makes |
| Additives | Table with E-numbers, flagging controversial ones (⚠️) |
| Best products | Table of Nutri-Score A/B products |
| Worst products | Table of Nutri-Score D/E products |
Output file: output/{brand_slug}_showcase.html (self-contained, no server needed).
What makes this journalistically interesting
The three most surprising findings that OFF data reliably produces for major brands:
1. NOVA 4 rate: Major food brands often have 50–80% of their range
classified as ultra-processed, even when they market products as "natural".
This is the headline number.
2. Nutri-Score vs. processing level mismatch: A Nutri-Score B product
can still be NOVA 4. Research shows 51% of B-grade products are
ultra-processed. The showcase surfaces this tension.
3. Additive fingerprint: Each brand has a characteristic set of
additives. Comparing the E-numbers across brands in the same category
reveals very different food engineering approaches.
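To make the Nutri-Score vs. NOVA mismatch concrete, here is a small pure-Python sketch of the calculation on a handful of invented rows. The sample data and the helper name `upf_share_by_grade` are hypothetical; on the real dataset the same ratio would come from a DuckDB aggregation.

```python
# Hypothetical sample rows: (nutriscore_grade, nova_group); None means missing
products = [("b", 4), ("b", 3), ("b", 4), ("a", 1), ("d", 4), ("b", None), ("B", 1)]


def upf_share_by_grade(rows, grade):
    """Return (% of products with this grade that are NOVA 4, n with NOVA data)."""
    nova = [n for g, n in rows if g and g.lower() == grade and n is not None]
    if not nova:
        return None, 0
    return 100 * sum(n == 4 for n in nova) / len(nova), len(nova)


share, n = upf_share_by_grade(products, "b")
print(f"{share:.0f}% of Nutri-Score B products are NOVA 4 (n = {n})")
```

Products without a NOVA group are excluded from the denominator, so the sample size should be reported next to the percentage.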
Two data access modes
| Mode | When to use | Tool |
|---|---|---|
| Parquet + DuckDB (primary) | Any population-level analysis, distribution, aggregation, comparison | scripts/off_parquet.py |
| REST API (secondary) | Individual product lookup by barcode, real-time data | scripts/off_client.py |
Default to parquet mode. The API is rate-limited (10 searches/min),
returns at most a few hundred products per query, and cannot answer
population-level questions. Parquet queries run against all 4.5M products
in under one second.
Quick-reference: files to read
| Need | File |
|---|---|
| Story templates and DuckDB query patterns | references/story-recipes.md |
| Nutri-Score methodology, NOVA classification | references/nutriscore-nova-reference.md |
| API endpoint details (for individual lookups only) | references/api-guide.md |
Parquet workflow (primary)
Step 1: Locate the parquet file
The workshop VMs have food.parquet pre-downloaded. Look for it in:
- The working directory: ./food.parquet or ./data/food.parquet
- The user's home directory: ~/food.parquet
The OFFParquet class auto-detects files in data/. If it raises FileNotFoundError, run the setup script.
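The search order above can be sketched with pathlib. This is an illustrative stand-in, not how OFFParquet performs its own auto-detection; `find_parquet` and `CANDIDATES` are hypothetical names.

```python
from pathlib import Path

# Locations listed above, in search order
CANDIDATES = ["./food.parquet", "./data/food.parquet", "~/food.parquet"]


def find_parquet(candidates=CANDIDATES):
    """Return the first existing candidate as an absolute path, or None."""
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if path.is_file():
            return path.resolve()
    return None


found = find_parquet()
print(found if found else "food.parquet not found; run scripts/download_data.py")
```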
One-command setup:
# Install dependencies
pip install duckdb pandas requests tqdm
# Download food.parquet (~7.5 GB, ~10 min on 100 Mbit)
python scripts/download_data.py
Check status:
python scripts/download_data.py --status
VM provisioning script (run during VM setup, before workshop):
pip install duckdb pandas matplotlib plotly requests tqdm
python scripts/download_data.py # ~10 min download
Step 2: Install dependencies
# Core (always required for parquet mode)
pip install duckdb pandas
# Visualisation (install when the user wants charts)
pip install matplotlib plotly
# For download only
pip install requests tqdm huggingface_hub
Step 3: Import and connect
import sys
sys.path.insert(0, "/path/to/skill/scripts") # adjust to actual skill location
from off_parquet import OFFParquet
off = OFFParquet("data/food.parquet")
# ✓ Connected: food.parquet
# ✓ 4,490,000 products · struct nutriments
# ✓ Data: Open Food Facts (openfoodfacts.org), ODbL v1.0
Step 4: Explore the schema first
Always start an investigation by inspecting the schema and running info().
The parquet has ~180 columns; most analyses use a small subset.
# Full column list with types
print(off.schema())
# Quick dataset overview
off.info()
# Sample 3 products to understand data shape
print(off.sample(3))
# Country coverage
print(off.country_coverage(top_n=15))
Step 5: Choose your query approach
| Investigation | Method |
|---|---|
| NOVA ultra-processed distribution | off.nova_distribution(country, category) |
| NOVA by category (which foods are most processed?) | off.nova_by_category(country) |
| Nutri-Score distribution | off.nutriscore_distribution(country, category, brand) |
| Cross-country Nutri-Score comparison | off.nutriscore_by_country(countries, category) |
| Nutri-Score coverage gaps | off.nutriscore_gaps(country) |
| Most common additives | off.top_additives(country, category) |
| Products containing a specific additive | off.products_with_additive("en:e171", country) |
| Additive across countries | off.additive_country_comparison("en:e171") |
| Brand comparison | off.brand_comparison(["Nestlé", "Danone"], country) |
| Organic vs conventional | off.label_nutrition_comparison("en:organic", category) |
| Product search by name | off.search("chocolate", country, category) |
| Custom analysis | off.query("SELECT ... FROM food WHERE ...") |
| Export results | off.export_subset(sql, "results.parquet") |
Step 6: Run the analysis
All methods return pandas DataFrames. For custom analyses, use off.query(sql).
Key DuckDB patterns for food data:
# Filter by country (tags are arrays)
off.query("""
SELECT product_name, nutriscore_grade
FROM food
WHERE list_contains(countries_tags, 'en:france')
LIMIT 100
""")
# Count products per country
off.query("""
SELECT UNNEST(countries_tags) AS country, COUNT(*) AS n
FROM food
GROUP BY 1 ORDER BY 2 DESC LIMIT 20
""")
# Most common additives in French sodas
off.query("""
SELECT UNNEST(additives_tags) AS additive, COUNT(*) AS n
FROM food
WHERE list_contains(countries_tags, 'en:france')
AND list_contains(categories_tags, 'en:sodas')
GROUP BY 1 ORDER BY 2 DESC LIMIT 20
""")
# Nutri-Score grade distribution for Nestlé products
# ('%nestl%' also matches the unaccented spelling "Nestle")
off.query("""
SELECT
LOWER(nutriscore_grade) AS grade,
COUNT(*) AS n
FROM food
WHERE LOWER(brands) LIKE '%nestl%'
GROUP BY 1 ORDER BY 1
""")
# Products added by year (database growth)
off.query("""
SELECT
YEAR(to_timestamp(created_t)) AS year,
COUNT(*) AS added
FROM food
WHERE created_t IS NOT NULL
GROUP BY 1 ORDER BY 1
""")Step 7: Visualise findings
import matplotlib.pyplot as plt
import pandas as pd
NUTRISCORE_COLORS = {
'a': '#038141', 'b': '#85BB2F', 'c': '#FECB02',
'd': '#EE8100', 'e': '#E63E11',
}
NOVA_COLORS = {1: '#4CAF50', 2: '#8BC34A', 3: '#FF9800', 4: '#F44336'}
# Example: NOVA distribution bar chart
df = off.nova_distribution(country="en:france")
fig, ax = plt.subplots(figsize=(9, 5))
colors = [NOVA_COLORS.get(g, '#999') for g in df["nova_group"]]
bars = ax.bar(df["nova_label"], df["pct"], color=colors)
# Journalism-style title: finding, not description
ax.set_title(
"One in three French food products is ultra-processed (NOVA 4)",
fontsize=13, fontweight='bold', loc='left'
)
ax.set_ylabel("% of products with NOVA data")
ax.bar_label(bars, labels=[f"{v:.1f}%" for v in df["pct"]], padding=3)
# Add sample size
n = df["count"].sum()
ax.annotate(f"n = {n:,} products with NOVA classification",
xy=(0, 1.01), xycoords='axes fraction', fontsize=9, color='#555')
# Attribution (required for ODbL)
ax.annotate(
OFFParquet.attribution(),
xy=(0, -0.12), xycoords='axes fraction', fontsize=8, color='grey'
)
plt.tight_layout()
plt.savefig('output/nova_france.png', dpi=200, bbox_inches='tight')
plt.show()
Key schema reference
The parquet has ~180 columns. The most useful for investigations:
Identity
| Column | Type | Description |
|---|---|---|
| code | string | Product barcode (EAN-13/UPC) |
| product_name | string | Product name |
| brands | string | Brand(s), comma-separated |
| brands_tags | list[string] | Brand taxonomy tags |
| lang | string | Product language code |
Geography
| Column | Type | Description |
|---|---|---|
| countries_tags | list[string] | Countries where sold (e.g. ["en:france"]) |
| stores | string | Store names |
| purchase_places | string | Where purchased |
Classification
| Column | Type | Description |
|---|---|---|
| categories_tags | list[string] | Category taxonomy (e.g. ["en:breakfast-cereals"]) |
| labels_tags | list[string] | Labels (e.g. ["en:organic", "en:fair-trade"]) |
| packaging_tags | list[string] | Packaging types |
Scores (key for investigations)
| Column | Type | Description |
|---|---|---|
| nutriscore_grade | string | A–E (null if unavailable) |
| nutriscore_score | integer | Underlying numeric score |
| nova_group | integer | 1–4 processing level |
| ecoscore_grade | string | Environmental score A–E |
Nutrition (in nutriments struct)
Access struct fields: nutriments['energy-kcal_100g']
Or with the helper: off.nutriments(country, category)
| Field | Description |
|---|---|
| energy-kcal_100g | Energy in kcal per 100g |
| fat_100g | Total fat |
| saturated-fat_100g | Saturated fat |
| carbohydrates_100g | Total carbohydrates |
| sugars_100g | Sugars |
| fiber_100g | Dietary fibre |
| proteins_100g | Proteins |
| salt_100g | Salt |
Ingredients & additives
| Column | Type | Description |
|---|---|---|
| ingredients_text | string | Full ingredient list text |
| ingredients_tags | list[string] | Parsed ingredient tags |
| additives_tags | list[string] | E-number tags (e.g. ["en:e330"]) |
| allergens_tags | list[string] | Allergen tags |
Timestamps
| Column | Type | Description |
|---|---|---|
| created_t | integer | Unix timestamp: when product was ENTERED into OFF |
| created_datetime | string | ISO 8601 creation date |
| last_modified_t | integer | Unix timestamp: last edit |
| last_modified_datetime | string | ISO 8601 last modified |
⚠ Important for "change over time" angles: created_t shows when
a product was added to OFF, NOT when it was manufactured. The current
values (nutriscore, ingredients, etc.) reflect TODAY's state of the
product entry, not its state when first added.
For true historical comparison, Open Food Facts does NOT maintain annual
snapshots. The per-product revision history is accessible via the API
(?revisions endpoint) but is impractical at scale. Story angles that
DO work with timestamp data: database growth by year, nutriscore adoption
timeline, countries that ramped up submissions, categories that expanded.
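As a rough illustration of the "database growth by year" angle, the sketch below counts entries per year from Unix timestamps in plain Python. The sample `created_t` values are invented; timestamps are interpreted as UTC so the year boundary does not depend on the local timezone.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical created_t values (Unix seconds); None stands for a missing timestamp
created_t = [1338508800, 1420070400, 1420156800, 1609459200, None]


def products_added_per_year(timestamps):
    """Count products by the year they were entered into the database."""
    years = (
        datetime.fromtimestamp(t, tz=timezone.utc).year
        for t in timestamps
        if t is not None
    )
    return dict(sorted(Counter(years).items()))


print(products_added_per_year(created_t))
```

The counts describe when entries were created in the database, not when the products were manufactured.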
Key Open Food Facts data concepts
Nutri-Score (a–e)
A letter grade for nutritional quality used across Europe. a (dark
green) is best, e (dark red) is worst. Based on a points system weighing
negative factors (energy, sugars, saturated fat, salt) against positive
factors (fibre, protein, fruits/vegetables/nuts).
Coverage: ~40–50% of products have a Nutri-Score. It requires complete
nutritional information AND a category assignment. Missing scores are a
finding in themselves: always report the percentage.
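One way to report coverage consistently is to compute the scored count, the total, and the percentage together. The helper below is a hypothetical sketch on invented sample data, not a method of OFFParquet.

```python
def score_coverage(grades):
    """Return (scored, total, pct_scored); None or empty string counts as missing."""
    total = len(grades)
    scored = sum(1 for g in grades if g)
    pct = 100 * scored / total if total else 0.0
    return scored, total, pct


# Hypothetical sample: 4 of 10 products carry a grade
grades = ["a", None, "c", None, "e", None, None, "b", None, None]
scored, total, pct = score_coverage(grades)
print(f"{scored:,} of {total:,} products have a Nutri-Score ({pct:.0f}%); "
      f"{100 - pct:.0f}% are missing")
```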
NOVA groups (1–4)
Food processing classification:
- 1 — Unprocessed or minimally processed (fresh fruit, rice, eggs)
- 2 — Processed culinary ingredients (oil, butter, sugar, flour)
- 3 — Processed foods (canned vegetables, cheese, bread)
- 4 — Ultra-processed foods (soft drinks, chips, instant noodles)
NOVA 4 is the journalistically interesting category. Research links
ultra-processed food (UPF) consumption to adverse health outcomes.
Coverage varies: ~30–40% of products have NOVA data.
Eco-Score (a–e)
Environmental impact score based on Life Cycle Analysis. Coverage is
lower than Nutri-Score (~20%). Less useful for workshop investigations
unless specifically requested.
Taxonomy tags
All categories, labels, allergens, additives, and countries come as arrays
of taxonomy tags in the format en:breakfast-cereals. The en: prefix
is the language. Use list_contains(column, 'tag') in DuckDB SQL for
filtering.
Key tag namespaces:
- Countries: en:france, en:germany, en:netherlands
- Categories: en:breakfast-cereals, en:yogurts, en:sodas
- Labels: en:organic, en:fair-trade, en:vegan
- Additives: en:e171, en:e621, en:e951
- Allergens: en:gluten, en:milk, en:nuts
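The tag format and the exact-match behaviour can be illustrated in plain Python. This is an analogy only: Python's `in` on a list stands in for DuckDB's list_contains, and `split_tag` is a hypothetical helper.

```python
def split_tag(tag: str) -> tuple[str, str]:
    """Split 'en:breakfast-cereals' into ('en', 'breakfast-cereals')."""
    lang, _, slug = tag.partition(":")
    return lang, slug


categories_tags = ["en:cereals", "en:breakfast-cereals"]

# Exact match, analogous to list_contains(categories_tags, 'en:breakfast-cereals')
exact = "en:breakfast-cereals" in categories_tags
# A tag that is not in the list does not match, even if it is a substring of one
not_listed = "en:cereal" in categories_tags
# Substring search across all tags, for broad "any cereal" questions
fuzzy = any("cereal" in tag for tag in categories_tags)

print(split_tag(categories_tags[1]), exact, not_listed, fuzzy)
```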
Coverage caveats (always report these)
- OFF is not a random sample. It's a crowd-sourced database. France
has ~2M products; smaller countries may have far fewer. Engaged
communities (health-conscious consumers, France, Belgium) are
overrepresented. This affects representativeness for any country analysis.
- Score coverage varies. ~40–50% of products have Nutri-Score; ~30–40%
have NOVA. Always report how many products in your analysis had the
score, and what percentage that is of the total. If 60% are missing, that
itself may be the story.
- Brand names are messy. Products from the same brand can appear as
"Nestlé", "nestle", "Nestle S.A." etc. Use LIKE '%nestl%' for fuzzy
matching. Group by brand cautiously.
- Categories are nested. A product can be in both en:cereals AND
en:breakfast-cereals. list_contains checks for an exact tag, so be
specific. To find any cereal, use ILIKE '%cereal%' on categories_tags::TEXT.
- Additives require ingredient text. NOVA and additive detection
depend on accurate ingredient parsing. Products with incomplete
ingredient text will lack these fields.
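For the messy brand names caveat, one possible approach is to normalise names before grouping: strip accents, fold case, then match on a prefix. The sketch below uses only the standard library; `brand_key` is a hypothetical helper and the name list is illustrative.

```python
import unicodedata


def brand_key(name: str) -> str:
    """Lowercase, accent-free key for grouping brand spellings."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.casefold().strip()


names = ["Nestlé", "nestle", "Nestle S.A.", "Danone"]
matches = [n for n in names if brand_key(n).startswith("nestl")]
print([brand_key(n) for n in matches])
```

Prefix matching is deliberately loose, so a manual check of the matched names is still advisable before publishing brand totals.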
Data ethics & attribution
Every analysis, chart, and export must include:
Data: Open Food Facts (openfoodfacts.org), ODbL v1.0
Open Food Facts is a non-profit. If journalists use the data in a
published story, they should credit OFF and optionally link back.
Visualisation
Chart selection
| Goal | Chart type |
|---|---|
| Nutri-Score distribution | Horizontal bar, coloured by grade |
| NOVA distribution | Stacked or grouped bar (never pie) |
| Cross-country comparison | Grouped bar, one group per country |
| Brand comparison | Dot plot or grouped bar |
| Additive frequency | Horizontal bar, sorted descending |
| Database growth by year | Line chart or area chart |
Journalism chart principles
- Title states the finding, not the data. Write "One in three French
products is ultra-processed", not "NOVA distribution in France".
- Show sample size in subtitle: n = 45,203 products with NOVA data.
- Always include attribution in every chart footer.
- Use colorblind-safe palettes. For Nutri-Score use official colours.
- Report missingness. "60% of products lack a Nutri-Score in this
category — the analysis covers the 40% that do."
Nutri-Score official colours
NUTRISCORE_COLORS = {
'a': '#038141', # dark green
'b': '#85BB2F', # light green
'c': '#FECB02', # yellow
'd': '#EE8100', # orange
'e': '#E63E11', # red
}
NOVA colours
NOVA_COLORS = {
1: '#4CAF50', # green — unprocessed
2: '#8BC34A', # light green — culinary ingredients
3: '#FF9800', # orange — processed
4: '#F44336', # red — ultra-processed
}
Output format
Every investigation must produce:
- A clear investigation question (the hypothesis being tested)
- Complete, runnable Python code — no pseudocode, no placeholders
- A finding summary (2–3 sentences on the key insight)
- Data caveats (coverage rate, what's missing, representativeness)
- ODbL attribution:
Data: Open Food Facts (openfoodfacts.org), ODbL v1.0
API mode (secondary — individual lookups only)
Use the REST API only when the user needs real-time data for a specific
product by barcode, or when the parquet file is not available.
import sys
sys.path.insert(0, "/path/to/skill/scripts") # adjust to actual skill location
from off_client import OFFClient
client = OFFClient(contact_email="workshop@example.com")
# Look up a single product by barcode
product = client.get_product("3017620422003") # Nutella
# Search (rate-limited: 10 req/min, 6s delay per call)
df = client.search_products("olive oil", country="france", page_size=50)
The API helper handles rate limiting automatically. See references/api-guide.md
for endpoint details and scripts/off_client.py for method documentation.
Do not use the API for population-level analysis. The search endpoint
returns at most a few hundred products per query and is rate-limited to
10 calls/minute. Use the parquet instead.
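To show what a 10 calls/minute limit means in practice (one call every 6 seconds), here is a minimal spacing limiter. It is a hypothetical sketch and not the implementation inside off_client.py; the clock and sleep functions are injectable so the logic can be checked without waiting.

```python
import time


class MinIntervalLimiter:
    """Enforce a minimum gap between consecutive calls."""

    def __init__(self, min_interval=6.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous call was released

    def wait(self):
        """Block until min_interval has passed since the last call; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay:
                self._sleep(delay)
        self._last = now + delay
        return delay


limiter = MinIntervalLimiter(min_interval=0.01)  # tiny interval so the demo runs quickly
first = limiter.wait()   # first call never waits
second = limiter.wait()  # second call waits out the rest of the interval
print(first, second <= 0.01)
```

Calling `wait()` before each search request keeps a loop within the limit; with the default 6-second interval, 100 searches take about ten minutes, which is why population-level questions belong in parquet mode.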