openfoodfacts

"Skill for investigating food product data using the Open Food Facts database. Use this skill when the user wants to analyse food products at scale — Nutri-Score distributions, NOVA ultra-processed food rates, additive frequencies, brand comparisons, or European cross-country nutrition patterns. SHOWCASE TRIGGER: when the user says 'showcase [brand]' or 'show me [brand]' generate a full HTML brand profile page. Triggers on: Open Food Facts, OFF, food.parquet, DuckDB food data, Nutri-Score, NOVA groups, ultra-processed foods, food labelling, E-numbers, food additives, European food data, food product databases, food transparency, Dataharvest workshop, food journalism, showcase, brand profile, brand dashboard. Primary method: DuckDB queries on a local food.parquet file (4.5M products, sub-second queries). Secondary: live API for individual product lookups."

linksmith 0 Updated 1mo ago

Install

npx skillscat add linksmith/openfoodfacts-skill

Install via the SkillsCat registry.

SKILL.md

Purpose

Turn an AI agent into a food data investigative partner that analyses the
Open Food Facts database at scale — distribution patterns across millions
of products, brand comparisons, additive inventories, and cross-country
nutrition stories — all using a pre-downloaded local parquet file and
DuckDB for sub-second queries.

Open Food Facts is the world's largest open food product database:
4.5 million products, 150 countries, licensed under the Open Database
License (ODbL). Every output you produce must include attribution.

Parquet file

The full Open Food Facts parquet lives in data/ in the project root
(not inside the skill directory).

File	Contents	Approx. size
`data/food.parquet`	Full dataset (4.5M+ products, 150 countries)	~7.5 GB

Loading data:

from off_parquet import OFFParquet

off = OFFParquet("data/food.parquet")
# ✓ Connected: food.parquet
# ✓ 4,490,000 products · struct nutriments
# ✓ Data: Open Food Facts (openfoodfacts.org), ODbL v1.0

Country filtering is done at query time using DuckDB's list_contains:

# All French products
off.query("SELECT * FROM food WHERE list_contains(countries_tags, 'en:france') LIMIT 10")

# Built-in methods also accept a country filter
df = off.nova_distribution(country="en:france")
df = off.top_additives(country="en:netherlands", category="en:breakfast-cereals")
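When the country or category comes from user input, it can help to build the filter in one place and validate the tag before it reaches SQL. The following is a minimal sketch; `tag_predicate` and `country_query` are hypothetical helpers (not part of off_parquet), and the tag pattern is an assumption about how taxonomy tags look rather than an official grammar.

```python
import re
from typing import Optional

# Assumed tag shape: two-letter language prefix, a colon, a lowercase slug.
TAG_RE = re.compile(r"^[a-z]{2}:[a-z0-9-]+$")

# Array columns listed in the schema reference of this document.
TAG_COLUMNS = {
    "countries_tags", "categories_tags", "labels_tags", "additives_tags",
    "allergens_tags", "brands_tags", "packaging_tags", "ingredients_tags",
}

def tag_predicate(column: str, tag: str) -> str:
    """Return a list_contains(...) predicate for one taxonomy tag."""
    if column not in TAG_COLUMNS:
        raise ValueError(f"unexpected column: {column!r}")
    if not TAG_RE.match(tag):
        raise ValueError(f"not a taxonomy tag: {tag!r}")
    return f"list_contains({column}, '{tag}')"

def country_query(country: str, category: Optional[str] = None, limit: int = 10) -> str:
    """Build a SELECT over the food table, filtered by country and optional category."""
    where = [tag_predicate("countries_tags", country)]
    if category:
        where.append(tag_predicate("categories_tags", category))
    return (
        "SELECT code, product_name, nutriscore_grade FROM food "
        f"WHERE {' AND '.join(where)} LIMIT {int(limit)}"
    )

print(country_query("en:france", "en:sodas"))
```

The resulting string can be passed to off.query(...). Rejecting anything that does not look like a tag keeps quotes and semicolons out of the interpolated SQL.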

Available country-split data files

In addition to the full food.parquet, the dataset is split into smaller
per-country files hosted on Hetzner Object Storage and available locally
on workshop VMs at ~/data/openfoodfacts/.

Use a country-specific file when the workshop focuses on a single country
or a small set of countries — queries run faster and require less RAM.

Always check data density before running analysis. Countries with
fewer than 5,000 products have limited Open Food Facts coverage; results
may not be statistically representative. Flag this in any published output.

File registry

Row counts below are populated after running scripts/split_parquet.py.
Until then, use off.info() on each file to check coverage.
The manifest.json file in the same directory has exact counts and sizes.

File	Region	Label	Rows (approx.)	Notes
`food.parquet`	Global	Full dataset	~4.5M	All countries
`food_eu_all.parquet`	EU combined	EU-27 combined	~3–4M	Deduped union of all EU countries
`food_us.parquet`	Americas	United States	~200K
`food_france.parquet`	EU	France	~1.5–2M	Largest OFF dataset
`food_germany.parquet`	EU	Germany	~200–400K
`food_united-kingdom.parquet`	Non-EU Europe	United Kingdom	~150–300K
`food_spain.parquet`	EU	Spain	~100–200K
`food_italy.parquet`	EU	Italy	~100–200K
`food_netherlands.parquet`	EU	Netherlands	~80–150K
`food_belgium.parquet`	EU	Belgium	~60–120K
`food_switzerland.parquet`	Non-EU Europe	Switzerland	~50–100K
`food_sweden.parquet`	EU	Sweden	~30–80K
`food_austria.parquet`	EU	Austria	~30–60K
`food_poland.parquet`	EU	Poland	~30–60K
`food_denmark.parquet`	EU	Denmark	~20–50K
`food_portugal.parquet`	EU	Portugal	~20–50K
`food_norway.parquet`	Non-EU Europe	Norway	~15–40K
`food_czech-republic.parquet`	EU	Czech Republic	~15–30K
`food_greece.parquet`	EU	Greece	~10–25K
`food_romania.parquet`	EU	Romania	~10–20K
`food_hungary.parquet`	EU	Hungary	~10–20K
`food_ireland.parquet`	EU	Ireland	~8–20K
`food_turkey.parquet`	Non-EU Europe	Turkey	~8–15K
`food_finland.parquet`	EU	Finland	~8–15K
`food_russia.parquet`	Non-EU Europe	Russia	~5–10K
`food_ukraine.parquet`	Non-EU Europe	Ukraine	~5–10K
`food_slovakia.parquet`	EU	Slovakia	~5–10K
`food_croatia.parquet`	EU	Croatia	~5–10K
`food_bulgaria.parquet`	EU	Bulgaria	~5–10K
`food_serbia.parquet`	Non-EU Europe	Serbia	~3–8K	⚠ may be sparse
`food_slovenia.parquet`	EU	Slovenia	~3–8K	⚠ may be sparse
`food_lithuania.parquet`	EU	Lithuania	~3–8K	⚠ may be sparse
`food_latvia.parquet`	EU	Latvia	~2–6K	⚠ sparse
`food_estonia.parquet`	EU	Estonia	~2–5K	⚠ sparse
`food_belarus.parquet`	Non-EU Europe	Belarus	~2–5K	⚠ sparse
`food_iceland.parquet`	Non-EU Europe	Iceland	~1–4K	⚠ sparse
`food_luxembourg.parquet`	EU	Luxembourg	~1–4K	⚠ sparse
`food_cyprus.parquet`	EU	Cyprus	~1–3K	⚠ sparse
`food_malta.parquet`	EU	Malta	~500–2K	⚠ very sparse
`food_albania.parquet`	Non-EU Europe	Albania	<1K	⚠ very sparse
`food_moldova.parquet`	Non-EU Europe	Moldova	<1K	⚠ very sparse
`food_north-macedonia.parquet`	Non-EU Europe	North Macedonia	<1K	⚠ very sparse
`food_bosnia-and-herzegovina.parquet`	Non-EU Europe	Bosnia and Herzegovina	<1K	⚠ very sparse
`food_liechtenstein.parquet`	Non-EU Europe	Liechtenstein	<500	⚠ very sparse
`food_andorra.parquet`	Non-EU Europe	Andorra	<500	⚠ very sparse
`food_monaco.parquet`	Non-EU Europe	Monaco	<500	⚠ very sparse
`food_san-marino.parquet`	Non-EU Europe	San Marino	<100	⚠ nearly empty
`food_montenegro.parquet`	Non-EU Europe	Montenegro	<100	⚠ nearly empty
`food_kosovo.parquet`	Non-EU Europe	Kosovo	<100	⚠ nearly empty
`food_armenia.parquet`	Non-EU Europe	Armenia	<100	⚠ nearly empty
`food_azerbaijan.parquet`	Non-EU Europe	Azerbaijan	<100	⚠ nearly empty
`food_georgia.parquet`	Non-EU Europe	Georgia	<100	⚠ nearly empty
`food_vatican-city.parquet`	Non-EU Europe	Vatican City	~0	⚠ no data

Using country-specific files

from off_parquet import OFFParquet, COUNTRY_FILES, load_manifest

# Load a country file directly
off_fr = OFFParquet("data/food_france.parquet")

# Check available files and their row counts
manifest = load_manifest("data")  # reads data/manifest.json
for slug, info in manifest.items():
    print(f"{slug}: {info['rows']:,} rows  ({info['size_bytes']/1e6:.1f} MB)")

# Warn if data coverage is low (< 5,000 rows)
off_mt = OFFParquet("data/food_malta.parquet")
off_mt.warn_if_thin("malta", "data")
# ⚠ WARNING: 'malta' has only 847 products in Open Food Facts ...

# Look up a file path by country slug
france_file = COUNTRY_FILES["france"]   # → "food_france.parquet"
uk_file     = COUNTRY_FILES["united-kingdom"]  # → "food_united-kingdom.parquet"
eu_file     = COUNTRY_FILES["eu_all"]   # → "food_eu_all.parquet"

Low-data guideline

Before running any country-specific analysis:

1. Check the row count in the file registry above or in manifest.json.
2. If rows < 5,000: tell the user the count and explain that results may not be representative.
3. Suggest food_eu_all.parquet or food.parquet for cross-EU context.
4. Always include the row count as a caveat in the output.
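The guideline can be sketched as a small function that turns a manifest entry into a caveat sentence. This is an illustration only: `coverage_caveat` is a hypothetical name, and the manifest shape (a dict of slug to `rows` and `size_bytes`) is inferred from the loop shown earlier. The France figure in the sample is illustrative, not a measured count.

```python
LOW_DATA_THRESHOLD = 5_000  # threshold stated in the guideline

def coverage_caveat(slug, manifest, threshold=LOW_DATA_THRESHOLD):
    """Return a one-line caveat that always includes the row count."""
    rows = manifest[slug]["rows"]
    if rows < threshold:
        return (
            f"Caveat: '{slug}' has only {rows:,} products in Open Food Facts; "
            "results may not be representative. Consider food_eu_all.parquet "
            "or food.parquet for cross-EU context."
        )
    return f"Coverage: '{slug}' has {rows:,} products in Open Food Facts."

# Sample manifest entries (values are placeholders for illustration)
sample_manifest = {
    "malta": {"rows": 847, "size_bytes": 0},
    "france": {"rows": 1_800_000, "size_bytes": 0},
}
print(coverage_caveat("malta", sample_manifest))
print(coverage_caveat("france", sample_manifest))
```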

S3 public download URLs

Files are publicly readable (no authentication required):

{OFF_S3_ENDPOINT}/{OFF_S3_BUCKET}/food_france.parquet
{OFF_S3_ENDPOINT}/{OFF_S3_BUCKET}/food_eu_all.parquet
{OFF_S3_ENDPOINT}/{OFF_S3_BUCKET}/manifest.json

Example (with default bucket):

https://nbg1.your-objectstorage.com/openfoodfacts-dataharvest-2026/food_france.parquet

The OFF_S3_ENDPOINT and OFF_S3_BUCKET environment variables are set
on workshop VMs via cloud-config and available in ~/.config/dev/env.sh.
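A download URL can be assembled from the two environment variables. The sketch below assumes only the URL layout shown above; `public_url` is a hypothetical helper name.

```python
import os

def public_url(filename, endpoint=None, bucket=None):
    """Build {endpoint}/{bucket}/{filename}, falling back to the environment."""
    endpoint = endpoint or os.environ.get("OFF_S3_ENDPOINT", "")
    bucket = bucket or os.environ.get("OFF_S3_BUCKET", "")
    if not endpoint or not bucket:
        raise RuntimeError("OFF_S3_ENDPOINT and OFF_S3_BUCKET must be set")
    # Strip stray slashes so the parts join with exactly one separator
    return f"{endpoint.rstrip('/')}/{bucket.strip('/')}/{filename}"

print(public_url(
    "food_france.parquet",
    endpoint="https://nbg1.your-objectstorage.com",
    bucket="openfoodfacts-dataharvest-2026",
))
```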

🎯 Showcase trigger — fast-track brand analysis

Trigger pattern: any prompt containing showcase + a food brand name,
or variations like "show me [brand]", "brand profile for [brand]",
"demo [brand]", "create a dashboard for [brand]".
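One way to picture the trigger is a handful of regular expressions that capture whatever follows the trigger phrase. This is a rough sketch under that assumption, not how the skill itself detects triggers; `extract_brand` is a hypothetical name, and a prompt such as "show me the schema" would need extra handling.

```python
import re

# One pattern per trigger phrase listed above; the named group captures the brand.
TRIGGER_PATTERNS = [
    r"\bshowcase\s+(?P<brand>.+)",
    r"\bshow me\s+(?P<brand>.+)",
    r"\bbrand profile for\s+(?P<brand>.+)",
    r"\bdemo\s+(?P<brand>.+)",
    r"\bcreate a dashboard for\s+(?P<brand>.+)",
]

def extract_brand(prompt):
    """Return the brand named after a trigger phrase, or None if no trigger matches."""
    for pattern in TRIGGER_PATTERNS:
        match = re.search(pattern, prompt, flags=re.IGNORECASE)
        if match:
            # Trim surrounding whitespace, quotes, and sentence punctuation
            return match.group("brand").strip(" .!?\"'")
    return None

print(extract_brand("Showcase Danone"))
print(extract_brand("What is a NOVA group?"))
```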

When you detect this pattern, follow these steps exactly and do not
ask clarifying questions unless the brand name is completely ambiguous:

Step-by-step

1. Extract the brand name from the prompt.

2. Choose the parquet file. Use data/food.parquet (the full dataset).
Filter by country at query time using list_contains(countries_tags, 'en:france').

3. Run the showcase generator:

import sys
sys.path.insert(0, "path/to/skill/scripts")  # adjust to actual skill location
from brand_showcase import generate_brand_showcase

path = generate_brand_showcase(
    brand="[brand name from prompt]",
    parquet_path="data/food.parquet",
    output_dir="output",
    open_browser=True,       # opens in browser automatically
)
print(f"Saved to: {path}")

4. Report the key findings to the user in this format:

✅ Brand showcase generated: output/[slug]_showcase.html

Key findings for [Brand]:
• [X] products in the [eu/full] dataset
• [Y]% of products are ultra-processed (NOVA 4)
• Most common Nutri-Score: [grade]  ([ns_coverage]% of products scored)
• Avg. sugar: [X]g/100g vs. [Y]g/100g for [category] average
• [N] products contain additives of concern: [list top 1-2]

⚠️ Caveat: [brief data completeness note]

5. If the brand returns 0 products, try:

A shorter brand name variant (e.g. "Kellogg" instead of "Kellogg's")
The full food.parquet, if you were querying a country-specific subset
Similar brand names, suggested to the user as alternatives
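The fallback for the first item can be sketched as a function that yields progressively looser spellings to retry. `brand_variants` is a hypothetical helper; it only drops a trailing possessive and folds accents, which is one reasonable reading of "shorter variant".

```python
import unicodedata

def brand_variants(brand):
    """Return the brand plus looser spellings to try when a search finds nothing."""
    variants = [brand]
    # Drop a trailing possessive: "Kellogg's" becomes "Kellogg"
    for suffix in ("'s", "\u2019s"):
        if brand.endswith(suffix):
            variants.append(brand[: -len(suffix)])
    # Add accent-free spellings: "Nestlé" becomes "Nestle"
    for variant in list(variants):
        folded = (
            unicodedata.normalize("NFKD", variant)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
        if folded and folded not in variants:
            variants.append(folded)
    return variants

print(brand_variants("Kellogg's"))
print(brand_variants("Nestlé"))
```

Each variant can be passed to the showcase generator in turn until one returns products.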

What the showcase generates

The HTML page includes 8 visualisations and tables:

Section	What it shows
Hero stats	Product count · countries · % ultra-processed · avg sugar
Nutri-Score distribution	Stacked bar A→E with official colours
NOVA distribution	Doughnut chart, NOVA 4 highlighted red
Nutrition vs. category avg	Grouped bar: brand vs. category average (sugars, fat, salt, proteins)
Top categories	Horizontal bar — what kinds of products the brand makes
Additives	Table with E-numbers, flagging controversial ones (⚠️)
Best products	Table of Nutri-Score A/B products
Worst products	Table of Nutri-Score D/E products

Output file: output/{brand_slug}_showcase.html (self-contained, no server needed).

What makes this journalistically interesting

The three most surprising findings that OFF data reliably produces for major brands:

1. NOVA 4 rate: Major food brands often have 50–80% of their range
classified as ultra-processed, even when they market products as "natural".
This is the headline number.
2. Nutri-Score vs. processing level mismatch: A Nutri-Score B product
can still be NOVA 4. Research shows 51% of B-grade products are
ultra-processed. The showcase surfaces this tension.
3. Additive fingerprint: Each brand has a characteristic set of
additives. Comparing the E-numbers across brands in the same category
reveals very different food engineering approaches.
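The Nutri-Score vs. NOVA mismatch boils down to a cross-tabulation: for each grade, what share of products is NOVA 4? The sketch below shows that arithmetic in plain Python on a few made-up rows (the sample data is invented for illustration). In practice the same grouping would run as a DuckDB query over the parquet, and `upf_share_by_grade` is a hypothetical name.

```python
def upf_share_by_grade(products):
    """Percent of NOVA 4 products per Nutri-Score grade, skipping rows missing either score."""
    totals, upf = {}, {}
    for product in products:
        grade = product.get("nutriscore_grade")
        nova = product.get("nova_group")
        if grade is None or nova is None:
            continue  # both scores are needed for the comparison
        grade = grade.lower()
        totals[grade] = totals.get(grade, 0) + 1
        if nova == 4:
            upf[grade] = upf.get(grade, 0) + 1
    return {g: round(100 * upf.get(g, 0) / n, 1) for g, n in sorted(totals.items())}

# Invented rows, for illustration only
rows = [
    {"nutriscore_grade": "b", "nova_group": 4},
    {"nutriscore_grade": "B", "nova_group": 4},
    {"nutriscore_grade": "b", "nova_group": 1},
    {"nutriscore_grade": "a", "nova_group": 1},
    {"nutriscore_grade": None, "nova_group": 4},
]
print(upf_share_by_grade(rows))
```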

Two data access modes

Mode	When to use	Tool
Parquet + DuckDB (primary)	Any population-level analysis, distribution, aggregation, comparison	`scripts/off_parquet.py`
REST API (secondary)	Individual product lookup by barcode, real-time data	`scripts/off_client.py`

Default to parquet mode. The API is rate-limited (10 searches/min),
returns at most a few hundred products per query, and cannot answer
population-level questions. Parquet queries run against all 4.5M products
in under one second.

Quick-reference: files to read

Need	File
Story templates and DuckDB query patterns	`references/story-recipes.md`
Nutri-Score methodology, NOVA classification	`references/nutriscore-nova-reference.md`
API endpoint details (for individual lookups only)	`references/api-guide.md`

Parquet workflow (primary)

Step 1: Locate the parquet file

The workshop VMs have food.parquet pre-downloaded. Look for it in:

The working directory: ./food.parquet
The data/ subdirectory: ./data/food.parquet
The user's home directory: ~/food.parquet

The OFFParquet class auto-detects files in data/. If it raises
FileNotFoundError, run the setup script.
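The search order above can be sketched with pathlib. This is not the OFFParquet auto-detection logic, which is not shown here; `find_parquet` is a hypothetical stand-in that checks the three listed locations in order.

```python
from pathlib import Path

def find_parquet(name="food.parquet", cwd=None, home=None):
    """Return the first existing candidate path, or raise FileNotFoundError."""
    cwd = Path(cwd) if cwd else Path.cwd()
    home = Path(home) if home else Path.home()
    candidates = [cwd / name, cwd / "data" / name, home / name]
    for path in candidates:
        if path.is_file():
            return path
    searched = ", ".join(str(p) for p in candidates)
    raise FileNotFoundError(f"{name} not found; looked in: {searched}")

try:
    print(find_parquet())
except FileNotFoundError as err:
    # At this point the setup script is the next step
    print(err)
```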

One-command setup:

# Install dependencies
pip install duckdb pandas requests tqdm

# Download food.parquet (~7.5 GB, ~10 min on 100 Mbit)
python scripts/download_data.py

Check status:

python scripts/download_data.py --status

VM provisioning script (run during VM setup, before workshop):

pip install duckdb pandas matplotlib plotly requests tqdm
python scripts/download_data.py   # ~10 min download

Step 2: Install dependencies

# Core (always required for parquet mode)
pip install duckdb pandas

# Visualisation (install when the user wants charts)
pip install matplotlib plotly

# For download only
pip install requests tqdm huggingface_hub

Step 3: Import and connect

import sys
sys.path.insert(0, "/path/to/skill/scripts")  # adjust to actual skill location
from off_parquet import OFFParquet

off = OFFParquet("data/food.parquet")
# ✓ Connected: food.parquet
# ✓ 4,490,000 products · struct nutriments
# ✓ Data: Open Food Facts (openfoodfacts.org), ODbL v1.0

Step 4: Explore the schema first

Always start an investigation by inspecting the schema and running info().
The parquet has ~180 columns; most analyses use a small subset.

# Full column list with types
print(off.schema())

# Quick dataset overview
off.info()

# Sample 3 products to understand data shape
print(off.sample(3))

# Country coverage
print(off.country_coverage(top_n=15))

Step 5: Choose your query approach

Investigation	Method
NOVA ultra-processed distribution	`off.nova_distribution(country, category)`
NOVA by category (which foods are most processed?)	`off.nova_by_category(country)`
Nutri-Score distribution	`off.nutriscore_distribution(country, category, brand)`
Cross-country Nutri-Score comparison	`off.nutriscore_by_country(countries, category)`
Nutri-Score coverage gaps	`off.nutriscore_gaps(country)`
Most common additives	`off.top_additives(country, category)`
Products containing a specific additive	`off.products_with_additive("en:e171", country)`
Additive across countries	`off.additive_country_comparison("en:e171")`
Brand comparison	`off.brand_comparison(["Nestlé", "Danone"], country)`
Organic vs conventional	`off.label_nutrition_comparison("en:organic", category)`
Product search by name	`off.search("chocolate", country, category)`
Custom analysis	`off.query("SELECT ... FROM food WHERE ...")`
Export results	`off.export_subset(sql, "results.parquet")`

Step 6: Run the analysis

All methods return pandas DataFrames. For custom analyses, use off.query(sql).

Key DuckDB patterns for food data:

# Filter by country (tags are arrays)
off.query("""
    SELECT product_name, nutriscore_grade
    FROM food
    WHERE list_contains(countries_tags, 'en:france')
    LIMIT 100
""")

# Count products per country (unnest the tag array in a subquery, then aggregate)
off.query("""
    SELECT country, COUNT(*) AS n
    FROM (SELECT UNNEST(countries_tags) AS country FROM food) AS t
    GROUP BY country ORDER BY n DESC LIMIT 20
""")

# Most common additives in French sodas
off.query("""
    SELECT additive, COUNT(*) AS n
    FROM (
        SELECT UNNEST(additives_tags) AS additive
        FROM food
        WHERE list_contains(countries_tags, 'en:france')
          AND list_contains(categories_tags, 'en:sodas')
    ) AS t
    GROUP BY additive ORDER BY n DESC LIMIT 20
""")

# Nutri-Score distribution for Nestlé products
# ('%nestl%' matches both "nestlé" and "nestle")
off.query("""
    SELECT
        LOWER(nutriscore_grade) AS grade,
        COUNT(*) AS n
    FROM food
    WHERE LOWER(brands) LIKE '%nestl%'
    GROUP BY 1 ORDER BY 1
""")

# Products added by year (database growth)
off.query("""
    SELECT
        YEAR(to_timestamp(created_t)) AS year,
        COUNT(*) AS added
    FROM food
    WHERE created_t IS NOT NULL
    GROUP BY 1 ORDER BY 1
""")

Step 7: Visualise findings

import matplotlib.pyplot as plt
import pandas as pd

NUTRISCORE_COLORS = {
    'a': '#038141', 'b': '#85BB2F', 'c': '#FECB02',
    'd': '#EE8100', 'e': '#E63E11',
}
NOVA_COLORS = {1: '#4CAF50', 2: '#8BC34A', 3: '#FF9800', 4: '#F44336'}

# Example: NOVA distribution bar chart
df = off.nova_distribution(country="en:france")

fig, ax = plt.subplots(figsize=(9, 5))
colors = [NOVA_COLORS.get(g, '#999') for g in df["nova_group"]]
bars = ax.bar(df["nova_label"], df["pct"], color=colors)

# Journalism-style title: finding, not description
ax.set_title(
    "One in three French food products is ultra-processed (NOVA 4)",
    fontsize=13, fontweight='bold', loc='left'
)
ax.set_ylabel("% of products with NOVA data")
ax.bar_label(bars, labels=[f"{v:.1f}%" for v in df["pct"]], padding=3)

# Add sample size
n = df["count"].sum()
ax.annotate(f"n = {n:,} products with NOVA classification",
            xy=(0, 1.01), xycoords='axes fraction', fontsize=9, color='#555')

# Attribution (required for ODbL)
ax.annotate(
    OFFParquet.attribution(),
    xy=(0, -0.12), xycoords='axes fraction', fontsize=8, color='grey'
)

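# Create the output directory if it does not exist yet
from pathlib import Path
Path('output').mkdir(parents=True, exist_ok=True)

plt.tight_layout()
plt.savefig('output/nova_france.png', dpi=200, bbox_inches='tight')
plt.show()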

Key schema reference

The parquet has ~180 columns. The most useful for investigations:

Identity

Column	Type	Description
`code`	string	Product barcode (EAN-13/UPC)
`product_name`	string	Product name
`brands`	string	Brand(s), comma-separated
`brands_tags`	list[string]	Brand taxonomy tags
`lang`	string	Product language code

Geography

Column	Type	Description
`countries_tags`	list[string]	Countries where sold (e.g. `["en:france"]`)
`stores`	string	Store names
`purchase_places`	string	Where purchased

Classification

Column	Type	Description
`categories_tags`	list[string]	Category taxonomy (e.g. `["en:breakfast-cereals"]`)
`labels_tags`	list[string]	Labels (e.g. `["en:organic", "en:fair-trade"]`)
`packaging_tags`	list[string]	Packaging types

Scores (key for investigations)

Column	Type	Description
`nutriscore_grade`	string	A–E (null if unavailable)
`nutriscore_score`	integer	Underlying numeric score
`nova_group`	integer	1–4 processing level
`ecoscore_grade`	string	Environmental score A–E

Nutrition (in `nutriments` struct)

Access struct fields: nutriments['energy-kcal_100g']
Or with the helper: off.nutriments(country, category)

Field	Description
`energy-kcal_100g`	Energy in kcal per 100g
`fat_100g`	Total fat
`saturated-fat_100g`	Saturated fat
`carbohydrates_100g`	Total carbohydrates
`sugars_100g`	Sugars
`fiber_100g`	Dietary fibre
`proteins_100g`	Proteins
`salt_100g`	Salt

Ingredients & additives

Column	Type	Description
`ingredients_text`	string	Full ingredient list text
`ingredients_tags`	list[string]	Parsed ingredient tags
`additives_tags`	list[string]	E-number tags (e.g. `["en:e330"]`)
`allergens_tags`	list[string]	Allergen tags

Timestamps

Column	Type	Description
`created_t`	integer	Unix timestamp: when product was ENTERED into OFF
`created_datetime`	string	ISO 8601 creation date
`last_modified_t`	integer	Unix timestamp: last edit
`last_modified_datetime`	string	ISO 8601 last modified

⚠ Important for "change over time" angles: created_t shows when
a product was added to OFF, NOT when it was manufactured. The current
values (nutriscore, ingredients, etc.) reflect TODAY's state of the
product entry, not its state when first added.

For true historical comparison, Open Food Facts does NOT maintain annual
snapshots. The per-product revision history is accessible via the API
(?revisions endpoint) but is impractical at scale. Story angles that
DO work with timestamp data: database growth by year, nutriscore adoption
timeline, countries that ramped up submissions, categories that expanded.
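The "database growth by year" angle only needs the year of each created_t value. The sketch below does that bucketing in plain Python on a few sample timestamps, to make the interpretation explicit: the year is when the entry was added to OFF, nothing more. `products_added_by_year` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime, timezone

def products_added_by_year(created_timestamps):
    """Count entries per UTC year of creation, ignoring missing timestamps."""
    years = Counter()
    for ts in created_timestamps:
        if ts is None:
            continue
        years[datetime.fromtimestamp(ts, tz=timezone.utc).year] += 1
    return dict(sorted(years.items()))

# 1609459200 is 2021-01-01 00:00:00 UTC; one second earlier is still 2020
sample = [1609459199, 1609459200, None, 1609459201]
print(products_added_by_year(sample))  # {2020: 1, 2021: 2}
```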

Key Open Food Facts data concepts

Nutri-Score (a–e)

A letter grade for nutritional quality used across Europe. a (dark
green) is best, e (dark red) is worst. Based on a points system weighing
negative factors (energy, sugars, saturated fat, salt) against positive
factors (fibre, protein, fruits/vegetables/nuts).

Coverage: ~40–50% of products have a Nutri-Score. It requires complete
nutritional information AND a category assignment. A missing score is itself
a finding: always report the percentage of products without one.

NOVA groups (1–4)

Food processing classification:

1 — Unprocessed or minimally processed (fresh fruit, rice, eggs)
2 — Processed culinary ingredients (oil, butter, sugar, flour)
3 — Processed foods (canned vegetables, cheese, bread)
4 — Ultra-processed foods (soft drinks, chips, instant noodles)

NOVA 4 is the journalistically interesting category. Research links UPF
consumption to adverse health outcomes. Coverage varies: ~30–40% of products
have NOVA data.

Eco-Score (a–e)

Environmental impact score based on Life Cycle Analysis. Coverage is
lower than Nutri-Score (~20%). Less useful for workshop investigations
unless specifically requested.

Taxonomy tags

All categories, labels, allergens, additives, and countries come as arrays
of taxonomy tags in the format en:breakfast-cereals. The en: prefix
is the language. Use list_contains(column, 'tag') in DuckDB SQL for
filtering.

Key tag namespaces:

Countries: en:france, en:germany, en:netherlands
Categories: en:breakfast-cereals, en:yogurts, en:sodas
Labels: en:organic, en:fair-trade, en:vegan
Additives: en:e171, en:e621, en:e951
Allergens: en:gluten, en:milk, en:nuts
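The difference between exact tag matching and substring matching is easy to get wrong, so here is a small plain-Python illustration. `has_tag` mirrors the exact-match behaviour described for list_contains, while `has_tag_like` mirrors a substring search; all three function names are hypothetical.

```python
def split_tag(tag):
    """Split 'en:breakfast-cereals' into its language prefix and slug."""
    lang, _, slug = tag.partition(":")
    return lang, slug

def has_tag(tags, tag):
    """Exact match against the whole tag."""
    return tag in (tags or [])

def has_tag_like(tags, fragment):
    """Substring match against any tag in the list."""
    return any(fragment in t for t in (tags or []))

product_tags = ["en:breakfast-cereals", "en:mueslis"]
print(split_tag(product_tags[0]))               # ('en', 'breakfast-cereals')
print(has_tag(product_tags, "en:cereals"))      # False: not an exact tag
print(has_tag_like(product_tags, "cereal"))     # True: substring of a tag
```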

Coverage caveats (always report these)

1. OFF is not a random sample. It is a crowd-sourced database. France
has ~2M products; smaller countries may have far fewer. Engaged
communities (health-conscious consumers, France, Belgium) are
overrepresented. This affects representativeness for any country analysis.
2. Score coverage varies. ~40–50% of products have Nutri-Score; ~30–40%
have NOVA. Always report how many products in your analysis had the
score, and what percentage that is of the total. If 60% are missing, that
itself may be the story.
3. Brand names are messy. Products from the same brand can appear as
"Nestlé", "nestle", "Nestle S.A." etc. Use LOWER(brands) LIKE '%nestl%'
for fuzzy matching (LIKE on its own is case-sensitive in DuckDB). Group by
brand cautiously.
4. Categories are nested. A product can be in both en:cereals AND
en:breakfast-cereals. list_contains checks for an exact tag, so be
specific. To find any cereal: use ILIKE '%cereal%' on categories_tags::TEXT.
5. Additives require ingredient text. NOVA and additive detection
depend on accurate ingredient parsing. Products with incomplete
ingredient text will lack these fields.
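For the score-coverage caveat, the numbers to report can be computed once and reused in every output. A minimal sketch, assuming a missing score is stored as None (or an empty string); `score_coverage` is a hypothetical helper.

```python
def score_coverage(values, label="Nutri-Score"):
    """Summarise how many products carry a score and what share that is."""
    total = len(values)
    scored = sum(1 for v in values if v not in (None, ""))
    pct = round(100 * scored / total, 1) if total else 0.0
    return {
        "total": total,
        "scored": scored,
        "pct_scored": pct,
        "note": f"{label} available for {scored:,} of {total:,} products ({pct}%).",
    }

summary = score_coverage(["a", None, "c", None, None])
print(summary["note"])
```

The same function works for NOVA by passing the nova_group values and label="NOVA".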

Data ethics & attribution

Every analysis, chart, and export must include:

Data: Open Food Facts (openfoodfacts.org), ODbL v1.0

Open Food Facts is a non-profit. If journalists use the data in a
published story, they should credit OFF and optionally link back.

Visualisation

Chart selection

Goal	Chart type
Nutri-Score distribution	Horizontal bar, coloured by grade
NOVA distribution	Stacked or grouped bar (never pie)
Cross-country comparison	Grouped bar, one group per country
Brand comparison	Dot plot or grouped bar
Additive frequency	Horizontal bar, sorted descending
Database growth by year	Line chart or area chart

Journalism chart principles

Title states the finding, not the data. Write "One in three French
products is ultra-processed" — not "NOVA distribution in France".
Show sample size in subtitle: n = 45,203 products with NOVA data.
Always include attribution in every chart footer.
Use colorblind-safe palettes. For Nutri-Score use official colours.
Report missingness. "60% of products lack a Nutri-Score in this
category — the analysis covers the 40% that do."

Nutri-Score official colours

NUTRISCORE_COLORS = {
    'a': '#038141',  # dark green
    'b': '#85BB2F',  # light green
    'c': '#FECB02',  # yellow
    'd': '#EE8100',  # orange
    'e': '#E63E11',  # red
}

NOVA colours

NOVA_COLORS = {
    1: '#4CAF50',  # green — unprocessed
    2: '#8BC34A',  # light green — culinary ingredients
    3: '#FF9800',  # orange — processed
    4: '#F44336',  # red — ultra-processed
}

Output format

Every investigation must produce:

A clear investigation question (the hypothesis being tested)
Complete, runnable Python code — no pseudocode, no placeholders
A finding summary (2–3 sentences on the key insight)
Data caveats (coverage rate, what's missing, representativeness)
ODbL attribution: Data: Open Food Facts (openfoodfacts.org), ODbL v1.0

API mode (secondary — individual lookups only)

Use the REST API only when the user needs real-time data for a specific
product by barcode, or when the parquet file is not available.

import sys
sys.path.insert(0, "/path/to/skill/scripts")  # adjust to actual skill location
from off_client import OFFClient

client = OFFClient(contact_email="workshop@example.com")

# Look up a single product by barcode
product = client.get_product("3017620422003")  # Nutella

# Search (rate-limited: 10 req/min, 6s delay per call)
df = client.search_products("olive oil", country="france", page_size=50)

The API helper handles rate limiting automatically. See references/api-guide.md
for endpoint details and scripts/off_client.py for method documentation.

Do not use the API for population-level analysis. The search endpoint
returns at most a few hundred products per query and is rate-limited to
10 calls/minute. Use the parquet instead.
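The 6-second delay follows directly from the limit: 60 seconds divided by 10 calls. The sketch below illustrates that arithmetic with a small throttle. It is not the implementation inside off_client.py, which is not shown here; `Throttle` is a hypothetical class, and the clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import time

class Throttle:
    """Space calls so that at most `calls_per_minute` happen per minute."""

    def __init__(self, calls_per_minute=10, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / calls_per_minute  # 10 per minute gives 6.0 s
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now

throttle = Throttle(calls_per_minute=10)
print(throttle.min_interval)  # 6.0
```

Calling throttle.wait() before each search request keeps a loop under the limit. Even at full speed that is 600 searches an hour, which is why population-level questions belong in the parquet.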
