Search for papers on Scopus, then download PDFs by DOI, trying Unpaywall first with an optional scihub-cli fallback. Handles requested quantity and "latest" requests.
Install
npx skillscat add thienrabbit/sci-papers-downloder
Install via the SkillsCat registry.
SKILL.md
sci papers downloder
Overview
Use this pipeline:
- Search Scopus to get metadata (title, DOI, year, source, cited-by).
- Download by DOI via Unpaywall first.
- If needed, fall back to scihub-cli.
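The Unpaywall-first step can be sketched as below. This is a minimal illustration, not the code in scripts/download_open_access.py: the function names are hypothetical, and it assumes the public Unpaywall v2 REST endpoint and its best_oa_location / oa_locations / url_for_pdf fields. A None result is the signal to try the scihub-cli fallback.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Optional

UNPAYWALL_BASE = "https://api.unpaywall.org/v2/"


def unpaywall_url(doi: str, email: str) -> str:
    """Build the Unpaywall lookup URL for one DOI (hypothetical helper)."""
    quoted_doi = urllib.parse.quote(doi, safe="/")
    return f"{UNPAYWALL_BASE}{quoted_doi}?email={urllib.parse.quote(email)}"


def pick_pdf_url(record: dict) -> Optional[str]:
    """Return the first direct PDF link found in an Unpaywall record."""
    locations = [record.get("best_oa_location")]
    locations += list(record.get("oa_locations") or [])
    for loc in locations:
        if loc and loc.get("url_for_pdf"):
            return loc["url_for_pdf"]
    return None  # no open-access PDF: caller may fall back to scihub-cli


def find_open_access_pdf(doi: str) -> Optional[str]:
    """Query Unpaywall over the network using UNPAYWALL_EMAIL from the environment."""
    email = os.environ["UNPAYWALL_EMAIL"]
    with urllib.request.urlopen(unpaywall_url(doi, email), timeout=30) as resp:
        return pick_pdf_url(json.load(resp))
```

Keeping the URL builder and the record parser separate from the network call makes the selection logic easy to check offline.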
One-time setup (persistent env, no per-command params)
Add credentials to ~/.bashrc:
# >>> sci papers downloder env >>>
export ELSEVIER_API_KEY="<your_elsevier_key>"
export UNPAYWALL_EMAIL="<your_unpaywall_email>"
# <<< sci papers downloder env <<<
For login/non-interactive shells (e.g. bash -lc), keep the same block in ~/.profile.
Apply now:
source ~/.bashrc
source ~/.profile
Optional but recommended:
uv tool install git+https://github.com/Oxidane-bot/scihub-cli.git
Intent mapping: quantity + freshness
This section is a deterministic policy that other agents can apply without any additional context.
Quantity mapping (Chinese wording)
- "几篇" (a few papers) / "一些" (some) / "几篇就行" (just a few will do) -> --quantity-mode few (target 5)
- "一批" (a batch) / "批量" (in bulk) -> --quantity-mode batch (target 20)
- "尽可能多" (as many as possible) / "越多越好" (the more the better) -> --quantity-mode max (high caps, bounded runtime)
- explicit number (e.g. "12 篇", 12 papers) -> --target 12 (overrides quantity mode)
- if quantity is not mentioned -> default --quantity-mode batch
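As an illustration, the quantity mapping can be expressed as a small lookup. This is a sketch only: the function name, the regular expression, and the order in which keyword groups are checked are assumptions, not part of the skill's scripts.

```python
import re

# Keyword groups copied from the quantity mapping; the check order is an assumption.
QUANTITY_KEYWORDS = [
    ("max", ("尽可能多", "越多越好")),
    ("batch", ("一批", "批量")),
    ("few", ("几篇", "一些")),  # "几篇就行" is covered by "几篇"
]


def quantity_flags(request: str) -> list:
    """Translate the quantity wording of a request into CLI flags (hypothetical helper)."""
    # An explicit number such as "12 篇" overrides any quantity keyword.
    explicit = re.search(r"(\d+)\s*篇", request)
    if explicit:
        return ["--target", explicit.group(1)]
    for mode, words in QUANTITY_KEYWORDS:
        if any(word in request for word in words):
            return ["--quantity-mode", mode]
    # Quantity not mentioned: default to batch.
    return ["--quantity-mode", "batch"]
```

For example, quantity_flags("下载 12 篇") yields ["--target", "12"], while a request with no quantity wording yields ["--quantity-mode", "batch"].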
Freshness mapping (latest papers)
- "最新" (latest) / "近几年" (recent years) / "最近" (recently) -> add --latest
  - auto adds year filter: last 3 years by default
  - auto switches sort to -coverDate
- "最近 N 年" (last N years) -> --latest --years-back N
- explicit lower year (e.g. "2023年以来", since 2023) -> --from-year 2023
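The freshness mapping can be sketched the same way. The helper name and regular expressions are hypothetical, and passing --latest together with --from-year when both a freshness keyword and an explicit year appear is an assumption about how the flags combine.

```python
import re

# Freshness keywords copied from the mapping above.
LATEST_KEYWORDS = ("最新", "近几年", "最近")


def freshness_flags(request: str) -> list:
    """Translate the freshness wording of a request into CLI flags (hypothetical helper)."""
    flags = []
    since = re.search(r"(\d{4})\s*年以来", request)   # e.g. "2023年以来"
    window = re.search(r"最近\s*(\d+)\s*年", request)  # e.g. "最近 2 年"
    if window or any(word in request for word in LATEST_KEYWORDS):
        flags.append("--latest")
    if since:
        # An explicit lower year takes precedence over a years-back window.
        flags += ["--from-year", since.group(1)]
    elif window:
        flags += ["--years-back", window.group(1)]
    return flags
```

A request with no freshness wording returns an empty list, so no date filter or sort change is added.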
Combination rules
- "最新一批" (latest batch) -> --quantity-mode batch --latest
- "最新一些" (some of the latest) -> --quantity-mode few --latest
- "最新尽可能多" (as many of the latest as possible) -> --quantity-mode max --latest
- explicit number + latest (e.g. "最新 8 篇", latest 8 papers) -> --target 8 --latest
Priority rules (must follow)
- explicit number (--target) > quantity keywords
- explicit year (--from-year) > years-back
- latest keyword implies date-first ranking (-coverDate)
- if latest is requested and no year is given, use 3-year window
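The priority rules can be read as a small resolution step that turns flags into effective settings. The sketch below is an assumption-laden illustration, not the logic of scripts/topic_batch_download.py: the Plan type and function names are hypothetical, the window arithmetic (counting the current year as part of the window) is a guess, and the query builder only relies on Scopus treating PUBYEAR > Y as an exclusive bound.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

MODE_TARGETS = {"few": 5, "batch": 20}  # "max" has no fixed target in the mapping
DEFAULT_YEARS_BACK = 3


@dataclass
class Plan:
    target: Optional[int]     # None means "no fixed target" (max mode)
    from_year: Optional[int]  # None means "no year filter"
    sort: Optional[str]       # None means "leave the script's default sort"


def resolve_plan(quantity_mode: str = "batch", target: Optional[int] = None,
                 latest: bool = False, years_back: Optional[int] = None,
                 from_year: Optional[int] = None,
                 this_year: Optional[int] = None) -> Plan:
    this_year = this_year or date.today().year
    # Rule 1: an explicit --target beats the quantity keyword.
    effective_target = target if target is not None else MODE_TARGETS.get(quantity_mode)
    # Rule 2: an explicit --from-year beats --years-back.
    # Rule 4: --latest without any year falls back to a 3-year window.
    if from_year is None and latest:
        # ASSUMPTION: the window includes the current year.
        from_year = this_year - (years_back or DEFAULT_YEARS_BACK) + 1
    # Rule 3: --latest implies date-first ranking.
    return Plan(effective_target, from_year, "-coverDate" if latest else None)


def scopus_query(keywords: str, from_year: Optional[int] = None) -> str:
    """Build a Scopus query string; PUBYEAR > Y excludes year Y itself."""
    query = f'TITLE-ABS-KEY("{keywords}")'
    if from_year is not None:
        query += f" AND PUBYEAR > {from_year - 1}"
    return query
```

Under these assumptions, a request for the latest 8 papers made in 2025 resolves to a target of 8, a lower year of 2023, and the -coverDate sort.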
Recommended command (end-to-end)
Use scripts/topic_batch_download.py for search + download in one step.
Standard batch
python3 scripts/topic_batch_download.py --keywords "pedestrian simulation" --quantity-mode batch --outdir ./downloads
Latest batch (recommended for "最新", i.e. "latest")
python3 scripts/topic_batch_download.py --keywords "pedestrian simulation" --quantity-mode batch --latest --outdir ./downloads
Latest with explicit window
python3 scripts/topic_batch_download.py --keywords "pedestrian simulation" --quantity-mode batch --latest --years-back 2 --outdir ./downloads
python3 scripts/topic_batch_download.py --keywords "pedestrian simulation" --quantity-mode batch --from-year 2023 --outdir ./downloads
Explicit count
python3 scripts/topic_batch_download.py --keywords "pedestrian simulation" --target 12 --latest --outdir ./downloads
Alternative split workflow
Search only
python3 scripts/search_scopus.py --keywords "pedestrian evacuation simulation" --count 20 --sort=-citedby-count
python3 scripts/search_scopus.py --query 'TITLE-ABS-KEY("pedestrian simulation") AND PUBYEAR > 2022' --count 20 --sort=-coverDate
Download by DOI only
python3 scripts/download_open_access.py --doi "10.2307/2392994" --outdir ./downloads --scihub-fallback auto
python3 scripts/download_open_access.py --doi-file ./dois.txt --outdir ./downloads --scihub-fallback auto
Fallback command resolution
download_open_access.py chooses the fallback command in this order:
- --scihub-cmd
- local scihub-cli in PATH
- uvx --from git+https://github.com/Oxidane-bot/scihub-cli.git scihub-cli
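The resolution order above could look roughly like this in Python. It is a sketch under assumptions: the function name and the injectable which parameter are hypothetical, and the real script may parse or validate the command differently.

```python
import shlex
import shutil
from typing import Callable, List, Optional

UVX_FALLBACK = [
    "uvx", "--from",
    "git+https://github.com/Oxidane-bot/scihub-cli.git",
    "scihub-cli",
]


def resolve_scihub_cmd(
    scihub_cmd: Optional[str] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Pick the fallback command as an argv list (hypothetical helper)."""
    # 1. An explicit --scihub-cmd value always wins.
    if scihub_cmd:
        return shlex.split(scihub_cmd)
    # 2. Otherwise use a scihub-cli executable found on PATH.
    found = which("scihub-cli")
    if found:
        return [found]
    # 3. Last resort: run scihub-cli through uvx without a local install.
    return list(UVX_FALLBACK)
```

Passing which as a parameter is only a testing convenience; by default the lookup uses shutil.which.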
Output contract
Include:
- query + sort + year filter (from_year)
- total hits + scanned entries + candidate DOI count
- attempted DOI count + downloaded count
- per DOI status/method/path/error
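One way to carry these fields is a small report structure serialized to JSON. Only from_year is named in the contract; every other field name, the status values, and the JSON layout below are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class DoiResult:
    doi: str
    status: str                   # hypothetical values: "downloaded" or "failed"
    method: Optional[str] = None  # e.g. "unpaywall" or "scihub-cli"
    path: Optional[str] = None    # where the PDF was saved, if any
    error: Optional[str] = None   # failure reason, if any


@dataclass
class RunReport:
    query: str
    sort: str
    from_year: Optional[int]
    total_hits: int
    scanned_entries: int
    candidate_dois: int
    results: List[DoiResult] = field(default_factory=list)

    def to_json(self) -> str:
        payload = asdict(self)  # nested DoiResult entries become plain dicts
        # Attempted and downloaded counts are derived from the per-DOI results.
        payload["attempted"] = len(self.results)
        payload["downloaded"] = sum(r.status == "downloaded" for r in self.results)
        return json.dumps(payload, ensure_ascii=False, indent=2)
```

Deriving the two counts from the per-DOI list keeps the summary consistent with the detailed entries.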
Resources
- scripts/search_scopus.py: Scopus query + metadata extraction
- scripts/download_open_access.py: Unpaywall + fallback downloader
- scripts/topic_batch_download.py: quantity-aware and latest-aware end-to-end runner