Fast local Ollama model inference via FGP daemon. Use when user needs local LLM chat, text generation, embeddings, or model management. Triggers on "ollama chat", "local llm", "ollama generate", "ollama embed", "pull model", "run llama locally".
Install via the SkillsCat registry:
npx skillscat add fast-gateway-protocol/fgp-skills/ollama-daemon
FGP Ollama Daemon
Fast, persistent gateway to local Ollama models. Run LLMs locally with minimal latency overhead between requests.
Why FGP?
FGP daemons maintain persistent connections and avoid cold-start overhead. Instead of spawning a new API client for each request, the daemon stays warm and ready.
Benefits:
- No cold-start latency
- Connection pooling
- Persistent authentication
Installation
# Via Homebrew (recommended)
brew tap fast-gateway-protocol/fgp
brew install fgp-ollama
# Via npx
npx add-skill fgp-ollama
Prerequisites: Ollama must be installed and running.
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Start Ollama server
ollama serve
Quick Start
# Start the daemon
fgp start ollama
# Pull a model
fgp call ollama.pull --model "llama3.2"
# Chat
fgp call ollama.chat \
--model "llama3.2" \
--messages '[{"role": "user", "content": "Hello!"}]'
# Generate embeddings
fgp call ollama.embed --model "nomic-embed-text" --input "Hello world"
Methods
Chat & Generation
- ollama.chat - Chat completion
  - model (string, required): Model name
  - messages (array, required): Chat messages
  - options (object, optional): Generation options
    - temperature (float): Sampling temperature
    - num_predict (int): Max tokens to generate
    - top_p (float): Nucleus sampling
    - top_k (int): Top-k sampling
  - stream (bool, optional): Enable streaming
- ollama.generate - Raw text generation
  - model (string, required): Model name
  - prompt (string, required): Input prompt
  - system (string, optional): System prompt
  - options (object, optional): Generation options
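The generation options listed for ollama.chat can be assembled in the shell before the call. The following is a minimal sketch, not the daemon's documented interface: build_options and chat_once are hypothetical helpers, and passing --options to ollama.chat is assumed to work the same way it does in the ollama.generate example in this document.

```shell
#!/bin/sh
# Hypothetical helper: print a JSON object with the four documented options.
# $1 temperature (float), $2 num_predict (int), $3 top_p (float), $4 top_k (int)
build_options() {
  printf '{"temperature": %s, "num_predict": %s, "top_p": %s, "top_k": %s}\n' \
    "$1" "$2" "$3" "$4"
}

# Hypothetical helper: send one user message with conservative sampling settings.
# $1 model, $2 message text. This simple sketch does not escape the text, so
# the message must not contain double quotes or backslashes.
# Assumption: ollama.chat accepts --options like ollama.generate does.
chat_once() {
  fgp call ollama.chat \
    --model "$1" \
    --messages "[{\"role\": \"user\", \"content\": \"$2\"}]" \
    --options "$(build_options 0.2 128 0.9 40)"
}

# Example (requires a running daemon; not executed here):
#   chat_once "llama3.2" "Summarize what an embedding is in one sentence."
```

Keeping the options in one helper makes it easy to reuse the same sampling settings across calls.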
Embeddings
- ollama.embed - Generate embeddings
  - model (string, required): Embedding model name
  - input (string|array, required): Text(s) to embed
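Because input accepts either a string or an array, a file with one text per line can be turned into a JSON array and sent in a single call. This is a rough sketch under stated assumptions: json_array is a hypothetical helper, docs.txt is a placeholder file name, and only backslashes and double quotes are escaped (tabs and other control characters are not handled).

```shell
#!/bin/sh
# Hypothetical helper: read lines from stdin and print them as a JSON array of strings.
json_array() {
  first=1
  printf '['
  # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # Escape backslashes first, then double quotes.
    escaped=$(printf '%s' "$line" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    if [ "$first" -eq 1 ]; then first=0; else printf ', '; fi
    printf '"%s"' "$escaped"
  done
  printf ']\n'
}

# Example (requires a running daemon; not executed here):
#   fgp call ollama.embed \
#     --model "nomic-embed-text" \
#     --input "$(json_array < docs.txt)"
```

If jq is available, it is a more robust way to build the array, since it handles every JSON escaping case.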
Model Management
- ollama.list - List available models
  - No parameters required
- ollama.pull - Download a model
  - model (string, required): Model name to pull
  - insecure (bool, optional): Allow insecure connections
- ollama.delete - Remove a model
  - model (string, required): Model name to delete
- ollama.show - Get model details
  - model (string, required): Model name
- ollama.copy - Duplicate a model
  - source (string, required): Source model name
  - destination (string, required): New model name
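The list and pull methods can be combined so that a model is downloaded only when it is missing. The output format of ollama.list is not specified in this document, so the sketch below only assumes that the output contains each model's name as plain text somewhere. has_model and ensure_model are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: read a model listing on stdin and succeed if the given
# name appears anywhere in it. This is a fixed-string substring match, so
# "llama3.2" also matches "llama3.2:1b".
has_model() {
  grep -F -q -- "$1"
}

# Hypothetical helper: pull the model only if the listing does not mention it.
# Assumption: `fgp call ollama.list` prints model names as text.
ensure_model() {
  if fgp call ollama.list | has_model "$1"; then
    echo "already present: $1"
  else
    fgp call ollama.pull --model "$1"
  fi
}

# Example (requires a running daemon; not executed here):
#   ensure_model "deepseek-coder:6.7b"
```

Passing a full tag such as deepseek-coder:6.7b makes the substring match less likely to hit a different variant of the same model.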
Popular Models
Chat Models
- llama3.2 - Llama 3.2 (3B) - Fast, good quality
- llama3.1:70b - Llama 3.1 (70B) - Higher quality, needs far more memory
- mistral - Mistral 7B
- mixtral - Mixtral 8x7B
- phi3 - Microsoft Phi-3
- gemma2 - Google Gemma 2
- qwen2.5 - Alibaba Qwen 2.5
Code Models
- codellama - Code Llama
- deepseek-coder - DeepSeek Coder
- starcoder2 - StarCoder 2
Embedding Models
- nomic-embed-text - Fast, high quality
- mxbai-embed-large - Large embeddings
- all-minilm - Compact, efficient
Vision Models
- llava - LLaVA (vision + language)
- bakllava - BakLLaVA
Configuration
Environment variables:
- OLLAMA_HOST (optional): Ollama server URL (default: http://localhost:11434)
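Before starting the daemon, it can help to confirm that the configured Ollama server is reachable. This sketch resolves the URL using the documented default and queries /api/version, an endpoint of Ollama's own HTTP API. ollama_base_url and check_ollama are hypothetical helpers, and it is an assumption that the daemon reads OLLAMA_HOST from its environment when it starts.

```shell
#!/bin/sh
# Hypothetical helper: print the Ollama base URL, falling back to the
# documented default when OLLAMA_HOST is unset or empty.
ollama_base_url() {
  printf '%s\n' "${OLLAMA_HOST:-http://localhost:11434}"
}

# Hypothetical helper: ask the Ollama server for its version.
# -f makes curl exit non-zero on HTTP errors; -sS hides progress but keeps errors.
check_ollama() {
  curl -fsS "$(ollama_base_url)/api/version"
}

# Example (requires a running Ollama server; not executed here):
#   export OLLAMA_HOST="http://localhost:11434"
#   check_ollama && fgp start ollama
```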
Examples
Multi-turn conversation
fgp call ollama.chat \
--model "llama3.2" \
--messages '[
{"role": "system", "content": "You are a helpful coding assistant."},
{"role": "user", "content": "Write a Python function to reverse a string"},
{"role": "assistant", "content": "def reverse_string(s): return s[::-1]"},
{"role": "user", "content": "Now add type hints"}
]'
Generate with custom parameters
fgp call ollama.generate \
--model "mistral" \
--prompt "Write a haiku about programming:" \
--system "You are a creative poet." \
--options '{
"temperature": 0.9,
"num_predict": 50,
"top_p": 0.95
}'
Batch embeddings
fgp call ollama.embed \
--model "nomic-embed-text" \
--input '["First document", "Second document", "Third document"]'
Pull and use a new model
# Pull the model
fgp call ollama.pull --model "deepseek-coder:6.7b"
# Use it for code generation
fgp call ollama.generate \
--model "deepseek-coder:6.7b" \
--prompt "Write a Rust function to calculate fibonacci numbers"
Check available models
fgp call ollama.list
Get model information
fgp call ollama.show --model "llama3.2"
Use with vision model
fgp call ollama.chat \
--model "llava" \
--messages '[{
"role": "user",
"content": "What is in this image?",
"images": ["/path/to/image.jpg"]
}]'
Performance Tips
- Keep models loaded: Ollama keeps recently used models in memory. Reuse the same model to avoid load times.
- Use smaller models for simple tasks: llama3.2 (3B) is often sufficient and much faster than a 70B model.
- Batch embeddings: Send multiple texts in one call to ollama.embed for better throughput.
- Adjust num_predict: Set a reasonable num_predict limit to avoid unnecessarily long generations.
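Two of these tips, reusing one model and capping num_predict, can be combined in a small loop that sends several prompts to the same model. This is a sketch only: generate_all is a hypothetical helper, prompts.txt is a placeholder file name, and the FGP variable is an illustrative way to swap the command for a dry run. Prompts are interpolated as-is, one per line.

```shell
#!/bin/sh
# Command used for calls; override it (for example FGP=echo) for a dry run.
FGP=${FGP:-fgp}

# Hypothetical helper: send every prompt on stdin to the same model, so the
# model stays loaded between requests, with a cap on generated tokens.
# $1 model, $2 num_predict cap
generate_all() {
  while IFS= read -r prompt || [ -n "$prompt" ]; do
    # Redirect stdin so the called command cannot consume the remaining prompts.
    "$FGP" call ollama.generate \
      --model "$1" \
      --prompt "$prompt" \
      --options "{\"num_predict\": $2}" </dev/null
  done
}

# Example (requires a running daemon; not executed here):
#   generate_all "llama3.2" 64 < prompts.txt
```

Running it first with FGP=echo prints the commands that would be issued, which is a cheap way to check quoting before sending real requests.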