This guide covers two different ways to interact with Ollama models: the native HTTP API and the OpenAI-compatible API.
## Install Ollama

```bash
# Download from https://ollama.ai
# Or use package manager (macOS):
brew install ollama
```

### Pull a model

```bash
ollama pull llama3.2
```

### Verify Ollama is running

```bash
ollama list  # Should show llama3.2
```

## Method 1: `requests.post()`

**Best for**: Understanding HTTP APIs, Ollama-specific features, raw control
```bash
pip install requests
```

### Simple completion with `/api/generate`

```python
import requests

# Using /api/generate for simple completions
response = requests.post(
    'http://localhost:11434/api/generate',
    json={
        'model': 'llama3.2',
        'prompt': 'The capital of France is',
        'stream': False
    }
)

result = response.json()
print(result['response'])
```

Response format:

```python
{
    'model': 'llama3.2',
    'created_at': '2024-01-30T12:00:00.123456Z',
    'response': 'Paris. Paris is the capital...',  # ← Direct string
    'done': True,
    'context': [128006, 882, 128007, ...],  # Token IDs for context
    'total_duration': 1234567890,
    'load_duration': 123456789,
    'prompt_eval_count': 10,
    'prompt_eval_duration': 234567890,
    'eval_count': 25,
    'eval_duration': 345678901
}
```
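The examples here set `'stream': False` for simplicity. With `'stream': True` (the native API's default), `/api/generate` responds with newline-delimited JSON, one chunk per line. A sketch of assembling those chunks — the sample lines below are illustrative, not captured server output:

```python
import json

# Illustrative examples of the NDJSON lines a streaming response produces
sample_stream = [
    '{"model": "llama3.2", "response": "Par", "done": false}',
    '{"model": "llama3.2", "response": "is.", "done": false}',
    '{"model": "llama3.2", "response": "", "done": true, "eval_count": 2}',
]

def collect_stream(lines):
    """Accumulate streamed chunks into the full response text."""
    pieces = []
    for line in lines:
        chunk = json.loads(line)
        pieces.append(chunk.get('response', ''))
        if chunk.get('done'):
            break
    return ''.join(pieces)

print(collect_stream(sample_stream))  # "Paris."

# Against a live server, the same loop runs over:
#   resp = requests.post(url, json={..., 'stream': True}, stream=True)
#   collect_stream(resp.iter_lines())
```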
Access the reply with `result['response']`.

### Chat with `/api/chat`
```python
import requests

# Using the native /api/chat endpoint
response = requests.post(
    'http://localhost:11434/api/chat',
    json={
        'model': 'llama3.2',
        'messages': [
            {'role': 'system', 'content': 'You are a helpful assistant.'},
            {'role': 'user', 'content': 'What is 2+2?'}
        ],
        'stream': False
    }
)

result = response.json()
print(result['message']['content'])
```

Response format:

```python
{
    'model': 'llama3.2',
    'created_at': '2024-01-30T12:00:00.123456Z',
    'message': {                     # ← Message object with role
        'role': 'assistant',
        'content': '2+2 equals 4.'
    },
    'done': True,
    'total_duration': 1234567890,    # Nanoseconds
    'load_duration': 123456789,
    'prompt_eval_count': 10,         # Input tokens
    'prompt_eval_duration': 234567890,
    'eval_count': 5,                 # Output tokens
    'eval_duration': 345678901
}
```
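Because the native API reports durations in nanoseconds alongside token counts, generation speed falls out directly; a small helper using the sample numbers above:

```python
def tokens_per_second(result):
    """Compute output tokens/sec from a native /api/generate or /api/chat
    response. Durations are reported in nanoseconds."""
    return result['eval_count'] / (result['eval_duration'] / 1e9)

# Sample values from the response format above
sample = {'eval_count': 5, 'eval_duration': 345678901}
print(f"{tokens_per_second(sample):.1f} tokens/sec")  # ≈ 14.5 tokens/sec
```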
Access the reply with `result['message']['content']`.

## Method 2: OpenAI SDK

**Best for**: Portability, industry standard, switching between providers
```bash
pip install openai
```

### Basic chat

```python
from openai import OpenAI

# Create client pointing to Ollama
client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama'  # Required by SDK, but not used by Ollama
)

response = client.chat.completions.create(
    model='llama3.2',
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'What is 2+2?'}
    ]
)

print(response.choices[0].message.content)
```

### Generation parameters
```python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama'
)

response = client.chat.completions.create(
    model='llama3.2',
    messages=[
        {'role': 'user', 'content': 'Tell me a creative story'}
    ],
    temperature=0.9,  # Standard OpenAI parameter names
    max_tokens=200,   # Note: max_tokens, not num_predict
    top_p=0.9
)

print(response.choices[0].message.content)
```
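The native API accepts the same knobs but nests them under an `options` object with Ollama's own names (`num_predict` rather than `max_tokens`). A sketch of the equivalent native request payload:

```python
# Same generation settings, expressed for the native /api/chat endpoint
native_payload = {
    'model': 'llama3.2',
    'messages': [{'role': 'user', 'content': 'Tell me a creative story'}],
    'stream': False,
    'options': {
        'temperature': 0.9,
        'num_predict': 200,  # counterpart of max_tokens
        'top_p': 0.9
    }
}

# Sent with: requests.post('http://localhost:11434/api/chat', json=native_payload)
print(native_payload['options']['num_predict'])
```

### Tool calling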
```python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama'
)

# Define available tools
tools = [
    {
        'type': 'function',
        'function': {
            'name': 'get_weather',
            'description': 'Get the current weather for a location',
            'parameters': {
                'type': 'object',
                'properties': {
                    'location': {
                        'type': 'string',
                        'description': 'The city and state, e.g. San Francisco, CA'
                    }
                },
                'required': ['location']
            }
        }
    }
]

response = client.chat.completions.create(
    model='llama3.2',
    messages=[
        {'role': 'user', 'content': 'What is the weather in Paris?'}
    ],
    tools=tools
)

message = response.choices[0].message

# Check if model wants to call a tool
if message.tool_calls:
    tool_call = message.tool_calls[0]
    print(f"Model wants to call: {tool_call.function.name}")
    print(f"With arguments: {tool_call.function.arguments}")
else:
    print(message.content)
```

LangChain provides a more convenient way to work with tools using `bind_tools()`:
```python
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

# Define tools using the @tool decorator
@tool
def get_weather(location: str) -> str:
    """Get the current weather for a location."""
    return f"The weather in {location} is sunny and 72°F"

@tool
def calculate(expression: str) -> str:
    """Calculate a mathematical expression."""
    try:
        result = eval(expression)  # Fine for a demo; avoid eval on untrusted input
        return str(result)
    except Exception:
        return "Error in calculation"

# Create LLM pointing to Ollama
llm = ChatOpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama',
    model='llama3.1',  # Use a model with good tool support
    temperature=0
)

# Bind tools to the model
llm_with_tools = llm.bind_tools([get_weather, calculate])

# Invoke
response = llm_with_tools.invoke("What's the weather in Paris?")

# Check for tool calls
if response.tool_calls:
    print("Tool calls detected:")
    for tool_call in response.tool_calls:
        print(f"  Tool: {tool_call['name']}")
        print(f"  Args: {tool_call['args']}")
        # Execute the tool
        if tool_call['name'] == 'get_weather':
            result = get_weather.invoke(tool_call['args'])
            print(f"  Result: {result}")
        elif tool_call['name'] == 'calculate':
            result = calculate.invoke(tool_call['args'])
            print(f"  Result: {result}")
else:
    print("Direct response:", response.content)
```

**Note on model compatibility**: Not all Ollama models support tool calling well. Models with good tool support include:
- ✅ llama3.1 (8B, 70B, 405B)
- ✅ mistral (7B v0.3+)
- ✅ mixtral (8x7B, 8x22B)
- ✅ qwen2.5 (various sizes)
Smaller or older models may not follow tool-calling instructions reliably. Use `temperature=0` for more consistent tool-calling behavior.
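A complete tool-calling loop does not stop at detecting the call: the tool's output goes back to the model as a `'tool'` role message carrying the call's id, and a second `chat.completions.create()` call produces the natural-language answer. A minimal sketch of that follow-up history — the assistant turn below is an illustrative stand-in for a real response object:

```python
def tool_result_message(tool_call_id, content):
    """Build the 'tool' role message that returns a tool's output to the model."""
    return {'role': 'tool', 'tool_call_id': tool_call_id, 'content': content}

# Illustrative values standing in for a real first response
history = [{'role': 'user', 'content': 'What is the weather in Paris?'}]
assistant_turn = {
    'role': 'assistant',
    'content': None,
    'tool_calls': [{
        'id': 'call_0',
        'type': 'function',
        'function': {'name': 'get_weather', 'arguments': '{"location": "Paris"}'}
    }]
}

# After executing get_weather ourselves, append both turns:
follow_up = history + [
    assistant_turn,
    tool_result_message('call_0', 'Sunny, 72°F')
]

# follow_up is then passed as messages= to a second
# client.chat.completions.create(...) call so the model can
# phrase the final answer.
print(follow_up[-1])
```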
### Response format

```python
# OpenAI-standard format
{
    'choices': [
        {
            'message': {
                'role': 'assistant',
                'content': '2+2 equals 4.'
            },
            'finish_reason': 'stop',
            'index': 0
        }
    ],
    'model': 'llama3.2',
    'usage': {
        'prompt_tokens': 10,
        'completion_tokens': 5,
        'total_tokens': 15
    },
    'created': 1234567890
}
```

### Switching providers

```python
from openai import OpenAI

def chat_with_model(base_url, api_key, model, prompt):
    """Same code works with Ollama, OpenAI, Together.ai, etc."""
    client = OpenAI(base_url=base_url, api_key=api_key)
    response = client.chat.completions.create(
        model=model,
        messages=[{'role': 'user', 'content': prompt}]
    )
    return response.choices[0].message.content

# Use with Ollama (local, free)
result = chat_with_model(
    'http://localhost:11434/v1',
    'ollama',
    'llama3.2',
    'What is AI?'
)

# Use with OpenAI (cloud, paid) - SAME CODE!
result = chat_with_model(
    'https://api.openai.com/v1',
    'sk-your-key-here',
    'gpt-4o-mini',
    'What is AI?'
)

# Use with Together.ai (cloud, free tier) - SAME CODE!
result = chat_with_model(
    'https://api.together.xyz/v1',
    'your-together-key',
    'meta-llama/Llama-3.2-3B-Instruct-Turbo',
    'What is AI?'
)
```

## Comparison

| Feature | requests.post() | OpenAI SDK |
|---|---|---|
| Installation | pip install requests | pip install openai |
| Endpoint | localhost:11434/api/* | localhost:11434/v1/* |
| Code complexity | Medium (manual HTTP) | Low (standard SDK) |
| Response format | Ollama-specific | OpenAI-standard |
| Portability | ❌ Ollama-only | ✅ Works everywhere |
| Model management | ✅ Full access | ❌ Chat only |
| Embeddings | ✅ /api/embeddings | ✅ /v1/embeddings |
| Best for | Learning HTTP APIs | Production, portability |
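As the table notes, both APIs expose embeddings. A sketch against the OpenAI-compatible `/v1/embeddings` endpoint, with a cosine-similarity helper for comparing the resulting vectors; the `embed()` call assumes a running server with an embedding-capable model pulled (`nomic-embed-text` is just an example choice):

```python
import math

def embed(texts, model='nomic-embed-text'):
    """Fetch embeddings from Ollama's OpenAI-compatible endpoint.
    Assumes Ollama is running and the model has been pulled."""
    from openai import OpenAI
    client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
    resp = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in resp.data]

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Usage (requires a running server):
#   vecs = embed(['cat', 'kitten'])
#   print(cosine_similarity(vecs[0], vecs[1]))
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical vectors -> 1.0
```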
### Side by side

```python
# Method 1: requests.post
import requests

response = requests.post(
    'http://localhost:11434/api/chat',
    json={
        'model': 'llama3.2',
        'messages': [{'role': 'user', 'content': 'What is Python?'}],
        'stream': False
    }
)
print("requests:", response.json()['message']['content'])

# Method 2: OpenAI SDK
from openai import OpenAI

client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
response = client.chat.completions.create(
    model='llama3.2',
    messages=[{'role': 'user', 'content': 'What is Python?'}]
)
print("OpenAI:", response.choices[0].message.content)
```

### Multi-turn conversation
```python
# Using OpenAI SDK (easiest for conversations)
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama'
)

messages = []

# Turn 1
messages.append({'role': 'user', 'content': 'Hi! My name is Alice.'})
response = client.chat.completions.create(model='llama3.2', messages=messages)
assistant_msg = response.choices[0].message.content
messages.append({'role': 'assistant', 'content': assistant_msg})
print(f"Assistant: {assistant_msg}")

# Turn 2
messages.append({'role': 'user', 'content': 'What is my name?'})
response = client.chat.completions.create(model='llama3.2', messages=messages)
assistant_msg = response.choices[0].message.content
print(f"Assistant: {assistant_msg}")
```

---
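The turn-by-turn bookkeeping in the multi-turn example is easy to get wrong (forgetting to append the assistant turn silently drops history), so it can be wrapped in a small helper. A sketch; the client is injected, so it works with Ollama or any other OpenAI-compatible endpoint:

```python
class Conversation:
    """Minimal multi-turn wrapper around an OpenAI-compatible client."""

    def __init__(self, client, model='llama3.2'):
        self.client = client
        self.model = model
        self.messages = []

    def ask(self, user_text):
        self.messages.append({'role': 'user', 'content': user_text})
        response = self.client.chat.completions.create(
            model=self.model, messages=self.messages
        )
        reply = response.choices[0].message.content
        # Record the assistant turn so later questions see the full history
        self.messages.append({'role': 'assistant', 'content': reply})
        return reply

# Usage with Ollama (requires a running server):
#   from openai import OpenAI
#   chat = Conversation(OpenAI(base_url='http://localhost:11434/v1', api_key='ollama'))
#   print(chat.ask('Hi! My name is Alice.'))
#   print(chat.ask('What is my name?'))
```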
## Recommendations
### For Learning

- **Start with**: `requests.post()` to understand HTTP APIs
- **Then use**: OpenAI SDK for industry-standard patterns
### For Projects

- **Use OpenAI SDK** if you want portable code (easy to switch providers)
- **Use requests** if you need very fine-grained control or model management
### For Teaching

1. Week 1: Show `requests.post()` - demystify APIs
2. Week 2: Teach OpenAI SDK - demonstrate portability and industry standard
---
## Troubleshooting
### Ollama not running

```bash
# Check if Ollama is running
curl http://localhost:11434

# Start Ollama service (if needed)
ollama serve
```

### Model not found

```bash
# List available models
ollama list

# Pull the model you need
ollama pull llama3.2
```

### Wrong endpoint

```bash
# Make sure you're using the right port
# Native API:        http://localhost:11434/api/*
# OpenAI-compatible: http://localhost:11434/v1/*
```

### Missing package

```bash
# Install the right package for your method
pip install requests  # For Method 1
pip install openai    # For Method 2
```

## Resources

- Ollama Documentation: https://github.com/ollama/ollama/blob/main/docs/api.md
- OpenAI API Documentation: https://platform.openai.com/docs/api-reference
- Available Models: https://ollama.ai/library
## Summary

Both methods accomplish the same goal but with different tradeoffs:

- **requests.post()**: Raw HTTP, educational, maximum control, model management
- **OpenAI SDK**: Industry standard, portable, production-ready

Choose based on your needs: learning → requests, production/portability → OpenAI SDK.