This guide covers two different ways to interact with Ollama models: the native HTTP API and the OpenAI-compatible API.
## Install Ollama

```bash
# Download from https://ollama.ai
# Or use package manager (macOS):
brew install ollama
```

### Pull a model

```bash
ollama pull llama3.2
```

### Verify Ollama is running

```bash
ollama list  # Should show llama3.2
```

## Method 1: `requests.post()`

**Best for**: Understanding HTTP APIs, Ollama-specific features, raw control
```bash
pip install requests
```

### Simple completion with `/api/generate`

```python
import requests

# Using /api/generate for simple completions
response = requests.post(
    'http://localhost:11434/api/generate',
    json={
        'model': 'llama3.2',
        'prompt': 'The capital of France is',
        'stream': False
    }
)

result = response.json()
print(result['response'])
```

Response format:

```python
{
    'model': 'llama3.2',
    'created_at': '2024-01-30T12:00:00.123456Z',
    'response': 'Paris. Paris is the capital...',  # ← Direct string
    'done': True,
    'context': [128006, 882, 128007, ...],  # Token IDs for context
    'total_duration': 1234567890,
    'load_duration': 123456789,
    'prompt_eval_count': 10,
    'prompt_eval_duration': 234567890,
    'eval_count': 25,
    'eval_duration': 345678901
}
```
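The examples here set `'stream': False` for simplicity. With `'stream': True` (the native API's default), `/api/generate` responds with newline-delimited JSON, one chunk per line. A sketch of assembling those chunks — the sample lines below are illustrative, not captured server output:

```python
import json

# Illustrative examples of the NDJSON lines a streaming response produces
sample_stream = [
    '{"model": "llama3.2", "response": "Par", "done": false}',
    '{"model": "llama3.2", "response": "is.", "done": false}',
    '{"model": "llama3.2", "response": "", "done": true, "eval_count": 2}',
]

def collect_stream(lines):
    """Accumulate streamed chunks into the full response text."""
    pieces = []
    for line in lines:
        chunk = json.loads(line)
        pieces.append(chunk.get('response', ''))
        if chunk.get('done'):
            break
    return ''.join(pieces)

print(collect_stream(sample_stream))  # "Paris."

# Against a live server, the same loop runs over:
#   resp = requests.post(url, json={..., 'stream': True}, stream=True)
#   collect_stream(resp.iter_lines())
```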
Access the reply with `result['response']`.

### Chat with `/api/chat`
```python
import requests

# Using the native /api/chat endpoint
response = requests.post(
    'http://localhost:11434/api/chat',
    json={
        'model': 'llama3.2',
        'messages': [
            {'role': 'system', 'content': 'You are a helpful assistant.'},
            {'role': 'user', 'content': 'What is 2+2?'}
        ],
        'stream': False
    }
)

result = response.json()
print(result['message']['content'])
```

Response format:

```python
{
    'model': 'llama3.2',
    'created_at': '2024-01-30T12:00:00.123456Z',
    'message': {                     # ← Message object with role
        'role': 'assistant',
        'content': '2+2 equals 4.'
    },
    'done': True,
    'total_duration': 1234567890,    # Nanoseconds
    'load_duration': 123456789,
    'prompt_eval_count': 10,         # Input tokens
    'prompt_eval_duration': 234567890,
    'eval_count': 5,                 # Output tokens
    'eval_duration': 345678901
}
```
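Because the native API reports durations in nanoseconds alongside token counts, generation speed falls out directly; a small helper using the sample numbers above:

```python
def tokens_per_second(result):
    """Compute output tokens/sec from a native /api/generate or /api/chat
    response. Durations are reported in nanoseconds."""
    return result['eval_count'] / (result['eval_duration'] / 1e9)

# Sample values from the response format above
sample = {'eval_count': 5, 'eval_duration': 345678901}
print(f"{tokens_per_second(sample):.1f} tokens/sec")  # ≈ 14.5 tokens/sec
```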
Access the reply with `result['message']['content']`.

## Method 2: OpenAI SDK

**Best for**: Portability, industry standard, switching between providers
```bash
pip install openai
```

### Basic chat

```python
from openai import OpenAI

# Create client pointing to Ollama
client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama'  # Required by SDK, but not used by Ollama
)

response = client.chat.completions.create(
    model='llama3.2',
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'What is 2+2?'}
    ]
)

print(response.choices[0].message.content)
```

### Generation parameters
```python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama'
)

response = client.chat.completions.create(
    model='llama3.2',
    messages=[
        {'role': 'user', 'content': 'Tell me a creative story'}
    ],
    temperature=0.9,  # Standard OpenAI parameter names
    max_tokens=200,   # Note: max_tokens, not num_predict
    top_p=0.9
)

print(response.choices[0].message.content)
```
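The native API accepts the same knobs but nests them under an `options` object with Ollama's own names (`num_predict` rather than `max_tokens`). A sketch of the equivalent native request payload:

```python
# Same generation settings, expressed for the native /api/chat endpoint
native_payload = {
    'model': 'llama3.2',
    'messages': [{'role': 'user', 'content': 'Tell me a creative story'}],
    'stream': False,
    'options': {
        'temperature': 0.9,
        'num_predict': 200,  # counterpart of max_tokens
        'top_p': 0.9
    }
}

# Sent with: requests.post('http://localhost:11434/api/chat', json=native_payload)
print(native_payload['options']['num_predict'])
```

### Tool calling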
```python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama'
)

# Define available tools
tools = [
    {
        'type': 'function',
        'function': {
            'name': 'get_weather',
            'description': 'Get the current weather for a location',
            'parameters': {
                'type': 'object',
                'properties': {
                    'location': {
                        'type': 'string',
                        'description': 'The city and state, e.g. San Francisco, CA'
                    }
                },
                'required': ['location']
            }
        }
    }
]

response = client.chat.completions.create(
    model='llama3.2',
    messages=[
        {'role': 'user', 'content': 'What is the weather in Paris?'}
    ],
    tools=tools
)

message = response.choices[0].message

# Check if model wants to call a tool
if message.tool_calls:
    tool_call = message.tool_calls[0]
    print(f"Model wants to call: {tool_call.function.name}")
    print(f"With arguments: {tool_call.function.arguments}")
else:
    print(message.content)
```

LangChain provides a more convenient way to work with tools using `bind_tools()`:
```python
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

# Define tools using the @tool decorator
@tool
def get_weather(location: str) -> str:
    """Get the current weather for a location."""
    return f"The weather in {location} is sunny and 72°F"

@tool
def calculate(expression: str) -> str:
    """Calculate a mathematical expression."""
    try:
        result = eval(expression)  # Fine for a demo; avoid eval on untrusted input
        return str(result)
    except Exception:
        return "Error in calculation"

# Create LLM pointing to Ollama
llm = ChatOpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama',
    model='llama3.1',  # Use a model with good tool support
    temperature=0
)

# Bind tools to the model
llm_with_tools = llm.bind_tools([get_weather, calculate])

# Invoke
response = llm_with_tools.invoke("What's the weather in Paris?")

# Check for tool calls
if response.tool_calls:
    print("Tool calls detected:")
    for tool_call in response.tool_calls:
        print(f"  Tool: {tool_call['name']}")
        print(f"  Args: {tool_call['args']}")
        # Execute the tool
        if tool_call['name'] == 'get_weather':
            result = get_weather.invoke(tool_call['args'])
            print(f"  Result: {result}")
        elif tool_call['name'] == 'calculate':
            result = calculate.invoke(tool_call['args'])
            print(f"  Result: {result}")
else:
    print("Direct response:", response.content)
```

**Note on model compatibility**: Not all Ollama models support tool calling well. Models with good tool support include:
- ✅ llama3.1 (8B, 70B, 405B)
- ✅ mistral (7B v0.3+)
- ✅ mixtral (8x7B, 8x22B)
- ✅ qwen2.5 (various sizes)
Smaller or older models may not follow tool-calling instructions reliably. Use `temperature=0` for more consistent tool-calling behavior.
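A complete tool-calling loop does not stop at detecting the call: the tool's output goes back to the model as a `'tool'` role message carrying the call's id, and a second `chat.completions.create()` call produces the natural-language answer. A minimal sketch of that follow-up history — the assistant turn below is an illustrative stand-in for a real response object:

```python
def tool_result_message(tool_call_id, content):
    """Build the 'tool' role message that returns a tool's output to the model."""
    return {'role': 'tool', 'tool_call_id': tool_call_id, 'content': content}

# Illustrative values standing in for a real first response
history = [{'role': 'user', 'content': 'What is the weather in Paris?'}]
assistant_turn = {
    'role': 'assistant',
    'content': None,
    'tool_calls': [{
        'id': 'call_0',
        'type': 'function',
        'function': {'name': 'get_weather', 'arguments': '{"location": "Paris"}'}
    }]
}

# After executing get_weather ourselves, append both turns:
follow_up = history + [
    assistant_turn,
    tool_result_message('call_0', 'Sunny, 72°F')
]

# follow_up is then passed as messages= to a second
# client.chat.completions.create(...) call so the model can
# phrase the final answer.
print(follow_up[-1])
```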
### Response format

```python
# OpenAI-standard format
{
    'choices': [
        {
            'message': {
                'role': 'assistant',
                'content': '2+2 equals 4.'
            },
            'finish_reason': 'stop',
            'index': 0
        }
    ],
    'model': 'llama3.2',
    'usage': {
        'prompt_tokens': 10,
        'completion_tokens': 5,
        'total_tokens': 15
    },
    'created': 1234567890
}
```

### Switching providers

```python
from openai import OpenAI

def chat_with_model(base_url, api_key, model, prompt):
    """Same code works with Ollama, OpenAI, Together.ai, etc."""
    client = OpenAI(base_url=base_url, api_key=api_key)
    response = client.chat.completions.create(
        model=model,
        messages=[{'role': 'user', 'content': prompt}]
    )
    return response.choices[0].message.content

# Use with Ollama (local, free)
result = chat_with_model(
    'http://localhost:11434/v1',
    'ollama',
    'llama3.2',
    'What is AI?'
)

# Use with OpenAI (cloud, paid) - SAME CODE!
result = chat_with_model(
    'https://api.openai.com/v1',
    'sk-your-key-here',
    'gpt-4o-mini',
    'What is AI?'
)

# Use with Together.ai (cloud, free tier) - SAME CODE!
result = chat_with_model(
    'https://api.together.xyz/v1',
    'your-together-key',
    'meta-llama/Llama-3.2-3B-Instruct-Turbo',
    'What is AI?'
)
```

## Comparison

| Feature | requests.post() | OpenAI SDK |
|---|---|---|
| Installation | pip install requests | pip install openai |
| Endpoint | localhost:11434/api/* | localhost:11434/v1/* |
| Code complexity | Medium (manual HTTP) | Low (standard SDK) |
| Response format | Ollama-specific | OpenAI-standard |
| Portability | ❌ Ollama-only | ✅ Works everywhere |
| Model management | ✅ Full access | ❌ Chat only |
| Embeddings | ✅ /api/embeddings | ✅ /v1/embeddings |
| Best for | Learning HTTP APIs | Production, portability |
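As the table notes, both APIs expose embeddings. A sketch against the OpenAI-compatible `/v1/embeddings` endpoint, with a cosine-similarity helper for comparing the resulting vectors; the `embed()` call assumes a running server with an embedding-capable model pulled (`nomic-embed-text` is just an example choice):

```python
import math

def embed(texts, model='nomic-embed-text'):
    """Fetch embeddings from Ollama's OpenAI-compatible endpoint.
    Assumes Ollama is running and the model has been pulled."""
    from openai import OpenAI
    client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
    resp = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in resp.data]

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Usage (requires a running server):
#   vecs = embed(['cat', 'kitten'])
#   print(cosine_similarity(vecs[0], vecs[1]))
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical vectors -> 1.0
```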
### Side by side

```python
# Method 1: requests.post
import requests

response = requests.post(
    'http://localhost:11434/api/chat',
    json={
        'model': 'llama3.2',
        'messages': [{'role': 'user', 'content': 'What is Python?'}],
        'stream': False
    }
)
print("requests:", response.json()['message']['content'])

# Method 2: OpenAI SDK
from openai import OpenAI

client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
response = client.chat.completions.create(
    model='llama3.2',
    messages=[{'role': 'user', 'content': 'What is Python?'}]
)
print("OpenAI:", response.choices[0].message.content)
```

### Multi-turn conversation
```python
# Using OpenAI SDK (easiest for conversations)
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama'
)

messages = []

# Turn 1
messages.append({'role': 'user', 'content': 'Hi! My name is Alice.'})
response = client.chat.completions.create(model='llama3.2', messages=messages)
assistant_msg = response.choices[0].message.content
messages.append({'role': 'assistant', 'content': assistant_msg})
print(f"Assistant: {assistant_msg}")

# Turn 2
messages.append({'role': 'user', 'content': 'What is my name?'})
response = client.chat.completions.create(model='llama3.2', messages=messages)
assistant_msg = response.choices[0].message.content
print(f"Assistant: {assistant_msg}")
```

---
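The turn-by-turn bookkeeping in the multi-turn example is easy to get wrong (forgetting to append the assistant turn silently drops history), so it can be wrapped in a small helper. A sketch; the client is injected, so it works with Ollama or any other OpenAI-compatible endpoint:

```python
class Conversation:
    """Minimal multi-turn wrapper around an OpenAI-compatible client."""

    def __init__(self, client, model='llama3.2'):
        self.client = client
        self.model = model
        self.messages = []

    def ask(self, user_text):
        self.messages.append({'role': 'user', 'content': user_text})
        response = self.client.chat.completions.create(
            model=self.model, messages=self.messages
        )
        reply = response.choices[0].message.content
        # Record the assistant turn so later questions see the full history
        self.messages.append({'role': 'assistant', 'content': reply})
        return reply

# Usage with Ollama (requires a running server):
#   from openai import OpenAI
#   chat = Conversation(OpenAI(base_url='http://localhost:11434/v1', api_key='ollama'))
#   print(chat.ask('Hi! My name is Alice.'))
#   print(chat.ask('What is my name?'))
```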
## Recommendations
### For Learning

- **Start with**: `requests.post()` to understand HTTP APIs
- **Then use**: OpenAI SDK for industry-standard patterns
### For Projects

- **Use OpenAI SDK** if you want portable code (easy to switch providers)
- **Use requests** if you need very fine-grained control or model management
### For Teaching

1. Week 1: Show `requests.post()` - demystify APIs
2. Week 2: Teach OpenAI SDK - demonstrate portability and industry standard
---
## Troubleshooting
### Ollama not running

```bash
# Check if Ollama is running
curl http://localhost:11434

# Start Ollama service (if needed)
ollama serve
```

### Model not found

```bash
# List available models
ollama list

# Pull the model you need
ollama pull llama3.2
```

### Wrong endpoint

```bash
# Make sure you're using the right port
# Native API:        http://localhost:11434/api/*
# OpenAI-compatible: http://localhost:11434/v1/*
```

### Missing package

```bash
# Install the right package for your method
pip install requests  # For Method 1
pip install openai    # For Method 2
```

## Resources

- Ollama Documentation: https://github.com/ollama/ollama/blob/main/docs/api.md
- OpenAI API Documentation: https://platform.openai.com/docs/api-reference
- Available Models: https://ollama.ai/library
## Summary

Both methods accomplish the same goal but with different tradeoffs:

- **requests.post()**: Raw HTTP, educational, maximum control, model management
- **OpenAI SDK**: Industry standard, portable, production-ready

Choose based on your needs: learning → requests, production/portability → OpenAI SDK.