Strategies for Forcing an LLM to Use Tools

Question: I wrote a calculator tool for an agent built in LangGraph. However, the LLM keeps trying to do the math itself rather than using the tool. How can I discourage that?

This is a common challenge with LLMs and tool calling! Here are several strategies to encourage your agent to use the calculator tool instead of doing math itself:

Prompt Engineering Approaches

1. Explicit Instructions in System Prompt

Be very direct about when to use the tool:

python


system_prompt = """You are a helpful assistant with access to a calculator tool.

CRITICAL: You must ALWAYS use the calculator tool for ANY mathematical operations, including:
- Basic arithmetic (addition, subtraction, multiplication, division)
- Comparisons and numerical evaluations
- Any calculation involving numbers

Never attempt to calculate results yourself. Always call the calculator tool, even for simple operations like 2+2.

After receiving the tool result, present it to the user naturally."""

2. Add Examples in Few-Shot Format

Show the desired behavior explicitly:

python


system_prompt = """You have access to a calculator tool.

Examples of correct behavior:
User: "What is 15 + 27?"
Assistant: [calls calculator tool with expression="15 + 27"]
Tool result: 42
Assistant: "15 + 27 equals 42"

User: "What's 8 times 9?"
Assistant: [calls calculator tool with expression="8 * 9"]
Tool result: 72
Assistant: "8 times 9 equals 72"

ALWAYS use the calculator for any math operation."""

LangGraph-Specific Strategies

3. Validation Node

Add a node that checks whether the LLM skipped the tool when it shouldn't have:

python


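from langchain_core.messages import HumanMessage

def validate_response(state):
    """Check if the LLM did math without using the tool."""
    last_message = state["messages"][-1]

    # Tool calls live on the AI message, not on the graph state
    tool_calls = getattr(last_message, "tool_calls", None)

    # has_calculation_in_text is a helper you supply (e.g. a regex over the text)
    if has_calculation_in_text(last_message.content) and not tool_calls:
        # Ask for a retry. A HumanMessage is used because some chat model
        # integrations only accept system messages at the start of the conversation.
        return {
            "messages": [HumanMessage(content="You must use the calculator tool. Try again.")]
        }
    # Nothing to add; returning the full state could duplicate messages under an append-style reducer
    return {"messages": []}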
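
The validation node calls has_calculation_in_text, which is not defined anywhere in this answer. One possible version is sketched below as a rough regex heuristic. The patterns and the list of operator words are assumptions made for illustration, and they will produce false positives (a date such as 2024-01-01 matches), so treat it as a starting point rather than a reliable detector.

```python
import re

# Assumed heuristic: two digits joined by an operator or "=",
# or a digit followed by a spoken operator such as "times" or "equals".
_SYMBOLIC = re.compile(r"\d\s*[-+*/^=]\s*\d")
_VERBAL = re.compile(r"\d\s+(?:plus|minus|times|divided by|equals)\b", re.IGNORECASE)


def has_calculation_in_text(content) -> bool:
    """Return True if the text looks like it contains arithmetic."""
    # Message content may be a string or a list of content blocks; stringify either way.
    text = content if isinstance(content, str) else str(content)
    return bool(_SYMBOLIC.search(text) or _VERBAL.search(text))
```

A bare number with no operator, as in "The answer is 42.", is deliberately not flagged, so a plain numeric reply after a tool result passes through.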

4. Conditional Routing

Route based on whether tools were called:

python


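def should_continue(state):
    last_message = state["messages"][-1]

    # Pending tool calls: hand off to the tool node (assumed here to be routed as "tools")
    if getattr(last_message, "tool_calls", None):
        return "tools"

    # A numeric answer right after a tool result is the desired outcome,
    # so only flag replies that are not preceded by a tool message
    previous = state["messages"][-2] if len(state["messages"]) > 1 else None
    answered_from_tool = getattr(previous, "type", None) == "tool"

    # contains_math_answer is a helper you supply (e.g. a regex over the text)
    if contains_math_answer(last_message) and not answered_from_tool:
        return "force_tool_use"  # Route back to retry
    return "end"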
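
One thing to watch with a retry route is that it can loop forever if the model keeps answering without the tool. The routing decision can be sketched against plain values with a retry cap, so it runs without LangGraph installed. The function name, the retries counter, the "tools" label, and the digit check are all assumptions for illustration; in a real graph the counter would live in your state.

```python
def route_after_agent(answer_text, tool_calls, tool_already_used, retries, max_retries=2):
    """Pick the next step after the agent node (hypothetical helper).

    "tools"          -> the model requested a tool call, so run it
    "force_tool_use" -> numeric answer given without the tool, retries remain
    "end"            -> answer is acceptable, or the retry budget is spent
    """
    if tool_calls:
        return "tools"
    if tool_already_used:
        # The final answer after a tool result will contain numbers; that is fine.
        return "end"
    looks_numeric = any(ch.isdigit() for ch in answer_text)
    if looks_numeric and retries < max_retries:
        return "force_tool_use"
    return "end"


print(route_after_agent("15 + 27 equals 42", [], False, retries=0))  # force_tool_use
print(route_after_agent("15 + 27 equals 42", [], False, retries=2))  # end
print(route_after_agent("", [{"name": "calculator"}], False, retries=0))  # tools
```

Ending the run once the budget is spent is only one option; you could also fall back to a different model or return an explicit error to the user.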

Model Selection Considerations

5. Model-Specific Behavior

Some models follow tool-use instructions more consistently than others. Anecdotally (this varies by model version and prompt, so test with your own setup):

Claude (Anthropic) tends to be very compliant with tool usage instructions
GPT-4 sometimes tries to "help" by doing simple math
Smaller models may struggle more with consistent tool usage

For your teaching context, this is actually a great opportunity to show students the difference between models!

Debugging Tips

6. Force Tool-Only Mode

Make the tool call mandatory by modifying your agent structure:

python


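# In your graph definition
from langchain_core.messages import HumanMessage

def agent_node(state):
    # Get LLM response (model is assumed to be a chat model with the calculator tool bound)
    response = model.invoke(state["messages"])

    # If it's a math question and no tool was called.
    # is_math_question is a helper you supply.
    if is_math_question(state["messages"][-1]) and not response.tool_calls:
        # Discard the answer and ask for a regeneration. The graph must route back
        # to this node for the retry to happen. A HumanMessage is used because some
        # chat model integrations only accept system messages at the start.
        return {"messages": [
            HumanMessage(content="ERROR: You must use the calculator tool for this. Do not provide an answer without calling the tool first.")
        ]}

    return {"messages": [response]}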
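
The comment about injecting a tool call manually can be made concrete. The sketch below builds a dict with the same keys LangChain uses for entries in a message's tool_calls list (name, args, id, type). The tool name "calculator" and the argument name "expression" are taken from the few-shot prompt above and are assumptions about your tool; the extraction only handles a single "a op b" pattern.

```python
import re
import uuid

# Maps a few spoken operators to symbols; illustrative and far from exhaustive.
_WORD_OPERATORS = {"plus": "+", "minus": "-", "times": "*", "divided by": "/"}


def extract_expression(question: str):
    """Pull a simple 'a op b' expression out of a question, or return None."""
    text = question.lower()
    for word, symbol in _WORD_OPERATORS.items():
        text = text.replace(word, symbol)
    match = re.search(r"(\d+(?:\.\d+)?)\s*([-+*/])\s*(\d+(?:\.\d+)?)", text)
    if match is None:
        return None
    left, op, right = match.groups()
    return f"{left} {op} {right}"


def build_calculator_call(question: str):
    """Return a tool-call dict for the calculator, or None if no expression was found."""
    expression = extract_expression(question)
    if expression is None:
        return None
    return {
        "name": "calculator",                # assumed tool name
        "args": {"expression": expression},  # assumed argument name
        "id": f"call_{uuid.uuid4().hex}",    # any unique string works as an id
        "type": "tool_call",
    }
```

In agent_node, a dict like this could be attached to an AIMessage through its tool_calls argument in place of the error message, so the tool node runs even though the model did not ask for it. Whether that is preferable to a retry depends on how much you trust the extraction.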

Teaching Opportunity

For your course, this behavior difference is actually pedagogically valuable! You could:

First show the problem - let students see the LLM doing math incorrectly
Demonstrate progressive solutions - from weak prompting to strong guardrails
Compare models - show which models are most/least compliant
Add validation layers - teach defensive programming for agent systems