Claudette’s source

This is the ‘literate’ source code for Claudette. You can view the fully rendered version of the notebook here, or you can clone the git repo and run the interactive notebook in Jupyter. The notebook is converted to the Python module claudette/core.py using nbdev. The goal of this source code is both to create the Python module and to teach the reader how it is created, without assuming much existing knowledge about Claude’s API.

Most of the time you’ll see that we write some source code first, and then a description or discussion of it afterwards.

Setup

import os
# os.environ['ANTHROPIC_LOG'] = 'debug'

To print every HTTP request and response in full, uncomment the above line. This functionality is provided by Anthropic’s SDK.

Tip

If you’re reading the rendered version of this notebook, you’ll see an “Exported source” collapsible widget below. If you’re reading the source notebook directly, you’ll see #| exports at the top of the cell. These show that this piece of code will be exported into the Python module that this notebook creates. No other code will be included – any other code in this notebook is just for demonstration, documentation, and testing.

You can toggle expanding/collapsing the source code of all exported sections by using the </> Code menu in the top right of the rendered notebook page.

Exported source
model_types = {
    # Anthropic
    'claude-3-opus-20240229': 'opus',
    'claude-3-7-sonnet-20250219': 'sonnet',
    'claude-3-5-sonnet-20241022': 'sonnet-3-5',
    'claude-3-haiku-20240307': 'haiku-3',
    'claude-3-5-haiku-20241022': 'haiku-3-5',
    # AWS
    'anthropic.claude-3-opus-20240229-v1:0': 'opus',
    'anthropic.claude-3-5-sonnet-20241022-v2:0': 'sonnet',
    'anthropic.claude-3-sonnet-20240229-v1:0': 'sonnet',
    'anthropic.claude-3-haiku-20240307-v1:0': 'haiku',
    # Google
    'claude-3-opus@20240229': 'opus',
    'claude-3-5-sonnet-v2@20241022': 'sonnet',
    'claude-3-sonnet@20240229': 'sonnet',
    'claude-3-haiku@20240307': 'haiku',
}

all_models = list(model_types)

Warning: between Anthropic SDK 0.4.2 and 0.4.7 the interface to the Model type changed.

models
['claude-3-opus-20240229',
 'claude-3-7-sonnet-20250219',
 'claude-3-5-sonnet-20241022',
 'claude-3-haiku-20240307',
 'claude-3-5-haiku-20241022']
Exported source
text_only_models = ('claude-3-5-haiku-20241022',)
Exported source
has_streaming_models = set(all_models)
has_system_prompt_models = set(all_models)
has_temperature_models = set(all_models)
has_extended_thinking_models = {'claude-3-7-sonnet-20250219'}
has_streaming_models
{'anthropic.claude-3-5-sonnet-20241022-v2:0',
 'anthropic.claude-3-haiku-20240307-v1:0',
 'anthropic.claude-3-opus-20240229-v1:0',
 'anthropic.claude-3-sonnet-20240229-v1:0',
 'claude-3-5-haiku-20241022',
 'claude-3-5-sonnet-20241022',
 'claude-3-5-sonnet-v2@20241022',
 'claude-3-7-sonnet-20250219',
 'claude-3-haiku-20240307',
 'claude-3-haiku@20240307',
 'claude-3-opus-20240229',
 'claude-3-opus@20240229',
 'claude-3-sonnet@20240229'}

source

can_use_extended_thinking

 can_use_extended_thinking (m)
Exported source
def can_stream(m): return m in has_streaming_models
def can_set_system_prompt(m): return m in has_system_prompt_models
def can_set_temperature(m): return m in has_temperature_models
def can_use_extended_thinking(m): return m in has_extended_thinking_models

source

can_set_temperature

 can_set_temperature (m)

source

can_set_system_prompt

 can_set_system_prompt (m)

source

can_stream

 can_stream (m)

We include these functions to provide a uniform library interface with cosette, since OpenAI models such as o1 lack many of these capabilities.

assert can_stream('claude-3-5-sonnet-20241022') and can_set_system_prompt('claude-3-5-sonnet-20241022') and can_set_temperature('claude-3-5-sonnet-20241022')
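As a rough illustration of why such capability checks are handy, the sketch below drops request options that a model does not support. The sets here are small stand-ins and `supported_kwargs` is a hypothetical helper, not part of Claudette.

```python
# Stand-in capability sets; the real ones are the exported sets above.
streaming_models = {'claude-3-7-sonnet-20250219', 'claude-3-5-haiku-20241022'}
temperature_models = {'claude-3-7-sonnet-20250219', 'claude-3-5-haiku-20241022'}

def supported_kwargs(model, stream=False, temp=None):
    "Hypothetical helper: keep only the options that `model` supports."
    kw = {}
    if stream and model in streaming_models: kw['stream'] = True
    if temp is not None and model in temperature_models: kw['temperature'] = temp
    return kw

print(supported_kwargs('claude-3-7-sonnet-20250219', stream=True, temp=0))
print(supported_kwargs('some-other-model', stream=True, temp=0))
```

A caller can then build its request in one place and let the capability sets decide what is actually sent.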

These are the current versions and prices of Anthropic’s models at the time of writing.

model = models[1]; model
'claude-3-7-sonnet-20250219'

For examples, we’ll use Sonnet 3.7, since it’s awesome.

Anthropic SDK

cli = Anthropic()

This is what Anthropic’s SDK provides for interacting with Claude from Python. To use it, pass it a list of messages, each with content and a role. The roles should alternate between user and assistant.

Tip

After the code below you’ll see an indented section with an orange vertical line on the left. This is used to show the result of running the code above. Because the code is running in a Jupyter Notebook, we don’t have to use print to display results, we can just type the expression directly, as we do with r here.

m = {'role': 'user', 'content': "I'm Jeremy"}
r = cli.messages.create(messages=[m], model=model, max_tokens=100)
r

Hello Jeremy! It’s nice to meet you. How are you doing today? Is there anything I can help you with or something you’d like to talk about?

  • id: msg_01YTFnV5W6dv72G56yVgETgc
  • content: [{'citations': None, 'text': "Hello Jeremy! It's nice to meet you. How are you doing today? Is there anything I can help you with or something you'd like to talk about?", 'type': 'text'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: end_turn
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 10, 'output_tokens': 36, 'server_tool_use': None}

Formatting output

That output is pretty long and hard to read, so let’s clean it up. We’ll start by pulling out the Content part of the message. To do that, we’re going to write our first function which will be included in the claudette/core.py module.

Tip

This is the first exported public function or class we’re creating (the previous export was of a variable). In the rendered version of the notebook for these you’ll see 4 things, in this order (unless the symbol starts with a single _, which indicates it’s private):

  • The signature (with the symbol name as a heading, with a horizontal rule above)
  • A table of parameter docs (if provided)
  • The doc string (in italics).
  • The source code (in a collapsible “Exported source” block)

After that, we generally provide a bit more detail on what we’ve created, and why, along with a sample usage.


source

find_block

 find_block (r:collections.abc.Mapping, blk_type:type=<class
             'anthropic.types.text_block.TextBlock'>)

Find the first block of type blk_type in r.content.

Type Default Details
r Mapping The message to look in
blk_type type TextBlock The type of block to find
Exported source
def find_block(r:abc.Mapping, # The message to look in
               blk_type:type=TextBlock  # The type of block to find
              ):
    "Find the first block of type `blk_type` in `r.content`."
    return first(o for o in r.content if isinstance(o,blk_type))

This makes it easier to grab the needed parts of Claude’s responses, which can include multiple pieces of content. By default, we look for the first text block. That will generally have the content we want to display.

find_block(r)
TextBlock(citations=None, text="Hello Jeremy! It's nice to meet you. How are you doing today? Is there anything I can help you with or something you'd like to talk about?", type='text')
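To see the "first block of a given type" idea without calling the API, here is a minimal stand-alone sketch. `FakeText`, `FakeToolUse` and `first_of_type` are hypothetical stand-ins for the SDK block classes and for `find_block`; they only mimic the behaviour described above.

```python
from dataclasses import dataclass

@dataclass
class FakeText:     # stand-in for the SDK's TextBlock
    text: str

@dataclass
class FakeToolUse:  # stand-in for the SDK's ToolUseBlock
    name: str

def first_of_type(blocks, blk_type):
    "Return the first item in `blocks` that is a `blk_type`, or None if there is none."
    return next((o for o in blocks if isinstance(o, blk_type)), None)

mixed = [FakeToolUse('sums'), FakeText('Hello Jeremy!'), FakeText('Second text')]
print(first_of_type(mixed, FakeText))      # skips the tool block, finds the first text block
print(first_of_type(mixed, FakeToolUse))
print(first_of_type([FakeToolUse('sums')], FakeText))  # nothing of that type
```

Returning `None` when nothing matches lets the caller decide on a fallback instead of handling an exception.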
def contents(r):
    "Helper to get the contents from Claude response `r`."
    blk = find_block(r)
    if not blk and r.content: blk = r.content[0]
    return blk.text.strip() if hasattr(blk,'text') else str(blk)

For display purposes, we often just want to show the text itself.

contents(r)
"Hello Jeremy! It's nice to meet you. How are you doing today? Is there anything I can help you with or something you'd like to talk about?"
Exported source
@patch
def _repr_markdown_(self:(Message)):
    det = '\n- '.join(f'{k}: `{v}`' for k,v in self.model_dump().items())
    cts = re.sub(r'\$', '&#36;', contents(self))  # escape `$` for jupyter latex
    return f"""{cts}

<details>

- {det}

</details>"""

Jupyter looks for a _repr_markdown_ method in displayed objects; we add this in order to display just the content text, and collapse full details into a hideable section. Note that patch is from fastcore, and is used to add (or replace) functionality in an existing class. We pass the class(es) that we want to patch as type annotations to self. In this case, _repr_markdown_ is being added to Anthropic’s Message class, so when we display the message now we just see the contents, and the details are hidden away in a collapsible details block.
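If patching is unfamiliar, the following sketch shows the underlying idea with plain Python. `add_method` is a deliberately simplified, hypothetical decorator and is not how fastcore's `patch` is implemented (it takes the class as an argument rather than reading it from the annotation on `self`).

```python
import re

class Msg:
    "Stand-in for a third-party class we cannot edit."
    def __init__(self, text): self.text = text

def add_method(cls):
    "Hypothetical, simplified decorator: attach the decorated function to `cls`."
    def _inner(f):
        setattr(cls, f.__name__, f)
        return f
    return _inner

@add_method(Msg)
def _repr_markdown_(self):
    # Escape `$` so that Jupyter does not treat it as the start of LaTeX
    return re.sub(r'\$', '&#36;', self.text)

print(Msg('That costs $5')._repr_markdown_())
```

Every existing and future instance of `Msg` picks up the new method, because it is attached to the class rather than to one object.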

r

Hello Jeremy! It’s nice to meet you. How are you doing today? Is there anything I can help you with or something you’d like to talk about?

  • id: msg_01YTFnV5W6dv72G56yVgETgc
  • content: [{'citations': None, 'text': "Hello Jeremy! It's nice to meet you. How are you doing today? Is there anything I can help you with or something you'd like to talk about?", 'type': 'text'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: end_turn
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 10, 'output_tokens': 36, 'server_tool_use': None}

One key part of the response is the usage key, which tells us how many tokens we used by returning a Usage object.

We’ll add some helpers to make things a bit cleaner for creating and formatting these objects.

r.usage
In: 10; Out: 36; Cache create: 0; Cache read: 0; Total Tokens: 46; Server tool use (web search requests): 0

source

server_tool_usage

 server_tool_usage (web_search_requests=0)

Little helper to create a server tool usage object

Exported source
def server_tool_usage(web_search_requests=0):
    'Little helper to create a server tool usage object'
    return ServerToolUsage(web_search_requests=web_search_requests)

source

usage

 usage (inp=0, out=0, cache_create=0, cache_read=0,
        server_tool_use=ServerToolUsage(web_search_requests=0))

Slightly more concise version of Usage.

Type Default Details
inp int 0 input tokens
out int 0 Output tokens
cache_create int 0 Cache creation tokens
cache_read int 0 Cache read tokens
server_tool_use ServerToolUsage ServerToolUsage(web_search_requests=0) server tool use
Exported source
def usage(inp=0, # input tokens
          out=0,  # Output tokens
          cache_create=0, # Cache creation tokens
          cache_read=0, # Cache read tokens
          server_tool_use=server_tool_usage() # server tool use
         ):
    'Slightly more concise version of `Usage`.'
    return Usage(input_tokens=inp, output_tokens=out, cache_creation_input_tokens=cache_create,
                 cache_read_input_tokens=cache_read, server_tool_use=server_tool_use)

The constructor provided by Anthropic is rather verbose, so we clean it up a bit, using a lowercase version of the name.

usage(5)
In: 5; Out: 0; Cache create: 0; Cache read: 0; Total Tokens: 5; Server tool use (web search requests): 0

source

Usage.total

 Usage.total ()
Exported source
def _dgetattr(o,s,d): 
    "Like getattr, but returns the default if the result is None"
    return getattr(o,s,d) or d

@patch(as_prop=True)
def total(self:Usage): return self.input_tokens+self.output_tokens+_dgetattr(self, "cache_creation_input_tokens",0)+_dgetattr(self, "cache_read_input_tokens",0)

Adding a total property to Usage makes it easier to see how many tokens we’ve used up altogether.

usage(5,1).total
6
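The `or d` in `_dgetattr` matters because plain `getattr` only uses its default when the attribute is missing, not when it is present but set to `None`. A small stand-alone sketch, using `SimpleNamespace` as a stand-in for a `Usage` object and a local copy of the helper named `dget`:

```python
from types import SimpleNamespace

def dget(o, s, d):
    "Same idea as `_dgetattr`: fall back to `d` when the attribute is missing or None."
    return getattr(o, s, d) or d

u = SimpleNamespace(input_tokens=10, cache_read_input_tokens=None)
print(getattr(u, 'cache_read_input_tokens', 0))   # attribute exists, so the default is ignored
print(dget(u, 'cache_read_input_tokens', 0))      # None is replaced by the default
print(dget(u, 'cache_creation_input_tokens', 0))  # attribute is missing entirely
```

Without this, adding a `None` cache count to an integer would raise a `TypeError`.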

source

Usage.__repr__

 Usage.__repr__ ()

Return repr(self).

Exported source
@patch
def __repr__(self:Usage):
    io_toks = f'In: {self.input_tokens}; Out: {self.output_tokens}'
    cache_toks = f'Cache create: {_dgetattr(self, "cache_creation_input_tokens",0)}; Cache read: {_dgetattr(self, "cache_read_input_tokens",0)}'
    server_tool_use = _dgetattr(self, "server_tool_use",server_tool_usage())
    server_tool_use_str = f'Server tool use (web search requests): {server_tool_use.web_search_requests}'
    total_tok = f'Total Tokens: {self.total}'
    return f'{io_toks}; {cache_toks}; {total_tok}; {server_tool_use_str}'

In Python, patching __repr__ lets us change how an object is displayed. (More generally, methods starting and ending in __ in Python are called dunder methods, and have some magic behavior – such as, in this case, changing how an object is displayed.) We won’t be displaying ServerToolUsage objects directly, so we handle their display inside Usage’s __repr__ as well.

usage(5)
In: 5; Out: 0; Cache create: 0; Cache read: 0; Total Tokens: 5; Server tool use (web search requests): 0

source

ServerToolUsage.__add__

 ServerToolUsage.__add__ (b)

Add together each of the server tool use counts

Exported source
@patch
def __add__(self:ServerToolUsage, b):
    "Add together each of the server tool use counts"
    return ServerToolUsage(web_search_requests=self.web_search_requests+b.web_search_requests)

And, patching __add__ lets + work on a ServerToolUsage as well as a Usage object.

server_tool_usage(1) + server_tool_usage(2)
ServerToolUsage(web_search_requests=3)

source

Usage.__add__

 Usage.__add__ (b)

Add together each of input_tokens and output_tokens

Exported source
@patch
def __add__(self:Usage, b):
    "Add together each of `input_tokens` and `output_tokens`"
    return usage(self.input_tokens+b.input_tokens, self.output_tokens+b.output_tokens,
                 _dgetattr(self,'cache_creation_input_tokens',0)+_dgetattr(b,'cache_creation_input_tokens',0),
                 _dgetattr(self,'cache_read_input_tokens',0)+_dgetattr(b,'cache_read_input_tokens',0),
                 _dgetattr(self,'server_tool_use',server_tool_usage())+_dgetattr(b,'server_tool_use',server_tool_usage()))
r.usage+r.usage + usage(server_tool_use=server_tool_usage(1))
In: 20; Out: 72; Cache create: 0; Cache read: 0; Total Tokens: 92; Server tool use (web search requests): 1
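The reason for defining `__add__` is that running totals can then be kept with ordinary `+` and `+=`. Here is a minimal sketch of that pattern; `Tokens` is a hypothetical two-field stand-in for `Usage`, not the SDK class.

```python
from dataclasses import dataclass

@dataclass
class Tokens:
    "Hypothetical stand-in for `Usage` with just two counters."
    inp: int = 0
    out: int = 0
    def __add__(self, b): return Tokens(self.inp + b.inp, self.out + b.out)

calls = [Tokens(10, 36), Tokens(10, 37), Tokens(8, 29)]
running = Tokens()
for c in calls: running += c  # no __iadd__ is defined, so Python falls back to __add__
print(running)
```

Because `__add__` returns a new object, the individual per-call records are left unchanged.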

Creating messages

Creating correctly formatted dicts from scratch every time isn’t very handy, so we’ll import a couple of helper functions from the msglm library.

Let’s use mk_msg to recreate our msg {'role': 'user', 'content': "I'm Jeremy"} from earlier.

prompt = "I'm Jeremy"
m = mk_msg(prompt)
r = cli.messages.create(messages=[m], model=model, max_tokens=100)
r

Hello, Jeremy! It’s nice to meet you. How are you doing today? Is there something I can help you with or would you like to discuss something specific?

  • id: msg_01T5fCHX6KMXPe41wHJe3RvM
  • content: [{'citations': None, 'text': "Hello, Jeremy! It's nice to meet you. How are you doing today? Is there something I can help you with or would you like to discuss something specific?", 'type': 'text'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: end_turn
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 10, 'output_tokens': 37, 'server_tool_use': None}

We can pass more than just text messages to Claude. As we’ll see later we can also pass images, SDK objects, etc. To handle these different data types we need to pass the type along with our content to Claude.

Here’s an example of a multimodal message containing text and images.

{
    'role': 'user', 
    'content': [
        {'type':'text', 'text':'What is in the image?'},
        {
            'type':'image', 
            'source': {
                'type':'base64', 'media_type':'media_type', 'data': 'data'
            }
        }
    ]
}

mk_msg infers the type automatically and creates the appropriate data structure.

LLMs don’t actually have state; instead, dialogs are created by passing back all previous prompts and responses every time. With Claude, they always alternate user and assistant. We’ll use mk_msgs from msglm to make it easier to build up these dialog lists.

msgs = mk_msgs([prompt, r, "I forgot my name. Can you remind me please?"]) 
msgs
[{'role': 'user', 'content': "I'm Jeremy"},
 {'role': 'assistant',
  'content': [TextBlock(citations=None, text="Hello, Jeremy! It's nice to meet you. How are you doing today? Is there something I can help you with or would you like to discuss something specific?", type='text')]},
 {'role': 'user', 'content': 'I forgot my name. Can you remind me please?'}]
cli.messages.create(messages=msgs, model=model, max_tokens=200)

You mentioned that your name is Jeremy. That’s how you introduced yourself at the beginning of our conversation.

  • id: msg_014Ryt7moN693dhH9WF6wU4q
  • content: [{'citations': None, 'text': "You mentioned that your name is Jeremy. That's how you introduced yourself at the beginning of our conversation.", 'type': 'text'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: end_turn
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 61, 'output_tokens': 24, 'server_tool_use': None}
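The alternation rule used in the dialog above can be sketched in a few lines. This is only an illustration of the idea under the assumption that roles are assigned purely by position; `alternate_roles` is a hypothetical helper and not msglm's implementation, which also handles SDK objects, images, and caching.

```python
def alternate_roles(contents):
    "Hypothetical sketch: assign user/assistant roles by position in the list."
    roles = ('user', 'assistant')
    return [{'role': roles[i % 2], 'content': c} for i, c in enumerate(contents)]

# The whole history is resent on every turn, because the model keeps no state
history = ["I'm Jeremy", "Hello, Jeremy!"]
history.append("I forgot my name. Can you remind me please?")
msgs_sketch = alternate_roles(history)
for m in msgs_sketch: print(m['role'], '->', m['content'])
```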

Client


source

Client

 Client (model, cli=None, log=False, cache=False)

Basic Anthropic messages client.

Exported source
class Client:
    def __init__(self, model, cli=None, log=False, cache=False):
        "Basic Anthropic messages client."
        self.model,self.use = model,usage()
        self.text_only = model in text_only_models
        self.log = [] if log else None
        self.c = (cli or Anthropic(default_headers={'anthropic-beta': 'prompt-caching-2024-07-31'}))
        self.cache = cache

We’ll create a simple Client for Anthropic which tracks usage and stores the model to use. We don’t add any methods right away – instead we’ll use patch for that so we can add and document them incrementally.

c = Client(model)
c.use
In: 0; Out: 0; Cache create: 0; Cache read: 0; Total Tokens: 0; Server tool use (web search requests): 0
Exported source
@patch
def _r(self:Client, r:Message, prefill=''):
    "Store the result of the message and accrue total usage."
    if prefill:
        blk = find_block(r)
        blk.text = prefill + (blk.text or '')
    self.result = r
    self.use += r.usage
    self.stop_reason = r.stop_reason
    self.stop_sequence = r.stop_sequence
    return r

We use a _ prefix on private methods, but we document them here in the interests of literate source code.

_r will be used each time we get a new result, to track usage and also to keep the result available for later.

c._r(r)
c.use
In: 10; Out: 37; Cache create: 0; Cache read: 0; Total Tokens: 47; Server tool use (web search requests): 0

Whereas OpenAI’s models use a stream parameter for streaming, Anthropic’s use a separate method. We implement Anthropic’s approach in a private method, and then use a stream parameter in __call__ for consistency:

Exported source
@patch
def _log(self:Client, final, prefill, msgs, maxtok=None, sp=None, temp=None, stream=None, stop=None, **kwargs):
    self._r(final, prefill)
    if self.log is not None: self.log.append({
        "msgs": msgs, "prefill": prefill, **kwargs,
        "msgs": msgs, "prefill": prefill, "maxtok": maxtok, "sp": sp, "temp": temp, "stream": stream, "stop": stop, **kwargs,
        "result": self.result, "use": self.use, "stop_reason": self.stop_reason, "stop_sequence": self.stop_sequence
    })
    return self.result
Exported source
@patch
def _stream(self:Client, msgs:list, prefill='', **kwargs):
    with self.c.messages.stream(model=self.model, messages=mk_msgs(msgs, cache=self.cache, cache_last_ckpt_only=self.cache), **kwargs) as s:
        if prefill: yield(prefill)
        yield from s.text_stream
        self._log(s.get_final_message(), prefill, msgs, **kwargs)

Claude supports adding an extra assistant message at the end, which contains the prefill – i.e. the text we want Claude to assume the response starts with. However Claude doesn’t actually repeat that in the response, so for convenience we add it.
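The two halves of the prefill handling can be sketched without the API: the prefill is sent as a trailing assistant message, and then glued back onto the continuation that comes back. `with_prefill` and `restore_prefill` are hypothetical names used only for this illustration.

```python
def with_prefill(msgs, prefill):
    "Append the prefill as a trailing assistant message (sketch only)."
    extra = [{'role': 'assistant', 'content': prefill.strip()}] if prefill else []
    return msgs + extra

def restore_prefill(prefill, completion):
    "The model returns only the continuation, so prepend the prefill for display."
    return prefill + (completion or '')

question = [{'role': 'user', 'content': 'Concisely, what is the meaning of life?'}]
sent = with_prefill(question, 'According to Douglas Adams,')
print(sent[-1])
print(restore_prefill('According to Douglas Adams,', " it's 42."))
```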

Exported source
@patch
def _precall(self:Client, msgs, prefill, stop, kwargs):
    pref = [prefill.strip()] if prefill else []
    if not isinstance(msgs,list): msgs = [msgs]
    if stop is not None:
        if not isinstance(stop, (list)): stop = [stop]
        kwargs["stop_sequences"] = stop
    msgs = mk_msgs(msgs+pref, cache=self.cache, cache_last_ckpt_only=self.cache)
    return msgs
@patch
@delegates(messages.Messages.create)
def __call__(self:Client,
             msgs:list, # List of messages in the dialog
             sp='', # The system prompt
             temp=0, # Temperature
             maxtok=4096, # Maximum tokens
             prefill='', # Optional prefill to pass to Claude as start of its response
             stream:bool=False, # Stream response?
             stop=None, # Stop sequence
             **kwargs):
    "Make a call to Claude."
    msgs = self._precall(msgs, prefill, stop, kwargs)
    if stream: return self._stream(msgs, prefill=prefill, max_tokens=maxtok, system=sp, temperature=temp, **kwargs)
    res = self.c.messages.create(
        model=self.model, messages=msgs, max_tokens=maxtok, system=sp, temperature=temp, **kwargs)
    return self._log(res, prefill, msgs, maxtok, sp, temp, stream=stream, **kwargs)

Defining __call__ lets us use an object like a function (i.e. it’s callable). We use it as a small wrapper over messages.create. However we’re not exporting this version just yet – we have some additions we’ll make in a moment…
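A toy version of the same shape may help: a callable object whose `stream` flag decides between returning a value and returning a generator. `EchoClient` is a hypothetical stand-in that makes no network calls. It also shows one consequence of putting the bookkeeping after the `yield`s, as `_stream` does with `_log`: nothing is recorded until the stream has been consumed.

```python
class EchoClient:
    "Hypothetical stand-in client: no network, it just echoes the prompt."
    def __init__(self): self.calls = 0
    def _stream(self, prompt):
        for word in prompt.split(): yield word + ' '
        self.calls += 1  # runs only once the generator is exhausted
    def __call__(self, prompt, stream=False):
        if stream: return self._stream(prompt)
        self.calls += 1
        return prompt

ec = EchoClient()
print(ec('Hi there'), ec.calls)
gen = ec('Hi there', stream=True)
print(ec.calls)                 # unchanged: the generator has not run yet
print(''.join(gen), ec.calls)   # consuming the stream triggers the bookkeeping
```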

c = Client(model, log=True)
c.use
In: 0; Out: 0; Cache create: 0; Cache read: 0; Total Tokens: 0; Server tool use (web search requests): 0
c('Hi')

Hello! How can I assist you today? Feel free to ask any questions or let me know what you’d like to discuss.

  • id: msg_015UyBYhtQVzWatX7FXLhvDg
  • content: [{'citations': None, 'text': "Hello! How can I assist you today? Feel free to ask any questions or let me know what you'd like to discuss.", 'type': 'text'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: end_turn
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 8, 'output_tokens': 29, 'server_tool_use': None}
c.use
In: 8; Out: 29; Cache create: 0; Cache read: 0; Total Tokens: 37; Server tool use (web search requests): 0

Let’s try out prefill:

q = "Concisely, what is the meaning of life?"
pref = 'According to Douglas Adams,'
c(q, prefill=pref)

According to Douglas Adams, it’s 42. More seriously, the meaning of life is deeply personal and varies across philosophical traditions - from finding happiness, serving others, pursuing knowledge, or creating your own purpose in an inherently meaningless universe. There’s no universal answer, which is perhaps what makes the question so enduring.

  • id: msg_01V158u6gauy1e8TyDPKtpkb
  • content: [{'citations': None, 'text': "According to Douglas Adams, it's 42. More seriously, the meaning of life is deeply personal and varies across philosophical traditions - from finding happiness, serving others, pursuing knowledge, or creating your own purpose in an inherently meaningless universe. There's no universal answer, which is perhaps what makes the question so enduring.", 'type': 'text'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: end_turn
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 24, 'output_tokens': 65, 'server_tool_use': None}

We can pass stream=True to stream the response back incrementally:

for o in c('Hi', stream=True): print(o, end='')
Hello! How can I assist you today? Feel free to ask any questions or let me know what you'd like to discuss.
c.use
In: 40; Out: 123; Cache create: 0; Cache read: 0; Total Tokens: 163; Server tool use (web search requests): 0
for o in c(q, prefill=pref, stream=True): print(o, end='')
According to Douglas Adams,  it's 42. More seriously, the meaning of life is deeply personal and varies across philosophical traditions - from finding happiness, serving others, pursuing knowledge, or creating your own purpose in an inherently meaningless universe. There's no universal answer, which is perhaps what makes the question so enduring.
c.use
In: 64; Out: 188; Cache create: 0; Cache read: 0; Total Tokens: 252; Server tool use (web search requests): 0

Pass a stop sequence if you want Claude to stop generating text when it encounters it.

c("Count from 1 to 10", stop="5")

1, 2, 3, 4,

  • id: msg_013CBJrJQgR4ardtvmAxo3Xv
  • content: [{'citations': None, 'text': '1, 2, 3, 4, ', 'type': 'text'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: stop_sequence
  • stop_sequence: 5
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 15, 'output_tokens': 14, 'server_tool_use': None}

This also works with streaming, and you can pass more than one stop sequence:

for o in c("Count from 1 to 10", stop=["2", "yellow"], stream=True): print(o, end='')
print(c.stop_reason, c.stop_sequence)
1, stop_sequence 2
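The effect of stop sequences in the two calls above can be mimicked locally. This is a simulation of the observable behaviour only, assuming the text is cut just before the earliest matching sequence; `cut_at_stop` is a hypothetical helper and says nothing about how the API implements it.

```python
def cut_at_stop(text, stop):
    "Simulate stop sequences: truncate `text` before the earliest match."
    stops = stop if isinstance(stop, list) else [stop]
    hits = [(text.find(s), s) for s in stops if s in text]
    if not hits: return text, 'end_turn', None
    idx, s = min(hits)  # earliest position wins
    return text[:idx], 'stop_sequence', s

full = '1, 2, 3, 4, 5, 6, 7, 8, 9, 10'
print(cut_at_stop(full, '5'))
print(cut_at_stop(full, ['2', 'yellow']))
print(cut_at_stop(full, 'yellow'))
```

As in the real responses, the matched sequence itself is not part of the returned text, and it is reported separately alongside the stop reason.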

You can check the logs:

c.log[-1]
{'msgs': [{'role': 'user', 'content': 'Count from 1 to 10'}],
 'prefill': '',
 'max_tokens': 4096,
 'system': '',
 'temperature': 0,
 'stop_sequences': ['2', 'yellow'],
 'maxtok': None,
 'sp': None,
 'temp': None,
 'stream': None,
 'stop': None,
 'result': Message(id='msg_01G2v8NthxnGMcVVjtE7VcBQ', content=[TextBlock(citations=None, text='1, ', type='text')], model='claude-3-7-sonnet-20250219', role='assistant', stop_reason='stop_sequence', stop_sequence='2', type='message', usage=In: 15; Out: 5; Cache create: 0; Cache read: 0; Total Tokens: 20; Server tool use (web search requests): 0),
 'use': In: 94; Out: 207; Cache create: 0; Cache read: 0; Total Tokens: 301; Server tool use (web search requests): 0,
 'stop_reason': 'stop_sequence',
 'stop_sequence': '2'}

We’ve shown the token usage, but what we really care about is pricing. Let’s extract the latest pricing from Anthropic into a pricing dict.


source

get_pricing

 get_pricing (m, u)
Exported source
def get_pricing(m, u):
    return pricing[m][:3] if u.prompt_token_count < 128_000 else pricing[m][3:]

Similarly, let’s get the pricing for the latest server tools:

We’ll patch Usage to enable it to compute the cost given pricing.


source

Usage.cost

 Usage.cost (costs:tuple)
Exported source
@patch
def cost(self:Usage, costs:tuple) -> float:
    cache_w, cache_r = _dgetattr(self, "cache_creation_input_tokens",0), _dgetattr(self, "cache_read_input_tokens",0)
    tok_cost = sum([self.input_tokens * costs[0] +  self.output_tokens * costs[1] +  cache_w * costs[2] + cache_r * costs[3]]) / 1e6
    server_tool_use = _dgetattr(self, "server_tool_use",server_tool_usage())
    server_tool_cost = server_tool_use.web_search_requests * server_tool_pricing['web_search_requests'] / 1e3
    return tok_cost + server_tool_cost

source

Client.cost

 Client.cost ()
Exported source
@patch(as_prop=True)
def cost(self: Client) -> float: return self.use.cost(pricing[model_types[self.model]])

source

get_costs

 get_costs (c)
Exported source
def get_costs(c):
    costs = pricing[model_types[c.model]]
    
    inp_cost = c.use.input_tokens * costs[0] / 1e6
    out_cost = c.use.output_tokens * costs[1] / 1e6

    cache_w = c.use.cache_creation_input_tokens   
    cache_r = c.use.cache_read_input_tokens
    cache_cost = (cache_w * costs[2] + cache_r * costs[3]) / 1e6

    server_tool_use = c.use.server_tool_use
    server_tool_cost = server_tool_use.web_search_requests * server_tool_pricing['web_search_requests'] / 1e3
    return inp_cost, out_cost, cache_cost, cache_w + cache_r, server_tool_cost
Exported source
@patch
def _repr_markdown_(self:Client):
    if not hasattr(self,'result'): return 'No results yet'
    msg = contents(self.result)
    inp_cost, out_cost, cache_cost, cached_toks, server_tool_cost = get_costs(self)
    return f"""{msg}

| Metric | Count | Cost (USD) |
|--------|------:|-----:|
| Input tokens | {self.use.input_tokens:,} | {inp_cost:.6f} |
| Output tokens | {self.use.output_tokens:,} | {out_cost:.6f} |
| Cache tokens | {cached_toks:,} | {cache_cost:.6f} |
| Server tool use | {self.use.server_tool_use.web_search_requests:,} | {server_tool_cost:.6f} |
| **Total** | **{self.use.total:,}** | **${self.cost:.6f}** |"""
c

1,

Metric Count Cost (USD)
Input tokens 94 0.000282
Output tokens 207 0.003105
Cache tokens 0 0.000000
Server tool use 0 0.000000
Total 301 $0.003387
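The figures in the table can be checked by hand. The rates below (3 USD per million input tokens and 15 USD per million output tokens) are an assumption inferred from the table itself, since the pricing dict is not shown here; `token_cost` is a hypothetical helper.

```python
# Assumed rates in USD per million tokens, inferred from the table above
IN_RATE, OUT_RATE = 3.0, 15.0

def token_cost(n_in, n_out, in_rate=IN_RATE, out_rate=OUT_RATE):
    "Cost in USD for `n_in` input tokens and `n_out` output tokens."
    return (n_in * in_rate + n_out * out_rate) / 1e6

inp_cost = 94 * IN_RATE / 1e6     # 94 input tokens
out_cost = 207 * OUT_RATE / 1e6   # 207 output tokens
print(f'{inp_cost:.6f} {out_cost:.6f} {token_cost(94, 207):.6f}')
```

With no cache tokens and no web search requests, the total is just the sum of the input and output costs.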

Tool use

Let’s now add tool use (aka function calling).


source

mk_tool_choice

 mk_tool_choice (choose:Union[str,bool,NoneType])

Create a tool_choice dict that’s ‘auto’ if choose is None, ‘any’ if it is True, or ‘tool’ otherwise

print(mk_tool_choice('sums'))
print(mk_tool_choice(True))
print(mk_tool_choice(None))
{'type': 'tool', 'name': 'sums'}
{'type': 'any'}
{'type': 'auto'}

Claude can be forced to use a particular tool, or select from a specific list of tools, or decide for itself when to use a tool. If you want to force a tool (or force choosing from a list), include a tool_choice param with a dict from mk_tool_choice.
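The three cases can be written out as a short function. This is a hypothetical reimplementation that reproduces the three outputs printed above; the exported `mk_tool_choice` may be written differently.

```python
def tool_choice_sketch(choose):
    "Hypothetical sketch: 'auto' for None, 'any' for True, otherwise a named tool."
    if choose is None: return {'type': 'auto'}
    if choose is True: return {'type': 'any'}
    return {'type': 'tool', 'name': choose}

for choose in ('sums', True, None): print(tool_choice_sketch(choose))
```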

For testing, we need a function that Claude can call; we’ll write a simple function that adds numbers together, and will tell us when it’s being called:

def sums(
    a:int,  # First thing to sum
    b:int=1 # Second thing to sum
) -> int: # The sum of the inputs
    "Adds a + b."
    print(f"Finding the sum of {a} and {b}")
    return a + b
a,b = 604542,6458932
pr = f"What is {a}+{b}?"
sp = "You are a summing expert."

Claudette can autogenerate a schema thanks to the toolslm library. We’ll force the use of the tool using the function we created earlier.

tools=[get_schema(sums)]
choice = mk_tool_choice('sums')

We’ll start a dialog with Claude now. We’ll store the messages of our dialog in msgs. The first message will be our prompt pr, and we’ll pass our tools schema.

msgs = mk_msgs(pr)
r = c(msgs, sp=sp, tools=tools, tool_choice=choice)
r

ToolUseBlock(id='toolu_01CpF6zzMQztfisMkrQRkci1', input={'a': 604542, 'b': 6458932}, name='sums', type='tool_use')

  • id: msg_01C5m7hmZVisaFxCWKkgekKF
  • content: [{'id': 'toolu_01CpF6zzMQztfisMkrQRkci1', 'input': {'a': 604542, 'b': 6458932}, 'name': 'sums', 'type': 'tool_use'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: tool_use
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 442, 'output_tokens': 53, 'server_tool_use': None}

When Claude decides that it should use a tool, it passes back a ToolUseBlock with the name of the tool to call, and the params to use.

We don’t want to allow it to call just any possible function (that would be a security disaster!) so we create a namespace – that is, a dictionary of allowable function names to call.

ns = mk_ns(sums)
ns
{'sums': <function __main__.sums(a: int, b: int = 1) -> int>}
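To make the role of the namespace concrete, here is a minimal dispatcher sketch that only calls functions listed in the dictionary and refuses everything else. `call_allowed` is a hypothetical helper written for this illustration; it is not toolslm's `call_func`.

```python
def sums(a: int, b: int = 1) -> int:
    "Adds a + b."
    return a + b

def call_allowed(name, args, ns):
    "Hypothetical dispatcher: only call functions that are listed in `ns`."
    if name not in ns: raise KeyError(f'Tool {name!r} is not in the namespace')
    return ns[name](**args)

allowed = {'sums': sums}
print(call_allowed('sums', {'a': 604542, 'b': 6458932}, allowed))
try: call_allowed('eval', {'source': '1+1'}, allowed)
except KeyError as e: print('refused:', e)
```

The model only ever supplies a name and arguments; which Python objects those names can reach is decided entirely by the dictionary we build.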

source

mk_funcres

 mk_funcres (fc, ns)

Given tool use block fc, get tool result, and create a tool_result response.

Exported source
def mk_funcres(fc, ns):
    "Given tool use block `fc`, get tool result, and create a tool_result response."
    res = call_func(fc.name, fc.input, ns=ns)
    return dict(type="tool_result", tool_use_id=fc.id, content=str(res))

We can now use the function requested by Claude. We look it up in ns, and pass in the provided parameters.

fcs = [o for o in r.content if isinstance(o,ToolUseBlock)]
fcs
[ToolUseBlock(id='toolu_01CpF6zzMQztfisMkrQRkci1', input={'a': 604542, 'b': 6458932}, name='sums', type='tool_use')]
res = [mk_funcres(fc, ns=ns) for fc in fcs]
res
Finding the sum of 604542 and 6458932
[{'type': 'tool_result',
  'tool_use_id': 'toolu_01CpF6zzMQztfisMkrQRkci1',
  'content': '7063474'}]
def contents(r):
    "Helper to get the contents from Claude response `r`."
    blk = find_block(r)
    if not blk and r.content: blk = r.content[0]
    if hasattr(blk,'text'): return blk.text.strip()
    elif hasattr(blk,'content'): return blk.content.strip()
    return str(blk)

source

mk_toolres

 mk_toolres (r:collections.abc.Mapping,
             ns:Optional[collections.abc.Mapping]=None, obj:Optional=None)

Create a tool_result message from response r.

Type Default Details
r Mapping Tool use request response from Claude
ns Optional None Namespace to search for tools
obj Optional None Class to search for tools
Exported source
def mk_toolres(
    r:abc.Mapping, # Tool use request response from Claude
    ns:Optional[abc.Mapping]=None, # Namespace to search for tools
    obj:Optional=None # Class to search for tools
    ):
    "Create a `tool_result` message from response `r`."
    cts = getattr(r, 'content', [])
    res = [mk_msg(r.model_dump(), role='assistant')]
    if ns is None: ns=globals()
    if obj is not None: ns = mk_ns(obj)
    tcs = [mk_funcres(o, ns) for o in cts if isinstance(o,ToolUseBlock)]
    if tcs: res.append(mk_msg(tcs))
    return res

In order to tell Claude the result of the tool call, we pass back the tool use assistant request and the tool_result response.

tr = mk_toolres(r, ns=ns)
tr
Finding the sum of 604542 and 6458932
[{'role': 'assistant',
  'content': [{'id': 'toolu_01CpF6zzMQztfisMkrQRkci1',
    'input': {'a': 604542, 'b': 6458932},
    'name': 'sums',
    'type': 'tool_use'}]},
 {'role': 'user',
  'content': [{'type': 'tool_result',
    'tool_use_id': 'toolu_01CpF6zzMQztfisMkrQRkci1',
    'content': '7063474'}]}]
msgs
[{'role': 'user', 'content': 'What is 604542+6458932?'}]

We add this to our dialog, and now Claude has all the information it needs to answer our question.

msgs += tr
contents(c(msgs, sp=sp, tools=tools))
'The sum of 604542 and 6458932 is 7,063,474.'
contents(msgs[-1])
'7063474'
msgs
[{'role': 'user', 'content': 'What is 604542+6458932?'},
 {'role': 'assistant',
  'content': [{'id': 'toolu_01CpF6zzMQztfisMkrQRkci1',
    'input': {'a': 604542, 'b': 6458932},
    'name': 'sums',
    'type': 'tool_use'}]},
 {'role': 'user',
  'content': [{'type': 'tool_result',
    'tool_use_id': 'toolu_01CpF6zzMQztfisMkrQRkci1',
    'content': '7063474'}]}]
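The dialog shape above (user question, assistant `tool_use`, user `tool_result`) can be checked mechanically. The following is a minimal sketch under the assumption that messages are plain dicts as printed above; `unanswered_tool_calls` is a hypothetical helper, not something Claudette provides.

```python
def unanswered_tool_calls(msgs):
    "Hypothetical check: ids of tool_use blocks with no matching tool_result in the next message."
    missing = []
    for i, m in enumerate(msgs):
        # Only assistant messages with block content can contain tool requests.
        if m['role'] != 'assistant' or isinstance(m['content'], str): continue
        wanted = [b['id'] for b in m['content'] if b.get('type') == 'tool_use']
        nxt = msgs[i + 1]['content'] if i + 1 < len(msgs) else []
        if isinstance(nxt, str): nxt = []
        answered = {b.get('tool_use_id') for b in nxt if b.get('type') == 'tool_result'}
        missing += [t for t in wanted if t not in answered]
    return missing

demo_msgs = [
    {'role': 'user', 'content': 'What is 604542+6458932?'},
    {'role': 'assistant',
     'content': [{'id': 'toolu_demo', 'input': {'a': 604542, 'b': 6458932},
                  'name': 'sums', 'type': 'tool_use'}]},
]
before = unanswered_tool_calls(demo_msgs)

demo_msgs.append({'role': 'user',
                  'content': [{'type': 'tool_result', 'tool_use_id': 'toolu_demo',
                               'content': '7063474'}]})
after = unanswered_tool_calls(demo_msgs)
print(before, after)
```

A check like this can be useful when building a dialog by hand, since a request left without its result is an easy mistake to make.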

This works with methods as well. In this case, pass the object itself as obj, and the namespace is built from it:

class Dummy:
    def sums(
        self,
        a:int,  # First thing to sum
        b:int=1 # Second thing to sum
    ) -> int: # The sum of the inputs
        "Adds a + b."
        print(f"Finding the sum of {a} and {b}")
        return a + b
tools = [get_schema(Dummy.sums)]
o = Dummy()
r = c(pr, sp=sp, tools=tools, tool_choice=choice)
tr = mk_toolres(r, obj=o)
msgs += tr
contents(c(msgs, sp=sp, tools=tools))
Finding the sum of 604542 and 6458932
'The sum of 604542 and 6458932 is 7063474.'

Anthropic also has a special tool type specific to text editing.

tools = [text_editor_conf['sonnet']]
tools
[{'type': 'text_editor_20250124', 'name': 'str_replace_editor'}]
pr = 'Could you please explain my _quarto.yml file?'
msgs = [mk_msg(pr)]
r = c(msgs, sp=sp, tools=tools)
find_block(r, ToolUseBlock)
ToolUseBlock(id='toolu_01ArY8DQsYsyNF3tRNnXWWWJ', input={'command': 'view', 'path': '_quarto.yml'}, name='str_replace_editor', type='tool_use')

We’ve gone ahead and created a reference implementation that you can use directly from our text_editor module, or use as a reference for creating your own.

ns = mk_ns(str_replace_editor)
tr = mk_toolres(r, ns=ns)
msgs += tr
print(contents(c(msgs, sp=sp, tools=tools))[:128])
## Explanation of Your _quarto.yml File

Your _quarto.yml file is a configuration file for Quarto, which is a scientific and tec
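For readers who want to write their own handler, here is a deliberately small sketch that supports only the `view` command seen in the request above. It is not the implementation in the text_editor module; `view_only_editor` is a hypothetical name, and every other command is simply refused.

```python
import tempfile
from pathlib import Path

def view_only_editor(command: str, path: str, **kwargs) -> str:
    "Hypothetical handler: supports only the `view` command."
    if command != 'view':
        return f"Error: command {command!r} is not supported by this sketch"
    p = Path(path)
    if not p.is_file():
        return f"Error: {path} is not a file"
    return p.read_text()

# Exercise the handler against a temporary file rather than a real project.
with tempfile.TemporaryDirectory() as d:
    cfg = Path(d) / '_quarto.yml'
    cfg.write_text('project:\n  type: website\n')
    viewed = view_only_editor(command='view', path=str(cfg))
    refused = view_only_editor(command='create', path=str(cfg))

print(viewed)
print(refused)
```

To use a handler like this with the tool configuration shown above, it would need to be registered in the namespace under the tool's name, for example `{'str_replace_editor': view_only_editor}`.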

Callable Client


source

get_types

 get_types (msgs)
get_types(msgs)
['text', 'text', 'tool_use', 'tool_result']

source

Client.__call__

 Client.__call__ (msgs:list, sp='', temp=0, maxtok=4096, maxthinktok=0,
                  prefill='', stream:bool=False, stop=None,
                  tools:Optional[list]=None,
                  tool_choice:Optional[dict]=None,
                  metadata:MetadataParam|NotGiven=NOT_GIVEN,
                  stop_sequences:List[str]|NotGiven=NOT_GIVEN, system:Unio
                  n[str,Iterable[TextBlockParam]]|NotGiven=NOT_GIVEN,
                  temperature:float|NotGiven=NOT_GIVEN,
                  thinking:ThinkingConfigParam|NotGiven=NOT_GIVEN,
                  top_k:int|NotGiven=NOT_GIVEN,
                  top_p:float|NotGiven=NOT_GIVEN,
                  extra_headers:Headers|None=None,
                  extra_query:Query|None=None, extra_body:Body|None=None,
                  timeout:float|httpx.Timeout|None|NotGiven=NOT_GIVEN)

Make a call to Claude.

Type Default Details
msgs list List of messages in the dialog
sp str The system prompt
temp int 0 Temperature
maxtok int 4096 Maximum tokens
maxthinktok int 0 Maximum thinking tokens
prefill str Optional prefill to pass to Claude as start of its response
stream bool False Stream response?
stop NoneType None Stop sequence
tools Optional None List of tools to make available to Claude
tool_choice Optional None Optionally force use of some tool
metadata MetadataParam | NotGiven NOT_GIVEN
stop_sequences List[str] | NotGiven NOT_GIVEN
system Union[str, Iterable[TextBlockParam]] | NotGiven NOT_GIVEN
temperature float | NotGiven NOT_GIVEN
thinking ThinkingConfigParam | NotGiven NOT_GIVEN
top_k int | NotGiven NOT_GIVEN
top_p float | NotGiven NOT_GIVEN
extra_headers Optional None Use the following arguments if you need to pass additional parameters to the API that aren’t available via kwargs.
The extra values given here take precedence over values defined on the client or passed to this method.
extra_query Query | None None
extra_body Body | None None
timeout float | httpx.Timeout | None | NotGiven NOT_GIVEN
Exported source
@patch
@delegates(messages.Messages.create)
def __call__(self:Client,
             msgs:list, # List of messages in the dialog
             sp='', # The system prompt
             temp=0, # Temperature
             maxtok=4096, # Maximum tokens
             maxthinktok=0, # Maximum thinking tokens
             prefill='', # Optional prefill to pass to Claude as start of its response
             stream:bool=False, # Stream response?
             stop=None, # Stop sequence
             tools:Optional[list]=None, # List of tools to make available to Claude
             tool_choice:Optional[dict]=None, # Optionally force use of some tool
             **kwargs):
    "Make a call to Claude."
    if tools: kwargs['tools'] = [get_schema(o) if callable(o) else o for o in listify(tools)]
    if tool_choice: kwargs['tool_choice'] = mk_tool_choice(tool_choice)
    if maxthinktok: 
        kwargs['thinking']={'type':'enabled', 'budget_tokens':maxthinktok} 
        temp=1; prefill=''
    msgs = self._precall(msgs, prefill, stop, kwargs)
    if any(t == 'image' for t in get_types(msgs)): assert not self.text_only, f"Images are not supported by the current model type: {self.model}"
    if stream: return self._stream(msgs, prefill=prefill, max_tokens=maxtok, system=sp, temperature=temp, **kwargs)
    res = self.c.messages.create(model=self.model, messages=msgs, max_tokens=maxtok, system=sp, temperature=temp, **kwargs)
    return self._log(res, prefill, msgs, maxtok, sp, temp, stream=stream, stop=stop, **kwargs)
for tools in [sums, [get_schema(sums)]]:
    r = c(pr, sp=sp, tools=tools, tool_choice='sums')
    print(r)
Message(id='msg_01Cn7G1vgw8YV5eyKXSQFaLQ', content=[ToolUseBlock(id='toolu_01Jy9HjLxc9ND581HqcjzqKd', input={'a': 0}, name='sums', type='tool_use')], model='claude-3-7-sonnet-20250219', role='assistant', stop_reason='tool_use', stop_sequence=None, type='message', usage=In: 444; Out: 33; Cache create: 0; Cache read: 0; Total Tokens: 477; Server tool use (web search requests): 0)
Message(id='msg_01JNUGteXRfLQakhSXAV1ehu', content=[ToolUseBlock(id='toolu_01DFHbFskemZbrqew1spHFA4', input={'a': 0}, name='sums', type='tool_use')], model='claude-3-7-sonnet-20250219', role='assistant', stop_reason='tool_use', stop_sequence=None, type='message', usage=In: 444; Out: 33; Cache create: 0; Cache read: 0; Total Tokens: 477; Server tool use (web search requests): 0)
ns = mk_ns(sums)
tr = mk_toolres(r, ns=ns)
Finding the sum of 0 and 1

source

Client.structured

 Client.structured (msgs:list, tools:Optional[list]=None,
                    obj:Optional=None,
                    ns:Optional[collections.abc.Mapping]=None, sp='',
                    temp=0, maxtok=4096, maxthinktok=0, prefill='',
                    stream:bool=False, stop=None,
                    tool_choice:Optional[dict]=None,
                    metadata:MetadataParam|NotGiven=NOT_GIVEN,
                    stop_sequences:List[str]|NotGiven=NOT_GIVEN, system:Un
                    ion[str,Iterable[TextBlockParam]]|NotGiven=NOT_GIVEN,
                    temperature:float|NotGiven=NOT_GIVEN,
                    thinking:ThinkingConfigParam|NotGiven=NOT_GIVEN,
                    top_k:int|NotGiven=NOT_GIVEN,
                    top_p:float|NotGiven=NOT_GIVEN,
                    extra_headers:Headers|None=None,
                    extra_query:Query|None=None,
                    extra_body:Body|None=None,
                    timeout:float|httpx.Timeout|None|NotGiven=NOT_GIVEN)

Return the value of all tool calls (generally used for structured outputs)

Type Default Details
msgs list List of messages in the dialog
tools Optional None List of tools to make available to Claude
obj Optional None Class to search for tools
ns Optional None Namespace to search for tools
sp str The system prompt
temp int 0 Temperature
maxtok int 4096 Maximum tokens
maxthinktok int 0 Maximum thinking tokens
prefill str Optional prefill to pass to Claude as start of its response
stream bool False Stream response?
stop NoneType None Stop sequence
tool_choice Optional None Optionally force use of some tool
metadata MetadataParam | NotGiven NOT_GIVEN
stop_sequences List[str] | NotGiven NOT_GIVEN
system Union[str, Iterable[TextBlockParam]] | NotGiven NOT_GIVEN
temperature float | NotGiven NOT_GIVEN
thinking ThinkingConfigParam | NotGiven NOT_GIVEN
top_k int | NotGiven NOT_GIVEN
top_p float | NotGiven NOT_GIVEN
extra_headers Optional None Use the following arguments if you need to pass additional parameters to the API that aren’t available via kwargs.
The extra values given here take precedence over values defined on the client or passed to this method.
extra_query Query | None None
extra_body Body | None None
timeout float | httpx.Timeout | None | NotGiven NOT_GIVEN
Exported source
@patch
@delegates(Client.__call__)
def structured(self:Client,
               msgs:list, # List of messages in the dialog
               tools:Optional[list]=None, # List of tools to make available to Claude
               obj:Optional=None, # Class to search for tools
               ns:Optional[abc.Mapping]=None, # Namespace to search for tools
               **kwargs):
    "Return the value of all tool calls (generally used for structured outputs)"
    tools = listify(tools)
    res = self(msgs, tools=tools, tool_choice=tools, **kwargs)
    if ns is None: ns=mk_ns(*tools)
    if obj is not None: ns = mk_ns(obj)
    cts = getattr(res, 'content', [])
    tcs = [call_func(o.name, o.input, ns=ns) for o in cts if isinstance(o,ToolUseBlock)]
    return tcs

Anthropic’s API does not support response formats directly, so instead we provide a structured method to use tool calling to achieve the same result. The result of the tool is not passed back to Claude in this case, but instead is returned directly to the user.

c.structured(pr, tools=[sums])
Finding the sum of 1 and 1
[2]
c

ToolUseBlock(id='toolu_01FGiPmwqfSxvSFmEa8FJDAK', input={'a': 1}, name='sums', type='tool_use')

Metric Count Cost (USD)
Input tokens 5,860 0.017580
Output tokens 1,442 0.021630
Cache tokens 0 0.000000
Server tool use 0 0.000000
Total 7,302 $0.039210
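The idea behind structured output can be sketched without the API: the arguments of the tool call are the data we want, so we call the tool ourselves and keep the value. The example below assumes a fake response object and uses a dataclass constructor as the "tool"; both, along with the name `sketch_structured`, are illustrative assumptions rather than library code.

```python
from dataclasses import dataclass
from types import SimpleNamespace

@dataclass
class Person:
    name: str
    age: int

# Stand-in for a response whose content holds one tool-use block and one text block.
fake_response = SimpleNamespace(content=[
    SimpleNamespace(type='tool_use', name='Person', input={'name': 'Ada', 'age': 36}),
    SimpleNamespace(type='text', text='ignored'),
])

def sketch_structured(res, ns):
    "Hypothetical: call each requested tool and return the values instead of replying to the model."
    return [ns[b.name](**b.input) for b in res.content if b.type == 'tool_use']

people = sketch_structured(fake_response, {'Person': Person})
print(people)
```

Nothing is sent back to the model here; the returned list is the final result.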

Custom Types with Tool Use

We need to add tool support for custom types too. Let’s test out custom types using a minimal example.

class Book(BasicRepr):
    def __init__(self, title: str, pages: int): store_attr()
    def __repr__(self):
        return f"Book Title : {self.title}\nNumber of Pages : {self.pages}"
Book("War and Peace", 950)
Book Title : War and Peace
Number of Pages : 950
def find_page(book: Book, # The book to find the halfway point of
              percent: int, # Percent of a book to read to, e.g. halfway == 50, 
) -> int:
    "The page number corresponding to `percent` completion of a book"
    return round(book.pages * (percent / 100.0))
get_schema(find_page)
{'name': 'find_page',
 'description': 'The page number corresponding to `percent` completion of a book\n\nReturns:\n- type: integer',
 'input_schema': {'type': 'object',
  'properties': {'book': {'type': 'object',
    'description': 'The book to find the halfway point of',
    '$ref': '#/$defs/Book'},
   'percent': {'type': 'integer',
    'description': 'Percent of a book to read to, e.g. halfway == 50,'}},
  'title': None,
  'required': ['book', 'percent'],
  '$defs': {'Book': {'type': 'object',
    'properties': {'title': {'type': 'string', 'description': ''},
     'pages': {'type': 'integer', 'description': ''}},
    'title': 'Book',
    'required': ['title', 'pages']}}}}
choice = mk_tool_choice('find_page')
choice
{'type': 'tool', 'name': 'find_page'}

Claudette packs objects as dicts, so we’ll transform tool functions that take user-defined types into tool functions that accept a dict in place of each user-defined type.

First let’s convert a single argument:

_is_builtin decides whether to pass an argument through as-is. Let’s check the argument conversion:

(_is_builtin(int), _is_builtin(Book), _is_builtin(List))
(True, False, True)
(_convert(555, int),
 _convert({"title": "War and Peace", "pages": 923}, Book),
 _convert([1, 2, 3, 4], List))
(555,
 Book Title : War and Peace
 Number of Pages : 923,
 [1, 2, 3, 4])

Applying tool() to a function returns a new function in which the user-defined types are replaced with dictionary inputs.


source

tool

 tool (func)

A function is transformed into a function with dict arguments substituted for user-defined types. Built-in types such as percent here are left untouched.

find_page(book=Book("War and Peace", 950), percent=50)
475
tool(find_page)({"title": "War and Peace", "pages": 950}, percent=50)
475
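One way such a wrapper could work is to inspect the annotations and build an instance whenever a dict arrives for a class-annotated parameter. The sketch below is an assumption about the general technique, not the source of `tool()`; `dict_args` is a hypothetical name, and `Book` is redefined without fastcore so the block runs on its own.

```python
import inspect
from functools import wraps

class Book:
    def __init__(self, title: str, pages: int):
        self.title, self.pages = title, pages

def find_page(book: Book, percent: int) -> int:
    "The page number corresponding to `percent` completion of a book"
    return round(book.pages * (percent / 100.0))

def dict_args(func):
    "Hypothetical wrapper: build annotated class instances from dict arguments."
    sig = inspect.signature(func)
    @wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, val in list(bound.arguments.items()):
            ann = sig.parameters[name].annotation
            # Convert only when a dict arrives for a parameter annotated with
            # a class other than dict itself (unannotated parameters are skipped).
            if (isinstance(val, dict) and inspect.isclass(ann)
                    and ann not in (dict, inspect.Parameter.empty)):
                bound.arguments[name] = ann(**val)
        return func(*bound.args, **bound.kwargs)
    return wrapper

page = dict_args(find_page)({"title": "War and Peace", "pages": 950}, percent=50)
print(page)
```

Arguments such as `percent` pass through unchanged because they are not dicts, and `functools.wraps` keeps the original name and docstring so a schema can still be derived from the wrapped function.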

With tools wrapped by tool(), tool calls that involve user-defined types now complete without failing.

pr = "How many pages do I have to read to get halfway through my 950 page copy of War and Peace"
tools = tool(find_page)
tools
<function __main__.find_page(book: __main__.Book, percent: int) -> int>
r = c(pr, tools=[tools])
find_block(r, ToolUseBlock)
ToolUseBlock(id='toolu_012HsKTgeqwpJSBwoPxJaTiZ', input={'book': {'title': 'War and Peace', 'pages': 950}, 'percent': 50}, name='find_page', type='tool_use')
tr = mk_toolres(r, ns=[tools])
tr
[{'role': 'assistant',
  'content': [{'citations': None,
    'text': 'I can help you find the halfway point of your book. Let me calculate how many pages you need to read to get halfway through your 950-page copy of War and Peace.',
    'type': 'text'},
   {'id': 'toolu_012HsKTgeqwpJSBwoPxJaTiZ',
    'input': {'book': {'title': 'War and Peace', 'pages': 950}, 'percent': 50},
    'name': 'find_page',
    'type': 'tool_use'}]},
 {'role': 'user',
  'content': [{'type': 'tool_result',
    'tool_use_id': 'toolu_012HsKTgeqwpJSBwoPxJaTiZ',
    'content': '475'}]}]
msgs = [pr]+tr
contents(c(msgs, sp=sp, tools=[tools]))
'You need to read 475 pages to reach the halfway point of your 950-page copy of War and Peace.'

Chat

Rather than manually adding the responses to a dialog, we’ll create a simple Chat class to do that for us, each time we make a request. We’ll also store the system prompt and tools here, to avoid passing them every time.


source

Chat

 Chat (model:Optional[str]=None, cli:Optional[__main__.Client]=None,
       sp='', tools:Optional[list]=None, temp=0,
       cont_pr:Optional[str]=None, cache:bool=False, hist:list=None,
       ns:Optional[collections.abc.Mapping]=None)

Anthropic chat client.

Type Default Details
model Optional None Model to use (leave empty if passing cli)
cli Optional None Client to use (leave empty if passing model)
sp str Optional system prompt
tools Optional None List of tools to make available to Claude
temp int 0 Temperature
cont_pr Optional None User prompt to continue an assistant response
cache bool False Use Claude cache?
hist list None Initialize history
ns Optional None Namespace to search for tools

The class stores the Client that will provide the responses in c, and a history of messages in h.

sp = "Never mention what tools you use."
chat = Chat(model, sp=sp)
chat.c.use, chat.h
(In: 0; Out: 0; Cache create: 0; Cache read: 0; Total Tokens: 0; Server tool use (web search requests): 0,
 [])
chat.c.use.cost(pricing[model_types[chat.c.model]])
0.0

This is clunky. Let’s add cost as a property for the Chat class. It will pass in the appropriate prices for the current model to the usage cost calculator.


source

Chat.cost

 Chat.cost ()
Exported source
@patch(as_prop=True)
def cost(self: Chat) -> float: return self.c.cost
chat.cost
0.0
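The arithmetic behind a cost figure is simple enough to sketch. The prices of 3 and 15 USD per million input and output tokens are assumptions inferred from the usage table shown earlier (5,860 input and 1,442 output tokens); `usage_cost` is a hypothetical function that ignores cache tokens and server tool use.

```python
def usage_cost(input_tokens, output_tokens, in_price, out_price):
    "Hypothetical calculator: prices are USD per million tokens."
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Assumed prices, inferred from the usage table shown earlier for this model.
total = usage_cost(5_860, 1_442, in_price=3.0, out_price=15.0)
print(f"{total:.6f}")
```

With these assumed prices the result agrees with the total in that table.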

source

Chat.__call__

 Chat.__call__ (pr=None, temp=None, maxtok=4096, maxthinktok=0,
                stream=False, prefill='', tool_choice:Optional[dict]=None,
                **kw)

Call self as a function.

Type Default Details
pr NoneType None Prompt / message
temp NoneType None Temperature
maxtok int 4096 Maximum tokens
maxthinktok int 0 Maximum thinking tokens
stream bool False Stream response?
prefill str Optional prefill to pass to Claude as start of its response
tool_choice Optional None Optionally force use of some tool
kw VAR_KEYWORD
Exported source
@patch
def _stream(self:Chat, res):
    yield from res
    self.h += mk_toolres(self.c.result, ns=self.tools, obj=self)
Exported source
@patch
def _post_pr(self:Chat, pr, prev_role):
    if pr is None and prev_role == 'assistant':
        if self.cont_pr is None:
            raise ValueError("Prompt must be given after assistant completion, or use `self.cont_pr`.")
        pr = self.cont_pr # No user prompt, keep the chain
    if pr: self.h.append(mk_msg(pr, cache=self.cache))
Exported source
@patch
def _append_pr(self:Chat,
               pr=None,  # Prompt / message
              ):
    prev_role = nested_idx(self.h, -1, 'role') if self.h else 'assistant' # First message should be 'user'
    if pr and prev_role == 'user': self() # already user request pending
    self._post_pr(pr, prev_role)
Exported source
@patch
def __call__(self:Chat,
             pr=None,  # Prompt / message
             temp=None, # Temperature
             maxtok=4096, # Maximum tokens
             maxthinktok=0, # Maximum thinking tokens
             stream=False, # Stream response?
             prefill='', # Optional prefill to pass to Claude as start of its response
             tool_choice:Optional[dict]=None, # Optionally force use of some tool
             **kw):
    if temp is None: temp=self.temp
    self._append_pr(pr)
    res = self.c(self.h, stream=stream, prefill=prefill, sp=self.sp, temp=temp, maxtok=maxtok, maxthinktok=maxthinktok, tools=self.tools, tool_choice=tool_choice,**kw)
    if stream: return self._stream(res)
    self.h += mk_toolres(self.c.result, ns=self.ns)
    return res

The __call__ method just passes the request along to the Client, but rather than just passing in this one prompt, it appends it to the history and passes it all along. As a result, we now have state!

chat = Chat(model, sp=sp)
chat("I'm Jeremy")
chat("What's my name?")

Your name is Jeremy, as you mentioned in your previous message.

  • id: msg_018q1k9EPACg8QjBAeP7qpE1
  • content: [{'citations': None, 'text': 'Your name is Jeremy, as you mentioned in your previous message.', 'type': 'text'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: end_turn
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 59, 'output_tokens': 16, 'server_tool_use': None}
chat.use, chat.cost
(In: 76; Out: 50; Cache create: 0; Cache read: 0; Total Tokens: 126; Server tool use (web search requests): 0,
 0.000978)
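The essence of the stateful behaviour is that every call sends the whole history, then stores the reply. Here is a minimal sketch with a fake model in place of the API; `EchoChat` and `count_user_turns` are hypothetical names and leave out tools, prefill, and streaming.

```python
class EchoChat:
    "Hypothetical stand-in for a chat wrapper; `respond` replaces the real API call."
    def __init__(self, respond):
        self.respond, self.h = respond, []

    def __call__(self, pr):
        self.h.append({'role': 'user', 'content': pr})
        # The whole history is passed along, not just the latest prompt.
        reply = self.respond(self.h)
        self.h.append({'role': 'assistant', 'content': reply})
        return reply

def count_user_turns(history):
    "Fake model: reports how many user messages it was shown."
    n = sum(m['role'] == 'user' for m in history)
    return f"I have seen {n} user message(s)."

demo = EchoChat(count_user_turns)
demo("I'm Jeremy")
second = demo("What's my name?")
print(second)
print(len(demo.h))
```

The second reply can only mention two user messages because the first exchange was kept in `h` and sent again.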

Let’s try out prefill too:

q = "Concisely, what is the meaning of life?"
pref = 'According to Douglas Adams,'
chat.c.result

Your name is Jeremy, as you mentioned in your previous message.

  • id: msg_018q1k9EPACg8QjBAeP7qpE1
  • content: [{'citations': None, 'text': 'Your name is Jeremy, as you mentioned in your previous message.', 'type': 'text'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: end_turn
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 59, 'output_tokens': 16, 'server_tool_use': None}
chat(q, prefill=pref)

According to Douglas Adams, 42. More seriously, the meaning of life is likely what you create through your relationships, pursuits, and values.

  • id: msg_01QmdBDZ7vf8pLcHoTCqsmQn
  • content: [{'citations': None, 'text': 'According to Douglas Adams, 42. More seriously, the meaning of life is likely what you create through your relationships, pursuits, and values.', 'type': 'text'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: end_turn
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 95, 'output_tokens': 28, 'server_tool_use': None}

By default, messages must alternate in user, assistant, user order. If this isn’t followed (for example, by calling chat() with no user message after an assistant reply), an error is raised:

try: chat()
except ValueError as e: print("Error:", e)
Error: Prompt must be given after assistant completion, or use `self.cont_pr`.

Setting cont_pr specifies a “default prompt” to send when no prompt is given. It is usually used to prompt the model to continue.

chat.cont_pr = "keep going..."
chat()

The meaning of life varies across philosophical traditions: finding happiness, serving others, seeking knowledge, fulfilling one’s potential, or connecting with something greater than oneself. Ultimately, many find meaning in personal growth, loving relationships, contributing to society, and pursuing what brings them genuine fulfillment. Rather than a single universal answer, meaning often emerges from our individual journeys and choices.

  • id: msg_01YJDurNFrknbtQjLMUGkL6S
  • content: [{'citations': None, 'text': "The meaning of life varies across philosophical traditions: finding happiness, serving others, seeking knowledge, fulfilling one's potential, or connecting with something greater than oneself. Ultimately, many find meaning in personal growth, loving relationships, contributing to society, and pursuing what brings them genuine fulfillment. Rather than a single universal answer, meaning often emerges from our individual journeys and choices.", 'type': 'text'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: end_turn
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 129, 'output_tokens': 82, 'server_tool_use': None}
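The decision about which user message to send can be isolated into a small pure function. This is a sketch modeled on the `_post_pr` logic shown earlier, rewritten as a standalone function for illustration; `next_user_message` is a hypothetical name.

```python
def next_user_message(pr, prev_role, cont_pr=None):
    "Hypothetical standalone version of the decision: which user message, if any, to append."
    if pr is None and prev_role == 'assistant':
        if cont_pr is None:
            raise ValueError("Prompt must be given after assistant completion, or use `cont_pr`.")
        return cont_pr  # fall back to the default continuation prompt
    return pr  # may be None when a user message (e.g. a tool result) is already pending

print(next_user_message(None, 'assistant', cont_pr='keep going...'))
print(next_user_message(None, 'user'))
try:
    next_user_message(None, 'assistant')
except ValueError as e:
    print("Error:", e)
```

The middle case is the one that makes a bare call work after a tool request: the last message in the history is then a user-role tool result, so no new prompt is needed.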

We can also use streaming:

chat = Chat(model, sp=sp)
for o in chat("I'm Jeremy", stream=True): print(o, end='')
Hello Jeremy! It's nice to meet you. How are you doing today? Is there something I can help you with or would you like to chat?
for o in chat(q, prefill=pref, stream=True): print(o, end='')
According to Douglas Adams,  it's 42. More seriously, the meaning of life is deeply personal - many find it in relationships, creating positive impact, pursuing passions, or spiritual fulfillment. There's no universal answer; meaning is what you choose to create.
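Streaming and history interact in a way that is worth seeing in isolation: a generator can only record the full reply after its last chunk has been consumed. The sketch below assumes a list of strings in place of a real stream; `stream_and_record` is a hypothetical name.

```python
def stream_and_record(chunks, history):
    "Hypothetical: yield text chunks, then store the assembled reply once the stream is exhausted."
    parts = []
    for chunk in chunks:
        parts.append(chunk)
        yield chunk
    # This line runs only after the caller has consumed every chunk.
    history.append({'role': 'assistant', 'content': ''.join(parts)})

history = [{'role': 'user', 'content': "I'm Jeremy"}]
gen = stream_and_record(['Hello ', 'Jeremy', '!'], history)
print(len(history))  # nothing recorded yet: generators are lazy
for o in gen: print(o, end='')
print()
print(history[-1])
```

In this sketch, a caller who abandons the loop early leaves the history without the assistant turn, so a streamed response should be iterated to the end before the next call.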

You can provide a history of messages to initialise Chat with:

chat = Chat(model, sp=sp, hist=["Can you guess my name?", "Hmmm I really don't know. Is it 'Merlin G. Penfolds'?"])
chat('Wow how did you know?')

I didn’t actually know your name! I was just making a random guess for fun. It’s quite surprising that I happened to guess correctly. What are the chances of that?

If you’d like, you can share how you’d prefer me to address you in our conversation.

  • id: msg_012dvDUX8udLpFYPNfUa2sv5
  • content: [{'citations': None, 'text': "I didn't actually know your name! I was just making a random guess for fun. It's quite surprising that I happened to guess correctly. What are the chances of that?\n\nIf you'd like, you can share how you'd prefer me to address you in our conversation.", 'type': 'text'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: end_turn
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 58, 'output_tokens': 60, 'server_tool_use': None}
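The history above is a plain list of strings. One plausible reading, assumed here rather than taken from the source, is that the strings are treated as alternating user and assistant turns, starting with the user. `alternate_roles` is a hypothetical helper that shows that mapping.

```python
def alternate_roles(texts):
    "Hypothetical helper: assign 'user' and 'assistant' roles in turn, starting with 'user'."
    roles = ('user', 'assistant')
    return [{'role': roles[i % 2], 'content': t} for i, t in enumerate(texts)]

hist = alternate_roles(["Can you guess my name?",
                        "Hmmm I really don't know. Is it 'Merlin G. Penfolds'?"])
hist_roles = [m['role'] for m in hist]
print(hist_roles)
```

Under this assumption, a history with an even number of entries ends on an assistant turn, so the next call must supply a user prompt.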

Chat tool use

We automagically get streamlined tool use as well:

pr = f"What is {a}+{b}?"
pr
'What is 604542+6458932?'
chat = Chat(model, sp=sp, tools=[sums])
r = chat(pr)
r
Finding the sum of 604542 and 6458932

I’ll calculate the sum of those two numbers for you.

  • id: msg_01QFwCbk595VHhnKqqZc2EMi
  • content: [{'citations': None, 'text': "I'll calculate the sum of those two numbers for you.", 'type': 'text'}, {'id': 'toolu_016ViHPsnWnzVq65MwRyr8gW', 'input': {'a': 604542, 'b': 6458932}, 'name': 'sums', 'type': 'tool_use'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: tool_use
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 437, 'output_tokens': 85, 'server_tool_use': None}

Now we need to send this result to Claude—calling the object with no parameters tells it to return the tool result to Claude:

chat()

The sum of 604542 and 6458932 is 7,063,474.

  • id: msg_01PaLtHRATzhFrQFV3eg82pG
  • content: [{'citations': None, 'text': 'The sum of 604542 and 6458932 is 7,063,474.', 'type': 'text'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: end_turn
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 536, 'output_tokens': 25, 'server_tool_use': None}

It should be correct, because it actually used our Python function to do the addition. Let’s check:

a+b
7063474

Let’s test a function with user-defined types.

chat = Chat(model, sp=sp, tools=[find_page])
r = chat("How many pages is three quarters of the way through my 80 page edition of Tao Te Ching?")
r

To find out how many pages is three quarters of the way through your 80-page edition of Tao Te Ching, I’ll calculate that for you.

  • id: msg_01FPy3rjcqggtxdRxgHG37PP
  • content: [{'citations': None, 'text': "To find out how many pages is three quarters of the way through your 80-page edition of Tao Te Ching, I'll calculate that for you.", 'type': 'text'}, {'id': 'toolu_013YVRza4qFjXh7Qv9aryPp4', 'input': {'book': {'title': 'Tao Te Ching', 'pages': 80}, 'percent': 75}, 'name': 'find_page', 'type': 'tool_use'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: tool_use
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 547, 'output_tokens': 122, 'server_tool_use': None}

Now we need to send this result to Claude—calling the object with no parameters tells it to return the tool result to Claude:

chat()

Three quarters (75%) of the way through your 80-page edition of Tao Te Ching would be page 60.

  • id: msg_01JUU6YMCqFBW1shmzM7Fj8w
  • content: [{'citations': None, 'text': 'Three quarters (75%) of the way through your 80-page edition of Tao Te Ching would be page 60.', 'type': 'text'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: end_turn
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 681, 'output_tokens': 33, 'server_tool_use': None}

It should be correct, because it actually used our Python function to do the calculation. Let’s check:

80 * .75
60.0
chat = Chat(model, tools=[text_editor_conf['sonnet']], ns=mk_ns(str_replace_editor))

Note that mk_ns(str_replace_editor) is used here. When a tool is not provided directly as a Python function (like sums) but as a schema or configuration dict, you must create a namespace dictionary (mapping the tool name string to the function object) and pass it using the ns parameter, here to Chat and likewise to functions such as mk_toolres or toolloop. toolslm cannot automatically generate the namespace in this case. For tools given as Python functions, claudette handles namespace creation automatically.

r = chat('Please explain what my _quarto.yml does. Use your tools')
find_block(r, ToolUseBlock)
ToolUseBlock(id='toolu_01E6LUbTPMsTGZMbiSMbQ3vm', input={'command': 'view', 'path': '_quarto.yml'}, name='str_replace_editor', type='tool_use')
chat()

Explanation of Your _quarto.yml File

Your _quarto.yml is a configuration file for Quarto, which is a scientific and technical publishing system. Here’s a breakdown of what it does:

Project Configuration

  • project.type: website: Defines this as a website project (as opposed to a book or other format)
  • project.resources: ["*.txt"]: Includes all .txt files as resources in the build
  • project.preview.port: 3000: Sets the preview server to run on port 3000
  • project.preview.browser: false: Prevents automatically opening a browser when previewing

Format Settings

  • format.html: Configures HTML output with several settings:
    • theme: cosmo: Uses the “cosmo” theme for styling
    • css: styles.css: Applies additional custom styles from styles.css
    • toc: true: Enables table of contents
    • code-tools: true: Enables code tools (like copy button)
    • code-block-bg: true: Adds background to code blocks
    • code-block-border-left: "#31BAE9": Sets a blue left border for code blocks
    • highlight-style: arrow: Uses the “arrow” syntax highlighting style
    • grid: Configures the page layout with specific widths for sidebar (180px), body (1800px), margins (150px), and gutters (1.0rem)
    • keep-md: true: Preserves Markdown files after rendering
  • format.commonmark: default: Also enables CommonMark format with default settings

Website Configuration

  • website.twitter-card: true: Enables Twitter card metadata
  • website.open-graph: true: Enables Open Graph metadata for social media sharing
  • website.repo-actions: [issue]: Adds an “issue” button for repository actions
  • website.navbar.background: primary: Sets the navbar background to the primary theme color
  • website.navbar.search: true: Enables search functionality in the navbar
  • website.sidebar.style: floating: Uses a floating style for the sidebar

Metadata Files

  • Includes two external metadata files:
    • nbdev.yml: Likely contains nbdev-specific configurations (nbdev is a library for developing Python packages)
    • sidebar.yml: Likely contains sidebar navigation structure

This configuration sets up a website with good code display features, responsive layout, and integration with development tools. It appears to be designed for technical documentation, possibly for a Python package using nbdev.

  • id: msg_016Hxyn9LbZ5EuhYVXYSUoPv
  • content: [{'citations': None, 'text': '# Explanation of Your _quarto.yml File\n\nYour _quarto.yml is a configuration file for Quarto, which is a scientific and technical publishing system. Here\'s a breakdown of what it does:\n\n## Project Configuration\n-project.type: website: Defines this as a website project (as opposed to a book or other format)\n-project.resources: [“*.txt”]: Includes all .txt files as resources in the build\n-project.preview.port: 3000: Sets the preview server to run on port 3000\n-project.preview.browser: false: Prevents automatically opening a browser when previewing\n\n## Format Settings\n-format.html: Configures HTML output with several settings:\n -theme: cosmo: Uses the "cosmo" theme for styling\n -css: styles.css: Applies additional custom styles from styles.css\n -toc: true: Enables table of contents\n -code-tools: true: Enables code tools (like copy button)\n -code-block-bg: true: Adds background to code blocks\n -code-block-border-left: “#31BAE9”: Sets a blue left border for code blocks\n -highlight-style: arrow: Uses the "arrow" syntax highlighting style\n -grid: Configures the page layout with specific widths for sidebar (180px), body (1800px), margins (150px), and gutters (1.0rem)\n -keep-md: true: Preserves Markdown files after rendering\n\n-format.commonmark: default: Also enables CommonMark format with default settings\n\n## Website Configuration\n-website.twitter-card: true: Enables Twitter card metadata\n-website.open-graph: true: Enables Open Graph metadata for social media sharing\n-website.repo-actions: [issue]: Adds an "issue" button for repository actions\n-website.navbar.background: primary: Sets the navbar background to the primary theme color\n-website.navbar.search: true: Enables search functionality in the navbar\n-website.sidebar.style: floating: Uses a floating style for the sidebar\n\n## Metadata Files\n- Includes two external metadata files:\n -nbdev.yml: Likely contains nbdev-specific configurations (nbdev 
is a library for developing Python packages)\n -sidebar.yml: Likely contains sidebar navigation structure\n\nThis configuration sets up a website with good code display features, responsive layout, and integration with development tools. It appears to be designed for technical documentation, possibly for a Python package using nbdev.', 'type': 'text'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: end_turn
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 1372, 'output_tokens': 593, 'server_tool_use': None}
Exported source
@patch
def _repr_markdown_(self:Chat):
    if not hasattr(self.c, 'result'): return 'No results yet'
    last_msg = contents(self.c.result)
    
    def fmt_msg(m):
        t = contents(m)
        if isinstance(t, dict): return t['content']
        return t
        
    history = '\n\n'.join(f"**{m['role']}**: {fmt_msg(m)}" 
                         for m in self.h)
    det = self.c._repr_markdown_().split('\n\n')[-1]
    return f"""{last_msg}

<details>
<summary>History</summary>

{history}
</details>

{det}"""
chat

Explanation of Your _quarto.yml File

Your _quarto.yml is a configuration file for Quarto, which is a scientific and technical publishing system. Here’s a breakdown of what it does:

Project Configuration

  • project.type: website: Defines this as a website project (as opposed to a book or other format)
  • project.resources: ["*.txt"]: Includes all .txt files as resources in the build
  • project.preview.port: 3000: Sets the preview server to run on port 3000
  • project.preview.browser: false: Prevents automatically opening a browser when previewing

Format Settings

  • format.html: Configures HTML output with several settings:
    • theme: cosmo: Uses the “cosmo” theme for styling
    • css: styles.css: Applies additional custom styles from styles.css
    • toc: true: Enables table of contents
    • code-tools: true: Enables code tools (like copy button)
    • code-block-bg: true: Adds background to code blocks
    • code-block-border-left: "#31BAE9": Sets a blue left border for code blocks
    • highlight-style: arrow: Uses the “arrow” syntax highlighting style
    • grid: Configures the page layout with specific widths for sidebar (180px), body (1800px), margins (150px), and gutters (1.0rem)
    • keep-md: true: Preserves Markdown files after rendering
  • format.commonmark: default: Also enables CommonMark format with default settings

Website Configuration

  • website.twitter-card: true: Enables Twitter card metadata
  • website.open-graph: true: Enables Open Graph metadata for social media sharing
  • website.repo-actions: [issue]: Adds an “issue” button for repository actions
  • website.navbar.background: primary: Sets the navbar background to the primary theme color
  • website.navbar.search: true: Enables search functionality in the navbar
  • website.sidebar.style: floating: Uses a floating style for the sidebar

Metadata Files

  • Includes two external metadata files:
    • nbdev.yml: Likely contains nbdev-specific configurations (nbdev is a library for developing Python packages)
    • sidebar.yml: Likely contains sidebar navigation structure

This configuration sets up a website with good code display features, responsive layout, and integration with development tools. It appears to be designed for technical documentation, possibly for a Python package using nbdev.

History

user: P

assistant: I’ll examine your _quarto.yml file to explain what it does. Let me first view the file.

user: project: type: website resources: - “*.txt” preview: port: 3000 browser: false

format: html: theme: cosmo css: styles.css toc: true code-tools: true code-block-bg: true code-block-border-left: “#31BAE9” highlight-style: arrow grid: sidebar-width: 180px body-width: 1800px margin-width: 150px gutter-width: 1.0rem keep-md: true commonmark: default

website: twitter-card: true open-graph: true repo-actions: [issue] navbar: background: primary search: true sidebar: style: floating

metadata-files: - nbdev.yml - sidebar.yml

assistant: # Explanation of Your _quarto.yml File

Your _quarto.yml is a configuration file for Quarto, which is a scientific and technical publishing system. Here’s a breakdown of what it does:

Project Configuration

  • project.type: website: Defines this as a website project (as opposed to a book or other format)
  • project.resources: ["*.txt"]: Includes all .txt files as resources in the build
  • project.preview.port: 3000: Sets the preview server to run on port 3000
  • project.preview.browser: false: Prevents automatically opening a browser when previewing

Format Settings

  • format.html: Configures HTML output with several settings:
    • theme: cosmo: Uses the “cosmo” theme for styling
    • css: styles.css: Applies additional custom styles from styles.css
    • toc: true: Enables table of contents
    • code-tools: true: Enables code tools (like copy button)
    • code-block-bg: true: Adds background to code blocks
    • code-block-border-left: "#31BAE9": Sets a blue left border for code blocks
    • highlight-style: arrow: Uses the “arrow” syntax highlighting style
    • grid: Configures the page layout with specific widths for sidebar (180px), body (1800px), margins (150px), and gutters (1.0rem)
    • keep-md: true: Preserves Markdown files after rendering
  • format.commonmark: default: Also enables CommonMark format with default settings

Website Configuration

  • website.twitter-card: true: Enables Twitter card metadata
  • website.open-graph: true: Enables Open Graph metadata for social media sharing
  • website.repo-actions: [issue]: Adds an “issue” button for repository actions
  • website.navbar.background: primary: Sets the navbar background to the primary theme color
  • website.navbar.search: true: Enables search functionality in the navbar
  • website.sidebar.style: floating: Uses a floating style for the sidebar

Metadata Files

  • Includes two external metadata files:
    • nbdev.yml: Likely contains nbdev-specific configurations (nbdev is a library for developing Python packages)
    • sidebar.yml: Likely contains sidebar navigation structure
This configuration sets up a website with good code display features, responsive layout, and integration with development tools. It appears to be designed for technical documentation, possibly for a Python package using nbdev.
Metric            Count   Cost (USD)
Input tokens      2,408   0.007224
Output tokens       693   0.010395
Cache tokens          0   0.000000
Server tool use       0   0.000000
Total             3,101   $0.017619
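
The cost column can be reproduced from the token counts. The sketch below assumes list prices of $3 per million input tokens and $15 per million output tokens for Claude 3.7 Sonnet, which match the figures in the table; `PRICES` and `usage_cost` are hypothetical names for illustration, not part of Claudette.

```python
# Assumed list prices in USD per million tokens: (input, output).
PRICES = {'claude-3-7-sonnet-20250219': (3.0, 15.0)}

def usage_cost(model, input_tokens, output_tokens):
    "Return (input_cost, output_cost, total_cost) in USD for one model."
    p_in, p_out = PRICES[model]
    c_in = input_tokens * p_in / 1_000_000
    c_out = output_tokens * p_out / 1_000_000
    return c_in, c_out, c_in + c_out

# Token counts taken from the table above.
c_in, c_out, total = usage_cost('claude-3-7-sonnet-20250219', 2408, 693)
print(f"{c_in:.6f} {c_out:.6f} {total:.6f}")
```

Cache reads and writes are billed at different rates, so a fuller version would need separate prices for those counts.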

Images

Claude can handle image data as well. As everyone knows, when testing image APIs you have to use a cute puppy.

# Image is Cute_dog.jpg from Wikimedia
fn = Path('samples/puppy.jpg')
display.Image(filename=fn, width=200)

img = fn.read_bytes()

Claude expects an image message to have the following structure:

{
    'role': 'user', 
    'content': [
        {'type':'text', 'text':'What is in the image?'},
        {
            'type':'image', 
            'source': {
                'type':'base64', 'media_type':'media_type', 'data': 'data'
            }
        }
    ]
}
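
To make the placeholders concrete, here is a minimal sketch of filling in that structure by hand with the standard library. `image_message` is a hypothetical helper, and the default `media_type` of `image/jpeg` is an assumption that suits the puppy photo; it would need to change for other formats.

```python
import base64

def image_message(img_bytes, question, media_type='image/jpeg'):
    "Build a user message with a text block and a base64-encoded image block."
    data = base64.b64encode(img_bytes).decode('utf-8')
    return {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': question},
            {'type': 'image',
             'source': {'type': 'base64', 'media_type': media_type, 'data': data}},
        ],
    }

# Stand-in bytes; in the notebook this would be `fn.read_bytes()`.
demo = image_message(b'\xff\xd8\xff', 'What is in the image?')
print(demo['content'][1]['source']['media_type'])
```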

msglm automatically detects if a message is an image, encodes it, and generates the data structure above. All we need to do is create a list containing our image and a query and then pass it to mk_msg.

Let’s try it out…

q = "In brief, what color flowers are in this image?"
msg = mk_msg([img, q])
c([msg])

The flowers in the image are purple/lavender in color. They appear to be small daisy-like flowers or asters blooming next to where the adorable Cavalier King Charles Spaniel puppy is resting on the grass. The purple flowers create a nice contrast with the puppy’s white and reddish-brown fur.

  • id: msg_01SD3FDQbuJ4x98uNv62K2Xp
  • content: [{'citations': None, 'text': "The flowers in the image are purple/lavender in color. They appear to be small daisy-like flowers or asters blooming next to where the adorable Cavalier King Charles Spaniel puppy is resting on the grass. The purple flowers create a nice contrast with the puppy's white and reddish-brown fur.", 'type': 'text'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: end_turn
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 110, 'output_tokens': 75, 'server_tool_use': None}

You don’t need to call mk_msg on each individual message before passing them to the Chat class. Instead you can pass your messages in a list and the Chat class will automatically call mk_msgs in the background.

c(["How are you?", r])

For messages that contain multiple content types (like an image with a question), you’ll need to enclose the message contents in a list as shown below:

c(["How are you?", r, [img, q]])

c = Chat(model)
c([img, q])

The flowers in the image are purple/lavender in color. They appear to be small daisy-like flowers blooming next to where the adorable Cavalier King Charles Spaniel puppy is resting on the grass. The purple flowers create a nice contrast with the puppy’s white and reddish-brown fur.

  • id: msg_01VFVzc4JCVU1ZYxCiY5PBqb
  • content: [{'citations': None, 'text': "The flowers in the image are purple/lavender in color. They appear to be small daisy-like flowers blooming next to where the adorable Cavalier King Charles Spaniel puppy is resting on the grass. The purple flowers create a nice contrast with the puppy's white and reddish-brown fur.", 'type': 'text'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: end_turn
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 110, 'output_tokens': 72, 'server_tool_use': None}
def contents(r):
    "Helper to get the contents from Claude response `r`."
    blk = find_block(r)
    if not blk and r.content: blk = r.content[0]
    if hasattr(blk,'text'): return blk.text.strip()
    elif hasattr(blk,'content'): return blk.content.strip()
    elif hasattr(blk,'source'): return f'*Media Type - {blk.type}*'
    return str(blk)
contents(c.h[0])
'*Media Type - image*'
c

The flowers in the image are purple/lavender in color. They appear to be small daisy-like flowers blooming next to where the adorable Cavalier King Charles Spaniel puppy is resting on the grass. The purple flowers create a nice contrast with the puppy’s white and reddish-brown fur.

History

user: Media Type - image

assistant: The flowers in the image are purple/lavender in color. They appear to be small daisy-like flowers blooming next to where the adorable Cavalier King Charles Spaniel puppy is resting on the grass. The purple flowers create a nice contrast with the puppy’s white and reddish-brown fur.
Metric            Count   Cost (USD)
Input tokens        110   0.000330
Output tokens        72   0.001080
Cache tokens          0   0.000000
Server tool use       0   0.000000
Total               182   $0.001410
Note

Unfortunately, not all Claude models support images 😞. This table summarizes the capabilities of each Claude model and the different modalities they support.

Caching

Claude supports context caching by adding a cache_control field to a block in the message content.

{
    "role": "user",
    "content": [
        {
            "type": "text", 
            "text": "Please cache my message", 
            "cache_control": {"type": "ephemeral"}
        }
    ]
}

To cache a message, we simply set cache=True when calling mk_msg.

mk_msg(['hi', 'there'], cache=True)
{ 'content': [ {'text': 'hi', 'type': 'text'},
               { 'cache_control': {'type': 'ephemeral'},
                 'text': 'there',
                 'type': 'text'}],
  'role': 'user'}
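
Note that only the last content block carries the cache_control marker. A rough sketch of that behavior on a plain dict follows; this is not msglm's implementation, and `with_cache_control` is a hypothetical name.

```python
import copy

def with_cache_control(msg):
    "Return a copy of `msg` whose last content block is marked as an ephemeral cache point."
    out = copy.deepcopy(msg)  # leave the caller's message untouched
    out['content'][-1]['cache_control'] = {'type': 'ephemeral'}
    return out

msg = {'role': 'user',
       'content': [{'type': 'text', 'text': 'hi'},
                   {'type': 'text', 'text': 'there'}]}
cached = with_cache_control(msg)
print(cached['content'][-1])
```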

Claude also now supports smart cache look-ups, so it’s very simple to keep an entire conversation in cache by constantly telling it to update the cache with the latest message. To do this, we just need to set cache=True when creating a Chat.

chat = Chat(model, sp=sp, cache=True)

Caching has a minimum token limit of 1024 tokens for Sonnet and Opus, and 2048 for Haiku. If your conversation is below this limit, it will not be cached.
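
As a quick sanity check before relying on the cache, the limits above can be encoded in a small helper. This is only a sketch: `min_cache_tokens` and `is_cacheable` are hypothetical names, and inferring the model family from the substring `haiku` in the model name is an assumption.

```python
def min_cache_tokens(model):
    "Minimum prompt length (in tokens) for caching, using the limits quoted above."
    return 2048 if 'haiku' in model else 1024

def is_cacheable(model, n_tokens):
    "True if a prompt of `n_tokens` is long enough to be cached for `model`."
    return n_tokens >= min_cache_tokens(model)

print(is_cacheable('claude-3-7-sonnet-20250219', 20))
print(is_cacheable('claude-3-7-sonnet-20250219', 1500))
```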

chat("Hi, I'm Jeremy.")

Hello Jeremy! It’s nice to meet you. How are you doing today? Is there something I can help you with or would you like to chat?

  • id: msg_01NRp9ys88wxbuTysaZnausb
  • content: [{'citations': None, 'text': "Hello Jeremy! It's nice to meet you. How are you doing today? Is there something I can help you with or would you like to chat?", 'type': 'text'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: end_turn
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 20, 'output_tokens': 34, 'server_tool_use': None}

Note the usage: no cache is created, nor used. Now, let’s send a long enough message to trigger caching.

chat("""Lorem ipsum dolor sit amet""" * 150)

I notice you’ve sent a large amount of “Lorem ipsum” text, which is commonly used as placeholder or filler text in design and publishing.

Is there something specific you’d like to discuss or a question you have? I’m here to help with meaningful conversation or information if you need it. If you’re testing something or just curious about how I respond, feel free to let me know what you’re looking for.

  • id: msg_01H6ovSz8T1V2rj4iynbeKvH
  • content: [{'citations': None, 'text': 'I notice you\'ve sent a large amount of "Lorem ipsum" text, which is commonly used as placeholder or filler text in design and publishing. \n\nIs there something specific you\'d like to discuss or a question you have? I\'m here to help with meaningful conversation or information if you need it. If you\'re testing something or just curious about how I respond, feel free to let me know what you\'re looking for.', 'type': 'text'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: end_turn
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 1101, 'cache_read_input_tokens': 0, 'input_tokens': 4, 'output_tokens': 90, 'server_tool_use': None}

The context is now long enough for cache to be used. All the conversation history has now been written to the temporary cache. Any subsequent message will read from it rather than re-processing the entire conversation history.

chat("Oh thank you! Sorry, my lorem ipsum generator got out of control!")

No problem at all! Those lorem ipsum generators can certainly get enthusiastic sometimes. It happens to the best of us! Is there something I can actually help you with today?

  • id: msg_01KsvSSaJjpuRst2BejF2LbT
  • content: [{'citations': None, 'text': 'No problem at all! Those lorem ipsum generators can certainly get enthusiastic sometimes. It happens to the best of us! Is there something I can actually help you with today?', 'type': 'text'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: end_turn
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 108, 'cache_read_input_tokens': 1101, 'input_tokens': 4, 'output_tokens': 38, 'server_tool_use': None}
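
The usage fields can be combined to see how much of the prompt was served from cache. In this sketch, `prompt_token_summary` is a hypothetical helper operating on a plain dict copied from the output above; it relies on input_tokens counting only the tokens that were neither read from nor written to the cache.

```python
def prompt_token_summary(usage):
    "Split the prompt into cached and uncached parts from a usage dict."
    created = usage.get('cache_creation_input_tokens') or 0
    read = usage.get('cache_read_input_tokens') or 0
    fresh = usage.get('input_tokens') or 0
    total = created + read + fresh
    return {'total_prompt': total,
            'read_from_cache': read,
            'cache_hit_ratio': read / total if total else 0.0}

usage = {'cache_creation_input_tokens': 108, 'cache_read_input_tokens': 1101,
         'input_tokens': 4, 'output_tokens': 38, 'server_tool_use': None}
summary = prompt_token_summary(usage)
print(summary)
```

Here the 1,101 tokens written by the previous turn are read back, and only the newest exchange (108 tokens) is added to the cache.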

Extended Thinking

Claude 3.7 Sonnet has enhanced reasoning capabilities for complex tasks. See docs for more info.

We can enable extended thinking by passing a thinking param with the following structure.

thinking={
    "type": "enabled",
    "budget_tokens": 16000
}

When extended thinking is enabled a thinking block is included in the response as shown below.

{
  "content": [
    {
      "type": "thinking",
      "thinking": "To approach this, let's think about...",
      "signature": "Imtakcjsu38219c0.eyJoYXNoIjoiYWJjM0NTY3fQ...."
    },
    {
      "type": "text",
      "text": "Yes, there are infinitely many prime numbers such that..."
    }
  ]
}
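
For a response shaped like the JSON above, the thinking and the answer can be separated by block type. This sketch works on a plain dict; the SDK returns objects with attributes rather than dicts, and `split_blocks` is a hypothetical name.

```python
def split_blocks(response):
    "Return (thinking_texts, answer_texts) from a response dict."
    thinking = [b['thinking'] for b in response['content'] if b['type'] == 'thinking']
    answers = [b['text'] for b in response['content'] if b['type'] == 'text']
    return thinking, answers

resp = {'content': [
    {'type': 'thinking', 'thinking': "To approach this, let's think about...",
     'signature': '...'},
    {'type': 'text', 'text': 'Yes, there are infinitely many prime numbers such that...'},
]}
thinking, answers = split_blocks(resp)
print(len(thinking), len(answers))
```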

Let’s add a maxthinktok param to the Client and Chat call methods. When this value is not 0, we’ll pass the following thinking param to Claude: {"type":"enabled", "budget_tokens":maxthinktok}.

Note: When thinking is enabled, prefill must be empty and temp must be 1.
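
One way to picture this logic is a helper that turns maxthinktok into request kwargs and enforces the two constraints in the note. This is a sketch only; `thinking_kwargs` is a hypothetical function, not the code Claudette exports.

```python
def thinking_kwargs(maxthinktok=0, prefill='', temp=1):
    "Build the extra request kwargs for extended thinking, or {} when it is disabled."
    if not maxthinktok: return {}
    if prefill: raise ValueError('prefill must be empty when thinking is enabled')
    if temp != 1: raise ValueError('temp must be 1 when thinking is enabled')
    return {'thinking': {'type': 'enabled', 'budget_tokens': maxthinktok}}

print(thinking_kwargs())
print(thinking_kwargs(1024))
```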


source

think_md

 think_md (txt, thk)
def contents(r):
    "Helper to get the contents from Claude response `r`."
    blk = find_block(r)
    tk_blk = find_block(r, blk_type=ThinkingBlock)
    if tk_blk: return think_md(blk.text.strip(), tk_blk.thinking.strip())
    if not blk and r.content: blk = r.content[0]
    if hasattr(blk,'text'): return blk.text.strip()
    elif hasattr(blk,'content'): return blk.content.strip()
    elif hasattr(blk,'source'): return f'*Media Type - {blk.type}*'
    return str(blk)
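
The body of think_md is not shown in this section. Judging by the rendered output, where the answer is followed by a collapsible "Thinking" section, it plausibly wraps the thinking text in a details element much like the history in _repr_markdown_. The sketch below is written under that assumption; `think_md_sketch` is a hypothetical stand-in and its exact layout is a guess.

```python
def think_md_sketch(txt, thk):
    "Format answer `txt` followed by thinking `thk` in a collapsible block (assumed layout)."
    return f"""{txt}

<details>
<summary>Thinking</summary>

{thk}
</details>"""

out = think_md_sketch('Python is versatile.', 'The person wants a sentence.')
print(out)
```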

Let’s call the model without extended thinking enabled.

tk_model = first(has_extended_thinking_models)
chat = Chat(tk_model)
chat("Write a sentence about Python!")

Python is a versatile programming language known for its readable syntax and wide application in fields ranging from web development to data science and artificial intelligence.

  • id: msg_01S9DaYTNoo31yNh6fHZcQZc
  • content: [{'citations': None, 'text': 'Python is a versatile programming language known for its readable syntax and wide application in fields ranging from web development to data science and artificial intelligence.', 'type': 'text'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: end_turn
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 13, 'output_tokens': 31, 'server_tool_use': None}

Now, let’s call the model with extended thinking enabled.

chat("Write a sentence about Python!", maxthinktok=1024)

Python is a beginner-friendly, high-level programming language with an extensive ecosystem of libraries that has become one of the most popular tools for both quick scripts and enterprise-level applications.

Thinking

The person is asking me to write a sentence about Python again. I should provide a different sentence than before to offer variety. Let me think of another aspect of Python to highlight, such as its community, ease of learning, libraries, or another key feature that makes Python popular.

  • id: msg_01JiyHw3dYuqgbdHqi3jNvTd
  • content: [{'signature': 'ErUBCkYIAxgCIkDGyn1SbTWjPjZr8XN8EIk5Y7YWiDvWZT2A4r+vxn8pzP6hNNkf2MmWpOqe5a42Ap8tlgvNDLg/98+wbkAbdLVWEgx/k1Ann3OFL7gBruwaDEyTKpSxI5/ZwVXm2CIwtZtisIvmybu3gAfvnhosRn3aChBZjl+RIBCAEs+i5BKE142qhTJM2d0XQk+JnBhWKh0/GloGOLUPWVsV/HpAFB61np8MTxhGhCCf04mhDRgC', 'thinking': 'The person is asking me to write a sentence about Python again. I should provide a different sentence than before to offer variety. Let me think of another aspect of Python to highlight, such as its community, ease of learning, libraries, or another key feature that makes Python popular.', 'type': 'thinking'}, {'citations': None, 'text': 'Python is a beginner-friendly, high-level programming language with an extensive ecosystem of libraries that has become one of the most popular tools for both quick scripts and enterprise-level applications.', 'type': 'text'}]
  • model: claude-3-7-sonnet-20250219
  • role: assistant
  • stop_reason: end_turn
  • stop_sequence: None
  • type: message
  • usage: {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 81, 'output_tokens': 106, 'server_tool_use': None}

Third party providers

Amazon Bedrock

These are Amazon’s current Claude models:

models_aws
['claude-3-5-haiku-20241022',
 'claude-3-7-sonnet-20250219',
 'anthropic.claude-3-opus-20240229-v1:0',
 'anthropic.claude-3-5-sonnet-20241022-v2:0']
Note

anthropic version 0.34.2 does not appear to install boto3 as a dependency. You may need to run pip install boto3, otherwise creating the Client below will fail.
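
To find out up front whether boto3 is available, a small standard-library check can be used. `missing_modules` is a hypothetical helper for illustration, not part of Claudette or the Anthropic SDK.

```python
import importlib.util

def missing_modules(*names):
    "Return the top-level module names from `names` that cannot be found in this environment."
    return [n for n in names if importlib.util.find_spec(n) is None]

missing = missing_modules('boto3')
if missing: print(f"Install first: pip install {' '.join(missing)}")
```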

Provided boto3 is installed, we otherwise don’t need any extra code to support Amazon Bedrock – we just have to set up the appropriate client:

ab = AnthropicBedrock(
    aws_access_key=os.environ['AWS_ACCESS_KEY'],
    aws_secret_key=os.environ['AWS_SECRET_KEY'],
)
client = Client(models_aws[-1], ab)
chat = Chat(cli=client)
chat("I'm Jeremy")

Google Vertex

models_goog
from anthropic import AnthropicVertex
import google.auth
project_id = google.auth.default()[1]
region = "us-east5"
gv = AnthropicVertex(project_id=project_id, region=region)
client = Client(models_goog[-1], gv)
chat = Chat(cli=client)
chat("I'm Jeremy")
