Anthropic's LLM Claude Now Has a 75K Word Context Window. Consider What That Means.
More than three times the largest GPT-4 context window
Anthropic announced yesterday that its Claude large language model (LLM) now has a context window of 100,000 tokens, which the company says roughly translates to 75,000 words. The announcement offers an example of what this means in its opening two paragraphs.
This means businesses can now submit hundreds of pages of materials for Claude to digest and analyze, and conversations with Claude can go on for hours or even days.
The average person can read 100,000 tokens of text in ~5+ hours, and then they might need substantially longer to digest, remember, and analyze that information. Claude can now do this in less than a minute. For example, we loaded the entire text of The Great Gatsby into Claude-Instant (72K tokens) and modified one line to say Mr. Carraway was “a software engineer that works on machine learning tooling at Anthropic.” When we asked the model to spot what was different, it responded with the correct answer in 22 seconds.
The company also provided a couple of videos that further highlight the feature's usefulness, including one that shows an 85-page securities filing from Netflix uploaded to Claude. The ChatGPT rival can then answer natural-language questions about the document.
What is a Context Window Anyway?
Maybe you already know about context windows. In case you don’t, or the concept is fuzzy, a context window is not just the amount of data an LLM can access. It is the amount of content, measured in tokens, that a model can hold in working memory at once: the prompt, any uploaded material, and the conversation so far, all of which it can draw on when answering subsequent questions.
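To make the numbers concrete, here is a minimal sketch of how an application might check whether a document fits in a given window, using the rough 100,000-tokens-to-75,000-words ratio from Anthropic's announcement. The ratio, the function names, and the 1,000-token reserve for the reply are illustrative assumptions, not an official tokenizer.

```python
# Back-of-envelope check: does a document fit in a model's context window?
# The ~1.33 tokens-per-word ratio comes from Anthropic's 100K tokens ≈ 75K words
# figure; real tokenizers vary, so treat this as a rough estimate only.

TOKENS_PER_WORD = 100_000 / 75_000  # ≈ 1.33

def estimate_tokens(text: str) -> int:
    return round(len(text.split()) * TOKENS_PER_WORD)

def fits_in_window(text: str, window_tokens: int, reserve_for_reply: int = 1_000) -> bool:
    # The completion also consumes window tokens, so leave some headroom for it.
    return estimate_tokens(text) + reserve_for_reply <= window_tokens

if __name__ == "__main__":
    document = "word " * 70_000  # stand-in for a book-length document (~70K words)
    print(fits_in_window(document, window_tokens=100_000))  # True: fits Claude's new window
    print(fits_in_window(document, window_tokens=8_000))    # False: far too big for 8K
```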
You may have experienced ChatGPT, Bing Chat, or Google Bard losing context. This happens when you are going back and forth on a series of related questions and then, all of a sudden, a response comes back that is completely, you guessed it, out of context. The LLM has “forgotten” where the conversation started because it ran out of tokens it can hold in memory. Bing Chat initially dealt with this by limiting conversations to five turns.
This should not be a problem if you are using Claude with the largest context window setting. Granted, you are not likely to see that setting in many places other than specialty applications. GPT-4 has a 32,000-token context window option, but it is not widely used today and is not employed in ChatGPT.
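The forgetting is not mysterious: chat applications keep only as much of the conversation as fits in the window, typically dropping the oldest turns first. Here is a minimal sketch of that bookkeeping, again using a rough tokens-per-word estimate; the message format and trimming policy are assumptions, and real applications vary.

```python
# Sketch: keep only the most recent conversation turns that fit in the window.
def estimate_tokens(text: str) -> int:
    return round(len(text.split()) * (100_000 / 75_000))  # rough ratio from the announcement

def trim_history(messages: list[str], window_tokens: int) -> list[str]:
    kept, used = [], 0
    for message in reversed(messages):  # walk from newest turn to oldest
        cost = estimate_tokens(message)
        if used + cost > window_tokens:
            break  # everything older than this point is dropped
        kept.append(message)
        used += cost
    return list(reversed(kept))

# Once early turns fall outside the window, the model never sees them again,
# which is why a reply can suddenly come back out of context.
```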
Also, a large context window is not really a substitute for a knowledge base. It could be useful for ad hoc analysis or dynamically generated information. For more static information, or information that is used repeatedly, a retrieval model that accesses a knowledge base and works in concert with an LLM chat interface will be more efficient and practical.
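For contrast, here is a bare-bones sketch of that retrieval pattern: instead of stuffing an entire knowledge base into the prompt, the application scores stored passages against the question and sends only the best matches to the LLM. The keyword-overlap scoring and the send_to_llm call are stand-ins for a real embedding index and API client.

```python
# Sketch of retrieval-augmented prompting: send only the relevant passages,
# not the whole knowledge base, along with the question.
def overlap_score(passage: str, question: str) -> int:
    # Stand-in for vector similarity: count words shared with the question.
    return len(set(passage.lower().split()) & set(question.lower().split()))

def build_prompt(knowledge_base: list[str], question: str, top_k: int = 3) -> str:
    best = sorted(knowledge_base, key=lambda p: overlap_score(p, question), reverse=True)[:top_k]
    context = "\n\n".join(best)
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {question}"

# prompt = build_prompt(passages, "What was the quarterly revenue?")
# reply = send_to_llm(prompt)  # hypothetical API client call
```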
Context Windows, Cost, and Latency
One reason you won’t see large context windows very often is their high cost. OpenAI prices its GPT-4 API at $0.03 / 1k tokens for prompts and $0.06 / 1k tokens for completions (i.e., responses) for the standard 8K context window. The price for the 32K context window is double, even if you are not using all of the available memory tokens in each turn of conversation.
With that said, it looks like Anthropic has updated its pricing, and Claude is the same price regardless of the context window, so you pay for what you use. The more advanced Claude model, which is the closest comparison to GPT-4, is about $0.01 / 1k tokens for prompts and $0.03 / 1k tokens for completions. This means Claude with the larger context window is about one-third to one-half the cost of GPT-4 with the 8K context window, and one-sixth to one-quarter the cost of the 32K model.
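A quick back-of-envelope calculation with the per-1k-token prices cited above shows where those ratios come from; the 7,000-token prompt and 1,000-token completion are just an example workload.

```python
# Illustrative cost of a single call at the per-1k-token prices cited above.
PROMPT_TOKENS, COMPLETION_TOKENS = 7_000, 1_000

def call_cost(prompt_rate: float, completion_rate: float) -> float:
    return (PROMPT_TOKENS / 1_000) * prompt_rate + (COMPLETION_TOKENS / 1_000) * completion_rate

gpt4_8k  = call_cost(0.03, 0.06)  # $0.27
gpt4_32k = call_cost(0.06, 0.12)  # $0.54 (double the 8K rates)
claude   = call_cost(0.01, 0.03)  # $0.10, regardless of context window

print(claude / gpt4_8k)   # ≈ 0.37, i.e. between one-third and one-half
print(claude / gpt4_32k)  # ≈ 0.19, i.e. between one-sixth and one-quarter
```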
These pricing savings will be attractive to services that use large context windows on a regular basis.
Latency of the completion is also a consideration with large context windows. The more context a model must interpret before crafting a response, the longer the user may have to wait. The 22 seconds promoted by Anthropic is an extreme edge case, but it is also an extraordinarily long time to wait compared with other digital services, which measure their latency in milliseconds. Still, for some use cases, it will be worth it.
LLM Competition
Competition among LLM providers is rapidly shifting to consider more variables. When most use cases were just demonstrations, the key competitive factor was output quality. We are now seeing competition based on cost efficiency, latency, context window size, and specialization. There is no single formula that will win the LLM market; the prioritization of these factors will be use-case dependent.
Anthropic has talked a lot about safety guardrails and constitutional AI as key differentiators from other LLMs, without providing any meaningful way to measure how models compare on this dimension. The company is now getting more concrete in its differentiation by becoming the product leader in context window size. I suspect that leadership won’t last long, but today, among leading commercial LLMs, it is unique.