Skip to content

Anthropic

The Anthropic provider lives in crates/lime-provider-anthropic/ and uses the Messages API with adaptive thinking and tiered prompt caching.

Authentication

Terminal window
lime login --provider anthropic --with-api-key

Or via environment:

Terminal window
export ANTHROPIC_API_KEY="sk-ant-…"

Built-in catalog

Max output is the default response cap; Upper is the larger retry / extended-thinking cap that the runtime can switch up to when the turn needs the headroom.

SlugDisplayContextMax outputUpperCapabilities
claude-opus-4-7Claude Opus 4.71M32 K128 KVision, parallel tools, adaptive thinking, param_preset: verbosity_only
claude-sonnet-4-6Claude Sonnet 4.61M16 K64 KVision, parallel tools, extended thinking, param_preset: reasoning_and_verbosity
claude-haiku-4-5Claude Haiku 4.5200 K16 K64 KVision, parallel tools, extended thinking, param_preset: reasoning_and_verbosity
claude-opus-4-6Claude Opus 4.6 (legacy)1M32 K128 KVision, parallel tools, extended thinking
claude-sonnet-4-5Claude Sonnet 4.5 (legacy)200 K16 K64 KVision, parallel tools, extended thinking
claude-opus-4-5Claude Opus 4.5 (legacy)200 K32 K64 KVision, parallel tools, extended thinking
claude-opus-4-1Claude Opus 4.1 (legacy)200 K32 K32 KVision, parallel tools, extended thinking

Aliases

AliasResolves to
opusclaude-opus-4-7
sonnetclaude-sonnet-4-5
haikuclaude-haiku-4-5-20251001

What’s special

  • Tiered prompt caching. Long, stable prefixes get a 1-hour cache TTL; rolling tail content gets a 5-minute TTL. Intermediate cache breakpoints are inserted automatically for long turns so each cache hit is as large as possible.
  • Adaptive thinking. On models that support extended thinking, the provider chooses between adaptive (auto-budget) and explicit thinking-effort modes per turn based on --reasoning-effort and the remaining output budget.
  • Pause-turn handling. The provider treats pause_turn and context_window stop reasons as recoverable: the runtime continues the turn rather than aborting.
  • 429 + retry-after. Anthropic’s rate-limit signals are honored precisely; Lime backs off for the exact duration the API requests.

Common usage

Terminal window
lime --model claude-opus-4-7
lime --model claude-sonnet-4-6 --reasoning-effort xhigh
lime --model claude-haiku-4-5 --verbosity low

Notes on pricing

Lime’s telemetry crate (lime-telemetry) tracks per-turn input, cached input, output, and reasoning tokens, including separate buckets for 1-hour cache hits and 5-minute cache hits. /status shows the current session totals; /config env shows how the provider was authenticated.