Prompt Trimmer
Trim long prompts down to a token budget — boundary-aware, with strategies for chat history, document QA, and code review.
Prompt input
Paste a prompt, mark priority paragraphs, then choose a trimming strategy.
Priority marking mode
Click paragraphs to mark high (keep) or low (trim first).
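One way to read the priority marks is as an ordering over trim candidates. The sketch below is a guess at that behavior, not the tool's actual logic: the labels `"high"`, `"low"`, and `"normal"` (for unmarked paragraphs) and the function name `trim_candidates` are hypothetical.

```python
def trim_candidates(priorities: list[str]) -> list[int]:
    """Return paragraph indices in the order they may be trimmed.

    Assumed labels: "high" (never trimmed), "low" (trimmed first),
    anything else is treated as unmarked and trimmed after "low".
    """
    rank = {"low": 0}
    candidates = [i for i, p in enumerate(priorities) if p != "high"]
    # sorted() is stable, so document order is kept within each rank.
    return sorted(candidates, key=lambda i: rank.get(priorities[i], 1))


order = trim_candidates(["high", "normal", "low", "normal", "low"])
print(order)
```

Paragraphs marked high never appear in the result, so a trimming strategy that only walks this list cannot remove them.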
Trim settings
Pick a budget, a strategy, and what to keep intact.
Template
Strategy
Drops whole paragraphs from the middle outward.
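A minimal sketch of a middle-outward strategy, assuming paragraphs are separated by blank lines and tokens are estimated at roughly 4 characters each. The helper names (`rough_tokens`, `middle_out_order`, `trim_middle_out`) are illustrative and not taken from the tool.

```python
def rough_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Heuristic token count: characters divided by an assumed ratio."""
    return round(len(text) / chars_per_token)


def middle_out_order(n: int) -> list[int]:
    """Indices 0..n-1, ordered from the middle of the list outward."""
    center = (n - 1) / 2
    return sorted(range(n), key=lambda i: (abs(i - center), i))


def trim_middle_out(prompt: str, budget: int) -> str:
    """Drop whole paragraphs, middle first, until the estimate fits the budget."""
    paragraphs = [p for p in prompt.split("\n\n") if p.strip()]
    costs = [rough_tokens(p) for p in paragraphs]
    total = sum(costs)
    dropped: set[int] = set()
    for i in middle_out_order(len(paragraphs)):
        if total <= budget:
            break
        dropped.add(i)
        total -= costs[i]
    return "\n\n".join(p for i, p in enumerate(paragraphs) if i not in dropped)


paragraphs = ["a" * 40, "b" * 40, "c" * 40, "d" * 40, "e" * 40]
trimmed = trim_middle_out("\n\n".join(paragraphs), budget=30)
```

The idea is that the opening and closing paragraphs of a prompt tend to carry the instructions and the question, so the middle is removed first. With a very small budget this sketch can drop every paragraph.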
Target token budget
Reserve for output
Subtracted from the budget so the model has room to reply.
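The subtraction can be sketched as follows. The function name `effective_budget` is hypothetical, and clamping at zero is an assumption about how an oversized reserve would be handled.

```python
def effective_budget(target_tokens: int, reserve_for_output: int) -> int:
    """Tokens left for the prompt once room for the reply is set aside."""
    if target_tokens < 0 or reserve_for_output < 0:
        raise ValueError("token counts must be non-negative")
    # Clamp at zero so an oversized reserve never yields a negative budget.
    return max(0, target_tokens - reserve_for_output)


print(effective_budget(4000, 500))  # 3500
```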
Preserve markers
Headings (#, ##)
Code blocks (```)
Lists (-, *, 1.)
Quote blocks (>)
Inline code (`)
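As a rough illustration of how blocks carrying these markers might be detected before trimming, the sketch below uses simple line-based regular expressions. The patterns and the names `PRESERVE_PATTERNS`, `block_kinds`, and `is_protected` are assumptions; a real Markdown parser would handle more edge cases, such as a `#` comment inside a code block.

```python
import re

# Assumed, simplified patterns for the five marker types listed above.
PRESERVE_PATTERNS = {
    "heading": re.compile(r"^#{1,6}\s"),
    "code_block": re.compile(r"^```"),
    "list": re.compile(r"^\s*(?:[-*]|\d+\.)\s"),
    "quote": re.compile(r"^>"),
    "inline_code": re.compile(r"`[^`\n]+`"),
}


def block_kinds(block: str) -> set[str]:
    """Return the marker types found on any line of a text block."""
    kinds: set[str] = set()
    for line in block.splitlines():
        for name, pattern in PRESERVE_PATTERNS.items():
            if pattern.search(line):
                kinds.add(name)
    return kinds


def is_protected(block: str, preserve: set[str]) -> bool:
    """True if the block contains a marker type the user chose to preserve."""
    return bool(block_kinds(block) & preserve)


print(is_protected("## Setup", {"heading", "code_block"}))
```

A trimmer could skip any block for which `is_protected` returns true and remove only unprotected paragraphs.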
Add trimmed indicator
Inject markers like [... 234 tokens trimmed ...] where content was removed.
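A small sketch of how such markers could be injected, assuming the trimmer already knows each paragraph's estimated token cost and whether it was kept. The function name `join_with_indicators` and the choice to merge consecutive dropped paragraphs into one marker are illustrative assumptions.

```python
def join_with_indicators(
    paragraphs: list[str], costs: list[int], keep: list[bool]
) -> str:
    """Rebuild the prompt, replacing each run of dropped paragraphs with one marker."""
    out: list[str] = []
    pending = 0  # tokens dropped since the last kept paragraph
    for text, cost, kept in zip(paragraphs, costs, keep):
        if kept:
            if pending:
                out.append(f"[... {pending} tokens trimmed ...]")
                pending = 0
            out.append(text)
        else:
            pending += cost
    if pending:
        out.append(f"[... {pending} tokens trimmed ...]")
    return "\n\n".join(out)


result = join_with_indicators(
    ["Intro", "Detail 1", "Detail 2", "Question"],
    [10, 100, 134, 5],
    [True, False, False, True],
)
print(result)
```

The marker itself costs a few tokens, which a stricter implementation would count against the budget.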
Trimmed output
Updates live. Toggle the diff to see what was removed.
Tokens: 832 → 832 · Saved: 0% · Retained: 100%
Save a snapshot to compare versions later.
Chars per token, by content type
| Content type | Chars/token | Notes |
|---|---|---|
| English prose | ≈ 4 | Default for chat-style text |
| Code | ≈ 3 | Dense punctuation splits into more, shorter tokens |
| CJK (中日韓) | ≈ 2 | One token often spans 1–2 glyphs |
| Numbers / IDs | ≈ 2.5 | Digit-heavy strings split into short tokens |
| URLs | ≈ 3 | Lots of punctuation |
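The ratios in the table can be turned into a quick estimator. This is a planning heuristic only: the dictionary keys and the function name `estimate_tokens` are made up for illustration, and rounding up is an assumption chosen so that estimates err on the high side.

```python
import math

# Approximate characters per token, copied from the table above.
CHARS_PER_TOKEN = {
    "prose": 4.0,
    "code": 3.0,
    "cjk": 2.0,
    "numbers": 2.5,
    "url": 3.0,
}


def estimate_tokens(text: str, content_type: str = "prose") -> int:
    """Heuristic token estimate: character count divided by the table ratio."""
    return math.ceil(len(text) / CHARS_PER_TOKEN[content_type])


print(estimate_tokens("a" * 400))           # 100
print(estimate_tokens("x" * 300, "code"))   # 100
```

For exact counts, the tokenizer that ships with the target model is the only reliable source.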
Common budget shapes
- Short system prompt: 200–600 tokens.
- RAG context window: 2k–8k per request.
- Long-doc QA over a chapter: 8k–32k.
- Whole-codebase context: 100k–1M tokens (long-context models such as Gemini and Claude).
- Reserve 300–800 tokens for the model's reply.
Cost note
Trimming a prompt by 1k tokens saves about $0.003 per call on a model priced at $3 per 1M input tokens, and that saving repeats on every call you make.
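The arithmetic behind the cost note, as a short sketch. The function name `savings_per_call` and the 50,000-call volume are illustrative assumptions; the $3 per 1M price is the example figure from the note, not a quote for any particular model.

```python
def savings_per_call(tokens_trimmed: int, price_per_million_usd: float) -> float:
    """Input-cost saving in USD for one call after trimming the prompt."""
    return tokens_trimmed * price_per_million_usd / 1_000_000


per_call = savings_per_call(1_000, 3.0)  # about 0.003 USD
per_50k_calls = per_call * 50_000        # about 150 USD
print(f"${per_call:.4f} per call, ${per_50k_calls:.2f} per 50,000 calls")
```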
Open AI cost estimator →
Token counts are heuristic. Real tokenizers vary by model, so use these numbers for planning, not for exact billing.