webfetch-clean
Web fetcher with 90-96% token savings
Fetches web pages, strips clutter (ads, scripts, navigation), and outputs clean Markdown or HTML. This cuts token usage by 90-96% compared with fetching raw HTML.
Features
- Triple-mode: CLI, stdio MCP server, or HTTP server
- Multi-pass HTML cleaning removes ads, scripts, nav, sidebars, popups
- Output as Markdown or HTML
- Screenshot capture via headless browser
- JavaScript-rendered page support via go-rod
- Docker deployment with Caddy reverse proxy
- File token auth for HTTP mode
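
To illustrate what a cleaning pass might look like, here is a minimal sketch that drops whole clutter elements from an HTML string using only the Go standard library. It is not webfetch-clean's actual implementation: the tag list, the `stripClutter` name, and the regex-based approach are assumptions made for illustration. Regular expressions cannot handle nested elements of the same tag, so a real cleaner would more likely walk a parsed HTML tree.

```go
package main

import (
	"fmt"
	"regexp"
)

// clutterPatterns holds one pattern per element type to drop.
// The tag list is a hypothetical example, not the tool's real rule set.
var clutterPatterns = func() []*regexp.Regexp {
	tags := []string{"script", "style", "nav", "aside"}
	res := make([]*regexp.Regexp, 0, len(tags))
	for _, tag := range tags {
		// (?is): case-insensitive, and "." also matches newlines.
		// .*? is lazy, so each match stops at the nearest closing tag.
		res = append(res, regexp.MustCompile(`(?is)<`+tag+`\b[^>]*>.*?</`+tag+`\s*>`))
	}
	return res
}()

// stripClutter runs one pass per pattern, removing each matching
// element together with its contents.
func stripClutter(html string) string {
	for _, re := range clutterPatterns {
		html = re.ReplaceAllString(html, "")
	}
	return html
}

func main() {
	page := `<html><head><script>track()</script><style>p{}</style></head>` +
		`<body><nav><a href="/">Home</a></nav><h1>Title</h1><p>Body text.</p></body></html>`
	fmt.Println(stripClutter(page))
}
```

Running one pattern per pass keeps each rule simple and makes it easy to add or remove element types.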
Install
go install github.com/hegner123/webfetch-clean@latest
The Problem: Raw HTML wastes thousands of tokens
# Claude's built-in WebFetch returns the entire raw HTML
# A typical documentation page:
# - 100KB raw HTML
# - ~25,000 tokens consumed
# - 80% is scripts, styles, nav, ads, tracking
# You pay for ALL of it as input tokens:
# 25,000 tokens x $0.003/1K = $0.075 per page
# 100 pages/day = $7.50/day = $225/month
Solution
$ webfetch-clean --cli --url https://go.dev/doc/effective_go
Output
# Effective Go
## Introduction
Go is a new language. Although it borrows ideas from
existing languages, it has unusual properties that make
effective Go programs different in character from programs
written in its relatives...
## Formatting
Formatting issues are the most contentious but the least
consequential...
<!-- Clean markdown output: ~1,000 tokens instead of ~25,000 -->
Comparison
| Metric | Value |
|---|---|
| Token savings (simple page, 10KB) | 93% (~2,334 tokens saved) |
| Token savings (docs page, 100KB) | 96% (~23,987 tokens saved) |
| Monthly savings (100 docs pages/day at $0.003/1K tokens) | ~$216/month |
| Annual savings | ~$2,590 |
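
The arithmetic behind these figures can be reproduced with a short Go sketch. It assumes the numbers stated in this README: an input price of $0.003 per 1,000 tokens, 100 pages per day, and a 30-day month. The ratio of roughly 4 bytes of HTML per token is inferred from the 100KB and ~25,000 token example, and the function names are hypothetical.

```go
package main

import "fmt"

// Assumptions drawn from the README's own example figures.
const (
	bytesPerToken = 4     // inferred: 100KB of HTML is about 25,000 tokens
	pricePer1K    = 0.003 // dollars per 1,000 input tokens
	daysPerMonth  = 30
)

// estimateTokens converts a page size in bytes to an approximate token count.
func estimateTokens(sizeBytes int) int {
	return sizeBytes / bytesPerToken
}

// monthlyCost returns the dollar cost of fetching pagesPerDay pages of
// tokensPerPage tokens each, every day for one month.
func monthlyCost(tokensPerPage, pagesPerDay int) float64 {
	return float64(tokensPerPage) / 1000 * pricePer1K * float64(pagesPerDay) * daysPerMonth
}

// savingsPercent returns the share of tokens removed by cleaning.
func savingsPercent(rawTokens, cleanTokens int) float64 {
	return 100 * float64(rawTokens-cleanTokens) / float64(rawTokens)
}

func main() {
	raw := estimateTokens(100_000) // docs page, about 100KB of raw HTML
	clean := raw - 23_987          // tokens left after cleaning, per the table

	fmt.Printf("raw:   %d tokens, $%.2f/month\n", raw, monthlyCost(raw, 100))
	fmt.Printf("clean: %d tokens, $%.2f/month\n", clean, monthlyCost(clean, 100))
	fmt.Printf("token savings: %.1f%%\n", savingsPercent(raw, clean))
}
```

Changing the constants shows how the savings scale with a different price or page volume.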