webfetch-clean

Web fetcher with 90-96% token savings


Fetch web pages, strip clutter (ads, scripts, navigation), and output clean Markdown or HTML. Compared with fetching raw HTML, this cuts input-token costs by 90-96%.

Features

  • Three run modes: CLI, stdio MCP server, or HTTP server
  • Multi-pass HTML cleaning that removes ads, scripts, navigation, sidebars, and popups
  • Output as Markdown or HTML
  • Screenshot capture via a headless browser
  • JavaScript-rendered page support via go-rod
  • Docker deployment with Caddy reverse proxy
  • File-based token authentication for HTTP mode

Install

go install github.com/hegner123/webfetch-clean@latest

The Problem: Raw HTML wastes thousands of tokens

# Claude's built-in WebFetch returns the entire raw HTML
# A typical documentation page:
#   - 100KB raw HTML
#   - ~25,000 tokens consumed
#   - 80% is scripts, styles, nav, ads, tracking

# You pay for ALL of it as input tokens:
# 25,000 tokens x $0.003/1K = $0.075 per page
# 100 pages/day = $7.50/day = $225/month
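The cost arithmetic above can be reproduced in a few lines of Go. The 4-bytes-per-token ratio is an assumption (a common rule of thumb, not a real tokenizer count), as is the 30-day month; the constant and function names are hypothetical.

```go
package main

import "fmt"

const (
	bytesPerToken = 4.0   // rough rule of thumb, not a real tokenizer
	pricePer1K    = 0.003 // USD per 1,000 input tokens, as in the example
	pagesPerDay   = 100.0
	daysPerMonth  = 30.0 // assumed month length
)

// estimateTokens approximates a token count from page size in bytes.
func estimateTokens(pageBytes float64) float64 {
	return pageBytes / bytesPerToken
}

// costUSD returns the input-token cost of sending the given number of tokens.
func costUSD(tokens float64) float64 {
	return tokens / 1000 * pricePer1K
}

func main() {
	tokens := estimateTokens(100_000) // ~100KB of raw HTML
	perPage := costUSD(tokens)
	perDay := perPage * pagesPerDay
	perMonth := perDay * daysPerMonth
	// prints: tokens=25000 page=$0.075 day=$7.50 month=$225.00
	fmt.Printf("tokens=%.0f page=$%.3f day=$%.2f month=$%.2f\n", tokens, perPage, perDay, perMonth)
}
```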

Solution

$ webfetch-clean --cli --url https://go.dev/doc/effective_go

Output

# Effective Go

## Introduction

Go is a new language. Although it borrows ideas from
existing languages, it has unusual properties that make
effective Go programs different in character from programs
written in its relatives...

## Formatting

Formatting issues are the most contentious but the least
consequential...

<!-- Clean markdown output: ~1,000 tokens instead of ~25,000 -->

Comparison

| Metric | Value |
|---|---|
| Token savings (simple page, 10KB) | 93% (~2,334 tokens saved) |
| Token savings (docs page, 100KB) | 96% (~23,987 tokens saved) |
| Monthly savings (100 docs pages/day, $0.003/1K tokens) | ~$216/month |
| Annual savings | ~$2,590 |
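The percentages in the table follow from a simple ratio: tokens saved divided by raw tokens. The sketch below assumes ~4 bytes per token for the raw counts (10KB ≈ 2,500 tokens, 100KB ≈ 25,000 tokens) and derives the cleaned counts from the "tokens saved" figures; the `savings` helper is hypothetical.

```go
package main

import "fmt"

// savings reports how many tokens cleaning removed and what percentage
// of the raw token count that represents.
func savings(rawTokens, cleanTokens int) (saved int, pct float64) {
	saved = rawTokens - cleanTokens
	pct = float64(saved) / float64(rawTokens) * 100
	return saved, pct
}

func main() {
	// Raw counts assume ~4 bytes per token; cleaned counts are
	// raw minus the "tokens saved" figures from the table.
	pages := []struct {
		name         string
		raw, cleaned int
	}{
		{"simple page, 10KB", 2500, 166},
		{"docs page, 100KB", 25000, 1013},
	}
	for _, p := range pages {
		saved, pct := savings(p.raw, p.cleaned)
		fmt.Printf("%s: %d tokens saved (%.0f%%)\n", p.name, saved, pct)
	}
	// prints:
	// simple page, 10KB: 2334 tokens saved (93%)
	// docs page, 100KB: 23987 tokens saved (96%)
}
```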