# Bun Reader API

Convert any URL to clean, LLM-friendly markdown.

**Base URL:** `http://localhost:3000` (or your deployed URL)

---

## Quick Start

```bash
# Get markdown from any URL
curl http://localhost:3000/https://example.com

# Get JSON response
curl -H "x-respond-with: json" http://localhost:3000/https://example.com
```

---

## Endpoints

### `GET /{url}`

Fetch and convert a URL to readable content.

**URL Formats:**
```
GET /{encoded-url}
GET /?url={encoded-url}
```

**Examples:**
```bash
# Path format
http://localhost:3000/https://example.com/page

# Query parameter format
http://localhost:3000/?url=https://example.com/page
```

### `GET /` or `GET /health`

Health check endpoint.

**Response:**
```json
{
  "status": "ok",
  "name": "bun-reader",
  "version": "1.0.0",
  "features": ["puppeteer", "screenshot", "cache", "stealth"]
}
```

---

## Request Headers

| Header | Values | Default | Description |
|--------|--------|---------|-------------|
| `x-respond-with` | `markdown`, `html`, `text`, `json`, `screenshot` | `markdown` | Response format |
| `x-no-cache` | `true`, `false` | `false` | Bypass cache |
| `x-use-browser` | `true`, `false` | `false` | Force headless browser (for JS-heavy sites) |
| `x-use-proxy` | `true`, `false` | `false` | Route through proxy |
| `x-target-selector` | CSS selector | - | Extract specific element only |
| `x-wait-for-selector` | CSS selector | - | Wait for element before extraction |
| `x-with-images` | `true`, `false` | `true` | Include images in output |
| `x-with-links` | `true`, `false` | `true` | Include link URLs in output |
| `x-timeout` | milliseconds | `30000` | Request timeout |
| `x-set-cookie` | cookie string | - | Set cookies for the request |

---

## Query Parameters

| Parameter | Description |
|-----------|-------------|
| `url` | Target URL (alternative to path format) |
| `format` | Response format (same as `x-respond-with`) |
| `cookies` | Set cookies for the request |

---

## Response Formats

### Markdown (default)

```bash
curl http://localhost:3000/https://example.com
```

```markdown
# Page Title

Content converted to clean markdown...
```

### JSON

```bash
curl -H "x-respond-with: json" http://localhost:3000/https://example.com
```

```json
{
  "url": "https://example.com",
  "title": "Page Title",
  "content": "Markdown content...",
  "html": "<p>Original HTML...</p>",
  "text": "Plain text content...",
  "description": "Meta description",
  "image": {
    "src": "https://example.com/product-image.jpg",
    "alt": "Product name"
  },
  "length": 1234,
  "tokenCount": 456,
  "fetchedAt": "2024-01-01T00:00:00.000Z",
  "warnings": []
}
```

The `image` field contains the main product/page image extracted from:
1. Open Graph `og:image` meta tag
2. JSON-LD Product schema
3. Twitter card image
4. Fallback: first significant image in content area

### HTML

```bash
curl -H "x-respond-with: html" http://localhost:3000/https://example.com
```

Returns cleaned HTML content.

### Text

```bash
curl -H "x-respond-with: text" http://localhost:3000/https://example.com
```

Returns plain text without formatting.

### Screenshot

```bash
curl -H "x-respond-with: screenshot" http://localhost:3000/https://example.com
```

Returns base64-encoded PNG screenshot.

---

## Response Headers

| Header | Description |
|--------|-------------|
| `X-Cache` | `HIT` or `MISS` - indicates cache status |
| `X-Token-Count` | Estimated token count for LLM usage |
| `X-Proxy-IP` | IP address used (when proxy enabled) |

---

## Examples

### Basic Usage

```bash
# Get article as markdown
curl "http://localhost:3000/https://news.ycombinator.com"

# Get as JSON with metadata
curl -H "x-respond-with: json" \
  "http://localhost:3000/https://news.ycombinator.com"
```

### JavaScript-Heavy Sites

```bash
# Force browser rendering for SPAs
curl -H "x-use-browser: true" \
  "http://localhost:3000/https://twitter.com/username"
```

### Extract Specific Content

```bash
# Get only the main article
curl -H "x-target-selector: article" \
  "http://localhost:3000/https://blog.example.com/post"

# Wait for dynamic content
curl -H "x-use-browser: true" \
     -H "x-wait-for-selector: .loaded" \
  "http://localhost:3000/https://spa-site.com"
```

### Clean Output for LLMs

```bash
# Remove images and link URLs for cleaner LLM input
curl -H "x-with-images: false" \
     -H "x-with-links: false" \
  "http://localhost:3000/https://example.com"
```

### Take Screenshot

```bash
# Get base64 screenshot
curl -H "x-respond-with: screenshot" \
     -H "x-use-browser: true" \
  "http://localhost:3000/https://example.com"
```

### Bypass Cache

```bash
# Force fresh fetch
curl -H "x-no-cache: true" \
  "http://localhost:3000/https://example.com"
```

---

## Error Handling

Errors return HTTP 400 with JSON body:

```json
{
  "error": "Error message description",
  "requestId": 123
}
```

Common errors:
- Invalid URL format
- Target site unreachable
- Timeout exceeded
- Browser rendering failed

---

## Rate Limits

- Results are cached for improved performance
- Use `x-no-cache: true` sparingly
- Browser mode (`x-use-browser: true`) is slower than fetch mode

---

## CORS

All endpoints support CORS with:
- `Access-Control-Allow-Origin: *`
- `Access-Control-Allow-Headers: *`

Can be used directly from browser JavaScript.
