# chrome-browser-tool

**Repository Path**: lvhaodeyeye/chrome-browser-tool

## Basic Information

- **Project Name**: chrome-browser-tool
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-02-12
- **Last Updated**: 2026-02-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Chrome Browser Tool

A Python tool that automates Chrome through Selenium. It provides a high-level API for web scraping, search engine integration (Google, Bing China, Bing International), and content extraction sized for AI context windows.

## Features

- **Browser Automation**: Control Chrome through a simple, intuitive API
- **Search Engine Integration**: Built-in support for Google, Bing China (cn.bing.com), and Bing International (www.bing.com)
- **Content Extraction**: Convert web pages to Markdown and extract structured data
- **Token-aware Processing**: Track token usage when processing web content, with truncation strategies for AI context windows
- **CLI Interface**: Command-line tool for quick browser operations
- **No API Keys Required**: All operations run through the locally installed Chrome browser

## Installation

### Using uv (recommended)

```bash
# Navigate to project directory
cd chrome-browser-tool

# Create virtual environment and install
uv venv
uv pip install -e ".[dev]"
```

### Using pip

```bash
cd chrome-browser-tool
pip install -e ".[dev]"
```

## Quick Start

### Python API

#### Search

```python
from chrome_browser_tool.search import SearchEngineFactory

# Search using Google
with SearchEngineFactory.create("google") as engine:
    results = engine.search("python programming", num_results=5)
    for result in results.results:
        print(f"{result.position}. {result.title}")
        print(f"   URL: {result.url}")
        print(f"   {result.snippet}")

# Search using Bing China
with SearchEngineFactory.create("bing_cn") as engine:
    results = engine.search("人工智能", num_results=5)
    ...

# Search using Bing International
with SearchEngineFactory.create("bing_intl") as engine:
    results = engine.search("machine learning", num_results=5)
    ...
```
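Search results can be folded into a prompt as a compact Markdown list. The sketch below is illustrative only: `ResultStub` is a hypothetical stand-in that mirrors the `position`, `title`, `url`, and `snippet` attributes used above, and `results_to_markdown` is not part of the package.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ResultStub:
    """Hypothetical stand-in with the attributes shown in the search example."""

    position: int
    title: str
    url: str
    snippet: str


def results_to_markdown(results: Iterable[ResultStub]) -> str:
    """Render results as a numbered Markdown list with linked titles."""
    lines = []
    for result in results:
        lines.append(f"{result.position}. [{result.title}]({result.url})")
        if result.snippet:
            # Indent the snippet so it stays attached to its list item.
            lines.append(f"   {result.snippet}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [ResultStub(1, "Example", "https://example.com", "An example page.")]
    print(results_to_markdown(demo))
```

Any object exposing the same four attributes should work with this helper, so the items in `results.results` could presumably be passed in directly.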

#### Fetch and Extract Content

```python
from chrome_browser_tool.browser import Browser
from chrome_browser_tool.fetch import ContentExtractor, ContentProcessor

with Browser(headless=True) as browser:
    # Navigate to a page
    browser.get("https://example.com/article")

    # Get page source
    html = browser.get_page_source()

    # Extract content
    extractor = ContentExtractor()
    result = extractor.extract(html, "https://example.com/article")

    print(f"Title: {result.title}")
    print(f"Content: {result.content}")
    print(f"Tokens: {result.metadata.token_count}")

    # Truncate for AI context window
    processor = ContentProcessor()
    truncated, was_truncated = processor.truncate_by_tokens(result.content, max_tokens=4000)
```
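For intuition about what token-limited truncation involves, here is a standalone sketch that returns the same `(text, was_truncated)` shape as the call above. It is not the package's implementation: `approx_token_count` and `truncate_at_sentence` are hypothetical names, and the word-based counter is a stand-in for a real tokenizer.

```python
import re
from typing import Callable


def approx_token_count(text: str) -> int:
    """Stand-in counter (assumption): one token per whitespace-separated word."""
    return len(text.split())


def truncate_at_sentence(
    text: str,
    max_tokens: int,
    count_tokens: Callable[[str], int] = approx_token_count,
) -> tuple[str, bool]:
    """Return (text, was_truncated), cutting at a sentence boundary when possible."""
    if count_tokens(text) <= max_tokens:
        return text, False

    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept: list[str] = []
    for sentence in sentences:
        candidate = " ".join(kept + [sentence])
        if count_tokens(candidate) > max_tokens:
            break
        kept.append(sentence)

    if kept:
        return " ".join(kept), True

    # Fallback: even the first sentence is over budget, so cut on words.
    # This matches the stand-in counter only; a real tokenizer would need
    # to cut on its own token boundaries.
    return " ".join(text.split()[:max_tokens]), True


if __name__ == "__main__":
    sample = "One two three. Four five six. Seven eight nine."
    print(truncate_at_sentence(sample, max_tokens=7))
```

Passing a different `count_tokens` callable swaps in another tokenizer without changing the boundary logic.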

### CLI Usage

```bash
# Search using Google
chrome-tool search "python tutorials"

# Search using Bing China
chrome-tool search "人工智能" --engine bing-cn

# Search using Bing International
chrome-tool search "machine learning" --engine bing-intl

# Fetch and extract content from URL
chrome-tool fetch https://example.com

# Fetch with token limit (AI-friendly)
chrome-tool fetch https://example.com --max-tokens 4000

# Fetch as JSON
chrome-tool fetch https://example.com --format json

# Configuration
chrome-tool config set default_engine bing-cn
chrome-tool config list
```

## Project Structure

```text
chrome-browser-tool/
├── src/
│   └── chrome_browser_tool/
│       ├── __init__.py       # Package exports
│       ├── browser.py        # Browser class (Selenium wrapper)
│       ├── search.py         # Search engine implementations
│       ├── fetch.py          # Content extraction and processing
│       └── cli.py            # Command-line interface
├── examples/
│   ├── basic_search.py       # Search examples
│   ├── fetch_article.py      # Content extraction examples
│   └── batch_fetch.py        # Batch processing examples
├── tests/
│   ├── __init__.py
│   ├── test_browser.py       # Browser tests
│   ├── test_search.py        # Search engine tests
│   ├── test_fetch.py         # Content extraction tests
│   └── conftest.py           # Pytest fixtures
├── pyproject.toml            # Project configuration
└── README.md                 # This file
```

## Examples

### Basic Search

```bash
python examples/basic_search.py "python programming"
python examples/basic_search.py -e bing-intl "machine learning"
python examples/basic_search.py -e bing-cn "人工智能"
```

### Fetch Article

```bash
python examples/fetch_article.py https://example.com/article
python examples/fetch_article.py -t 500 https://example.com/article
python examples/fetch_article.py -o article.json https://example.com/article
```

### Batch Processing

```bash
# Fetch multiple URLs
python examples/batch_fetch.py -u "https://example.com" "https://example.org"

# Search and fetch top results
python examples/batch_fetch.py -s "python programming" -n 3

# Fetch from file
python examples/batch_fetch.py -f urls.txt -o results.json
```

## Development

```bash
# Install in development mode
uv pip install -e ".[dev]"

# Run tests
pytest

# Lint and format
ruff check .
ruff format .
```

## Requirements

- Python >= 3.10
- Chrome browser installed
- ChromeDriver (resolved automatically by Selenium; Selenium Manager ships with Selenium 4.6 and later)
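
A quick preflight check for the requirements above might look like the following. This is a hedged sketch, not something shipped with the project: `python_version_ok`, `find_chrome`, and the candidate executable names are assumptions. On macOS and Windows, Chrome is usually not on `PATH`, so a `None` result does not prove Chrome is missing.

```python
import shutil
import sys
from typing import Optional, Sequence

# Executable names commonly used on Linux (assumption; adjust per platform).
CHROME_CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
    "chrome",
)


def python_version_ok(minimum: tuple = (3, 10)) -> bool:
    """Check that the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


def find_chrome(candidates: Sequence[str] = CHROME_CANDIDATES) -> Optional[str]:
    """Return the first Chrome-like executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    print(f"Python >= 3.10: {python_version_ok()}")
    print(f"Chrome on PATH: {find_chrome() or 'not found'}")
```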

## AI-Friendly Output

The tool is designed to produce output that's optimized for AI context windows:

- **Token counting**: Uses tiktoken with the `cl100k_base` encoding; counts are exact for models that use this encoding and approximate for others
- **Smart truncation**: Truncates at sentence boundaries when possible
- **Semantic chunking**: Splits content at paragraph boundaries
- **Metadata preservation**: Includes word count, token count, description
- **Markdown format**: Clean, structured output with preserved links
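
To make the chunking idea concrete, here is a minimal standalone sketch of splitting text at paragraph boundaries under a token budget. It is not taken from the package: `chunk_by_paragraphs` and `approx_token_count` are hypothetical names, and the word-based counter stands in for a real tokenizer such as tiktoken.

```python
import re
from typing import Callable


def approx_token_count(text: str) -> int:
    """Stand-in counter (assumption): one token per whitespace-separated word."""
    return len(text.split())


def chunk_by_paragraphs(
    text: str,
    max_tokens: int,
    count_tokens: Callable[[str], int] = approx_token_count,
) -> list[str]:
    """Group consecutive paragraphs into chunks of at most max_tokens each."""
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    chunks: list[str] = []
    current: list[str] = []
    for paragraph in paragraphs:
        candidate = "\n\n".join(current + [paragraph])
        if current and count_tokens(candidate) > max_tokens:
            # Adding this paragraph would exceed the budget: close the chunk.
            chunks.append("\n\n".join(current))
            current = [paragraph]
        else:
            current.append(paragraph)

    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    sample = "a b c\n\nd e\n\nf g h i\n\nj"
    for chunk in chunk_by_paragraphs(sample, max_tokens=5):
        print(repr(chunk))
```

In this sketch, a single paragraph that exceeds the budget on its own is emitted as one oversized chunk rather than being split mid-paragraph; a caller that needs a hard limit would have to truncate such chunks separately.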

## License

MIT License