# chrome-browser-tool

**Repository Path**: lvhaodeyeye/chrome-browser-tool

## Basic Information

- **Project Name**: chrome-browser-tool
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-02-12
- **Last Updated**: 2026-02-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Chrome Browser Tool

A Python tool for browser automation with Chrome using Selenium. Provides a high-level API for web scraping, search engine integration (Google, Bing China, Bing International), and content extraction optimized for AI context windows.

## Features

- **Browser Automation**: Control Chrome browser with a simple, intuitive API
- **Search Engine Integration**: Built-in support for Google, Bing China (cn.bing.com), and Bing International (www.bing.com)
- **Content Extraction**: Convert web pages to Markdown, extract structured data
- **Token-aware Processing**: Track token usage when processing web content, with truncation strategies for AI context windows
- **CLI Interface**: Command-line tool for quick browser operations
- **No API Keys Required**: Uses local Chrome browser for all operations

## Installation

### Using uv (recommended)

```bash
# Navigate to project directory
cd chrome-browser-tool

# Create virtual environment and install
uv venv
uv pip install -e ".[dev]"
```

### Using pip

```bash
cd chrome-browser-tool
pip install -e ".[dev]"
```

## Quick Start

### Python API

#### Search

```python
from chrome_browser_tool.search import SearchEngineFactory

# Search using Google
with SearchEngineFactory.create("google") as engine:
    results = engine.search("python programming", num_results=5)
    for result in results.results:
        print(f"{result.position}. {result.title}")
        print(f"   URL: {result.url}")
        print(f"   {result.snippet}")

# Search using Bing China
with SearchEngineFactory.create("bing_cn") as engine:
    results = engine.search("人工智能", num_results=5)
    ...

# Search using Bing International
with SearchEngineFactory.create("bing_intl") as engine:
    results = engine.search("machine learning", num_results=5)
    ...
```

#### Fetch and Extract Content

```python
from chrome_browser_tool.browser import Browser
from chrome_browser_tool.fetch import ContentExtractor, ContentProcessor

with Browser(headless=True) as browser:
    # Navigate to a page
    browser.get("https://example.com/article")

    # Get page source
    html = browser.get_page_source()

    # Extract content
    extractor = ContentExtractor()
    result = extractor.extract(html, "https://example.com/article")
    print(f"Title: {result.title}")
    print(f"Content: {result.content}")
    print(f"Tokens: {result.metadata.token_count}")

    # Truncate for AI context window
    processor = ContentProcessor()
    truncated, was_truncated = processor.truncate_by_tokens(result.content, max_tokens=4000)
```

### CLI Usage

```bash
# Search using Google
chrome-tool search "python tutorials"

# Search using Bing China
chrome-tool search "人工智能" --engine bing-cn

# Search using Bing International
chrome-tool search "machine learning" --engine bing-intl

# Fetch and extract content from URL
chrome-tool fetch https://example.com

# Fetch with token limit (AI-friendly)
chrome-tool fetch https://example.com --max-tokens 4000

# Fetch as JSON
chrome-tool fetch https://example.com --format json

# Configuration
chrome-tool config set default_engine bing-cn
chrome-tool config list
```

## Project Structure

```
chrome-browser-tool/
├── src/
│   └── chrome_browser_tool/
│       ├── __init__.py      # Package exports
│       ├── browser.py       # Browser class (Selenium wrapper)
│       ├── search.py        # Search engine implementations
│       ├── fetch.py         # Content extraction and processing
│       └── cli.py           # Command-line interface
├── examples/
│   ├── basic_search.py      # Search examples
│   ├── fetch_article.py     # Content extraction examples
│   └── batch_fetch.py       # Batch processing examples
├── tests/
│   ├── __init__.py
│   ├── test_browser.py      # Browser tests
│   ├── test_search.py       # Search engine tests
│   ├── test_fetch.py        # Content extraction tests
│   └── conftest.py          # Pytest fixtures
├── pyproject.toml           # Project configuration
└── README.md                # This file
```

## Examples

### Basic Search

```bash
python examples/basic_search.py "python programming"
python examples/basic_search.py -e bing-intl "machine learning"
python examples/basic_search.py -e bing-cn "人工智能"
```

### Fetch Article

```bash
python examples/fetch_article.py https://example.com/article
python examples/fetch_article.py -t 500 https://example.com/article
python examples/fetch_article.py -o article.json https://example.com/article
```

### Batch Processing

```bash
# Fetch multiple URLs
python examples/batch_fetch.py -u "https://example.com" "https://example.org"

# Search and fetch top results
python examples/batch_fetch.py -s "python programming" -n 3

# Fetch from file
python examples/batch_fetch.py -f urls.txt -o results.json
```

## Development

```bash
# Run tests
pytest

# Run linting
ruff check .
ruff format .

# Install in development mode
uv pip install -e ".[dev]"
```

## Requirements

- Python >= 3.10
- Chrome browser installed
- ChromeDriver (automatically managed by selenium)

## AI-Friendly Output

The tool is designed to produce output that's optimized for AI context windows:

- **Token counting**: Uses tiktoken for accurate token estimation (cl100k_base encoding)
- **Smart truncation**: Truncates at sentence boundaries when possible
- **Semantic chunking**: Split content at paragraph boundaries
- **Metadata preservation**: Includes word count, token count, description
- **Markdown format**: Clean, structured output with preserved links

## License

MIT License
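
## Sketch: Sentence-Boundary Truncation

The "smart truncation" behaviour listed under AI-Friendly Output can be illustrated with a small standalone sketch. This is not the package's implementation: `truncate_at_sentences` and `approx_token_count` are hypothetical names introduced here, and the default counter simply counts whitespace-separated words instead of calling tiktoken. The return shape `(text, was_truncated)` mirrors how `truncate_by_tokens` is used in the Quick Start.

```python
import re
from typing import Callable


def approx_token_count(text: str) -> int:
    """Hypothetical stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def truncate_at_sentences(
    text: str,
    max_tokens: int,
    count_tokens: Callable[[str], int] = approx_token_count,
) -> tuple[str, bool]:
    """Keep whole sentences until the token budget is exhausted.

    Returns (possibly shortened text, whether anything was dropped).
    """
    if count_tokens(text) <= max_tokens:
        return text, False

    # Split after '.', '!' or '?' followed by whitespace (a rough heuristic).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    kept: list[str] = []
    for sentence in sentences:
        candidate = " ".join(kept + [sentence])
        if count_tokens(candidate) > max_tokens:
            break
        kept.append(sentence)

    if not kept:
        # Even the first sentence is over budget: fall back to a hard cut
        # on words (only exact for the word-based counter above).
        return " ".join(text.split()[:max_tokens]), True

    return " ".join(kept), True


if __name__ == "__main__":
    sample = "One two three. Four five six. Seven eight nine."
    print(truncate_at_sentences(sample, max_tokens=6))
```

To count real tokens, one option is to pass a tiktoken-based counter, for example `lambda s: len(tiktoken.get_encoding("cl100k_base").encode(s))`, as `count_tokens`. Whether the package's own `ContentProcessor` works this way is not confirmed by this README.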