# mmdbconvert **Repository Path**: mirrors_maxmind/mmdbconvert ## Basic Information - **Project Name**: mmdbconvert - **Description**: A command-line tool to merge multiple MaxMind MMDB databases and export to CSV, Parquet, or MMDB format. - **Primary Language**: Unknown - **License**: Apache-2.0 - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2025-11-16 - **Last Updated**: 2026-06-20 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # mmdbconvert A command-line tool to merge multiple MaxMind MMDB databases and export to CSV, Parquet, or MMDB format. [![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/license/Apache-2.0) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/license/MIT) [![Go Version](https://img.shields.io/badge/Go-1.25%2B-00ADD8?logo=go)](https://go.dev/) ## Features - ✅ **Merge multiple MMDB databases** - Combine GeoIP databases (e.g., Enterprise + Anonymous IP) - ✅ **Non-overlapping networks** - Automatically resolves overlapping networks to smallest blocks - ✅ **Adjacent network merging** - Combines adjacent networks with identical data for compact output - ✅ **Multiple output formats** - Export to CSV, Parquet, or MMDB format - ✅ **Query-optimized Parquet** - Integer columns enable 10-100x faster IP lookups - ✅ **Type-preserving MMDB output** - Perfect type preservation for merged databases - ✅ **Flexible column mapping** - Extract any fields from MMDB databases using JSON paths - ✅ **IPv4 and IPv6 support** - Handle both IP versions seamlessly - ✅ **Type hints for Parquet** - Native int64, float64, bool types for efficient storage ## Installation ### Binary Releases (Recommended) Download pre-built binaries from the [GitHub Releases page](https://github.com/maxmind/mmdbconvert/releases). > **Architecture Guide:** > > - `amd64` = x86-64 / x64 (most common for Intel/AMD processors) > - `arm64` = ARM 64-bit (Apple Silicon, AWS Graviton, Raspberry Pi 4+) > - `darwin` = macOS > - Replace `` with the release version (e.g., `0.1.0`) > - Replace `` with your architecture (e.g., `amd64` or `arm64`) #### Linux **Using .deb package (Debian/Ubuntu):** 1. Download the `.deb` file for your architecture from the releases page 2. Install using dpkg: ```bash sudo dpkg -i mmdbconvert__.deb ``` **Using .rpm package (RedHat/CentOS/Fedora):** 1. Download the `.rpm` file for your architecture from the releases page 2. Install using rpm: ```bash sudo rpm -i mmdbconvert__.rpm ``` **Using tar.gz archive:** 1. Download the Linux tar.gz file for your architecture from the releases page 2. Extract and install: ```bash tar -xzf mmdbconvert__linux_.tar.gz sudo mv mmdbconvert/mmdbconvert /usr/local/bin/ ``` #### macOS 1. Download the macOS tar.gz file for your architecture from the releases page: - `darwin_arm64` for Apple Silicon (M1/M2/M3/M4) - `darwin_amd64` for Intel Macs 2. Extract and install: ```bash tar -xzf mmdbconvert__darwin_.tar.gz sudo mv mmdbconvert/mmdbconvert /usr/local/bin/ ``` #### Windows 1. Download the Windows zip file for your architecture from the releases page 2. Extract the zip file 3. Add the `mmdbconvert.exe` binary to your PATH or run it directly from the extracted location **Using PowerShell:** ```powershell # Extract (adjust filename to match your download) Expand-Archive -Path mmdbconvert__windows_.zip -DestinationPath . # Run .\mmdbconvert\mmdbconvert.exe --version ``` > **Note:** ARM64 binaries are available for all platforms. Choose the > appropriate architecture for your system. ### From Source ```bash go install github.com/maxmind/mmdbconvert/cmd/mmdbconvert@latest ``` ### Build Locally ```bash git clone https://github.com/maxmind/mmdbconvert.git cd mmdbconvert go build -o mmdbconvert ./cmd/mmdbconvert ``` ## Quick Start ### 1. Create a Configuration File Create `config.toml`: ```toml [output] format = "csv" file = "output.csv" [[databases]] name = "city" path = "/path/to/GeoIP2-City.mmdb" [[columns]] name = "country_code" database = "city" path = ["country", "iso_code"] [[columns]] name = "city_name" database = "city" path = ["city", "names", "en"] ``` ### 2. Run the Tool ```bash mmdbconvert config.toml ``` ### 3. View the Output ```bash head output.csv ``` ``` network,country_code,city_name 1.0.0.0/24,AU,Sydney 1.0.1.0/24,CN,Beijing 1.0.4.0/22,AU,Melbourne ``` > **Note:** The `network` column appears automatically because no > `[[network.columns]]` sections were defined. By default, CSV output includes a > CIDR column named `network`, while Parquet output includes `start_int` and > `end_int` integer columns for faster IP lookups. You can customize network > columns in the configuration. ## Usage ```bash # Basic usage mmdbconvert config.toml # Explicit config flag mmdbconvert --config config.toml # Suppress progress output mmdbconvert --config config.toml --quiet # Disable unmarshaler caching to reduce memory usage (several times slower) mmdbconvert --config config.toml --disable-cache # Show version mmdbconvert --version # Show help mmdbconvert --help ``` ## Configuration See [docs/config.md](docs/config.md) for complete configuration reference. ### CSV Output Example ```toml [output] format = "csv" file = "geo.csv" [output.csv] delimiter = "," # or "\t" for tab-delimited [[network.columns]] name = "network" type = "cidr" [[databases]] name = "city" path = "GeoIP2-City.mmdb" [[columns]] name = "country" database = "city" path = ["country", "iso_code"] ``` ### Parquet Output Example ```toml [output] format = "parquet" file = "geo.parquet" [output.parquet] compression = "snappy" row_group_size = 500000 # Integer columns for fast queries [[network.columns]] name = "start_int" type = "start_int" [[network.columns]] name = "end_int" type = "end_int" [[databases]] name = "city" path = "GeoIP2-City.mmdb" [[columns]] name = "country" database = "city" path = ["country", "iso_code"] type = "string" [[columns]] name = "latitude" database = "city" path = ["location", "latitude"] type = "float64" ``` ### MMDB Output Example ```toml [output] format = "mmdb" file = "merged.mmdb" [output.mmdb] database_type = "GeoIP2-City" description = { en = "Merged GeoIP Database" } record_size = 28 [[databases]] name = "city" path = "GeoIP2-City.mmdb" # Use output_path to create nested structure [[columns]] name = "country_code" database = "city" path = ["country", "iso_code"] output_path = ["country", "iso_code"] [[columns]] name = "city_name" database = "city" path = ["city", "names", "en"] output_path = ["city", "names", "en"] [[columns]] name = "latitude" database = "city" path = ["location", "latitude"] output_path = ["location", "latitude"] [[columns]] name = "longitude" database = "city" path = ["location", "longitude"] output_path = ["location", "longitude"] ``` **MMDB output features:** - Perfect type preservation from source databases - Support for nested structures via `output_path` - Compatible with all MMDB readers (libmaxminddb, etc.) - Configurable record size (24, 28, or 32 bits) ## Querying Parquet Files Parquet files generated with integer columns (`start_int`, `end_int`) support **extremely fast IP lookups** (10-100x faster than string comparisons). ### DuckDB Example ```sql -- Lookup IP address 203.0.113.100 (integer: 3405803876) SELECT * FROM read_parquet('geo.parquet') WHERE start_int <= 3405803876 AND end_int >= 3405803876; ``` **See [docs/parquet-queries.md](docs/parquet-queries.md) for comprehensive query examples and performance optimization guide.** ## Examples ### Merging Multiple Databases Combine GeoIP Enterprise with Anonymous IP data: ```toml [output] format = "parquet" file = "merged.parquet" [[network.columns]] name = "start_int" type = "start_int" [[network.columns]] name = "end_int" type = "end_int" [[databases]] name = "enterprise" path = "GeoIP2-Enterprise.mmdb" [[databases]] name = "anonymous" path = "GeoIP2-Anonymous-IP.mmdb" # Columns from Enterprise database [[columns]] name = "country_code" database = "enterprise" path = ["country", "iso_code"] [[columns]] name = "city_name" database = "enterprise" path = ["city", "names", "en"] [[columns]] name = "latitude" database = "enterprise" path = ["location", "latitude"] type = "float64" [[columns]] name = "longitude" database = "enterprise" path = ["location", "longitude"] type = "float64" # Columns from Anonymous IP database [[columns]] name = "is_anonymous" database = "anonymous" path = ["is_anonymous"] type = "bool" [[columns]] name = "is_anonymous_vpn" database = "anonymous" path = ["is_anonymous_vpn"] type = "bool" ``` ### All Network Column Types ```toml [[network.columns]] name = "network" type = "cidr" # e.g., "203.0.113.0/24" [[network.columns]] name = "start_ip" type = "start_ip" # e.g., "203.0.113.0" [[network.columns]] name = "end_ip" type = "end_ip" # e.g., "203.0.113.255" [[network.columns]] name = "start_int" type = "start_int" # e.g., 3405803776 (IPv4 only) [[network.columns]] name = "end_int" type = "end_int" # e.g., 3405804031 (IPv4 only) [[network.columns]] name = "network_bucket" type = "network_bucket" # Bucket for efficient lookups. Requires split files. ``` **Default network columns:** If you don't define any `[[network.columns]]`, mmdbconvert automatically provides sensible defaults based on output format: - **CSV**: Single `network` column (CIDR format) for human readability - **Parquet**: `start_int` and `end_int` columns for 10-100x faster IP queries **Note:** `start_int` and `end_int` only work with IPv4 addresses unless you split your output into separate IPv4/IPv6 files via `output.ipv4_file` and `output.ipv6_file`. For single-file outputs that include IPv6 data, use string columns (`start_ip`, `end_ip`, `cidr`). ### Network Bucketing for Analytics (BigQuery, etc.) When loading network data into analytics platforms like BigQuery, range queries can be slow due to full table scans. The `network_bucket` column provides a join key that enables efficient queries by first filtering to a specific bucket. **Configuration:** ```toml [output] format = "parquet" ipv4_file = "geoip-v4.parquet" ipv6_file = "geoip-v6.parquet" [output.parquet] ipv4_bucket_size = 16 # Optional, defaults to 16 ipv6_bucket_size = 16 # Optional, defaults to 16 ipv6_bucket_type = "int" # Optional: "string" (default) or "int" [[network.columns]] name = "start_int" type = "start_int" [[network.columns]] name = "end_int" type = "end_int" [[network.columns]] name = "network_bucket" type = "network_bucket" ``` For IPv4, the bucket is a 32-bit integer. For IPv6, the bucket is either a hex string (default) or a 60-bit integer when `ipv6_bucket_type = "int"` is configured. Using `network_bucket` requires split output files. See [docs/bigquery.md](docs/bigquery.md) for BigQuery query examples. **Note:** When a network is larger than the bucket size (e.g., a /15 with /16 buckets), the row is duplicated for each bucket it spans. This ensures queries find the correct network regardless of which bucket the IP falls into. **Note:** `network_bucket` is supported for CSV and Parquet output. ### Data Type Hints Parquet supports native types for efficient storage and queries: ```toml [[columns]] name = "population" database = "city" path = ["city", "population"] type = "int64" # Integer values [[columns]] name = "accuracy_radius" database = "city" path = ["location", "accuracy_radius"] type = "int64" [[columns]] name = "latitude" database = "city" path = ["location", "latitude"] type = "float64" # Floating-point values [[columns]] name = "is_satellite" database = "city" path = ["traits", "is_satellite_provider"] type = "bool" # Boolean values ``` ## Use Cases ### Merging Enterprise + Anonymous IP Merging GeoIP Enterprise with GeoIP Anonymous IP to enrich traffic logs. The merged database provides: - Geographic location data (country, city, coordinates) - Anonymous IP detection (VPN, proxy, hosting provider) - Single query-optimized Parquet file for fast lookups - Non-overlapping networks for accurate IP matching ### Creating Custom MMDB Databases Merge multiple MMDB databases into a single custom database with perfect type preservation: - Combine multiple data sources (GeoIP, ISP, ASN, etc.) - Create application-specific databases with only needed fields - Maintain exact data types from source databases - Deploy merged databases with existing MMDB readers - No performance overhead compared to original databases ### Analytics Pipelines Export MMDB databases to Parquet for use in analytics pipelines: - **DuckDB:** Fast local queries on laptop/server - **Apache Spark:** Distributed processing of billions of logs - **Trino/Presto:** Query data in S3 without downloading - **BigQuery:** Load Parquet files for SQL analysis ### Data Warehouse Integration Convert MMDB databases to CSV/Parquet for loading into data warehouses: - Snowflake - Redshift - BigQuery - Databricks ## Architecture ### Streaming Network Merge mmdbconvert uses a streaming accumulator algorithm: 1. **Nested iteration** through all databases using `NetworksWithin()` 2. **Smallest network selection** - Always chooses most specific network block 3. **Data extraction** from all databases for each network 4. **Adjacent network merging** - Combines networks with identical data ### Non-Overlapping Networks When databases have overlapping networks, mmdbconvert automatically splits them into non-overlapping blocks: **Example:** ``` Database A: 10.0.0.0/16 Database B: 10.0.1.0/24 Output: 10.0.0.0/24 (only in A) 10.0.1.0/24 (in both A and B) 10.0.2.0/23 (only in A) 10.0.4.0/22 (only in A) 10.0.8.0/21 (only in A) ... etc ``` This ensures accurate IP lookups with no ambiguity. ## Documentation - [Configuration Reference](docs/config.md) - Complete config file documentation - [Parquet Query Guide](docs/parquet-queries.md) - Optimizing IP lookup queries - [BigQuery Guide](docs/bigquery.md) - Network bucketing for BigQuery ## Requirements - Go 1.25 or later - MaxMind MMDB database files (GeoIP, GeoLite, etc.) ## License Copyright 2025-2026 MaxMind, Inc. This project is licensed under either of: - Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or https://www.apache.org/licenses/LICENSE-2.0) - MIT License ([LICENSE-MIT](LICENSE-MIT) or https://opensource.org/license/MIT) at your option. ## Contributing Contributions are welcome! Please: 1. Fork the repository 2. Create a feature branch (`git checkout -b feature/amazing-feature`) 3. Make your changes with tests 4. Run linters: `golangci-lint run` 5. Run tests: `go test ./...` 6. Commit your changes (`git commit -m 'Add amazing feature'`) 7. Push to the branch (`git push origin feature/amazing-feature`) 8. Open a Pull Request ## Acknowledgments Built with: - [go-toml](https://github.com/pelletier/go-toml) - TOML configuration parsing - [maxminddb-golang](https://github.com/oschwald/maxminddb-golang) - MMDB database reading - [mmdbwriter](https://github.com/maxmind/mmdbwriter) - MMDB database writing - [parquet-go](https://github.com/parquet-go/parquet-go) - Parquet file writing ## Support - **Issues:** [GitHub Issues](https://github.com/maxmind/mmdbconvert/issues) - **Documentation:** [docs/](docs/) - **MaxMind Support:** [https://support.maxmind.com/knowledge-base](https://support.maxmind.com/knowledge-base)