The complete Common Crawl web graph history, compiled into SQLite databases with instant key-value lookup. Includes Normalized PageRank and Harmonic Centrality scores on a 0–100 scale, raw values, rank positions, and host counts for every Common Crawl snapshot. Search any domain or hostname and get its full ranking history. No API limits, no rate restrictions — run unlimited queries on your own hardware.
Search This Database Using the Free Public API
Data sourced from the Common Crawl Web Graph.
Yearly Access
Updated with every new Common Crawl release — save ~50% vs. monthly
Domains Only
- 438 million+ domains
- Complete history from May 2017
- 32 sharded SQLite files
- Normalized PageRank & HC (0–100)
- Raw values, positions, host counts
- Every update for 12 months
Hostnames Only
- 4.7 billion+ hostnames
- History from February 2020 forward
- 32 sharded SQLite files
- Normalized PageRank & HC (0–100)
- Raw values and rank positions
- Every update for 12 months
Both
- Everything in both packages
- 64 total SQLite files (32 domain + 32 host)
- Full domain + hostname coverage
- All metrics, all snapshots
- Best value for comprehensive analysis
- Every update for 12 months
30-Day Access
One-time download — no recurring charges
What You Get
SQLite Format
Portable, fast, no server required. Open with any SQLite client, any programming language. Works on Linux, macOS, and Windows.
Instant Key-Value Lookup
Each database is indexed for sub-millisecond lookups. Query any hostname or domain and get its full history instantly.
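For example, a minimal Python lookup might look like the sketch below. The shard file, table, and column names here are illustrative assumptions; the actual schema and shard layout are in the database documentation.

```python
import sqlite3

# Hypothetical shard file and schema -- substitute the real names
# from the database documentation.
con = sqlite3.connect("domains_shard_00.db")
row = con.execute(
    "SELECT * FROM domains WHERE domain = ?",  # assumed table/column
    ("example.com",),
).fetchone()
print(row)
con.close()
```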
Normalized Scores (0–100)
Raw PageRank and Harmonic Centrality values vary wildly between datasets; they are wide-ranging floats that are hard to interpret. For each Common Crawl release, the normalization is recalculated from scratch (there is no single static formula), producing clean 0–100 integer scores you can graph, compare across time, and understand at a glance.
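As a rough illustration only (the actual per-release formula is not reproduced here), a rank-based rescaling computed fresh for each release might look like:

```python
import numpy as np

def normalize_to_0_100(raw_values):
    # Illustrative sketch: rescale one release's raw PageRank or
    # Harmonic Centrality values to 0-100 integers by rank position.
    # The product's actual normalization may differ.
    ranks = np.argsort(np.argsort(raw_values))  # 0 .. n-1
    denom = max(len(raw_values) - 1, 1)
    return np.rint(100.0 * ranks / denom).astype(int)
```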
Complete Metric History
Every crawl snapshot includes: PageRank (raw + normalized), Harmonic Centrality (raw + normalized), rank positions, and host counts (domain-level).
No Rate Limits
Run as many queries as you want, as fast as you want. The database runs on your hardware — no API calls, no daily caps, no throttling.
Monthly Updates
Common Crawl now publishes new web graph data approximately once per month. With a yearly plan, updated databases are available for download within 48–72 hours of each new release — every update included for 12 months.
Who Uses This Data
Built for teams and individuals who need domain authority data at scale — without per-query costs.
SEO & Marketing Teams
Build marketing databases that rank and prioritize domains and opportunities. At the scale most SEO projects operate, a paid API gets astronomically expensive; this database makes it a flat cost.
Data Scientists & Analysts
Run bulk analysis across hundreds of millions of domains. Build internal ranking tools, track authority trends over time, and feed normalized scores directly into your pipelines.
ML & LLM Training Pipelines
When building training datasets, use authority scores to find the most authoritative version of content across multiple sources, or to weight content from established sites more heavily. Use the scores to influence how an LLM, neural network, or classifier values source material.
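One hedged sketch of how that weighting might look in practice (the linear mapping and the floor value are modeling choices, not part of the dataset):

```python
def sample_weight(norm_score: int, floor: float = 0.1) -> float:
    # Map a 0-100 normalized authority score to a training-sample
    # weight in [floor, 1.0]. Linear mapping chosen for illustration.
    return floor + (1.0 - floor) * (norm_score / 100.0)

# Example: weight documents when assembling a training corpus.
docs = [("example.com", 87), ("obscure.example", 12)]
weights = [sample_weight(score) for _, score in docs]
```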
Competitive Intelligence
Track how domains and hostnames rise and fall in authority over time. Identify emerging competitors, monitor acquisitions, or evaluate link-building impact — all with full historical data.
Replace Your Existing Authority Data
If you currently pay for domain authority metrics from Ahrefs, Majestic, or Moz, you can replace that data with this dataset at a significantly lower cost. The vast majority of use cases are fully covered; the only exception is if you rely on something unique to a specific provider’s proprietary methodology.
Delivery & Technical Details
Direct Download Links
After purchase, you’ll enter the IP address you want to download from. Each link is validated against your IP. You’ll receive 32 download links per dataset (32 for domains, 32 for hosts). If you need to change your IP later, email support@customdatasets.com.
Zstandard Compression
Files are compressed with zstd at level 19 (highest before ultra). Significantly smaller than gzip, and decompresses very quickly. You’ll need the zstd tool to decompress. Compressed total for both datasets: ~400 GB.
Disk Space Requirements
Decompressed: domains ~272.5 GB, hosts ~847 GB, both ~1.12 TB. A 2 TB drive is comfortable for the full dataset. If space is tight, download and decompress one file at a time, deleting the compressed version before the next.
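If you script that workflow, here is a minimal sketch using the `zstandard` Python package (file names are illustrative; the zstd CLI's `zstd -d --rm` achieves the same result per file):

```python
import os
import zstandard

# Decompress each shard, then delete the compressed copy before
# moving to the next one, so peak disk usage stays low.
for name in sorted(os.listdir(".")):
    if not name.endswith(".zst"):
        continue
    with open(name, "rb") as src, open(name[:-4], "wb") as dst:
        zstandard.ZstdDecompressor().copy_stream(src, dst)
    os.remove(name)  # free the compressed file's space
```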
Compatibility
SQLite3 format — works on any OS (Linux, macOS, Windows) with any language that has SQLite bindings. Data is sharded into 32 files per dataset; see the database documentation for the DJB2 hash lookup reference.
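As a sketch of how shard selection typically works with DJB2 (the exact hash variant, key encoding, and modulus are defined in the database documentation, so verify against it rather than relying on this form):

```python
def djb2(key: str) -> int:
    # Classic DJB2: h = h * 33 + byte, kept in 32 bits.
    h = 5381
    for b in key.encode("utf-8"):
        h = (h * 33 + b) & 0xFFFFFFFF
    return h

def shard_index(key: str, shards: int = 32) -> int:
    return djb2(key) % shards

print(shard_index("example.com"))  # which of the 32 files to query
```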
Need help working with the database? Read the database documentation for schema details, query examples, and hash sharding reference.
Want to try before you buy? Use the free search tool or the free API to explore the data first.