Custom Datasets for Busy Teams

When You Just Want Data, Not Another Project.

26+ years across marketing and engineering. I build custom, production-grade datasets for teams that are too busy to build them in-house. Common Crawl extraction, SERP analysis, competitive intelligence — whatever you need, delivered ready to use.

  • 26+ years of combined experience
  • 850B+ records processed
  • 1,400+ client accounts managed
  • Fortune 500 enterprise experience

Teams That Need Data, Not Another Project

You know the data exists. You know it can be extracted. You just don't have the time or the team to do it yourself.

In-House Marketing Teams

SEO, analytics, competitive intelligence

Your engineers are busy. Your analysts are buried. You need someone who understands both marketing and engineering — and can take a data project from concept to deliverable without hand-holding.

  • SEO teams needing competitive or authority data
  • Brand teams monitoring market presence
  • Analytics teams that need clean, structured datasets
  • Data teams without bandwidth for one-off projects

Agencies

SEO, marketing, digital, PR

Your client needs data you can't build in-house. I build it, you deliver it. Clean handoff, no drama. You set your own margin.

  • Client requests that don't justify a full-time hire
  • Complex data work outside your team's skill set
  • White-label delivery — your client, your relationship
  • Repeatable partnerships on ongoing client work

What I Build for Teams

Every project is different. Here are the kinds of work that come through the door most often.

Common Crawl Extraction

Parse and analyze pages from Common Crawl releases at scale. Extract structured data from HTML, URLs, tags, and page elements across billions of records.

SERP Analysis & Opportunity Mining

Run hundreds of thousands of search queries, then download and analyze every ranked URL. Contact info, partnership opportunities, competitive gaps — extracted and structured.

Competitive Intelligence

Track competitors across web properties, search results, and market signals. Build databases your team can search and act on immediately.

Brand & Reputation Monitoring

Monitor search results, news, RSS feeds, and web mentions for brand terms, product names, executives, and competitors. Near-real-time, highly thorough.

Lead & Partnership Databases

Build searchable databases of sponsorship opportunities, link prospects, local partnerships, and outreach targets — nationwide or by specific location.

Large-Scale Web Scraping

Phone numbers, emails, social accounts, named entities — extracted, validated, and delivered in your preferred format. CSV, JSON, SQLite, Excel.

Work With Ben Wills

13+ years in marketing. 13+ years in engineering. 26+ years of combined experience building datasets, managing teams, and solving hard data problems for companies of every size.

Who I Am

Most marketers can't engineer complex data systems. Most engineers don't understand marketing well enough to build the right thing. I've spent over a decade in each discipline, and that combination is rare.

On the marketing side, I've directed teams of 70–80 people responsible for over 1,400 SEO client accounts. I've led international SEO campaigns spanning 30–40 countries for clients including a major home improvement retailer. I was the weekly point of contact for Fortune 500 accounts. I grew a company from zero to $140K+/month in revenue in under a year as VP of Operations. I've run SEO and PPC campaigns for small businesses, managed agency teams, and built marketing strategy at every scale.

On the engineering side, I've built large-scale web scraping and indexing systems that process billions of records. I wrote a marketing SaaS platform from scratch in pure C that could download and parse up to 400 million URLs per day. I've built custom databases, data pipelines, sharded storage systems, firmware for embedded devices, cross-platform networking libraries, and the complete Common Crawl web graph history — over 850 billion records compiled into queryable SQLite databases.

When you describe a data problem to me, I don't just understand the technical requirements — I understand the marketing objective behind it. I know what you're trying to accomplish, why it matters, and how the data needs to be structured so your team can actually use it. That's what 26 years across both disciplines gives you.

How It Works

Every project starts with a conversation about what you're trying to accomplish — not a requirements document or a feature list. I want to understand the business objective. Once I understand the goal, I scope the work, define exactly what you'll receive, and quote a fixed price. No hourly billing surprises. No scope creep. You know what you're getting and what it costs before anything starts.

From there, I build. You get regular updates with working deliverables — not status reports, not slide decks. Actual data you can look at. If something needs to change mid-project, we talk about it and adjust. I'm direct about what I can and can't do. If I think something won't work, I'll tell you before I waste your time or money on it.

Deliverables are production-grade. You get the dataset in whatever format your team needs — CSV, JSON, Excel, SQLite — along with schema definitions, field-level documentation, notes on assumptions and edge cases, and QA methodology. Everything is clean, documented, and ready to plug into your workflows.

Revisions Are Built In

Here's something I've learned from years of doing this work: once people see their data for the first time, they almost always want something different from what they originally described. That's not a failure of scoping — it's how data projects actually work. You don't fully know what you need until you see what's possible.

Iterations and revisions are baked into every project I take on. The price I quote assumes we're going to go back and forth. I expect it. I'll refine the deliverable until it's right. If you ask for something that's outside the scope, I'll tell you — but within the scope of the project, I'm not going to nickel-and-dime you on changes. The goal is a deliverable your team actually uses, not one that technically meets a spec but sits in a folder.

Engineering Experience

  • Large-scale web scraping & indexing systems
  • Data pipeline architecture (billions of records)
  • Common Crawl parsing & extraction at scale
  • Custom database design (SQLite, key-value, sharded)
  • Built a marketing SaaS in pure C — 400M URLs/day
  • Firmware development (ESP32, PIC32, custom protocols)
  • API design & development

Marketing Experience

  • Directed 70–80 person team across 1,400+ SEO accounts
  • Led international SEO campaigns (30–40 countries)
  • Weekly point person for Fortune 500 accounts
  • VP of Operations — $0 to $140K+/month in 9 months
  • SEO & PPC for SMBs, agencies, and enterprise

Example Projects

  • Built partnership database across 750+ locations for a worldwide hotel chain — 750 custom spreadsheets delivered
  • Nationwide sponsored link opportunity database from 100K+ Google search queries for a national SEO firm
  • Compiled the complete Common Crawl web graph history — 850B+ records across 15+ years of crawl data
  • SERP monitoring & competitive analysis for a major SaaS company — hundreds of thousands of search queries analyzed
  • Built brand monitoring system tracking mentions across search results, news, RSS feeds, and web sources in near-real-time

Common Questions

What format do you deliver data in?

Whatever works for your team. CSV, JSON, Excel, SQLite — I deliver in your preferred format with field-level documentation and schema definitions. The web graph databases are SQLite.

How does pricing work for custom projects?

Almost always fixed price. I scope the project, define clear deliverables, and quote a number before work begins. Revisions and iterations are included — we go back and forth until it's right.

How long does a typical project take?

It depends entirely on the project: some take a few weeks, some run longer. I'll give you a realistic timeline during the scoping conversation.

What if I need changes after delivery?

That's expected. Once people see their data, they almost always want adjustments. Iterations are baked into every project. If you need something I can't provide, I'll tell you upfront.

What's the web graph database?

The complete Common Crawl web graph history, compiled into SQLite databases. Search any domain or hostname and get full historical metrics including custom normalized PageRank and Harmonic Centrality scores (0–100).

Is the API really free?

Yes. No API key, no registration, no credit card. 100 hostnames per day, up to 10 per request. Just hit the endpoint and get data back. Resets every 24 hours.
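
As a rough sketch, here's what a request could look like from Python. The endpoint URL, parameter name, and response shape below are assumptions for illustration, not the published API spec; check the API docs for the real values.

    # Minimal sketch of calling the free API from Python.
    # The endpoint URL, query parameter, and response shape are
    # hypothetical; consult the API documentation for the real ones.
    import json
    import urllib.request

    ENDPOINT = "https://example.com/api/v1/hostnames"  # hypothetical URL

    def lookup(hostnames):
        """Fetch web graph metrics for up to 10 hostnames in one request."""
        if len(hostnames) > 10:
            raise ValueError("free tier allows at most 10 hostnames per request")
        url = ENDPOINT + "?host=" + ",".join(hostnames)
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

    for record in lookup(["example.com", "example.org"]):
        # Expect normalized PageRank and Harmonic Centrality on a
        # 0-100 scale, per the database description above.
        print(record)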

Schedule a Call

Tell me what you're working on. I'll let you know if I can help and what it would cost. No pitch, no pressure — just a direct conversation about your project.

Contact form coming soon. For now, send me an email with a brief description of your project.

The Complete Common Crawl Web Graph — Ready to Query

The raw Common Crawl web graph data is a collection of tab-separated value files you have to download and compile yourself. No database. No search. Just raw data and a lot of work.

I've done that work for you. Every release, every domain, every hostname — compiled into SQLite databases with instant key-value lookup. Search any domain or hostname and get its complete history across every metric, every crawl.
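
As a rough sketch, querying one of these databases from Python might look like this. The filename, table, and column names are assumptions for illustration; the delivered databases ship with schema definitions documenting the real ones.

    # Illustrative lookup against the hostname database. Table and
    # column names below are hypothetical; the shipped databases
    # include schema definitions with the actual names.
    import sqlite3

    conn = sqlite3.connect("hostnames.sqlite")  # hypothetical filename

    rows = conn.execute(
        """
        SELECT crawl, pagerank_norm, harmonic_norm, rank
        FROM hostnames
        WHERE hostname = ?
        ORDER BY crawl
        """,
        ("example.com",),
    ).fetchall()

    for crawl, pagerank, harmonic, rank in rows:
        # One row per Common Crawl release: the hostname's full history.
        print(crawl, pagerank, harmonic, rank)

    conn.close()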

What's Included

  • Full historical PageRank and Harmonic Centrality for every entity
  • Normalized PageRank score (0–100) — custom, far more useful for analysis
  • Normalized Harmonic Centrality score (0–100) — clean range for visualization
  • Rank position, n_hosts (domain-level), and all original metrics
  • SQLite format — portable, fast, no server required
  • Separate databases for domain queries and hostname queries

  • Domains (~300GB): $999, one-time
  • Hostnames (~850GB): $1,999, one-time
  • Both (~1.15TB): $2,499, one-time

Yearly subscriptions available — updated with each new Common Crawl release.

View Full Pricing

Recent Data Updates

I continuously process new Common Crawl releases and improve the dataset.

Feb 2026

CC-2026-09 WebGraph Added

Latest Common Crawl release processed and added to all database products. API updated.

Jan 2026

Improved Normalization

Refined 0–100 scoring algorithm for better distribution across the range.

Dec 2025

API Rate Limits Increased

All API plans now include 2x the previous rate limits at no additional cost.

Nov 2025

Hostname Database Optimization

Reduced file sizes by 15% through improved compression without data loss.