Automation, Self-Hosted

LeadGenEngine

A Zero-Cost-Per-Lookup B2B Email-Discovery Pipeline

A self-hosted pipeline that turns a company domain into SMTP-verified emails for titled employees at effectively zero marginal cost, replacing a five-figure annual enrichment-API bill.

Python, FastAPI, Playwright, Camoufox, Scrapy, httpx, dnspython, MongoDB, Motor
Problem Statement

A sales team was bleeding budget on per-record enrichment APIs that metered every lookup, yet still couldn’t reliably reach the SEO decision-makers that mattered. Naive email guessing tops out near a 65% hit rate, and every bounce erodes sender reputation.

  • Per-lookup API pricing turned prospecting into a tax on growth at 1k–10k lookups/month.
  • Target SEO persona was under-represented in vendor contributory databases.
  • Blind permutation hits ~65% accuracy; misses damage domain deliverability.
Headline Outcomes
Marginal cost per lookup: ≈ $0 (self-hosted), previously metered (paid API)

Deliverable find-rate: 75–85% (+10–20 pts), previously ~65%

Bounce on high-confidence sends: < 5%, previously uncontrolled

The Solution

A staged, self-hosted pipeline that mirrors a diligent researcher: start from public signals, corroborate independently, and only trust an address once the mail server confirms it, using adaptive routing instead of one rigid recipe.

A DNS MX-record gate fingerprints each domain before guessing, rerouting no-mail domains to personal-account discovery.
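
The MX gate could look roughly like the sketch below. It assumes dnspython (listed in the stack) for the lookup; the suffix table, the function names, and the stage labels are hypothetical illustrations, not the project's actual code.

```python
from typing import List

# Hypothetical suffix table: MX host suffix -> provider label (illustrative only).
PROVIDER_SUFFIXES = {
    ".google.com": "google_workspace",
    ".googlemail.com": "google_workspace",
    ".protection.outlook.com": "microsoft_365",
}


def fingerprint_mx(mx_hosts: List[str]) -> str:
    """Classify a domain from its MX hostnames (pure function, no network)."""
    # A lone "." exchange is a null MX (RFC 7505): the domain accepts no mail.
    hosts = [h.rstrip(".").lower() for h in mx_hosts if h.rstrip(".")]
    if not hosts:
        return "no_mail"
    for host in hosts:
        for suffix, provider in PROVIDER_SUFFIXES.items():
            if host.endswith(suffix):
                return provider
    return "other"


def route(fingerprint: str) -> str:
    """Pick the next stage; the stage names are assumptions for this sketch."""
    if fingerprint == "no_mail":
        return "personal_account_discovery"
    return "pattern_discovery"


def lookup_mx(domain: str) -> List[str]:
    """Resolve MX hosts, lowest preference first. Requires dnspython."""
    import dns.exception
    import dns.resolver

    try:
        answers = dns.resolver.resolve(domain, "MX")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer,
            dns.resolver.NoNameservers, dns.exception.Timeout):
        return []  # treated the same as "no MX records"
    records = sorted((r.preference, str(r.exchange)) for r in answers)
    return [host for _, host in records]
```

A caller would chain these as `route(fingerprint_mx(lookup_mx(domain)))`, so a domain with no usable MX record never reaches the guessing stage.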

Discovery beats permutation: the pipeline harvests real public emails to infer the company’s single naming convention.
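
One way to infer a naming convention from harvested addresses is a simple vote over candidate templates. This is a minimal sketch: the template list, the `(first, last, email)` sample shape, and the function names are assumptions, not the project's implementation.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical candidate templates; {f} and {l} are first and last initials.
PATTERNS = ["{first}.{last}", "{first}{last}", "{f}{last}",
            "{first}_{last}", "{last}{f}", "{first}"]


def render(pattern: str, first: str, last: str) -> str:
    """Fill a template with a lower-cased name."""
    first, last = first.lower(), last.lower()
    return pattern.format(first=first, last=last, f=first[:1], l=last[:1])


def infer_pattern(samples: Iterable[Tuple[str, str, str]],
                  domain: str) -> Optional[Tuple[str, float]]:
    """Return (best pattern, share of on-domain samples it explains), or None."""
    votes: Counter = Counter()
    total = 0
    for first, last, email in samples:
        local, _, host = email.lower().partition("@")
        if host != domain.lower():
            continue  # ignore addresses harvested for other domains
        total += 1
        for pattern in PATTERNS:
            if render(pattern, first, last) == local:
                votes[pattern] += 1
                break  # count each sample for at most one pattern
    if not votes:
        return None
    pattern, count = votes.most_common(1)[0]
    return pattern, count / total
```

The returned share can double as a rough pattern-strength signal: a convention backed by two of three public addresses deserves less trust than one backed by ten of ten.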

A fused 0–100 confidence verdict (pattern + provider + signals) classifies every candidate as accept/review/reject.
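
The source does not state how the three inputs are weighted, so the fusion below is only a sketch. The weights, the 80 and 50 cut-offs, and the `Evidence` type are assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Assumed weights (they sum to 100) and thresholds; not taken from the project.
WEIGHTS = {"pattern": 50, "provider": 20, "signals": 30}
ACCEPT_AT = 80
REVIEW_AT = 50


@dataclass(frozen=True)
class Evidence:
    pattern: float   # 0..1: how well the address matches the inferred format
    provider: float  # 0..1: how trustworthy verification is on this mail provider
    signals: float   # 0..1: independent corroboration from public sources


def fuse(evidence: Evidence) -> int:
    """Combine the three inputs into a single 0-100 score."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = min(max(getattr(evidence, name), 0.0), 1.0)  # clamp to 0..1
        total += weight * value
    return round(total)


def classify(score: int) -> str:
    """Map a fused score to accept / review / reject."""
    if score >= ACCEPT_AT:
        return "accept"
    if score >= REVIEW_AT:
        return "review"
    return "reject"
```

A weighted sum keeps the verdict explainable, because each candidate's score can be broken back down into its three contributions during review.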

On-prem SMTP verification closes the loop: output is verified-only, and every confirmation locks the domain pattern.
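
A server-side mailbox check can be sketched with Python's standard `smtplib`: open a session with the domain's MX host, issue `RCPT TO`, and read the reply code without ever sending a message. The verdict labels, the `helo_name` and `mail_from` defaults, and the function names are assumptions for this sketch.

```python
import smtplib


def interpret_rcpt(code: int) -> str:
    """Map an SMTP RCPT TO reply code to a verdict (labels are assumptions)."""
    if code in (250, 251):
        return "deliverable"
    if 500 <= code < 600:
        return "undeliverable"  # permanent failure, e.g. 550 mailbox unavailable
    return "unknown"            # 4xx temporary failure (greylisting) or unexpected


def probe_mailbox(mx_host: str, address: str,
                  helo_name: str = "verifier.example.com",
                  mail_from: str = "probe@example.com",
                  timeout: float = 10.0) -> str:
    """Ask the mail server whether it would accept `address`; no message is sent."""
    try:
        with smtplib.SMTP(mx_host, 25, timeout=timeout) as smtp:
            smtp.helo(helo_name)
            smtp.mail(mail_from)
            code, _ = smtp.rcpt(address)
    except (smtplib.SMTPException, OSError):
        return "unknown"  # connection refused, timeout, or protocol error
    return interpret_rcpt(code)
```

Some servers accept every recipient (catch-all domains), so a 250 reply alone is not proof of a real mailbox. Probing a deliberately random address on the same domain first is a common way to detect that case.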

System Architecture

How the data flows

01 Company Domain: Input

02 LinkedIn Page: Titled employees

03 Email Permutations: Ranked by inferred format

04 SMTP Verify: On-prem, server-side

05 Verified Output: Confirmed mailboxes only
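
Step 03 of this flow, permutations ranked by inferred format, might be sketched as follows. The default template order and the `candidates` function are hypothetical; the only idea taken from the text is that the inferred format is tried first.

```python
from typing import List, Optional

# Hypothetical fallback order, used when no format has been inferred yet.
DEFAULT_PATTERNS = ["{first}.{last}", "{f}{last}", "{first}{last}",
                    "{first}", "{last}"]


def candidates(first: str, last: str, domain: str,
               inferred: Optional[str] = None) -> List[str]:
    """Return candidate addresses, with the inferred format (if any) ranked first."""
    first, last = first.strip().lower(), last.strip().lower()
    order = list(DEFAULT_PATTERNS)
    if inferred:
        order = [inferred] + [p for p in order if p != inferred]
    seen = set()
    ranked = []
    for pattern in order:
        local = pattern.format(first=first, last=last, f=first[:1], l=last[:1])
        address = f"{local}@{domain.lower()}"
        if address not in seen:  # drop duplicates while keeping rank order
            seen.add(address)
            ranked.append(address)
    return ranked
```

Ranking matters because every SMTP probe costs time and risks reputation, so the verifier should hit the most likely address first and stop at the first confirmation.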

Result 01

Replaced a recurring five-figure enrichment-API line item with an owned open-source asset.

Result 02

Closed the SEO-persona coverage gap by sourcing directly from LinkedIn and live search.

Result 03

Self-improving: each verified address locks a domain’s pattern so accuracy compounds.
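
The pattern lock behind this result can be illustrated with a small in-memory store. This is a sketch only: the class name and the "first confirmation wins" rule are assumptions, and a real deployment would presumably persist the mapping rather than hold it in a dict.

```python
from typing import Dict, Optional


class PatternLock:
    """In-memory stand-in for a persistent domain -> pattern store (hypothetical)."""

    def __init__(self) -> None:
        self._locked: Dict[str, str] = {}

    def confirm(self, domain: str, pattern: str) -> None:
        """Record the pattern of a verified address for its domain."""
        # Assumption: the first verified pattern wins and is never overwritten.
        self._locked.setdefault(domain.lower(), pattern)

    def get(self, domain: str) -> Optional[str]:
        """Return the locked pattern for a domain, or None if none is locked."""
        return self._locked.get(domain.lower())
```

Once a domain is locked, later lookups for that domain can skip discovery and try the known format first.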

Available for new work

Have a backend, AI, or data problem worth solving?

From production APIs to self-hosted AI that kills per-call costs: let's scope it. I reply within one business day.