Lead Enrichment for Near-Free: A Pipeline
I went looking for a practical way to add context to leads, and quickly ran into tools built for teams at a very different scale. Clay is the obvious example. It’s powerful. I’m not taking shots at it. If you’re running big outbound systems, pulling from a bunch of providers, and building a serious production workflow, I get why people use it.
But that wasn’t my situation. I didn’t need a full enrichment machine. I needed a way to solve a smaller problem without pretending I had enterprise problems. We were talking about 1 to 5 leads a month. Not enough to justify a heavyweight platform.
So I built a simpler pipeline. It pulls the basics I cared about: whether the email is personal or business, the company website, industry, size, location, legal entity, and the most likely LinkedIn profile. It runs on Pipedream, uses Serper.dev for search, GPT-4o-mini for extraction, and at small volume it costs basically nothing.
I like tools. I also like not paying for tools I don’t need.
Why Clay wasn’t the right fit for this use case
Clay makes sense when you need a lot of moving parts. High volume. Multiple data sources. Waterfalls. Scoring. Routing. All the stuff that starts to matter when your pipeline gets complicated and the orchestration becomes the work.
But if you’re a founder, a small team, or trying to build a useful system under real constraints, the math changes.
For 1 to 5 leads per month, I didn’t need a giant stack. I needed something practical. If I could reliably figure out:
- whether the email was personal or business
- the company website
- the industry
- the company size
- the location
- the legal entity
- the likely LinkedIn profile
that was enough to be useful.
The cost difference is what pushed me over the edge. Serper.dev gives you 2,500 free queries up front, then costs about $0.001 per query after that. GPT-4o-mini, for this kind of extraction, works out to about $0.00006 per lead. Pipedream’s free tier covers a lot. At roughly 200 leads a month, this thing is effectively free. Even after that, it stays cheap enough that I don’t have to squint at a pricing page and pretend it’s fine.
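As a sanity check, the math works out like this. A back-of-the-envelope sketch, assuming two Serper queries per lead (one for the company, one for LinkedIn) and the per-unit prices quoted above:

```python
# Rough cost model for the pipeline. Assumes two Serper queries per lead
# (company search + LinkedIn search) and one GPT-4o-mini extraction at
# roughly $0.00006 per lead, per the estimates in the text.
SERPER_COST_PER_QUERY = 0.001
GPT_COST_PER_LEAD = 0.00006
QUERIES_PER_LEAD = 2

def monthly_cost(leads: int, free_queries: int = 0) -> float:
    """Estimated monthly spend, counting only queries past the free tier."""
    paid_queries = max(0, leads * QUERIES_PER_LEAD - free_queries)
    return paid_queries * SERPER_COST_PER_QUERY + leads * GPT_COST_PER_LEAD
```

At 200 leads a month inside the 2,500-query free tier, the Serper side is $0 and the model side is about a penny. Even with zero free queries left, 200 leads runs around forty cents.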
That was the whole decision. I didn’t need more features. I needed a pipeline that did the core job, didn’t break all the time, and didn’t ask me to light money on fire for the privilege.
How the pipeline works
The flow is simple.
A user enters a company name in the app. The app sends a webhook to Pipedream with the contact name, email, and company. From there, the workflow does the enrichment and sends the result back into the system.
First, I validate the Bearer token in the Pipedream HTTP trigger. Nothing fancy. Enough to keep random junk out.
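The check itself is a few lines. A sketch of what that step can look like in a Python code step; the `WEBHOOK_TOKEN` environment variable name is my own, not anything Pipedream prescribes:

```python
import hmac
import os

def authorized(headers: dict) -> bool:
    """Constant-time comparison of the incoming Bearer token
    against a shared secret stored in an env var."""
    expected = os.environ.get("WEBHOOK_TOKEN", "")
    supplied = headers.get("authorization", "").removeprefix("Bearer ")
    return bool(expected) and hmac.compare_digest(
        supplied.encode(), expected.encode()
    )
```

`hmac.compare_digest` instead of `==` avoids timing side channels. Overkill at this scale, but it costs nothing.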
Then I check the email type. If it’s a personal email like Gmail or Yahoo, I treat it differently, because the domain won’t help much. If it’s a business email, the domain gives me a useful fallback if search gets messy.
After that, I run a Serper.dev search on the company name. That usually gets me the website, and if I’m lucky, I also get a knowledge graph result with structured data like description, headquarters, founders, or employee count.
Then I scrape the company website. If search doesn’t give me a clean answer, I use the email domain as a fallback and check that too. I send the content into GPT-4o-mini and ask it to return structured JSON with:
- industry
- size
- location
- description
- legal entity
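A sketch of that extraction call against the OpenAI chat completions endpoint, using JSON mode so the reply parses cleanly. The prompt wording and the 8,000-character cap are my choices, not requirements:

```python
import json
import os
import urllib.request

FIELDS = ("industry", "size", "location", "description", "legal_entity")

def extraction_payload(page_text: str) -> dict:
    """Build a gpt-4o-mini request that must answer in JSON."""
    prompt = (
        "From the company website text below, return a JSON object with exactly "
        f"these keys: {', '.join(FIELDS)}. Use null for anything you cannot find.\n\n"
        + page_text[:8000]  # keep the context small and cheap
    )
    return {
        "model": "gpt-4o-mini",
        "response_format": {"type": "json_object"},
        "messages": [{"role": "user", "content": prompt}],
    }

def extract_fields(page_text: str) -> dict:
    """Send the scraped page to the model and parse the JSON it returns."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(extraction_payload(page_text)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    return json.loads(body["choices"][0]["message"]["content"])
```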
Once I have the legal entity, I run a second Serper search for LinkedIn using the company name plus the legal entity. That works better than searching the brand name and hoping for the best.
From there, the workflow updates the contact record in the CRM or app through its API, then posts the result to Slack so the team can see it right away.
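The Slack side is a one-shot POST to an incoming webhook. A sketch, with the message formatting split out from the network call; the field list mirrors what the pipeline saves:

```python
import json
import urllib.request

def slack_message(lead: dict) -> dict:
    """Format the enriched lead as a Slack incoming-webhook payload."""
    lines = [f"Enriched: {lead.get('company', 'unknown company')}"]
    for key in ("industry", "size", "location", "website", "legal_entity"):
        if lead.get(key):
            lines.append(f"- {key}: {lead[key]}")
    return {"text": "\n".join(lines)}

def post_to_slack(webhook_url: str, lead: dict) -> None:
    """Send the formatted message to the team's incoming webhook URL."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(slack_message(lead)).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=10)
```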
On the Pipedream side, it’s three parts:
- HTTP Webhook Trigger
- Code Step
- HTTP Response
That’s part of why I like it. No servers. No cron jobs. No weird deployment dance where I break something simple because I got too clever.
The design decisions that made it reliable
This is the part I didn’t appreciate at first. The broad idea was easy. The small choices were where most of the accuracy came from.
The first one was search.
My first instinct was to help Google by searching for something like “CompanyName company website.” That felt smart for about five minutes. In practice, it made results worse. Search engines started “fixing” stylized company names, which is great if you like being confidently wrong. The best results came from searching the raw company name by itself and turning autocorrect off.
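In Serper terms, the difference is a shorter query and one flag (Serper exposes autocorrect as a request parameter; the company name here is made up):

```python
# The over-helpful version that invited "corrections":
worse_query = {"q": "AcmeCo company website"}

# The version that worked: the raw name, autocorrect off.
better_query = {"q": "AcmeCo", "autocorrect": False}
```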
That one surprised me, because I assumed machines were more helpful than they are.
The second decision was using the email domain as a fallback. For business emails, the domain usually points close to the real company site. Not always, but often enough that it’s worth using. You do have to filter out personal domains first, or else you end up treating gmail.com like a competitive advantage, which it is not.
The third thing was the order of operations around LinkedIn. I tried searching LinkedIn early on, but it worked better after scraping the company site first. Legal entity names show up in footers, terms pages, privacy pages, all the places nobody reads until a workflow forces them to. Once I had that legal name, the LinkedIn search got a lot cleaner. “Acme” is vague. “Acme Holdings LLC” is less vague. Still not perfect, but better.
The last piece was the knowledge graph. It’s easy to ignore because it feels too convenient. But when it’s there, it’s useful free structure. Location, description, employee count, sometimes founders. It can reduce how much scraping you need and speed up the whole pipeline. I learned not to skip free signal when it’s sitting right there.
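Harvesting that free structure looks roughly like this. The `attributes` keys vary by company, so everything is a defensive `.get()` chain over field names you should spot-check against real responses:

```python
def kg_fields(results: dict) -> dict:
    """Pull whatever structure a Serper knowledgeGraph block happens to offer.
    Attribute names ("Headquarters", "Founders", ...) are illustrative."""
    kg = results.get("knowledgeGraph") or {}
    attrs = kg.get("attributes") or {}
    return {
        "description": kg.get("description"),
        "website": kg.get("website"),
        "location": attrs.get("Headquarters"),
        "employees": attrs.get("Number of employees"),
        "founders": attrs.get("Founders"),
    }
```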
None of these decisions are dramatic. That’s the point. Reliability comes from boring choices made in the right order.
What you actually get back
The output is straightforward, which is what I wanted.
For each lead, I can save back a structured set of fields like:
- name
- email type
- company
- industry
- company size
- location
- website
- legal entity
- description
That gives sales or ops enough context to do something useful. A rep can see who they’re dealing with before outreach. A founder can route leads better. A small team can build a cleaner CRM without making someone manually research every inbound lead.
That matters more than the enrichment itself, at least to me. Data for the sake of data is how people end up with very expensive dashboards and very confused teams.
I also post the result to Slack. It sounds minor, but it helps. If the data only lives in the CRM, people forget it exists. If it shows up where the team already pays attention, there’s a better chance it gets used.
Setup, tradeoffs, and when to upgrade
The setup is light.
You need:
- a Serper.dev API key
- an OpenAI API key
- a Pipedream workflow
- an optional Slack webhook
- auth for your CRM or app API
That’s it. No servers to manage. No background workers. No extra production infrastructure. You can build it quickly and ship it without turning it into a side project that eats your week.
That said, this isn’t magic.
There are failure points. Bad webhook URLs. Auth mismatches. Search picking the wrong company because the name is too generic. Sites blocking scraping. JSON coming back malformed from the model. Weak legal entity cleanup leading to a bad LinkedIn result. All the normal stuff. Nothing catastrophic, but enough to remind you that “simple” and “automatic” are cousins, not twins.
And there’s a ceiling here.
If you need multi-provider waterfall enrichment, intent data, lead scoring, complex routing, or enough volume that orchestration starts to matter more than cost, then a bigger platform probably makes sense. That’s where something like Clay starts to earn its keep. At that point you’re not trying to enrich leads. You’re trying to automate a more complex system in production.
But most small teams aren’t there yet. I think a lot of us start there because buying a polished tool feels safer than building a small thing ourselves.
I get that. I do it too. Then I stare at the bill and suddenly become philosophical.
So my take is simple: build the smaller pipeline first. Solve the practical problem in front of you. Use the cheap thing until the cheap thing becomes the bottleneck. Then upgrade.
Not before.