How to Crush Link Prospecting with Content Scraping


Over the last couple of years, link building has become a lot more resource-heavy. Gone are the days when you could survive on a selection of cheap automated tools. Google took a firm stance on link quality: it demanded that links be “natural” and started penalising sites whose links weren’t relevant.

After the 2012 Penguin update, link spam faded out and an era of “natural link building” began.

Link building tactics evolved to attract high quality links with great content, and to acquire article placements on authoritative external publications.

The only problem was that this took a lot of time. Like, a lot: hours of manual data gathering, hunting for emails and reaching out to potential prospects.

I’m going to share a process I use for link prospecting and outreach that will save you hours of time.

Putting a White Hat on a Black Hat Tool — Scrapebox

Once in a blue moon you chance upon an SEO tool that makes you wonder what you did before you discovered it. Scrapebox is that tool.

Scrapebox was originally a blackhat SEO tool designed for large-scale blog commenting and spam. What’s often overlooked is its incredible scraping features, which can be used for link prospecting without spamming a site.

It’s considered the ‘Swiss Army Knife of SEO’ and should be in every marketer’s arsenal. If you don’t have a copy of Scrapebox yet, it’s only $97 for lifetime access — you need to get it right now.

Finding potential link prospects

When you first open Scrapebox, you might be a little confused about where to start. There are many features here, but I will show you the one I’ve found most beneficial.

The ‘custom footprint’.


Click on the ‘custom footprint’ radio button in the top left box. This tells Scrapebox to harvest results for your own search queries rather than a preset platform footprint. Next, enter some search modifiers into the box. For this example, we are going to look for yoga blogs that accept guest authors. To do that, I’ve entered the following queries:

  • "yoga" inurl:tag/guest
  • "yoga" intitle:"write for us"
  • "guest post" intitle:"yoga"
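
If you want to reuse the same footprints for other niches, a few lines of Python can generate them for you. This is only a sketch: the function name `build_footprints` and the second niche, "pilates", are my own illustrative choices, and the output is plain text that you would paste into the footprint box yourself.

```python
def build_footprints(keyword: str) -> list[str]:
    """Return the three guest-post search queries for one niche keyword."""
    return [
        f'"{keyword}" inurl:tag/guest',
        f'"{keyword}" intitle:"write for us"',
        f'"guest post" intitle:"{keyword}"',
    ]


if __name__ == "__main__":
    # "pilates" is a hypothetical second niche, used only for illustration.
    for niche in ["yoga", "pilates"]:
        for query in build_footprints(niche):
            print(query)
```

Each printed line is one query, so swapping in a new keyword list gives you a fresh batch of footprints in seconds.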

Finally, click on ‘start harvesting’ and within 20 seconds you’ll have a ton of possible link prospect opportunities. (I gathered 999 URLs from my queries!).

Scrapebox can scrape search engine result pages (SERPs) at an incredibly fast rate. Our example above was scraping 119 URLs per second! This makes gathering link prospects a hell of a lot quicker than collecting them manually, one by one, from search queries.

Note that if you choose to do some heavy scraping, you’ll need private proxies to make the most of this tool.

Trim the Fat

Next we want to refine our huge list of results and remove unsuitable prospects. Scrapebox will return a lot of duplicate results and the first thing we want to do is ‘remove duplicate urls’.



Then, ‘remove duplicate domains’. This step is much more brutal and will probably trim down your scrape by 40% or more.
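
If you ever need to repeat these two steps outside Scrapebox (on an exported text file, say), the logic is easy to approximate. The sketch below is my own approximation, not Scrapebox’s actual implementation: it treats `www.example.com` and `example.com` as the same domain and keeps the first URL it sees for each one. The helper names are hypothetical.

```python
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def remove_duplicate_urls(urls):
    """Drop exact duplicate URLs while preserving the original order."""
    return list(dict.fromkeys(urls))


def remove_duplicate_domains(urls):
    """Keep only the first URL seen for each domain."""
    seen = set()
    kept = []
    for url in urls:
        domain = domain_of(url)
        # URLs without a scheme have no parsable host and are skipped.
        if domain and domain not in seen:
            seen.add(domain)
            kept.append(url)
    return kept
```

Running `remove_duplicate_urls` first and `remove_duplicate_domains` second mirrors the order of the two steps above.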

Finally, we want to weed out the useless sites such as free blogging platforms. You can do this in Excel or use the function within Scrapebox: under ‘Remove/Filter’, choose ‘remove URLs containing’ and enter:

  • weebly
  • blogspot
  • blogger
  • tumblr
  • squarespace
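
For anyone doing this step in a script rather than in Excel or Scrapebox, here is a rough equivalent of a “remove URLs containing” filter. It is a sketch under my own assumptions (a simple case-insensitive substring match, and the hypothetical name `remove_urls_containing`), not a description of how Scrapebox does it internally.

```python
# Substrings taken from the list above.
BLOCKED = ("weebly", "blogspot", "blogger", "tumblr", "squarespace")


def remove_urls_containing(urls, blocked=BLOCKED):
    """Return the URLs that contain none of the blocked substrings."""
    return [
        url
        for url in urls
        if not any(term in url.lower() for term in blocked)
    ]
```

One caveat with plain substring matching: the term “blogger” also removes any unrelated URL that happens to contain that word, so skim the removed URLs if your list is small.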

Checking Page Authority on Scrapebox

You can bulk check link equity metrics of the URLs using Scrapebox’s ‘Page Authority’ addon.

To use this addon, you have to sign up for a free or paid Mozscape API key.

MozRank is a score from 0–10 that measures a URL’s link popularity.

Note that the free version of Mozscape has a 10-second delay between requests, so the full crawl might take 30–50 minutes depending on how many URLs you’ve harvested.

Let Scrapebox do its thing and after it’s completed, you’ll have the page authority metrics of all your URLs. Fantastic! You can also export the data in .csv format using the ‘export results as’ button.


After exporting the .csv file, you want to sort your data.

  • Filter out URLs with a MozRank of 0; these are likely low-quality blogs
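
If you prefer to filter the export with a script instead of a spreadsheet, something like the following would work. Treat it as a sketch: the column name `MozRank` is an assumption about the exported file’s header, so check your own .csv and adjust it, and the function names are hypothetical.

```python
import csv


def mozrank(row, column="MozRank"):
    """Read the score from one CSV row; blank or invalid values count as 0."""
    try:
        return float(row.get(column) or 0)
    except (TypeError, ValueError):
        return 0.0


def keep_ranked(rows, column="MozRank"):
    """Keep only the rows whose MozRank is above 0."""
    return [row for row in rows if mozrank(row, column) > 0]


def filter_csv(in_path, out_path, column="MozRank"):
    """Copy in_path to out_path without the 0-MozRank rows; return rows kept."""
    with open(in_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames or []
        kept = keep_ranked(reader, column)
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)
```

Calling `filter_csv("export.csv", "qualified.csv")` (both file names are placeholders) writes the surviving rows to a new file and returns how many were kept.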

The URLs remaining meet our minimum quality criteria — we have a total of 562 potential link prospects. Now we just need to trawl through them to find the good ones for outreach.


Sifting through link prospects using ScreamingFrog

After scraping hundreds of links, it’s unlikely that all of them will be relevant, legitimate link prospects. This is where the ScreamingFrog SEO Spider tool comes in.

We want to look through the HTML of the pages we scraped earlier and search for these specific phrases:

  • write for us
  • contribute to our blog

To do that, we’ll set up some custom filters on ScreamingFrog.

[Screenshot: ScreamingFrog custom search filters]
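
To make it clear what a “contains” filter is checking, here is the same idea as a few lines of Python. This is an illustrative sketch, not how ScreamingFrog works internally: it takes a page’s HTML as a string (fetching the page is left out) and reports which of the phrases appear in it, ignoring case and extra whitespace. The name `matching_phrases` is hypothetical.

```python
import html
import re

# The two phrases we are searching for.
PHRASES = ("write for us", "contribute to our blog")


def matching_phrases(page_html, phrases=PHRASES):
    """Return the phrases found in the page source, case-insensitively."""
    # Decode entities such as &amp; and collapse runs of whitespace so that
    # line breaks inside a phrase do not hide a match.
    text = re.sub(r"\s+", " ", html.unescape(page_html)).lower()
    return [phrase for phrase in phrases if phrase in text]
```

A page counts as a prospect when the returned list is not empty. Note that a phrase split across HTML tags (for example, with one word in bold) will not match in this simple version.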

Next, switch ScreamingFrog to ‘List’ mode: navigate to ‘Mode’ -> ‘List’.


Final step: upload the qualified URLs from our Excel sheet earlier. Now sit back and let ScreamingFrog work its magic.

[Screenshot: pasting URLs manually into ScreamingFrog]

After ScreamingFrog finishes processing the URLs, click on the ‘Custom’ tab to the right and you will get a list of custom filtered URLs.

The screenshot below shows 162 URLs that match the custom filter “write for us”. From my initial 900+ URLs, I’ve filtered the list down to 162 relevant link targets, all within a couple of minutes.

[Screenshot: ScreamingFrog custom filter results for guest post pages]

So now we have a nice list of 162 links we can start reaching out to. First, we need their contact details. Our next tool for this is Buzzstream.

Buzzstream is a fantastic content and link building tool for streamlined outreach. There’s really no better tool out there that can build links as efficiently.

The features we’ll be focusing on are:

  • Contact details gathering
  • Outreach through Buzzstream’s email interface

It’s worth noting that you’ll need an account to do this, but you can sign up for a free 14-day trial, which gives you full access.

[Screenshot: importing URLs into Buzzstream]

In Buzzstream, create a new project, click on the ‘add websites’ button and select the ‘import from csv’ option. Now upload the 162 URLs scraped and filtered with ScreamingFrog into Buzzstream. Within 10–15 minutes, Buzzstream will have gathered a list of contact info from your website links. This saves you hours of combing through websites for emails.


In about 30 minutes, I’ve gathered 160 niche-relevant link targets with all their contact info, including social media profiles and emails. Now we just have to use Buzzstream’s email interface for outreach and build some quality links!