How to Easily Scrape Prospect Data with GPT for Sheets (New)

Prospecting has a tendency to be the most arduous aspect of the entire cold email process — so it’s great when AI can take on some of the load.

In this article, I’m going to demonstrate some ways you can use a Google Sheets add-on called GPT for Sheets to have AI find and scrape prospects’ emails, websites, and other key information.

And the add-on can do all that inside Google Sheets, making it extra convenient to use your results for cold outreach campaigns with GMass.

Note: If you want a guide to installing GPT for Sheets or figuring out its pricing structure, check out our complete guide to GPT for Sheets — then come on back to this article.

How to Use GPT for Sheets for Prospecting

Here are the results of (and instructions for) different prospect hunting experiments I ran with GPT for Sheets.

You should be able to replicate these all pretty easily in your own Google Sheets.

Scraping a website’s “contact us” page, then finding the email address

If you have a list of websites’ contact pages and you want to find an email address on each one for your campaign, GPT for Sheets can handle that legwork.

In this example, I am using two GPT for Sheets formulas to find email addresses for 10 U.S.-based sock manufacturers.

The first is GPT_SCRAPE, which grabs the entire contents of a web page and copies it as text into a Google Sheets cell.

The second is GPT_EXTRACT, which sifts through a Google Sheets cell looking for a specific type of data (in this case, an email address).

Important: In order to use the GPT_SCRAPE function, you need to first follow GPT for Work’s instructions on installing a Google Apps Script. It only takes a couple of minutes.

Once you follow the instructions for getting GPT_SCRAPE working, it adds a new Scrape menu to Google Sheets.

Adding the scraping feature to Google Sheets

I used that to scrape the contents of the URLs in Column A into Column B.

Scraped website results

Then I used GPT_EXTRACT to find an email address in each website’s scraped results.

=GPT_EXTRACT(B3, "email address")

The function did a good job with this task, finding the correct email address on the page in every case. (One page did not have an email address, so GPT for Sheets just left the field blank. I was glad to see it didn’t hallucinate, as AI still just loves to do.)

But, as always with AI, the results are a little bit quirky. In a few cases, it doubled up the email addresses.

And there’s no built-in setting to tell GPT for Sheets just to grab a single result. So it would take another formula to make the email column ready for a mail merge campaign.

Finding email addresses inside scraped data
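If you'd rather clean up those doubled-up results outside of Sheets (say, after exporting the column to a CSV), here's a minimal Python sketch of one way to do it. The function names and the email pattern are my own assumptions, not part of GPT for Sheets, and the pattern is deliberately simple: it assumes the addresses in a cell are separated by spaces, commas, or line breaks.

```python
import re

# Simple illustrative pattern (an assumption); it will not match every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def unique_emails(cell_text):
    """Return the distinct email addresses found in a cell, in first-seen order."""
    seen = set()
    result = []
    for match in EMAIL_RE.findall(cell_text or ""):
        key = match.lower()  # treat Info@x.com and info@x.com as the same address
        if key not in seen:
            seen.add(key)
            result.append(match)
    return result


def first_email(cell_text):
    """Return one address for mail merge, or an empty string if none was found."""
    emails = unique_emails(cell_text)
    return emails[0] if emails else ""
```

Calling first_email on each extracted cell gives you one address per row, and a blank where the extraction came back empty.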

Good news: This cost me less than 1 cent of my GPT for Sheets credit. GPT_SCRAPE does not use your paid GPT for Sheets tokens, and the email extraction didn’t even knock a full penny off my total.

Hunting down contact info just by providing a company name

The above example worked great. However, I had to manually research all those companies’ contact page URLs.

So I decided to try something that would require less manual work on my end.

In this case, I’m providing only the name of a company. I’ll use GPT for Sheets’ web research function, GPT_WEB, to find what I need from there.

I started by putting each company’s name into column A.

Then in column B, I built a prompt by concatenating text with the company name.

="Look up the Contact page for the sock company "&A2& " and find a contact email address on that page. Give me the URL of the Contact page as well as the contact email address"

I structured my prompt not just to give me an email but also the URL. That way I could spot check the AI’s results — and also, in circumstances where it didn’t find a contact email, I could manually check the website myself. (That’s still manual work, but if GPT for Sheets takes care of most of my contacts that’s a lot less of it.)

In column C, I used this simple formula to run the prompt.

=GPT_WEB(B2)

Here are the results.

Using a single function to find a contact page and a contact email

I can’t complain too much. It found an accurate email for 8 of the 10 companies. It couldn’t find the contact page or email for one of them (row 10), and it hallucinated an address for the one site that didn’t have an email on its contact page (row 6).

But 80% accuracy is about the trade-off I’d be willing to take for this level of prospecting automation. (It’s also, realistically, about the accuracy rate I’ve seen when I’ve hired virtual assistants and others to do prospecting for me manually.)

Now from here I’ll use another GPT for Sheets formula to bring the URLs into one column and the email addresses into another column.

After a few unsuccessful attempts to get GPT for Sheets to split out the URL and email address using just one function, I wound up using two separate GPT_EXTRACT formulas.

In the Website column: =GPT_EXTRACT(C2, "url")

And in the Email column: =GPT_EXTRACT(C2, "email address")

Splitting out the urls and emails

The results in both columns are perfect. However… they’re perfect based on the data I provided. And, as we saw earlier, the data was 80% correct. So this is a “garbage in, garbage out” situation: The better the prospecting data you have, the better the extractions will be.

Overall, I’m really happy with the results I got from this approach. For not having to do any manual research other than knowing some companies’ names, I’ll take it.

Of course, you may be after a higher level of accuracy. I only had 10 contacts; with a larger contact list, this method could require a lot of manual spot checking and/or filling in the gaps.
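One way to cut down on that spot checking is to flag only the rows that look suspicious. Here's a rough Python sketch, assuming you've exported the Website and Email columns, that flags a row when the email's domain doesn't line up with the website's domain. The function names are hypothetical and this is only a heuristic: a mismatch isn't always wrong (some small companies use a Gmail address), and a match doesn't prove the address is real.

```python
from urllib.parse import urlparse


def host_of(url):
    """Return the lowercased hostname of a URL, without a leading 'www.'."""
    if "://" not in url:
        url = "https://" + url  # urlparse only finds the host when a scheme is present
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def needs_review(url, email):
    """Return True when a row should be checked by hand."""
    if not url or not email or "@" not in email:
        return True  # a missing URL or email always needs a look
    email_domain = email.rsplit("@", 1)[1].strip().lower()
    site = host_of(url.strip())
    if not site or not email_domain:
        return True
    # Accept an exact match, or one domain being a subdomain of the other.
    related = (
        email_domain == site
        or email_domain.endswith("." + site)
        or site.endswith("." + email_domain)
    )
    return not related
```

Rows where needs_review returns True are the ones I'd open in a browser first.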

And one other downside: GPT_WEB is expensive. It’s not billed with GPT for Sheets’ usual token system. So you could find yourself burning through your GPT for Sheets budget really fast if you used this technique on a large data set.

Then again, based on the time it’s saving and the pretty strong accuracy, it just might be worth it.

Using GPT_VISION to search images and documents

For my next trick…

Let’s throw something more complicated than a website at GPT for Sheets.

What if you have a list of prospects in a file, like a PDF or scanned document?

For those cases: There’s a function in GPT for Sheets called GPT_VISION that can analyze an image. In the examples on their website, they use it to write product descriptions, but I wanted to see if it could extract data.

GPT_VISION needs a prompt and the URL of an image, in the form =GPT_VISION(prompt, url).

Note: If you have a PDF, you’ll need to convert it to an image. GPT_VISION only works with png, gif, jpg, and webp file types.

Here is the first image I threw at it:

An image with email addresses on it

And I used this prompt:

=GPT_VISION("Create a comma-separated list of all emails in this image", "https://blogcdn.gmass.co/blog/wp-content/uploads/2024/06/emailsampleimage1.png")

Here’s the result:

The results of searching within an image

It successfully extracted all 10 emails from the image. And now I can use another formula to get them each into their own rows. =TRANSPOSE(SPLIT(A2, ",")) would get that done.
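If you're handling the output outside of Sheets, the same split-into-rows step is easy to sketch in Python. This is just an illustration with a function name I made up. It also trims stray spaces around each address and skips empty pieces, in case the list comes back with a trailing comma.

```python
def split_emails(cell_text):
    """Split a comma-separated string into a list of trimmed, non-empty entries."""
    return [part.strip() for part in cell_text.split(",") if part.strip()]
```

Each item in the returned list corresponds to one row in the transposed Sheets result.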

What about data that’s less structured? I generated some AI pink slime and sprinkled the 10 email addresses throughout. Here’s how that image looks:

Burying email addresses inside text

I used the same prompt as before, and once again GPT_VISION successfully found all 10 email addresses.

Finally I tried the toughest test of all — a handwritten list.

Handwritten email addresses

And GPT_VISION got that perfect, though I wonder if it would struggle with sloppier handwriting. (Yes, that’s a backdoor brag about my not-terrible penmanship in that image.)

I worried GPT_VISION might burn through my GPT for Sheets budget, since it uses gpt-4o, but all of my tests above cost only around one cent.

Scraping LinkedIn profiles to find prospect information

Now that we’ve done some email and URL hunting on company websites, let’s scrape info about individual prospects.

For this example, I had GPT for Sheets try to research the following things about Ajay, the CEO of GMass (using these various prompts).

  • Who is the CEO? =GPT_WEB("Find me the name of the CEO of the email company GMass")
  • What is the CEO’s LinkedIn page? =GPT_WEB("Find me the name of the CEO of the email company GMass")
  • Where can I find a public interview or profile on the CEO? =GPT_WEB("Find the URL of a public profile or interview transcript with "&A2)
  • Where is the CEO located? =GPT_WEB("Find the URL of a public profile or interview transcript with "&A2)
  • Where did the CEO attend college? =GPT_WEB("Find the college where "&A2& " attended")
  • What is their email? =GPT_WEB("Find an email address for "&A2)

Note: Scraping LinkedIn pages with the GPT_SCRAPE function does not work, so I couldn’t just scrape the entire LinkedIn profile and then search within it.

Here are the results:

How GPT for Sheets does on hunting down prospect info

In this test it got 4 things right:

  • Identifying the CEO
  • Finding the LinkedIn URL
  • Finding a public profile/interview
  • Finding the location

And got 2 things wrong:

  • College attended
  • Email address

I’m torn here.

On one hand, GPT for Sheets got a decent amount of the prospecting info correct and would save me time.

But on the other hand, it was inaccurate enough that I don’t know if I’d trust these results at scale.

Your mileage will vary; it depends on how many prospects you’re researching, what type of information you need, and how damaging it would be to get someone’s info wrong in a message to them.

Using Your Prospected Data in a GMass Campaign

Once you’ve found and/or scraped your prospecting info, it’s time to use it in a campaign.

GMass has a native integration with Google Sheets so you can go straight from this prospecting work into your campaign without any extra steps.

To begin the campaign process, I connected this Google Sheet to a new GMass campaign. Then I made sure to select the right sheet within my Google Sheets that had the contact info for my prospects.

Connecting to the Google Sheet of prospects

GMass puts all of the email addresses I found into a campaign. You can also mail merge in anything else you found during the prospecting process.

GMass takes the prospect data

Ready to try GMass yourself? You can get started by downloading the Chrome extension and you’ll be up and running in minutes — no credit card required.

Come see why there are 300,000+ happy GMass users who rate it an average of 4.8 out of 5 stars. Cold outreach, email marketing, mail merge, message tracking, and more — all inside of Gmail.
