Prying Open the Black Box: Why Bidstream Data is Not a Sustainable Source of Intent

“Billions of signals…” Does this phrase sound familiar to you when it comes to intent providers?

There’s been a proliferation of solutions offering intent data in recent years, most of which tell some variant of the same story: they can instantly monitor hundreds of thousands of websites to determine where to target your sales or marketing outreach. While this black box approach to intent data has, for the most part, sidestepped the market’s broader concerns about data leakage and data privacy, recent developments call into question whether the web scraping methodologies used are sustainable going forward, and whether they’re delivering on their value proposition today.

How exactly does a company “monitor” websites that it doesn’t own or have any relationship with? It’s simple enough to crawl websites, and the smart use of AI can do some of the work of classifying web pages (although the quality issues that come with AI classifications are a topic for another blog post). But crawling a web page doesn’t tell you who visited it. There are limited ways to match relevant content with a site visitor – none of them gets to the level of the individual user, and they’re all under siege.

The problem with bidstream data sourcing

One common way to do this is through the use of bidstream data, which is derived from programmatic advertising exchanges. In real-time bidding, a programmatic impression is offered to multiple potential advertisers at once, and the offer exposes certain information about the page and the site visitor not just to the winning bidder, but to any party that bids on the inventory. Companies will set up ad bidding to “see the inventory” with no intention of buying the ad, simply to harvest audience information about the sites being bid upon from the bidstream data.
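To make the exposure concrete, here is a minimal sketch in Python of what a bidder could read from a single bid request. It assumes a simplified request whose field names follow the OpenRTB 2.x convention (site.page, device.ip, user.id); the sample request, its values, and the harvestable_fields helper are all invented for illustration and are not taken from any specific exchange or vendor.

```python
import json

# Hypothetical, simplified bid request. Field names follow the OpenRTB 2.x
# convention; every value here is invented for illustration.
SAMPLE_BID_REQUEST = json.loads("""
{
  "id": "req-001",
  "imp": [{"id": "1", "banner": {"w": 300, "h": 250}}],
  "site": {
    "domain": "publisher.example",
    "page": "https://publisher.example/articles/storage-buyers-guide"
  },
  "device": {"ip": "203.0.113.7", "ua": "Mozilla/5.0"},
  "user": {"id": "exchange-user-abc"}
}
""")


def harvestable_fields(bid_request: dict) -> dict:
    """Return the page and visitor details readable by any party that
    receives the request, whether or not it ever wins an impression."""
    site = bid_request.get("site", {})
    device = bid_request.get("device", {})
    user = bid_request.get("user", {})
    return {
        "page_url": site.get("page"),
        "publisher_domain": site.get("domain"),
        "visitor_ip": device.get("ip"),
        "exchange_user_id": user.get("id"),
    }


seen = harvestable_fields(SAMPLE_BID_REQUEST)
print(seen)
```

Nothing in this sketch requires placing a winning bid: receiving the request is enough to record which page was viewed and from which IP address.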

While this approach doesn’t rely on cookies, it raises troublesome questions about unauthorized data use that are drawing attention just as the third-party cookie is being phased out. This summer, a group of U.S. House and Senate lawmakers called upon the FTC to investigate whether the use of bidstream data to capture and sell consumers’ personal information violates federal laws barring deceptive business practices.

In a separate development earlier this summer, the Business Publishers’ Association published an open letter from some of the B2B industry’s largest information providers calling upon the digital ad ecosystem to put an end to data leakage in the programmatic bidstream. TechTarget gladly signed onto this initiative because it reflected a reality we’ve long been aware of – publishers who allow their pages to be bid upon programmatically are unwittingly handing off valuable audience data to companies that are using it to create derivative products that compete against them. We’ve chosen not to participate in these programmatic exchanges to protect our audience data – and other key tech information providers are now doing the same, further reducing the actual realizable value of the dataset. You can check out the BPA’s very informative overview of the issue here.

Other approaches used to harvest audiences you should pay attention to

Bidstream data isn’t the only approach being used to harvest audiences. Web tools that enable content sharing, analytics, or other site features often capture user visit data and build datasets that black box intent providers can license as another source of URLs and account-level reverse IP mapping. Undercover audience harvesting by web feature widgets isn’t discussed often, but it’s the next front in the privacy discussion. The same information providers that don’t want their audience data compromised by bidstream data leakage are also starting to pay attention to this issue – quality data sources don’t give away their data assets for free.
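As a rough illustration of account-level reverse IP mapping, the Python sketch below groups visited URLs by the company whose network range contains the visitor’s IP address. The ACCOUNT_RANGES table, the company names, and both helper functions are hypothetical; real providers presumably rely on much larger commercial IP-to-company datasets, and the addresses used here come from ranges reserved for documentation.

```python
import ipaddress
from typing import Optional

# Hypothetical lookup table mapping network ranges to company accounts.
# The ranges are reserved documentation blocks, not real corporate networks.
ACCOUNT_RANGES = {
    "198.51.100.0/24": "Example Corp",
    "203.0.113.0/24": "Sample Industries",
}


def account_for_ip(ip: str) -> Optional[str]:
    """Return the account whose range contains this IP, if any."""
    addr = ipaddress.ip_address(ip)
    for cidr, account in ACCOUNT_RANGES.items():
        if addr in ipaddress.ip_network(cidr):
            return account
    return None


def account_level_signals(visits: list) -> dict:
    """Group visited URLs by account; visits with unmapped IPs are dropped."""
    signals = {}
    for ip, url in visits:
        account = account_for_ip(ip)
        if account is not None:
            signals.setdefault(account, []).append(url)
    return signals


visits = [
    ("198.51.100.23", "https://publisher.example/articles/backup-software"),
    ("192.0.2.5", "https://publisher.example/articles/backup-software"),
]
signals = account_level_signals(visits)
print(signals)
```

The mapping stops at the account: the output says that someone on a company’s network viewed a page, not which person did, which is consistent with the point above that none of these methods reaches the individual user.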

All intent is not created equal – make sure you know your sources

TechTarget’s Priority Engine purchase intent insight platform does not rely on data scraped from across the web. It draws on the real, observed activity of opted-in buyers across our owned and operated network of 140+ sites. But all intent is not created equal – many vendors will not be as transparent with you about where they source the intent data they provide, and they will not be able to clearly articulate the value purportedly embedded in it.

Recent developments make it clear that the black box approach to intent data is getting pried open. Make sure you are asking your intent vendors the right questions and assessing the quality of the signals and data you receive.