Key Takeaway: Ad spy tools (AdPlexity, Anstrex, Spy.House, BigSpy, and their peers) build their databases mostly by registering as publishers on third-party push and native ad networks, then capturing every bid response their own JavaScript receives. The networks structurally tolerate this because they earn revenue twice from the same activity: once from the original advertiser, again from the copycats. The durable defenses aren’t concealment (impossible against a publisher paid to deliver the creative) but reducing the value of what leaks: brand distinctiveness that makes copies visibly inferior, token-bound click URLs, server-side lander personalization, bot detection at the funnel edge, and mirroring spy operators’ own view of your campaigns through real carrier-ASN mobile-proxy audits.
Affiliate ad-intelligence platforms (AdPlexity, Anstrex, AdSpy, BigSpy, Spy.House, Mobidea, and a long tail of smaller competitors) sell access to databases of millions of ad creatives, searchable by GEO, vertical, advertiser, and date. Most public writing about them stops at “they collect ads from ad networks” and moves on to product reviews. The actual collection mechanisms are technically interesting, the defenses against them are non-obvious, and almost nothing serious gets written about either.
The five collection mechanisms
Five fundamentally different mechanisms feed a modern ad-spy database, each with its own engineering profile and its own defense story:
- Publisher infiltration: registering as a publisher on third-party push/native ad networks to capture every creative they deliver. The subject of this article.
- Push subscription harvesting: exploiting the Web Push protocol to capture encrypted notification payloads through a service worker the spy operator controls.
- Public ad libraries and social ad scraping: Meta Ad Library, TikTok Creative Center, Google Ads Transparency Center, plus account-farm feed scrolling for what the libraries don’t expose.
- In-app SDK MITM: rooted Android device farms intercepting ad-SDK traffic from AppLovin MAX, Google AdMob, ironSource and their peers.
- Landing page and funnel crawling: following creative click destinations through tracker, cloaker, and prelander chains to identify the offer, the affiliate network, the tracker software, and the full reverse-engineered funnel.
The mental model
The dominant collection mechanism for push, native, inpage, and popunder spy databases is the simplest one structurally: the spy operator becomes a publisher.
Picture a small three-floor house. One wall faces a busy highway. Another wall faces partially toward a quieter crossroad. The owner decides how to slice those walls up for ads: ten panels packed across the highway-facing wall (lots of revenue, ugly house, fewer return visitors), or one small tasteful banner on the side wall (less revenue, the house still works as a house). That slicing decision (what space is offered and how it’s carved up) is what ad-tech calls inventory.
The owner offers a particular layout of available space. An ad network signs contracts on both sides: with the owner to fill the slots, and with brands to put ads in those slots. The network handles the matching and takes a cut.
Cross-mapped to ad-tech:
| House on the highway | Ad-tech |
|---|---|
| Billboard rental company | Ad network: PropellerAds, RichAds, MGID, Taboola, Adsterra, AdMaven, EvaDav, ClickAdilla, ExoClick, HilltopAds, Kadam, PushHouse, ZeroPark, Mondiad, RollerAds, Izooto, DatsPush, and another dozen |
| Brand renting wall space | Advertiser running campaigns |
| Your walls along the highway | Publisher’s shell-site ad slots, or push subscriber list |
| How you slice each wall into banner panels | Inventory: the configured set of ad placements you offer up |
| Cars driving by | Visitors arriving on the publisher’s page |
| Pasting an ad onto a panel | The network’s JavaScript rendering a creative when a visitor arrives |
| Billboard company paying you for the wall | Publisher revenue from the network, per impression or per click |
It is your wall. You sell it. You get paid for it. You see exactly which billboard goes up on it, and when. Every part of what a spy database sells as “competitive intelligence” follows from that one fact.
Two more pieces of jargon used below, glossed once and then dropped: a vertical is the niche an advertiser operates in (finance, insurance, e-commerce); a DSP is the buyer-side platform an advertiser uses to actually purchase inventory in real-time auctions. If any of this is new ground, the IAB Tech Lab is the canonical source for the protocols and Wikipedia on real-time bidding covers the basics in 10 minutes.
The three steps
- Register as a publisher. Standard onboarding form on every push/native ad network. Hand over a domain, accept terms, embed the network’s JavaScript tag or configure a push CNAME subdomain.
- Send visitors at the inventory. Cheap popunder/redirect traffic at roughly $0.50 per 1000, or self-generated visitors from a residential or mobile-proxy farm running real browsers against the shell page. Every visitor triggers an ad call.
- Capture the bid response. Network sends creative metadata as plain JSON. The publisher’s JS (code the operator wrote) reads the response, posts it to the operator’s collection endpoint, then renders the ad normally so nothing looks off.
Published scale claims are realistic given this model: Spy.House publishes intake claims of 12 million creatives per day across 185+ countries, with AdPlexity and Anstrex publishing overlapping coverage lists across native, push, and mobile formats. One operation at 30 networks, 50 shell sites, 95 GEOs lands in that range without effort.
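As a rough sanity check on that last sentence, the arithmetic can be written out. This is only a back-of-envelope sketch: it assumes every shell site is registered on every network, that each ad call yields one indexed creative, and that all traffic is bought at the roughly $0.50 per 1000 rate quoted above. None of those assumptions comes from a spy operator's own disclosures.

```javascript
// Back-of-envelope check using only the figures quoted in this section.
// Assumptions (not from any operator): full network x site x GEO coverage,
// one captured creative per ad call, all traffic bought at ~$0.50 per 1000.
const networks = 30;
const shellSites = 50;
const geos = 95;
const claimedPerDay = 12_000_000;

// Distinct network x shell-site x GEO combinations the operation covers.
const slots = networks * shellSites * geos;

// Ad calls each combination has to produce to reach the claimed intake.
const perSlotPerDay = claimedPerDay / slots;
const perSlotPerHour = perSlotPerDay / 24;

// Daily traffic bill if every ad call is a purchased visit.
const costPerDayUsd = (claimedPerDay / 1000) * 0.5;

console.log({
  slots,
  perSlotPerDay: perSlotPerDay.toFixed(1),
  perSlotPerHour: perSlotPerHour.toFixed(1),
  costPerDayUsd,
});
```

Under those assumptions the 142,500 combinations each need about 84 ad calls per day, or three to four per hour, and the traffic bill is about $6,000 per day. That is a modest load per slot, which is the sense in which the published claim is plausible.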
What the wire actually carries
The technical core is one HTTP request and one HTTP response per impression, both fully visible to the publisher because the publisher’s own JavaScript is what makes the request.
A typical publisher tag:
<script async
        src="//cdn.adnetwork.example/sdk.js"
        data-zone-id="1234567"></script>
On page load the SDK assembles an ad request. As curl, what flies over the wire looks like:
curl -G "https://srv.adnetwork.example/req" \
  --data-urlencode "zone=1234567" \
  --data-urlencode "geo=DE" \
  --data-urlencode "device=android" \
  --data-urlencode "browser=chrome" \
  --data-urlencode "ua=Mozilla/5.0 (Linux; Android 14; SM-S921B) AppleWebKit/537.36" \
  --data-urlencode "referer=https://shell-site.example" \
  --data-urlencode "screen=412x915" \
  --data-urlencode "lang=de" \
  --data-urlencode "ts=1737542890"
The network runs its selection (a real-time-bidding auction across DSPs for the bigger exchanges, where the standard protocol is IAB OpenRTB, or direct campaign matching for the push/native specialists) and returns the winning creative as JSON:
{
  "zone_id": 1234567,
  "campaign_id": 287465,
  "advertiser_id": 5821,
  "creative": {
    "title": "Stop overpaying for car insurance",
    "body": "Drivers in DE are switching — check rates now",
    "icon": "https://cdn.adnetwork.example/c/874513_icon.jpg",
    "image": "https://cdn.adnetwork.example/c/874513_main.jpg",
    "click_url": "https://trk.adnetwork.example/c?cid=287465&zid=1234567&sid=abc123&v=1.42"
  },
  "pricing": {
    "model": "cpc",
    "bid": 1.42,
    "currency": "USD"
  },
  "targeting_matched": {
    "geo": "DE",
    "carrier": "vodafone-de",
    "device_type": "mobile",
    "os": "android"
  }
}
Title, body, image URLs, click destination, campaign ID, advertiser ID, bid price, and the targeting context that triggered the match: the entire ad delivery in one structured object. The publisher’s SDK then renders the ad.
Interception is mechanical. The publisher controls page-level JavaScript that runs before the SDK; a fetch wrapper or XMLHttpRequest shim sees everything:
// loaded before the network's SDK
(function () {
  const origFetch = window.fetch;
  window.fetch = async function (input, init) {
    const res = await origFetch(input, init);
    // input can be a string, a URL object, or a Request; only Request has .url
    const url = input instanceof Request ? input.url : String(input);
    if (url.includes('adnetwork.example')) {
      // tee the response to the collector before returning to the SDK
      res.clone().json()
        .then(payload => navigator.sendBeacon(
          'https://collector.spy.example/p',
          JSON.stringify({ origin: location.hostname, ts: Date.now(), payload })
        ))
        .catch(() => {}); // non-JSON bodies are ignored so the page never breaks
    }
    return res;
  };
})();
The SDK sees an identical response and renders the ad normally. The collector receives the full record. The visitor sees a working ad. No request blocked, no response altered, no failure metric ticks up anywhere.
Pricing models and what each one leaks
Different networks use different pricing models, and the model determines which fields actually carry useful signal in the bid response. The vocabulary:
- CPM (cost per mille): advertiser pays per 1000 impressions. Default for push, popunder, most native. PropellerAds, RichAds, Adsterra. Bid expressed as dollars per 1000.
- CPC (cost per click): advertiser pays per click. Common on MGID, Taboola, Outbrain native. Bid per click.
- CPA (cost per action): advertiser pays on conversion (signup, deposit, install). Used by affiliate networks (AdCombo, MaxBounty, ClickDealer), rarely by ad networks directly. The conversion happens server-side at the advertiser, so CPA payouts are not visible in the bid response.
- CPI (cost per install): mobile-app CPA where the action is an install attributed by an MMP (AppsFlyer, Adjust, Branch).
- CPV (cost per view): video, pays per view completion.
The model that leaks the most intelligence is CPC. The bid amount is a real-time signal of what advertisers are willing to pay per click in that GEO and vertical, surfaced fresh on every impression. Three months of CPC bid history per advertiser per GEO is a competitive-intelligence dataset no advertiser ever consented to share, yet it accumulates as a side effect of the publisher’s ordinary view of the bid response.
Concretely, the spy operator’s internal query API ends up exposing endpoints like:
curl -G "https://api.spy-internal.example/v1/bids" \
  -H "Authorization: Bearer $TOKEN" \
  --data-urlencode "advertiser_id=5821" \
  --data-urlencode "geo=DE" \
  --data-urlencode "carrier=vodafone-de" \
  --data-urlencode "model=cpc" \
  --data-urlencode "from=2026-04-19" \
  --data-urlencode "to=2026-05-19"
returning a time series:
[
{"ts":"2026-04-20T08:14Z","creative_id":874513,"campaign_id":287465,"bid":0.78},
{"ts":"2026-04-21T08:14Z","creative_id":874513,"campaign_id":287465,"bid":0.84},
{"ts":"2026-04-22T08:14Z","creative_id":874513,"campaign_id":287465,"bid":0.91},
{"ts":"2026-04-23T08:14Z","creative_id":891204,"campaign_id":289103,"bid":1.12},
{"ts":"2026-04-24T08:14Z","creative_id":891204,"campaign_id":289103,"bid":1.18}
]
The commercial spy-tool frontend is a search box over exactly this shape. The customer sees filter dropdowns and bid-trajectory charts; underneath runs the same time-series query against the same normalized bid table, populated impression by impression by the JS interception block above.
CPM is less leaky per impression but useful in aggregate. A single CPM bid expresses willingness-to-pay per thousand impressions — fractions of a cent each — and on its own says little about how the advertiser values a click or a conversion. It is a function of budget × audience match × auction pressure for a unit of attention, not a direct read on the advertiser’s economic model.
What matters is the distribution. Across thousands of logged impressions for the same campaign, the CPM time series reveals budget shape (rising bids = pressure to scale, falling bids = budget exhaustion or dayparting), GEO price levels (DE inventory might clear at $3 CPM while VN clears at $0.30 for the same vertical, surfacing real market-level pricing), audience priorities (which targeting buckets attract the highest bids), and campaign lifespan. None of this is visible in any single impression; all of it emerges from a few weeks of accumulated history.
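To make "budget shape" concrete, here is a minimal sketch of the kind of aggregate an analyst might compute over a logged bid series: a least-squares slope in currency units per day. The function names, the 0.01 flatness cutoff, and the choice of a linear fit are all illustrative assumptions, not a description of any spy tool's internals. The sample rows reuse the first three entries of the time series shown earlier, with seconds added to the timestamps.

```javascript
// Least-squares slope of bid against time, in currency units per day.
// Works the same on CPM or CPC series; the unit of `bid` is whatever was logged.
function bidSlopePerDay(series) {
  const DAY_MS = 24 * 3600 * 1000;
  const pts = series.map(p => ({ x: Date.parse(p.ts) / DAY_MS, y: p.bid }));
  const n = pts.length;
  if (n < 2) return 0; // a single impression carries no trend
  const meanX = pts.reduce((sum, p) => sum + p.x, 0) / n;
  const meanY = pts.reduce((sum, p) => sum + p.y, 0) / n;
  let num = 0;
  let den = 0;
  for (const p of pts) {
    num += (p.x - meanX) * (p.y - meanY);
    den += (p.x - meanX) ** 2;
  }
  return den === 0 ? 0 : num / den;
}

// Hypothetical labelling; the 0.01 cutoff is arbitrary and would need tuning.
function trendLabel(slope, eps = 0.01) {
  if (slope > eps) return 'rising';
  if (slope < -eps) return 'falling';
  return 'flat';
}

const sample = [
  { ts: '2026-04-20T08:14:00Z', bid: 0.78 },
  { ts: '2026-04-21T08:14:00Z', bid: 0.84 },
  { ts: '2026-04-22T08:14:00Z', bid: 0.91 },
];
console.log(trendLabel(bidSlopePerDay(sample))); // rising
```

A real pipeline would likely segment by GEO and targeting bucket before fitting anything, since mixing DE and VN impressions in one series would mostly measure the traffic mix rather than the advertiser's behavior.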
CPA does not surface in the bid response at all: the spy operator is not the conversion endpoint. Closing that gap requires following the click destination through tracker, cloaker, and offer wall, which is a different mechanism entirely.
Why is the bid in the response at all? Fair gap-fill question — the spy database is reading a number the network could in principle just not send. Five overlapping reasons:
- OpenRTB requires it. The price field is mandatory in the BidResponse.seatbid.bid schema. Networks that ride on OpenRTB inherit the transparency by design.
- Publisher revshare verification. Push and native networks pay publishers 60–80% of the advertiser bid. Hide the number → lose publisher trust → lose inventory to a competitor who does show it. Opacity is a competitive disadvantage on the supply side.
- Yield optimization on the publisher side. Publishers set CPM floors per zone, A/B test placements, tune creative density. All of that requires per-impression bid data. A publisher who can’t see what their inventory clears at can’t optimize what they sell.
- Client-side rendering. Push, native, popunder, inpage are all rendered by the publisher’s own JavaScript. The bid sits in the same payload as the creative URL and click URL the tag needs to render at all, leaked at no extra cost to the network.
- Click URL often encodes the bid. Many networks bake the bid value into the click-tracking URL for billing reconciliation between impression and click. Even if you stripped bid from the top-level JSON, it would ride along in the click URL parameters.
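The last point is easy to see with the click URL from the sample bid response earlier in the article, where the v parameter happens to equal the 1.42 bid. The sketch below simply parses that URL with the standard URL API. Treating v as the bid is an assumption drawn from that sample; real networks choose their own parameter names and may encode or obfuscate the value.

```javascript
// The click URL from the sample bid response earlier in the article.
const clickUrl =
  'https://trk.adnetwork.example/c?cid=287465&zid=1234567&sid=abc123&v=1.42';

// URL and URLSearchParams are built into browsers and Node.js.
const params = new URL(clickUrl).searchParams;

const leaked = {
  campaignId: params.get('cid'),
  zoneId: params.get('zid'),
  // Assumption: `v` carries the bid, as it does in the illustrative sample.
  bid: Number(params.get('v')),
};

console.log(leaked);
```

Anything that sees the rendered ad's link, including a crawler that never touched the JSON payload, can read the same parameters.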
The structural exception is the walled-garden case detailed below: Google AdSense aggregates publisher earnings to category-level revenue rather than exposing per-impression bids. That opacity is one of the three defenses that make publisher infiltration fail against the walled gardens, and a choice third-party networks could replicate but don’t, because the value of publisher transparency at onboarding and retention beats the cost of long-tail spy indexing.
OpenRTB and the bidstream
Most exchanges and many of the larger networks speak the IAB OpenRTB protocol. Bid request and response follow a public JSON schema (BidRequest, Imp, SeatBid, Bid) with documented fields for advertiser domain (adomain), creative markup (adm), bid price (price), creative ID (crid), and IAB category taxonomy (cat). For a spy operator, the OpenRTB-compliant network is the friendliest target: field names are predictable, the schema is documented, normalization across networks is trivial.
The practical pipeline: one parser per legacy/proprietary network format, one parser shared across all OpenRTB-compliant networks, all output normalized to a single internal schema. The cross-network search UI on the spy tool’s frontend is then a thin layer over that normalized table, which is why every major spy service exposes essentially the same filter set (GEO, device, OS, browser, ad type, date range, advertiser keyword).
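A minimal sketch of that two-parser arrangement follows. The internal record shape and both function names are hypothetical. The proprietary parser targets the sample JSON shown earlier, and the OpenRTB parser reads the standard seatbid, bid, price, adomain, cid, crid, and cur fields. GEO is passed in separately because it belongs to the request context, not to the OpenRTB response.

```javascript
// Hypothetical internal schema: one row per served creative.

// Parser for the illustrative proprietary format shown earlier in the article.
function fromProprietary(network, payload) {
  return {
    network,
    advertiser: String(payload.advertiser_id),
    campaignId: String(payload.campaign_id),
    creativeId: null, // the sample format carries no separate creative ID
    bid: payload.pricing.bid,
    currency: payload.pricing.currency,
    model: payload.pricing.model,
    geo: payload.targeting_matched.geo,
  };
}

// Shared parser for any OpenRTB 2.x BidResponse.
function fromOpenRtb(network, response, geo) {
  const rows = [];
  for (const seat of response.seatbid || []) {
    for (const bid of seat.bid || []) {
      rows.push({
        network,
        advertiser: (bid.adomain || [])[0] || null,
        campaignId: bid.cid || null,
        creativeId: bid.crid || null,
        bid: bid.price,
        currency: response.cur || 'USD', // the spec defaults cur to USD
        model: 'cpm', // OpenRTB prices are expressed as CPM
        geo,
      });
    }
  }
  return rows;
}

// Usage with a made-up OpenRTB response:
const rows = fromOpenRtb('exchange-a', {
  id: 'req-1',
  seatbid: [{ bid: [{ id: 'b1', impid: '1', price: 3.1,
                      adomain: ['insurer.example'], cid: 'c-77', crid: 'cr-9' }] }],
}, 'DE');
console.log(rows);
```

Once every network lands in the same row shape, the filter set described above is a handful of WHERE clauses over one table.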
The easier path: the publisher dashboard
Wire-level interception is the general case. The specific case is even simpler: the network’s own publisher dashboard is a search UI over the same data. Logged in as a publisher, the operator sees creative thumbnails, advertiser names or IDs, click destinations, bid amounts on served impressions, time-of-day distributions, campaign IDs. The dashboard exists so legitimate publishers can monitor and brand-safety-check what runs through them; for a spy operator it is the easiest possible export interface.
Where the dashboard is rich, the spy operator uses it. Where it is weak, absent, or undocumented (particularly push, where the encrypted Web Push payload at the wire is harder to handle without controlling the service worker that decrypts it), wire interception fills the gap. Multiplied across 25–40 networks, the resulting database is structured, normalized, and roughly current with the live state of every running push/native campaign worldwide.
The GEO multiplier
A publisher in Brazil sees Brazilian creatives. To capture creatives in 95 countries, the operator needs either:
- Real visitors from each GEO to the shell publisher sites, so the ad network serves geo-targeted inventory naturally. Operators buy cheap traffic (popunder, redirect, push) from other networks and route it through their own publisher sites to trigger ad calls. The economics: a redirect impression costs $0.0005 and generates one served creative worth indexing.
- Synthetic visitors through residential or mobile proxies, executing the publisher page with a real browser to load the ad tag and capture the response. This path is more expensive per impression but produces clean, carrier-segmented coverage that cheap redirect traffic doesn’t. The largest spy marketplaces cross-sell mobile and residential proxy vendors as a primary partner category for exactly this reason.
The mobile-carrier dimension matters specifically: ad networks key targeting on the carrier ASN of the requesting IP. Generic datacenter proxies surface as “Wi-Fi/unknown” and miss carrier-segmented campaigns entirely. Mobile proxies through real carrier endpoints surface campaigns that only fire for, say, Vodafone IT or Vivo BR subscribers, which is exactly the inventory worth spying on because that targeting suggests a profitable, well-tuned campaign.
Why ad networks structurally tolerate it
The economic loop is the answer:
Advertiser pays Ad Network
↓
Ad Network serves creative to spy operator's "publisher" site
↓
Ad Network pays publisher revenue to spy operator ← network's incentive
↓
Spy operator logs every creative, sells access to other affiliates
↓
Other affiliates launch copy campaigns through the same Ad Network
↓
Ad Network earns more advertiser revenue (the loop closes)
The ad network is paid twice from the same activity. The original advertiser’s spend funds the publisher payout to the spy. The copying advertiser’s spend funds the next cycle. Networks have no incentive to detect or ban these publishers, and there is observable evidence of this alignment: PropellerAds maintains a dedicated /blog/tag/spy-tools/ index on its corporate blog, RichAds publishes a top push spy tools listicle, and Adsterra runs its own best-ad-spy-tools recommendation post, all three covering the same set of platforms that index their own inventory, often with affiliate links to those tools. The networks are not adversaries of the spy ecosystem; they’re a complementary layer.
Why walled gardens are different
The mechanism above works on PropellerAds, RichAds, MGID, and the other 30-odd third-party push/native networks because those networks sit between advertisers and an open ecosystem of independent publishers. Anyone with a domain becomes a publisher; the network has no first-party surface of its own and no way to deliver an ad without a publisher to host it. The publisher is the network’s outbound delivery channel. The spy operator becomes one.
Google Search, Meta (Facebook and Instagram), TikTok, and LinkedIn do not work this way. Their ads run on their own first-party surfaces — the Google search results page, the Instagram and Facebook feeds, the TikTok For You stream, the LinkedIn timeline — rendered by their own apps and servers. The advertiser pays them; they show the ad directly to users on their property. There is no third-party publisher inventory to register against, no JS tag anyone can embed to receive Meta sponsored posts, no API to subscribe to Google Search ads from outside Google’s own surfaces. The serving surface is the platform.
Where the walled gardens do have third-party publisher networks (Google AdSense and the broader Google Display Network, Meta Audience Network for in-app), three structural defenses keep publisher infiltration from working.
1. Cross-origin iframes
AdSense ads load inside an iframe served from googleads.g.doubleclick.net or tpc.googlesyndication.com. The publisher’s JavaScript runs in the parent page; the ad runs in the iframe; browser same-origin policy makes the iframe content invisible to the publisher’s code:
// the publisher's page tries to inspect the ad iframe
const adFrame = document.querySelector('iframe[src*="doubleclick"]');
adFrame.contentDocument;
// → null (same-origin policy blocks access)
adFrame.contentWindow.document.body;
// → DOMException: Blocked a frame with origin "https://my-publisher.example"
// from accessing a cross-origin frame.
The publisher rents an empty rectangle to Google. No bid metadata, no creative URL, no advertiser identity reachable from publisher-side code.
2. Server-side selection with no exposed bid response
Google selects which ad to show on its own infrastructure. The publisher receives a rendered ad inside the sandboxed iframe, not a JSON bid response with advertiser ID, campaign ID, and creative metadata. The publisher dashboard shows aggregated revenue by IAB category, not per-creative bid history, and even those categories are deliberately coarse (“Finance > Insurance”, not “Allianz under-30s EU campaign”). The publisher’s view of what runs through them is low-resolution by design.
3. Aggressive account-level enforcement
Google and Meta both run dedicated trust-and-safety operations on the publisher side, with ML-based fraud detection plus explicit Terms-of-Service prohibitions on scraping, intercepting, or fingerprinting ads served through your account. Account termination plus revenue holdbacks are the standard response, applied at scale. The incentive structure is opposite to that of PropellerAds: walled gardens lose money to publisher fraud rather than earning more from it, so they police it.
The combined effect: the only programmatic ways to see Meta or Google ads at scale are the platforms’ own ad libraries and account-farm feed scrolling against the user-facing surface. Wire-level publisher interception, the universal trick on third-party networks, does not survive contact with the walled-garden architecture.
Detection from the advertiser side
For an advertiser running a campaign and wanting to know whether they’re being indexed:
- Inject unique fingerprints in click URLs. A query parameter that encodes a per-creative random token, logged on the landing server. If that token shows up in a spy tool’s search results, or in your own competitor analysis, the creative has been indexed.
- Audit click traffic for spy-tool referrers. Spy databases preview landing pages by fetching them from their infrastructure. The fetch User-Agent, IP ranges (often known cloud subnets in EU and US), and absence of JS execution are detectable on the landing server.
- Plant decoy creatives with distinctive but meaningless text on small bid amounts. Search for that text in spy tools after 48 hours. The tools that surface it have it; the tools that don’t, don’t yet.
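The first technique can be sketched in a few lines. This is an illustration under stated assumptions: the fp parameter name, the in-memory registry, and both function names are made up for the example, and a production setup would persist the mapping in a database and issue one token per creative per network.

```javascript
const crypto = require('node:crypto');

// token -> where the creative was placed. In-memory for the sketch only.
const registry = new Map();

// Attach a random, meaningless token to a creative's click URL and remember it.
function tagClickUrl(baseUrl, creativeId, network) {
  const token = crypto.randomBytes(6).toString('hex'); // 12 hex characters
  registry.set(token, { creativeId, network });
  const url = new URL(baseUrl);
  url.searchParams.set('fp', token); // `fp` is an arbitrary parameter name
  return url.toString();
}

// Given a URL copied out of a spy tool listing, report which placement leaked.
function traceLeak(observedUrl) {
  let token;
  try {
    token = new URL(observedUrl).searchParams.get('fp');
  } catch {
    return null; // not a parseable URL
  }
  return registry.get(token) || null;
}

// Usage:
const tagged = tagClickUrl('https://lander.example/offer', 'cr-874513', 'network-a');
console.log(tagged, traceLeak(tagged));
```

Because each token maps to one creative on one network, a hit in a spy tool tells you not only that the creative was indexed but through which network's inventory.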
Defensive moves
| Defense | Publisher infiltration | Push harvesting | Public ad libraries | In-app SDK MITM | Funnel crawling |
|---|---|---|---|---|---|
| 1. Brand distinctiveness | ✓ | ✓ | ✓ | ✓ | ✓ |
| 2. Separate hook from offer | ✓ | ✓ | ✓ | ✓ | ✓ |
| 3. Token-bound click URLs | ✓ | ✓ | ◐ | ✓ | ✓ |
| 4. Server-side personalization | ✓ | ✓ | ◐ | ✓ | ✓ |
| 5. Bot detection at funnel edges | ✓ | ✓ | ○ | ✓ | ✓ |
| 6. Dynamic Creative Optimization | ✓ | ✓ | ◐ | ✓ | ◐ |
| 7. Pixel-level watermarking | ◐ | ◐ | ✓ | ◐ | ◐ |
| 8. Velocity beats concealment | ✓ | ✓ | ◐ | ✓ | ✓ |
| 9. Mirror what spy databases see | ✓ | ✓ | ✓ | ◐ | ✓ |
| 10. Diversify carrier-bucket surface | ✓ | ✓ | ○ | ◐ | ◐ |
Legend: ✓ full coverage (defense materially blocks the mechanism) · ◐ partial (helps but doesn’t fully block) · ○ no effect.
The structural alignment between ad networks and spy databases — networks paid twice for the same activity — means there is no way to be a paying advertiser on push or native networks and avoid being indexed. The creative will leak. What an advertiser does control is two things: reducing the value of what leaks (so the copy doesn’t help the copier), and making the leak commercially useless (so even a perfect copy doesn’t move product). The defenses below run from the most fundamental to the most tactical.
Three terms used throughout this section, since they have not been introduced earlier:
- Lander or landing page: the destination website the click leads to. The creative is what shows in the user’s feed; the lander is the page that loads after the click.
- Funnel: the full chain from creative, through tracking redirects, to the lander and onward to whatever action the advertiser actually wants (a signup, a purchase, an install).
- Cloaker: a server-side filter at the funnel that shows different content to different visitor types: real shoppers see the actual offer, bots and crawlers see a generic decoy. The word is most associated with gray-area affiliate marketing, but the underlying mechanism, distinguishing a human visitor from a scraper, is standard practice in legitimate fraud prevention, regional personalization, and bot defense, which is what Cloudflare Bot Management, DataDome, and FingerprintJS Pro do for major brands.
1. Brand distinctiveness: the only durable defense
A creative that is hard to copy convincingly is worth more than a creative that is hard to find. If a competitor can lift your creative, swap the brand name, and the result still looks plausible to your audience, the leak moves money. If the swap reads as obviously fake, it doesn’t. Distinctive typography, color palette, visual structure, and voice carry more defensive weight than any technical countermeasure — the same logic that has protected iconic billboard campaigns for a century.
The practical test: take your creative, replace your brand with a competitor’s, and ask whether it still reads as a credible ad for them. If yes, the asset is generic and the leak is dangerous. If “this is obviously not theirs,” the leak is mostly harmless. A competitor running the same copy with their name attached looks like a knockoff to anyone who already knows you.
2. Separate the hook from the offer
A distinctive creative is hard to copy convincingly. An offer-laden creative is easy to read. Keep the specifics (partner name, price, target segment, exact CTA) off the unit itself and behind the click. An ad that reads “Allianz drops to €18/mo for under-30s, sign here” hands a competitor the partner, price, segment, and funnel in one screenshot. The same hook in your distinctive voice, with the spec gated behind the lander, gives away the angle but not the playbook.
3. Token-bound click URLs
Every click URL carries a per-click token, signed server-side with a short expiry. One to four hours is enough. Your lander validates the token; expired or replayed tokens land on a generic page instead of the real offer. A spy crawler captures the URL once, the URL is dead by the time it surfaces in the spy database, and the simplest competitive-analysis move (replay the captured click) stops working.
The creative carries a single static URL pointing at your own redirector (https://track.you.com/c/<creative-id>). That’s the URL the network indexes and stores. When a real click arrives, the redirector mints a fresh signed token and 302s the visitor to the lander with the token attached. Use 302, not 301: 301 is cacheable, which would freeze one token onto the URL for every subsequent visitor; 302 forces each click back through your server for a fresh token.
const crypto = require('node:crypto');
const SECRET = Buffer.from('<rotated server-side secret>');
const TTL = 4 * 3600; // token lifetime in seconds (4 hours)

// HMAC-SHA256 over the payload, truncated to 16 hex chars to keep URLs short.
function mac(payload) {
  return crypto.createHmac('sha256', SECRET).update(payload).digest('hex').slice(0, 16);
}

// In the redirector, on each click. creativeId must not contain a '.'.
function sign(creativeId) {
  const nonce = crypto.randomBytes(8).toString('hex');
  const payload = `${creativeId}.${Math.floor(Date.now() / 1000)}.${nonce}`;
  return `${payload}.${mac(payload)}`; // appended to the click URL as ?t=...
}

// On the lander, before revealing the offer.
// Checks signature and expiry only; rejecting replays also needs a seen-nonce store.
function verify(token) {
  // Reject missing or malformed tokens instead of throwing on them.
  if (typeof token !== 'string') return false;
  const parts = token.split('.');
  if (parts.length !== 4) return false;
  const [creativeId, ts, nonce, sig] = parts;
  const sigBuf = Buffer.from(sig);
  const expBuf = Buffer.from(mac(`${creativeId}.${ts}.${nonce}`));
  if (sigBuf.length !== expBuf.length) return false; // timingSafeEqual needs equal lengths
  return crypto.timingSafeEqual(sigBuf, expBuf)
    && Math.floor(Date.now() / 1000) - parseInt(ts, 10) < TTL;
}
The major attribution platforms — AppsFlyer (OneLink + Protect360), Branch, Adjust, Google Campaign Manager 360 — generate their own per-click IDs and run server-side fraud detection (click replay, spoofing, bot traffic) inside their infrastructure, but the verdict arrives as a post-attribution flag in your dashboard rather than a primitive your lander can check at request time. The signed-token pattern above sits on top of whichever platform you use; it’s aimed specifically at the spy-crawler replay vector, which click-fraud suites are not built to address.
The expiry window is a tradeoff. Too short (under an hour) and real users who open a tab and come back to it later land on the decoy page, hurting conversion. Too long (more than a day) and spy databases pick up live URLs faster than they expire. Four hours covers the vast majority of legitimate click latency on mobile traffic and clears most spy-database refresh windows. For sustained campaigns, rotate the signing secret weekly so even captured tokens stop validating against the current key.
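Expiry handles stale URLs; rejecting a replay inside the validity window additionally requires remembering which nonces have been seen. A minimal sketch of such a guard is below. It is in-memory and single-process, which is an assumption made for brevity: with several lander servers the seen set would have to live in a shared store. The function names are hypothetical.

```javascript
// Single-use guard for token nonces (the third field of the signed token).
// Entries only need to live as long as the token itself can validate.
function createReplayGuard(ttlSeconds) {
  const seen = new Map(); // nonce -> expiry time in unix seconds

  return function firstUse(nonce, nowSeconds = Math.floor(Date.now() / 1000)) {
    // Drop expired entries so the map stays bounded by one TTL of traffic.
    // A linear sweep is fine for a sketch; a busy lander would want a timer or a store with native expiry.
    for (const [n, expiry] of seen) {
      if (expiry <= nowSeconds) seen.delete(n);
    }
    if (seen.has(nonce)) return false; // already presented once: treat as replay
    seen.set(nonce, nowSeconds + ttlSeconds);
    return true;
  };
}

// Usage: call after the signature and expiry checks pass.
const firstUse = createReplayGuard(4 * 3600);
console.log(firstUse('a1b2c3')); // true
console.log(firstUse('a1b2c3')); // false
```

One design caveat: a real visitor who reloads the lander presents the same nonce twice. Setting a session cookie on first use and honoring it on later requests is one way to avoid sending that visitor to the decoy.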
4. Server-side personalization on the lander
The lander is not a static HTML file with offer details baked in. It is rendered per visitor from request context: GEO, device, referrer, click parameters, time. Two visitors hitting URLs that look identical see materially different pages. A spy crawler captures one synthetic view; the captured view does not represent what real customers see, and the spy database fragments your campaign into context-specific snapshots rather than recording one canonical lander to copy.
The inputs that drive variation are request-context primitives every web stack already has: GEO (from IP or click parameter), carrier ASN, device class, OS, browser, referrer entropy, click timestamp, and the campaign/creative IDs carried in the click URL. Each input shifts which offer variant, which payment method, which copy block, and which price the visitor sees. A spy crawler from a datacenter ASN in Frankfurt visiting a click URL meant for Vodafone IT 5G on iOS sees a fallback variant that has nothing to do with the campaign’s actual targeting.
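A minimal sketch of the variant-selection step is shown below. The rule table, variant names, and context field names (geo, carrier, os, connection) are hypothetical; how a real stack derives them (IP-to-ASN lookup, User-Agent parsing, click parameters) is outside the sketch.

```javascript
// Hypothetical rule table: most specific rules first, first match wins.
const RULES = [
  { match: { geo: 'IT', carrier: 'vodafone-it', os: 'ios' }, variant: 'it-vodafone-ios' },
  { match: { geo: 'IT', connection: 'mobile' }, variant: 'it-mobile-generic' },
  { match: { geo: 'DE', connection: 'mobile' }, variant: 'de-mobile-generic' },
];
const FALLBACK = 'generic-fallback';

// ctx is the request context assembled server-side for one visit.
function pickVariant(ctx) {
  for (const rule of RULES) {
    const matches = Object.entries(rule.match).every(([key, value]) => ctx[key] === value);
    if (matches) return rule.variant;
  }
  return FALLBACK; // no served audience matched: bots, crawlers, mistargeted clicks
}

// A targeted visitor and a datacenter crawler hitting the same URL:
console.log(pickVariant({ geo: 'IT', carrier: 'vodafone-it', os: 'ios', connection: 'mobile' }));
console.log(pickVariant({ geo: 'DE', connection: 'datacenter' }));
```

The first call resolves to the carrier-specific variant and the second to the fallback, which is the fragmentation effect described above: the crawler's capture is a page no targeted customer ever sees.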
The line between personalization and cloaking is intent and disclosure. Personalization treats every legitimate visitor as a first-class case and shows them the offer they were actually targeted with; the only visitors who see decoy content are those whose request context doesn’t match any served audience (bots, spy crawlers, ad-review traffic from the network itself when relevant). Cloaking in the strict affiliate sense intentionally deceives the ad network’s compliance reviewers. The same mechanism implements both; the legitimacy lives in what gets shown to the network’s review traffic versus a non-targeted visitor.
The catch with all this branching logic: you can’t verify it from the office. Loading your own lander through a datacenter IP, consumer VPN, or laptop on home Wi-Fi hits the fallback branch by design — exactly what the personalization rules block. The only ground-truth test is loading the lander through the request context you actually target: same carrier ASN, same GEO, same device class. iProxy.online turns an Android phone and a local SIM into a dedicated mobile proxy on the real carrier IP, so you can load the lander as a real Vodafone IT 5G visitor would and confirm the served variant matches the targeting before scaling spend. The same fleet doubles as the spy-defense audit: replay the URL captured in a spy database from the same carrier IP and check whether your decoy logic actually fires for the context the crawler was in.
5. Bot detection at the funnel edges
This is the legitimate, defensible application of cloaking-class techniques. Standard bot-mitigation layers at the lander edge identify automated visitors and route them away from real offer content. The same machinery online retailers run against credential stuffing and inventory-hoarding bots applies here: the spy crawler is a bot; bot detection keeps the offer page out of its reach. The captured lander becomes a generic decoy, and the spy database’s value to a copying competitor degrades accordingly.
What the major bot-mitigation products catch, concretely: Cloudflare Bot Management scores every request on a combination of TLS fingerprint, HTTP/2 frame ordering, IP reputation, behavioral history, and a confidence challenge; DataDome layers similar signals with a focus on application-side behavior (mouse movement, scroll cadence, form-field timing); FingerprintJS Pro emphasizes durable browser fingerprints that survive cookie clearing and incognito mode. A headless-browser spy crawler running from a clean datacenter IP fails all three on the basic signals; a sophisticated crawler running a real Chromium build behind a residential proxy passes one or two but rarely the full stack.
Where this defense gets noisy is around mobile-carrier traffic. Mobile-proxy IPs from real carrier ASNs route through CGNAT and look behaviorally identical to legitimate mobile visitors: same NAT pool, same TLS profile, same device fingerprints. A blunt “block all residential or mobile-proxy traffic” rule will hurt your conversion rate on real mobile users more than it slows down a competent spy operator. Tune the bot-detection threshold against the conversion impact: false positives on mobile cost real money, so prefer behavioral signals (interaction patterns, time-on-page, form engagement) over IP-class blocks for the mobile segment.
The pattern we see most often: a team tunes bot-detection rules against one carrier’s traffic shape, then over-trusts the rule on a different carrier in the same GEO. The signals that actually separate carriers and device classes sit below the layers an antidetect browser or cloud phone can repaint — SIM MTU differs between operators, TCP stack fingerprints differ between device classes, and both are kernel-level primitives a userspace tool can’t forge. A threshold tuned on Vodafone IT iPhones may flag legitimate TIM IT Android visitors a week later. Test against at least two carrier × device combinations per target market before locking the threshold.
Tuning that threshold honestly requires testing from the request context itself. An iProxy.online mobile proxy gives you a real carrier ASN with CGNAT routing and a genuine device TLS profile; paired with an antidetect browser, it lets you load your own lander as both a legitimate mobile visitor and as a more sophisticated crawler trying to imitate one. Walk the rules up until they catch the antidetect setup, then walk them back down until they stop punishing the bare-mobile baseline. The gap between those two thresholds is your operating window.
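The walk-up, walk-down procedure can be sketched as a search over candidate thresholds. This is a minimal sketch under stated assumptions: a hypothetical 0 to 100 bot score where higher means more bot-like, a request blocked when its score meets the threshold, and made-up score samples for two carrier × device segments. The function names and the `max_fp_rate` / `min_catch_rate` budgets are illustrative and do not correspond to any vendor's API.

```python
from typing import Optional, Sequence, Tuple

Window = Tuple[int, int]


def operating_window(
    baseline_scores: Sequence[int],
    crawler_scores: Sequence[int],
    max_fp_rate: float = 0.0,
    min_catch_rate: float = 1.0,
) -> Optional[Window]:
    """Return the (lowest, highest) blocking threshold that meets both budgets.

    baseline_scores: scores of bare-mobile sessions (legitimate visitors).
    crawler_scores:  scores of antidetect sessions imitating them.
    """
    usable = []
    for threshold in range(0, 101):
        # A session is blocked when its score is at or above the threshold.
        fp_rate = sum(s >= threshold for s in baseline_scores) / len(baseline_scores)
        catch_rate = sum(s >= threshold for s in crawler_scores) / len(crawler_scores)
        if fp_rate <= max_fp_rate and catch_rate >= min_catch_rate:
            usable.append(threshold)
    # Both rates fall as the threshold rises, so usable thresholds are contiguous.
    return (usable[0], usable[-1]) if usable else None


def shared_window(windows: Sequence[Optional[Window]]) -> Optional[Window]:
    """Intersect per-segment windows; None if any segment has no window."""
    if not windows or any(w is None for w in windows):
        return None
    low = max(w[0] for w in windows)
    high = min(w[1] for w in windows)
    return (low, high) if low <= high else None


# Hypothetical score samples, not measurements.
vodafone_iphone = operating_window([5, 12, 18, 22, 30], [48, 55, 61, 70, 90])
tim_android = operating_window([8, 15, 27, 41], [52, 58, 66, 73])
print(vodafone_iphone, tim_android, shared_window([vodafone_iphone, tim_android]))
```

With these sample scores the second segment narrows the usable range considerably, which is the reason to test more than one carrier × device combination before locking a threshold. A result of `None` means no single threshold satisfies both budgets and the rule needs different signals, not a different number.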
6. Dynamic Creative Optimization (DCO)
A standard industry feature on Meta, Google, and most programmatic DSPs: automated variation of creative components (headline, image, CTA, color, copy) at impression time. When every served creative is a slightly different combination, the spy database does not see one clear “winner” worth copying. It sees fragmented variants that don’t cluster, so competitors running copy campaigns test against the wrong assumptions and waste budget. DCO is a defensive move at no incremental cost — the platforms support it natively.
The variation axes that matter most for defensive fragmentation: headline (3–10 variants), primary image (3–8 variants), CTA button text (3–5 variants), and accent color (2–4 variants). Combinatorially, four axes with five variants each produce 5 × 5 × 5 × 5 = 625 served combinations. A spy database scraping 200 of those impressions sees what looks like up to 200 distinct creatives; no single combination dominates the data, no clear “best” emerges, and a competitor running their copy campaign against the captured set tests every variant equally and learns nothing about which one converts.
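The arithmetic can be checked with a short simulation. This sketch assumes every combination is served with equal probability, which real DCO systems do not guarantee since they tend to shift delivery toward better performers; the axis names and the 200-impression sample size mirror the example above, and everything else is illustrative.

```python
import random
from collections import Counter
from math import prod

# Five variants on each of four axes, as in the example above.
AXES = {"headline": 5, "image": 5, "cta": 5, "accent_color": 5}
TOTAL_COMBINATIONS = prod(AXES.values())  # 5 * 5 * 5 * 5 = 625


def expected_distinct(total: int, impressions: int) -> float:
    """Expected number of distinct combinations seen under uniform serving."""
    return total * (1 - (1 - 1 / total) ** impressions)


def simulate_capture(impressions: int, seed: int = 0) -> Counter:
    """Count how often each combination shows up in a simulated capture."""
    rng = random.Random(seed)
    return Counter(
        tuple(rng.randrange(n) for n in AXES.values())
        for _ in range(impressions)
    )


captured = simulate_capture(200)
print("possible combinations:", TOTAL_COMBINATIONS)
print("expected distinct in 200 captures:", round(expected_distinct(TOTAL_COMBINATIONS, 200)))
print("distinct in this simulated capture:", len(captured))
print("largest single-combination share:", max(captured.values()) / 200)
```

Under the uniform assumption some combinations repeat, so the distinct count lands somewhat below 200, but the largest share stays tiny, which is the property that denies a copier a clear winner.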
DCO is harder to run on third-party push and native networks than on the walled gardens. Meta Advantage+ Creative and Google Performance Max do component-level variation automatically; PropellerAds and similar push networks accept multiple creative variants under one campaign but don’t usually fragment components within a single creative. The workaround on push: ship 20–30 micro-variants of the creative as separate units within the campaign, ensuring no single creative dominates impressions. This is operationally heavier than the walled-garden equivalent but achieves the same defensive fragmentation.
7. Pixel-level watermarking
Steganographic markers embedded in creative images, videos, and copy: imperceptible to viewers, recoverable from a captured copy. Open-source toolchains like Meta’s watermark-anything demonstrate production-grade image payloads that survive JPEG re-encoding, resizing, and the screenshot-and-recompress path a spy crawler typically uses to ingest creatives. The same principle extends to text via Unicode whitespace substitution. A recent method paper catalogs the approach for headlines and copy lines, where invisible character variants encode a per-impression ID without altering the rendered text. Neither prevents indexing. Both identify which spy databases captured which creatives and when, which is useful intelligence about competitive research: who is studying your campaigns, on what schedule, through which collection paths.
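To make the text variant concrete, here is a minimal sketch of whitespace substitution. It is not the method from the paper mentioned above: it assumes a regular space (U+0020) encodes a 0 bit and a no-break space (U+00A0) encodes a 1 bit, and that both render identically wherever the copy is shown. The function names and the sample headline are hypothetical. Note that a no-break space changes line-wrapping behavior, and a capture pipeline that normalizes whitespace would erase the marker.

```python
SPACE = "\u0020"  # regular space, encodes bit 0
NBSP = "\u00a0"   # no-break space, encodes bit 1 (assumed to render the same)


def embed_id(text: str, marker_id: int, bits: int) -> str:
    """Encode marker_id into the first `bits` spaces of text, most significant bit first."""
    clean = text.replace(NBSP, SPACE)  # start from a known baseline
    if not 0 <= marker_id < (1 << bits):
        raise ValueError("marker_id does not fit in the requested bit width")
    if clean.count(SPACE) < bits:
        raise ValueError("not enough spaces in the text to carry the ID")
    out = []
    used = 0
    for ch in clean:
        if ch == SPACE and used < bits:
            bit = (marker_id >> (bits - 1 - used)) & 1
            out.append(NBSP if bit else SPACE)
            used += 1
        else:
            out.append(ch)
    return "".join(out)


def extract_id(text: str, bits: int) -> int:
    """Recover the ID from the first `bits` space characters of a captured copy."""
    value = 0
    seen = 0
    for ch in text:
        if ch in (SPACE, NBSP):
            value = (value << 1) | int(ch == NBSP)
            seen += 1
            if seen == bits:
                return value
    raise ValueError("captured text has fewer spaces than the ID width")


headline = "Save 40% on your mobile plan before Friday"  # hypothetical copy line
marked = embed_id(headline, marker_id=37, bits=6)
print(extract_id(marked, bits=6))
```

Capacity is bounded by the number of spaces, so a short headline carries only a few bits: enough to tag a campaign or network, not a full per-impression ID. Longer copy lines or a denser encoding would be needed for that.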
8. Velocity beats concealment
Spy databases lag: 24 to 72 hours for the fastest, sometimes a week for the long tail. A team that ships a new creative variant every 48 hours is consistently ahead of competitors copying the previous one. For test campaigns this is a full defense; for sustained brand presence it is partial, and the brand-distinctiveness point in #1 carries the rest of the weight.
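A rough way to size that lead, as a sketch under simplifying assumptions: each creative runs for exactly one rotation interval, a copycat can only launch after the spy database has indexed it, and `copy_setup_hours` (a hypothetical parameter) covers the time the copycat needs to build and launch the copy.

```python
def copy_overlap_hours(
    rotation_hours: float,
    index_lag_hours: float,
    copy_setup_hours: float = 0.0,
) -> float:
    """Hours a copy can run while the original creative is still live."""
    earliest_copy_launch = index_lag_hours + copy_setup_hours
    return max(0.0, rotation_hours - earliest_copy_launch)


# 48-hour rotation against the lag range quoted above.
for lag in (24, 48, 72):
    print(f"lag {lag}h -> overlap {copy_overlap_hours(48, lag)}h")
```

Against the fastest databases a 48-hour rotation still leaves some overlap unless the copycat's setup time absorbs it; against slower ones the creative is retired before the copy can launch.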
9. Mirror what spy databases see
The defense advertisers consistently skip is auditing their own campaigns from the same vantage point a spy database uses. If the only place you see your own creative is in the network’s reporting dashboard, you have no idea how it surfaces to the kind of traffic spy operators actually capture: synthetic visitors from residential and mobile-carrier IPs across every target GEO. Networks key delivery on carrier ASN; a creative that serves cleanly to your office IP on Wi-Fi may never reach a Vodafone IT 5G subscriber the network is actually targeting. You are flying blind on what your competitors can buy access to.
The audit is mechanical: spin up a real browser through a mobile-carrier proxy in each target GEO, load a publisher page running the relevant ad network, and capture the served creative and bid response over a sample window. The result is the spy operator’s view of your campaign — what creatives surface, at what bid levels, against what targeting parameters. If your “Vodafone IT 5G under-30s” creative doesn’t appear in the carrier-bucketed audit, the network isn’t delivering it the way you intended, and the same delivery gap is invisible to your reporting. If it does appear, you know exactly what shape of your campaign is downstream of any spy database. Run the audit browser through an antidetect profile bound to the mobile proxy so the captured view matches what a real visitor in that context would see.
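Once the captures exist, comparing them against the intended targeting is a small bookkeeping step. The sketch below assumes the audit has already produced `(bucket, creative_id)` pairs; the bucket tuples, creative IDs, and function name are hypothetical, and the capture step itself (browser, proxy, publisher page) is out of scope here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Bucket = Tuple[str, str]  # (GEO, carrier), an assumed bucket shape


def audit_delivery(
    targeting: Dict[Bucket, Set[str]],
    captures: Iterable[Tuple[Bucket, str]],
) -> Tuple[Dict[Bucket, List[str]], Dict[Bucket, List[str]]]:
    """Compare intended targeting with what the audit actually captured.

    Returns (gaps, strays):
      gaps   - creatives targeted at a bucket but never captured there
      strays - creatives captured in a bucket they were not targeted at
    """
    seen: Dict[Bucket, Set[str]] = defaultdict(set)
    for bucket, creative_id in captures:
        seen[bucket].add(creative_id)

    gaps = {}
    for bucket, expected in targeting.items():
        missing = expected - seen.get(bucket, set())
        if missing:
            gaps[bucket] = sorted(missing)

    strays = {}
    for bucket, found in seen.items():
        extra = found - targeting.get(bucket, set())
        if extra:
            strays[bucket] = sorted(extra)
    return gaps, strays


# Hypothetical targeting plan and audit captures.
targeting = {
    ("IT", "Vodafone IT"): {"cr-101", "cr-102"},
    ("IT", "TIM"): {"cr-201"},
}
captures = [
    (("IT", "Vodafone IT"), "cr-101"),
    (("IT", "TIM"), "cr-201"),
    (("IT", "TIM"), "cr-101"),
]
gaps, strays = audit_delivery(targeting, captures)
print("not delivered as targeted:", gaps)
print("delivered outside targeting:", strays)
```

A gap only means the creative did not surface within the sample window, so a short window will overstate gaps; repeat the capture before concluding the network is mis-delivering.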
10. Diversify the carrier-bucket surface of your campaigns
A campaign targeting “DE mobile” runs as a single broad bucket. A campaign segmented into “DE / Vodafone DE 5G under-30s”, “DE / Telekom DE 4G all-ages”, “DE / O2 DE 5G urban”, and so on, runs as a dozen narrow buckets. The total impression volume is the same; the spy-database footprint is not.
Spy operators query their normalized database by GEO, device, OS, browser, and the network’s targeting context fields. A broad-bucket campaign clusters into a single contiguous slice of the spy DB: query “DE mobile [vertical]” and the whole campaign surfaces in one view. A narrow-bucket campaign fragments across queries: no buyer of competitive intelligence naturally combines a dozen specific carrier, age, and device filters to reconstruct the full campaign, and the captured creatives split across cells that look like separate small campaigns rather than one large one. The same total spend, the same total reach, a fundamentally different signal in any spy database.
This works because ad networks key delivery on carrier ASN as a first-class targeting field. The same property that makes mobile-proxy audits possible in #9 makes carrier-bucket diversification meaningful here. Generic datacenter or residential proxies that mask the carrier ASN can’t deliver this defense; the spy operator’s mobile-carrier visitor pool is itself segmented by carrier, so a campaign that operates in narrow carrier buckets remains fragmented across the spy database even after capture.
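The difference in footprint can be illustrated with a toy measure: the share of a campaign's captured creatives that the single most revealing query cell returns. This sketch assumes each captured creative is filed under exactly one bucket label, and it reuses the example bucket names from above with made-up capture counts.

```python
from collections import Counter
from typing import Sequence


def largest_query_share(capture_buckets: Sequence[str]) -> float:
    """Share of all captured creatives that fall into the biggest single cell."""
    cells = Counter(capture_buckets)
    return max(cells.values()) / len(capture_buckets)


# Twelve captured creatives, filed broad versus narrow (hypothetical counts).
broad = ["DE mobile"] * 12
narrow = (
    ["DE / Vodafone DE 5G under-30s"] * 4
    + ["DE / Telekom DE 4G all-ages"] * 4
    + ["DE / O2 DE 5G urban"] * 4
)

print("broad:", largest_query_share(broad))
print("narrow:", round(largest_query_share(narrow), 2))
```

With a dozen buckets instead of three, the largest cell shrinks further, and a researcher has to know the bucket structure in advance to reassemble the campaign.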
Closing thought: the billboard has always been copied
Billboard advertising has been knocked off for a century, and Coca-Cola does not run a war room to stop it. It has something more powerful: a creative identity so distinctive that a copy is visibly a copy. Technical defenses buy days; brand distinctiveness buys decades.
The house on the highway can never stop people from looking at the wall. The owner gets to decide what is painted on it, and a painting recognizable everywhere does not lose its value the day someone takes a photograph.