How to scrape PerimeterX: Please verify you are human?

  • Josselin Liebe
    Author
    10 months ago
  • While venturing into the world of web scraping, it's common to run into roadblocks designed to prevent automated access. One of the most formidable of these obstacles is PerimeterX, a bot-detection service built to block automated traffic, scrapers included. If you've encountered a message like "Please verify you are human: Press & Hold", it's a telltale sign that PerimeterX is in play.

    [Image: px-verify-human.png]

    Recognizing PerimeterX

    PerimeterX isn't just about displaying human verification challenges. Underneath, it employs sophisticated techniques aimed at spotting and thwarting automated requests:

    1. JavaScript Fingerprinting: This method examines how browsers execute JavaScript, hunting for patterns consistent with web scrapers or bots.

    2. TLS Fingerprinting: By analyzing the TLS (Transport Layer Security) handshakes of incoming connections, PerimeterX tries to spot anomalies or patterns characteristic of automated tools.

    3. Request Patterns & Details: The sequence, frequency, HTTP version, and other intricate details of incoming requests are also scrutinized for signs of automated access.
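    Before parsing a response, a scraper can check whether it received a PerimeterX challenge instead of the real page. The following is a minimal sketch, not an official detection method: only the "Press & Hold" and "Please verify you are human" strings come from the message quoted above. The `px-captcha` marker, the `_px` cookie prefix, and the 403 status check are assumptions based on commonly observed block pages, and the function name is hypothetical.

```python
import html

# Heuristic markers. The first two come from the challenge text quoted in this
# article; "px-captcha" is an assumption about the block page's markup.
BLOCK_MARKERS = ("press & hold", "please verify you are human", "px-captcha")


def looks_like_perimeterx_block(status_code, body, cookie_names=()):
    """Return True if a response looks like a PerimeterX challenge page."""
    # Decode entities such as "&amp;" so "Press &amp; Hold" still matches.
    text = html.unescape(body).lower()
    has_marker = any(marker in text for marker in BLOCK_MARKERS)
    # Assumption: PerimeterX cookies use a "_px" prefix.
    has_px_cookie = any(name.startswith("_px") for name in cookie_names)
    # A marker in the body is the stronger signal; a 403 that also sets a
    # "_px" cookie is treated as a weaker fallback signal.
    return has_marker or (status_code == 403 and has_px_cookie)
```

    Calling `looks_like_perimeterx_block(response_status, response_text, cookie_names)` after each request lets the scraper back off or switch strategy rather than silently storing a challenge page as data.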

    Navigating around PerimeterX

    Getting past PerimeterX requires finesse, and here are some strategies:

    1. Using Undetected-Chromedriver: This tool provides a patched version of Chrome's webdriver that is harder to detect as an automated tool. Coupled with techniques like randomized user agents and delays, it can make scraping much harder to detect, though not undetectable.

    2. Premium Proxies with Piloterr: Routing requests through a reputable proxy provider like Piloterr means they come from diverse, genuine-looking IPs, which reduces the chance of getting blocked. Piloterr also provides rotating proxies and JavaScript pre-rendering, which further disguise your scraping efforts.

    3. Piloterr's Web Scraping API: If you'd prefer an all-in-one solution, consider using Piloterr's web scraping API. Not only does it encapsulate various strategies to bypass anti-scraping measures, but it also offloads the complexities, letting you focus on the data.
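
    The first strategy can be sketched as follows. This is a rough illustration rather than a guaranteed bypass: it assumes the third-party `undetected-chromedriver` package and a local Chrome install, and the user-agent strings, delay bounds, and helper names are illustrative choices, not values from this article.

```python
import random
import time

# Illustrative user-agent strings (assumptions); keep them consistent with the
# Chrome version actually installed, or the mismatch itself can look suspicious.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]


def pick_user_agent():
    """Choose one user agent at random for this browser session."""
    return random.choice(USER_AGENTS)


def random_delay(min_s=2.0, max_s=6.0):
    """Return a randomized pause length in seconds (bounds are arbitrary)."""
    return random.uniform(min_s, max_s)


def fetch_page(url):
    """Load a URL with undetected-chromedriver and return the page HTML."""
    # Imported here so the helpers above work without the package installed.
    import undetected_chromedriver as uc

    options = uc.ChromeOptions()
    options.add_argument(f"--user-agent={pick_user_agent()}")
    driver = uc.Chrome(options=options)
    try:
        driver.get(url)
        # Pause for a random interval so requests are not evenly spaced.
        time.sleep(random_delay())
        return driver.page_source
    finally:
        driver.quit()
```

    A call such as `fetch_page("https://example.com")` (a placeholder URL) opens a fresh browser per page. For larger crawls, reusing one driver across several pages with a `random_delay()` pause between them is usually cheaper.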