What is a 520 status code?

  • Josselin Liebe
    9 months ago
  • Status code 520 is not part of the HTTP standard. It is an unofficial code used by Cloudflare ("Web server returned an unknown error"), and it signals that the origin server sent Cloudflare an empty, unknown, or unexpected response. The cause can be an ordinary technical fault on the origin, but in web scraping it can also mean that the request lacked details the server expects, or that the scraper is being actively blocked.
    To diagnose and work around it, consider the following:

    1. Mind the Headers: Make sure every request includes the headers the site expects, such as secret/CSRF tokens, Origin, Referer, and other common browser headers. This can be the key to avoiding the 520 code. For a POST request, also make sure the body matches the expected format.

    2. Scraper Identification: A 520 may indicate that the scraper has been identified and is being blocked. In that case, tools and techniques that make your requests look like ordinary browser traffic become crucial.
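The header advice in point 1 can be sketched as a small helper. This is a minimal illustration under stated assumptions, not a definitive implementation: `build_request_parts`, `missing_headers`, the `X-CSRF-Token` header name, the User-Agent string, and the JSON body format are all hypothetical choices for the example. The real names and values depend on the target site.

```python
import json

# Headers this sketch treats as mandatory (an assumption; adjust per site).
REQUIRED_HEADERS = ("Origin", "Referer", "User-Agent")


def build_request_parts(origin, referer, csrf_token=None, payload=None):
    """Return (headers, body) for a request carrying the details a site may expect."""
    headers = {
        "Origin": origin,
        "Referer": referer,
        # Placeholder User-Agent; replace it with the one your client should send.
        "User-Agent": "Mozilla/5.0 (compatible; ExampleScraper/1.0)",
    }
    if csrf_token is not None:
        # Hypothetical header name; many sites use a different one.
        headers["X-CSRF-Token"] = csrf_token

    body = None
    if payload is not None:
        # Assumes the endpoint expects a JSON body.
        headers["Content-Type"] = "application/json"
        body = json.dumps(payload)
    return headers, body


def missing_headers(headers):
    """List the required headers that are absent or empty."""
    return [name for name in REQUIRED_HEADERS if not headers.get(name)]
```

The returned values can then be passed to `requests.post(url, headers=headers, data=body)`, after checking that `missing_headers(headers)` is empty.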

    For the Python 🐍 fans, here's a script that's prepared to handle 520 scenarios:

    import requests
    from time import sleep

    MAX_RETRIES = 5
    WAIT_PERIOD = 5  # seconds


    def fetch_url(url, headers):
        """GET the URL, retrying with a growing delay while the server answers 520."""
        retries = 0
        while retries < MAX_RETRIES:
            # A timeout keeps a stalled connection from hanging the scraper.
            response = requests.get(url, headers=headers, timeout=30)
            if response.status_code == 520:
                print("520 error encountered. Adjusting request and retrying... ⏳")
                # Linear backoff: wait 5 s, then 10 s, 15 s, and so on.
                sleep(WAIT_PERIOD * (retries + 1))
                retries += 1
            else:
                return response
        raise Exception("Max retries reached. Scraper identified? 🚫")


    url = "YOUR_URL_HERE"
    headers = {
        "Origin": "YOUR_ORIGIN_HERE",
        "Referer": "YOUR_REFERER_HERE",
        # Add other required headers here
    }
    response = fetch_url(url, headers)
    print(response.status_code)
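Before retrying, it can help to confirm that a 520 really came from Cloudflare rather than from the site itself. The sketch below assumes that Cloudflare-proxied responses carry a `Server: cloudflare` header or a `CF-RAY` header; the function name `is_cloudflare_520` is hypothetical and the check is a heuristic, not a guarantee.

```python
def is_cloudflare_520(status_code, headers):
    """Return True when a 520 response appears to have been generated by Cloudflare."""
    # Header names are case-insensitive, so normalize them before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    served_by_cloudflare = (
        normalized.get("server", "").lower() == "cloudflare"
        or "cf-ray" in normalized
    )
    return status_code == 520 and served_by_cloudflare
```

With the script above, it could be called as `is_cloudflare_520(response.status_code, response.headers)` on any response object returned by `requests`.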

    A 520 can look like a puzzle, but careful attention to request details, an understanding of how sites detect scrapers, and the right tools are usually enough to resolve it.