Response status code 429 generally indicates that the client is making too many requests. In the world of web scraping, this predicament often arises when scraping at a rapid pace.
One method to sidestep status code 429 is to slow down our connections using rate limiting. This tactic is especially important when running high-scale asynchronous scrapers built on tools such as Python's 🐍 asyncio or scrapy.
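As a rough sketch of the idea, the example below caps concurrency with an asyncio.Semaphore and adds a short pause between requests. The cap of 3 connections, the 1-second delay, and the choice of httpx as the async HTTP client are our own assumptions here, not requirements:

import asyncio
import httpx

MAX_CONCURRENT = 3  # assumed cap on simultaneous connections
DELAY = 1.0         # assumed pause between requests, in seconds

semaphore = asyncio.Semaphore(MAX_CONCURRENT)

async def fetch(client, url):
    # The semaphore allows at most MAX_CONCURRENT requests at once;
    # the sleep spaces them out further to stay under rate limits.
    async with semaphore:
        response = await client.get(url)
        await asyncio.sleep(DELAY)
        return response

async def main(urls):
    async with httpx.AsyncClient() as client:
        return await asyncio.gather(*(fetch(client, url) for url in urls))

# responses = asyncio.run(main(["https://example.com/a", "https://example.com/b"]))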
Another strategy to avoid the 429 status code is to distribute connections across multiple agents. In this scenario, proxies and proxy rotation are invaluable.
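As an illustration, here is a minimal sketch of round-robin proxy rotation with the requests library. The proxy addresses are placeholders, not real endpoints:

import requests
from itertools import cycle

# Placeholder proxy addresses -- substitute your own pool.
PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
    "http://proxy3.example.com:8080",
]
proxy_pool = cycle(PROXIES)

def fetch_via_proxy(url):
    # Each call routes through the next proxy in the pool,
    # spreading requests across multiple IP addresses.
    proxy = next(proxy_pool)
    return requests.get(url, proxies={"http": proxy, "https": proxy})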
Alternatively, the Piloterr web scraping API can be employed to automatically distribute connections, acting as a shield against the stringent rate limits enforced by certain websites.
Here's a basic Python script that uses the requests library to retry requests when a 429 status code is encountered. The script will:
- Slow down the request pace using the sleep function from the time module.
- If a 429 status code is encountered, wait for a specified period and then retry the request.
import requests
from time import sleep

MAX_RETRIES = 5
WAIT_PERIOD = 5  # seconds

def fetch_url(url):
    retries = 0
    while retries < MAX_RETRIES:
        response = requests.get(url)
        if response.status_code == 429:
            print("Rate limit encountered. Retrying after waiting...")
            # Increase the wait time with each retry (linear backoff).
            sleep(WAIT_PERIOD * (retries + 1))
            retries += 1
        else:
            return response
    raise Exception("Max retries reached.")

url = "YOUR_URL_HERE"
response = fetch_url(url)
print(response.status_code)
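As a refinement, many servers that return 429 also include a Retry-After header telling the client how long to wait. Here is a minimal sketch of a helper that could replace the fixed sleep above; the name wait_for_retry and the 5-second fallback are our own choices, and this sketch ignores the HTTP-date form of the header:

from time import sleep

def wait_for_retry(response, fallback=5):
    # Prefer the server's own Retry-After hint (in seconds) when present;
    # fall back to a fixed delay otherwise.
    retry_after = response.headers.get("Retry-After")
    if retry_after and retry_after.isdigit():
        sleep(int(retry_after))
    else:
        sleep(fallback)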