Here’s a guide on how to make concurrent requests in Python, similar to the Ruby example you shared. We’ll use the requests and concurrent.futures libraries to demonstrate how to perform HTTP requests concurrently.
Here’s how you can do it in Python:
import requests
from concurrent.futures import ThreadPoolExecutor

def send_request(query):
    url = "https://piloterr.com/api/v2/website/crawler"
    x_api_key = "YOUR-X-API-KEY"  # ⚠️ Don't forget to add your API token here!
    params = {
        'x_api_key': x_api_key,
        'query': query
    }
    try:
        print(f"Sending request to {query}")
        response = requests.get(url, params=params)
        response.raise_for_status()  # Raise an exception for HTTP error status codes
        print(response.text)
    except requests.exceptions.RequestException as e:
        print(f"HTTP Request failed: {e}")

urls_to_scrape = [
    "https://www.piloterr.com",
    "https://www.piloterr.com/blog"
]

# Submit both requests so they run in parallel threads
with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(send_request, url) for url in urls_to_scrape]
    for future in futures:
        future.result()  # Retrieves the result, or re-raises any exception from that thread

print("Process Ended")
requests: The requests library is used to make HTTP requests easily and intuitively.
concurrent.futures: This built-in Python library lets you manage threading with ThreadPoolExecutor.
ThreadPoolExecutor: Controls how many threads run concurrently via max_workers. In this example, two threads send two requests in parallel.
futures: A list of Future objects, one per submitted request. Calling result() on each future blocks until that request finishes and re-raises any exception it hit. If you would rather handle responses as soon as each one completes, see the sketch after this list.
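For example, here is a minimal sketch of that pattern using concurrent.futures.as_completed. The fetch_page helper below is a hypothetical variant of send_request that returns the response body instead of printing it; it is an assumption for illustration, not part of the snippet above.

import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_page(query):
    # Hypothetical variant of send_request that returns the body instead of printing it
    url = "https://piloterr.com/api/v2/website/crawler"
    params = {'x_api_key': "YOUR-X-API-KEY", 'query': query}
    response = requests.get(url, params=params)
    response.raise_for_status()
    return response.text

urls_to_scrape = ["https://www.piloterr.com", "https://www.piloterr.com/blog"]

with ThreadPoolExecutor(max_workers=2) as executor:
    # Map each future back to its URL so we know which result belongs to which request
    future_to_url = {executor.submit(fetch_page, url): url for url in urls_to_scrape}
    for future in as_completed(future_to_url):  # Yields futures as they finish, not in submission order
        url = future_to_url[future]
        try:
            body = future.result()
            print(f"{url}: received {len(body)} characters")
        except requests.exceptions.RequestException as e:
            print(f"{url} failed: {e}")

With as_completed you can start processing the fastest responses immediately instead of waiting on the slowest one.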
Replace "YOUR-X-API-KEY"
with your actual API key.
Modify the urls_to_scrape
list to include all the URLs you want to scrape.
By using threads, you can send multiple requests simultaneously, reducing the total time needed to gather data.
This approach scales to larger volumes of concurrent requests simply by extending urls_to_scrape and raising max_workers, as sketched below.
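As a rough sketch of what that scaling might look like, assuming the same endpoint: the list of 50 page URLs and the max_workers value of 10 are illustrative assumptions, not requirements of the API.

import requests
from concurrent.futures import ThreadPoolExecutor

def send_request(query):
    url = "https://piloterr.com/api/v2/website/crawler"
    params = {'x_api_key': "YOUR-X-API-KEY", 'query': query}
    try:
        response = requests.get(url, params=params)
        response.raise_for_status()
        print(f"{query}: OK")
    except requests.exceptions.RequestException as e:
        print(f"{query} failed: {e}")

# Hypothetical larger batch of URLs; in practice the list can come from anywhere
urls_to_scrape = [f"https://www.piloterr.com/page-{i}" for i in range(50)]

# map() submits every URL and waits for all of them to finish; raise max_workers
# to allow more requests in flight at once, within whatever limits the API imposes
with ThreadPoolExecutor(max_workers=10) as executor:
    executor.map(send_request, urls_to_scrape)

print("Process Ended")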
I hope this guide helps you efficiently manage your scraping operations in Python!