How to Use Proxies with Python Requests (HTTP, SOCKS5) With Examples
In Python, the requests library is one of the most popular and convenient libraries for sending HTTP requests. It provides a simple API for specifying proxies, handling authentication, and dealing with multiple connection parameters.
This gets complicated once you work at scale. Scraping usually targets popular websites, and most of them enforce rate limits, blocks, and geo-restrictions on their servers. If you scrape from your own device, you will send far more requests than a normal visitor, and these sites have systems in place to detect such patterns and then block you or throttle your connection.
This is where proxies come in. They help you bypass the restrictions and blocks set up by the websites you want to scrape.
Here, we explore:
- how to set up HTTP and SOCKS5 proxies with requests,
- how to handle authentication and environment variables,
- how to read responses correctly, and
- how to rotate proxies (including IP addresses).
To make it practical, we've also included a worked example of scraping Google Search results.
2. Prerequisites & Installation
If you are a beginner, here are a few basic things you need to have on your computer.
1. Python installed: Make sure Python 3.x is installed on your system. You can check it in your terminal by typing:
python --version
or
python3 --version
2. Install requests and Additional Packages:
Even if you are a beginner, you might be aware of Python packages. They do all the heavy lifting in Python applications, and for automation and scraping tasks, requests is one such library, or in Python lingo, package.
- requests is the core HTTP library.
- requests[socks] is required if you want to handle SOCKS proxies.
- beautifulsoup4 is optional but helpful for parsing HTML responses (like in our Google scraping example).
Install these by running:
pip install requests "requests[socks]" beautifulsoup4
3. Install free-proxy: If you are not sure how proxies work or if you want to test free proxies, the free-proxy package can dynamically fetch free proxies from the internet. However, free proxies are often less stable or reliable (see the sketch after this list).
You can install it with:
pip install free-proxy
4. Proxy Service: If you’re using a static residential proxy, you’ll need the server addresses, ports, and credentials (username and password).
If using free proxies, you’ll likely just have an IP address and port.
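For completeness, here is a minimal sketch of fetching and using a free proxy with the free-proxy package (its FreeProxy class is imported from fp.fp); expect these proxies to fail often:
import requests
from fp.fp import FreeProxy

# Grab one working free proxy (a plain "http://ip:port" string) from public lists
proxy_url = FreeProxy(timeout=1).get()
proxies = {"http": proxy_url, "https": proxy_url}

response = requests.get("http://httpbin.org/ip", proxies=proxies, timeout=5)
print(response.json())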
With these prerequisites in place, you can start coding. Let’s now walk through the process of setting up and using proxies step by step.
3. How to Use an HTTP Proxy with Python Requests
HTTP proxies, as the name suggests, handle the basic web protocols: HTTP and HTTPS connections.
They route your traffic through an intermediary proxy server.
Basic Setup for HTTP Proxy
Below is a minimal example demonstrating how to configure requests to use an HTTP proxy for both HTTP and HTTPS requests:
import requests
proxies = {
"http": "http://username:password@proxyserver:port",
"https": "http://username:password@proxyserver:port"
}
url = "http://httpbin.org/ip"
response = requests.get(url, proxies=proxies)
print(response.json())
Explanation
- proxies dictionary: Contains the configuration for HTTP and HTTPS protocols. Each key (“http” or “https”) is mapped to a string that represents the proxy’s URL. The proxy URL has the format:
http://username:password@proxyserver:port
If your proxy doesn’t require authentication, you can omit the username:password@ part.
- requests.get(url, proxies=proxies): This sends a GET request to the specified url, routing through the proxy described in the proxies dictionary.
- response.json(): We use this to parse the JSON response. httpbin.org/ip returns your current IP address, making it an easy endpoint to test whether the proxy is working correctly.
If the request is successful and the JSON response shows an IP address different from your local IP, then you’re successfully routing your traffic through the proxy.
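A quick way to confirm this is to request the same endpoint with and without the proxy and compare the reported IPs. A minimal sketch, reusing the proxies dictionary from above:
import requests

proxies = {
    "http": "http://username:password@proxyserver:port",
    "https": "http://username:password@proxyserver:port"
}

url = "http://httpbin.org/ip"
direct_ip = requests.get(url, timeout=5).json()["origin"]                    # your real IP
proxied_ip = requests.get(url, proxies=proxies, timeout=5).json()["origin"]  # the proxy's IP

print("Direct:", direct_ip, "| Proxied:", proxied_ip)
print("Proxy is working" if direct_ip != proxied_ip else "Traffic is NOT going through the proxy")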
4. Using SOCKS5 Proxy
Why SOCKS5?
While HTTP proxies operate at the application layer (HTTP/HTTPS), SOCKS5 is more versatile because it operates at a lower level. It doesn’t interpret or modify traffic, so it can handle more types of traffic and is often considered more secure and flexible. SOCKS5 proxies are especially popular for tasks requiring anonymity.
Installing Dependencies
To work with SOCKS proxies in Python’s requests library, you must install the requests[socks] extra:
pip install "requests[socks]"
Configuring a SOCKS5 Proxy
You can specify a SOCKS5 proxy in the proxies dictionary by using the socks5h:// scheme:
import requests
proxies = {
"http": "socks5h://username:password@proxyserver:port",
"https": "socks5h://username:password@proxyserver:port"
}
url = "http://httpbin.org/ip"
response = requests.get(url, proxies=proxies)
print(response.json())
The process here mirrors the HTTP proxy setup, except the scheme is socks5h://. The trailing h tells requests to let the proxy resolve hostnames too, whereas socks5:// resolves DNS locally (see the sketch below). If you omit authentication, you can remove username:password@.
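A minimal sketch contrasting the two schemes; everything except the scheme string stays the same:
import requests

# socks5://  resolves the hostname locally; only the TCP connection goes through the proxy
# socks5h:// hands DNS resolution to the proxy as well, keeping lookups private
proxies_local_dns = {
    "http": "socks5://username:password@proxyserver:port",
    "https": "socks5://username:password@proxyserver:port"
}
proxies_remote_dns = {
    "http": "socks5h://username:password@proxyserver:port",
    "https": "socks5h://username:password@proxyserver:port"
}

print(requests.get("http://httpbin.org/ip", proxies=proxies_remote_dns, timeout=5).json())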
5. Setting Proxies with Environment Variables
Why Use Environment Variables?
If your application is large or if you work with multiple scripts that all require proxies, setting proxies in environment variables is often more maintainable. This way, you can avoid hardcoding proxy details in your scripts.
Configuration on Linux/macOS
You can set environment variables in a terminal session like so:
export HTTP_PROXY="http://username:password@proxyserver:port"
export HTTPS_PROXY="http://username:password@proxyserver:port"
Once set, all your requests calls will automatically use these proxies, as Python’s requests library respects these environment variables by default.
Configuration on Windows
On Windows, you can set environment variables via the set command:
set HTTP_PROXY=http://username:password@proxyserver:port
set HTTPS_PROXY=http://username:password@proxyserver:port
Alternatively, you can set them permanently via System Properties → Environment Variables.
Security Considerations
- Security: Storing credentials in environment variables can be safer than embedding them in code, but be mindful of logging and other processes that might expose them.
- Global Scope: Once set, every requests call uses them, which might not be desirable if you want different proxies for different tasks (see the sketch below).
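If the global scope becomes a problem, you can tell an individual Session to ignore the environment variables, or override them for a single request. A minimal sketch:
import requests

# This session ignores HTTP_PROXY/HTTPS_PROXY (and .netrc) entirely
session = requests.Session()
session.trust_env = False
print(session.get("http://httpbin.org/ip", timeout=5).json())

# A proxies argument passed explicitly takes precedence over the environment
response = requests.get(
    "http://httpbin.org/ip",
    proxies={"http": "http://user:pass@otherproxy:port",
             "https": "http://user:pass@otherproxy:port"},
    timeout=5,
)
print(response.json())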
Proxy Authentication
Why Authentication Matters
Many premium proxy providers require authentication (username and password) for access. This helps them manage bandwidth usage, user quotas, and service reliability. If your proxy requests require credentials, you must ensure these are included in your configuration.
In-Code Authentication Configuration
Simply embed your credentials in the proxy string:
import requests
proxies = {
"http": "http://user:pass@proxyserver:port",
"https": "http://user:pass@proxyserver:port"
}
response = requests.get("http://httpbin.org/ip", proxies=proxies)
print(response.json())
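One gotcha: if the username or password contains characters such as @, : or /, they must be URL-encoded before being embedded in the proxy string, otherwise requests will mis-parse the URL. A minimal sketch using urllib.parse.quote:
import requests
from urllib.parse import quote

username = quote("my.user@example.com", safe="")   # becomes my.user%40example.com
password = quote("p@ss:word", safe="")             # special characters are percent-encoded

proxy_url = f"http://{username}:{password}@proxyserver:port"
proxies = {"http": proxy_url, "https": proxy_url}

response = requests.get("http://httpbin.org/ip", proxies=proxies, timeout=5)
print(response.json())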
Long-Term Use & Static Residential Proxies
When you have a stable or large-scale scraping operation, rotating random free proxies can be unreliable and prone to frequent failure. This can lead to incomplete data or frequent retries.
- Static Residential Proxies: A type of premium proxy service that offers real IP addresses associated with residential ISPs. Websites see your traffic as coming from a legitimate home connection. These proxies tend to have lower block rates and more consistent performance.
- Why They’re Reliable: Static residential proxies maintain a consistent pool of IP addresses for you, reducing captcha triggers or outright bans.
Reading Responses
Interpreting HTTP Responses
When using proxies, you must ensure the response is interpreted correctly. Sometimes, proxy errors or blocks can result in successful status codes even though the content is a block page or an HTML captcha.
response = requests.get("http://httpbin.org/ip", proxies=proxies)
if response.status_code == 200:
print("Success:", response.json())
else:
print("Error with status code:", response.status_code)
- status_code: Typically, 200 is good, but other codes like 403 (Forbidden), 404 (Not Found), 407 (Proxy Authentication Required), or 429 (Too Many Requests) indicate potential issues.
- response.json(): Use this if the content is JSON. If it’s HTML, you might need to parse it with a library like BeautifulSoup to detect the presence of captchas or block pages.
If you receive an error or a suspiciously short response, you might want to retry with a different proxy. This is important in large-scale scraping where each request might fail due to server load, captchas, or proxy downtime.
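A small retry loop that switches proxies and sanity-checks the body before accepting it covers most of these cases. A rough sketch (looks_blocked is a hypothetical heuristic; adapt it to the site you are scraping):
import requests

proxy_list = [
    {"http": "http://user:pass@proxy1:port", "https": "http://user:pass@proxy1:port"},
    {"http": "http://user:pass@proxy2:port", "https": "http://user:pass@proxy2:port"},
]

def looks_blocked(response):
    # Hypothetical check: many block pages mention a captcha somewhere in the HTML
    return "captcha" in response.text.lower()

def fetch(url, retries=3):
    for attempt in range(retries):
        proxies = proxy_list[attempt % len(proxy_list)]   # switch proxy on every attempt
        try:
            response = requests.get(url, proxies=proxies, timeout=5)
            if response.status_code == 200 and not looks_blocked(response):
                return response
        except requests.RequestException:
            pass   # dead or slow proxy: fall through and try the next one
    return None

result = fetch("http://httpbin.org/ip")
print(result.json() if result else "All attempts failed")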
Requests Session with Proxies
Why Use a requests.Session()?
If you’re sending many requests, it’s more efficient to use a Session object. A Session persists cookies and other settings across multiple requests and reuses the underlying connections (connection pooling), reducing overhead.
import requests
proxies = {
"http": "http://user:pass@proxyserver:port",
"https": "http://user:pass@proxyserver:port"
}
session = requests.Session()
session.proxies.update(proxies)
url = "http://httpbin.org/ip"
response = session.get(url)
print(response.json())
Explanation
- Session creation: session = requests.Session() creates a persistent connection object.
- Update proxies: session.proxies.update(proxies) applies the proxy configuration to all requests made through this session.
- Make requests: session.get(url) uses the same underlying TCP connection, saving time and resources compared to repeatedly creating new connections.
Using sessions is not only convenient but can improve performance, especially when you need to make many requests in quick succession.
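Sessions also let you bolt on transport-level retries, so transient proxy hiccups are retried automatically. A minimal sketch using requests' HTTPAdapter with urllib3's Retry (the retry counts and backoff are just reasonable starting values, not official recommendations):
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

proxies = {
    "http": "http://user:pass@proxyserver:port",
    "https": "http://user:pass@proxyserver:port"
}

session = requests.Session()
session.proxies.update(proxies)

# Retry up to 3 times on connection errors and on 429/5xx responses, with exponential backoff
retry = Retry(total=3, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])
adapter = HTTPAdapter(max_retries=retry)
session.mount("http://", adapter)
session.mount("https://", adapter)

print(session.get("http://httpbin.org/ip", timeout=5).json())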
Rotating Proxies with Requests
Why Rotate Proxies?
As we mentioned earlier, when you scrape at scale you will eventually get blocked, because websites can detect automated traffic.
By rotating proxies you change the IP address used for each request (or every few requests), so you appear less suspicious to target websites.
This can significantly extend the lifespan of a scraping session before you run into captchas or bans.
Simple Proxy Rotation Example
Below is a quick, naïve approach to rotating proxies from a predefined list:
import requests
import random
proxy_list = [
    "http://username:password@proxy1:port",
    "http://username:password@proxy2:port",
    "http://username:password@proxy3:port"
]

def get_proxy():
    return {
        "http": random.choice(proxy_list),
        "https": random.choice(proxy_list)
    }

for _ in range(5):
    proxies = get_proxy()
    response = requests.get("http://httpbin.org/ip", proxies=proxies, timeout=5)
    print(response.json())
Breakdown
- proxy_list: We store a list of proxy URLs. These can be HTTP or SOCKS5, as needed.
- get_proxy(): This function picks a random proxy from the list for both HTTP and HTTPS.
- Loop: We run a loop for demonstration, making 5 requests. Each time, we fetch a proxy from the list.
- Timeout: We set a timeout=5 to avoid getting stuck waiting on a slow or dead proxy.
Note: This approach is simple but doesn’t handle scenarios like an unreliable proxy failing mid-request. In production, you’d likely implement retry logic and better error handling.
How to Rotate IPs with Requests
Advanced IP Rotation
When dealing with large-scale scraping or more complicated tasks, random selection may not be enough. You might want a round-robin approach or a proxy pool that is continuously tested and updated.
Round-Robin Approach
import requests
from itertools import cycle
proxy_list = [
"http://username:password@proxy1:port",
"http://username:password@proxy2:port",
"http://username:password@proxy3:port"
]
proxy_pool = cycle(proxy_list)
def fetch_with_proxy(url):
proxy = next(proxy_pool) # get next proxy in the cycle
try:
response = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=5)
return response.json()
except requests.RequestException:
return "Proxy failed, trying next..."
for _ in range(5):
data = fetch_with_proxy("http://httpbin.org/ip")
print(data)
Explanation
- Cycle: We use itertools.cycle to create a round-robin iterator over the proxy list.
- fetch_with_proxy: In each call, we get the next proxy in sequence and attempt the request.
- Error Handling: If the request fails (e.g., timeout or connection error), we handle it gracefully by returning a failure message. You might then attempt the same request again with the next proxy in the sequence (see the sketch after this list).
- Balance: This approach distributes requests more evenly among the proxies, which might be beneficial if your provider has specific rate limits per IP.
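To make that retry idea concrete, here is a sketch that extends the round-robin example: on failure it simply moves on to the next proxy in the cycle, up to a capped number of attempts:
import requests
from itertools import cycle

proxy_list = [
    "http://username:password@proxy1:port",
    "http://username:password@proxy2:port",
    "http://username:password@proxy3:port"
]
proxy_pool = cycle(proxy_list)

def fetch_with_retries(url, max_attempts=3):
    for _ in range(max_attempts):
        proxy = next(proxy_pool)   # advance to the next proxy on every attempt
        try:
            response = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=5)
            response.raise_for_status()
            return response.json()
        except requests.RequestException:
            continue               # failed proxy: loop around and try the next one
    return None

print(fetch_with_retries("http://httpbin.org/ip"))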
Using Premium Services for IP Rotation
Rather than manually rotating proxies, many proxy providers offer auto-rotating proxy endpoints. In this case, you simply point your requests to a single “gateway” proxy, and the provider manages IP rotation on their end. This can be more convenient and reliable, though it usually comes at a higher cost.
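Configuration-wise this is identical to using a single static proxy; only the host points at the provider's rotating gateway. A sketch with a placeholder gateway address (gateway.example-provider.com:10000 is hypothetical, substitute your provider's endpoint):
import requests

# Hypothetical auto-rotating gateway: every request exits from a different IP,
# even though we always connect to the same proxy host and port
gateway = "http://username:password@gateway.example-provider.com:10000"
proxies = {"http": gateway, "https": gateway}

for _ in range(3):
    print(requests.get("http://httpbin.org/ip", proxies=proxies, timeout=5).json())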
Example: Scraping Google Search with Proxies
Why Google is a Challenge
Google aggressively rate-limits automated traffic and will serve captcha pages or 429 responses to clients it suspects of scraping, which makes it a good test case for a proxy setup. Below is a simplified script that queries Google for a keyword and extracts the text from <h3> elements, which often represent result titles:
import requests
from bs4 import BeautifulSoup
import random
proxy_list = [
"http://username:password@proxy1:port",
"http://username:password@proxy2:port",
]
def get_proxy():
return {
"http": random.choice(proxy_list),
"https": random.choice(proxy_list)
}
def scrape_google(query):
url = f"https://www.google.com/search?q={query}"
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
}
proxies = get_proxy()
try:
response = requests.get(url, headers=headers, proxies=proxies, timeout=5)
# Optional: Check for captchas or other indicators in response.text
soup = BeautifulSoup(response.text, "html.parser")
results = [a.text for a in soup.select("h3")]
return results
except requests.RequestException:
return "Proxy failed, retrying with another..."
print(scrape_google("python requests proxy"))
Explanation
- proxy_list: We store a few proxies for demonstration. In practice, you’d have a larger list or a premium service managing rotation.
- get_proxy(): Picks a random proxy for each request.
- scrape_google(query):
- Builds a Google search URL, e.g., https://www.google.com/search?q=python%20requests%20proxy.
- Sends a GET request with a desktop User-Agent to avoid Google’s mobile or simplified pages.
- Parses the response HTML with BeautifulSoup to select <h3> elements, which usually contain the title of each result.
- Returns a list of these title strings.
- Failure Handling: If the request fails because the proxy is down or too slow, we return a message. In a robust scenario, you might switch to the next proxy and retry.
Best Practices for Google Scraping
- Respect Rate Limits: Even with proxies, send requests at a moderate pace (e.g., 1 request every few seconds) to avoid detection.
- User-Agent Rotation: Use random User-Agents to reduce signature detection.
- Captcha Handling: Google might return a captcha page if it suspects automation. Detect this in the response and implement a fallback strategy (a combined sketch follows below).
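Putting these tips together, here is a rough sketch that rotates User-Agents and does a crude captcha check (the "unusual traffic" string is just a common marker on Google's block page; treat it as a heuristic):
import requests
import random
from bs4 import BeautifulSoup

# A few desktop User-Agents to rotate through (extend this list in practice)
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

proxies = {
    "http": "http://username:password@proxyserver:port",
    "https": "http://username:password@proxyserver:port"
}

def search(query):
    headers = {"User-Agent": random.choice(user_agents)}   # new User-Agent per request
    response = requests.get(
        f"https://www.google.com/search?q={query}",
        headers=headers, proxies=proxies, timeout=10,
    )
    # Crude captcha/block detection; back off or switch proxies when it triggers
    if response.status_code == 429 or "unusual traffic" in response.text.lower():
        return None
    soup = BeautifulSoup(response.text, "html.parser")
    return [h3.text for h3 in soup.select("h3")]

print(search("python requests proxy"))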
Most Common Errors When Using Proxies
When dealing with proxies, you’ll occasionally encounter errors. Here are some of the most frequent ones:
- 407 Proxy Authentication Required
- Cause: Invalid or missing proxy credentials for an authenticated proxy.
- Solution: Double-check your username and password. Ensure you’ve included them correctly: http://user:pass@proxyserver:port.
- 403 Forbidden
- Cause: The server has blocked your IP. This often happens if the target website denies your proxy IP range or if you’re scraping too aggressively.
- Solution: Switch proxies, reduce request frequency, or consider a higher-quality proxy network.
- Connection Timeout
- Cause: The proxy is too slow or unresponsive.
- Solution: Use a shorter timeout and retry with a different proxy. Or ensure your proxy provider has enough bandwidth for your tasks.
- SSL Errors
- Cause: Some proxies may not handle TLS/SSL seamlessly.
- Solution: For debugging, you can temporarily disable SSL verification:
requests.get(url, proxies=proxies, verify=False)
However, disabling verification isn’t recommended in production because it weakens security; a safer alternative is shown in the sketch after this list.
- 429 Too Many Requests
- Cause: The target website is throttling or rate-limiting your IP (or your entire proxy subnet).
- Solution: Decrease your request rate, increase the variety of IP addresses, and adopt more advanced rotation strategies.
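On the SSL point above, a safer alternative than verify=False is to point verify at the proxy provider's CA bundle, if they supply one. A minimal sketch (the certificate path is a placeholder):
import requests

proxies = {
    "http": "http://user:pass@proxyserver:port",
    "https": "http://user:pass@proxyserver:port"
}

# verify can take the path to a CA bundle instead of True/False
response = requests.get(
    "https://httpbin.org/ip",
    proxies=proxies,
    verify="/path/to/proxy-ca.pem",   # placeholder path to the provider's CA certificate
)
print(response.json())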