Python Requests
last modified May 9, 2026
In this article, we explore practical techniques for working with the Python Requests module, one of the most widely used libraries for interacting with web resources. Step by step, we demonstrate how to retrieve data from remote servers, submit form data, upload JSON payloads, stream large responses efficiently, and establish secure HTTPS connections. Each concept is illustrated with clear, real-world examples.
To make the examples concrete, we interact with several types of back-end services: a public online API, an Nginx server configured for local testing, Python's built-in HTTP server, and a small Flask application. This variety shows how Requests behaves across different environments and how to adapt your code to each scenario.
The Hypertext Transfer Protocol (HTTP) is the underlying application protocol that enables communication across the Web. It defines how clients and servers exchange messages, how resources are addressed, and how data is transferred in a reliable, extensible way. Because HTTP is the foundation of nearly all modern web services, understanding how to work with it programmatically is essential for tasks such as automation, data collection, testing, and system integration.
Python requests
Requests is a simple and elegant Python HTTP library. It provides
methods for accessing Web resources via HTTP. The library abstracts away the
low-level details of working with sockets, headers, and query strings, giving
the programmer a clean and intuitive API. With a few lines of code, we can send
GET and POST requests, handle cookies, upload files, work with JSON data, or
communicate with RESTful services. Requests focuses on readability
and convenience, making everyday HTTP tasks straightforward and pleasant to
write.
Library version
The first program prints the version of the Requests library.
#!/usr/bin/python

import requests

print(requests.__version__)
print(requests.__copyright__)
The program prints the version and copyright of Requests.
$ ./version.py
2.33.1
Copyright Kenneth Reitz
Reading a web page
The get function issues a GET request and returns a
Response object containing the status code, headers, and body of
the server's reply.
#!/usr/bin/python

import requests

url = "https://example.com"
resp = requests.get(url)
print(resp.text)
The text attribute exposes the body decoded as Unicode;
content gives the same data as raw bytes.
Resource management
When requests.get is called without a with block,
the underlying TCP connection is not closed immediately after the response is
read — it is returned to an internal connection pool and kept alive for
potential reuse. For a short script like this one that is perfectly fine; the
interpreter releases everything on exit. In longer-running programs, however,
responses that are never explicitly closed can hold sockets open longer than
necessary, and under high request volume that can exhaust the connection pool.
#!/usr/bin/python

import requests

with requests.get("https://example.com") as resp:
    resp.raise_for_status()
    print(resp.text)
The with block calls resp.close() automatically on
exit — whether the body was read successfully or an exception was raised —
returning the underlying TCP socket to the connection pool immediately. The
raise_for_status call is added here as a matter of habit: any
production code that reads a page should verify it actually received a valid
response before trying to use the content.
Stripping HTML tags
The following program gets a small web page and strips its HTML tags.
#!/usr/bin/python

import requests
import re

url = "https://example.com"

with requests.get(url) as resp:
    resp.raise_for_status()
    content = resp.text

stripped = re.sub('<[^<]+?>', '', content)
print(stripped)
The script strips the HTML tags of the https://example.com
web page.
stripped = re.sub('<[^<]+?>', '', content)
A simple regular expression is used to strip the HTML tags. For more complex HTML documents, consider using a library like Beautiful Soup instead of regular expressions for more robust parsing.
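As a middle ground between the regular expression and a full library such as Beautiful Soup, the standard library's html.parser module can also extract the text. The following is a minimal sketch rather than a complete solution; the names TextExtractor and strip_tags are illustrative and not part of the script above, and skipping the contents of script and style elements is an assumption about what should count as page text.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep the text only when we are not inside a skipped element.
        if not self._skip_depth:
            self.parts.append(data)


def strip_tags(html):
    """Return the text content of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(strip_tags("<p>Hello <b>world</b></p>"))
```

In the script above, strip_tags(resp.text) could take the place of the re.sub call.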
Getting status
The Response object contains a server's response to an HTTP
request. Its status_code attribute holds the HTTP status code of the
response, such as 200 or 404.
#!/usr/bin/python

import requests

url = "https://example.com"
with requests.get(url) as resp:
    print(resp.status_code)

url = "https://example.com/news"
with requests.get(url) as resp:
    print(resp.status_code)
We perform two HTTP requests with the get method
and check for the returned status.
$ ./get_status.py
200
404
200 is the standard response for a successful HTTP request; 404 indicates that the requested resource could not be found.
The Response object
Every request method — get, post, put,
and the rest — returns a Response object. It contains everything
the server sent back: the status code, headers, body, and metadata about the
exchange itself. The table below covers the attributes and methods you will
reach for most often.
| Attribute / method | Description |
|---|---|
| status_code (int) | The HTTP status code returned by the server, such as 200, 301, or 404. Use raise_for_status() to turn error codes into exceptions rather than checking this value manually. |
| ok (bool) | True when status_code is less than 400, False otherwise. Convenient for a quick success check, but raise_for_status() is safer in production code because it forces the caller to handle the failure explicitly rather than risk silently ignoring it. |
| headers (CaseInsensitiveDict) | The response headers as a dictionary-like object. Key lookup is case-insensitive, so resp.headers['content-type'] and resp.headers['Content-Type'] are equivalent. Use .get(key, default) to avoid a KeyError when a header may be absent. |
| text (str) | The response body decoded as a Unicode string. The encoding is inferred from the Content-Type header or, if the header is absent, detected from the body by a character-detection library (charset_normalizer or chardet). Suitable for HTML, JSON, XML, and any other text-based content. |
| content (bytes) | The raw, undecoded response body as a byte string. Use this for binary responses such as images, PDFs, or ZIP files, and when passing the body to a parser that expects bytes (e.g. lxml.html.fromstring). |
| json() (Any) | Decodes the response body as JSON and returns the corresponding Python object, typically a dict or list. Raises a ValueError if the body is not valid JSON, regardless of the Content-Type header. Equivalent to json.loads(resp.text) but raises a more descriptive exception on failure. |
| encoding (str) | The encoding used to decode the body when accessing text. Inferred from the Content-Type header by default. Can be set explicitly before accessing text if the server returns an incorrect or missing charset declaration: resp.encoding = 'utf-8'. |
| url (str) | The final URL of the response after all redirects have been followed. Useful when the original URL was a shortlink or a redirect that resolves to a canonical address. |
| history (list[Response]) | A list of Response objects for any redirects that occurred before reaching the final response, ordered oldest to newest. Empty when no redirects took place. Each entry has its own status_code and headers, making it possible to inspect the full redirect chain. |
| cookies (RequestsCookieJar) | Cookies set by the server in this response, exposed as a dictionary-like object. Can be passed directly to the cookies parameter of a subsequent request to send them back, or merged into a Session for automatic handling across all future requests. |
| elapsed (timedelta) | The time between sending the request and receiving the first byte of the response headers. Does not include the time spent reading the body. Useful for basic performance measurement and for logging slow requests. Access the value in seconds with resp.elapsed.total_seconds(). |
| request (PreparedRequest) | The PreparedRequest object that was sent to produce this response. Useful for debugging: resp.request.headers shows the exact headers that left the client, and resp.request.body shows the serialised request body. |
| raise_for_status() (None) | Raises requests.exceptions.HTTPError if the status code indicates a client error (4xx) or server error (5xx). Does nothing for successful responses (2xx) and redirects (3xx). The exception carries the original Response object in its response attribute, so the status code and body are still accessible inside the handler. |
The following example exercises several of these attributes against
httpbin.org, which returns a JSON description of the request it
received — making it straightforward to verify what the client actually sent.
#!/usr/bin/python

import requests

url = "https://httpbin.org/get"

try:
    with requests.get(url, params={"lang": "python"}, timeout=(5, 10)) as resp:
        resp.raise_for_status()

        print(f"Status code: {resp.status_code}")
        print(f"OK: {resp.ok}")
        print(f"Final URL: {resp.url}")
        print(f"Redirects: {len(resp.history)}")
        print(f"Elapsed: {resp.elapsed.total_seconds():.3f} s")
        print(f"Encoding: {resp.encoding}")
        print(f"Content-Type: {resp.headers.get('content-type', 'n/a')}")
        print(f"Sent headers: {dict(resp.request.headers)}")
        print()

        print("Body (JSON):")
        data = resp.json()
        for key, value in data.items():
            print(f" {key}: {value}")

except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
The program sends a GET request to httpbin.org/get with a query
parameter lang=python. The resp.json() method parses
the JSON response body and returns a Python dictionary. The output shows the
status code, final URL, headers sent by the client, and the JSON body returned
by the server, which includes the query parameters and other request details.
$ ./response_object.py
Status code: 200
OK: True
Final URL: https://httpbin.org/get?lang=python
Redirects: 0
Elapsed: 0.621 s
Encoding: utf-8
Content-Type: application/json
Sent headers: {'User-Agent': 'python-requests/2.33.1', 'Accept-Encoding':
'gzip, deflate, zstd', 'Accept': '*/*', 'Connection': 'keep-alive'}
Body (JSON):
args: {'lang': 'python'}
...
resp.request.headers shows the headers the library added
automatically (User-Agent, Accept-Encoding, Accept, and
Connection) without any explicit configuration. These defaults
come from the session's default headers and are merged into the
PreparedRequest before it leaves the client; they can be overridden
by passing a headers dictionary to the request method.
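The effect of a custom headers dictionary can be checked without any network traffic by preparing a request and inspecting it before it is sent. The sketch below assumes a throwaway Session and a made-up client name, my-script/1.0; neither appears in the example above.

```python
import requests

# Build the request but do not send it; the URL is only a placeholder here.
session = requests.Session()
req = requests.Request(
    "GET",
    "https://httpbin.org/get",
    headers={"User-Agent": "my-script/1.0"},  # hypothetical client name
)

# prepare_request merges the session's default headers with our own.
prepared = session.prepare_request(req)

for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```

The User-Agent entry carries our value, while the remaining defaults, such as Accept and Accept-Encoding, are still present.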
Python requests raise_for_status
The raise_for_status method inspects the HTTP response code and
raises an HTTPError exception for any response that signals a
client error (4xx) or server error (5xx). For
successful responses (2xx) it does nothing, making it a concise
way to treat bad status codes as exceptions without manually checking
resp.status_code after every request.
While resp.status_code simply exposes the raw integer returned by
the server, raise_for_status adds a decision on top of it — turning
error codes into exceptions so they cannot be silently ignored. Using
status_code directly requires an explicit check such as
if resp.status_code == 200 or if resp.ok after every
request, and it is easy to forget or handle inconsistently across a codebase.
raise_for_status centralises that logic in one call: successful
responses pass through untouched, while anything in the 4xx or
5xx range immediately raises an HTTPError that must
be handled — or will propagate up the call stack, making the failure visible
rather than hidden behind a status integer that nobody checked.
#!/usr/bin/python

import requests

urls = [
    "https://example.com",             # 200 OK, succeeds
    "https://httpbin.org/status/404",  # 404 Not Found, client error
    "https://httpbin.org/status/500",  # 500 Internal Server Error, server error
]

for url in urls:
    print(f"GET {url}")
    try:
        resp = requests.get(url)
        resp.raise_for_status()
        print(f" Status code: {resp.status_code} — OK\n")
    except requests.exceptions.HTTPError as e:
        print(f" HTTP error: {e}\n")
The httpbin.org/status/{code} endpoint returns whatever HTTP status
code is embedded in the URL, making it ideal for testing error-handling paths
without needing a broken server. The 404 triggers a client-error
branch and the 500 a server-error branch — both surfaced as an
HTTPError with the status code and reason phrase included in the
message.
$ ./raise_for_status.py
GET https://example.com
 Status code: 200 — OK

GET https://httpbin.org/status/404
 HTTP error: 404 Client Error: NOT FOUND for url: https://httpbin.org/status/404

GET https://httpbin.org/status/500
 HTTP error: 500 Server Error: INTERNAL SERVER ERROR for url: https://httpbin.org/status/500
HTTPError is a subclass of requests.exceptions.RequestException,
the base class for every exception the library raises. If you need to distinguish
between client and server errors, inspect e.response.status_code
inside the handler — values in the 400–499 range indicate a problem
with the request itself, while 500–599 point to a fault on the
server side.
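A short sketch of that check follows. To keep it runnable offline, it builds bare Response objects by hand and sets only their status code; describe_http_error and fake_response are illustrative helpers, not part of Requests, and in real code the exception would come from an actual request.

```python
import requests


def describe_http_error(exc):
    """Label an HTTPError as a client-side or server-side failure."""
    status = exc.response.status_code
    if 400 <= status < 500:
        return f"client error {status}"
    if 500 <= status < 600:
        return f"server error {status}"
    return f"unexpected status {status}"


def fake_response(status_code):
    """Build a bare Response for offline demonstration only."""
    resp = requests.Response()
    resp.status_code = status_code
    resp.url = "https://example.com/demo"  # placeholder URL
    return resp


labels = []
for code in (404, 503):
    try:
        fake_response(code).raise_for_status()
    except requests.exceptions.HTTPError as e:
        labels.append(describe_http_error(e))

print(labels)
```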
RequestException
Every exception the requests library raises is a subclass of
requests.exceptions.RequestException, the root of its exception
hierarchy. Catching it in a single except clause is therefore
sufficient to handle any transport-level failure — but knowing the subclasses
lets you respond to each failure mode appropriately rather than treating all
errors the same way.
The hierarchy below shows the most commonly encountered exceptions and their inheritance relationships.
RequestException
├── ConnectionError
│   ├── ProxyError
│   └── SSLError
├── HTTPError            # raised by resp.raise_for_status()
├── URLRequired
├── TooManyRedirects
├── Timeout
│   ├── ConnectTimeout
│   └── ReadTimeout
└── InvalidURL
The most specific exceptions are at the bottom of the tree, and the most general at the top. This means that when you want to handle different failure modes separately, the specific exceptions must be caught before the more general ones — otherwise the base class will catch every error and make the specific branches unreachable.
#!/usr/bin/python

import requests

URLS = [
    "https://httpbin.org/status/404",  # HTTPError
    "https://httpbin.org/delay/10",    # ReadTimeout
    "https://httpbin.org/get",         # success
    "https://invalid.invalid",         # ConnectionError
    "http://",                         # InvalidURL
]

for url in URLS:
    print(f"GET {url}")
    try:
        with requests.get(url, timeout=(5, 3)) as resp:
            resp.raise_for_status()
            print(f" OK — status {resp.status_code}\n")

    except requests.exceptions.ConnectTimeout:
        print(" ConnectTimeout — server did not accept the connection in time\n")
    except requests.exceptions.ReadTimeout:
        print(" ReadTimeout — server connected but stalled mid-response\n")
    except requests.exceptions.HTTPError as e:
        print(f" HTTPError — {e}\n")
    except requests.exceptions.SSLError as e:
        print(f" SSLError — certificate or handshake failure: {e}\n")
    except requests.exceptions.ProxyError as e:
        print(f" ProxyError — could not connect through proxy: {e}\n")
    except requests.exceptions.ConnectionError as e:
        print(f" ConnectionError — DNS failure or refused connection: {e}\n")
    except requests.exceptions.TooManyRedirects:
        print(" TooManyRedirects — redirect loop detected\n")
    except requests.exceptions.InvalidURL as e:
        print(f" InvalidURL — malformed URL: {e}\n")
    except requests.exceptions.RequestException as e:
        print(f" Unexpected error: {e}\n")
The exceptions are ordered from most specific to most general, which is how
Python resolves except clauses — top to bottom, first match wins.
Because ConnectTimeout and ReadTimeout are both
subclasses of Timeout, and Timeout itself is a
subclass of RequestException, placing the base class first would
swallow every subclass and make the specific branches unreachable.
$ ./exceptions.py
GET https://httpbin.org/status/404
 HTTPError — 404 Client Error: NOT FOUND for url: https://httpbin.org/status/404

GET https://httpbin.org/delay/10
 ReadTimeout — server connected but stalled mid-response

GET https://httpbin.org/get
 OK — status 200

GET https://invalid.invalid
 ConnectionError — DNS failure or refused connection: ...

GET http://
 InvalidURL — malformed URL: ...
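The ordering rule can be verified directly with issubclass. One detail worth knowing is that ConnectTimeout inherits from both ConnectionError and Timeout, which is why its handler has to come before the ConnectionError handler as well. The first_match helper below is an illustrative stand-in that mimics how an except chain picks a handler; it is not part of Requests.

```python
import requests

exc = requests.exceptions

# ConnectTimeout has two parents, so either base-class handler would catch it.
print(issubclass(exc.ConnectTimeout, exc.ConnectionError))
print(issubclass(exc.ConnectTimeout, exc.Timeout))

# ReadTimeout is a Timeout only.
print(issubclass(exc.ReadTimeout, exc.ConnectionError))


def first_match(error, handlers):
    """Return the name of the first class that would catch error."""
    for cls in handlers:
        if isinstance(error, cls):
            return cls.__name__
    return None


error = exc.ConnectTimeout("simulated")
good_order = [exc.ConnectTimeout, exc.ConnectionError, exc.RequestException]
bad_order = [exc.ConnectionError, exc.ConnectTimeout, exc.RequestException]

print(first_match(error, good_order))
print(first_match(error, bad_order))
```

With the second ordering, the ConnectionError branch wins and the ConnectTimeout branch is never reached.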
When to catch the base class
In code where the reaction to any failure is the same — log the error, return a
default value, show a generic message to the user — catching
RequestException alone keeps the handler concise:
import requests

def fetch(url):
    try:
        with requests.get(url, timeout=(5, 10)) as resp:
            resp.raise_for_status()
            return resp.json()
    except requests.exceptions.RequestException as e:
        print(f"Could not fetch {url}: {e}")
        return None
This pattern is appropriate for helper functions whose callers do not care about the specific cause of failure — only whether a result was returned. When the caller needs to distinguish between a server being down and a server returning a bad response, the specific subclasses should be caught and re-raised or handled individually instead.
HTTPError and raise_for_status
HTTPError is the one exception in the hierarchy that is never
raised automatically — it only fires when you explicitly call
resp.raise_for_status(). A 404 or
500 response does not raise anything on its own; without that call,
resp.status_code simply holds the error code and execution
continues normally. This is by design — some applications treat
404 as a valid negative result rather than a failure — but it means
that omitting raise_for_status in code that expects a successful
response can lead to subtle bugs where an error body is silently processed as
real data.
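For code that does treat 404 as a valid negative result, one possible arrangement is to test for that single status first and let raise_for_status handle everything else. The helper name json_or_none is hypothetical and the function is only a sketch of the idea.

```python
import requests


def json_or_none(resp):
    """Return the decoded JSON body, or None when the resource does not exist."""
    if resp.status_code == 404:
        return None          # an expected "not found" answer, not a failure
    resp.raise_for_status()  # any other 4xx or 5xx status is still an error
    return resp.json()


# Typical use (not executed here, since it needs a network connection):
#
#     with requests.get(url, timeout=(5, 10)) as resp:
#         record = json_or_none(resp)
```

The caller then distinguishes three outcomes: a decoded object, None for a missing resource, and an HTTPError for any other error status.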
HTTP request methods
An HTTP request is a message sent from a client to a server asking it
to perform a specific action. The action is identified by a
method — the most common being GET (retrieve a resource),
POST (submit data), PUT (replace a resource), and
HEAD (retrieve headers only, without a body). The
requests library exposes each method as a dedicated function:
requests.get, requests.post,
requests.put, requests.head, and so on. All of them
are thin wrappers around the lower-level requests.request, which
accepts the method name as an explicit string argument when none of the
shortcuts fit.
#!/usr/bin/python

import requests

url = "https://example.com"

with requests.head(url, timeout=(5, 10)) as resp:
    resp.raise_for_status()

    print("Server: ", resp.headers.get("server", "n/a"))
    print("Last modified: ", resp.headers.get("last-modified", "n/a"))
    print("Content type: ", resp.headers.get("content-type", "n/a"))
A HEAD request asks the server to return only the response headers,
omitting the body entirely. This makes it useful for checking metadata — content
type, last-modified date, server software — without paying the cost of
transferring the full document. The headers are accessed through the
resp.headers dictionary, which is case-insensitive. Using
.get with a fallback of "n/a" avoids a
KeyError when a header is absent, since not every server includes
all three fields.
$ ./head_request.py
Server:  cloudflare
Last modified:  Wed, 06 May 2026 14:17:14 GMT
Content type:  text/html
Python requests get method
The get method issues a GET request to the server.
The GET method requests a representation of the specified resource.
httpbin.org is a freely available HTTP request and response service.
#!/usr/bin/python

import requests

url = "https://httpbin.org/get?name=Peter"

with requests.get(url) as resp:
    resp.raise_for_status()
    print(resp.text)
The script sends a variable with a value to the httpbin.org
server. The variable is specified directly in the URL.
$ ./mget.py
{
"args": {
"name": "Peter"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate, zstd",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.33.1",
"X-Amzn-Trace-Id": "Root=1-69ff0896-272a70fa29b81c2556b6fffb"
},
...
"url": "https://httpbin.org/get?name=Peter"
}
#!/usr/bin/python

import requests

payload = {'name': 'Peter', 'age': 23}
url = "https://httpbin.org/get"

with requests.get(url, params=payload) as resp:
    resp.raise_for_status()
    print(resp.url)
    print(resp.text)
The get method takes a params parameter where
we can specify the query parameters.
payload = {'name': 'Peter', 'age': 23}
The data is sent in a Python dictionary.
with requests.get(url, params=payload) as resp:
We send a GET request to the httpbin.org site and
pass the data, which is specified in the params parameter.
print(resp.url)
print(resp.text)
We print the URL and the response content to the console.
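To see how the params dictionary ends up in the query string without contacting the server, we can prepare the request and look at its URL. This is a small offline sketch; the second dictionary, with a space, an ampersand, and a list value, is made up for illustration.

```python
import requests

payload = {"name": "Peter", "age": 23}
prepared = requests.Request(
    "GET", "https://httpbin.org/get", params=payload
).prepare()
print(prepared.url)

# Special characters are percent-encoded, and a list yields a repeated key.
tricky = {"q": "a b&c", "tag": ["x", "y"]}
prepared_tricky = requests.Request(
    "GET", "https://httpbin.org/get", params=tricky
).prepare()
print(prepared_tricky.url)
```

Letting the library build the query string this way avoids manual escaping mistakes that are easy to make when the URL is assembled by hand.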
Python requests timeout parameter
The timeout parameter defines how long (in seconds) the client
waits for a server response before raising an exception. Without it, a request
can hang indefinitely if the remote server is slow or unresponsive — making it
essential for any production code.
The timeout applies to two distinct phases: the connection phase
(establishing the TCP handshake) and the read phase (waiting
for the server to send data). You can control both independently by passing a
tuple (connect, read), or set a single value that applies to each
phase separately.
#!/usr/bin/python

import requests

TIMEOUT = (3, 5)

urls = [
    "https://example.com",           # responds immediately
    "https://httpbin.org/delay/10",  # deliberately delays 10 s, will time out
]

for url in urls:
    print(f"GET {url}")
    try:
        with requests.get(url, timeout=TIMEOUT) as resp:
            resp.raise_for_status()
            print(f" Status code: {resp.status_code}\n")
    except requests.exceptions.ConnectTimeout:
        print(" Connection timed out — server not reachable\n")
    except requests.exceptions.ReadTimeout:
        print(" Read timed out — server connected but did not respond in time\n")
    except requests.exceptions.Timeout:
        print(" Request timed out\n")
The first URL responds normally within the allowed window. The second points to
httpbin.org/delay/10, which deliberately waits 10 seconds before
replying — well beyond the 5-second read timeout — so a ReadTimeout
is raised instead of receiving a response.
TIMEOUT = (3, 5)
The timeout is set to 3 seconds for the connection phase and 5 seconds for the read phase.
$ ./timeout_request.py
GET https://example.com
 Status code: 200

GET https://httpbin.org/delay/10
 Read timed out — server connected but did not respond in time
Both ConnectTimeout and ReadTimeout are subclasses of
requests.exceptions.Timeout, so catching the base class alone is
sufficient when you do not need to distinguish between the two failure modes.
Splitting them — as shown above — lets you react differently: a
ReadTimeout against a live server may be worth retrying, while a
ConnectTimeout against an unreachable host usually is not.
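One way to act on that distinction is a small retry helper that retries only on ReadTimeout and lets ConnectTimeout propagate at once. The sketch below is illustrative: the function name get_with_retry, its default values, the linear back-off, and the injectable get argument (which makes the helper easy to test without a network) are all assumptions, not part of the Requests API.

```python
import time

import requests


def get_with_retry(url, attempts=3, backoff=0.5, timeout=(3, 5), get=requests.get):
    """GET url, retrying only when the server stalls (ReadTimeout)."""
    for attempt in range(1, attempts + 1):
        try:
            resp = get(url, timeout=timeout)
            resp.raise_for_status()
            return resp
        except requests.exceptions.ReadTimeout:
            if attempt == attempts:
                raise                      # out of attempts: let the caller see it
            time.sleep(backoff * attempt)  # simple linear back-off
        # ConnectTimeout is deliberately not caught here, so an unreachable
        # host fails on the first attempt.
```

A call such as get_with_retry("https://httpbin.org/delay/2") would make up to three attempts. Retrying is generally only safe for idempotent requests such as GET.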
Python requests redirection
Redirection is a process of forwarding one URL to a different URL. The HTTP response status code 301 Moved Permanently is used for permanent URL redirection; 302 Found for a temporary redirection.
#!/usr/bin/python

import requests

url = "https://httpbin.org/redirect-to?url=/"

with requests.get(url) as resp:
    resp.raise_for_status()
    print(resp.status_code)
    print(resp.history)
    print(resp.url)
In the example, we issue a GET request to the
https://httpbin.org/redirect-to page. This page redirects to
another page; redirect responses are stored in the
history attribute of the response.
$ ./redirect.py
200
[<Response [302]>]
https://httpbin.org/
A GET request to https://httpbin.org/redirect-to was 302 redirected to
https://httpbin.org.
In the second example, we do not follow a redirect.
#!/usr/bin/python

import requests

url = "https://httpbin.org/redirect-to?url=/"

with requests.get(url, allow_redirects=False) as resp:
    resp.raise_for_status()
    print(resp.status_code)
    print(resp.url)
The allow_redirects parameter specifies whether the redirect
is followed; the redirects are followed by default.
$ ./redirect2.py
302
https://httpbin.org/redirect-to?url=/
Redirect with nginx
In the next example, we show how to set up a page redirect in the nginx server.
location = /oldpage.html {
    return 301 /newpage.html;
}
Add these lines to the nginx configuration file, which is located at
/etc/nginx/sites-available/default on Debian.
$ sudo service nginx restart
After the file has been edited, we must restart nginx to apply the changes.
<!DOCTYPE html>
<html>
<head>
<title>Old page</title>
</head>
<body>
<p>
This is old page
</p>
</body>
</html>
This is the oldpage.html file located in the nginx document root.
<!DOCTYPE html>
<html>
<head>
<title>New page</title>
</head>
<body>
<p>
This is a new page
</p>
</body>
</html>
This is the newpage.html.
#!/usr/bin/python

import requests

url = "http://localhost/oldpage.html"

with requests.get(url) as resp:
    resp.raise_for_status()
    print(resp.status_code)
    print(resp.history)
    print(resp.url)
    print(resp.text)
This script accesses the old page and follows the redirect. As we already mentioned, Requests follows redirects by default.
$ ./redirect3.py
200
(<Response [301]>,)
http://localhost/files/newpage.html
<!DOCTYPE html>
<html>
<head>
<title>New page</title>
</head>
<body>
<p>
This is a new page
</p>
</body>
</html>
$ sudo tail -2 /var/log/nginx/access.log
127.0.0.1 - - [21/Jul/2019:07:41:27 -0400] "GET /oldpage.html HTTP/1.1" 301 184 "-" "python-requests/2.4.3 CPython/3.4.2 Linux/3.16.0-4-amd64"
127.0.0.1 - - [21/Jul/2019:07:41:27 -0400] "GET /newpage.html HTTP/1.1" 200 109 "-" "python-requests/2.4.3 CPython/3.4.2 Linux/3.16.0-4-amd64"
As we can see from the access.log file, the request was redirected
to a new file name. The communication consisted of two GET requests.
User agent
A user agent is a short identification string that an HTTP client
sends to a server in the User-Agent header. Every browser, crawler,
and script has one. It tells the server who is making the request and often
what capabilities that client has. Web servers use this information for
logging, analytics, content negotiation, or to apply special handling for
specific clients.
When we write our own Python HTTP client, we can choose any user-agent name we like. This makes it easy to distinguish our requests from browsers, automated tools, or other scripts in server logs. It also helps when debugging, because the server can immediately see that the request came from our custom client rather than from Chrome, Firefox, or a bot.
#!/usr/bin/python

from http.server import BaseHTTPRequestHandler, HTTPServer


class MyHandler(BaseHTTPRequestHandler):

    def do_GET(self):

        message = "Hello there"

        self.send_response(200)

        if self.path == "/agent":
            message = self.headers["user-agent"]

        self.send_header("Content-type", "text/html")
        self.end_headers()

        self.wfile.write(bytes(message, "utf8"))

        return


if __name__ == "__main__":
    with HTTPServer(("127.0.0.1", 8081), MyHandler) as server:
        print("Listening on http://127.0.0.1:8081")
        server.serve_forever()
We have a simple Python HTTP server.
if self.path == "/agent":
    message = self.headers["user-agent"]
If the path is /agent, we return the user agent that the client
sent in its request.
#!/usr/bin/python

import requests

headers = {'user-agent': 'Python script'}
url = "http://localhost:8081/agent"

with requests.get(url, headers=headers) as resp:
    resp.raise_for_status()
    print(resp.text)
This script creates a simple GET request to our Python HTTP server.
To add HTTP headers to a request, we pass in a dictionary to the
headers parameter.
headers = {'user-agent': 'Python script'}
The header values are placed in a Python dictionary.
with requests.get(url, headers=headers) as resp:
The values are passed to the headers parameter.
$ ./http_server.py
Listening on http://127.0.0.1:8081
First, we start the server.
$ ./user_agent.py
Python script
Then we run the script. The server responded with the name of the agent that we have sent with the request.
Python requests post value
The post method sends a POST request to the given
URL; the key/value pairs passed in data are sent as form fields.
#!/usr/bin/python

import requests

data = {'name': 'Peter'}
url = "https://httpbin.org/post"

with requests.post(url, data=data) as resp:
    resp.raise_for_status()
    print(resp.text)
The script sends a POST request with a name key set to the value
Peter. The request is issued with the post
method.
$ ./post_value.py
{
"args": {},
"data": "",
"files": {},
"form": {
"name": "Peter"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "10",
"Content-Type": "application/x-www-form-urlencoded",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.21.0"
},
"json": null,
...
}
This is the output of the post_value.py script.
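The Content-Type and Content-Length values in that output can be reproduced offline by preparing the same request and inspecting it before anything is sent. This is a minimal sketch that uses the Request and PreparedRequest classes directly instead of the post shortcut.

```python
import requests

prepared = requests.Request(
    "POST", "https://httpbin.org/post", data={"name": "Peter"}
).prepare()

print(prepared.body)                       # the URL-encoded form body
print(prepared.headers["Content-Type"])
print(prepared.headers["Content-Length"])
```

The body is the string name=Peter, which is ten characters long and matches the Content-Length reported by httpbin.org.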
Python requests upload image
In the following example, we are going to upload an image. We create a web application with Flask.
#!/usr/bin/python

import os
from flask import Flask, request

app = Flask(__name__)


@app.route("/")
def home():
    return 'This is home page'


@app.route("/upload", methods=['POST'])
def handleFileUpload():

    msg = 'failed to upload image'

    if 'image' in request.files:

        photo = request.files['image']

        if photo.filename != '':

            photo.save(os.path.join('.', photo.filename))
            msg = 'image uploaded successfully'

    return msg


if __name__ == '__main__':
    app.run()
This is a simple application with two endpoints. The /upload
endpoint checks if there is some image and saves it to the current directory.
#!/usr/bin/python

import requests

url = 'http://localhost:5000/upload'
image_file = 'data/sid.jpg'

with open(image_file, 'rb') as f:
    files = {'image': f}

    with requests.post(url, files=files) as resp:
        resp.raise_for_status()
        print(resp.text)
We send the image to the Flask application. The file is passed
in the files parameter of the post method.
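To see what the files parameter produces on the wire, the upload can be prepared without a running Flask server. The sketch below substitutes a few placeholder bytes for the real image and uses the (filename, content, content type) tuple form of the files value; the byte string and the explicit image/jpeg type are assumptions made for illustration.

```python
import requests

fake_image = b"\xff\xd8\xff\xe0 placeholder, not a real JPEG"
files = {"image": ("sid.jpg", fake_image, "image/jpeg")}

prepared = requests.Request(
    "POST", "http://localhost:5000/upload", files=files
).prepare()

# The body is multipart/form-data with a generated boundary string.
print(prepared.headers["Content-Type"])
print(len(prepared.body), "bytes")
print(b'filename="sid.jpg"' in prepared.body)
```

The field name image in the multipart body is the key that the Flask handler looks up in request.files.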
JSON
JSON is a
lightweight text format for exchanging structured data. It maps directly onto
Python dictionaries and lists, making it the natural choice for HTTP APIs:
the requests library can both decode incoming JSON and serialise
outgoing data automatically, without any manual calls to the
json module.
Reading JSON from a server
When the server sets Content-Type: application/json,
resp.json() decodes the body and returns a Python object — a
dictionary, list, or scalar, depending on what the server sent. It is
equivalent to calling json.loads(resp.text) but raises a more
informative exception if decoding fails.
#!/usr/bin/python

import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):

    def do_GET(self):
        payload = json.dumps({"name": "Jane", "age": 17}).encode()

        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__":
    with HTTPServer(("127.0.0.1", 8000), Handler) as server:
        print("Listening on http://127.0.0.1:8000")
        server.serve_forever()
This is a simple HTTP server that returns a JSON response to any GET request.
The Content-Type header is set to application/json to
indicate that the response body is JSON. The body itself is a JSON-encoded
dictionary containing a name and age.
#!/usr/bin/python

import requests

url = "http://127.0.0.1:8000"

try:
    with requests.get(url, timeout=(5, 10)) as resp:
        resp.raise_for_status()
        data = resp.json()
        print(f"Name: {data['name']}, age: {data['age']}")
except requests.exceptions.JSONDecodeError as e:
    # Must come before RequestException, which is one of its base classes.
    print(f"Failed to decode JSON: {e}")
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
resp.json() raises requests.exceptions.JSONDecodeError if the
response body is not valid JSON, for example when a proxy returns an HTML error
page with a 200 status. In Requests 2.27 and later this exception
derives from both ValueError and RequestException, so
its handler must come before the RequestException handler;
otherwise the base class catches it first. Catching it separately keeps network
failures and malformed responses as distinct error cases.
$ ./read_json.py
Name: Jane, age: 17
Sending JSON to a server
Passing a dictionary to the json parameter of
requests.post serialises it automatically and sets the
Content-Type header to application/json. This is
the preferred approach over manually calling json.dumps and
setting the header by hand.
#!/usr/bin/python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            data = json.loads(self.rfile.read(length))
        except json.JSONDecodeError:
            self.send_response(400)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.end_headers()
            self.wfile.write(b"400 Bad Request: invalid JSON\n")
            return
        lines = [f"{key}: {value}" for key, value in data.items()]
        payload = "\n".join(lines).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__":
    with HTTPServer(("127.0.0.1", 8000), Handler) as server:
        print("Listening on http://127.0.0.1:8000")
        server.serve_forever()
This server accepts POST requests with a JSON body, decodes it, and returns a
plain-text summary of the fields. If the body is not valid JSON, it responds
with 400 Bad Request without attempting to process the data further.
#!/usr/bin/python
import requests

url = "http://127.0.0.1:8000"
data = {"name": "Jane", "age": 17}

try:
    with requests.post(url, json=data, timeout=(5, 10)) as resp:
        resp.raise_for_status()
        print(resp.text)
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
The server reads Content-Length to know exactly how many bytes to
consume from the request body, then attempts to parse them as JSON. If parsing
fails it returns 400 Bad Request immediately, before touching the
data, which prevents a malformed payload from propagating further into the
handler. The valid path iterates over the decoded dictionary and returns a
plain-text summary — one key: value line per field.
$ ./send_json.py
name: Jane
age: 17
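To see what the json parameter does without running a server, one option is to build the request and inspect it before anything is sent. The sketch below uses requests.Request and its prepare method; the URL is simply the local address used in this section, and no connection is opened.

```python
import json

import requests

data = {"name": "Jane", "age": 17}

# Build the request without sending it; prepare() applies the encoding
# steps that Requests performs before transmission.
prepared = requests.Request("POST", "http://127.0.0.1:8000", json=data).prepare()

content_type = prepared.headers["Content-Type"]
decoded = json.loads(prepared.body)

print(content_type)     # header set automatically by the json parameter
print(prepared.body)    # the serialised JSON payload
print(decoded == data)  # the payload round-trips to the original dictionary
```

Inspecting a prepared request this way can be handy in unit tests, where the goal is to check the outgoing headers and body rather than the server's reply.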
Working with cookies
A cookie is a small piece of data the server asks the browser (or
any HTTP client) to store and send back with every subsequent request to the
same origin. Cookies are the standard mechanism for maintaining state across
otherwise stateless HTTP connections — session tokens, user preferences, and
tracking identifiers are all commonly stored this way. The client sends
cookies in the Cookie request header; the server sets them via
the Set-Cookie response header.
The following examples use a minimal http.server server running
locally alongside a requests client, so the full exchange is
visible at both ends without depending on an external service.
| Header / attribute | Type | Description |
|---|---|---|
| Cookie | request | Carries all cookies the client holds for the current origin, serialised as name=value pairs separated by a semicolon and a space. Set automatically by the browser; in requests, populated via the cookies parameter or a Session object. |
| Set-Cookie | response | Instructs the client to store a cookie. One header per cookie; repeated as many times as needed. The value contains the cookie name and value followed by optional attributes separated by semicolons. For example: sessionid=abc123; Path=/; HttpOnly; Max-Age=3600. |
| Expires | attribute | Sets an absolute expiry date in RFC 1123 format (e.g. Thu, 01 Jan 2026 00:00:00 GMT). When the date is reached the cookie is deleted. Omitting both Expires and Max-Age creates a session cookie that is discarded when the browser closes. Superseded by Max-Age in modern clients when both are present. |
| Max-Age | attribute | Sets a relative lifetime in seconds from the moment the cookie is received. Max-Age=3600 expires the cookie after one hour. A value of 0 or negative deletes the cookie immediately, which is the standard way to invalidate a cookie on logout. Takes precedence over Expires in all modern browsers. |
| Domain | attribute | Specifies which hosts may receive the cookie. Domain=example.com includes all subdomains (api.example.com, www.example.com, etc.). Omitting the attribute restricts the cookie to the exact host that set it, excluding subdomains. |
| Path | attribute | Limits the cookie to URLs whose path begins with the given value. Path=/admin sends the cookie only on requests to /admin and its sub-paths. Path=/ sends it on every request to the domain, which is the most common setting. |
| Secure | attribute | A flag (no value) that prevents the cookie from being sent over plain HTTP. The browser only includes it in requests made over HTTPS, protecting it from interception on unencrypted connections. Should always be set on session and authentication cookies in production. |
| HttpOnly | attribute | A flag that makes the cookie inaccessible to JavaScript: document.cookie will not include it. This limits the blast radius of an XSS attack: even if an attacker injects a script, they cannot read session tokens marked HttpOnly. |
| SameSite | attribute | Controls whether the cookie is sent on cross-site requests, mitigating CSRF attacks. Three values: Strict, never sent on cross-site requests; Lax, sent on top-level navigations (e.g. clicking a link) but not on embedded requests such as images or iframes (the default in Chromium-based browsers since 2020); None, always sent, but requires Secure to be set. |
The table covers both HTTP headers (Cookie and
Set-Cookie) and all standard Set-Cookie attributes.
The type column distinguishes between what the client sends, what the server
sends, and what travels as part of a Set-Cookie value rather than
as a standalone header.
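As a rough illustration of how a Set-Cookie value breaks down into the attributes described in the table, the sketch below parses a sample value with SimpleCookie from Python's standard http.cookies module. It is meant for inspection only and makes no claim about how requests handles cookies internally.

```python
from http.cookies import SimpleCookie

# Sample Set-Cookie value of the kind shown in the table.
header = "sessionid=abc123; Path=/; HttpOnly; Max-Age=3600"

jar = SimpleCookie()
jar.load(header)

morsel = jar["sessionid"]
print(morsel.value)        # the cookie value
print(morsel["path"])      # the Path attribute
print(morsel["max-age"])   # the Max-Age attribute, kept as a string
print(morsel["httponly"])  # flag attributes are stored as True
```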
Sending cookies from the client
The cookies parameter of requests.get accepts a
plain dictionary. The library serialises it into a Cookie header
automatically before sending the request.
#!/usr/bin/python
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        cookies = self.headers.get("Cookie", "(no cookies)")
        body = f"Received cookies: {cookies}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    with HTTPServer(("127.0.0.1", 8000), Handler) as server:
        print("Listening on http://127.0.0.1:8000")
        server.serve_forever()
This is a simple HTTP server that reads the Cookie header from
incoming GET requests and echoes it back in the response body. If no cookies are
sent, it responds with a message indicating that as well.
#!/usr/bin/python
import requests

url = "http://127.0.0.1:8000"
cookies = {"user": "jane", "theme": "dark"}

try:
    with requests.get(url, cookies=cookies, timeout=(5, 10)) as resp:
        resp.raise_for_status()
        print(resp.text)
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
The Cookie header is a single semicolon-separated string
regardless of how many cookies are sent. The requests library
handles the serialisation, so the caller only needs to provide a dictionary.
The server reads the header as-is and echoes it back in the response body.
$ ./cookie_request.py
Received cookies: user=jane; theme=dark
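The serialisation step can also be observed on the client side alone. The following sketch prepares the same request without sending it and reads the resulting Cookie header; it assumes only the dictionary and local URL used in this section.

```python
import requests

cookies = {"user": "jane", "theme": "dark"}

# prepare() builds the outgoing request without opening a connection.
prepared = requests.Request("GET", "http://127.0.0.1:8000", cookies=cookies).prepare()

cookie_header = prepared.headers["Cookie"]
pairs = cookie_header.split("; ")  # individual name=value pairs

print(cookie_header)
print(pairs)
```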
Receiving cookies set by the server
When the server wants to create a cookie in the client, it includes a
Set-Cookie header in its response. The requests
library parses these automatically and exposes them through
resp.cookies, a RequestsCookieJar that behaves like
a dictionary.
#!/usr/bin/python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from email.utils import formatdate


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        incoming = self.headers.get("Cookie")
        if incoming:
            body = f"Client sent cookies: {incoming}\n".encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            ts = int(time.time())
            expires = formatdate(ts + 3600, usegmt=True)
            max_age = 3600
            cookies = [
                f"sessionid={ts}; Path=/; HttpOnly; Max-Age={max_age}; Expires={expires}",
                f"theme=dark; Path=/; Max-Age={max_age}; Expires={expires}",
                f"user=jane; Path=/; Max-Age={max_age}; Expires={expires}",
            ]
            body = b"Cookies set send them back on the next request.\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            for cookie in cookies:
                self.send_header("Set-Cookie", cookie)
            self.end_headers()
            self.wfile.write(body)


if __name__ == "__main__":
    with HTTPServer(("127.0.0.1", 8000), Handler) as server:
        print("Listening on http://127.0.0.1:8000")
        server.serve_forever()
The server distinguishes the two requests by checking whether a
Cookie header is present. On the first visit it responds with
three Set-Cookie headers, each carrying a different cookie along
with Max-Age and Expires attributes that tell a
real browser how long to keep them. On the second visit it simply echoes the
cookies back, confirming what the client returned.
Max-Age and Expires serve the same purpose —
limiting cookie lifetime — but Max-Age takes precedence in all
modern browsers when both are present. Expires is included here
for compatibility with older clients. The HttpOnly flag on the
session cookie instructs the browser not to expose it to JavaScript, which
reduces the risk of it being stolen via an XSS attack. The Path=/
attribute means the cookie is sent on every request to the server, not only
those under a specific sub-path.
#!/usr/bin/python
import requests

url = "http://127.0.0.1:8000"

try:
    # First request: the server sets cookies.
    with requests.get(url, timeout=(5, 10)) as resp:
        resp.raise_for_status()
        print("First response:", resp.text)
        print("Cookies received:")
        for name, value in resp.cookies.items():
            print(f" {name} = {value}")

    # Second request: send the cookies back.
    with requests.get(url, cookies=resp.cookies, timeout=(5, 10)) as resp2:
        resp2.raise_for_status()
        print("\nSecond response:", resp2.text)
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
This script performs two GET requests to the server. The first one receives the cookies set by the server and prints them to the console. The second request sends the received cookies back to the server, which responds with a message confirming the cookies it got from the client.
$ ./cookie_set_request.py
First response: Cookies set send them back on the next request.

Cookies received:
 sessionid = 1778336578
 theme = dark
 user = jane

Second response: Client sent cookies: sessionid=1778336578; theme=dark; user=jane
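Passing resp.cookies by hand works, but a requests.Session object stores received cookies and returns them on later requests automatically. The sketch below is a self-contained variant under a few assumptions: it starts a throwaway server in a background thread on a port chosen by the operating system, sets a single made-up cookie (theme=dark), and disables trust_env so that proxy settings in the environment do not interfere with the local demo.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class Handler(BaseHTTPRequestHandler):
    """Set a cookie on the first visit, echo the Cookie header afterwards."""

    def do_GET(self):
        incoming = self.headers.get("Cookie")
        body = f"Client sent cookies: {incoming}\n".encode()
        self.send_response(200)
        if incoming is None:
            self.send_header("Set-Cookie", "theme=dark; Path=/")
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output free of access-log lines


# Port 0 asks the OS for any free port; the server runs in a daemon thread.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}"

try:
    with requests.Session() as session:
        session.trust_env = False  # ignore proxy environment variables here
        first = session.get(url, timeout=(5, 10))   # server sets the cookie
        second = session.get(url, timeout=(5, 10))  # session sends it back
finally:
    server.shutdown()
    server.server_close()

stored = session.cookies.get("theme")
print(first.text, end="")
print(second.text, end="")
print("Stored in session:", stored)
```

A session also reuses the underlying connection where the server allows it, which is why it is usually preferred whenever several requests go to the same host.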
Retrieving definitions from a dictionary
The following example scrapes word definitions from
dictionary.com by sending a GET request and parsing the returned
HTML with the lxml library. Because the site's markup can change
without notice, the script tries several XPath expressions in priority order,
falling back to broader selectors when the preferred ones yield nothing.
#!/usr/bin/python
import sys
import textwrap

import requests
from lxml import html

HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}
BASE_URL = "https://www.dictionary.com/browse/"
WRAP_WIDTH = 80  # matches the standard 80-column terminal width

# Tried in order; the first that yields text wins.
XPATHS = [
    "//span[contains(@class,'one-click-content')]//text()",
    "//*[contains(@data-testid,'definition')]//text()",
    "//section[contains(@class,'definitions') or contains(@class,'css-pnw38j')]//text()",
    "//main//p//text()",
    "//meta[@name='description']/@content",
]


def fetch_html(term):
    """Return the parsed HTML tree for *term*, or raise on HTTP/network error."""
    with requests.get(BASE_URL + term, headers=HEADERS, timeout=(5, 10)) as resp:
        resp.raise_for_status()
        return html.fromstring(resp.content)


def extract_definitions(root):
    """Try each XPath in XPATHS and return the first non-empty result set."""
    for xpath in XPATHS:
        texts = []
        for node in root.xpath(xpath):
            text = node.strip() if isinstance(node, str) else node.text_content().strip()
            if text:
                texts.append(text)
        if texts:
            return texts
    # Last resort: first 40 non-empty text nodes anywhere inside main.
    return [t.strip() for t in root.xpath("//main//text()") if t.strip()][:40]


def deduplicate(texts):
    """Split on newlines, drop fragments shorter than 4 chars, remove duplicates."""
    seen = set()
    out = []
    for block in texts:
        for part in (s.strip() for s in block.split("\n") if s.strip()):
            if len(part) > 3 and part not in seen:
                seen.add(part)
                out.append(part)
    return out


def main():
    term = sys.argv[1] if len(sys.argv) > 1 else "dog"
    try:
        root = fetch_html(term)
    except requests.exceptions.ConnectTimeout:
        sys.exit("Connection timed out; check your network and try again.")
    except requests.exceptions.ReadTimeout:
        sys.exit("Server took too long to respond; try again later.")
    except requests.exceptions.HTTPError as e:
        sys.exit(f"HTTP error: {e}")
    except requests.exceptions.ConnectionError as e:
        sys.exit(f"Network error: {e}")
    except requests.exceptions.RequestException as e:
        sys.exit(f"Unexpected request error: {e}")
    lines = deduplicate(extract_definitions(root))
    if not lines:
        sys.exit("No definition found; the site structure may have changed.")
    for line in lines:
        print(textwrap.fill(line, width=WRAP_WIDTH))


if __name__ == "__main__":
    main()
The script is structured around three focused functions. fetch_html
owns the network layer: it sends the GET request with a browser-like
User-Agent header (the site may block the default client identifier), calls
raise_for_status to surface HTTP errors immediately, and returns
a parsed lxml tree. It passes resp.content, the raw bytes, to
html.fromstring rather than resp.text. The text form is decoded with
the charset that Requests infers from the HTTP headers, which may be missing
or wrong; given the bytes, lxml can detect the encoding from the document's
own declaration instead.
extract_definitions works through XPATHS in priority
order, returning as soon as one expression yields results. Moving the XPath
list to a module-level constant makes it easy to add or reorder selectors
without touching any logic. When all named selectors fail — for example after
a site redesign — it falls back to the first 40 text nodes inside
<main>, giving enough context to identify what changed and
update the XPath list accordingly.
deduplicate splits each extracted block on newlines, discards
fragments of three characters or fewer (punctuation, stray labels), and
filters out any text already seen. This trims the stray labels and repeated
UI text that broad XPath expressions tend to pick up alongside the
definitions.
Exception handling lives in main, where the appropriate response
to each failure — a message and a non-zero exit code via sys.exit
— is known. The inner functions stay clean and reusable as a result.
dictionary.com in particular
is a React application whose server-rendered HTML can vary by region, A/B test
bucket, or deploy. If the script stops returning definitions, inspect the live
page source and update XPATHS accordingly.
$ ./get_term.py ephemeral
lasting a very short time; short-lived transitory
The poem celebrates the ephemeral joys of childhood.
...
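The effect of deduplicate is easiest to see on a small input. The sketch below repeats the function from the script and runs it on a few hypothetical fragments; the sample strings are invented for illustration and are not real output from the site.

```python
def deduplicate(texts):
    """Split on newlines, drop fragments shorter than 4 chars, remove duplicates."""
    seen = set()
    out = []
    for block in texts:
        for part in (s.strip() for s in block.split("\n") if s.strip()):
            if len(part) > 3 and part not in seen:
                seen.add(part)
                out.append(part)
    return out


# Hypothetical scraped fragments, invented for illustration only.
sample = [
    "adj\nlasting a very short time",  # short label followed by a definition
    "lasting a very short time",       # exact repeat of the definition
    "  short-lived  \n;",              # padded text and stray punctuation
]

result = deduplicate(sample)
print(result)  # ['lasting a very short time', 'short-lived']
```

The three-character label, the lone semicolon, and the repeated definition are all dropped, while the order of first appearance is preserved.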
Python requests streaming
By default, requests downloads the entire response body into memory
before returning it to your code. For large files this can exhaust available RAM
and cause unnecessary delays before any processing begins. Setting
stream=True changes this behaviour: the body is not fetched
immediately, letting you consume it incrementally in chunks using
iter_content or line by line using iter_lines.
This makes streaming the correct approach whenever the response body is large (file downloads, database exports), potentially unbounded (live event feeds, log streams), or when you want to start processing early parts before the transfer is complete.
#!/usr/bin/python
from pathlib import Path
from urllib.parse import urlsplit

import requests


def download(url, dest=None, chunk_size=65_536, timeout=(5, 30)):
    """Stream *url* to disk, returning the Path of the saved file.

    Args:
        url: Remote URL to fetch.
        dest: Local file path. Inferred from the URL when omitted.
        chunk_size: Bytes per write cycle (default 64 KiB).
        timeout: (connect, read) timeout in seconds.

    Raises:
        requests.HTTPError: Non-2xx response.
        requests.Timeout: Connection or read deadline exceeded.
        requests.ConnectionError: DNS failure or refused connection.
        requests.RequestException: Any other transport-level failure.
        OSError: Local filesystem write failure.
    """
    dest = Path(dest) if dest else Path(Path(urlsplit(url).path).name or "download.bin")
    with requests.get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()
        with dest.open("wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return dest


url = "https://docs.oracle.com/javase/specs/jls/se25/jls25.pdf"

try:
    path = download(url, "java25spec.pdf")
    print(f"Saved to {path} ({path.stat().st_size:,} bytes)")
except requests.exceptions.ConnectTimeout:
    print("Connection timed out: server did not accept the connection in time")
except requests.exceptions.ReadTimeout:
    print("Read timed out: server stalled mid-transfer")
except requests.exceptions.HTTPError as e:
    print(f"HTTP error: {e}")
except requests.exceptions.ConnectionError as e:
    print(f"Network error: {e}")
except requests.exceptions.RequestException as e:
    print(f"Unexpected request error: {e}")
except OSError as e:
    print(f"Could not write file: {e}")
The download function deliberately does not catch exceptions
itself — it has no way of knowing whether the caller wants to retry, log,
alert, or abort. Instead, every failure mode is documented in the docstring
and handled at the call site, where that context exists.
The except clauses are ordered from most specific to most general.
ConnectTimeout and ReadTimeout are caught before
ConnectionError and the base RequestException so each can produce a
distinct, actionable message: a connect timeout typically means the host is
unreachable and retrying immediately is pointless, while a read timeout
mid-transfer may be worth retrying. The order matters because
ConnectTimeout is also a ConnectionError subclass. OSError is handled
separately because it signals a local filesystem failure, which may warrant a
different response such as checking disk space; since RequestException
itself derives from OSError, that handler has to come last.
$ ./streaming.py
Saved to java25spec.pdf (5,331,364 bytes)
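The download example uses iter_content; iter_lines works the same way for line-oriented data such as logs or event feeds. The sketch below is a minimal illustration under stated assumptions: a throwaway local server in a background thread returns three made-up lines, and the client reads them one at a time. The trust_env setting only keeps proxy environment variables from interfering with the local demo.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

LINES = [b"first event", b"second event", b"third event"]  # made-up demo data


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"\n".join(LINES) + b"\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output free of access-log lines


# Port 0 asks the OS for any free port; the server runs in a daemon thread.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}"

received = []
try:
    with requests.Session() as session:
        session.trust_env = False  # ignore proxy environment variables here
        with session.get(url, stream=True, timeout=(5, 10)) as resp:
            resp.raise_for_status()
            # iter_lines yields bytes by default, one line per iteration.
            for raw in resp.iter_lines():
                if raw:  # skip empty lines
                    received.append(raw.decode("utf-8"))
finally:
    server.shutdown()
    server.server_close()

print(received)
```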
Python requests credentials
The auth parameter enables HTTP Basic authentication; it takes
a tuple of a user name and a password to be used for a realm. A security realm
is a named protection space on the server: resources in the same realm share
the same credentials.
$ sudo apt-get install apache2-utils
$ sudo htpasswd -c /etc/nginx/.htpasswd user7
New password:
Re-type new password:
Adding password for user user7
We use the htpasswd tool to create a user name and a password
for basic HTTP authentication.
location /secure {
    auth_basic "Restricted Area";
    auth_basic_user_file /etc/nginx/.htpasswd;
}
Inside the nginx /etc/nginx/sites-available/default configuration file,
we create a secured page. The name of the realm is "Restricted Area".
<!DOCTYPE html>
<html lang="en">
<head>
    <title>Secure page</title>
</head>
<body>
    <p>
        This is a secure page.
    </p>
</body>
</html>
Inside the /usr/share/nginx/html/secure directory, we have
this HTML file.
#!/usr/bin/python
import requests

user = 'user7'
passwd = '7user'

with requests.get("http://localhost/secure/", auth=(user, passwd)) as resp:
    print(resp.text)
The script connects to the secure webpage; it provides the user name and the password necessary to access the page.
$ ./credentials.py
<!DOCTYPE html>
<html lang="en">
<head>
    <title>Secure page</title>
</head>
<body>
    <p>
        This is a secure page.
    </p>
</body>
</html>
With the right credentials, the credentials.py script returns
the secured page.
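It can help to see what the auth tuple turns into on the wire. The sketch below prepares the same request without sending it and compares the resulting Authorization header with a value built by hand; it assumes the user name and password from the example above.

```python
import base64

import requests

user = "user7"
passwd = "7user"

# Build the request without sending it to see the header that auth= produces.
prepared = requests.Request(
    "GET", "http://localhost/secure/", auth=(user, passwd)
).prepare()
auth_header = prepared.headers["Authorization"]

# Basic authentication is base64("user:password") prefixed with "Basic ".
expected = "Basic " + base64.b64encode(f"{user}:{passwd}".encode()).decode()

print(auth_header)
print(auth_header == expected)
```

Base64 is an encoding, not encryption, so the credentials are readable by anyone who can observe the request; this is why Basic authentication is normally paired with HTTPS.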
In this article we have worked with the Python Requests module. The Requests library is a powerful and user-friendly HTTP client for Python. It allows you to send HTTP requests with ease, making it a popular choice for developers when working with web APIs and handling HTTP interactions in their applications.