Changelog:
- 31 Oct 2024: add requirement for HEAD + multiple request per connection support; remove requirement for particular error documents.
- 31 Oct 2024: add suggestions for testing with curl
- 1 Nov 2024: add suggestions for testing with nc
- 6 Nov 2024: add note about testing with web browsers
- 11 Nov 2024: note, when talking about testing with curl, that the command PowerShell provides called curl is not what we mean
- 13 Nov 2024: consistently use --request for curl option and not its alias -X; correct typo in getservname() instructions
- 13 Nov 2024: point to RFC 9112 as a more specific HTTP/1.1 reference
- 15 Nov 2024: correct hex/decimal confusion in CRLF expansion footnote; (and 10:45pm) actually correct it to have the right order
1 Your Task
Using the standard Python socket library, and any data structure and text encoding-related standard Python libraries of your choice[1], create a webserver. Your webserver must:
- set the SO_REUSEADDR socket option on its server socket
- be startable by running python3 webserver.py 127.0.0.1 PORT
- listen on IP address 127.0.0.1, port number PORT, and look for files in the webroot directory in the directory where it is run from
- implement HTTP/1.1, where
  - only the GET and HEAD methods are supported; requests using any other method always return a 405 Method Not Allowed error.
  - GET requests for the path /FOO are handled as follows:
    - if FOO contains any /s or the file FOO does not exist in webroot, then the webserver returns a 404 Not Found response whose body contains text of your choice.
    - if FOO is redirect-example, the webserver returns a 301 Moved Permanently response with a Location header specifying /redirect-target.html and text of your choice as the response body.
    - if FOO exists in webroot but is not readable, the webserver returns a 403 Not Authorized response whose body contains text of your choice.
    - otherwise, the webserver returns a 200 OK response whose body is the contents of the file FOO in webroot.
  - HEAD requests for the path /FOO are handled like GET requests, except no response body is returned.
  - When returning a file from a GET request, if its extension is html or htm, the Content-Type header is set to text/html; if its extension is txt, the Content-Type header is set to text/plain. Otherwise, the Content-Type header may be omitted or set as you choose.
  - When returning a response with a message body, either include a correct Content-Length or use the chunked transfer encoding for the message body.
  - Your server supports multiple requests in the same connection, as long as all of those requests are GET and HEAD requests.
Your webserver only needs to handle one connection at a time.
Test your server, probably by using a utility like curl.
Submit your webserver.py to the submission site.
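The Content-Type and Content-Length rules in the task list can be sketched roughly as follows. This is only an illustration under stated assumptions, not a required design: the helper names content_type_for and build_response are hypothetical, and the sketch always sends Content-Length rather than using the chunked transfer encoding.

```python
import os

def content_type_for(filename):
    """Pick a Content-Type from the file extension, or None to omit the header."""
    extension = os.path.splitext(filename)[1]
    if extension in (".html", ".htm"):
        return "text/html"
    if extension == ".txt":
        return "text/plain"
    return None

def build_response(status, body, content_type=None, include_body=True):
    """Assemble a response as bytes; pass include_body=False for HEAD."""
    lines = ["HTTP/1.1 " + status, "Content-Length: " + str(len(body))]
    if content_type is not None:
        lines.append("Content-Type: " + content_type)
    # Each header line ends in CRLF; an empty line ends the headers.
    head = ("\r\n".join(lines) + "\r\n\r\n").encode("UTF-8")
    return head + body if include_body else head
```

Here status is a string such as "200 OK" and body is a bytes object, so len(body) counts bytes, which is what Content-Length requires.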
2 References
RFC 9112 is the official specification for HTTP/1.1 (and RFC 9110 is an official specification for HTTP more generally). You can find a friendlier introduction in Section 9.1.2 of Computer Networks: A Systems Approach.
The reference documentation for the Python socket library is here.
There is a friendlier introduction in the official Python socket programming HOWTO. If you remember the socket chat lab from CSO1, you may notice that since the Python socket library wraps the C API you used in CSO1, it follows the same structure.
The socket API shows creating server sockets using socket.socket and bind; you might find it easier to use the create_server utility function as more readable shorthand.
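To compare the two approaches, here is a minimal sketch. The function names make_server_socket and make_server_socket_short are hypothetical. As far as I know, CPython's create_server sets SO_REUSEADDR itself on non-Windows platforms, but that is worth confirming with getsockopt on your own system rather than assuming.

```python
import socket

def make_server_socket(host, port):
    """Long form: create the socket, set SO_REUSEADDR, then bind and listen."""
    ssock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option has to be set before bind() to affect the bind.
    ssock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    ssock.bind((host, port))
    ssock.listen()
    return ssock

def make_server_socket_short(host, port):
    """Shorthand: create_server creates, binds, and listens in one call."""
    return socket.create_server((host, port))

def reuseaddr_is_set(ssock):
    """Report whether SO_REUSEADDR is currently enabled on a socket."""
    return ssock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0
```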
3 Testing your server
- If you are testing on a shared server and aren’t sure what ports are free, you can bind to port 0 to ask the OS to select a port. Then you can use ssock.getsockname() (where ssock is your server socket name) to find out which port the OS selected.
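For example, a short sketch of binding to port 0 and asking which port was chosen (ssock is just the variable name used above):

```python
import socket

# Port 0 asks the OS to choose any free port.
ssock = socket.create_server(("127.0.0.1", 0))

# For an IPv4 socket, getsockname() returns a (host, port) pair.
host, port = ssock.getsockname()
print("listening on", host, "port", port)

ssock.close()
```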
3.1 Using curl
You can use the Linux curl utility to make requests to your server. ([added 11 Nov 2024]: Confusingly, Windows PowerShell provides a command called curl which is missing many features we use in the commands below, and so will not work.)
For example:
curl http://127.0.0.1:12345/foo.html
will make a GET request for foo.html.
You can also:
- get more information about what’s sent over the connection with the --verbose option:
  curl --verbose http://127.0.0.1:12345/foo.html
- change the method of the request to HEAD with the --head option:
  curl --head http://127.0.0.1:12345/foo.html
- make multiple requests by including multiple URLs on the command line:
  curl --verbose http://127.0.0.1:12345/foo.html http://127.0.0.1:12345/bar.html
  and curl’s output will indicate whether it reused the same connection or it reconnected.
- change the method to something other than GET or HEAD with the --request option:
  curl --request DELETE --verbose http://127.0.0.1:12345/someplace
  and pass a request body while doing so with the --data option:
  curl --request POST --data 'This is the request body' --verbose http://127.0.0.1:12345/someplace
3.2 Using nc
You can use the nc utility to connect to your server and send arbitrary data.
If you run
nc -C 127.0.0.1 PORTNUMBER
(where PORTNUMBER is the port your server is running on), you can type a request like:
GET / HTTP/1.1<enter>
Host: 127.0.0.1<enter>
<enter>
and nc will send it, including CRLFs, and show you any response. On a Linux machine, you can type control-D to close the connection.
You could also enter your request(s) into a text file and run a command like
nc -C 127.0.0.1 PORTNUMBER < some-text-file.txt
to send some-text-file.txt to the server, followed by closing the connection.
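If nc is not available, something similar can be approximated in Python. The sketch below is only one possible test client: build_request and send_raw are hypothetical names, and the 2-second timeout is an arbitrary choice so the client does not wait forever on a server that keeps the connection open.

```python
import socket

def build_request(method, path, host):
    """Build the bytes of a body-less HTTP/1.1 request with CRLF line endings."""
    return f"{method} {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("UTF-8")

def send_raw(host, port, data, timeout=2.0):
    """Send data, then collect whatever comes back until close or timeout."""
    received = b""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(data)
        try:
            while True:
                chunk = sock.recv(4096)
                if not chunk:
                    break  # server closed the connection
                received += chunk
        except socket.timeout:
            pass  # server kept the connection open; return what we have
    return received

# Example use (assumes a server is listening on port 12345):
# two_requests = (build_request("GET", "/foo.html", "127.0.0.1")
#                 + build_request("HEAD", "/foo.html", "127.0.0.1"))
# print(send_raw("127.0.0.1", 12345, two_requests).decode("UTF-8", errors="replace"))
```

Sending two requests back to back like this is one way to exercise the multiple-requests-per-connection requirement.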
3.3 Using a web browser
Provided that your web browser is running on the same machine as your server, you should be able to go to
http://127.0.0.1:PORTNUMBER/name.html
(where PORTNUMBER is the port your server is running on) and make a GET request for /name.html.
In most web browsers, you can use developer tools to see which HTTP requests are being made, what HTTP responses were received, and other details about them. Usually you can access these tools by using the menu, then going to More tools, then to an item labeled Developer tools or Web Developer Tools. After this, there will usually be a Networking tab on the developer tools that will show the relevant information (that fills in as you visit pages; it won’t show requests/responses retroactively).
4 Hints
4.1 Bytes in Python
The socket functions in Python return and expect bytes, not str (strings). (bytes are composed of 8-bit bytes, but strs are composed of Unicode characters.)
To get bytes instead of strs:
- open files in binary mode (for example open('webroot/404.html', 'rb') instead of open('webroot/404.html', 'r'))
- write constants like b'foo' instead of 'foo'
- given a string s, use something like s.encode('UTF-8') to convert it to a bytes object
If you need to convert from a bytes object b to str, you can do something like b.decode('UTF-8', errors='replace').
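A few of these conversions side by side, as a small illustration (the variable names and sample values are made up for the example):

```python
# str -> bytes: encode text before handing it to send()/sendall()
reason = "Not Found"
status_line = ("HTTP/1.1 404 " + reason + "\r\n").encode("UTF-8")

# bytes constants need no conversion and concatenate with other bytes
message = status_line + b"\r\n"

# bytes -> str: errors='replace' substitutes U+FFFD for undecodable bytes
# instead of raising UnicodeDecodeError
text = b"GET /caf\xc3\xa9 HTTP/1.1".decode("UTF-8", errors="replace")
broken = b"\xff".decode("UTF-8", errors="replace")
```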
4.2 Reading requests
When you call recv, it will read whatever bytes are available; this may or may not be a full request. You may need to call recv multiple times to read enough of a request to figure out what to do.
Since a request’s headers are always terminated by two CRLFs[2], I would recommend calling recv in a loop, accumulating the bytes received into a buffer, until the buffer contains a doubled CRLF. At that point, you would have a full set of request headers.
When dealing with multiple requests, note that it’s possible that you can read parts of multiple requests in a single recv call.
In my implementation, I dealt with this by adding the result of the recv calls to the end of a buffer. Then, I would check if that buffer contained a full request. If it did, I would remove that request from the beginning of the buffer, but keep the buffer around for the next request.
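One way this buffering approach could look is sketched below. It is not necessarily how the reference implementation is written: the name read_request_head is hypothetical, the 4096-byte recv size is arbitrary, and the sketch assumes requests carry no body (reasonable for GET and HEAD).

```python
import socket

def read_request_head(conn, buffer=b""):
    """Return (head, leftover).

    head is one request's bytes up to and including the blank line;
    leftover is whatever was received after it, to be passed back in
    on the next call. Returns (None, buffer) if the peer closes the
    connection before a complete head arrives.
    """
    while b"\r\n\r\n" not in buffer:
        chunk = conn.recv(4096)
        if not chunk:  # empty bytes means the peer closed the connection
            return None, buffer
        buffer += chunk
    # Split at the first doubled CRLF; keep the rest for the next request.
    head, _, leftover = buffer.partition(b"\r\n\r\n")
    return head + b"\r\n\r\n", leftover
```

A connection loop would call this repeatedly, passing the leftover from one call as the buffer for the next, until it returns None.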