Changelog:
- 31 Oct 2024: add requirement for HEAD + multiple request per connection support; remove requirement for particular error documents.
- 31 Oct 2024: add suggestions for testing with curl
- 1 Nov 2024: add suggestions for testing with nc
- 6 Nov 2024: add note about testing with web browsers
- 11 Nov 2024: note when talking about testing with curl, that the command PowerShell provides called curl is not what we mean
- 13 Nov 2024: consistently use --request for curl option and not its alias -X; correct typo in getservname() instructions
- 13 Nov 2024: point to RFC 9112 as a more specific HTTP/1.1 reference
- 15 Nov 2024: correct hex/decimal confusion in CRLF expansion footnote; (and 10:45pm) actually correct it to have the right order
1 Your Task
Using the standard Python socket library, and any data structure and text encoding-related standard Python libraries of your choice[1], create a webserver. Your webserver must:
- set the SO_REUSEADDR socket option on its server socket
- be startable by running python3 webserver.py 127.0.0.1 PORT
- listen on IP address 127.0.0.1, port number PORT, and look for files in the webroot directory in the directory where it is run from
- implement HTTP/1.1, where
  - only the GET and HEAD methods are supported; requests using any other method always return a 405 Method Not Allowed error.
  - GET requests for the path /FOO are handled as follows:
    - if FOO contains any /s or the file FOO does not exist in webroot, then the webserver returns a 404 Not Found response whose body contains text of your choice.
    - if FOO is redirect-example, the webserver returns a 301 Moved Permanently response with a Location header specifying /redirect-target.html and text of your choice as the response body.
    - if FOO exists in webroot but is not readable, the webserver returns a 403 Not Authorized response whose body contains text of your choice.
    - otherwise, the webserver returns a 200 OK response whose body is the contents of the file FOO in webroot.
  - HEAD requests for the path /FOO are handled like GET requests, except no response body is returned.
  - When returning a file from a GET request, if its extension is html or htm, the Content-Type header is set to text/html; if its extension is txt, the Content-Type header is set to text/plain. Otherwise, the Content-Type header may be omitted or set as you choose.
  - When returning a response with a message body, either include a correct Content-Length or use the chunked transfer encoding for the message body.
  - Your server supports multiple requests in the same connection, as long as all of those requests are GET and HEAD requests.
Your webserver only needs to handle one connection at a time.
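The status and header rules above can be sketched roughly as follows. This is only one possible reading, not a reference solution: the helper names (classify, content_type_for, build_response) are made up for illustration, and the sketch assumes the redirect-example check runs before the existence check, which the list does not spell out. It also uses os.access to test readability; actually opening the file and catching the error would be another reasonable choice.

```python
import os

def classify(name, webroot="webroot"):
    """Pick a status for a GET or HEAD of /name (name has no leading slash)."""
    # Assumption: the redirect is checked first, since redirect-example
    # need not exist as a file in webroot.
    if name == "redirect-example":
        return 301, "Moved Permanently"
    path = os.path.join(webroot, name)
    if "/" in name or not os.path.isfile(path):
        return 404, "Not Found"
    if not os.access(path, os.R_OK):
        return 403, "Not Authorized"
    return 200, "OK"

def content_type_for(name):
    """Return the Content-Type required for html, htm, and txt, else None."""
    extension = name.rsplit(".", 1)[-1] if "." in name else ""
    if extension in ("html", "htm"):
        return "text/html"
    if extension == "txt":
        return "text/plain"
    return None

def build_response(status, reason, body, headers=None, head_only=False):
    """Serialize a response; a HEAD response keeps the headers but drops the body."""
    lines = ["HTTP/1.1 {} {}".format(status, reason)]
    for key, value in (headers or {}).items():
        lines.append("{}: {}".format(key, value))
    # Content-Length is the byte length of the body, not a character count.
    lines.append("Content-Length: {}".format(len(body)))
    head = "\r\n".join(lines).encode("ascii") + b"\r\n\r\n"
    return head if head_only else head + body
```

For a 301 response, the Location header would be passed through the headers argument, for example build_response(301, "Moved Permanently", b"moved", {"Location": "/redirect-target.html"}).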
Test your server, probably by using a utility like curl.
Submit your webserver.py to the submission site.
2 References
RFC 9112 is the official specification for HTTP/1.1 (and RFC 9110 is an official specification for HTTP more generally). You can find a friendlier introduction in Section 9.1.2 of Computer Networks: A Systems Approach.
The reference documentation for the Python socket library is here.
There is a friendlier introduction in the official Python socket programming HOWTO. If you remember the socket chat lab from CSO1, you may notice that since the Python socket library wraps the C API you used in CSO1, it follows the same structure.
The socket API shows creating server sockets using socket.socket and bind; you might find it easier to use the create_server utility function as more readable shorthand.
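As a rough comparison of the two approaches, here is a minimal sketch. The function names are hypothetical. The note about create_server setting SO_REUSEADDR itself reflects its behavior on POSIX systems as far as I know, so check the library documentation for your platform before relying on it.

```python
import socket

def make_server_socket(host, port):
    """Long form: create, set SO_REUSEADDR, bind, and listen by hand."""
    ssock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option has to be set before bind() to affect that bind.
    ssock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    ssock.bind((host, port))
    ssock.listen()
    return ssock

def make_server_socket_short(host, port):
    """Shorthand: create_server creates, binds, and listens in one call."""
    # On POSIX systems create_server is understood to set SO_REUSEADDR too.
    return socket.create_server((host, port))
```

Either function returns a listening socket on which accept() can be called.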
3 Testing your server
- If you are testing on a shared server and aren’t sure what ports are free, you can bind to port 0 to ask the OS to select a port. Then you can use ssock.getsockname() (where ssock is your server socket name) to find out which port the OS selected.
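A small sketch of the port 0 trick, assuming an IPv4 socket so that getsockname() returns a (host, port) pair; the function name is made up for illustration.

```python
import socket

def listen_on_free_port(host="127.0.0.1"):
    """Bind to port 0 and report which port the OS actually picked."""
    ssock = socket.create_server((host, 0))
    # For IPv4 sockets, getsockname() returns (host, port).
    chosen_port = ssock.getsockname()[1]
    return ssock, chosen_port
```

Printing chosen_port at startup makes it easy to copy the number into a curl or nc command.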
3.1 Using curl
You can use the Linux curl utility to make requests to your server. ([added 11 Nov 2024]: Confusingly, Windows PowerShell provides a command called curl which is missing many features we use in the commands below, and so will not work.)
For example:
curl http://127.0.0.1:12345/foo.html
will make a GET request for foo.html.
You can also:
- get more information about what’s sent over the connection with the --verbose option:
  curl --verbose http://127.0.0.1:12345/foo.html
- change the method of the request to HEAD with the --head option:
  curl --head http://127.0.0.1:12345/foo.html
- make multiple requests by including multiple URLs on the command line:
  curl --verbose http://127.0.0.1:12345/foo.html http://127.0.0.1:12345/bar.html
  and curl’s output will indicate whether it reused the same connection or reconnected.
- change the method to something other than GET or HEAD with the --request option:
  curl --request DELETE --verbose http://127.0.0.1:12345/someplace
  and pass a request body while doing so with the --data option:
  curl --request POST --data 'This is the request body' --verbose http://127.0.0.1:12345/someplace
3.2 Using nc
You can use the nc utility to connect to your server and send arbitrary data.
If you run
nc -C 127.0.0.1 PORTNUMBER
(where PORTNUMBER is the port your server is running on), you can type a request like:
GET / HTTP/1.1<enter>
Host: 127.0.0.1<enter>
<enter>
and nc will send it, including CRLFs, and show you any response. On a Linux machine, you can type control-D to close the connection.
You could also enter your request(s) into a text file and run a command like
nc -C 127.0.0.1 PORTNUMBER < some-text-file.txt
to send some-text-file.txt to the server, followed by closing the connection.
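If nc is not available, a short Python script can play a similar role. The sketch below is an illustration rather than a replacement for nc: send_raw is a made-up helper, the two-second timeout is an arbitrary choice, and it assumes the server either closes the connection or goes quiet once it has answered.

```python
import socket

def send_raw(host, port, request_bytes, timeout=2.0):
    """Send raw bytes to a server and return whatever it sends back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request_bytes)
        # Tell the server nothing more is coming, like nc reaching end of input.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            try:
                chunk = sock.recv(4096)
            except socket.timeout:
                break  # the server kept the connection open but went quiet
            if not chunk:
                break  # the server closed the connection
            chunks.append(chunk)
    return b"".join(chunks)
```

For example, send_raw("127.0.0.1", 12345, b"GET / HTTP/1.1\r\nHost: 127.0.0.1\r\n\r\n") sends the same request as the typed example above, with the CRLFs written out explicitly.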
3.3 Using a web browser
Provided that your web browser is running on the same machine as your server, you should be able to go to http://127.0.0.1:PORTNUMBER/name.html (where PORTNUMBER is the port your server is running on) and make a GET request for /name.html.
In most web browsers, you can use “developer tools” to see which HTTP requests are being made, what HTTP responses were received, and other details about them. Usually you can access these tools by using the menu, then going to “More tools”, then to an item labeled “Developer tools” or “Web Developer Tools”. After this, there will usually be a “Networking” tab on the developer tools that will show the relevant information (that fills in as you visit pages; it won’t show requests/responses retroactively).
4 Hints
4.1 Bytes in Python
The socket functions in Python return and expect bytes, not str (strings). (bytes are composed of 8-bit bytes, but strs are composed of Unicode characters.)
To get bytes instead of strs:
- open files in “binary” mode (for example open('webroot/404.html', 'rb') instead of open('webroot/404.html', 'r'))
- write constants like b'foo' instead of 'foo'
- given a string s, use something like s.encode('UTF-8') to convert it to a bytes object
If you need to convert from a bytes object b to str, you can do something like b.decode('UTF-8', errors='replace').
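The conversions above can be collected into a few small helpers. The helper and constant names here are only illustrative.

```python
STATUS_OK = b"HTTP/1.1 200 OK\r\n"  # a bytes constant, ready to pass to send

def read_file_bytes(path):
    """Read a whole file as bytes by opening it in binary mode."""
    with open(path, "rb") as f:
        return f.read()

def header_line(name, value):
    """Build a header as str, then encode it to bytes for sending."""
    return "{}: {}\r\n".format(name, value).encode("UTF-8")

def to_text(raw):
    """Decode received bytes; invalid UTF-8 becomes U+FFFD instead of raising."""
    return raw.decode("UTF-8", errors="replace")
```

Using errors='replace' means a malformed request cannot crash the server with a UnicodeDecodeError while it is being parsed.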
4.2 Reading requests
When you call recv, it will read whatever bytes are available; this may or may not be a full request. You may need to call recv multiple times to read enough of a request to figure out what to do.
Since a request’s headers are always terminated by two CRLFs[2], I would recommend calling recv in a loop, accumulating the bytes received into a buffer, until the buffer contains a doubled CRLF. At that point, you would have a full set of request headers.
When dealing with multiple requests, note that it’s possible that you can read parts of multiple requests in a single recv call.
In my implementation, I dealt with this by adding the result of the recv calls to the end of a buffer. Then, I would check if that buffer contained a full request. If it did, I would remove that request from the beginning of the buffer, but keep the buffer around for the next request.
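One way the buffering approach described above might look is sketched below. It is not the implementation mentioned in the text: next_request is a hypothetical helper, the 4096-byte read size is arbitrary, and the sketch assumes requests carry no body, which holds for the GET and HEAD requests this server has to keep a connection open for.

```python
import socket

def next_request(conn, buffer):
    """Return (request_head, leftover); request_head is None if the peer closed."""
    # Keep reading until the buffer holds a doubled CRLF.
    while b"\r\n\r\n" not in buffer:
        chunk = conn.recv(4096)
        if not chunk:
            return None, buffer  # connection closed before a full request arrived
        buffer += chunk
    # Split off the first request; the rest stays buffered for the next call.
    head, _, leftover = buffer.partition(b"\r\n\r\n")
    return head, leftover
```

A connection loop would start with an empty bytes buffer and pass the returned leftover back in on each call, so bytes from a second request that arrived early are not lost.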