Web pages
Our introduction to interacting with the web is intentionally simple. Industrial-strength web applications require familiarity with other, more powerful URL modules. If you want to go further, the external library requests is worth checking out.
In the explanation that follows
- Variable stream references an object of type HTTPResponse. See below for a brief introduction to HTTPResponse.
- In the documentation below, variable link has value
'http://www.cs.virginia.edu/~cs1112/datasets/words/most-misspelled'
It is a partial list of the most misspelled words in Google searches.
- A sample program reading from the web would be most_misspelled.py.
Module urllib.request
Overview
- Module urllib is a package of modules for working with URLs. Our only interest is in its module request.
- See urllib for the Python standard documentation
- Module request supports establishing a connection from a program to a web resource.
- See urllib.request for the Python standard documentation
How to get access
import urllib.request
Essential function (for us)
urllib.request.urlopen( link )
- Returns a connector providing access to the URL resource (think web page) named by string link.
- The fact it returns a connector is all we care about.
- If you need to know the type of the connector, it is HTTPResponse, where HTTPResponse is a class in module http.client.
- Sample usage
stream = urllib.request.urlopen( link )
The assignment establishes stream to be a connection from your program to the URL resource specified by link.
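As a minimal sketch of the call above, the snippet below opens a connection and checks which resource it is attached to. So that it runs without a network connection, it uses a data: URL (an assumption for illustration, not the course's link), which carries the page contents inside the URL itself. With such a link the connector is a generic response object rather than an HTTPResponse, but it is used the same way.

```python
import urllib.request

# hypothetical stand-in for a web link: a data: URL carries its own contents,
# so no network access is needed (%0A encodes a newline)
link = 'data:text/plain;charset=utf-8,cancelled%0Adesert%0Agray'

# establish a connection from our program to the resource
stream = urllib.request.urlopen( link )

# the connector remembers which resource it is attached to
print( stream.url )

# release the connection once we are done with it
stream.close()
```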
Class http.client.HTTPResponse
Overview
- Supports client side interactions with a URL resource.
- See http.client for the Python standard documentation on http.client and http.client.HTTPResponse
How to get access
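import http.client
The import is needed only if your program names the type itself. urlopen() hands back an HTTPResponse without it.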
Essential function (for us)
read()
- Returns the contents of a URL data stream as an encoded string (a bytes object).
- Sample usage:
stream = urllib.request.urlopen( link )
page = stream.read()
Sets page to be the encoded contents (a bytes object) of the URL resource named by link.
- The contents of a read()-produced encoded string can be decoded with bytes method decode().
text = page.decode( 'UTF-8' )
Sets text to be the decoded contents of the URL resource named by link.
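To see the difference between the encoded and decoded forms without going to the web, the sketch below starts from a hand-made bytes value standing in for what read() returns. The value and the name words are illustrative assumptions.

```python
# stand-in for the result of stream.read(): a bytes object, not a str
page = b'cancelled\ndesert\ngray\n'

# decode the bytes to ordinary text
text = page.decode( 'UTF-8' )

# the two forms have different types
print( type( page ) )    # <class 'bytes'>
print( type( text ) )    # <class 'str'>

# once decoded, the usual string methods apply
words = text.split()
print( words )           # ['cancelled', 'desert', 'gray']
```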
- For our purposes, a sample program reading from the web would be most_misspelled.py.
# get access to needed web support
import urllib.request
# where is our page of interest
link = 'http://www.cs.virginia.edu/~cs1112/datasets/words/most-misspelled'
# establish a connection from our program to the web resource
stream = urllib.request.urlopen( link )
# get web source contents
page = stream.read()
# decode to standard text
text = page.decode( 'UTF-8' )
# print the result
print( text )
produces
cancelled
desert
gray
pneumonia
vacuum
appreciate
beautiful
definitely
diarrhea
leprechaun
maintenance
neighbor
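The program above assumes the page can always be reached. One possible way to guard against a failed connection is sketched below. The function name get_text and its return-None convention are illustrative assumptions, not part of the course materials. The sketch relies on urllib.error.URLError, the exception urlopen() raises when a resource cannot be opened. The demonstration line uses a data: URL so that it runs without a network connection.

```python
import urllib.request
import urllib.error

def get_text( link, encoding='UTF-8' ):
    ''' Returns the decoded contents of the resource named by link,
        or None if the resource cannot be opened (hypothetical helper) '''
    try:
        # the with statement closes the connection when the block ends
        with urllib.request.urlopen( link ) as stream:
            page = stream.read()
    except urllib.error.URLError:
        # URLError also covers HTTPError, e.g., a page that is not found
        return None
    return page.decode( encoding )

# offline demonstration: a data: URL carries its own contents
sample = 'data:text/plain;charset=utf-8,vacuum%0Agray'
print( get_text( sample ) )
```

Calling get_text( link ) with the course link would then give either the page text or None, which the caller can test before printing.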
© 2019 Jim Cohoon