Class 13 — Wednesday September 23
A dataset by any other name is still a dataset (but it is not a set)
In a loop, a loop — Nesting, but not like a bird — Repeating again
Look both ways
Agenda
- Introduce web processing
- Introduce data sets
Downloads
- Program master_plan.py
- Program dataset_intro.py
For the fun of it
- Reveal one of your super power(s).
- Share a selfie.
Hacks of Kindness
- A hackathon with no experience needed, sponsored by UVA's Women in Computer Science organization. Register at their website.
To do list
- Review class artifacts.
- Complete homework.
- Prepare for Test 1 (Friday, 10/2/2020).
Web pages
Our introduction to interacting with the web in CS 1112 is intentionally simple. Industrial-strength web applications also require familiarity with other, more powerful URL modules. The external library requests is worth checking out if you are interested.
For now, the only thing we need is access to the module urllib.request. The module supports working with URLs.
import urllib.request
- The only thing we care about in the module is its function urllib.request.urlopen(), which returns a connector to a URL resource (think web page). Sample usage:
stream = urllib.request.urlopen( link )
- If you care (and I do not), officially the value returned by urlopen() is an http.client.HTTPResponse.
- All we care about is that a stream returned by urlopen() has a function read() to get the contents of the web resource indicated by link.
page = stream.read()
- The contents provided by read() are bytes encoded in a web format rather than regular text. We can decode them with the bytes function decode().
text = page.decode( 'UTF-8' )
The above assignment sets text to the decoded contents of the URL resource named by link; that is, text is a string equal to the contents of the URL resource indicated by link.
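 
To see why the decoding step matters, here is a small offline sketch. It skips the web entirely and starts from a hand-made bytes value; the name sample_page is hypothetical and simply stands in for what stream.read() would return.

```python
# Hypothetical stand-in for the bytes that stream.read() returns.
# encode() turns a str into bytes, much like a web server does.
sample_page = 'Café au lait'.encode('UTF-8')

print(type(sample_page))   # <class 'bytes'>
print(sample_page)         # b'Caf\xc3\xa9 au lait'

# decode() turns the bytes back into a regular string
text = sample_page.decode('UTF-8')
print(type(text))          # <class 'str'>
print(text)                # Café au lait
```

The é is stored as two bytes in UTF-8, which is why the raw bytes look garbled until they are decoded.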
- The four statements form a template for getting the contents of a URL resource in string format.
import urllib.request # get module access
stream = urllib.request.urlopen( link ) # open connector to the link web resource
page = stream.read() # read contents of the resource
text = page.decode( 'UTF-8' ) # decode contents as normal text string
- What happens next is problem-dependent.
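One way to reuse the template is to package it as a function. This is a minimal sketch, not course code: the name get_text is hypothetical, and it assumes the resource is UTF-8 encoded. For an offline demonstration it uses a data: URL, which urlopen() also accepts and which carries its contents inside the link itself.

```python
import urllib.request

def get_text(link):
    """Return the contents of the resource named by link as a string.

    Hypothetical helper that wraps the four-statement template.
    Assumes the resource is UTF-8 encoded.
    """
    # the with statement closes the connector automatically when done
    with urllib.request.urlopen(link) as stream:
        page = stream.read()            # contents as bytes
    text = page.decode('UTF-8')         # decode into a regular string
    return text

# A data: URL needs no network; an ordinary http(s) link is used the same way.
sample = get_text('data:,Hello%2C%20CS%201112')
print(sample)   # Hello, CS 1112
```

The with statement is a small addition over the template: it makes sure the connector is closed, which the four-statement version does not do explicitly.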
Program master_plan.py
- Displays the word of the day from the CS 1112 web file word-of-the-day.
???
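The source of master_plan.py is not reproduced here. A rough sketch of what such a program might look like follows, assuming the word-of-the-day file holds the word on its first non-blank line. The function name word_of_the_day and the data: URL are hypothetical stand-ins; the real program would use the course's word-of-the-day link.

```python
import urllib.request

def word_of_the_day(link):
    """Return the first non-blank line of the resource at link (hypothetical helper)."""
    stream = urllib.request.urlopen(link)   # open connector to the web file
    page = stream.read()                    # get its contents as bytes
    text = page.decode('UTF-8')             # decode into a regular string
    # Assumption: the first non-blank line of the file is the word of the day
    for line in text.split('\n'):
        word = line.strip()
        if word != '':
            return word
    return ''   # the file had no non-blank lines

# Stand-in link whose contents are "\n  serendipity \nanother line"
link = 'data:,%0A%20%20serendipity%20%0Aanother%20line'
print('Word of the day:', word_of_the_day(link))   # Word of the day: serendipity
```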
Datasets
- A dataset is a list whose elements are lists.
- Datasets are sometimes called tables or data sheets.
- The elements of a two-dimensional dataset are called rows. The elements of a row are called data values or cells.
- Most of the datasets that we process will come from the web.
- The datasets acquired by programs are often CSV (comma-separated values) files; that is, the values on each line are separated by commas.
- One of the CSV datasets we will consider lists the best-selling fictional books of all time.
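CSV text can often be turned into a dataset by splitting it into lines and then splitting each line on commas. A cell that itself contains a comma, however, is usually wrapped in quotes, and a plain split breaks it apart. The sketch below uses made-up CSV text (not the course's books file) to contrast a plain split with Python's standard csv module, which understands the quoting.

```python
import csv

# Hypothetical CSV text standing in for contents fetched from the web.
# The city in the last row contains a comma, so it is quoted.
text = 'name,city\nAnn,Charlottesville\nBo,"Richmond, VA"\n'

# Naive approach: split into lines, then split each line on commas
naive = []
for line in text.strip().split('\n'):
    naive.append(line.split(','))
print(naive[2])    # ['Bo', '"Richmond', ' VA"']  (the quoted cell got split)

# csv.reader accepts a list of lines and handles the quoting
dataset = list(csv.reader(text.strip().split('\n')))
print(dataset)     # [['name', 'city'], ['Ann', 'Charlottesville'], ['Bo', 'Richmond, VA']]

print(len(dataset))     # 3 rows
print(dataset[1][1])    # Charlottesville  (row 1, cell 1)
```

Either way the result is a list of lists, so the row and cell vocabulary above applies directly.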
Program dataset_intro.py
- Demonstrates looping through a dataset in different ways
- Demonstrates column analysis of a dataset
Program run
table: [['A', 'B', 'C'], ['D', 'E', 'F'], ['G', 'H', 'I'], ['J', 'K', 'L', 'M']]
the table has 4 rows
row ['A', 'B', 'C'] has 3 columns
row ['D', 'E', 'F'] has 3 columns
row ['G', 'H', 'I'] has 3 columns
row ['J', 'K', 'L', 'M'] has 4 columns
row 0 : ['A', 'B', 'C']
row 1 : ['D', 'E', 'F']
row 2 : ['G', 'H', 'I']
row 3 : ['J', 'K', 'L', 'M']
row : A B C
row : D E F
row : G H I
row : J K L M
row 0 : ['A', 'B', 'C']
column 0 of row 0 : A
column 1 of row 0 : B
column 2 of row 0 : C
row 1 : ['D', 'E', 'F']
column 0 of row 1 : D
column 1 of row 1 : E
column 2 of row 1 : F
row 2 : ['G', 'H', 'I']
column 0 of row 2 : G
column 1 of row 2 : H
column 2 of row 2 : I
row 3 : ['J', 'K', 'L', 'M']
column 0 of row 3 : J
column 1 of row 3 : K
column 2 of row 3 : L
column 3 of row 3 : M
row 0 : A B C
row 1 : D E F
row 2 : G H I
row 3 : J K L M
Enter column of interest: 1
Column 1 cell: B
Column 1 cell: E
Column 1 cell: H
Column 1 cell: K
Column 1 : ['B', 'E', 'H', 'K']
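The loops behind a run like the one above might look roughly as follows. This is a reconstruction sketch, not the actual dataset_intro.py: the helper name get_column is hypothetical, it skips rows too short to have the requested column (an assumption, since the last row is longer than the others), and the column is fixed at 1 instead of being read with input().

```python
def get_column(table, c):
    """Return the cells in column c of table (hypothetical helper name).

    Rows too short to have a column c are skipped.
    """
    column = []
    for row in table:
        if c < len(row):
            cell = row[c]
            print('Column', c, 'cell:', cell)
            column.append(cell)
    return column


table = [['A', 'B', 'C'], ['D', 'E', 'F'], ['G', 'H', 'I'], ['J', 'K', 'L', 'M']]
print('table:', table)
print('the table has', len(table), 'rows')

# Way 1: loop over the rows themselves
for row in table:
    print('row', row, 'has', len(row), 'columns')

# Way 2: loop over the row indices
for i in range(len(table)):
    print('row', i, ':', table[i])

# Way 3: nested loops, visiting each cell of each row by index
for i in range(len(table)):
    row = table[i]
    print('row', i, ':', row)
    for j in range(len(row)):
        print('column', j, 'of row', i, ':', row[j])

# Column analysis; the run above reads the column from the user, e.g.
# c = int(input('Enter column of interest: '))
c = 1
column = get_column(table, c)
print('Column', c, ':', column)
```

The nested loop in Way 3 is what lets the program handle the longer last row: the inner loop asks each row for its own length rather than assuming every row has three cells.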
© 2020 Jim Cohoon | Resources from previous semesters are available. |