Class 17 — Friday March 12
Reading is fundamental
CVS Drugstore — Is not to be examined — CSV files yes
Look both ways
Agenda
- Dataset access and analysis
Downloads
- Program csv_is_not_a_pharmacy.py
- Program what_day_is_it.py
To do list
- Review class artifacts
- Complete homeworks
- Prepare for Test 1 (03/17/2021)
Last class continued
- Introduced the Python relational operators: <, <=, >, >=, ==, and !=.
- Introduced the Python logical operators and, or, and not, along with the membership operators in and not in.
- Determined whether an input pH level is acidic, i.e., less than 7.0, using relational operator < to test the soil pH sample.
Enter pH level: 6.5
True
Enter pH level: 7.0
False
Enter pH level: 7.5
False
- Program what_color_will_my_chrysanthemums_be.py uses an if statement to help determine the color of a chrysanthemum based on soil pH.
Enter soil pH level: 6.5
pink
Enter soil pH level: 7
blue
Enter soil pH level: 7.5
blue
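A minimal sketch of the two pH programs, combined into one file for brevity. The function names is_acidic and chrysanthemum_color are hypothetical, and the 7.0 cutoff for the flower color is inferred from the sample runs above (6.5 gives pink; 7 and 7.5 give blue), so the class versions may be written differently.

```python
def is_acidic(ph):
    """Return True when the pH level is acidic, i.e., less than 7.0."""
    return ph < 7.0  # relational operator < yields True or False directly


def chrysanthemum_color(ph):
    """Return the color the sample runs report for a soil pH (assumed cutoff 7.0)."""
    if ph < 7.0:
        return "pink"
    else:
        return "blue"


if __name__ == "__main__":
    ph = float(input("Enter soil pH level: "))
    print(is_acidic(ph))            # e.g., True for 6.5
    print(chrysanthemum_color(ph))  # e.g., pink for 6.5
```

Note that is_acidic returns the comparison itself rather than using an if statement; the comparison already evaluates to a bool.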
Program csv_is_not_a_pharmacy.py — streamlining getting a dataset
- Some datasets for analyzing
elevations.csv
Location, Author, Max Height, Min Height
Narnia, Lewis, 4810, -10
Neverland, Milne, 426, -2
Oz, Baum, 1231, 679
Sleepy Hollow, Irving, 1629, 304
Stars Hollow, Sherman-Palladino, 725, 152
Toyland, MacDonough, 6187, 0
Wonderland, Carroll, 5895, -5
oceania.csv
Country, Females, Males
Australia, 11175724, 11092660
Fiji, 421365, 439258
French Polynesia, 132082, 138682
New Caledonia, 125322, 125548
New Zealand, 2223281, 2144855
Papua New Guinea, 3359979, 3498287
Solomon Islands, 259909, 278239
Vanuatu, 117573, 122078
- Some program runs
Enter name of dataset: oceania.csv
dataset:
Country, Females, Males
Australia, 11175724, 11092660
Fiji, 421365, 439258
French Polynesia, 132082, 138682
New Caledonia, 125322, 125548
New Zealand, 2223281, 2144855
Papua New Guinea, 3359979, 3498287
Solomon Islands, 259909, 278239
Vanuatu, 117573, 122078
header:
['Country', 'Females', 'Males']
data:
['Australia', 11175724, 11092660]
['Fiji', 421365, 439258]
['French Polynesia', 132082, 138682]
['New Caledonia', 125322, 125548]
['New Zealand', 2223281, 2144855]
['Papua New Guinea', 3359979, 3498287]
['Solomon Islands', 259909, 278239]
['Vanuatu', 117573, 122078]
Enter name of dataset: elevations.csv
dataset:
Location, Author, Max Height, Min Height
Narnia, Lewis, 4810, -10
Neverland, Milne, 426, -2
Oz, Baum, 1231, 679
Sleepy Hollow, Irving, 1629, 304
Stars Hollow, Sherman-Palladino, 725, 152
Toyland, MacDonough, 6187, 0
Wonderland, Carroll, 5895, -5
header:
['Location', 'Author', 'Max Height', 'Min Height']
data:
['Narnia', 'Lewis', 4810, -10]
['Neverland', 'Milne', 426, -2]
['Oz', 'Baum', 1231, 679]
['Sleepy Hollow', 'Irving', 1629, 304]
['Stars Hollow', 'Sherman-Palladino', 725, 152]
['Toyland', 'MacDonough', 6187, 0]
['Wonderland', 'Carroll', 5895, -5]
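A minimal sketch of how the header/data split in the runs above could be produced with the standard csv module. It assumes the dataset is a local file (the class program gets its datasets from a web source whose location is not given here), and the helper names convert, parse_rows, and get_dataset are hypothetical. Whole-number cells become int values, matching the printed rows such as ['Narnia', 'Lewis', 4810, -10].

```python
import csv


def convert(cell):
    """Return the cell as an int when it holds a whole number, else as text."""
    text = cell.strip()
    try:
        return int(text)
    except ValueError:
        return text  # e.g., 'Narnia' or 'Sherman-Palladino' stay strings


def parse_rows(lines):
    """Turn an iterable of CSV lines into a list of rows of converted cells."""
    # skipinitialspace drops the blank that follows each comma in these files
    reader = csv.reader(lines, skipinitialspace=True)
    return [[convert(cell) for cell in row] for row in reader if row]


def get_dataset(filename):
    """Read a local CSV file (an assumption; the class version uses the web)."""
    with open(filename, newline="") as f:
        return parse_rows(f)


def main():
    name = input("Enter name of dataset: ")
    dataset = get_dataset(name)
    header = dataset[0]  # first row holds the column labels
    data = dataset[1:]   # remaining rows hold the values
    print("header:")
    print(header)
    print("data:")
    for row in data:
        print(row)


if __name__ == "__main__":
    main()
```

Separating parse_rows from get_dataset keeps the parsing testable on a plain list of strings, since csv.reader accepts any iterable of lines.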
Program what_day_is_it.py
- Analyze how Presidents' Day is and is not celebrated.
- Dataset of interest: wdii.csv
ID, State, GWBD, When
AK, Alaska, Presidents' Day, 3rd Monday in February
AL, Alabama, George Washington-Thomas Jefferson Day, 3rd Monday in February
AR, Arkansas, George Washington's Birthday and Daisy Gatson Bates Day, 3rd Monday in February
AZ, Arizona, Lincoln-Washington-President's Day, 3rd Monday in February
CA, California, President's Day, 3rd Monday in February
CO, Colorado, Presidents' Day, 3rd Monday in February
CT, Connecticut, Washington's Birthday, 3rd Monday in February
DE, Delaware, No holiday observed, Non-applicable
FL, Florida, No holiday observed, Non-applicable
GA, Georgia, Washington's Birthday, Day before Christmas
....
IN, Indiana, Washington's Birthday, Near Christmas or Thanksgiving
....
VA, Virginia, Washington's Birthday, 3rd Monday in February
....
- Task
- For a user-specified web-source dataset, count the number of rows whose cell in a user-specified column of interest (its label) equals a user-specified value (the key).
Algorithm
- What should it be?
Some sample program runs
Enter name of dataset: wdii.csv
Enter column of interest: GWBD
Enter column value of interest: No holiday observed
9
Enter name of dataset: wdii.csv
Enter column of interest: When
Enter column value of interest: 3rd Monday in February
38
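One possible answer to "What should it be?", sketched under assumptions: the dataset is read from a local CSV file rather than the web source, cells are kept as stripped strings, and the names read_rows and count_matches are hypothetical. The algorithm finds the index of the labeled column in the header row, then considers the remaining rows one by one, counting those whose cell at that index equals the key.

```python
import csv


def read_rows(filename):
    """Read a local CSV file into a list of rows of stripped strings."""
    with open(filename, newline="") as f:
        reader = csv.reader(f, skipinitialspace=True)
        return [[cell.strip() for cell in row] for row in reader if row]


def count_matches(dataset, label, key):
    """Count the data rows whose cell in the column named label equals key."""
    header = dataset[0]       # column labels, e.g., ID, State, GWBD, When
    c = header.index(label)   # raises ValueError if label is not a column
    count = 0                 # accumulator for matching rows
    for row in dataset[1:]:   # skip the header row
        if row[c] == key:
            count += 1
    return count


def main():
    name = input("Enter name of dataset: ")
    label = input("Enter column of interest: ")
    key = input("Enter column value of interest: ")
    print(count_matches(read_rows(name), label, key))


if __name__ == "__main__":
    main()
```

The comparison is exact string equality, so a key typed with different capitalization or spelling (for example, "President's Day" versus "Presidents' Day") counts as a different value.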
Summarizing
- If you are only analyzing the dataset, then the easiest way of accessing it is to consider its rows one by one.
dataset = ... # get the dataset
... # set up dataset processing
for row in dataset : # consider rows one by one
    # process current row of the dataset
    ... # process the row
... # finish off dataset analysis
- There are two basic ways of analyzing the individual cells in a dataset: loop over the cells of each row directly, or loop over each row's column indices. Generally use the first way unless the column index matters.
dataset = ... # get the dataset
... # set up dataset processing
for row in dataset : # consider rows one by one
    # process current row of the dataset
    ... # prepare to process the cells
    for cell in row : # consider cells of the row one by one
        # process the current cell for the row
        ... # process the cell
    ... # finish off row analysis
... # finish off dataset analysis
dataset = ... # get the dataset
for row in dataset : # consider rows one by one
    # process current row of the dataset
    nbr_columns = len( row ) # get number of columns in current row
    ... # prepare to process row's cells
    # process the cells of the current row
    for c in range( 0, nbr_columns ) : # consider row's column indices one by one
        # process cell at row[ c ]
        cell = row[ c ] # pick off the cell of the row
        ... # process the cell
    ... # finish off row analysis
... # finish off dataset analysis
- Create a copy of a column of a dataset
dataset = ... # get the dataset
c = ... # get the column index
column_copy = [] # need a cell accumulator
for row in dataset : # consider rows one by one
    # get cell from row's column c
    cell = row[ c ] # get the cell in column c
    column_copy.append( cell ) # copy the cell into accumulator
... # analyze the column
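The summary templates can be made concrete on the elevations.csv data rows exactly as csv_is_not_a_pharmacy.py prints them (header removed). This is a sketch, not the class solution; the function names and the 1000 threshold are hypothetical choices for illustration. Each function follows one template: rows one by one, cells one by one, column indices one by one (needed here because the header label must come from the same index), and copying a column.

```python
# elevations.csv rows as printed by the sample run (header kept separately)
HEADER = ['Location', 'Author', 'Max Height', 'Min Height']
DATA = [
    ['Narnia', 'Lewis', 4810, -10],
    ['Neverland', 'Milne', 426, -2],
    ['Oz', 'Baum', 1231, 679],
    ['Sleepy Hollow', 'Irving', 1629, 304],
    ['Stars Hollow', 'Sherman-Palladino', 725, 152],
    ['Toyland', 'MacDonough', 6187, 0],
    ['Wonderland', 'Carroll', 5895, -5],
]


def count_tall_locations(dataset, threshold):
    """Rows one by one: count rows whose Max Height (column 2) exceeds threshold."""
    count = 0
    for row in dataset:
        if row[2] > threshold:
            count += 1
    return count


def count_negative_cells(dataset):
    """Cells one by one: count the integer cells that are below zero."""
    count = 0
    for row in dataset:
        for cell in row:
            # skip text cells; comparing a str with 0 would raise TypeError
            if isinstance(cell, int) and cell < 0:
                count += 1
    return count


def label_cells(header, row):
    """Column indices one by one: pair each cell with its header label."""
    pairs = []
    nbr_columns = len(row)
    for c in range(0, nbr_columns):
        pairs.append((header[c], row[c]))  # same index into header and row
    return pairs


def get_column(dataset, c):
    """Copy column c of the dataset into a new list."""
    column_copy = []
    for row in dataset:
        cell = row[c]
        column_copy.append(cell)
    return column_copy


if __name__ == "__main__":
    print(count_tall_locations(DATA, 1000))  # prints 5
    print(count_negative_cells(DATA))        # prints 3
    print(label_cells(HEADER, DATA[0]))
    print(max(get_column(DATA, 2)))          # prints 6187
```

The index-based loop earns its place only in label_cells; the other functions never need c, so they use the simpler for cell in row or for row in dataset forms.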
🦆 © 2022 Jim Cohoon | Resources from previous semesters are available. |