CS 1112: Spring 2021

Class 17 — Friday March 12

Reading is fundamental

CVS Drugstore — Is not to be examined — CSV files yes

Look both ways

Back or ahead

Agenda

Dataset access and analysis

Downloads

Program what_color_will_my_chrysanthemums_be.py

Program csv_is_not_a_pharmacy.py

Program what_day_is_it.py

To do list

Review class artifacts

Complete homeworks

Prepare for Test 1 (03 / 17 / 2021)

Last class continued

Introduced the Python relational operators: <, <=, >, >=, ==, and !=.

Introduced the Python logical operators and, or, and not, and the membership operators in and not in.

Wrote a program that determines whether its input pH level is acidic; i.e., less than 7.0.

Used relational operator < to help determine whether a soil pH sample was acidic.

Enter pH level: 6.5
True

Enter pH level: 7.0
False

Enter pH level: 7.5
False
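A minimal sketch of the acidity check behind the runs above. The names is_acidic and NEUTRAL_PH are illustrative assumptions, not names taken from the class program, and the sample values are supplied in a list instead of being read from the user.

```python
NEUTRAL_PH = 7.0    # assumed name; the notes define acidic as less than 7.0

def is_acidic( ph ) :
    ''' Returns True when ph is strictly below neutral, False otherwise '''
    return ph < NEUTRAL_PH

# try the three pH levels from the sample runs
for ph in [ 6.5, 7.0, 7.5 ] :
    print( ph, is_acidic( ph ) )
```

Because < is a strict comparison, a pH of exactly 7.0 is reported as not acidic.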

Program what_color_will_my_chrysanthemums_be.py uses an if statement to help determine the color of a chrysanthemum based on soil pH.

Enter soil pH level: 6.5
pink

Enter soil pH level: 7
blue

Enter soil pH level: 7.5
blue
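A sketch of how an if statement could produce the colors in the runs above. The function name chrysanthemum_color is hypothetical, and the cutoff of 7.0 is an assumption chosen to agree with the acidity test and the three sample runs; the class program may be organized differently.

```python
def chrysanthemum_color( ph ) :
    ''' Returns 'pink' for acidic soil (assumed: ph below 7.0), else 'blue' '''
    color = 'blue'          # start with the non-acidic answer
    if ( ph < 7.0 ) :       # acidic soil changes the answer
        color = 'pink'
    return color

for ph in [ 6.5, 7, 7.5 ] :
    print( ph, chrysanthemum_color( ph ) )
```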

Program csv_is_not_a_pharmacy.py — streamlining getting a dataset

Some datasets for analyzing

elevations.csv

Location, Author, Max Height, Min Height
Narnia, Lewis, 4810, -10
Neverland, Milne, 426, -2
Oz, Baum, 1231, 679
Sleepy Hollow, Irving, 1629, 304
Stars Hollow, Sherman-Palladino, 725, 152
Toyland, MacDonough, 6187, 0
Wonderland, Carroll, 5895, -5

oceania.csv

Country, Females, Males
Australia, 11175724, 11092660
Fiji, 421365, 439258
French Polynesia, 132082, 138682
New Caledonia, 125322, 125548
New Zealand, 2223281, 2144855
Papua New Guinea, 3359979, 3498287
Solomon Islands, 259909, 278239
Vanuatu, 117573, 122078

Some program runs

Enter name of dataset: oceania.csv

dataset:
Country, Females, Males
Australia, 11175724, 11092660
Fiji, 421365, 439258
French Polynesia, 132082, 138682
New Caledonia, 125322, 125548
New Zealand, 2223281, 2144855
Papua New Guinea, 3359979, 3498287
Solomon Islands, 259909, 278239
Vanuatu, 117573, 122078

header:
['Country', 'Females', 'Males']

data:
['Country', 'Females', 'Males']
['Australia', 11175724, 11092660]
['Fiji', 421365, 439258]
['French Polynesia', 132082, 138682]
['New Caledonia', 125322, 125548]
['New Zealand', 2223281, 2144855]
['Papua New Guinea', 3359979, 3498287]
['Solomon Islands', 259909, 278239]
['Vanuatu', 117573, 122078]

Enter name of dataset: elevations.csv

dataset:
Location, Author, Max Height, Min Height
Narnia, Lewis, 4810, -10
Neverland, Milne, 426, -2
Oz, Baum, 1231, 679
Sleepy Hollow, Irving, 1629, 304
Stars Hollow, Sherman-Palladino, 725, 152
Toyland, MacDonough, 6187, 0
Wonderland, Carroll, 5895, -5

header:
['Location', 'Author', 'Max Height', 'Min Height']

data:
['Narnia', 'Lewis', 4810, -10]
['Neverland', 'Milne', 426, -2]
['Oz', 'Baum', 1231, 679]
['Sleepy Hollow', 'Irving', 1629, 304]
['Stars Hollow', 'Sherman-Palladino', 725, 152]
['Toyland', 'MacDonough', 6187, 0]
['Wonderland', 'Carroll', 5895, -5]
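The runs above show comma-separated text becoming a header list and rows whose numeric cells are numbers. The sketch below illustrates one way that conversion could be done, using only the standard csv and io modules. The helper names to_cell and parse_dataset are hypothetical, and the text is taken from an in-memory string rather than fetched, so this is not necessarily how csv_is_not_a_pharmacy.py gets its dataset.

```python
import csv
import io

def to_cell( text ) :
    ''' Returns text as an int when it looks like one, else as a stripped string '''
    text = text.strip()
    try :
        return int( text )
    except ValueError :
        return text

def parse_dataset( contents ) :
    ''' Returns the rows of comma-separated contents as a list of lists '''
    # skipinitialspace drops the blank that follows each comma
    reader = csv.reader( io.StringIO( contents ), skipinitialspace=True )
    dataset = []
    for line in reader :
        if ( line == [] ) :          # ignore blank lines
            continue
        row = []
        for text in line :
            row.append( to_cell( text ) )
        dataset.append( row )
    return dataset

sample = '''Location, Author, Max Height, Min Height
Narnia, Lewis, 4810, -10
Oz, Baum, 1231, 679
'''

dataset = parse_dataset( sample )
header = dataset[ 0 ]       # first row holds the column labels
data = dataset[ 1 : ]       # remaining rows hold the values

print( header )
for row in data :
    print( row )
```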

Program what_day_is_it.py

Analyze how Presidents' Day is and is not celebrated.

Data set of interest: wdii.csv

ID, State, GWBD, When
AK, Alaska, Presidents' Day, 3rd Monday in February
AL, Alabama, George Washington-Thomas Jefferson Day, 3rd Monday in February
AR, Arkansas, George Washington's Birthday and Daisy Gatson Bates Day, 3rd Monday in February
AZ, Arizona, Lincoln-Washington-President's Day, 3rd Monday in February
CA, California, President's Day, 3rd Monday in February
CO, Colorado, Presidents' Day, 3rd Monday in February
CT, Connecticut, Washington's Birthday, 3rd Monday in February
DE, Delaware, No holiday observed, Non-applicable
FL, Florida, No holiday observed, Non-applicable
GA, Georgia, Washington's Birthday, Day before Christmas
....
IN, Indiana, Washington's Birthday, Near Christmas or Thanksgiving
....
VA, Virginia, Washington's Birthday, 3rd Monday in February
....

Task

Given a user-specified web dataset, a user-specified column label, and a user-specified value (the key), count the number of rows whose value in that column equals the key.

Algorithm

What should it be?

Some sample program runs

Enter name of dataset: wdii.csv
Enter column of interest: GWBD
Enter column value of interest: No holiday observed
9

Enter name of dataset: wdii.csv
Enter column of interest: When
Enter column value of interest: 3rd Monday in February
38
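One possible answer to the algorithm question, offered as a sketch and not as the class solution: find the index of the column label in the header, then walk the remaining rows and count those whose cell at that index equals the key. The function name count_matches is hypothetical, and the list below holds only a few of the wdii.csv rows shown earlier, so its counts differ from the full-dataset counts in the runs.

```python
def count_matches( dataset, label, key ) :
    ''' Returns the number of data rows whose cell in the column named label equals key '''
    header = dataset[ 0 ]
    c = header.index( label )         # column index of the label of interest
    count = 0
    for row in dataset[ 1 : ] :       # skip the header row
        if ( row[ c ] == key ) :
            count = count + 1
    return count

# a small excerpt of wdii.csv, already split into rows
excerpt = [
    [ 'ID', 'State', 'GWBD', 'When' ],
    [ 'AK', 'Alaska', "Presidents' Day", '3rd Monday in February' ],
    [ 'CT', 'Connecticut', "Washington's Birthday", '3rd Monday in February' ],
    [ 'DE', 'Delaware', 'No holiday observed', 'Non-applicable' ],
    [ 'FL', 'Florida', 'No holiday observed', 'Non-applicable' ],
    [ 'GA', 'Georgia', "Washington's Birthday", 'Day before Christmas' ],
]

print( count_matches( excerpt, 'GWBD', 'No holiday observed' ) )
print( count_matches( excerpt, 'When', '3rd Monday in February' ) )
```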

Summarizing

If you are only analyzing the dataset, then the easiest way of accessing the dataset is to process it row by row.

dataset = ...               # get the dataset

...                         # set up dataset processing

for row in dataset :        # consider rows one by one
    # process current row of the dataset
    ...                     # process the row

...                         # finish off dataset
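As a concrete instance of the row-by-row pattern, the sketch below counts the locations in elevations.csv whose minimum height is below zero. The rows are typed in directly, and the assumption that Min Height sits at index 3 comes from the header shown earlier.

```python
data = [                                    # get the dataset (rows from elevations.csv)
    [ 'Narnia', 'Lewis', 4810, -10 ],
    [ 'Neverland', 'Milne', 426, -2 ],
    [ 'Oz', 'Baum', 1231, 679 ],
    [ 'Sleepy Hollow', 'Irving', 1629, 304 ],
    [ 'Stars Hollow', 'Sherman-Palladino', 725, 152 ],
    [ 'Toyland', 'MacDonough', 6187, 0 ],
    [ 'Wonderland', 'Carroll', 5895, -5 ],
]

nbr_below_zero = 0                          # set up dataset processing

for row in data :                           # consider rows one by one
    min_height = row[ 3 ]                   # Min Height is assumed to be column 3
    if ( min_height < 0 ) :
        nbr_below_zero = nbr_below_zero + 1

print( nbr_below_zero )                     # finish off dataset
```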

There are two basic ways of analyzing the individual cells in a dataset. Generally use the first way unless the column index matters.

dataset = ...               # get the dataset

...                         # set up dataset processing

for row in dataset :        # consider rows one by one
    # process current row of the dataset
    ...                     # prepare to process the cells

    for cell in row :       # consider cells of the row one by one
        # process the current cell for the row
        ...                 # process the cell

    ...                     # finish off row analysis

...                         # finish off dataset analysis

dataset = ...               # get the dataset

for row in dataset :        # consider rows one by one
    # process current row of the dataset
    nbr_columns = len( row )    # get number of columns in current row

    ...                     # prepare to process row's cells

    # process the cells of the current row
    for c in range( 0, nbr_columns ) :   # consider row's column indices one by one
        # process cell at row[ c ]
        cell = row[ c ]     # pick off the cell of the row
        ...                 # process the cell

    ...                     # finish off row analysis

...                         # finish off dataset analysis
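A small sketch contrasting the two ways on a few elevations.csv rows. The first way counts negative numbers anywhere in the data, so the column index is not needed. The second way tallies negatives per column, so the index matters. The variable names are illustrative assumptions.

```python
data = [
    [ 'Narnia', 'Lewis', 4810, -10 ],
    [ 'Neverland', 'Milne', 426, -2 ],
    [ 'Oz', 'Baum', 1231, 679 ],
]

# first way: cell by cell, no index needed
nbr_negatives = 0
for row in data :
    for cell in row :
        if ( type( cell ) == int ) :        # only numbers can be negative
            if ( cell < 0 ) :
                nbr_negatives = nbr_negatives + 1

# second way: by column index, because the tally is kept per column
negatives_per_column = [ 0, 0, 0, 0 ]       # one counter for each of the four columns
for row in data :
    nbr_columns = len( row )
    for c in range( 0, nbr_columns ) :
        cell = row[ c ]
        if ( type( cell ) == int ) :
            if ( cell < 0 ) :
                negatives_per_column[ c ] = negatives_per_column[ c ] + 1

print( nbr_negatives )
print( negatives_per_column )
```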

Create a copy of a column of a dataset

dataset = ...               # get the dataset
c = ...                     # get the column index

column_copy = []            # need a cell accumulator

for row in dataset :        # consider rows one by one
    # get cell from row's column c
    cell = row[ c ]         # get the cell in column c
    column_copy.append( cell )   # copy the cell into accumulator

...                         # analyze the column
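The column-copy pattern can be wrapped in a function so it is reusable. The sketch below does that and then analyzes the copy with the built-in max. The name get_column is hypothetical, and the rows are a few oceania.csv entries typed in directly.

```python
def get_column( dataset, c ) :
    ''' Returns a new list holding the cell at index c of every row of dataset '''
    column_copy = []                    # cell accumulator
    for row in dataset :
        cell = row[ c ]                 # the cell itself, not the index c
        column_copy.append( cell )
    return column_copy

data = [
    [ 'Australia', 11175724, 11092660 ],
    [ 'Fiji', 421365, 439258 ],
    [ 'Vanuatu', 117573, 122078 ],
]

males = get_column( data, 2 )           # Males is assumed to be column 2
print( males )
print( max( males ) )
```

Appending the cell, and not the index c, is what makes the accumulator a copy of the column.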

Resources from previous semesters are available.
