Spring 2024 — Assignment 3
Due 18-March, 12pm EST (before class)
Purpose:
- Be familiar with database queries
- Query data from a Database
- Write SQL queries using a combination of
basic, aggregate, joins, and subqueries
and interpret SQL to solve problems
If you use Google Cloud Platform (GCP):
- Hands-on experience with Google Cloud Platform —
Navigate and use the cloud services to create and host a database,
and write queries to solve problems
If you do the bonus points
- Analyze trends/patterns while exploring datasets
- Demonstrate understanding of results by understanding scope/relevance to real-life applications and
the strengths/weaknesses/limitations of a certain set of results
For this assignment you may work individually or with at most 4 partners (max group size = 5).
We strongly encourage you to work with partners as different query designers typically have different perspectives.
Besides, you can learn from each other.
Answer the following questions / complete the following tasks.
All homeworks are due before class on the due date.
Please remember that the UVA Honor Code is in effect.
Part 1: [Total: 100 points] Write SQL queries to solve the following problems
Use an SQL script
cardata.sql
to create and populate three tables of the US vehicle toy database.
(data from
https://www.kaggle.com/datasets/doaaalsenani/usa-cers-dataset?resource=download)
You may use phpMyAdmin
(locally or on the
CS server) or use
GCP SQL instance.
Write SQL queries to solve the following problems.
- (10 points) Find the number of silver vehicles per year.
List only vehicles after the year 2012.
Order the information by year.
- (10 points) Find all vehicle brands
with the registration years from 2018 to 2020
that are listed for the state of Virginia.
Do not repeat the names of the brands.
- (10 points) Find the vehicles with the "clean vehicle" title status,
listed in Virginia, with the price between $3,000 and $5,000 (inclusive).
Do not include vehicles with expired license conditions.
Display VINs and prices, and sort the information by price.
- (10 points) For each state,
find the number of vehicles with a price below the average price of their state
and the average mileage of these (below average price) vehicles.
Display the state, the number of these vehicles, and the average mileage.
- (10 points) For each year, find the ratio of average price vs. average mileage of the cars.
Display the years and the ratios, and sort the information by year.
Display only the year with a ratio of at least 0.1
- (10 points) Find the number of vehicles with prices above
the average price of their state
- (10 points) Find the brands of the vehicles
with "clean vehicle" title status listed in Virginia and Michigan.
Do not include vehicles with expired license conditions.
Do not repeat the names of the brands.
- (10 points) Find the brands and the average prices of the vehicles with
the "salvage insurance" title status.
Display only the brands with the average prices higher than 2000.
- (10 points) Find the differences in prices
between the least and most expensive vehicles in each state.
For each state, display the price difference.
Sort the information by the price differences from highest to lowest.
- (10 points) Find all vehicle brands that are not registered in Virginia.
Part 2: [10 bonus points] Data exploration and analysis
Let's practice data exploration and analysis using SQL!
For this section, the questions will be a bit more open-ended.
(Note: This is a bonus section of the assignment.
If you do not wish to work on this part
and have completed the SQL queries part,
then you have finished this assignment!)
- (Up to 5 points) Using the following listed criteria,
create at least 3 SQL queries on the
US vehicle toy database (see Part 1 of the SQL queries part, above)
that can be used for analysis.
You can create as many queries as you want
(they can be simple or complicated queries),
as long as all three of the following requirements are satisfied:
- One of your queries must use an aggregate function
(SUM, AVG, MIN, MAX, COUNT)
- One of your queries must make use of a join
(NATURAL JOIN, INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN)
- One of your queries must use a subquery (or subqueries)
The bonus points will be awarded based on the quality
(meaningful, reasonable, correctness, ...)
of your SQL queries.
Your SQL queries
must be *different* from the SQL queries in Part 1
and
different from the SQL queries for the
more practice
[sample solution] activity.
- (Up to 3 points) Now that you have all of the queried results,
what are some of the patterns or trends in the data?
Are there outliers that exist,
and if so,
what external factors do you think could be affecting these results?
Identify other areas of analysis, and
write a brief paragraph (5-7 sentences) with your findings.
The bonus points will be awarded based on the quality of your write-up
(meaningful, reasonable, precise, concise, ...)
- (Up to 2 points) Provide a visual representation of one of your findings.
This can be any type of graph/chart that best represents the data,
and provides insight into your analysis (above).
Make sure to label the visual with a title, axis labels and titles,
key (if needed), and measurement (if applicable).
The bonus points will be awarded based on the quality of your visual representation
and the information being conveyed
(meaningful, reasonable, consistent with the findings you discussed previously, ...)
Late submission
- (-25 points) for 24 hours late (submitted after 18-Mar 12pm (noon) EST, by 19-Mar 12pm (noon) EST)
- (-50 points) for 48 hours late (submitted after 19-Mar 12pm (noon) EST, by 20-Mar 12pm (noon) EST)
To grade and record your assignment efficiently,
the following details apply to all homework assignments,
unless otherwise specified.
- (-10 points) — For submitting a report in a Word document, handwriting, or hand drawing
What to submit
Part 1: Your SQL queries
- All of your SQL queries, saved as .sql file
- The teaching team will run your .sql file when grading.
- (-100 points) for not submitting .sql file.
Part 2:
If you do the bonus points, submit a report containing
- Your SQL queries
- Your analysis and findings
- A visual representation of one of your findings
Submission
Include all team member names and computingIDs in the report.
Part 1 — Save your answers as .sql file.
Part 2 (if you do the bonus points) — Save your report as a PDF —
No Word document. No handwriting.
Only typed PDF is acceptable.
Each team submits only one copy.
Upload your file(s) to Assignment 3 on Gradescope.
Make sure you connect your partner to your group on Gradescope so that everyone receives credit.
Verify that you submit the correct version of your file.
Making your submission available to the course staff is your
responsibility;
if we cannot access or open your file, we have to assign a zero grade.
Be sure to test access to your file before the due date.
Copyright © 2024 Upsorn Praphamontripong
Released under the
CC-BY-NC-SA 4.0 license.
Last updated 2024-02-29 9:31