Spring 2024 — Assignment 3

Due 18-March, 12pm EST (before class)
Purpose: If you use Google Cloud Platform (GCP): If you do the bonus points

For this assignment you may work individually or with at most 4 partners (max group size = 5). We strongly encourage you to work with partners as different query designers typically have different perspectives. Besides, you can learn from each other.

Answer the following questions / complete the following tasks. All homeworks are due before class on the due date. Please remember that the UVA Honor Code is in effect.


Part 1: [Total: 100 points] Write SQL queries to solve the following problems

Use an SQL script cardata.sql to create and populate three tables of the US vehicle toy database. (data from https://www.kaggle.com/datasets/doaaalsenani/usa-cers-dataset?resource=download)

You may use phpMyAdmin (locally or on the CS server) or use GCP SQL instance.

Write SQL queries to solve the following problems.

  1. (10 points) Find the number of silver vehicles per year. List only vehicles after the year 2012. Order the information by year.
  2. (10 points) Find all vehicle brands with the registration years from 2018 to 2020 that are listed for the state of Virginia. Do not repeat the names of the brands.
  3. (10 points) Find the vehicles with the "clean vehicle" title status, listed in Virginia, with the price between $3,000 and $5,000 (inclusive). Do not include vehicles with expired license conditions. Display VINs and prices, and sort the information by price.
  4. (10 points) For each state, find the number of vehicles with a price below the average price of their state and the average mileage of these (below average price) vehicles. Display the state, the number of these vehicles, and the average mileage.
  5. (10 points) For each year, find the ratio of average price vs. average mileage of the cars. Display the years and the ratios, and sort the information by year. Display only the year with a ratio of at least 0.1
  6. (10 points) Find the number of vehicles with prices above the average price of their state
  7. (10 points) Find the brands of the vehicles with "clean vehicle" title status listed in Virginia and Michigan. Do not include vehicles with expired license conditions. Do not repeat the names of the brands.
  8. (10 points) Find the brands and the average prices of the vehicles with the "salvage insurance" title status. Display only the brands with the average prices higher than 2000.
  9. (10 points) Find the differences in prices between the least and most expensive vehicles in each state. For each state, display the price difference. Sort the information by the price differences from highest to lowest.
  10. (10 points) Find all vehicle brands that are not registered in Virginia.

Part 2: [10 bonus points] Data exploration and analysis

Let's practice data exploration and analysis using SQL! For this section, the questions will be a bit more open-ended. (Note: This is a bonus section of the assignment. If you do not wish to work on this part and have completed the SQL queries part, then you have finished this assignment!)

  1. (Up to 5 points) Using the following listed criteria, create at least 3 SQL queries on the US vehicle toy database (see Part 1 of the SQL queries part, above) that can be used for analysis. You can create as many queries as you want (they can be simple or complicated queries), as long as all three of the following requirements are satisfied:
    • One of your queries must use an aggregate function (SUM, AVG, MIN, MAX, COUNT)
    • One of your queries must make use of a join (NATURAL JOIN, INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN)
    • One of your queries must use a subquery (or subqueries)

    The bonus points will be awarded based on the quality (meaningful, reasonable, correctness, ...) of your SQL queries.

    Your SQL queries must be *different* from the SQL queries in Part 1 and different from the SQL queries for the more practice [sample solution] activity.

  2. (Up to 3 points) Now that you have all of the queried results, what are some of the patterns or trends in the data? Are there outliers that exist, and if so, what external factors do you think could be affecting these results? Identify other areas of analysis, and write a brief paragraph (5-7 sentences) with your findings.

    The bonus points will be awarded based on the quality of your write-up (meaningful, reasonable, precise, concise, ...)

  3. (Up to 2 points) Provide a visual representation of one of your findings. This can be any type of graph/chart that best represents the data, and provides insight into your analysis (above). Make sure to label the visual with a title, axis labels and titles, key (if needed), and measurement (if applicable).

    The bonus points will be awarded based on the quality of your visual representation and the information being conveyed (meaningful, reasonable, consistent with the findings you discussed previously, ...)


Late submission

To grade and record your assignment efficiently, the following details apply to all homework assignments, unless otherwise specified.


What to submit

Part 1: Your SQL queries Part 2: If you do the bonus points, submit a report containing

Submission

Include all team member names and computingIDs in the report.

Part 1 — Save your answers as .sql file.

Part 2 (if you do the bonus points) — Save your report as a PDF —  No Word document. No handwriting. Only typed PDF is acceptable.

Each team submits only one copy.

Upload your file(s) to Assignment 3 on Gradescope. Make sure you connect your partner to your group on Gradescope so that everyone receives credit.

Verify that you submit the correct version of your file.

Making your submission available to the course staff is your responsibility; if we cannot access or open your file, we have to assign a zero grade. Be sure to test access to your file before the due date.



Copyright © 2024 Upsorn Praphamontripong

Released under the Creative Commons License CC-BY-NC-SA 4.0 license.

Last updated 2024-02-29 9:31