Fall 2025 — Assignment 4

Due 17-November-2025, 1pm EST (before class)
Purpose:

For this assignment, you may work individually or with at most 4 partners (max group size = 5). We strongly encourage you to work with partner(s) as different query designers typically have different perspectives. Besides, you can learn from each other.

Answer the following questions / complete the following tasks. All homework assignments are due before class on the due date. Please remember that the UVA Honor Code is in effect.


Note on handwriting and hand drawing

For this assignment, you may write your answers by hand ** neatly **

You may write RA expressions and/or draw RA trees by hand, take pictures, and embed them in your report. Do not submit the pictures separately. Submit a single report containing all of your answers, and save your report as a .pdf file.

** Draw / write legibly **. We have to deduct points if we are unsure what you drew/wrote.


[Total: 100 pts.]


Part 1 [70 points]: Relational algebra (RA)

Use the following schema, write RA queries to solve the problems.
Student (sID, sname, email)
Course (cID, insID, cname)
Instructor (insID, iname, email)
Grades (sID, cID, grade)
  1. (10 points) Find the IDs and names of all students who have taken some courses taught by an instructor whose ID is “ab3c”
  2. (10 points) Find the names and emails of all students who have taken the course with course ID "CS4750" Also display the grades they received in the CS4750.
  3. (10 points) Find the IDs of all students who have never received a grade “D” or “F”
  4. (10 points) Find the IDs and names of the courses taught by at least two different instructors
  5. (10 points) Find the IDs and names of all instructors and students
  6. (10 points) Find IDs and emails of all students who have received at least 2 D grades
  7. (10 points) Find IDs and names of all students who have taken all courses taught by an instructor whose ID is “ab3c”
    For example, an instructor “ab3c” teaches course IDs CS1111, CS2222, CS3333, and CS4444. 
        Minnie took CS1111, CS3333, CS4444.
        Mickey took CS1111, CS4444.
        Goofy took CS1111, CS2222, CS3333, CS4444.
        Daisy took CS2222, CS3333, CS4444. 
        Wacky took CS1111, CS3333, CS2222, CS4444.
        Humpty took CS5555, CS6666
        Dumpty took CS4444, CS5555 
    
    Your RA query would return only Goofy and Wacky 
    since they took all courses taught by an instructor “ab3c” 

Part 2 [30 points]: Cost estimation

  1. (10 points) Consider the relations P(a, b) and Q(b, c, d) with the following properties:
    T(P)        = 10,000   = total number of tuples in relation P
    V(P, a)     = 2000     = number of distinct values of attribute a in relation P
    V(P, b)     = 400      = number of distinct values of attribute b in relation P
    
    T(Q)        = 10,000   = total number of tuples in relation Q
    V(Q, b)     = 1,000    = number of distinct values of attribute b in relation Q
    V(Q, c)     = 5        = number of distinct values of attribute c in relation Q
    V(Q, d)     = 50       = number of distinct values of attribute d in relation Q 

    Draw an RA tree representing a logical plan of the given SQL query and then estimate the sizes (measured in number of tuples) of the result of the following query.

    For full credit, show all of your work — estimate the cardinality of each RA operator

    [You may draw the RA tree and compute the cost by hand, take a picture, and embed it in your report. Draw/write legibly; we have to deduct points if we are unsure what you drew/wrote.]

    SELECT *
    FROM   P NATURAL JOIN Q
    WHERE  P.a = '4750' AND Q.c = 'AAA'; 
    
    Assume: the conditions (P.a = '4750') and (Q.c = 'AAA') are independent. 
    Therefore, some rows with P.a = '4750' and some rows with Q.c = 'AAA' overlap but some do not.

  2. (20 points) Consider the relations instructor and department and the following statistics:
    instructor(ID, name, dept_name, salary)
       T(instructor)   = 13     #of tuples
       V(dept_name)    = 4      #of distinct values
    
    department(dept_name, building, budget)
       T(department)   = 5      #of tuples
       V(dept_name)    = 5      #of distinct values 
    1. (5 pts.) Write a SQL query to find the names and buildings of all departments that have more than two instructors working
    2. (10 pts.) Draw an RA tree representing a logical plan of the your SQL query and then estimate the number of tuples produced by the query.

      For full credit, show all of your work — estimate the cardinality of each RA operator

      Assume dept_name in instructor and department relations are fully overlap.

      [You may draw the RA tree and compute the cost by hand, take a picture, and embed it in your report. Draw/write legibly; we have to deduct points if we are unsure what you drew/wrote.]

    3. (5 pts.) Based on the above query what is the range of the number of instructors working in each department. For credits, explain your analysis or show your thought process.

Late submission

To grade and record your assignment efficiently, the following details apply to all homework assignments, unless otherwise specified.


Submission

Include all team member names and computingIDs in the report.

Save your report as a PDF.  No Word document.

Each team submits only one copy.

Upload your report as a PDF to the Assignment 4 on Gradescope. Make sure you connect your partner to your group on Gradescope so that everyone receives credit.

Making your submission available to the course staff is your responsibility; if we cannot access or open your file, we have to assign a zero grade. Be sure to test access to your file before the due date.



Copyright © 2025 Upsorn Praphamontripong
Released under the Creative Commons License CC-BY-NC-SA 4.0 license.
Last updated 2025-11-06 17:52