Assignment: RUST

Changelog:

  • 27 March 2025 10:30pm: Fix bugs in supplied code: not accepting correct numbers of arguments for some commands; not displaying second student associated with submisison correctly;
  • 27 March 2025 10:30pm: Correct description of support commands to match program; add example transcript
  • 27 March 2025 11:10pm: Reorganize hints to deemphasize smart pointer support that is probably not useful for this assignment as released
  • 27 March 2025 11:20pm: Correct description of use-after-free bug

Your Task

  1. Download the Rust program at link here. See below regarding compiling and running the program.

    The supplied program makes extensive use of unsafe, which should not be necessary. (It also, incidentally, has a use-after-free bug.)

    Fix the program to avoid using unsafe and avoid leaking memory, but support the same command-line interface.

  2. Create a readme.txt describing briefly your changes and whether you believe they have impcated the performance of the program.

  3. Submit your modified .rs files and readme.txt to the submission site.

Compiling + running the program

  1. You will need to install the Rust compiler and cargo. Probably the easiest way to do this is rustup.

  2. After extracting the .tgz file, run cargo build in the cs4630-rust-hw directory to (re)compile the program.

  3. To run the program, either run cargo run or run the executable file in target/debug/cs4630-rust-hw.

  4. When run, the program has accepts commands from its input as documented in the next section.

What the program does

This Rust program implements an in-memory data structure that tracks information for a hypothetical University class with one assignment which is submitted by partners.

Each submission is associatied with two students.

The program provides a command line interface including the following commands:

See below for an example usage session (including demonstrating a use-after-free problem).

Data structues in the program

      #[derive(Debug)]
pub struct Student {
    id: u32,
    sis_id: u32,
}

#[derive(Debug)]
pub struct Submission {
    grade: Option<u32>,
    student1: *mut Student,
    student2: *mut Student,
}

#[derive(Debug)]
pub struct Class {
    students: BTreeMap<u32, *mut Student>,
    graded_submissions: VecDeque<*mut Submission>,
    ungraded_submissions: VecDeque<*mut Submission>,
}

    
  1. These struct definitions all have #[derive(Debug)] so you can output a struct s’s values for debugging using something like println!("s contains {}", s); or println!("s contains {s}");.

  2. *mut Student and *mut Submission are raw pointers to Student and Submission respectively. To use them, the Rust code needs to use unsafe.

    Note that in this design, there can be multiple pointers to a Student object. This violates the object “ownership” idea we discussed where each object generally has a single owner. Thus, we cannot trivially replace these raw pointers with references. If we tried to replace the raw pointers with raw instances of the struct, then we would have the issue that we’d be storing multiple copies of Student objects for the same student ID. Updates to one of them would not be reflected in the others.

Some Rust usage notes

Documentation generally

  1. You can find documentation for Rust’s standard library at https://doc.rust-lang.org/stable/std/index.html and more general documentation for Rust at https://doc.rust-lang.org/stable/.

Debugging prints

  1. We’ve marked the structs we use with #[derive(Debug)].

    This means that, for example, given a Student object s you can output it for debugging with something like

    println!("s = {:?}", s);
    

    println! works similarly to printf. {:?} repreents to take an argument and output in the debugging format.

Option types and pattern matching

  1. Rust does not have a null, so to represent an object of type T or nothing Rust provides a special Option<T> type. Given some variable x of this type x can be None or can be Some(y) where y is a T.

  2. In the code we supply, we often need to extract the value from an Option<T> if it’s a Some(,,,) or print an error otherwise. One way we do this is with pattern matching syntax. Given an x that is either None or Some(...)

    match x {
        Some(y) => f(y),
        None => g(),
    }
    

    runs f on the contents of the Option if it’s non-empty (a Some) and runs g otherwise.

    Rather than explicitly writing out None, we can also use an _ to match the pattern of “anything else”.

  3. Often when accessing a value from a container we will end up with an Option<&T> — that is a value which either represents a reference to a T or represents nothing.

    If we want an actual copy of that value, we might need to further dereference it, as in:

    match x {
        Some(y) => {
            let copy = *y;
            f(copy)
        }
        None => {},
    }
    

Using VecDeque

  1. VecDeque is a double-ended list (similar to java.util.Deque in java). It supports efficiently adding or removing elements at the front and back and is implemented using arrays.

  2. Like Java, Rust supports generic types, so a VecDeque<T> contains objects of type T.

  3. The front() method (and back() method) returns an Option<&T>. Unlike Java, we cannot return null, so we use the Option type to represent the possibility that the container is empty. But like Java, we get a reference to the actual object stored in the container.

  4. The default version of methods to read elements of the deque return non-mutable references. If we want to modify a value stored in the deque we need to use mutable versions. For example, instead of front we could use front_mut.

Using BTreeMap

  1. BTreeMap is a map implemented with a balanced binary tree (similar to java.util.TreeMap in Java).

    Since it is kept as a balanced tree, the map is automatically sorted by its key.

  2. Like with VecDeque, BTreeMap is a generic type so BTreeMap<T, U> is a map containing keys of type T and values of type U.

  3. Also like VecDeque, most methods that read values in the map return Option types (since null values are not possible) and return references to the values in the container.

Some possible strategies

Having a single reference to Students and Submissions

  1. If Submission objects did not pointers or references to Students, then normally, the only thing holding Student objects would be the Class struct.

    If so, we could change the Class struct to hold Student objects in a way which marks them as owned by the Class struct.

    One option would be to have the Student objects stored directly in the BTreeMap:

    pub struct Class {
        students: BTreeMap<u32, Student>,
        ...
    }
    

    The reason the code does not start out this way is that we can’t do something like:

    pub struct Submission {
        ...,
        student1: &Student,
        student2: &Student,
    }
    ...
    submission.student1 = class.students.get(&12345);
    submission.student2 = class.students.get(&23456);
    

    unless the compiler can prove that class.students won’t be modified before submission goes out of scope. Otherwise, the compiler needs to be concerned that the student1 and student2 objects will be deleted or moved, making the references invalid.

    In our code, the compiler definitely could not prove that the student refernces are not moved/deleted, because they would be for some sets of commands we could run.

  2. To avoid having Submission objects reference students, we could instead have them refer to students by their ID numbers.

Using Rc

  1. Rather than replace pointers by ID numbers to lookup in a single table, we could use Rust’s smart pointers. Most likely in this case we’d use rust’s reference counting support.

    This allows for mutliple shared references to the same object, with use-after-free prevented by keeping a count of reference.

  2. Rust’s Rc, part of the std::rc module provides “smart pointers” to an object that implement reference counting.

    In this scheme, a “reference count” is kept alongside each object. Whenever a new “smart pointer” to an object is made, that count is incremented; whenever one is freed, that count is decremented. The object is freed when the count becomes zero.

  3. Reference counted “smart pointers” use the type Rc, which can be used like a normal read-only reference &T.

  4. Most releevantly the std::rc module provides the following functions:

    • Rc::new(object) — allocate space for object and its reference count and move object to that space. Then return an Rc "smart pointer" to that space and set the reference count to 1. When the smart pointer goes out of scope, the compiler will call `drop()` on the Rc object, which will decremet the reference count and free the space if the count became zero.
    • Rc::clone(&rc) — given an Rc "smart pointer" `rc`, create a new reference, incrementing the reference count.
  5. For this assignment, just Rc alone is probably sufficient. But, for slightly more complex tasks, using just Rc alone has some restrictions that we can workaround with additional tools:

    • it leaks memory when there are cyclic references (objects that point to an object that points to the original object). If necessary, we can deal with this using weak references as described below.
    • it doesn’t allow for changing the shared object. If necessary, we can deal with this with RefCell, which implements a runtime-checked version of Rust’s borrowing rules

Example session

      $ cargo run
Compiling cs4630-rust-hw v0.1.0 (/u/cr4bd/spring2025/cs4630/hw/rust/cs4630-rust-hw)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.08s
Running `target/debug/cs4630-rust-hw`
> new-student 1 10000
Added student #1
> new-student 2 20000
Added student #2
> new-student 3 30000
Added student #3
> new-student 4 40000
Added student #4
> new-student 4 40000
Added student #4
> new-student 5 50000
Added student #5
> new-submission 1 3
Created submission
> new-submission 1 2
Created submission
> new-submission 1 4
Created submission
> new-submission 2 5
Created submission
> new-submission 4 5
Created submission
> show
Student#1 [SIS ID#10000]
Student#2 [SIS ID#20000]
Student#3 [SIS ID#30000]
Student#4 [SIS ID#40000]
Student#5 [SIS ID#50000]
Graded submissions:
Ungraded submission [next-to-grade last]:
Submission for Student#4 [SIS ID#40000] and Student#5 [SIS ID#50000] [ungraded]
Submission for Student#2 [SIS ID#20000] and Student#5 [SIS ID#50000] [ungraded]
Submission for Student#1 [SIS ID#10000] and Student#4 [SIS ID#40000] [ungraded]
Submission for Student#1 [SIS ID#10000] and Student#2 [SIS ID#20000] [ungraded]
Submission for Student#1 [SIS ID#10000] and Student#3 [SIS ID#30000] [ungraded]
> grade-next 90
> grade-next 40
> grade-next 30
> show
Student#1 [SIS ID#10000]
Student#2 [SIS ID#20000]
Student#3 [SIS ID#30000]
Student#4 [SIS ID#40000]
Student#5 [SIS ID#50000]
Graded submissions:
Submission for Student#1 [SIS ID#10000] and Student#3 [SIS ID#30000] grade 90
Submission for Student#1 [SIS ID#10000] and Student#2 [SIS ID#20000] grade 40
Submission for Student#1 [SIS ID#10000] and Student#4 [SIS ID#40000] grade 30
Ungraded submission [next-to-grade last]:
Submission for Student#4 [SIS ID#40000] and Student#5 [SIS ID#50000] [ungraded]
Submission for Student#2 [SIS ID#20000] and Student#5 [SIS ID#50000] [ungraded]
> drop-student 4
Removed student '4'
> show
Student#1 [SIS ID#10000]
Student#2 [SIS ID#20000]
Student#3 [SIS ID#30000]
Student#5 [SIS ID#50000]
Graded submissions:
Submission for Student#1 [SIS ID#10000] and Student#3 [SIS ID#30000] grade 90
Submission for Student#1 [SIS ID#10000] and Student#2 [SIS ID#20000] grade 40
Submission for Student#1 [SIS ID#10000] and Student#1433903432 [SIS ID#5] grade 30
Ungraded submission [next-to-grade last]:
Submission for Student#1433903432 [SIS ID#5] and Student#5 [SIS ID#50000] [ungraded]
Submission for Student#2 [SIS ID#20000] and Student#5 [SIS ID#50000] [ungraded]
> exit

    

When just Rc isn’t enough…

reference counting cycles using weak references

  1. A problem with reference counting is that it leaks memory when there are cycles unless other measures are taken. For exampe, if A and B reference each other, then A will always have a reference count of at least 1 as long as B exists, and B will always have a reference count of at least 1 as long as A exists. As a result, neither will ever be freed by having their reference count reach zero.

  2. One solution to this issue would be to remove the references entirely from one of the objects in the cycle, tracking that information in some other way.

  3. Alternately, std::rc has a notion of “weak references”, which do not increment the reference count. This means that the reference count can go to zero, causing the object to be freed, while there is still a weak reference. To keep this from resulting in a use-after-free bug, dereferencing a weak reference checks whether the object has been freed. (This means that there is some additional bookkeeping information kept when an object is freed, but there is still a weak reference to it.)

    To create a weak reference one can called Rc::downgrade(&rc) where rc is an Rc object:

    let weak_ref: Weak<T> = Rc::downgrade(&rc);
    

    To use a weak reference one uses upgrade() as in

    let real_ref: Rc<T> = weak_ref.upgrade().expect("object was freed");
    

    In the example above upgade() returns an Option<> object, which either contains an Rc or is None. .expect("object was freed") extracts the Rc object if possible and otherwise crashes with the error message “object was freed”.

one reference at a time (RefCell)

  1. Rust’s RefCell, part of the std::cell module provides a “smart pointer” to an object that implements “dynamic borrowing”.

    The RefCell provides two important methods:

    • borrow_mut() — get a temporary mutable reference to the object, but first verify that there are no other references to the object in scope. With help from the compiler, the RefCell will automatically track when the temporary reference goes out of scope.
    • borrow() — get a temporary read-only reference to the object, but only if there are no temporary mutable references to the object in scope. With help from the compiler, the RefCell will keep track of when this reference goes out of scope.

    In order to track the whether there are currently references, a RefCell internally consists of a “flag” value, which is changed as “borrowed” references go in and out of scope, and a pointer.

  2. To give an example of a use for RefCell, this code will not compile because the compiler cannot prove that yes and no are not modified while being accesssed elsewhere:

    let mut yes: String = String::from("YES");
    let mut no: String = String::from("NO");
    let answer_1: &String = &yes;
    let answer_2: &String = &yes;
    let answer_3: &String = &no;
    
    println!("Answer #1: {}; Answer #2: {}; Answer #3: {}", answer_1, answer_2, answer_3);
    // output: Answer #1: YES; Answer #2: YES; Answer #3: NO
    
    // Compile error: `yes` is borrowed by `answer_1`, `answer_2`, cannot modify since still
    // needed for println! below
    yes.push_str("!");
    
    // Compile error: `no` is borrowed by `answer_3`, cannot modify since still needed for
    // println!() below
    no.push_str("!");
    
    println!("Answer #1: {}; Answer #2: {}; Answer #3 {}",
        answer_1, answer_2, answer_3);
    // intended output: Answer #1: YES!; Answer #2: YES!; Answer #3: NO!
    

    We can make the code work as intended using a RefCell

    let yes: RefCell<String> = RefCell::new(String::from("YES"));
    let no: RefCell<String> = RefCell::new(String::from("NO"));
    let answer_1: &RefCell<String> = &yes;
    let answer_2: &RefCell<String> = &yes;
    let answer_3: &RefCell<String> = &no;
    
    println!("Answer #1: {}; Answer #2: {}; Answer #3: {}",
        answer_1.borrow(), answer_2.borrow(), answer_3.borrow());
    // Answer #1: YES; Answer #2: YES; Answer #3: NO
    
    { 
        let mut yes_str = yes.borrow_mut();
        yes_str.push_str("!");
        let mut no_str = no.borrow_mut();
        no_str.push_str("!");
    }
    
    println!("Answer #1: {}; Answer #2: {}; Answer #3 {}",
        answer_1.borrow(), answer_2.borrow(), answer_3.borrow());
    // Answer #1: YES!; Answer #2: YES!; Answer #3: NO!
    
  3. RefCells provide a “workaround” for Rust’s immutablity rules. Notice in the exame above the yes and no RefCells are not marked mutable with mut, but we can still borrow_mut() from them.

  4. We can combine RefCell with Rc to create a shared reference to a mutable object, where the library checks that no one tries to write the object while another reference is being used to read it:

    let answer_1: Rc<RefCell<String>>;
    let answer_2: Rc<RefCell<String>>;
    {
        let yes: Rc<RefCell<String>> = Rc::new(RefCell:new(String::from("YES")));
        answer_1 = Rc::clone(&answer_1);
        answer_2 = Rc::clone(&answer_2);
        yes.borrow_mut().push_str("!");
    }
    // original yes out of scope, but that's okay because answer_1, answer_2 still in scope
    println!("1: {}; 2: {}", answer_1.borrow(), answer_2.borrow());