Changelog:

3 September 2019: Be explicit that no part of a pipeline with malformed commands needs to be executed and that when other errors occur, one does not need to execute the non-erroneous parts of a pipeline.
6 September 2019: Provide updated version of shell_test.py that fixes two tests which were too sensitive to timing.
10 September 2019: Refer to make archive instead of make submit, since make archive is actually a target in the supplied Makefile.
10 September 2019: Rephrase text about outputting exit statuses to only require that just the printing of exit statuses needs to happen in the order commands appear in the pipeline rather than stating that commands must be waited for and their exit statuses printed in order. The rephrased text also makes it clearer that there’s a requirement on the order in which exit statuses are printed.

Note: The supplied tests before 6 September 2019 6pm had two tests (“three command pipeline without arguments where order matters”) which were too sensitive to how quickly shells would print exit statuses for exited commands. You can download an updated shell_test.py to hopefully fix these tests. (Or if you downloaded the skeleton code after 6 September 2019 6pm, you should have this updated file.)

Your Task

Download the skeleton code last updated 2019-09-06, which contains:
- a starter Makefile which has a target to build a binary called msh
- a source file main.cc which implements a prompt that prints a “Not implemented” error on any command but “exit”.
Modify msh to be a Unix-like shell that supports running commands as described below, that:
- prompts for commands using > (greater than, followed by space)
- supports running simple commands (e.g. /bin/cat foo.txt bar.txt)
- supports input redirection from a file (e.g. commands like /usr/bin/gcc -E - < somefile.txt)
- supports output redirection to a file (e.g. commands like /usr/bin/gcc -E file.cc > somefile.txt)
- supports pipelines of multiple commands (e.g. commands like /bin/cat foo.txt | /bin/grep bar | /bin/grep baz or /bin/cat foo | /bin/grep baz > output.txt)
- supports the builtin command exit
- outputs the exit status of each command as described below
- prints out error messages to stderr (e.g. via std::cerr) as described below
The sections below attempt to precisely describe the syntax of commands and the procedure to be followed when running commands.

We strongly recommend creating partially working versions such as described below. A full solution will be around 200 lines of code.

You may use additional source files, rename source files, switch from C++ to C, etc. so long as you modify the Makefile appropriately and still produce an executable called msh.
For the checkpoint, you must be able to run simple commands with no arguments (like /bin/something) and wait for them to finish before executing the next command and implement at least three of the following:
- printing the exit status of each command (like /bin/foo exit status: 4) or printing that a command terminated other than by exiting
- running commands with one or more arguments (/bin/something argument1 argument2)
- running commands with an input redirection (/bin/something < input.txt)
- running commands with an output redirection (/bin/something > output.txt)
- running a pipeline of two commands (/bin/first-program | /bin/second-program)
- running a pipeline of three or more commands (/bin/first-program | /bin/second-program | /bin/third-program)
For the checkpoint, we do not care how your program behaves for commands that you do not implement. For example, if you do not implement pipelines of two or more commands, you may reject all commands with a | token, or interpret the | as a normal argument, or something else.
We intend to test the shell you submit in a Linux environment similar to the course VM or to the department machines (e.g. portal.cs.virginia.edu). Please make sure your submitted shell will build and work in this environment.

To aid you in testing your final submission we have supplied a test in shell_test.py which can be run via make test.
Prepare a .tar.gz file like the one built using make archive in the given Makefile. This should have all your source files and a Makefile which can produce an msh binary. It should not contain any object files or a pre-built msh executable.
Submit the .tar.gz file on the submission site for the final submission or for the checkpoint submission.

Specification

Shell language

Shell commands are lines which consist of a sequence of whitespace-separated tokens. Whitespace characters are space (' ' in C or C++), form feed ('\f'), newline ('\n'), carriage return ('\r'), horizontal tab ('\t'), vertical tab ('\v').

Each line has a maximum length of 100 characters. We do not care what your shell does with longer input lines.

Each token is either

an operator, if it is < or > or |, or
a word otherwise

Each line of input is a pipeline, which consists of a series of commands (possibly just one) separated by | tokens.

Each well-formed command consists of a series of (in any order):

up to one input redirection operation, which consists of a < token followed by a word token
up to one output redirection operation, which consists of a > token followed by a word token, and
one or more words (not part of any redirection operation), which are used to form the command for an exec system call

Any command which has a < or > operator not followed by a word token is malformed. Any command which has no word which is not part of a redirection operation is malformed. Any command which has more than one input redirection operation or more than one output redirection operation is malformed.

As a special case, you may optionally accept and execute no programs for a line containing no tokens, or you may treat it as a malformed command.

Running commands

To run a pipeline, the shell runs each command in the pipeline, then waits for all the commands to terminate. If any command in the pipeline is malformed, you may instead print an error and not execute any command in the pipeline.

To run a command, the shell:

checks whether the command is malformed according to the specification above
checks if it is the built-in command listed below, and if so does something special instead of the steps below.
forks off a subprocess for the command
if it is not the first command in the pipeline, connect its stdin (file descriptor 0) to the stdout of the previous command in the pipeline. The output should go to a pipe created by pipe() (see man pipe) before the subprocess was forked. (You may not, for example, create a normal temporary file, even if this sometimes works.)
if it is not the last command in the pipeline, connect its stdout (file descriptor 1) to the stdin of the next command in the pipeline
if there is an output redirection operation, reopen stdout (file descriptor 1) to that file. The file should be created if it does not exist, and truncated if it does.
if there is an input redirection operation, reopen stdin (file descriptor 0) from that file
uses the first word (ignoring words in redirection operations) as the path of the executable to run. The shell should not search the PATH environment variable.

Both redirection operations above must occur after connecting file descriptors for pipelines, so running a command like foo > test.txt | bar should result in foo’s standard output being the file test.txt and bar’s input being a pipe that immediately indicates end-of-file.

Any errors that occur executing or preparing to execute a command should result in error messages as described below. If an error occurs while preparing to execute a command, then the executable should not be run.

Outputting exit statuses

After running each command in the pipeline and before prompting for the next command, the shell should wait for all commands in the pipeline and output their exit statuses. To output their exit statuses, for each command in the order the commands appeared in the pipeline, the shell should print out to stdout information about how the command terminated on its own line:

if the command terminates due to a signal (e.g. control-C), then print out a line starting with the name of the program, followed by any text of your choice that describes what happened;
otherwise, output the name of the program, optionally followed by its arguments, followed by a space, then exit status: then another space, then the numerical exit status. For example, if running test/foo argument results in the foo executable calling exit(99) (where exit is the C exit function), then you may output test/foo exit status: 99 or test/foo argument exit status: 99.
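
The two cases above map onto the macros that `<sys/wait.h>` provides for inspecting the status filled in by waitpid. The following is a minimal sketch, not the reference solution: the function name `describe_termination` and the wording used for the signal case are illustrative choices, since the specification leaves that text up to you.

```cpp
#include <cassert>
#include <string>
#include <sys/wait.h>

// Builds the line to print for one finished command.
// `program` is the first word of the command; `status` is the value that
// waitpid() stored for it.
std::string describe_termination(const std::string& program, int status) {
    assert(!program.empty());
    if (WIFEXITED(status)) {
        // Normal exit: this format is required by the specification.
        return program + " exit status: " + std::to_string(WEXITSTATUS(status));
    }
    if (WIFSIGNALED(status)) {
        // Killed by a signal: any descriptive text after the name is allowed.
        return program + " terminated by signal " + std::to_string(WTERMSIG(status));
    }
    // Not expected when waitpid() is called with options 0.
    return program + " terminated abnormally";
}
```

The shell would print the returned string followed by a newline to stdout, once per command, in pipeline order.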

Built-in command

Your shell should support the built-in command exit. When this command is run your shell should terminate normally (with exit status 0). You should not run a program called exit, even if one exists.

Handling errors

Your shell should print out all error messages to stderr (file descriptor 2).

You must use the following error messages:

If an executable does not exist, print an error message containing “Command not found” or “No such file or directory” (case insensitive). You may include other text in the error message (perhaps the name of the executable the user was trying to run) or print out additional messages in this case. Note that “No such file or directory” is what perror or strerror will output for an errno value of ENOENT, as will happen when execv is passed an executable path that does not exist. (See also the manual pages you can get by running man perror or man strerror.)
If a command is malformed according to the language above, print an error message containing “Invalid command” (case insensitive)

You must also print out an error message (but you may choose what text to output) if:

exec fails for another reason (e.g. executable found but not executable)
fork or pipe fail for any reason. Your program must not crash in this case.
opening an input or output redirected file fails for any reason.

If one command in a pipeline results in an error, you must print out at least one error message, but it is okay if other commands in the pipeline are executed even though some error messages are printed out (e.g. it’s okay if something_that_does_not_exist | something_real prints an error about something_that_does_not_exist after starting something_real). It is also acceptable for you to execute none of the commands in the pipeline in this case.

If multiple errors could occur while executing a command, then you may print out messages for any or all of the possible errors. For example, when running the command foo < input.txt > output.txt, if opening both input.txt and output.txt would fail, then you may output an error message about opening input.txt failing, about opening output.txt failing, or both.
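
As an illustration of the required wording, here is a small sketch of how a message for a failed exec might be built from errno. The helper name `exec_error_message` is hypothetical, and using “Command not found” for ENOENT is just one of the two accepted texts; for other errors the sketch falls back on strerror, which is allowed because the text is your choice in that case.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>

// Builds the message to print after execv() fails.
// `saved_errno` must be the value of errno read immediately after the failed call.
std::string exec_error_message(const std::string& path, int saved_errno) {
    assert(saved_errno != 0);
    if (saved_errno == ENOENT) {
        // The executable does not exist: one of the two required texts.
        return path + ": Command not found";
    }
    // Any other failure (e.g. EACCES): the text is up to the implementer.
    return path + ": " + std::strerror(saved_errno);
}
```

In the child process, the result would be written to std::cerr (followed by a newline) before the child exits.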

Hints

Testing tool

We have supplied a shell_test.py program, which you can run by running make test. This will often produce a lot of output, so you might try redirecting its output to a file like with make test >test-output.txt.

We strongly advise against making changes and just looking at how they affect make test’s count of passed tests, except when you are very close to finishing. Some tests may be looking for things which you could pass “by accident” (e.g. not producing excess output in some circumstances, which either a very incomplete implementation or a complete but buggy one might do), so you must look at which tests in particular you are failing.

Note that we may use different tests when we grade your submission. The supplied tests are intended to help you test your final submission and not to substitute for doing your own testing.

The supplied Makefile builds with AddressSanitizer to help you detect memory errors, such as out-of-bounds accesses and memory leaks (at the cost of making your shell run a bit slower). You should expect any submission with memory errors not to get full credit (even if you disable AddressSanitizer in the Makefile you submit).

A possible order of operations

(This roughly matches how I implemented my reference solution.)

Implement and test parsing commands into whitespace-separated tokens. Collect the tokens into an array or vector to make future steps easier.
Implement and test running commands without pipelines or redirection. In this case the list of tokens you have will be exactly the arguments to pass to the command.
Add support for redirection.
Add support for pipelines.

Parsing

In C++, you can read a full line of input with std::getline. In C, you can read a full line of input with fgets.
In C++, one way to divide a line of input into tokens is by using std::istringstream like
```
std::istringstream s(the_string);
while (s >> token) {
    processToken(token);
}
```
In C, one way to divide a line of input into tokens is by using strsep like
```
char *p = the_string;
char *token;
while ((token = strsep(&p, " \t\v\f\r\n")) != NULL) {
    ...
}
```
Note that strsep changes the string passed to it.
My reference implementation creates a class to represent each command in a pipeline (|-separated list of things to run), and a class to represent the pipeline as a whole. I first collect each command line into a vector of tokens, then iterate through that vector to create and update command objects.
Our specification does not require redirection operations to appear in any particular place in commands. This means, for example, that
```
foo bar < input.txt
```
and
```
foo < input.txt bar
```
and
```
< input.txt foo bar
```
are all equivalent.
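
One possible way to turn a token vector into command objects while checking the malformed cases from the specification is sketched below. This is not the reference implementation: the `Command` struct and `parse_pipeline` function are hypothetical names, and the sketch chooses to treat a line with no tokens as malformed (the specification permits either behavior). It does not handle the exit built-in.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical representation of one command in a pipeline.
struct Command {
    std::vector<std::string> words;  // executable path followed by its arguments
    std::string input_file;          // empty if there is no < redirection
    std::string output_file;         // empty if there is no > redirection
};

static bool is_operator(const std::string& token) {
    return token == "<" || token == ">" || token == "|";
}

// Splits `line` into tokens and groups them into commands.
// Returns false if any command is malformed; `pipeline` is then incomplete.
bool parse_pipeline(const std::string& line, std::vector<Command>& pipeline) {
    std::istringstream stream(line);
    std::vector<std::string> tokens;
    std::string token;
    while (stream >> token) tokens.push_back(token);

    pipeline.clear();
    Command current;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        const std::string& t = tokens[i];
        if (t == "|") {
            if (current.words.empty()) return false;  // command with no words
            pipeline.push_back(current);
            current = Command();
        } else if (t == "<" || t == ">") {
            // A redirection operator must be followed by a word token.
            if (i + 1 >= tokens.size() || is_operator(tokens[i + 1])) return false;
            std::string& target = (t == "<") ? current.input_file : current.output_file;
            if (!target.empty()) return false;  // second redirection of the same kind
            assert(i + 1 < tokens.size());
            target = tokens[++i];               // consume the file name as well
        } else {
            current.words.push_back(t);
        }
    }
    if (current.words.empty()) return false;  // trailing |, or a line with no words
    pipeline.push_back(current);
    return true;
}
```

Because redirections are recorded separately from `words`, the three equivalent lines above all produce the same `Command`.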

Running commands

Pseudocode for running commands is as follows:

for each command in the line {
    pid = fork();
    if (pid == 0) {
        do redirection stuff
        execv ( command, args );
        oops, print out a message about exec failing and exit
    } else {
        store pid somewhere
    }
}
for each command in the line {
    waitpid(stored pid, &status, 0);
    check return code placed in status;
}
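
For a single command with no redirection, the pseudocode above might be fleshed out roughly as follows. This is a sketch under assumptions: `run_simple_command` is a hypothetical name, returning -1 for fork or waitpid failures is a convention invented for this example, and the child's exit value of 127 after a failed exec is an arbitrary choice.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Runs one command (no pipes, no redirection) and waits for it to finish.
// Returns the wait status from waitpid(), or -1 if fork() or waitpid() failed.
int run_simple_command(const std::vector<std::string>& words) {
    assert(!words.empty());

    // execv() wants a null-terminated array of char*; build it before forking.
    std::vector<char*> argv;
    for (const std::string& word : words) {
        argv.push_back(const_cast<char*>(word.c_str()));
    }
    argv.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) {
        std::perror("fork");
        return -1;
    }
    if (pid == 0) {
        // Child: execv() only returns if it failed.
        execv(argv[0], argv.data());
        std::perror(argv[0]);
        _exit(127);  // leave without running the parent's cleanup code
    }

    // Parent: wait for this specific child.
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        std::perror("waitpid");
        return -1;
    }
    return status;
}
```

For a pipeline, the fork step would run for every command before any waitpid call, as in the pseudocode, so that the commands execute concurrently.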

To implement redirection, probably the most useful function is dup2, which can replace stdin (file descriptor 0) or stdout (file descriptor 1) with another file you have opened. When redirecting to a file, you will most commonly use open() to open the file, call dup2 to replace stdin or stdout with a copy of the newly opened file descriptor, then close() the original file descriptor. This would typically be done just before the call to execv, as in the pseudocode above.
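
The open, dup2, close sequence could look like the sketch below. The helper name `redirect_to_file` and its parameter order are assumptions made for this example; the 0666 mode passed to open is also an illustrative choice.

```cpp
#include <cassert>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Makes `target_fd` refer to the file at `path`, opened with `flags`.
// Returns false (after printing an error to stderr) if open() or dup2() fails.
bool redirect_to_file(const char* path, int flags, int target_fd) {
    assert(path != nullptr && target_fd >= 0);
    int fd = open(path, flags, 0666);  // the mode only matters if O_CREAT creates the file
    if (fd < 0) {
        std::perror(path);
        return false;
    }
    if (fd == target_fd) {
        return true;  // open() already returned the descriptor we wanted
    }
    if (dup2(fd, target_fd) < 0) {
        std::perror("dup2");
        close(fd);
        return false;
    }
    close(fd);  // target_fd now refers to the file; the extra descriptor is not needed
    return true;
}
```

In the child, an output redirection would call this with `O_WRONLY | O_CREAT | O_TRUNC` and `STDOUT_FILENO` (create the file if missing, truncate it otherwise), and an input redirection with `O_RDONLY` and `STDIN_FILENO`. If it returns false, the child should exit without calling execv.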
To implement pipelines, the typical technique is to call pipe to obtain a connected pair of file descriptors, use dup2 to assign these to stdout or stdin, and close the original file descriptors just before calling execv.
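
A sketch of this technique for a two-command pipeline is below. The names `spawn` and `run_two_command_pipeline` are hypothetical. The sketch assumes the shell's own descriptors 0 to 2 are open (so pipe never returns them), and it omits error checks on dup2 and any file redirection for brevity. A real shell would generalize this to any number of commands.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <utility>
#include <vector>

// Forks a child that runs `words` with stdin/stdout replaced by `in_fd`/`out_fd`
// (pass -1 to leave one unchanged). `unused_fd` is a pipe end the child inherits
// but must not keep open (-1 if none). Returns the child's pid, or -1 on failure.
static pid_t spawn(const std::vector<std::string>& words, int in_fd, int out_fd,
                   int unused_fd) {
    assert(!words.empty());
    std::vector<char*> argv;
    for (const std::string& word : words) argv.push_back(const_cast<char*>(word.c_str()));
    argv.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) {
        std::perror("fork");
        return -1;
    }
    if (pid == 0) {
        if (unused_fd >= 0) close(unused_fd);
        if (in_fd >= 0) { dup2(in_fd, STDIN_FILENO); close(in_fd); }
        if (out_fd >= 0) { dup2(out_fd, STDOUT_FILENO); close(out_fd); }
        execv(argv[0], argv.data());
        std::perror(argv[0]);  // only reached if execv() failed
        _exit(127);
    }
    return pid;
}

// Runs `left | right` and returns both wait statuses
// (-1 for a command that could not be started or waited for).
std::pair<int, int> run_two_command_pipeline(const std::vector<std::string>& left,
                                             const std::vector<std::string>& right) {
    int fds[2];  // fds[0] is the read end, fds[1] is the write end
    if (pipe(fds) < 0) {
        std::perror("pipe");
        return {-1, -1};
    }
    pid_t left_pid = spawn(left, -1, fds[1], fds[0]);
    pid_t right_pid = spawn(right, fds[0], -1, fds[1]);

    // The parent must close both ends, or `right` never sees end-of-file.
    close(fds[0]);
    close(fds[1]);

    int left_status = -1, right_status = -1;
    if (left_pid > 0 && waitpid(left_pid, &left_status, 0) < 0) left_status = -1;
    if (right_pid > 0 && waitpid(right_pid, &right_status, 0) < 0) right_status = -1;
    return {left_status, right_status};
}
```

Both children are started before either is waited for, and the pipe is created before the forks so that both children inherit it.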
To convert from a const char** or array of const char*s to the type expected by execv, you can use a cast like (char**) pointer_to_first_element.
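
For instance, if the arguments are held in a std::vector of const char* (an assumption for this example, as is the function name `exec_words`), the cast might be used like this. Returning the saved errno value is a convention chosen here so the caller can decide which error message to print.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <unistd.h>
#include <vector>

// Replaces the current process with the program named by args[0].
// The strings pointed to by `args` are owned elsewhere and must stay valid.
// Returns only if execv() failed, yielding the errno value from that failure.
int exec_words(std::vector<const char*> args) {
    assert(!args.empty());
    args.push_back(nullptr);  // execv() requires a terminating null pointer
    // execv() is declared as taking char* const[], although it does not
    // modify the strings, so the cast is safe here.
    execv(args[0], (char**) args.data());
    int saved_errno = errno;  // save before any other call can change errno
    std::perror(args[0]);
    return saved_errno;
}
```

This would be called in the child after fork and after any redirection has been set up.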

Printing Error Messages

In C++, one way to print to stderr is using cerr, which works like cout:
```
#include <iostream>
...
std::cerr << "Some message.\n";
```

In C or C++, one way to print to stderr is using fprintf:

#include <stdio.h>
...
fprintf(stderr, "Some message.\n");

Common problems

My shell hangs

If pipelines hang, then a likely cause is neglecting to close the ends of the pipes in the parent process.

(Reads on the pipe will not indicate end-of-file until all of the write ends of the pipe are closed.)

My shell stops working after I run too many commands

A likely cause is running out of file descriptors by failing to close all file descriptors.

General sources for documentation

All the Unix functions have “manpages” (short for manual pages) retrieved with the man command. For example, to get documentation on the pipe() function, you can run, from within a Linux environment,
```
man pipe
```
The man command retrieves documentation both for commands and C functions. If both a command and a function exist with the same name, man by default shows information about the command. For example, there is a command called write and a function called write. Running
```
man write
```
gives only the documentation about the command. There are two ways to get the documentation about the function. One is to run
```
man -a write
```
which shows the documentation for all things named “write” — it will show the documentation for the command and then for the function. Alternately, the documentation retrieved by man is divided into “sections”. You can get a list of all the entries for a word like “write” along with their sections by running
```
man -k write
```
On my system this shows
```
write (1)            - send a message to another user
write (2)            - write to a file descriptor
write (1posix)       - write to another user
write (3posix)       - write on a file
```
The text in parentheses is the section number or name. For example, you can access the write entry in section 2 using
```
man 2 write
```
Generally, Linux divides its documentation into sections as follows:
```
*  section 1: commands intended for normal users
*  section 1posix: commands, but shows the POSIX standard's description (things that should be the same on all Unix-like OSs) rather than Linux-specific information about a command
*  section 2: "low-level" functions that usually wrap a specific system call
*  section 3: other functions
*  section 3posix: functions, but shows the POSIX standard's description (things that should be the same on all Unix-like OSs) rather than Linux-specific information about a function
*  section 8: commands intended for system administrators
```

Contents

Your Task

Specification

Shell language

Running commands

Outputting exit statuses

Built-in command

Handling errors

Hints

Testing tool

A possible order of operations

Parsing

Running commands

Printing Error Messages

Common problems

My shell hangs

My shell stops working after I run too many commands

General sources for documentation