Changelog:

31 Mar 2023: add some hints regarding reading all data from a pipe
2 Apr 2023: be more explicit that parallelgetoutput requires reading from all processes at once and that therefore most likely the output will be interleaved
2 Apr 2023: specify that my_system implementation is on resources tab on collab
3 Apr 2023: include required NULL on execv’s in examples

Write and submit a single file fork_run.c that defines the following two functions, specified below:
- char *getoutput(const char *command)
- char *parallelgetoutput(int count, const char **argv_base)
You should not submit any other files (i.e., no .h files, Makefile, etc). You may include helper functions in the file, but it should not contain main.

1 `char getoutput(const char command)`

This should behave something like system, except that instead of letting the child print to stdout, it should collect what the child prints and return it as a \0-terminated malloc-allocated char *. If the command’s output includes a \0 itself, it should return your choice of the output up to the first \0 or the entire output (with an additional terminating \0).

I am not aware of a standard library function that does this, but if you find one do not use it; do this by forking, execing, and piping yourself.

We will supply an example implementation of my_system from the prior fork lab as a potential starting point sometime after the late deadline for that lab on the resources tab on Collab.

The following main function

int main() {
    printf("Hi!\n");
    printf("Text: [[[%s]]]\n", 
        getoutput("echo 1 2 3; sleep 2; echo 5 5"));
    printf("Bye!\n");
}

Should print

Hi!

then wait for 2 seconds before printing

Text: [[[1 2 3
5 5
]]]
Bye!

Note that this main also has a memory leak: my_getoutput invokes malloc and main is not invoking free.

To do this, use the following outline.

Create a pipe. A pipe looks like a pair of file decriptors, one opened for reading and the other for writing, and is a tool used extensively to help processes talk to each other. See man 2 pipe for details.

Make sure you invoke pipe before you invoke fork so that both processes have access to the same pipe.
In the child,
1. replace stdout with the write-end of the pipe. The dup2 command is used for this, copying one file descriptor with a new number. You want to copy the write-end of the pipe to 1, stdout.
2. close both of the pipe file descriptors. You don’t need the read end in the child, and the write-end is now duplicated as stdout.
3. exec, etc, as you did for my_system.
In the parent,
1. close the write end of the pipe – only the child needs that.
2. read all contents from the read end of the pipe, mallocing enough space to store it all.
3. close the read end of the pipe when you are done reading.
4. waitpid on the child after reading everything. If you wait before reading everything, then you are relying on the OS buffering the program’s output for you. This will work when the program’s output is not too long, but when the program has a lot of output, the program will wait for more buffer space to be available. Since if you waited first, you wouldn’t be reading to help free up some buffer space, the program will hang.

You may assume that the command exists and executes normally; no need to add any error-handling logic.

As a tip, one of the easiest ways to read everything there is to read is to use getdelim with the delimiter '\0'. getdelim wants a FILE *, not a file descriptor; see fdopen for how to wrap a file descriptor in a FILE *.

2 `char *parallelgetoutput(int count, const char **argv_base)`

Run count child processes simulatenously and collect their output into a single string, returning only after all the child processes have finished. Each of the child processes should run a command specified by the NULL-pointer-terminated array argv_base as follows:

the executable run shall be argv_base[0]
the arguments (argv value) for the program run shall be the elements of argv_base followed by the 0-based index of the child process converted to a string

The output collected may interleave the output from each of the child processes. [Added 2 April:] In order to ensure that the processes run in parallel, if they are writing to a pipe, you need to be reading from that pipe to prevent them from hanging (if they write too much). With multiple processes, the easiest way of doing this would be to use a single pipe for all the programs, which will result in their output being interleaved as if you ran them simulatenously in a terminal.

You may assume the executable argv_base[0] exists and executes normally and that we supply the full path of the executable.

Before returning, parallelgetoutput must waitpid for each child process.

For example, a main() like:

int main() {
    const char *argv_base[] = {
        "/bin/echo", "running", NULL
    };
    const char *output = parallelgetoutput(2, argv_base);

    printf("Text: [%s]\n", output);
}

would start two child processes. One of them would run something equivalent to:

const char *argv[] = {"/bin/echo", "running", "0", NULL};
execv("/bin/echo", argv);

And another would run something equivalent to:

const char *argv[] = {"/bin/echo", "running", "1", NULL};
execv("/bin/echo", argv);

Then, it would wait for both child processes to finish and collect their combined output into a single string output. On a system with a /bin/echo program like exists in portal, the output would probably be either:

Text: [running 0
running 1]

or:

Text: [running 1
running 0]

(but on some systems, maybe other interleaved outputs would be possible like:

Text: [running running 1
0]

)

Like with the getoutput example, the program above has a memory leak; output is dynamically allocated and never freed.

3 Hints

3.1 fork lab code

You may find it useful to consult your code for the fork lab. If you did not complete it, an example solution is available on Collab, under the resources tab.

3.2 Testing that you `waitpid`

If your getoutput and parallelgetoutput call waitpid properly, then after they return running

waitpid(-1, NULL, 0)

should return -1 and set errno to ECHILD (indicating that there are no child processes to wait for).

3.3 On reading all data from a pipe

The operating system maintains a limited amount of space to hold values which have been written to a pipe but not read yet. When it runs out of space, writes to the pipe will hang (until space is freed up by reading from the pipe). For this reason, you should not wait for one of the child processes to terminate before reading their output — otherwise, they might never terminate.
Reading from a pipe will result indicate end-of-file if the all the file descriptors referencing the write end of the pipe are closed. If the child processes have exited, then exiting will, as a side effect, close all their file descriptors. So, if you setup the pipes so the write end is only open in the child processes, then you can read the pipe until it indicates end-of-file.
For parallelgetoutput, I would strongly recommend using just one pipe. Since you need to read from a pipe to preevent processes producing a lot of output from hanging, if you have multiple pipes and read from then one at a time, it is very unlikely you’ll actually run them in parallel in general.

3.4 Some corner cases to test

I would recommend testing:

programs that output a lot (tens of kilobytes) at least.
programs that output something, then wait, then output another thing, then wait, etc. If you’re writing programs to simulate this, I recommend using fflush() or similar to make sure the output is sent when you expect it.

1 char *getoutput(const char *command)

2 char *parallelgetoutput(int count, const char **argv_base)