Changelog:

  • 23 August 2023: make examples consistent with prototype of string_split having a num_words argument; make examples of command-line have newlines show up more consistently across browsers
  • 23 August 2023: clarify that *num_words should be set by string_split (not by its caller)
  • 23 August 2023: avoid omitting declarations of result and size from examples
  • 23 August 2023: link to CSO1 labs as possible reference in hints
  • 25 August 2023: fix whitespace not being formatted properly on split example 1
  • 25 August 2023: tweak wording of hint on realloc so that it does not imply that initializing the space is mandatory
  • 29 August 2023: clarify that main.c should be possible to test with our own split.c
  • 31 August 2023: clarify in Your task that words can be zero-length (matching what is shown in the examples)
  • 1 Sep 2023: be more explicit about behavior for empty words at beginning/end in Your task
  • 8 Sep 2023: clarify that no-hard-coding does not mean no-special-cases.
  • 8 Sep 2023: give testing string_split with our own main as reason why everything string_split needs should be in split.c

1 Your task

  1. Create C files as follows:

    • in split.c, write an implementation of the function

       char **string_split(const char *input, const char *sep, int *num_words);

      The function should take a \0-terminated string input and a \0-terminated string listing separating characters in sep and return an array of words (of zero or more characters) separated by a sequence of one or more of the characters in sep. string_split should set *num_words to the size of the array. The returned array and each of the strings in that array should be allocated so that they can be freed using free().

      ([added 1 Sep:] As shown in the examples below, you should consider input to always start/end with a word, even if that word is empty, and otherwise not generate empty words even when there are multiple consecutive characters from sep.)

      In addition:

      • The function may not modify the input strings.
      • The function may not leak memory. (When the function returns, the only newly allocated memory should be pointed to by the value returned.)
      • We do not care what your function does if input is an empty string or if sep is an empty string.
      • We do not care what your function does if allocating memory fails (provided that your function does not try to allocate a huge amount of memory relative to the size of its arguments).

      Some examples of using this function are shown below.

      Your implementation may use additional utility functions, provided those are also in split.c (so we can easily test your split.c with our own main.c). It may also use any functions in the C standard library.

    • in split.h, write an appropriate header file declaring string_split, and include #include guards to protect against multiple inclusion.

    • in main.c, write a main() function that:

      • constructs a value for sep from the command-line arguments, or, if no command-line arguments are provided, uses " \t". When there are command-line arguments, the value of sep should be the result of concatenating all of them. For example, if main() is compiled into an executable split, then running

          ./split a b c

        or

         ./split abc

        or

         ./split ab c

        should choose a sep value of abc.

      • reads lines of input (without prompting). For each line, it should:

        • exit if the line is one period (.) with no other text or whitespace;
        • otherwise call string_split with that input and the chosen sep value, then print out the resulting array with each word surrounded by square brackets ([ and ]) and without spaces between words, followed by a newline, then free the resulting array (including each string in it).

    Your program must support lines of input of at least 4000 bytes. If an input line is more than 4000 bytes, your program may not access out of bounds memory (such as might cause a segfault), but otherwise we do not care what it does.

    We intend to test your main.c with our own split.c, so please put any functions other than string_split that it needs within main.c.

    (Some examples of expected transcripts are shown below.)

  2. Create a Makefile such that typing make will build (if necessary) split.o, main.o and link them into an executable split.

  3. Make sure your C files do not produce any warnings when compiled with -Og -g -std=c11 -Wall -pedantic (assuming GCC or Clang). (It is okay if your Makefile uses different options.)

  4. Make sure your code does not have memory leaks or errors. We will test your code with AddressSanitizer enabled to help check for this (which can be enabled by compiling and linking with -fsanitize=address).

  5. We may use automated tests to assess your submission. Your solution may not hard-code the solution for any of the test cases or intentionally interfere with the testing environment. (By hard-code the solution, we mean code that looks for our specific test inputs (such as checking if the input contains XfooX because one of the test cases is XfooXbar) rather than the more general situation being tested (such as checking for separator characters at the beginning of the input); it’s not a prohibition on having special cases.)

2 Examples

2.1 string_split

  1. Running

    char **result;
    int size;
    result = string_split("foo", ":", &size);

    should have the same effect as

    char **result;
    int size;
    result = calloc(1, sizeof(char *));
    result[0] = malloc(4);
    strcpy(result[0], "foo");
    size = 1;

  2. Running

    char **result;
    int size;
    result = string_split("foo:bar:quux", ":", &size);

    or

    char **result;
    int size;
    result = string_split("foo:bar!quux", "!:", &size);

    or

    char **result;
    int size;
    result = string_split("foo:bar!quux", ":!", &size);

    should have the same effect as

    char **result;
    int size;
    result = calloc(3, sizeof(char *));
    result[0] = malloc(4);
    strcpy(result[0], "foo");
    result[1] = malloc(4);
    strcpy(result[1], "bar");
    result[2] = malloc(5);
    strcpy(result[2], "quux");
    size = 3;

  3. Running

    char **result;
    int size;
    result = string_split(":foo!:bar::quux!", ":!", &size);

    should have the same effect as

    char **result;
    int size;
    result = calloc(5, sizeof(char *));
    result[0] = malloc(1);
    strcpy(result[0], "");
    result[1] = malloc(4);
    strcpy(result[1], "foo");
    result[2] = malloc(4);
    strcpy(result[2], "bar");
    result[3] = malloc(5);
    strcpy(result[3], "quux");
    result[4] = malloc(1);
    strcpy(result[4], "");
    size = 5;
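
As a rough illustration of the word-boundary rule in these examples, the following sketch counts how many words string_split is expected to report. It is not a full implementation: count_words is a hypothetical helper name, and the sketch relies only on the standard strcspn and strspn functions.

```c
#include <string.h>

/* Hypothetical helper (not part of the assignment): counts the words that
 * string_split is expected to report for input and sep. */
int count_words(const char *input, const char *sep) {
    int count = 0;
    const char *p = input;
    for (;;) {
        /* A word (possibly empty) runs up to the next separator character. */
        p += strcspn(p, sep);
        count++;
        if (*p == '\0') {
            break;              /* the word reached the end of the input */
        }
        /* Skip the whole run of separators so no empty middle words appear. */
        p += strspn(p, sep);
        if (*p == '\0') {
            count++;            /* trailing separators imply an empty last word */
            break;
        }
    }
    return count;
}
```

Under this sketch, count_words(":foo!:bar::quux!", ":!") evaluates to 5, matching example 3.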

2.2 split executable

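In the following example transcripts, bold represents input typed in and $ represents the shell’s prompt. (Where bold does not render, the lines in square brackets are the program’s output and everything else is typed input.)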

2.2.1 Example 1

$ ./split
foo     bar   baz
[foo][bar][baz]
quux-no-space quux-with space !
[quux-no-space][quux-with][space][!]
   indented
[][indented]
.
$
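
The bracketed output format in this transcript could be produced along the following lines. This is only a sketch: strip_newline, is_quit_line and print_words are hypothetical helper names, not part of the assignment's required interface.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: remove the trailing newline left by fgets, if any. */
void strip_newline(char *line) {
    line[strcspn(line, "\n")] = '\0';
}

/* Hypothetical helper: true only for a line that is exactly one period. */
int is_quit_line(const char *line) {
    return strcmp(line, ".") == 0;
}

/* Hypothetical helper: print each word as [word] with no spaces between
 * words, followed by a single newline. */
void print_words(FILE *out, char **words, int num_words) {
    for (int i = 0; i < num_words; i++) {
        fprintf(out, "[%s]", words[i]);
    }
    fputc('\n', out);
}
```

For instance, if words holds the two strings "" and "indented", then print_words(stdout, words, 2) writes [][indented] followed by a newline.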

2.2.2 Example 2

$ ./split XY Z
fooXXXXbarZXYXYXZbazYYYYY
[foo][bar][baz][]
XXXXXXXXXXXXX
[][]
X.X
[][.][]
.
$
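
Example 2 passes two command-line arguments (XY and Z), which together give a sep of XYZ. A minimal sketch of that concatenation follows, assuming a hypothetical helper named build_sep whose caller frees the returned string; the assignment does not prescribe this structure.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build sep from the command-line arguments.
 * Returns a malloc'd string the caller must free, or NULL if allocation
 * fails. With no arguments, the default " \t" is used. */
char *build_sep(int argc, char **argv) {
    if (argc <= 1) {
        char *sep = malloc(3);
        if (sep != NULL) {
            strcpy(sep, " \t");
        }
        return sep;
    }

    /* First pass: total length of all arguments after the program name. */
    size_t total = 0;
    for (int i = 1; i < argc; i++) {
        total += strlen(argv[i]);
    }

    char *sep = malloc(total + 1);
    if (sep == NULL) {
        return NULL;
    }

    /* Second pass: concatenate the arguments in order. */
    sep[0] = '\0';
    for (int i = 1; i < argc; i++) {
        strcat(sep, argv[i]);
    }
    return sep;
}
```

With the arguments from Example 2, build_sep returns "XYZ"; with no arguments it returns " \t".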

3 Hints

  1. I found the C standard library functions strspn and strcspn useful in my solution.

  2. To avoid scanning the string multiple times, you can use realloc to change the size of a dynamically allocated array. (But note that you might need to initialize space made by realloc — it might not default to 0/NULL.)

  3. You may find it helpful to refer to the CSO1 labs on using C; see, for example, labs 9 through 12 of the Spring 2023 offering of CSO1.
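
To illustrate hint 2, here is one possible way to grow an array with realloc one word at a time. append_word is a hypothetical helper, and growing by a single slot per call is a simplification chosen for clarity rather than efficiency.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append a copy of the len bytes starting at start
 * to the array *words, which currently holds *count entries.
 * Returns 0 on success and -1 if an allocation fails. */
int append_word(char ***words, int *count, const char *start, size_t len) {
    /* realloc(NULL, n) behaves like malloc(n), so *words may start as NULL. */
    char **grown = realloc(*words, (size_t)(*count + 1) * sizeof(char *));
    if (grown == NULL) {
        return -1;              /* the old array in *words is still valid */
    }
    *words = grown;

    char *copy = malloc(len + 1);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, start, len);
    copy[len] = '\0';

    /* The new slot holds an indeterminate value until it is assigned here. */
    grown[*count] = copy;
    (*count)++;
    return 0;
}
```

A caller would start with a NULL array and a count of 0, call append_word once per word found, and eventually free each string and then the array itself.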