Changelog:
- 15 Sep 2025: mention the -g compiler option when talking about AddressSanitizer in item 4 and in the hints.
1 Your task
Create C files as follows:
In split.c, write an implementation of the function

    char **string_split(const char *input, const char *sep, int *num_words);

You can see examples of what this function should do below.
The function takes a \0-terminated string input and a \0-terminated string listing separating characters in sep.

The function splits input into a sequence of "words" (which it returns as described in the next paragraph). These words are separated by one or more of the separating characters in sep. input should be considered to always start and end with a word. If the input starts or ends with a separating character, this word will be empty (zero-length). Empty words must not be generated in any other circumstance.

The function should return the words using a dynamically allocated array of \0-terminated strings. It must be possible to free the array by calling free() on each element of the array and the array itself. The function must store the length of the array in *num_words.

In addition:
- The function may not modify the input strings.
- The function may not leak memory. (When the function returns, the only newly allocated memory should be pointed to by the value returned.)
- We do not care what your function does if input is an empty string or if sep is an empty string.
- We do not care what your function does if allocating memory fails (provided that your function does not try to allocate a huge amount of memory relative to the size of its arguments).
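The freeing requirement above (free() on each element, then on the array itself) can be sketched as follows. This is only an illustration: free_words is a hypothetical helper name that the assignment does not require, and it assumes words and num_words came from a call to string_split.

```c
#include <stdlib.h>

/* Hypothetical helper (not required by the assignment): release an array
 * shaped like the one string_split returns. Each element is freed first,
 * then the array that held the elements. */
void free_words(char **words, int num_words) {
    for (int i = 0; i < num_words; i++) {
        free(words[i]);
    }
    free(words);
}
```

If a result cannot be released this way without AddressSanitizer complaining, the array or one of its elements was probably not allocated separately on the heap.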
Your implementation may use additional utility functions, provided those are also in split.c (so we can easily test your split.c with our own main.c). It may also use library functions that are part of the C standard or the POSIX standard. This includes most functions you may be familiar with declared in <string.h> and <stdio.h>, with the notable exceptions of strsep, index, and rindex. You can determine if a function meets these criteria by checking whether
- the "CONFORMING TO" section of its manpage lists some version of "POSIX" or C89 or C99 or C11 (or multiple of these); or
- it appears in the list of functions from the 2018 POSIX specification.
Some examples of how string_split should behave, which you can use to test your function, are shown below.

In split.h, write an appropriate header file declaring string_split() and include #include guards to protect against multiple inclusion.

In main.c, write a main() function that:

constructs a value for sep from the command line arguments, or, if no command line arguments are provided, uses " \t" [a space character followed by a tab character]. When there are command-line arguments, the value of sep should be the result of concatenating all the command line arguments. For example, if the main() is compiled into an executable split, then running

    ./split a b c

or

    ./split abc

or

    ./split ab c

should choose a sep value of "abc".

reads lines of input (without prompting). For each line, it should:
- exit if the line is one period (.) with no other text or whitespace (other than a trailing newline, which we will not consider part of the line);
- otherwise, call string_split with that input (without any trailing newline or similar) and the chosen sep value, then print the resulting array with each word surrounded by square brackets ([ and ]) and without spaces between words, followed by a newline, then free the resulting array.
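The per-line steps above could be broken into small helpers along these lines. This is a minimal sketch, not the required structure: strip_newline, is_exit_line, and print_words are hypothetical names, and the sketch assumes the line was read into a writable buffer (for example with fgets).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: remove one trailing newline, if present. */
void strip_newline(char *line) {
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n') {
        line[len - 1] = '\0';
    }
}

/* Hypothetical helper: nonzero if the (already stripped) line is exactly
 * one period with no other text or whitespace. */
int is_exit_line(const char *line) {
    return strcmp(line, ".") == 0;
}

/* Hypothetical helper: print each word in square brackets with no spaces
 * between words, followed by a newline. */
void print_words(FILE *out, char **words, int num_words) {
    for (int i = 0; i < num_words; i++) {
        fprintf(out, "[%s]", words[i]);
    }
    fputc('\n', out);
}
```

Stripping the newline before the period check keeps the two concerns separate: a line such as "X.X" or ". " is then passed to string_split rather than treated as the exit line.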
Some examples of expected transcripts from the split program are shown below.

Your program must support input lines of at least 5000 bytes. If an input line is more than 5000 bytes, your program may not access out of bounds memory (such as might cause a segfault), but otherwise we do not care what it does. Your string_split function may not have any arbitrary limits on the size of its arguments (but we do not care how it handles running out of memory).

We will test your main.c with our own split.c, so put any functions other than string_split that it needs within main.c.
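One way to build the sep value described for main.c is sketched below. The name build_sep is hypothetical, and returning a heap-allocated copy even for the default " \t" is just one design choice (it lets the caller free the result unconditionally).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: concatenate argv[1..argc-1] into one newly
 * allocated string, or return a copy of " \t" when no arguments were
 * given. Returns NULL if allocation fails; the caller frees the result. */
char *build_sep(int argc, char **argv) {
    if (argc <= 1) {
        char *def = malloc(3);
        if (def != NULL) {
            strcpy(def, " \t");
        }
        return def;
    }
    size_t total = 0;
    for (int i = 1; i < argc; i++) {
        total += strlen(argv[i]);
    }
    char *sep = malloc(total + 1); /* +1 for the terminating '\0' */
    if (sep == NULL) {
        return NULL;
    }
    sep[0] = '\0';
    for (int i = 1; i < argc; i++) {
        strcat(sep, argv[i]);
    }
    return sep;
}
```

With this sketch, ./split a b c, ./split abc, and ./split ab c would all produce the same string, "abc".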
Create a Makefile such that typing make will build (if necessary) split.o and main.o and link them into an executable split.

Make sure your C files do not produce any warnings when compiled with -D_XOPEN_SOURCE=700 -Og -g -std=c11 -Wall -pedantic (assuming GCC or Clang). (It is okay if your Makefile uses different options.)

Make sure your code does not have memory leaks or errors. We will test your code with AddressSanitizer enabled to help check for this (which can be enabled by compiling and linking with -fsanitize=address; you should also add -g so AddressSanitizer can show line numbers).

Also, we will test your string_split by calling it with buffers that are just big enough for the arguments, so going just one byte out of bounds would be an error. (This may not match how your main.c calls string_split.)

We may use automated tests to assess your submission. Your solution may not hard-code the solution for any of the test cases or intentionally interfere with the testing environment. (By "hard-code the solution", we mean code that looks for our specific test inputs (such as checking if the input contains XfooX because one of the test cases is XfooXbar) rather than the more general situation being tested (such as checking for separator characters at the beginning of the input); it’s not a prohibition on having less specific special cases.)

Submit your solution to the submission site.
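To approximate the "just big enough" buffers mentioned above in your own tests, you could copy each argument into a heap buffer of exactly the right size before calling string_split. The helper below is a sketch under that assumption; exact_copy is a hypothetical name, and how the course's tests actually allocate their buffers is not stated here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical test helper: copy s into a heap buffer of exactly
 * strlen(s) + 1 bytes. When the program is built with -fsanitize=address,
 * an access even one byte past the terminating '\0' of such a buffer is
 * reported. Returns NULL on allocation failure; the caller frees. */
char *exact_copy(const char *s) {
    size_t size = strlen(s) + 1;
    char *copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, s, size);
    }
    return copy;
}
```

A test would then pass exact_copy(":foo!") and exact_copy(":!") to string_split instead of string literals or a large line buffer, and free both copies afterwards.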
2 Examples
2.1 string_split
Running

    char **result;
    int size = ANY_VALUE;
    result = string_split("foo", ":", &size);

should have the same effect as

    char **result;
    int size;
    result = calloc(1, sizeof(char *));
    result[0] = malloc(4); strcpy(result[0], "foo");
    size = 1;

Running

    char **result;
    int size;
    result = string_split("foo:bar:quux", ":", &size);

or

    char **result;
    int size;
    result = string_split("foo:bar!quux", "!:", &size);

or

    char **result;
    int size;
    result = string_split("foo:bar!quux", ":!", &size);

should have the same effect as

    char **result;
    int size;
    result = calloc(3, sizeof(char *));
    result[0] = malloc(4); strcpy(result[0], "foo");
    result[1] = malloc(4); strcpy(result[1], "bar");
    result[2] = malloc(5); strcpy(result[2], "quux");
    size = 3;

Running

    char **result;
    int size;
    result = string_split(":foo!:bar::quux!", ":!", &size);

should have the same effect as

    char **result;
    int size;
    result = calloc(5, sizeof(char *));
    result[0] = malloc(1); strcpy(result[0], "");
    result[1] = malloc(4); strcpy(result[1], "foo");
    result[2] = malloc(4); strcpy(result[2], "bar");
    result[3] = malloc(5); strcpy(result[3], "quux");
    result[4] = malloc(1); strcpy(result[4], "");
    size = 5;
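The examples above can be turned into checks with a small comparison helper. This is a sketch only: words_match is a hypothetical name, and it compares a returned array against a list of expected words without freeing anything.

```c
#include <string.h>

/* Hypothetical test helper: returns 1 if got holds exactly want_n words
 * and each one equals the corresponding expected string, otherwise 0. */
int words_match(char **got, int got_n,
                const char *const *want, int want_n) {
    if (got_n != want_n) {
        return 0;
    }
    for (int i = 0; i < want_n; i++) {
        if (got[i] == NULL || strcmp(got[i], want[i]) != 0) {
            return 0;
        }
    }
    return 1;
}
```

For the last example, the expected list would be the five strings "", "foo", "bar", "quux", "" with want_n equal to 5.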
2.2 split executable
In the following example transcripts, bold represents input typed in and $ represents the shell’s prompt:
2.2.1 Example 1
$ ./split
foo bar baz
[foo][bar][baz]
quux-no-space quux-with space !
[quux-no-space][quux-with][space][!]
    indented
[][indented]
.
$
2.2.2 Example 2
$ ./split XY Z
fooXXXXbarZXYXYXZbazYYYYY
[foo][bar][baz][]
XXXXXXXXXXXXX
[][]
X.X
[][.][]
.
$
3 Hints
I found the C standard library functions strspn and strcspn useful in my solution.

To avoid scanning the string multiple times, you can use realloc to change the size of a dynamically allocated array. (But note that you might need to initialize space made by realloc; it might not default to 0/NULL.)

You may find referencing the CSO1 labs on using C helpful; see, for example, labs 9 through 12 of the Spring 2023 offering of CSO1.
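As a rough illustration of the strspn/strcspn and realloc hints above, the sketch below counts the words the splitting rule would produce and grows a pointer array while initializing the new slots. Both count_words and grow_ptr_array are hypothetical helpers, and this is one possible approach rather than the intended solution.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the words string_split would return.
 * strcspn gives the length of the run of non-separator characters at p;
 * strspn gives the length of the run of separator characters at p. */
int count_words(const char *input, const char *sep) {
    int count = 0;
    const char *p = input;
    for (;;) {
        p += strcspn(p, sep);        /* skip one word (possibly empty) */
        count++;
        size_t run = strspn(p, sep); /* separators following that word */
        if (run == 0) {
            break;                   /* reached '\0': that was the last word */
        }
        p += run;
    }
    return count;
}

/* Hypothetical helper: resize *arr from old_cap to new_cap pointers.
 * realloc leaves added space uninitialized, so the new slots are set to
 * NULL explicitly. Returns 0 on success, or -1 on failure (in which case
 * *arr is left unchanged and still valid). */
int grow_ptr_array(char ***arr, int old_cap, int new_cap) {
    char **bigger = realloc(*arr, (size_t)new_cap * sizeof(char *));
    if (bigger == NULL) {
        return -1;
    }
    for (int i = old_cap; i < new_cap; i++) {
        bigger[i] = NULL;
    }
    *arr = bigger;
    return 0;
}
```

Because a trailing separator run is followed by one more (empty) word, the loop counts a word before checking for separators; that is what makes "XXXXXXXXXXXXX" with separators XYZ count as two words.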
When using AddressSanitizer or debuggers, if you want to get line numbers, local variables, etc., you may need to compile with -g or a similar option.

In addition to the problems caught by AddressSanitizer, you might have memory errors that arise from assuming local variables are initialized in a particular way even though you don’t set them explicitly. With recent versions of GCC and with Clang, you can use the -ftrivial-auto-var-init=pattern flag to help catch this: it makes the compiler initialize any uninitialized local variables to a pattern likely to cause issues if it is used.