Changelog:
- 23 August 2023: make examples consistent with prototype of string_split having a num_words argument; make examples of command-line have newlines show up more consistently across browsers
- 23 August 2023: clarify that *num_words should be set by string_split (not by its caller)
- 23 August 2023: avoid omitting declarations of result and size from examples
- 23 August 2023: link to CSO1 labs as possible reference in hints
- 25 August 2023: fix whitespace not being formatted properly on split example 1
- 25 August 2023: tweak wording of hint on realloc so that it does not imply that initializing the space is mandatory
- 29 August 2023: clarify that main.c should be possible to test with our own split.c
- 31 August 2023: clarify in Your task that words can be zero-length (matching what is shown in the examples)
- 1 Sep 2023: be more explicit about behavior for empty words at beginning/end in Your task
- 8 Sep 2023: clarify that no-hard-coding does not mean no-special-cases.
- 8 Sep 2023: give testing string_split with our own main as reason why everything string_split needs should be in split.c
1 Your task
Create C files as follows:
In split.c, write an implementation of the function
char **string_split(const char *input, const char *sep, int *num_words);
The function should take a \0-terminated string input and a \0-terminated string listing separating characters in sep and return an array of words (of zero or more characters) separated by a sequence of one or more of the characters in sep. string_split should set *num_words to the size of the array. The returned array and each of the strings in that array should be allocated so that they can be freed using free().
([added 1 Sep:] As shown in the examples below, you should consider input to always start/end with a word, even if that word is empty, and otherwise not generate empty words even when there are multiple consecutive characters from sep.)
In addition:
- The function may not modify the input strings.
- The function may not leak memory. (When the function returns, the only newly allocated memory should be pointed to by the value returned.)
- We do not care what your function does if input is an empty string or if sep is an empty string.
- We do not care what your function does if allocating memory fails (provided that your function does not try to allocate a huge amount of memory relative to the size of its arguments).
Some examples of using this function are shown below.
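Because both the returned array and each string in it must be freeable with free(), a caller can release a result with a loop like the one below. This is only a sketch: the helper name free_words is hypothetical and is not something the assignment asks for.

```c
#include <stdlib.h>

/* Hypothetical helper (not part of the assignment): releases an array
 * shaped like the one string_split returns. */
void free_words(char **words, int num_words) {
    if (words == NULL) {
        return;
    }
    for (int i = 0; i < num_words; i++) {
        free(words[i]);   /* each word was allocated separately */
    }
    free(words);          /* then the array of pointers itself */
}
```

Freeing the words before the array matters, since the array holds the only pointers to them.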
Your implementation may use additional utility functions, provided those are also in split.c (so we can easily test your split.c with our own main.c). It may also use any functions in the C standard library.
In split.h, write an appropriate header file for split.c (declaring string_split) and include #include guards to protect against multiple inclusion.
In main.c, write a main() function that:
constructs a value for sep from the command-line arguments or, if no command-line arguments are provided, uses " \t". When there are command-line arguments, the value of sep should be the result of concatenating all the command-line arguments. For example, if the main() is compiled into an executable split, then running
./split a b c
or
./split abc
or
./split ab c
should choose a sep value of abc.
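One way to build sep from the command line is sketched below. The helper name build_sep and its NULL-on-failure convention are assumptions made for illustration; any approach that yields the concatenation (or " \t" when there are no arguments) fits the description above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: joins argv[1..argc-1] into one newly allocated
 * string, or returns a copy of " \t" when there are no arguments.
 * Returns NULL if allocation fails; the caller frees the result. */
char *build_sep(int argc, char **argv) {
    const char *fallback = " \t";
    size_t total = 0;

    if (argc <= 1) {
        total = strlen(fallback);
    } else {
        for (int i = 1; i < argc; i++) {
            total += strlen(argv[i]);
        }
    }

    char *sep = malloc(total + 1);   /* +1 for the terminating '\0' */
    if (sep == NULL) {
        return NULL;
    }
    sep[0] = '\0';

    if (argc <= 1) {
        strcpy(sep, fallback);
    } else {
        for (int i = 1; i < argc; i++) {
            strcat(sep, argv[i]);    /* fits: total was computed above */
        }
    }
    return sep;
}
```

With this sketch, the argument lists a b c, abc, and ab c all produce the same string, abc.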
reads lines of input (without prompting). For each line, it should:
- exit if the line is one period (.) with no other text or whitespace;
- otherwise call string_split with that input and the chosen sep value, then print out the resulting array with each word surrounded by square brackets ([ and ]) and without spaces between words, followed by a newline, then free the resulting array.
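The output format described above can be sketched as follows. The helper name print_words and its return value (a character count, in the style of printf) are assumptions for illustration, and the sketch ignores output errors.

```c
#include <stdio.h>

/* Hypothetical helper: writes each word as [word] with nothing between
 * words, then a newline. Returns the number of characters written,
 * assuming no output errors occur. */
int print_words(FILE *out, char **words, int num_words) {
    int written = 0;
    for (int i = 0; i < num_words; i++) {
        written += fprintf(out, "[%s]", words[i]);
    }
    written += fprintf(out, "\n");
    return written;
}
```

An empty word simply prints as [], which is how leading or trailing separators show up in the transcripts.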
Your program must support lines of input of at least 4000 bytes. If an input line is more than 4000 bytes, your program may not access out of bounds memory (such as might cause a segfault), but otherwise we do not care what it does.
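A fixed-size buffer read with fgets is one way to meet the 4000-byte requirement without going out of bounds. The sketch below is an assumption about how this could be structured, not a required design; LINE_CAP, strip_newline, is_quit_line, and read_lines are all hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Room for 4000 bytes of text, the trailing newline, and the '\0'. */
#define LINE_CAP 4002

/* Removes one trailing '\n', if present. */
void strip_newline(char *line) {
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n') {
        line[len - 1] = '\0';
    }
}

/* True only for a line that is exactly "." with nothing else on it. */
int is_quit_line(const char *line) {
    return strcmp(line, ".") == 0;
}

/* Reads lines until end of input or a lone "."; returns how many lines
 * it handled. fgets never writes past the buffer, so an over-long line
 * arrives in pieces instead of overflowing. */
int read_lines(FILE *in) {
    char line[LINE_CAP];
    int handled = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        strip_newline(line);
        if (is_quit_line(line)) {
            break;
        }
        /* a real main() would call string_split, print, and free here */
        handled++;
    }
    return handled;
}
```

Note that a line such as ". " (period plus a space) is not a quit line under this check, matching the "no other text or whitespace" wording.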
We intend to test your main.c with our own split.c, so please put any functions other than string_split that it needs within main.c.
(Some examples of expected transcripts are shown below.)
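Returning to the split.h item above, a header with include guards might look like the sketch below. The guard macro name SPLIT_H is an assumption; any name that is unlikely to collide with other headers works.

```c
#ifndef SPLIT_H
#define SPLIT_H

/* Splits input on runs of characters from sep and stores the number of
 * words in *num_words. The caller frees each word and then the array. */
char **string_split(const char *input, const char *sep, int *num_words);

#endif /* SPLIT_H */
```

If the header is included twice in one translation unit, the second copy is skipped because SPLIT_H is already defined.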
Create a Makefile such that typing make will build (if necessary) split.o, main.o and link them into an executable split.
Make sure your C files do not produce any warnings when compiled with -Og -g -std=c11 -Wall -pedantic (assuming GCC or Clang). (It is okay if your Makefile uses different options.)
Make sure your code does not have memory leaks or errors. We will test your code with AddressSanitizer enabled to help check for this (which can be enabled by compiling and linking with -fsanitize=address).
We may use automated tests to assess your submission. Your solution may not hard-code the solution for any of the test cases or intentionally interfere with the testing environment. (By "hard-code the solution", we mean code that looks for our specific test inputs (such as checking if the input contains XfooX because one of the test cases is XfooXbar) rather than the more general situation being tested (such as checking for separator characters at the beginning of the input); it’s not a prohibition on having special cases.)
2 Examples
2.1 string_split
Running
char **result; int size; result = string_split("foo", ":", &size);
should have the same effect as
char **result; int size;
result = calloc(1, sizeof(char *));
result[0] = malloc(4); strcpy(result[0], "foo");
size = 1;
Running
char **result; int size; result = string_split("foo:bar:quux", ":", &size);
or
char **result; int size; result = string_split("foo:bar!quux", "!:", &size);
or
char **result; int size; result = string_split("foo:bar!quux", ":!", &size);
should have the same effect as
char **result; int size;
result = calloc(3, sizeof(char *));
result[0] = malloc(4); strcpy(result[0], "foo");
result[1] = malloc(4); strcpy(result[1], "bar");
result[2] = malloc(5); strcpy(result[2], "quux");
size = 3;
Running
char **result; int size; result = string_split(":foo!:bar::quux!", ":!", &size);
should have the same effect as
char **result; int size;
result = calloc(5, sizeof(char *));
result[0] = malloc(1); strcpy(result[0], "");
result[1] = malloc(4); strcpy(result[1], "foo");
result[2] = malloc(4); strcpy(result[2], "bar");
result[3] = malloc(5); strcpy(result[3], "quux");
result[4] = malloc(1); strcpy(result[4], "");
size = 5;
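To check examples like these mechanically, a small comparison helper can be handy when writing your own tests. The name words_equal and its interface are hypothetical; this is just one possible shape for such a check.

```c
#include <string.h>

/* Hypothetical test helper: returns 1 when got holds exactly the words
 * in want (same count, same contents, same order), otherwise 0. */
int words_equal(char **got, int got_count,
                const char **want, int want_count) {
    if (got_count != want_count) {
        return 0;
    }
    for (int i = 0; i < got_count; i++) {
        if (strcmp(got[i], want[i]) != 0) {
            return 0;
        }
    }
    return 1;
}
```

For the last example, want would be the five strings "", "foo", "bar", "quux", "" and got would be the array returned by string_split.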
2.2 split executable
In the following example transcripts, bold represents input typed in and $ represents the shell’s prompt:
2.2.1 Example 1
$ ./split
foo bar baz
[foo][bar][baz]
quux-no-space quux-with space !
[quux-no-space][quux-with][space][!]
    indented
[][indented]
.
$
2.2.2 Example 2
$ ./split XY Z
fooXXXXbarZXYXYXZbazYYYYY
[foo][bar][baz][]
XXXXXXXXXXXXX
[][]
X.X
[][.][]
.
$
3 Hints
I found the C standard library functions strspn and strcspn useful in my solution.
To avoid scanning the string multiple times, you can use realloc to change the size of a dynamically allocated array. (But note that you might need to initialize space made by realloc: it might not default to 0/NULL.)
You may find referencing the CSO1 labs on using C helpful; see for example labs 9 through 12 of the Spring 2023 offering of CSO1.
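As a small illustration of the first two hints, the sketch below records only the length of each word in a single pass, growing an int array with realloc as it goes. It is not a string_split implementation; the helper name word_lengths and the doubling growth policy are assumptions chosen for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated array holding the length
 * of each word in input and stores the number of words in *count.
 * Returns NULL (leaving *count untouched) if allocation fails. */
int *word_lengths(const char *input, const char *sep, int *count) {
    int *lengths = NULL;
    int used = 0;
    int capacity = 0;
    const char *p = input;

    for (;;) {
        /* strcspn: number of leading chars NOT in sep (the word). */
        size_t len = strcspn(p, sep);

        if (used == capacity) {
            int new_capacity = (capacity == 0) ? 4 : capacity * 2;
            /* realloc(NULL, n) behaves like malloc(n); the new slots
             * are uninitialized until written below. */
            int *grown = realloc(lengths, new_capacity * sizeof *grown);
            if (grown == NULL) {
                free(lengths);
                return NULL;
            }
            lengths = grown;
            capacity = new_capacity;
        }
        lengths[used++] = (int)len;

        p += len;
        if (*p == '\0') {
            break;                 /* input ended with this word */
        }
        /* strspn: number of leading chars that ARE in sep (the run). */
        p += strspn(p, sep);
    }

    *count = used;
    return lengths;
}
```

For the input ":foo!:bar::quux!" with sep ":!", this yields the lengths 0, 3, 3, 4, 0, which lines up with the five words in the string_split example.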