Back Python Chrestomathics Ahead |
Part 5 — String massage
Manipulating — Not something I like to do — But with strings it’s fine
Section 5.1: Stripping
Stripping to basics — Makes things easier to see — And patterns emerge
- Our focus now is cleaning up, inspecting, and analyzing strings. We start with string method strip(). The method is useful when processing user-supplied data.
- When invoked with no arguments, method strip() produces a copy of the string with all leading and trailing whitespace removed. The following code from program wiped.py demonstrates input stripping.
# Get text
reply = input( "Enter text: " )
# Get a cleaned version of the text
s = reply.strip()
# Show the differences in the strings
print( "string of interest:", reply, ":" )
print( "  stripped version:", s, ":" )
- Suppose in reaction to the prompt, the user enters the following.
Enter text: scraunched is just one syllable
- Then the initial tracing of the program would be
Variable | Value |
reply | " scraunched is just one syllable " |
- The assignment below invokes the strip() method on reply and makes s a string without any leading or trailing whitespace. The statement does not change reply.
s = reply.strip()
- The tracing is now
Variable | Value |
reply | " scraunched is just one syllable " |
s | "scraunched is just one syllable" |
- The print statements then produce the following.
string of interest:  scraunched is just one syllable  :
  stripped version: scraunched is just one syllable :
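- Method strip() has two siblings, lstrip() and rstrip(), and all three accept an optional argument naming the characters to remove. The sketch below is not part of wiped.py; its sample strings and variable names are made up for illustration.

```python
# Sample text with padding on both sides (illustrative value, not from wiped.py)
padded = "   scraunched   "

both = padded.strip()     # Remove leading and trailing whitespace
left = padded.lstrip()    # Remove leading whitespace only
right = padded.rstrip()   # Remove trailing whitespace only

# An optional argument lists the characters to strip instead of whitespace
dashed = "--scraunched--".strip( "-" )

# repr() adds quotes, so any leftover whitespace is visible
print( "strip():      ", repr( both ) )
print( "lstrip():     ", repr( left ) )
print( "rstrip():     ", repr( right ) )
print( "strip( '-' ): ", repr( dashed ) )
```

Printing with repr() is a small design choice: the surrounding quotes make it easy to see exactly which whitespace survived.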
Section 5.2: Proper casing
Who, WhaT, wHy, and whEN — WEIRD CaPiTaLiZaTion — Can be too glaring
- To make text processing simpler to analyze, programmers often put the text into a specific form (e.g., all lower case). Python supplies three string methods to get altered casing.
lower(): returns a copy of its string with all alphabetic characters in lower case.
upper(): returns a copy of its string with all alphabetic characters in upper case.
capitalize(): returns a copy of its string with all alphabetic characters in lower case, except the first character of the string, which if alphabetic is in upper case.
- Program thats_amoray.py examines a string s equal to "When an Eel Climbs a Ramp to Eat Squid From a Clamp, That’s a Moray" and produces three different capitalizations of the string. The value of s is the title of a 2021 NY Times article that spoofs the love song That’s Amore.
- The program code is
# Set string of interest
s = "When an Eel Climbs a Ramp to Eat Squid From a Clamp, That’s a Moray"
# Get different capitalization versions
v1 = s.lower() # Get copy with letters in lower case
v2 = s.upper() # Get copy with letters in upper case
v3 = s.capitalize() # Get copy with first character upper case, rest
# lower case
# Print results
print( "s:", s )
print()
print( "s.lower():", v1 )
print( "s.upper():", v2 )
print( "s.capitalize():", v3 )
- Its output is below. See that non-alphabetic characters are left alone.
s: When an Eel Climbs a Ramp to Eat Squid From a Clamp, That’s a Moray
s.lower(): when an eel climbs a ramp to eat squid from a clamp, that’s a moray
s.upper(): WHEN AN EEL CLIMBS A RAMP TO EAT SQUID FROM A CLAMP, THAT’S A MORAY
s.capitalize(): When an eel climbs a ramp to eat squid from a clamp, that’s a moray
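- One common reason to put text into a specific form is to compare strings without regard to casing. The following is a minimal sketch of that idea; the two sample strings and the variable names are assumptions made for illustration and do not come from thats_amoray.py.

```python
# Two spellings of the same word (illustrative values)
a = "Moray"
b = "MORAY"

# A direct comparison is case sensitive
same_as_typed = ( a == b )

# Comparing lower-case copies ignores differences in casing
same_ignoring_case = ( a.lower() == b.lower() )

print( "a == b:", same_as_typed )
print( "a.lower() == b.lower():", same_ignoring_case )
```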
Section 5.3: Finding
Finding what you need — Takes an instant or lifetime — Me: love at first sight
- Once text is cleaned up, it often needs to be searched. The primary string searching method is find(). The method looks forward into the string for a substring of interest. There is an analogous method, rfind(), that looks backward into a string.
- Like many of the string methods, find() has flexibility. It can handle one or more arguments. We limit our interest to when there are either one or two arguments. The initial argument is always the substring being looked for. The code below demonstrating find() is from program finding.py.
- If only one argument is given to method find(), it returns the index (location) of the first occurrence of that substring argument.
s = "can anyone"
t = "an"
i1 = s.find( t ) # Finds first occurrence of string t within s
- If there is a second argument, then it is an integer index indicating where to start looking. The function returns the index of the first occurrence of the substring argument when starting from the indicated index.
i2 = s.find( t, i1 + 1 ) # Finds first occurrence of string t within s
# starting at index i1 + 1
- If the find() method cannot find the substring argument, it returns -1.
- The use of index to indicate a location in a string was deliberate. Python follows the convention of its ancestor languages and starts numbering index locations at 0. So, characters c, y, and e in the string "can anyone" are at indices 0, 6, and 9 respectively.
- A gotcha to many programmers is forgetting that indices start at 0. A particular problem is getting the index of the last character — if a string has length n, then the last character is not at index n, but at index n-1.
- The following code from finding.py attempts to examine a string s for the first three occurrences of a string t.
# Initialize strings
s = "can anyone"
t = "an"
# Find occurrences
i1 = s.find( t )
i2 = s.find( t, i1 + 1 ) # Start looking for next occurrence right after
i3 = s.find( t, i2 + 1 ) # where the previous occurrence started
# Print results
print( "s:", s )
print( "   ==========" )
print( "   0123456789 <== indices into s" )
print( "t:", t )
print()
print( "s.find( t ):", i1 )
print( "s.find( t,", i1 + 1, "):", i2 )
print( "s.find( t,", i2 + 1, "):", i3 )
- The program output follows.
s: can anyone
   ==========
   0123456789 <== indices into s
t: an
s.find( t ): 1
s.find( t, 2 ): 4
s.find( t, 5 ): -1
- Prior to the first search, the program tracing is
Variable | Value |
s | "can anyone" |
t | "an" |
- The first search by the program initializes variable i1.
i1 = s.find( t )
- Because this find() call has only one argument, the method returns the index of the very first occurrence of substring t in s, so variable i1 is set to 1.
Variable | Value |
s | "can anyone" |
t | "an" |
i1 | 1 |
- The second search by the program initializes variable i2.
i2 = s.find( t, i1 + 1 )
- The second search gives two arguments to find(). The first argument is the same substring of interest. The second argument is the value of i1 + 1. This argument causes the search to start one position past where the first occurrence began. As a result, i2 is set to 4.
Variable | Value |
s | "can anyone" |
t | "an" |
i1 | 1 |
i2 | 4 |
- The third search by the program initializes variable i3.
i3 = s.find( t, i2 + 1 )
- This search again gives two arguments to find(). The first argument is the same substring of interest. The second argument is the value of i2 + 1. This argument causes the search to start one position past where the second occurrence began. Because there are no more occurrences of string t in s, find() returns -1, and i3 is set to -1.
Variable | Value |
s | "can anyone" |
t | "an" |
i1 | 1 |
i2 | 4 |
i3 | -1 |
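- The method rfind() mentioned at the start of this section behaves like find() but reports the last occurrence rather than the first. The sketch below reuses the strings from finding.py; the variable names first, last, and missing, and the search string "xyz", are assumptions made for illustration.

```python
# Same strings as in finding.py
s = "can anyone"
t = "an"

first = s.find( t )          # Search forward: index of the first occurrence
last = s.rfind( t )          # Search backward: index of the last occurrence
missing = s.find( "xyz" )    # A substring that does not occur in s

print( "s.find( t ):", first )
print( "s.rfind( t ):", last )
print( "s.find( 'xyz' ):", missing )
```

Both methods signal failure the same way, so a result of -1 always means the substring was not found.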
Section 5.4: Accessing
Need to get inside — Is easy with the right key — For strings it’s brackets
- Python provides the brackets operator [] for accessing part of a string. In its simple form, a single index is given within the brackets. When used this way, the operation produces a single character. For example,
c = reply[ i ]
- The above operation returns the character at index i of reply. That value is then used to set variable c.
- The following code from program cat.py makes three uses of the string brackets operator. The program begins by getting three indices from its user. To get the indices in numeric form, the input must be split up and then cast to integers.
# Get three indices of user interest
reply = input( "Enter three indices: " )
i1, i2, i3 = reply.split()
i1 = int( i1 )
i2 = int( i2 )
i3 = int( i3 )
- Suppose the user supplied the indices 0, 8, and 9.
Enter three indices: 0 8 9
- A trace of the program so far would be
Variable | Value |
reply | "0 8 9" |
i1 | 0 |
i2 | 8 |
i3 | 9 |
- The program then uses those indices to peek into its target string "chrestomathics".
# Initialize target string
s = "chrestomathics"
# Use the indices to pick off single characters from s
c1 = s[ i1 ] # Grab character at index i1 in s
c2 = s[ i2 ] # Grab character at index i2 in s
c3 = s[ i3 ] # Grab character at index i3 in s
- A trace of the program to this point would be
Variable | Value |
reply | "0 8 9" |
i1 | 0 |
i2 | 8 |
i3 | 9 |
s | "chrestomathics" |
c1 | "c" |
c2 | "a" |
c3 | "t" |
- The program next uses the characters to build a new string.
# Use the characters to make a new string
t = c1 + c2 + c3
- The tracing is now
Variable | Value |
reply | "0 8 9" |
i1 | 0 |
i2 | 8 |
i3 | 9 |
s | "chrestomathics" |
c1 | "c" |
c2 | "a" |
c3 | "t" |
t | "cat" |
- The program completes by printing the result.
# Print result
print( t )
- Resulting in the output
cat
- Here are two more possible interactions with program cat.py. I strongly suggest that you trace through the code with these inputs.
Enter three indices: 13 1 3
she
Enter three indices: 2 8 7
ram
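- Section 5.3 warned that the last character of a string of length n sits at index n - 1. The sketch below checks that on the cat.py target string. It is not part of cat.py, and it relies on the built-in function len() and on negative indices, two Python features this part has not introduced.

```python
# Target string from cat.py
s = "chrestomathics"

n = len( s )               # Number of characters in s
last = s[ n - 1 ]          # Last character is at index n - 1, not n
also_last = s[ -1 ]        # A negative index counts back from the end

print( "length:", n )
print( "s[ n - 1 ]:", last )
print( "s[ -1 ]:", also_last )
```

Asking for s[ n ] instead would stop the program with an IndexError, because no character lives at that index.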
Section 5.5: Slicing
A slice of a pie — Fresh just out of the oven — Is a dream come true
- The brackets operator [] can be used to grab a substring of characters when coupled with the colon operator (:).
- Program slice_of_pie.py grabs four substrings from a user-supplied string. The substrings are determined by two user-specified indices.
- The program starts with getting inputs and converting the indices into numeric form.
# Get string of choice
s = input( "Enter favorite pie: " )
# Get indices i and j
t = input( "Enter two indices i and j: " )
i, j = t.split()
i = int( i )
j = int( j )
- Suppose the user indicates the following input values.
Enter favorite pie: chocolate
Enter two indices i and j: 3 7
- When this section completes our tracing would be
Variable | Value |
s | "chocolate" |
i | 3 |
j | 7 |
- Next comes the slicing. The presence of a colon inside a pair of brackets indicates that a substring rather than a single character is wanted.
- If there is an index in front of the colon, the substring starts at the indicated index.
- If there is no index in front of the colon, the substring starts at index 0.
- If there is an index after the colon, the substring ends at the character before the indicated index.
- If there is no index after the colon, the substring continues to the end of the string.
- The slicings made by the program demonstrate all four possibilities.
# Get slices of s
slice1 = s[ i : j ] # Substring starts at index i, ends at index j-1
slice2 = s[ i : ] # Substring starts at index i, continues to end of string
slice3 = s[ : j ] # Substring starts at index 0, ends at index j-1
slice4 = s[ : ] # Substring equals the entire string
- After the slices are taken, the tracing looks like
Variable | Value |
s | "chocolate" |
i | 3 |
j | 7 |
slice1 | "cola" |
slice2 | "colate" |
slice3 | "chocola" |
slice4 | "chocolate" |
- Lastly, the results are printed, which for our run is
Enter favorite pie: chocolate
Enter two indices i and j: 3 7
s[ i : j ]: cola
s[ i : ]: colate
s[ : j ]: chocola
s[ : ]: chocolate
- Two more possible program runs follow. I strongly suggest that you trace through the code with these inputs.
Enter favorite pie: pizza
Enter two indices i and j: 2 3
s[ i : j ]: z
s[ i : ]: zza
s[ : j ]: piz
s[ : ]: pizza
Enter favorite pie: garlic tart
Enter two indices i and j: 1 8
s[ i : j ]: arlic t
s[ i : ]: arlic tart
s[ : j ]: garlic t
s[ : ]: garlic tart
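- Because a slice stops just before its ending index, two handy facts follow: s[ i : j ] holds j - i characters, and s[ : i ] + s[ i : ] rebuilds the original string. The sketch below checks both using the chocolate run. It sets the values directly instead of reading input, and the names middle, front, back, and rejoined are assumptions made for illustration.

```python
# Values from the first slice_of_pie.py run, set directly instead of via input()
s = "chocolate"
i = 3
j = 7

middle = s[ i : j ]      # Characters at indices i through j - 1
front = s[ : i ]         # Everything before index i
back = s[ i : ]          # Everything from index i onward
rejoined = front + back  # The two pieces fit together with no gap or overlap

print( "s[ i : j ]:", middle, "has", len( middle ), "characters" )
print( "s[ : i ] + s[ i : ]:", rejoined )
```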
Section 5.6: Replacement
- Another handy string method is replace(). We again concern ourselves with its basic form, where the method takes two string arguments. It returns a copy of its string where all occurrences of its first argument are replaced with its second argument.
- The following statement from program replacement.py sets result to be a version of text where occurrences of s are replaced with r.
result = text.replace( s, r )
- The complete code for the program is
# Get text, search string, and replacement string
text = input( "Enter text: " )
s = input( "Enter substring (s): " )
r = input( "Enter substring (r): " )
# Get version of text with occurrences of s replaced with r
result = text.replace( s, r )
# Print substitution
print()
print( "text.replace( s, r ):", result, " # text's s's replaced with r's" )
- Below are some program runs of program replacement.py. Each run gets from the user the text of interest, the search string, and the replacement string.
- The first run produces a new string where occurrences of "ll" in the text string are replaced with "L L".
Enter text: hello mellow pillow
Enter substring (s): ll
Enter substring (r): L L
text.replace( s, r ): heL Lo meL Low piL Low # text’s s’s replaced with r’s
- The second run produces a new string where occurrences of "eepers" in the text string are replaced with the empty string "".
Enter text: jeepers creepers look at those peepers
Enter substring (s): eepers
Enter substring (r):
text.replace( s, r ): j cr look at those p # text’s s’s replaced with r’s
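- Beyond the basic two-argument form, replace() accepts an optional third argument that limits how many occurrences are replaced. The sketch below reuses the text from the first run. It is not part of replacement.py, and the variable names are assumptions made for illustration.

```python
# Text from the first replacement.py run, set directly instead of via input()
text = "hello mellow pillow"

every = text.replace( "ll", "L L" )          # Replace all occurrences
first_only = text.replace( "ll", "L L", 1 )  # Replace only the first occurrence

print( "all occurrences: ", every )
print( "first occurrence:", first_only )
print( "original text:   ", text )           # replace() leaves text unchanged
```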
Section 5.7: String casting
- Analogous to the int() and float() functions, which respectively cast a numeric string into an integer or a decimal number, the built-in function str() casts a number into a string. The function is helpful when a number needs to be embedded within a string.
- Consider the below illegal statement.
formula = "H" + 2 + "O" # illegal use of + operator
- The statement fails with a TypeError because the + operator does not permit mixing strings and numbers as operands.
- Although the code could be written in other ways, the statement below demonstrates doing it with the str() function.
formula = "H" + str( 2 ) + "O"
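- The same idea works when the number lives in a variable, and int() can undo the cast. The sketch below is illustrative only: the variable names and the value 2024 are assumptions, and the f-string on the last assignment is one of the other ways the statement could be written.

```python
# A number held in a variable (illustrative value)
count = 2

formula = "H" + str( count ) + "O"   # Cast the number, then concatenate
alternative = f"H{count}O"           # An f-string does the cast implicitly

digits = str( 2024 )                 # The string "2024", not the number 2024
next_year = int( digits ) + 1        # Cast back to an integer to do arithmetic

print( formula )
print( alternative )
print( digits, "has", len( digits ), "characters" )
print( next_year )
```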
Section 5.8: What’s next
- We next turn our attention to some other built-in functions and Python features.