SYSC 2006 C Winter 2012 String Processing in C D.L. Bailey, Systems and Computer Engineering, Carleton University
References ● Hanly & Koffman, Chapter 9 ● Some examples adapted from code in The C Programming Language, Second Edition, Kernighan & Ritchie, Prentice Hall, 1988
Objectives ● Understand how C implements character strings ● Look at a few string functions from the C standard library (caller's view) ● Illustrate string processing algorithms by reimplementing some of the standard library functions
String Types ● Unlike C++, Java and Python, C does not have a named string type ○ C++: type string ○ Java: type String ○ Python: type str ● C strings are implemented using arrays of characters
String Constants ● A sequence of characters in double quotes is a string constant or string literal ● Example: "SYSC" ● Stored as an array of characters, terminated by '\0' (the null character) ○ C compiler creates the array and initializes it
String Constants ● Note 1: '\0' is not the same as '0' (the character zero) ● Note 2: number of elements in the array is 1 more than the number of chars between the quotes ● Adjacent string constants are concatenated at compile time ○ "Hello, " "world!" and "Hello, world!" are equivalent
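The notes above can be checked with sizeof, strlen and strcmp. This is a minimal sketch; the three function names are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Number of elements in the array the compiler creates for "SYSC":
   the 4 characters between the quotes plus the terminating '\0'. */
size_t literal_array_size(void)
{
    return sizeof("SYSC");
}

/* Number of characters stored before the terminating '\0'. */
size_t literal_length(void)
{
    return strlen("SYSC");
}

/* Returns 1 if the adjacent literals were joined into a single string
   identical to "Hello, world!", and 0 otherwise. */
int adjacent_literals_match(void)
{
    return strcmp("Hello, " "world!", "Hello, world!") == 0;
}
```

The array size is one more than the string length because sizeof counts the '\0' and strlen does not.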
String Variables ● This declaration: char dept[] = "SYSC"; allocates an array called dept, initialized with 5 chars: 'S', 'Y', 'S', 'C', '\0'
String Variables
● char dept[] = "SYSC"; is equivalent to:
    char dept[5];
    dept[0] = 'S';
    dept[1] = 'Y';
    dept[2] = 'S';
    dept[3] = 'C';
    dept[4] = '\0';
String Variables
● We don't need to initialize all the elements in a character array
    char dept[5];
    dept[0] = 'I';
    dept[1] = 'M';
    dept[2] = 'D';
    dept[3] = '\0';
● dept[4] is uninitialized (that's o.k., because the string is properly terminated with '\0')
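To see why the uninitialized element is harmless, note that string functions stop at the first '\0' and never read past it. A small sketch, with the hypothetical helper name partial_init_length:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores "IMD" in a 5-element array and returns
   the length that strlen reports for it. */
size_t partial_init_length(void)
{
    char dept[5];
    dept[0] = 'I';
    dept[1] = 'M';
    dept[2] = 'D';
    dept[3] = '\0';   /* terminator; dept[4] is never read by strlen */
    return strlen(dept);
}
```

The function returns 3, the number of characters before the terminator, regardless of what dept[4] happens to contain.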
String Variables ● Can't assign a string literal to a character array ● This isn't permitted: char dept[5]; ... dept = "SYSC"; // Error!
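One common way to give an existing array a new string value is to copy the characters with strcpy from <string.h>. The sketch below assumes the array is large enough for the string and its '\0'; the helper name assign_with_strcpy is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: fills dept after its declaration and returns 1
   if it then holds "SYSC", 0 otherwise. */
int assign_with_strcpy(void)
{
    char dept[5];             /* room for 4 chars plus '\0' */

    /* dept = "SYSC";  would not compile: arrays are not assignable */
    strcpy(dept, "SYSC");     /* copies 'S','Y','S','C','\0' into dept */

    return strcmp(dept, "SYSC") == 0;
}
```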
String Variables ● const qualifier tells the compiler that the array elements should never be altered (compiler should flag any attempt to do so) const char dept[] = "SYSC";
String Operations ● C's operators are not overloaded to support string operations ● Example: in C++, Java and Python, + is the string concatenation operator ○ In C, + cannot be used to concatenate two character strings
<string.h> ● C standard library provides several functions that provide common string operations ● Prototypes are found in <string.h>
strlen
● size_t strlen(const char s[]);
● Returns the length of its character string argument, excluding '\0', as a value of the unsigned integer type size_t
    #include <string.h>
    ...
    char greeting[] = "Hello";
    size_t len;
    len = strlen(greeting); // returns 5 (not 6)
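strcmp ● int strcmp(const char s[], const char t[]); ● Compares s and t character by character, using the characters' numeric codes ● Returns ○ negative value if s < t (s comes before t) ○ 0 if s == t (same characters, same length) ○ positive value if s > t (s comes after t)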
strcmp Example:
    char name1[30];
    char name2[30];
    // Initialization of name1 and name2
    // not shown
    if (strcmp(name1, name2) != 0) {
        // strings are different
    }
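Only the sign of strcmp's result is meaningful; the exact negative or positive value can differ between library implementations. The sketch below reduces the result to -1, 0 or +1. The helper name compare_sign is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: maps strcmp's result to -1, 0 or +1. */
int compare_sign(const char s[], const char t[])
{
    int result = strcmp(s, t);

    if (result < 0) {
        return -1;   /* s comes before t */
    }
    if (result > 0) {
        return 1;    /* s comes after t */
    }
    return 0;        /* strings are equal */
}
```

For example, compare_sign("apple", "banana") is -1, and compare_sign("abc", "abcd") is also -1 because a string that is a proper prefix of another comes first.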
strstr ● char *strstr(const char s[], const char t[]); ● Returns the location of substring t in string s as a character pointer (we'll study pointers later) ● If substring t isn't found in s, returns the value NULL ○ NULL is defined in several header files ● If all you need to know is whether or not t is in s, but you don't care where, just check if the function returns NULL
strstr Example:
    char phrase[] = "quick brown fox jumped";
    if (strstr(phrase, "fox") == NULL) {
        printf("fox is not in the string");
    } else {
        printf("fox is in the string");
    }
● Output is: fox is in the string
strcpy ● char *strcpy(char s[], const char t[]); ○ ignore char * return type for now ● Copies all chars in t to s, including '\0' ○ Programmer is responsible for ensuring that s is big enough to hold all chars copied from t
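strcat ● char *strcat(char s[], const char t[]); ○ ignore char * return type for now ● Appends t to the end of s: the first char of t overwrites the '\0' in s, and the result is terminated with '\0' ○ Programmer is responsible for ensuring that s is big enough to hold its existing chars, all chars copied from t, and the terminating '\0'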
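The sizing responsibility described above can be illustrated by building a string in steps. This is a sketch under the assumption that the caller supplies an array of at least 10 elements (9 characters plus '\0'); the helper name build_course_code is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: builds "SYSC 2006" in out and returns its length.
   Assumption: out has at least 10 elements. */
size_t build_course_code(char out[])
{
    strcpy(out, "SYSC");   /* out holds "SYSC"      (4 chars + '\0') */
    strcat(out, " ");      /* out holds "SYSC "     (5 chars + '\0') */
    strcat(out, "2006");   /* out holds "SYSC 2006" (9 chars + '\0') */
    return strlen(out);
}
```

A possible call is char code[10]; followed by build_course_code(code);. Passing a smaller array would make strcat write past the end of it, which neither function detects.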
A strlen Implementation
● Loop over the string, counting characters until we reach null
    int CU_strlen(const char s[])
    {
        int i = 0;
        while (s[i] != '\0') {
            i = i + 1;
        }
        return i;
    }
A strcmp Implementation ● Loop, comparing the two strings on a character-by-character basis, until we find two characters that differ ● Calculate the difference of those two characters to determine whether one string is greater than or less than the other ● If while looping we reach the end of both strings before finding chars that differ, the two strings are equal
A strcmp Implementation
    int CU_strcmp(const char s[], const char t[])
    {
        int i;
        for (i = 0; s[i] == t[i]; i = i + 1) {
            if (s[i] == '\0')
                return 0;
        }
        // i is first pos'n where s and t differ
        return s[i] - t[i];
    }
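The standard library's strcmp compares characters as if they had type unsigned char. On systems where plain char is signed, subtracting char values directly can give the opposite sign for characters with codes above 127. The variant below is a sketch of one way to match the standard behaviour; the name CU_strcmp_unsigned is hypothetical and is not part of the original examples.

```c
/* Hypothetical variant of CU_strcmp: same loop, but the final
   difference is computed on unsigned char values. */
int CU_strcmp_unsigned(const char s[], const char t[])
{
    int i;

    for (i = 0; s[i] == t[i]; i = i + 1) {
        if (s[i] == '\0') {
            return 0;   /* reached the end of both strings */
        }
    }

    /* i is the first position where s and t differ */
    return (unsigned char) s[i] - (unsigned char) t[i];
}
```

For strings that contain only 7-bit ASCII characters, both versions return results with the same sign.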
A strcpy Implementation ● Loop over the source string, copying characters into the destination string, until we reach the end of the source string ● Null terminate the destination string
A strcpy Implementation
    void CU_strcpy(char s[], const char t[])
    {
        int i = 0;
        while (t[i] != '\0') {
            s[i] = t[i];
            i = i + 1;
        }
        // Terminate s
        s[i] = '\0';
    }
A strcat Implementation ● Loop over the destination string, until we find null ● Loop over the source string, copying characters into the destination string, until we reach the end of the source string ○ 1st character copied from source overwrites null in destination ● Null terminate the destination string
A strcat Implementation
    void CU_strcat(char s[], const char t[])
    {
        int i, j;
        for (i = 0; s[i] != '\0'; i = i + 1)
            ; // find end of s
        // Copy t to s, except for null
        for (j = 0; t[j] != '\0'; j = j + 1) {
            s[i] = t[j];
            i = i + 1;
        }
        s[i] = '\0'; // Terminate s
    }