C strtok function
last modified April 8, 2025
String manipulation is fundamental in C programming, and strtok is a
key function for splitting strings into tokens. This tutorial covers
strtok in depth, including its syntax, usage, and potential
pitfalls. We'll explore practical examples and discuss safer alternatives like
strtok_s. Understanding strtok helps with parsing and
processing string data while maintaining program safety.
What Is strtok?
The strtok function breaks a string into tokens using specified
delimiters. It is declared in string.h and modifies the original
string by overwriting the delimiter that ends each token with a null character.
Because strtok keeps its position in hidden internal state between calls, it is
neither thread-safe nor reentrant. Where that matters, consider strtok_r (POSIX)
or strtok_s (C11 Annex K, optional), which store the position in a
caller-supplied pointer; the Annex K strtok_s also tracks the remaining buffer
size. Never pass a string literal or other read-only data to strtok.
Basic strtok Usage
This example demonstrates basic string tokenization using strtok.
#include <stdio.h>
#include <string.h>
int main() {
    char str[] = "apple,orange,banana";
    char *token;
    // Get first token
    token = strtok(str, ",");
    // Get remaining tokens
    while (token != NULL) {
        printf("Token: %s\n", token);
        token = strtok(NULL, ",");
    }
    return 0;
}
Here, strtok splits the string at each comma delimiter. The first
call uses the string pointer, while subsequent calls use NULL. The function
returns pointers to each token. Note that strtok modifies the
original string. This is a simple way to parse comma-separated values or similar
delimited data.
Multiple Delimiters with strtok
strtok can handle multiple delimiter characters, as shown here.
#include <stdio.h>
#include <string.h>
int main() {
    char str[] = "apple orange,banana;pear";
    char *token;
    // Use space, comma, and semicolon as delimiters
    token = strtok(str, " ,;");
    while (token != NULL) {
        printf("Token: %s\n", token);
        token = strtok(NULL, " ,;");
    }
    return 0;
}
This example uses multiple delimiters (space, comma, semicolon) to split the
string. The delimiter string lists every character that should separate
tokens. strtok treats any run of these characters as a single
delimiter, so it never returns an empty token: "a,,b" yields "a" and "b", not
an empty field between them. This flexibility makes it useful for parsing
loosely formatted text, but unsuitable when empty fields carry meaning.
Safe Alternative: strtok_s
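This example demonstrates the strtok_s function defined in C11's optional
Annex K.
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
#include <string.h>
int main() {
    char str[] = "one:two:three";
    rsize_t remaining = sizeof str;  // size of the buffer still to be scanned
    char *token;
    char *context;
    // Tokenization with explicit size and context pointer
    token = strtok_s(str, &remaining, ":", &context);
    while (token != NULL) {
        printf("Token: %s\n", token);
        token = strtok_s(NULL, &remaining, ":", &context);
    }
    return 0;
}
strtok_s replaces strtok's hidden state with two caller-supplied objects: a
context pointer that tracks tokenization progress, and a size (remaining in
this example) that starts as the size of the buffer and is updated as tokens
are consumed. Separate tokenizations therefore do not interfere with each
other. Annex K is optional: an implementation that provides it defines
__STDC_LIB_EXT1__, and defining __STDC_WANT_LIB_EXT1__ as 1 before including
string.h requests the declarations. Many C libraries, glibc among them, do not
implement Annex K. Microsoft's C runtime provides a strtok_s with a different,
three-argument signature (string, delimiters, context) and no size parameter;
on POSIX systems strtok_r has that same three-argument form.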
Tokenizing with Different Delimiters
This example shows how to change delimiters between strtok calls.
#include <stdio.h>
#include <string.h>
int main() {
    char str[] = "name=John Doe;age=30;city=New York";
    // Each key ends at '=', each value ends at ';'
    char *key = strtok(str, "=");
    while (key != NULL) {
        // Continue in the same string, now splitting at ';'
        char *value = strtok(NULL, ";");
        printf("Key: %s, Value: %s\n", key, value ? value : "(none)");
        // Switch back to '=' for the next key
        key = strtok(NULL, "=");
    }
    return 0;
}
This code walks the string once, alternating delimiters: each key ends at an
equals sign and each value ends at a semicolon. strtok accepts a different
delimiter set on every call, so this works with its single internal state.
What does not work is starting a second strtok tokenization inside the loop,
for example splitting each semicolon-separated field with another strtok call:
the inner call overwrites the saved position, and the outer loop ends after
the first field. For nested or more complex parsing, use strtok_r or strtok_s,
a dedicated parsing library, or a custom parser, and document any delimiter
changes clearly.
Tokenizing a File Line by Line
This example demonstrates reading a file and tokenizing each line.
#include <stdio.h>
#include <string.h>
int main() {
    FILE *file = fopen("data.txt", "r");
    if (file == NULL) {
        perror("Error opening file");
        return 1;
    }
    char line[256];
    while (fgets(line, sizeof(line), file) != NULL) {
        // Remove newline character
        line[strcspn(line, "\n")] = '\0';
        char *token = strtok(line, ",");
        while (token != NULL) {
            printf("Token: %s\n", token);
            token = strtok(NULL, ",");
        }
        printf("----\n");
    }
    fclose(file);
    return 0;
}
This program reads a file line by line, tokenizing each line with commas.
fgets reads each line safely with buffer size checking. The
newline character is removed before tokenization. This pattern is useful for
processing CSV files or other line-based formats. Remember to always check file
operations for errors and close files properly.
Best Practices for Using strtok
- Avoid modifying source strings: Tokenize a copy if you still need the original, and never pass a string literal.
- Consider thread safety: Use strtok_s or strtok_r in multithreaded or nested code.
- Handle empty fields: strtok skips consecutive delimiters and never returns an empty token; use another approach if empty fields matter.
- Document delimiter changes: Comment any code that changes delimiters between calls.
- Check for NULL returns: Always verify tokens before using them.
This tutorial has explored the strtok function, from basic usage to
advanced considerations. While powerful for string parsing, always use it
carefully to prevent security issues and undefined behavior in your programs.