I have very frequently seen people discouraging others from using scanf
and saying that there are better alternatives. However, all I end up seeing is either
In this answer I'm going to assume that you are reading and interpreting lines of text. Perhaps you're prompting the user, who is typing something and hitting RETURN. Or perhaps you're reading lines of structured text from a data file of some kind.
Since you're reading lines of text, it makes sense to organize
your code around a library function that reads, well, a line of
text.
The Standard function is fgets(), although there are others (including getline). And then the next step is to interpret
that line of text somehow.
Here's the basic recipe for calling fgets to read a line of text:
char line[512];
printf("type something:\n");
fgets(line, 512, stdin);
printf("you typed: %s", line);
This simply reads in one line of text and prints it back out.
As written it has a couple of limitations, which we'll get to in
a minute. It also has a very great feature: that number 512 we
passed as the second argument to fgets is the size of the array
line we're asking fgets to read into. This fact -- that we can
tell fgets how much it's allowed to read -- means that we can
be sure that fgets won't overflow the array by reading too much
into it.
So now we know how to read a line of text, but what if we really
wanted to read an integer, or a floating-point number, or a
single character, or a single word? (That is, what if the
scanf call we're trying to improve on had been using a format
specifier like %d, %f, %c, or %s?)
It's easy to reinterpret a line of text -- a string -- as any of these things.
To convert a string to an integer, the simplest (though
imperfect) way to do it is to call atoi().
To convert to a floating-point number, there's atof().
(And there are also better ways, as we'll see in a minute.)
Here's a very simple example:
printf("type an integer:\n");
fgets(line, 512, stdin);
int i = atoi(line);
printf("type a floating-point number:\n");
fgets(line, 512, stdin);
float f = atof(line);
printf("you typed %d and %f\n", i, f);
If you wanted the user to type a single character (perhaps y or n
as a yes/no response), you can literally just grab the first
character of the line, like this:
printf("type a character:\n");
fgets(line, 512, stdin);
char c = line[0];
printf("you typed %c\n", c);
(This ignores, of course, the possibility that the user typed a multi-character response; it quietly ignores any extra characters that were typed.)
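If you want to be a little stricter about a yes/no answer, here is a minimal sketch. The name parse_yes_no and its return convention are my own invention for illustration, not anything Standard; it skips leading whitespace and looks only at the first remaining character.

```c
#include <ctype.h>

/* Hypothetical helper (not a Standard function).
   Returns 1 for yes, 0 for no, -1 if the answer is not recognised.
   Only the first non-whitespace character of the line is examined. */
int parse_yes_no(const char *line)
{
    while (isspace((unsigned char)*line))
        line++;                     /* skip leading blanks, tabs, newline */
    switch (*line) {
    case 'y': case 'Y': return 1;
    case 'n': case 'N': return 0;
    default:            return -1;  /* empty line or something else */
    }
}
```

You would call it on the array that fgets just filled, e.g. parse_yes_no(line), and prompt again if it returns -1.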
Finally, if you wanted the user to type a string definitely not containing whitespace, if you wanted to treat the input line
hello world!
as the string "hello" followed by something else (which is what
the scanf format %s would have done), well, in that case, I
fibbed a little: it's not quite so easy to reinterpret the line
in that way, after all, so the answer to that part of the question will have
to wait for a bit.
But first I want to go back to three things I skipped over.
(1) We've been calling
fgets(line, 512, stdin);
to read into the array line, where 512 is the size of the
array line, so fgets knows not to overflow it. But to make
sure that 512 is the right number (especially, to check if maybe
someone tweaked the program to change the size), you have to read
back to wherever line was declared. That's a nuisance, so
there are two much better ways to keep the sizes in sync.
You could (a) use the preprocessor to make a name for the size:
#define MAXLINE 512
char line[MAXLINE];
fgets(line, MAXLINE, stdin);
Or (b) use C's sizeof operator:
fgets(line, sizeof(line), stdin);
(2) The second problem is that we haven't been checking for
error. When you're reading input, you should always check for
the possibility of error. If for whatever reason fgets can't
read the line of text you asked it to, it indicates this by
returning a null pointer. So we should have been doing things like
printf("type something:\n");
if(fgets(line, 512, stdin) == NULL) {
    printf("Well, never mind, then.\n");
    exit(1);
}
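A null return from fgets can mean either end-of-file or a genuine read error, and sometimes you care which. Here is a small sketch of one way to tell them apart using ferror. The enum and the function name read_line_checked are hypothetical names I made up for this example.

```c
#include <stdio.h>

/* Hypothetical status codes for this sketch. */
enum read_status { READ_OK, READ_EOF, READ_ERROR };

/* Hypothetical wrapper around fgets: reads one line from fp into buf
   (capacity size) and reports why it failed, if it did. */
enum read_status read_line_checked(char *buf, int size, FILE *fp)
{
    if (fgets(buf, size, fp) != NULL)
        return READ_OK;
    /* fgets returned NULL: ferror() distinguishes an error from plain EOF */
    return ferror(fp) ? READ_ERROR : READ_EOF;
}
```

A typical call would be read_line_checked(line, sizeof(line), stdin), with the caller deciding whether end-of-file is a normal way to finish or a reason to complain.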
(3) Finally, there's the issue that in order to read a line of text,
fgets reads characters and fills them into your array until it
finds the \n character that terminates the line, and it fills
the \n character into your array, too. You can see this if
you modify our earlier example slightly:
printf("you typed: \"%s\"\n", line);
If I run this and type "Steve" when it prompts me, it prints out
you typed: "Steve
"
That " on the second line is because the string it read and
printed back out was actually "Steve\n".
Sometimes that extra newline doesn't matter (like when we called
atoi or atof, since they both ignore any extra non-numeric
input after the number), but sometimes it matters a lot. So
often we'll want to strip that newline off. There are several
ways to do that, which I'll get to in a minute. (I know I've been
saying that a lot. But I will get back to all those things, I promise.)
At this point, you may be thinking: "I thought you said scanf
was no good, and this other way would be so much better.
But fgets is starting to look like a nuisance.
Calling scanf was so easy! Can't I keep using it?"
Sure, you can keep using scanf, if you want. (And for really
simple things, in some ways it is simpler.) But, please, don't
come crying to me when it fails you due to one of its 17 quirks
and foibles, or goes into an infinite loop because of input you
didn't expect, or when you can't figure out how to use it to do
something more complicated. And let's take a look at fgets's
actual nuisances:
You always have to specify the array size. Well, of course, that's not a nuisance at all -- that's a feature, because buffer overflow is a Really Bad Thing.
You have to check the return value. Actually, that's a wash, because to use scanf correctly, you have to check its return value, too.
You have to strip the \n back off. This is, I admit, a true nuisance. I wish there were a Standard function I could point you to that didn't have this little problem. (Please nobody bring up gets.) But compared to scanf's 17 different nuisances, I'll take this one nuisance of fgets any day.
So how do you strip that newline? Three ways:
(a) Obvious way:
char *p = strchr(line, '\n');
if(p != NULL) *p = '\0';
(b) Tricky & compact way:
strtok(line, "\n");
Unfortunately this one doesn't always work: if the line is empty (just "\n"), strtok finds no token and leaves the newline in place.
(c) Another compact and mildly obscure way:
line[strcspn(line, "\n")] = '\0';
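If you find yourself stripping newlines a lot, you might wrap method (c) in a tiny function. This is just a sketch, and strip_newline is a name I made up. The point of choosing (c) is that it behaves the same for a normal line, for a line with no newline at all, and for an empty line consisting only of "\n".

```c
#include <string.h>

/* Hypothetical helper wrapping method (c).
   strcspn returns the index of the first '\n', or the length of the
   string if there is none, so the assignment is always in bounds. */
void strip_newline(char *line)
{
    line[strcspn(line, "\n")] = '\0';
}
```

After fgets succeeds, calling strip_newline(line) leaves you with just the text the user typed.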
And now that that's out of the way, we can get back to another
thing I skipped over: the imperfections of atoi() and atof().
The problem with those is they don't give you any useful
indication of success or failure: they quietly ignore
trailing nonnumeric input, and they quietly return 0 if there's
no numeric input at all. The preferred alternatives -- which
also have certain other advantages -- are strtol and strtod.
strtol also lets you use a base other than 10, meaning you can
get the effect of (among other things) %o or %x with scanf.
But showing how to use these functions correctly is a story in itself,
and would be too much of a distraction from what is already turning
into a pretty fragmented narrative, so I'm not going to say
anything more about them now.
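Still, for the impatient, here is one minimal sketch of a strtol call with the usual checks. It is not the whole story: parse_long is a hypothetical helper, and the policy it implements (whole string must be a number, trailing whitespace allowed) is just one assumption about what "valid" means.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: converts the whole of s to a long in base 10.
   Returns 1 on success (value stored in *out), 0 on any failure. */
int parse_long(const char *s, long *out)
{
    char *end;
    long v;

    errno = 0;                      /* so we can detect ERANGE afterwards */
    v = strtol(s, &end, 10);
    if (end == s)
        return 0;                   /* no digits at all */
    if (errno == ERANGE)
        return 0;                   /* value does not fit in a long */
    while (isspace((unsigned char)*end))
        end++;                      /* tolerate trailing whitespace, e.g. '\n' */
    if (*end != '\0')
        return 0;                   /* trailing non-numeric characters */
    *out = v;
    return 1;
}
```

Unlike atoi, this tells you whether the conversion worked, so "12abc" and an empty line are both rejected instead of quietly turning into 12 and 0.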
The rest of the main narrative concerns input you might be trying
to parse that's more complicated than just a single number or
character. What if you want to read a line containing two
numbers, or multiple whitespace-separated words, or specific
framing punctuation? That's where things get interesting, and
where things were probably getting complicated if you were trying
to do things using scanf, and where there are vastly more
options now that you've cleanly read one line of text using fgets,
although the full story on all those options could probably fill
a book, so we're only going to be able to scratch the surface here.
My favorite technique is to break the line up into
whitespace-separated "words", then do something further with each
"word". One principal Standard function for doing this is
strtok (which also has its issues, and which also rates a whole
separate discussion). My own preference is a dedicated function
for constructing an array of pointers to each broken-apart
"word", a function I describe in these course notes.
At any rate, once you've got "words", you can further process
each one, perhaps with the same atoi/atof/strtol/strtod
functions we've already looked at.
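To make the "words" idea concrete, here is a rough sketch built on strtok. To be clear, this is not the function from the course notes; split_words is a hypothetical name, and it inherits strtok's issues (it modifies the line in place and is not reentrant).

```c
#include <string.h>

/* Hypothetical helper: splits line in place on spaces, tabs and newlines.
   Stores pointers to at most max words in words[] and returns how many
   were stored. Any words beyond max are silently ignored. */
int split_words(char *line, char *words[], int max)
{
    int n = 0;
    char *p = strtok(line, " \t\n");

    while (p != NULL && n < max) {
        words[n++] = p;             /* points into the original line */
        p = strtok(NULL, " \t\n");
    }
    return n;
}
```

Given the line "hello world!\n", this yields the two words "hello" and "world!", which is roughly what two %s conversions in scanf would have given you.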
Paradoxically, even though we've been spending a fair amount of
time and effort here figuring out how to move away from scanf,
another fine way to deal with the line of text we just read with
fgets is to pass it to sscanf. In this way, you end up with
most of the advantages of scanf, but without most of the
disadvantages.
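Here is a small sketch of the fgets-then-sscanf combination for a line that should contain exactly two integers. The helper name parse_two_ints is made up, and the use of %n to detect trailing junk is one common idiom, not the only way to do it.

```c
#include <stdio.h>

/* Hypothetical helper: expects exactly two integers on the line,
   optionally followed by whitespace. Returns 1 on success, 0 otherwise. */
int parse_two_ints(const char *line, int *a, int *b)
{
    int used = 0;

    /* " %n" skips trailing whitespace and records how many characters
       were consumed; %n does not count toward sscanf's return value. */
    if (sscanf(line, "%d %d %n", a, b, &used) != 2)
        return 0;
    return line[used] == '\0';      /* reject anything left over */
}
```

Because the whole line has already been read, a failed conversion leaves nothing stuck in the input stream; you can just print a message and call fgets again.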
If your input syntax is particularly complicated, it might be appropriate to use a "regexp" library to parse it.
Finally, you can use whatever ad hoc parsing solutions suit
you. You can move through the line a character at a time with a
char * pointer checking for characters you expect. Or you can
search for specific characters using functions like strchr or strrchr,
or strspn or strcspn, or strpbrk. Or you can parse/convert
and skip over groups of digit characters using the strtol or
strtod functions that we skipped over earlier.
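As one example of the ad hoc style, here is a sketch that walks along a line with a pointer, using strspn to skip separators and strtol to convert and skip over each group of digits. The function name parse_int_list and the choice of separators are assumptions for illustration, and range checking (ERANGE) is deliberately left out to keep it short.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parses integers separated by commas and/or
   whitespace, e.g. "1, 2,3\n". Stores up to max values in out[].
   Returns the number stored, or -1 if it meets something that is
   neither a separator nor a number. Stops quietly once out[] is full. */
int parse_int_list(const char *s, long out[], int max)
{
    int n = 0;

    for (;;) {
        char *end;
        long v;

        s += strspn(s, ", \t\n");   /* skip over separators */
        if (*s == '\0' || n == max)
            return n;
        v = strtol(s, &end, 10);
        if (end == s)
            return -1;              /* no number at this position */
        out[n++] = v;
        s = end;                    /* continue after the digits */
    }
}
```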
There's obviously much more that could be said, but hopefully this introduction will get you started.
Let's state the requirements of parsing as:
valid input must be accepted (and converted into some other form)
invalid input must be rejected
when any input is rejected, it is necessary to provide the user with a descriptive message that explains (in clear "easily understood by normal people who are not programmers" language) why it was rejected (so that people can figure out how to fix the problem)
To keep things very simple, let's consider parsing a single simple decimal integer (that was typed in by the user) and nothing else. Possible reasons for the user's input to be rejected are:
Let's also define "input contained unacceptable characters" properly; and say that:
From this we can determine that the following error messages are needed:
From this point we can see that a suitable function to convert a string into an integer would need to distinguish between very different types of errors; and that something like "scanf()" or "atoi()" or "strtoll()" is completely and utterly worthless because they fail to give you any indication of what was wrong with the input (and use a completely irrelevant and inappropriate definition of what is/isn't "valid input").
Instead, let's start writing something that isn't useless:
#include <stdio.h>
#include <stdlib.h>

/* Returns NULL on success, or a message describing why the input was rejected. */
char *convertStringToInteger(int *outValue, char *string, int minValue, int maxValue) {
    return "Code not implemented yet!";
}

int main(int argc, char *argv[]) {
    char *errorString;
    int value;

    if(argc < 2) {
        printf("ERROR: No command line argument.\n");
        return EXIT_FAILURE;
    }
    errorString = convertStringToInteger(&value, argv[1], -10, 2000);
    if(errorString != NULL) {
        printf("ERROR: %s\n", errorString);
        return EXIT_FAILURE;
    }
    printf("SUCCESS: Your number is %d\n", value);
    return EXIT_SUCCESS;
}
To meet the stated requirements, this convertStringToInteger() function is likely to end up being several hundred lines of code all by itself.
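To give a feel for the shape such a function takes, here is a deliberately tiny sketch that fills in the stub. It falls far short of the stated requirements: it only accepts an optional '-' followed by decimal digits, and the error strings are placeholders I invented, not the carefully worded messages the requirements call for.

```c
#include <stddef.h>

/* Sketch only. Returns NULL on success, or a placeholder message
   describing (roughly) why the input was rejected. */
char *convertStringToInteger(int *outValue, char *string, int minValue, int maxValue)
{
    int negative = 0;
    long long value = 0;

    if (*string == '-') {
        negative = 1;
        string++;
    }
    if (*string == '\0')
        return "No digits were entered.";
    for (; *string != '\0'; string++) {
        if (*string < '0' || *string > '9')
            return "Only the digits 0 to 9 are allowed (after an optional minus sign).";
        value = value * 10 + (*string - '0');
        if (value > 10000000000LL)  /* far beyond any int; stops overflow */
            return "The number has too many digits.";
    }
    if (negative)
        value = -value;
    if (value < minValue)
        return "The number is too small.";
    if (value > maxValue)
        return "The number is too large.";
    *outValue = (int)value;
    return NULL;
}
```

Even this toy version needs a separate check and message for each way the input can go wrong, which is where the line count comes from once you handle whitespace, plus signs, digit separators and so on.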
Now, this was just "parsing a single simple decimal integer". Imagine if you wanted to parse something complex; like a list of "name, street address, phone number, email address" structures; or maybe like a programming language. For these cases you might need to write thousands of lines of code to create a parser that isn't a crippled joke.
In other words...
What can I use to parse input instead of scanf?
Write (potentially thousands of lines of) code yourself, to suit your requirements.