
Chapter 11 - Getting input into a string

When you need information from the user of your program, the usual way to do it is to read that information into a string.

As you might expect, C's standard library provides a very good function for the purpose. Unfortunately, at this stage of the tutorial, using it would involve a little magic (that is, relying on something too complicated to explain just yet), and we'd rather avoid that if we can. So we'll write our own function, which has the added benefit of forcing us to think about how to use strings safely and effectively.

Our input function must be able to capture input from the standard input device and place it into our string. To do that, it has to know where the string starts, so we'll pass it a pointer to the first byte of the string. It also needs to know how much data it can safely write. We could, of course, write a null terminator at the end of our char array and have the input function stop when it reaches the null character. If we did that, though, we would first have to fill the rest of the array with non-null characters; otherwise the input function would stop immediately, without reading anything at all, which would be very inconvenient. There is a simpler way: tell the function the maximum number of bytes (including the null terminator) that we're prepared to accept.

We also want the input function to tell us whether it succeeded in reading input into our string. We'll do that via a return value. We have a couple of sensible choices here, but perhaps the best at this stage is 0 to mean that everything worked, or EOF if anything went wrong. EOF, which stands for End Of File, is not a character. It is a macro defined in <stdio.h> that expands to a negative int value, different from every possible character, and functions such as getchar return it to indicate that no more data is available (or that a read error occurred).
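To see EOF in action before we write input_str, here is a minimal sketch. The function name count_chars is hypothetical, and it reads from a FILE * stream using the standard getc function rather than calling getchar; that is an assumption made purely so the function can be pointed at any stream (getchar() behaves like getc(stdin)). Note that ch is an int, for the reason given above.

```c
#include <stdio.h>

/* Hypothetical helper: counts the characters available on fp.
   ch is an int, not a char, so that it can hold every possible
   character value as well as the distinct value EOF. */
long count_chars(FILE *fp)
{
  long count = 0;
  int ch = getc(fp);  /* first character, or EOF */

  while(ch != EOF)
  {
    ++count;
    ch = getc(fp);    /* next character, or EOF */
  }
  return count;
}
```

Called as count_chars(stdin), it would count everything typed until the end of input, which you can usually signal with Ctrl-D on Unix-like systems or Ctrl-Z followed by Enter on Windows.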

And so we're ready to write our function, input_str, which will copy data from the standard input device into a string until it runs out of room, encounters a problem, or reads a newline character.

input_str() - a first attempt

#include <stdio.h>

int input_str(char *s, int maxroom)
{
  int rc = 0; /* we set this to EOF if there's a problem */
  int ch;     /* temporary storage for each character */
  int curpos = 0; /* where to write the next character */
  int finished = 0;  /* set to 1 when done */

  --maxroom; /* reserve a place for the null terminator */

  while(finished == 0) /* not finished */
  {
    if(curpos < maxroom)
    {
      ch = getchar();
      if(ch != EOF) /* getchar gives EOF at end of input or on error */
      {
        s[curpos] = ch; /* place the character into
                           the string */
        ++curpos; /* move the current position marker */
      }
      else
      {
        finished = 1; /* no more input - can't go further */
        rc = EOF; /* prepare to return EOF */
      }
      if(ch == '\n')
      {
        finished = 1; /* user pressed ENTER */
      }
    }
    else
    {
      finished = 1; /* we ran out of space */
    }
  }
  s[curpos] = 0; /* null terminate the string */

  return rc;
}

This is a rather lengthy function, considering its limited purpose, but it has the merit of being clear: at every stage, we can see exactly what is happening.

Nevertheless, it's time to start introducing some idiomatic shortcuts. C is good at these shortcuts, and C programmers use them frequently, so we should get to know them.

Firstly, let's see if we can reduce the number of places where we set finished. We set it to 1 in three different places. Why not set it to 1 just once, at the start of each pass through the loop, and then set it back to 0 in the one single circumstance when we don't need to finish?

Also, while(finished == 0) is a little wordy. We could just say while(!finished), which we can read aloud as "while not finished", and this is actually clearer than the original. (The ! operator yields 1 if its operand is 0, and 0 otherwise, so the two tests are equivalent.) So let's make those two improvements:

input_str() - second try

#include <stdio.h>

int input_str(char *s, int maxroom)
{
  int rc = 0; /* we set this to EOF if there's a problem */
  int ch;     /* temporary storage for each character */
  int curpos = 0; /* where to write the next character */
  int finished = 0;  /* set to 1 when done */

  --maxroom; /* reserve a place for the null terminator */

  while(!finished)
  {
    finished = 1;
    if(curpos < maxroom)
    {
      ch = getchar();
      if(ch != EOF) /* getchar gives EOF at end of input or on error */
      {
        s[curpos] = ch; /* place the character into
                           the string */
        ++curpos; /* move the current position marker */
        if(ch != '\n')
        {
          finished = 0; /* room for more */
        }
      }
      else
      {
        rc = EOF; /* prepare to return EOF */
      }
    }
  }
  s[curpos] = 0; /* null terminate the string */

  return rc;
}

This is a little better, but we can still improve it. For example, recall that curpos++ yields the value curpos had before it was incremented (increased by one), so we can use that to combine two lines into one:

  s[curpos++] = ch;

But could we somehow work that into the loop condition? If we could, it would dramatically shorten the code:

input_str() - third try

#include <stdio.h>

int input_str(char *s, int maxroom)
{
  int rc = 0; /* we set this to EOF if there's a problem */
  int ch = 0; /* last character read; 0 in case getchar is never called */
  int curpos = 0; /* where to write the next character */

  --maxroom; /* reserve a place for the null terminator */

  while(curpos < maxroom &&
        (ch = getchar()) != EOF &&
        (s[curpos++] = ch) != '\n')
  {
  }

  s[curpos] = 0; /* null terminate the string */

  if(ch == EOF)
  {
    rc = EOF;
  }
  return rc;
}

The body of the loop has been reduced so drastically that it no longer contains any statements at all! This is such a radical change that we ought to assure ourselves it's correct.

To begin that process, let's check that we never write outside the memory we've been told we can use. We've been given a limit of maxroom bytes. We deliberately chopped one off maxroom so that we wouldn't have to worry about leaving space for the null terminator. (Remember, C passes arguments by value, so changing maxroom in our function has no effect on the calling function.)

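After that adjustment, we may write characters into any element of the array that s points to, up to a maximum index of maxroom-1. The first test in our loop condition ensures that curpos is less than maxroom, so writing to s[curpos] is safe. After the loop, curpos is at most maxroom, so the null terminator lands at most at s[maxroom], which is the last byte the caller gave us.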

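The second test is never evaluated unless curpos is lower than maxroom. The && operator introduces a sequence point after its left-hand operand, and it evaluates its right-hand operand only if the left-hand operand is true (non-zero). In particular, once the array is full, getchar is not called at all, so no character is read and then thrown away.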
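If you'd like to see short-circuiting for yourself, here is a small demonstration. The names note_call and operands_evaluated are hypothetical, invented for this sketch: note_call counts how many times it runs, so we can tell whether the right-hand operand of && was evaluated.

```c
/* Hypothetical demonstration of how && short-circuits. */

static int calls = 0; /* how many times note_call has run */

/* Records that it was evaluated, then returns value unchanged
   (non-zero means "true", zero means "false"). */
static int note_call(int value)
{
  ++calls;
  return value;
}

/* Evaluates note_call(left) && note_call(right) and returns how
   many of the two operands were actually evaluated. */
int operands_evaluated(int left, int right)
{
  calls = 0;
  (void)(note_call(left) && note_call(right));
  return calls;
}
```

operands_evaluated(0, 1) returns 1: once the left operand is false, the whole expression must be false, so the right operand is skipped. operands_evaluated(1, 0) returns 2, because the right operand has to be checked.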

So we can move on. The next test is:

(ch = getchar()) != EOF

This one is rather more challenging. Let's start by pointing out that the = sign is assignment, not comparison. Comparison has a relatively high precedence; it is certainly higher than assignment. Since we want the value returned by getchar to be assigned to ch before the comparison takes place, we must use parentheses to force the assignment to happen first. Without them, ch = getchar() != EOF would assign to ch the result of the comparison, which is either 0 or 1.

The assignment operator has the curious property of yielding a value, namely the value being assigned. So the expression ch = getchar() has a value: the value assigned to ch. It is this value that is then compared to EOF. (This is also why ch is an int rather than a char: getchar returns an int so that EOF can be told apart from every possible character, and storing the result in a char could lose that distinction.) If the two values are equal, the != test fails and the loop stops. Otherwise, we go on to the next test, which is:

(s[curpos++] = ch) != '\n'

Let's break this down a little. Firstly, as before, we use parentheses to force the assignment to happen before the comparison; without them, the comparison would be performed first.

The value being assigned is obvious: it's ch. The object being updated is less obvious. If it were just s[curpos], that would work for one character, but next time round the loop we'd write the next character into the same position, which isn't what we want. So we post-increment curpos: the expression curpos++ adds 1 to curpos, but yields the value curpos had before the 1 was added. The character is therefore written into the correct place in the string, and curpos is correctly adjusted for the next time round the loop. Notice, too, that the newline itself is stored in the string before the comparison stops the loop.

It's a very busy loop, but with the saving grace that we don't have to worry about what's going on in between the curly braces, because nothing is going on.
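To watch both idioms (the assignment that yields a value, and the post-increment) at work without typing anything at the keyboard, here is a sketch of a hypothetical copy_line function. It uses the same loop condition as input_str, but takes its characters from an existing string, stopping at that string's null terminator, instead of calling getchar.

```c
/* Hypothetical helper: copies characters from src into dest in the
   same way input_str copies from standard input. It stops after
   storing a newline, at the end of src, or when only the slot for
   the null terminator is left. Assumes maxroom is at least 1.
   Returns the number of characters stored (not counting the 0). */
int copy_line(char *dest, const char *src, int maxroom)
{
  int curpos = 0; /* where to write the next character */
  int i = 0;      /* where to read the next character */
  int ch = 0;

  --maxroom; /* reserve a place for the null terminator */

  while(curpos < maxroom &&
        (ch = src[i++]) != 0 &&           /* value of the assignment */
        (dest[curpos++] = ch) != '\n')    /* store, then advance */
  {
  }

  dest[curpos] = 0; /* null terminate the copy */
  return curpos;
}
```

For example, with char buf[8], copy_line(buf, "Tom\nJenny\n", 8) stores "Tom\n" (newline included) in buf and returns 4.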

Now let's use this function to capture some strings and display them:

#include <stdio.h>

int input_str(char *s, int maxroom)
{
  int rc = 0; /* we set this to EOF if there's a problem */
  int ch = 0; /* last character read; 0 in case getchar is never called */
  int curpos = 0; /* where to write the next character */

  --maxroom; /* reserve a place for the null terminator */

  while(curpos < maxroom &&
        (ch = getchar()) != EOF &&
        (s[curpos++] = ch) != '\n')
  {
  }

  s[curpos] = 0; /* null terminate the string */

  if(ch == EOF)
  {
    rc = EOF;
  }
  return rc;
}

int main(void)
{
  char input[8];
  puts("Please type your name:");
  input_str(input, 8); /* line A */
  printf("You answered: [%s]\n", input); /* line B */

                /* space C */

  return 0;
}

The first time you run this program, just type a short name (Tom, or Jenny, or something like that), to assure yourself that the program works.

The second time, try typing a much longer name, such as Bartholomew. What does the program display? Why?

Copy lines A and B into the space marked C, so that the input is captured twice. Then re-compile the program, and run it again. Use 'Bartholomew' as the input again. What does the program display this time? And why?
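Typing the same inputs over and over can get tedious. One way to experiment, assuming you're happy to read from a stream other than standard input, is to make the function take a FILE * and feed it a temporary file created with the standard tmpfile function. The name input_str_from and its extra fp parameter are our own invention for this sketch; apart from calling getc(fp) instead of getchar(), the logic is the same as input_str.

```c
#include <stdio.h>

/* Hypothetical variant of input_str that reads from any stream.
   Same contract: writes at most maxroom bytes (including the null
   terminator) into s, returns 0 on success or EOF if getc reported
   EOF (end of input or a read error). */
int input_str_from(FILE *fp, char *s, int maxroom)
{
  int rc = 0;
  int ch = 0;     /* 0 in case the loop never calls getc */
  int curpos = 0; /* where to write the next character */

  --maxroom; /* reserve a place for the null terminator */

  while(curpos < maxroom &&
        (ch = getc(fp)) != EOF &&
        (s[curpos++] = ch) != '\n')
  {
  }

  s[curpos] = 0; /* null terminate the string */

  if(ch == EOF)
  {
    rc = EOF;
  }
  return rc;
}
```

Passing stdin as fp makes input_str_from(stdin, s, maxroom) behave like input_str(s, maxroom), so you can use this version in a real program too.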

Searching through a string

The newline character that input_str (normally) writes into the string is useful, because it tells us that the user's input all fit into our array. Nevertheless, it is a bit of a nuisance, and it would be good if we could get rid of it. To do that, however, we need to know where it is. Here, then, is a function that searches through a string, looking for a particular character. If the function finds the character, it returns a pointer to that character. Otherwise, it returns a special pointer value, NULL, for which we can test. (NULL is a null pointer constant defined in <stdio.h>, <string.h>, and several other standard headers.)

#include <stdio.h>

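char *find_character(char *s, int ch)
{
  char *found = NULL; /* assume we won't find it */
  while(*s != 0 && *s != ch) /* stop at the terminator or a match */
  {
    ++s;
  }
  if(*s == ch) /* did we stop because we found it? */
  {
    found = s;
  }
  return found;
}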

This function doesn't need to know how long the string is, because it can simply stop looking if it hits the null terminator. So we only need to tell it the place, s, where the string starts, and the character ch that we're searching for. The loop examines the string one character at a time, and stops if it runs out of string or if it finds the character.

We can use this function to remove a newline character, like this:

#include <stdio.h>

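char *find_character(char *s, int ch)
{
  char *found = NULL; /* assume we won't find it */
  while(*s != 0 && *s != ch) /* stop at the terminator or a match */
  {
    ++s;
  }
  if(*s == ch) /* did we stop because we found it? */
  {
    found = s;
  }
  return found;
}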

int main(void)
{
  char *newline;

  char data[] = "1234567890\nMore data\n";
  printf("Before: [%s]\n", data);

  newline = find_character(data, '\n');
  if(newline != NULL)
  {
    *newline = 0; /* replace the first newline with 0 */
    printf("After: [%s]\n", data);
  }
  return 0;
}

Compile and run the program. What is the output? What happened to 'More data'?
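Because find_character returns a pointer into the string, subtracting the string's starting address from that pointer gives the position (index) of the character found. As a small sketch, here is a hypothetical index_of helper built on find_character (repeated here, returning char *, so that the example stands alone). Returning -1 for "not found" is our own convention, not something taken from the standard library.

```c
#include <stdio.h> /* for NULL */

char *find_character(char *s, int ch)
{
  char *found = NULL; /* assume we won't find it */
  while(*s != 0 && *s != ch) /* stop at the terminator or a match */
  {
    ++s;
  }
  if(*s == ch) /* did we stop because we found it? */
  {
    found = s;
  }
  return found;
}

/* Hypothetical helper: index of the first ch in s, or -1 if absent. */
int index_of(char *s, int ch)
{
  char *p = find_character(s, ch);
  if(p == NULL)
  {
    return -1;
  }
  return (int)(p - s); /* how many elements p is past the start */
}
```

With char word[] = "hello";, index_of(word, 'l') returns 2 and index_of(word, 'z') returns -1.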

The strchr function

As it happens, there is already a standard C function that does the same job as our find_character function. So why did we write it? Mostly to show you that there's nothing magical about standard C functions. Very often, we can write them ourselves in a few short lines of C. Nevertheless, they are handy things to have around. The standard version is called strchr. Like find_character, it takes a string and a character value, and returns either a pointer to the first matching character or NULL. It is the first standard C function we've met that isn't declared in <stdio.h>; instead, it is declared in a header called <string.h>.

#include <stdio.h>
#include <string.h> /* for strchr */

int main(void)
{
  char *newline;

  char data[] = "1234567890\nMore data\n";
  printf("Before: [%s]\n", data);

  newline = strchr(data, '\n');
  if(newline != NULL)
  {
    *newline = 0; /* replace the first newline with 0 */
    printf("After: [%s]\n", data);
  }
  return 0;
}

In this version of the program, we added <string.h> to the list of header inclusions, we removed the find_character function completely, and we replaced the call to it with a call to strchr. No other changes were necessary.
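Going back to the nuisance newline that input_str leaves in the string, strchr makes it easy to remove. The following is a minimal sketch of a hypothetical strip_newline helper. Its return value reports whether a newline was present, which, as discussed earlier, tells us whether the user's whole line fit into the array.

```c
#include <string.h> /* for strchr and NULL */

/* Hypothetical helper: removes the first newline from s, if any.
   Returns 1 if a newline was found and removed, 0 otherwise.
   After input_str, a return of 0 means the line did not fit in the
   array, or input ended before a newline arrived. */
int strip_newline(char *s)
{
  char *newline = strchr(s, '\n');
  if(newline != NULL)
  {
    *newline = 0; /* end the string where the newline was */
    return 1;
  }
  return 0;
}
```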

Armed with strchr, we can write a useful function that is not in the standard library, but which is still a very handy thing to have around:

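#include <string.h> /* for strchr and NULL */

/* Replaces the first occurrence of old in s with new.
   Returns 0 if a replacement was made, 1 if old was not found. */
int replace_character(char *s, int old, int new)
{
  char *t = strchr(s, old);
  int rc = (t == NULL); /* 1 if not found, 0 if found */
  if(t != NULL)
  {
    *t = new;
  }
  return rc;
}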

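To use this function, call it with the string you're concerned with, the character you want to replace, and the character you want to use instead. It replaces only the first occurrence of the character, and returns 0 if it made a replacement or 1 if the character wasn't found.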
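Because replace_character changes only the first match, changing every occurrence needs a loop. One approach, shown here as a sketch with the hypothetical name replace_all, is to keep calling strchr, restarting just past the character we last replaced.

```c
#include <string.h> /* for strchr and NULL */

/* Hypothetical helper: replaces every occurrence of old in s with
   new_ch. Returns the number of characters replaced.
   Assumes old is not the null character (replacing the terminator
   would leave s without one). */
int replace_all(char *s, int old, int new_ch)
{
  int count = 0;
  char *t = strchr(s, old);

  while(t != NULL)
  {
    *t = new_ch;
    ++count;
    t = strchr(t + 1, old); /* carry on searching after this one */
  }
  return count;
}
```

Searching from t + 1 rather than from t means the loop still ends even if old and new_ch happen to be the same character.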

Things to do

Write a program that uses input_str to fetch a string from the user. Then replace each newline '\n' character with an 'X' character, and every '4' character with a 'Z' character.

Test your program to ensure that it works properly.

Summary

In this chapter, you learned how to read information from the standard input device into a string. You also learned about the NULL value and the strchr function.

In the next chapter, we will discover some more of C's standard library functions for manipulating strings.

Progress

Terminology
  • NULL
  • EOF
  • signed integer type
  • unsigned integer type
  • byte
  • bit
  • variadic function
  • conversion specifier
  • precedence
  • array
  • index
  • pointer
Syntax
  • comments
  • types
  • operators
    • increment and decrement operators
      • ++n n++ --n n--
    • assignment operators
      • = += -= *= /= %=
    • additive operators
      • the + operator
      • the - operator
    • multiplicative operators
      • the * operator
      • the / operator
      • the % operator
    • equality and relational operators
      • == != < > <= >=
    • logical operators
      • the && operator
      • the || operator
      • the ! operator
    • address and indirection operators
      • the unary * operator
      • the unary & operator
      • the array subscripting operator []
    • miscellaneous operators
      • the conditional operator ? :
      • the comma operator ,
      • the sizeof operator
  • control structures
    • if/else
    • while
    • do/while
    • for
Standard library functions, by header