A Quick Tutorial on C Programming using Berkeley UNIX

Peter C.J. Graham
Dept. of Computer Science
University of Manitoba
Winnipeg, MB., Canada R3T 2N2

13 January 1992
Version 1.2

1 Introduction

This document hopes to fill a void between textbooks on C programming and the existing “quick reference” documents which give little more than information on C syntax. It should provide sufficient information to allow someone who is already familiar with a high-level, imperative programming language [1] to quickly be able to program effectively in C. It does not attempt to point out all of the “features” of C nor does it attempt to provide extensive examples of C code. For this reason, it may be necessary for some people to consult a conventional text in certain cases.

C is an extremely terse language syntactically and to some extent semantically as well. This makes the programmer’s life easier since a program may be expressed more concisely, but complicates things when code written by others must be examined and/or modified. The key to being an effective C programmer is to learn to recognize and accept its limitations. Certain guidelines or “rules of thumb” will become apparent as you read this document and if you follow them you will find that C is a useful and enjoyable language to program in. If you ignore them, you will be bitten again and again by the quirks of the language.

The very last section of this document contains some sample programs that have been collected to illustrate concepts covered herein. You are strongly encouraged to examine them after reading the text of the document. They are all examples of code that was written to solve some real problem; they were generally not contrived for illustrative purposes. [2]

[1] such as Pascal, Modula, Turing, PL/1, etc.
[2] Due to the lack of ongoing examples it seemed important to include some “real” code.

2 Basic Data Types and Operators

C’s builtin data types are not much different from those of other, similar languages. It has data types for representing integer, real, and character values as you would expect and provides all the normal operators for calculating with them. It also provides some unusual operators which are new and in many cases very useful.

One of the biggest problems new C programmers experience is dealing with its “soft” typing (e.g. integers, characters, and pointers may be freely intermixed). It is thus important to either understand precisely what C does when types are mixed or to restrict your programming in C to type-safe operations. There is a C code checking program which attempts to warn you of type mismatches (among other things). It is known as ‘lint’ and is well worth using. See ‘man lint’ for more information.

The basic builtin types in C are the following:

    int         Your basic integer data type. Add ’em, subtract ’em, multiply ’em together. Just what you would expect of an integer. On most current machines these are 32 bit integers.
    short       A (possibly) shorter integer provided to allow you to economize on space. Commonly this is a 16 bit integer although in some cases it may be as big as an ‘int’.
    long        A (possibly) longer integer. Normally a 64 bit integer if ‘int’ is 32 bits.
    unsigned    Gives you an unsigned ‘int’ if used alone as a data type. May also be used as a sort of type modifier in front of ‘int’, ‘short’, or ‘long’ giving an unsigned integer of the required length (e.g. ‘unsigned long’).
    float       This is your basic real data type. Size is probably 32 bits but exact details are implementation dependent.
    double      As the name implies, a “double” size real. That probably means 64 bits, but if the size is really important check the compiler documentation.
    char        A single byte character. On most reasonable machines the characters will be represented using ASCII. The ‘char’ type is commonly used to simply store a byte of data regardless of its type. Remember, C is softly typed.

A declaration in C has the type preceding the variables being declared, which is the opposite of what is done in Pascal. Thus, a declaration might be:

    int i,j,fred;    /* three integer variables named 'i', 'j', and 'fred' */

rather than the equivalent Pascal syntax of:

    VAR i,j,fred:INTEGER;

Notice the use of ‘/*’ and ‘*/’ to delimit comments. Comments may span multiple lines but may not be nested. Notice also that C, unlike Pascal, uses semicolons as terminators not as separators. Thus there is no confusion as to whether to put in a semicolon or not, and an extra semicolon is rarely an error; it is normally treated as an empty statement. Finally, always remember that C is case-sensitive. Thus ‘int’ and ‘INT’ are different.

The operators in C are pretty much a superset of those in other imperative languages. The basic math operators are what you would expect.

    +    Addition of integer or real values.
    -    Subtraction of integer or real values.
    *    Multiplication of integer or real values.
    /    Division of integer or real values.

The remainder on integer division may be obtained using the ‘%’ operator.

One fundamental difference between C and, say, Pascal is that in C, assignment (‘=’) is considered to be an operator. This means that it may occur in the middle of complex expressions. This is convenient in some cases but should probably be used sparingly for the sake of readability. C provides many different assignment operators. It offers a shorthand (‘<op>=’) for assignments of the form:

    <vble> = <vble> <builtin operator> <expression>

Thus, for example, adding two to a variable may be done using the syntax:

    <vble> += 2;

In order to have things work correctly the precedence of the assignment operators is lower than the precedence of the other operators. The relative precedence of the other operators is what you would expect.

The syntax of the comparison operators (as used in expressions and in the control structures discussed in the next section) is as follows:

    ==    Comparison for equality. (Don’t use ‘=’. Since it is an operator, no syntax error will be produced, but your code will not run correctly.)
    !=    Comparison for inequality.
    <     Comparison for less than.
    >     Comparison for greater than.
    <=    Comparison for less than or equal.
    >=    Comparison for greater than or equal.

There is no ‘logical’ or ‘boolean’ type in C. Instead, comparison operators return either zero for false or one for true. C also has the normal logical connectives with the following syntax:

    !     Logical negation
    &&    Logical AND
    ||    Logical OR

The logical operators work with zeroes and ones for true and false in the expected way; any non-zero operand is treated as true.

Unique to C are the bitwise operations and the shift operations, both of which treat their arguments as bit strings. (e.g. a ‘short’ with the value ‘7’ would be treated as a (presumably 16 bit) bit string with the value ‘0000 0000 0000 0111’.) This class of operators includes the following:

    <<    Shift left by the specified amount.
    >>    Shift right by the specified amount.
    &     Bitwise AND operation. This operator ANDs the corresponding bit positions in its arguments together. (e.g. given a = 3; b = 6; a&b would give you the result ‘000..0010’ since the bit pattern for 3 is ‘000..0011’ and the pattern for 6 is ‘000..0110’.)
    |     Bitwise OR operation.
    ^     Bitwise EXOR operation.
    ~     One’s complement.

You should be careful not to confuse the bitwise AND and OR operations (& and |) with the logical AND and OR operators (&& and ||).

C also provides increment and decrement operators (‘++’ and ‘--’) which may either follow or precede the variable they are applied to. If they precede the variable, the increment or decrement will be performed before the variable is used in the expression. If they follow it, the increment or decrement is done after the variable is used. Hence, these are referred to respectively as “pre-” and “post-” increments and decrements. You will see examples of the use of these operators later in this document.
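The operators described in this section can be tried out with a few one-line functions. The following is only a sketch: the function names are invented for illustration and do not come from this document, and ANSI-style function headers (covered in the Subprograms section) are assumed.

```c
/* Helper names below are illustrative only; they are not from the tutorial. */

/* Integer division discards the remainder; '%' recovers it. */
int quotient(int a, int b)     { return a / b; }
int remainder_of(int a, int b) { return a % b; }

/* Bitwise AND works on each bit position: 3 & 6 is 2 (011 & 110 = 010). */
int bitwise_and(int a, int b) { return a & b; }

/* Logical AND only asks whether both operands are non-zero: yields 1 or 0. */
int logical_and(int a, int b) { return a && b; }

/* Shifts move the bit pattern left or right by 'n' positions. */
int shift_left(int a, int n)  { return a << n; }
int shift_right(int a, int n) { return a >> n; }

/* Pre-increment: 'i' is incremented first, so the new value is returned. */
int pre_value(int i)  { return ++i; }

/* Post-increment: the old value of 'i' is used, then 'i' is incremented. */
int post_value(int i) { return i++; }

/* 'x += 2' is shorthand for 'x = x + 2'. */
int add_two(int x) { x += 2; return x; }
```

Calling ‘bitwise_and(3,6)’ and ‘logical_and(3,6)’ side by side shows why the two kinds of AND should not be confused: the first yields 2, the second yields 1.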

3 Control Structures

C has the normal ifs and whiles and it has a very unusual for statement. An if statement has the syntax:

    if ( <expression> ) <statement>

or:

    if ( <expression> ) <statement> else <statement>

As in Pascal, a single statement is expected within control structures but multiple statements may be provided if they are combined into a “compound” statement. In C this is done using braces (‘{’ and ‘}’). They are used just as you would use BEGIN and END in Pascal. Every set of braces defines a “block” in C and every block may contain local declarations. This is a feature whose value is hotly contested. On the one hand people suggest that localizing declarations makes a program easier to maintain but on the other it is argued that such definitions are easily missed when scanning a program. The choice of whether or not to use them is up to you!

The while statement has two variants depending on whether you want the test performed at the beginning or end of the loop. The syntax:

    while ( <expression> ) <statement>

gives a loop which has its continuation test performed before the loop body is executed. The syntax:

    do <statement> while ( <expression> );

gives a loop with its continuation test performed after the loop body is executed. It is important to realize that this is a continuation test and not a termination test. This is sometimes referred to as a do statement.

C’s for statement is very general and very powerful. It may be used to easily implement a simple counting loop [3] but it can also do much more. The general format of a for statement is:

    for ( <expr1> ; <expr2> ; <expr3> ) <statement>

The first expression is the “initialization” expression. It is evaluated once before the loop body is executed. The second expression is the “continuation” expression. It is evaluated at the beginning of each iteration of the loop. If it evaluates to ‘true’ (non-zero), the loop body is executed. If it evaluates to ‘false’, the loop exits. The third expression is the “increment” expression. It is unconditionally evaluated at the end of every iteration of the loop. Thus, for example, to implement a loop which counts from 1 to 10 you might code the following:

    for ( i = 1; i <= 10; i++ ) <statement>

The for loop is also commonly used to walk linked lists and other structures. Furthermore, any or all of the three expressions may be omitted. So, for example, if all three are omitted, an infinite loop will result.

[3] which is what is normally provided by a for statement

C provides a way of terminating the current iteration of a loop (be it a for, while, or do loop). This is accomplished with the continue statement which may be placed anywhere in the body of a loop. The break statement may also be used within a loop to terminate that loop’s entire execution. The break and continue statements are considered (along with the goto statement and labels) to be unstructured constructs and as such should probably be avoided whenever possible.

C also provides a multi-way branch structure to avoid deeply nested if statements. The switch statement implements this function. The usage of this statement is as follows:

    switch ( <expression> ) {
        case <const expr> :    <statements>
                               break ;
        ...
        default :              <statements>
                               break ;
    }

The expression given in the switch statement is evaluated and the case label with the corresponding constant expression value will be branched to. The statements associated with that case are executed. The break statement, used in this context, causes control to exit the switch statement. If you omit it, execution will fall through into the statements associated with the next case label. Be careful: this is a common programming error. In the event that no case label matches the value of the switch expression, the statements associated with the default label will be executed.

Finally, C provides what it calls a “conditional” expression. This is a sort of combination of an if statement and an expression. When you wish to assign one of two expressions to the same variable based on some logical condition, you can avoid using an if statement. That is, you can replace the code:

    if ( <expr> ) <vble> = <expr1> ; else <vble> = <expr2> ;

with the code:

    <vble> = ( <expr> ) ? <expr1> : <expr2> ;
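The control structures of this section can be seen working together in a few small functions. This is a minimal sketch rather than code from the tutorial: the function names and the choice of what to count or classify are assumptions made purely for illustration.

```c
/* Counting loop: sums the integers 1..n using a for statement. */
int sum_to(int n)
{
    int i, total = 0;

    for (i = 1; i <= n; i++)
        total += i;
    return total;
}

/* do loop: the body runs once before the continuation test is made,
   so even n == 0 is counted as having one decimal digit. */
int count_digits(int n)
{
    int digits = 0;

    do {
        digits++;
        n /= 10;
    } while (n > 0);
    return digits;
}

/* switch: returns 1 for a lower-case vowel, 2 for a blank, 0 otherwise.
   Stacked case labels share one set of statements; each 'break' stops
   execution from falling through into the next case. */
int classify(int c)
{
    int kind;

    switch (c) {
    case 'a': case 'e': case 'i': case 'o': case 'u':
        kind = 1;
        break;
    case ' ':
        kind = 2;
        break;
    default:
        kind = 0;
        break;
    }
    return kind;
}

/* Conditional expression: replaces an if/else that assigns one variable. */
int larger(int x, int y)
{
    return (x > y) ? x : y;
}
```

Removing the first ‘break’ in ‘classify’ would let a vowel fall through into the blank case and return 2, which is exactly the error warned about above.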

4 Arrays

C language arrays are not very difficult to understand. They are often considered a difficult topic because of their relationship to character strings and pointers. Taken by themselves however, they differ from arrays in most other languages only by being indexed beginning at zero rather than one. An array is declared as in the following example:

    float a[37];

Notice the use of square rather than round brackets to specify subscripting. This code declares an array variable named ‘a’ which consists of 37 floats. The elements of ‘a’ are accessed individually as ‘a[0]’ through ‘a[36]’. Arrays are always indexed beginning at zero. Multi-dimensional arrays are also possible and are declared as in:

    int b[52][7];

In this example a 52 x 7 array of integers is declared. Array elements are accessed using the syntax ‘b[i][j]’.

C provides a facility to allow arrays (or for that matter scalars) to be initialized when they are declared. Consider the following examples:

    int i = 4;
    float r[3] = { 2.45, 0.1234E-26, 987.42387 };
    int ident2x2[2][2] = { { 1, 0 }, { 0, 1 } };

A particularly important application of arrays is in the implementation of strings. In C, character arrays are used to store strings. Each element of the array contains a single character in the string and the end of a string in a character array is normally marked by a zero byte.

A single character in C is enclosed in single quotes (e.g. 'a', 'G', '1', '(', ' '). It is possible to specify certain “special” characters which cannot be entered from the keyboard using “escapes”. The “escape” character is the backslash ‘\’ and can be prefixed to certain letters to obtain special characters. The important ones are summarized in the following table:

    Escape Sequence    Corresponding Character
    '\0'               Null (zero byte)
    '\b'               Backspace
    '\f'               Formfeed
    '\n'               Newline/linefeed
    '\r'               Carriage Return
    '\t'               Tab
    '\v'               Vertical Tab
    '\\'               Backslash

A “string” constant in C is enclosed in double quotes (e.g. "fred", "123456", "@4(*", "a"). [4] A string constant has an implicit Null appended to it and is really just a shorthand for specifying each character in turn followed by a zero byte. C provides string constants and the ability to declare character arrays (to store strings into) but this is the extent of its support. Everything else (including string comparison, copying, concatenation, etc.) is provided by library routines.

[4] Note that a single character enclosed in double quotes is very different from a single character enclosed in single quotes.
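To make the zero-based indexing and the zero-byte string terminator concrete, here is a short sketch. The function names are hypothetical. ‘strcpy’ and ‘strcmp’ are the standard library routines for copying and comparing strings, declared in ‘string.h’.

```c
#include <string.h>

/* Sums the diagonal of the 2 x 2 identity matrix declared in the text.
   Valid subscripts run from 0 to 1 in each dimension. */
int trace2x2(void)
{
    int ident2x2[2][2] = { { 1, 0 }, { 0, 1 } };
    int i, sum = 0;

    for (i = 0; i < 2; i++)
        sum += ident2x2[i][i];
    return sum;
}

/* Counts the characters that come before the terminating zero byte. */
int string_length(const char s[])
{
    int n = 0;

    while (s[n] != '\0')
        n++;
    return n;
}

/* Copies a string constant into a character array using the library,
   then compares the copy with the original. Returns 1 if they match. */
int copy_and_compare(void)
{
    char buf[5];    /* four characters of "fred" plus the zero byte */

    strcpy(buf, "fred");
    return strcmp(buf, "fred") == 0;
}
```

Note that ‘buf’ needs five elements, not four: the implicit Null at the end of "fred" takes up an element of its own. For the same reason ‘sizeof("a")’ is 2, which is one way to see that "a" and 'a' are different things.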

5 Subprograms

Subprograms (functions/procedures) in C cause many programmers a good deal of grief. This is primarily due to the way in which parameter passing is accomplished. In C all parameters are passed by value rather than by address. [5] At first, this seems to present a rather serious problem but in fact it does not. Rather than providing a different language-based mechanism for passing parameters by address, [6] C leaves the responsibility for this up to the programmer. The programmer must pass a pointer to a variable which is to be changed. In this way, even though the parameter (the pointer) cannot be changed, what it points at can be. Only the simplest details concerning pointers will be covered in this section. Further information about pointers in C will be left until later.

[5] In Pascal terms, there are no VAR parameters.
[6] so that the parameters may be changed and returned

C does not really distinguish between procedures and functions in the way that Pascal does. All subprograms are generally referred to as “functions” in C and return some value which may or may not be used by the function’s caller. A function has a name, some declarations associated with it, and some executable statements which form its body. Most notably, every program must have a function named ‘main’ which contains at least a single executable statement. When a compiled program is run, execution begins in the function named ‘main’. Hence, ‘main’ contains your “mainline” code which may in turn call other functions to accomplish work on its behalf. Perhaps the simplest (and most ubiquitous) C program is the following:

    main ()
    {
        printf("Hello World");
    }

The ‘main’ function has no arguments (hence the empty parentheses ‘()’) and declares no variables. It simply prints the message “Hello World” using a routine from the standard I/O library.

A function’s body consists of a block of statements which may contain declarations which are local to that function. In general there is no definition before use rule for functions in C, but variables must be declared before they are used and it is safest to declare the variables used in a block at its start. Function bodies are no exception in this respect.

If a function does have parameters, there are two ways to declare the types of its parameters. The preferred method, which is supported only in ANSI C, has the parameters declared right in the function header, while the old “K&R” (standard) C method declares the parameters between the function header and the ‘{’ which starts the function body. ANSI C does recognize the “K&R” format so if you are using an ANSI C compiler the choice of formats is yours. For example, the routine which returns the maximum of two integer values might be coded as follows using a “K&R” (standard) C compiler:

    int max(x,y)
    int x,y;
    {
        ...

Using an ANSI C compiler we could equivalently write:

    int max(int x,int y)
    {
        ...

The name of the function is ‘max’ and it is declared to have two integer parameters ‘x’ and ‘y’. It returns an integer (presumably the maximum of the two values) and this is indicated by the int keyword preceding the function name. Not all functions need to return values explicitly. In ANSI C the function name can be prefixed with the keyword void to indicate this fact. In “K&R” C the default return type is int even if nothing is specified.

In many cases, it is useful to have a function return an integer which indicates whether or not its execution was successful. This allows its caller to detect when errors have occurred. By convention, a value of zero indicates that no error occurred while other values are used to indicate specific errors. The ‘return’ statement is used to return values as in:

    return (i*2);

In C it is possible to return user-defined data types from a function so the return type of a function may be rather complex to specify. This is a chief contributing factor to the readability problems attributed to C functions. The problem is easily avoided through the use of typedefs as described later in this document.

C provides easy mechanisms to declare a pointer to an object and to get the address of an object. This is necessary if a subroutine is to update the value of a variable declared by its caller (i.e. a parameter). In the following declarations, ‘xval’ is an integer and ‘xptr’ is a pointer to an integer.

    int xval;
    int *xptr;    /* Note the '*' which makes 'xptr' a pointer variable */

To set ‘xptr’ to point to ‘xval’ we use the unary address operator (‘&’) as in the following code:

    xptr = &xval;

When we wish to refer to what a pointer points at rather than the pointer itself, we dereference the pointer by prefixing it with the unary dereference operator (‘*’). For example, the following code assigns the value of the integer pointed to by ‘xptr’ to the integer variable ‘yval’. [7]

    yval = *xptr;

[7] It often helps beginning C programmers to choose variable names such as ‘xval’ and ‘xptr’ in order to keep track of whether they are dealing with an object or a pointer to it.

This pointer/address syntax is also used for passing the addresses of variables to functions. Consider for instance the standard routine for swapping two values commonly used by sorting routines:

    void swap(int *xptr, int *yptr)    /* accepts pointers to two integers */
    {
        int temp;

        temp=*xptr;
        *xptr=*yptr;
        *yptr=temp;
    }

This function could be called to swap the values in the ith and jth elements of the array ‘a’ as follows:

    ...
    swap(&a[i],&a[j]);
    ...

In this example we are passing the address of each array element to the function swap. The address operator works equally well on array elements as it does on simple variables (or other objects).
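Two conventions from this section, returning zero for success and passing a pointer so that the caller's variable can be updated, are often combined in one function. The sketch below assumes ANSI C. ‘max_int’ and ‘safe_divide’ are invented names, and the body given for the maximum routine is only one plausible completion of the example whose body is elided in the text.

```c
/* One possible body for the 'max' example (renamed here to 'max_int'). */
int max_int(int x, int y)
{
    if (x > y)
        return x;
    return y;
}

/* The swap routine from the text, repeated so this sketch is complete. */
void swap(int *xptr, int *yptr)
{
    int temp;

    temp = *xptr;
    *xptr = *yptr;
    *yptr = temp;
}

/* Divides 'num' by 'den'. The quotient is stored through 'resultptr'
   and the return value reports status: 0 for success, 1 if 'den' is
   zero (in which case '*resultptr' is left untouched). */
int safe_divide(int num, int den, int *resultptr)
{
    if (den == 0)
        return 1;
    *resultptr = num / den;
    return 0;
}
```

A caller would write something like ‘if (safe_divide(7, 2, &q) != 0) ...’ to detect the error case. The ‘&q’ is what lets the function change the caller's variable even though the parameter itself is passed by value.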

6 Pointers

Pointers, particularly as they relate to character arrays and parameter passing, are perhaps the most difficult concept in C to grasp. This is due to the fact that C allows you to do some interesting things with pointers. The important thing to remember is that pointers in C are no different from pointers in any other language [8]. C’s operations on pointers are exactly that: operations on pointers. Problems are caused by confusing the pointer and what it points at. The problems are merely exacerbated in C because of its extended capabilities and terse syntax.

[8] they still contain the address of some data object that they “point” at

You have already seen examples which declare and dereference pointers and that find the address of a data object. At this point the basic usage of pointers should be clear to you. You should also understand how pointers are used to “implement” call-by-address parameters. If you have any doubts, go back now and review the examples.

One of the interesting things about C’s pointers is their relationship to arrays. There is a certain “duality” between pointers and arrays in that any subscripting operation can be equivalently accomplished using pointer operations and vice-versa. Let us declare some variables to work with. We will choose to deal with an array of integers and pointers to integers, but this is, of course, purely arbitrary and we could have chosen any other base type (builtin or user-defined) instead.

    int a[10];
    int *ptr,*ptr2;

We can assign ‘ptr’ the address of the first element of ‘a’, as we know, by coding ‘ptr=&a[0];’. C treats the name of an array as a synonym for the location of its first element. Thus we could equivalently write ‘ptr=a;’. Indeed this is the preferred form. Once we have made this assignment, dereferencing ‘ptr’ will give us the contents of the first element of ‘a’.

Since ‘ptr’ and ‘ptr2’ are of the same type, we may legitimately assign one to the other. After executing the statement ‘ptr2=ptr;’, ‘ptr2’ will also point at the first element of ‘a’. Note that we have copied pointer values (after all we assigned one pointer to another) and not copied the integers themselves. If we dereference either pointer we access ‘a[0]’. Copying pointers does not copy the values they point at regardless of the underlying data type.

When we specify a string constant using double quotes what we are really dealing with is a pointer to the character array (or memory location if you will) containing the characters which make up the string. Thus, to code the following results in ‘sptr’ pointing at the string constant ‘"flange"’. It is incorrect to say that ‘"flange"’ is assigned to ‘sptr’!

    char *sptr;
    ...
    sptr = "flange";

C will allow address arithmetic to be performed on pointers. Thus, continuing with our previous example, the expression ‘ptr+1’ points to element one of ‘a’. It does not point one byte past the beginning of element zero. This means that if we code ‘*(ptr+1)’ we are accessing element one. When you operate on a pointer, C adjusts the operation so that it “increments” or “decrements” in multiples of the size of the object the pointer points at. If we have a character pointer and characters are a single byte in length then the increment value will be one. If we have a pointer to a double which is eight bytes in length then the increment will be eight and so on.

This address arithmetic allows us to easily (and efficiently) process the elements of an array using a pointer. Consider our previous declarations and the following for loop which adds one to every element of ‘a’.

    for (ptr=a;ptr [...]

[...] ‘->’), the ‘y’ field of the structure pointed to by ‘ptr’ is decremented.
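The pointer-stepping loop described earlier in this section, and the decrement of a structure field through a pointer, might look something like the following. This is a reconstruction under stated assumptions and not the author's code: the loop bound ‘a + n’, the function names, and the two-integer ‘struct point’ layout (inferred from the ‘makepoint’ example that follows) are all assumptions.

```c
/* Adds one to each of the 'n' integers starting at 'a'. The pointer is
   advanced one element (not one byte) per iteration by 'ptr++'. */
void add_one_to_all(int *a, int n)
{
    int *ptr;

    for (ptr = a; ptr < a + n; ptr++)
        (*ptr)++;
}

/* Returns 1 if subscripting and pointer arithmetic name the same
   element, illustrating the array/pointer duality. */
int same_element(int *a, int i)
{
    return &a[i] == a + i && a[i] == *(a + i);
}

/* Assumed layout: two integer fields named 'x' and 'y'. */
struct point {
    int x;
    int y;
};

/* Decrements the 'y' field of the structure 'ptr' points at.
   'ptr->y' means the same thing as '(*ptr).y'. */
void move_down(struct point *ptr)
{
    ptr->y--;
}
```

Because an array name stands for the address of its first element, a caller with ‘int a[10];’ would simply write ‘add_one_to_all(a, 10);’.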
As mentioned earlier, C allows user-defined types to be the result type of functions. Thus, it would not be unexpected to see the code:

    struct point makepoint(int x, int y)
    {
        struct point temp;

        temp.x=x;
        temp.y=y;
        return temp;    /* the 'return' stmt returns a 'point' value */
    }

Notice that this function returns a ‘point’ structure not a pointer to one. This means that the caller must have allocated a point structure in which to place the value returned by the function. This only makes sense since after the ‘makepoint’ routine completes the storage associated with ‘temp’ (its local variable) is free to be reclaimed by the system (i.e. ‘temp’ is an “automatic” variable). [12] Thus, a correct calling sequence would be:

    struct point origin;
    ...
    origin = makepoint(0,0);    /* this is an example of structure assignment */

[12] C also permits “static” variables by prefixing a declaration with the keyword ‘static’.

To be truly useful, it must be possible to dynamically allocate structures. In keeping with its general philosophy, C makes use of certain library routines to accomplish the necessary memory management rather than providing C language statements. Normally, such fundamental library routines as those associated with memory management (and basic I/O as we will see) are automatically made available to the programmer. The functions used for dynamic allocation/deallocation are:

    sizeof(typename)    returns an integer indicating the number of bytes required to store an object of the given type. (Strictly speaking, ‘sizeof’ is an operator built into the language rather than a library routine.)
    malloc(size)        returns a pointer to a dynamically allocated area of memory of the specified size.
    free(ptr)           frees up the space associated with the given pointer which was previously obtained from ‘malloc’.

To dynamically allocate a ‘point’ structure we can use the following code:

    struct point *pptr;
    ...
    pptr = malloc(sizeof(struct point));    /* pptr points at the allocated structure */
    ...
    free(pptr);
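Returning a structure by value and allocating one dynamically can be combined as in the sketch below. The definition of ‘struct point’ is an assumption based on the fields used by ‘makepoint’, and ‘newpoint’ is a hypothetical helper. The check for a null return from ‘malloc’ is an addition that the text does not discuss.

```c
#include <stdlib.h>

/* Assumed layout: two integer fields named 'x' and 'y'. */
struct point {
    int x;
    int y;
};

/* Returns a 'point' by value, as in the text. */
struct point makepoint(int x, int y)
{
    struct point temp;

    temp.x = x;
    temp.y = y;
    return temp;
}

/* Allocates a 'point' dynamically and fills it in. Returns a null
   pointer if no memory is available. The caller must eventually pass
   the result to 'free'. */
struct point *newpoint(int x, int y)
{
    struct point *pptr;

    /* The cast is optional in ANSI C, where malloc returns 'void *'. */
    pptr = (struct point *)malloc(sizeof(struct point));
    if (pptr != NULL)
        *pptr = makepoint(x, y);    /* structure assignment */
    return pptr;
}
```

Note that ‘sizeof’ is applied to ‘struct point’, the full type name. Writing just ‘point’ only works if a typedef of that name has been declared.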

The ‘malloc’ routine need not be used to allocate space for only scalar variables. For example, you could determine the bounds of an array dynamically, calculate the size of a memory area needed to store it (‘bounds × element size’), dynamically allocate it and then access its elements using pointers rather than subscripting. This would have the obvious advantage of not having to preallocate the maximum sized array that could possibly be needed.

The routine ‘malloc’ returns an untyped pointer into memory. For this reason, a good C compiler will complain when you attempt to assign its result to ‘pptr’ which is a pointer to a ‘point’ structure. To avoid such problems (and because it is good programming practice) you should always explicitly “cast” the returned pointer to the appropriate pointer type. No real conversion is required when going from an untyped pointer to a typed one but the use of the cast tells the compiler that you realize you have a type mismatch and are prepared to deal with the consequences. [13] Thus, we should really code:

    pptr = (struct point *)malloc(sizeof(struct point));

[13] There are no consequences in this case.

In C, casting is used whenever you must force an explicit type conversion. For example, we might cast an integer variable ‘i’ to a float using ‘(float)i’.

Consider a final example which declares and references a linked list of integers.

    struct linknode {                /* the linked list node structure */
        int val;
        struct linknode *next;       /* pointer to next node on list */
    };
    ...
    struct linknode *header,         /* pointer to head of list */
                    *nodeptr;        /* pointer to some list node */
    ...
    header=0;                        /* zero is the Null pointer in C */
    ...
    nodeptr=(struct linknode *)malloc(sizeof(struct linknode));    /* allocate one */
    ...
    /* increment the value of the data stored in the second node on the list */
    nodeptr=header->next;
    (nodeptr->val)++;

C provides a means of storing different types of data in a single storage area. A “union” (which implements this capability) is a variable which may, at different times, hold values of different types. The syntax for unions is very similar to that for structs. Thus, for example, we might code the following:

    union utype {
        int ival;
        float fval;
        char cval;
    };
    ...
    union utype x,y,z;
    ...
    x.ival = 27;
    y.fval = 32.23;
    z.cval = 'x';

There is no way to determine the type of the data currently stored in a union. Thus, it is the programmer’s responsibility to keep track of what is in a union at any given point in time. An easy way to do this is to associate a variable which indicates the current “type” of the union and always set it when the union is stored to. [14] If you are using ANSI C an enumeration is the perfect way to keep track of what is in a union. The combined dereferencing and member selection operator (‘->’) also works when you have a pointer to a union. Thus, the following is legal:

    union utype *uptr;
    ...
    uptr->ival=77;

ANSI C offers enumerations [15] as do most modern versions of C. The use of enums is illustrated in the following example:

    enum boolean { false, true };
    enum boolean x;
    ...
    x = true;

[14] A struct containing the union and its associated variable might be used for this.
[15] In the original ‘K&R’ C they were simulated with the assistance of the C preprocessor.

As you would expect, the various user defined types may be combined in the normal ways. Thus you can have an array of structures (or vice-versa), arrays whose elements are enumerations, etc.

In order to avoid the ugliness of saying ‘struct node x,y,z;’ or ‘enum boolean a,b;’, C provides a mechanism for defining new type names which represent user-defined types in a more aesthetically pleasing way. The typedef mechanism defines type names which may then be used in variable declarations just the way builtin types are. In its simplest form, typedef can be used to create synonyms for existing types. This is a potentially dangerous practice but its sensible application (particularly in old C code) improves the readability of the code. Thus, we may choose to code:

    typedef short Short_Integer;
    typedef char *String_Pointer;
    ...
    Short_Integer i,j,k;              /* really just 'short's */
    String_Pointer sptr1, sptr2;      /* really just 'char *'s */

Many programmers choose to use upper case letters in their typedef’d names to make them easy to recognize. [16] In general, in a typedef declaration, the typedef name goes where the variable name would go in a variable declaration. Thus, the following code defines two typedef names (‘Treeptr’ and ‘Treenode’) which may be used in later declarations.

    typedef struct tnode *Treeptr;
    ...
    typedef struct tnode {
        String_Pointer dataptr;
        Treeptr leftchild;      /* We get to use 'Treeptr' here */
        Treeptr rightchild;     /* and again here */
    } Treenode;

Notice that C allows us to reference the structure in defining a pointer to it before the structure itself is declared.
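The suggestions in this section (tag a union with an enumeration, wrap both in a struct, and hide the details behind a typedef) fit together as shown below, followed by a small linked list built from dynamically allocated nodes. All of the names here (‘Kind’, ‘Tagged’, ‘Linknode’, ‘push’ and so on) are hypothetical, and the code is only a sketch assuming ANSI C.

```c
#include <stdlib.h>

/* Enumeration recording which member of the union is currently valid. */
typedef enum { IS_INT, IS_FLOAT, IS_CHAR } Kind;

/* A struct pairing the union with its tag. */
typedef struct {
    Kind tag;
    union {
        int ival;
        float fval;
        char cval;
    } u;
} Tagged;

/* Stores an integer and sets the tag at the same time. */
Tagged make_int(int v)
{
    Tagged t;

    t.tag = IS_INT;
    t.u.ival = v;
    return t;
}

/* The linked list node from the text, given a typedef name. */
typedef struct linknode {
    int val;
    struct linknode *next;
} Linknode;

/* Adds a new node at the front of the list and returns the new head.
   If allocation fails the list is returned unchanged. */
Linknode *push(Linknode *header, int val)
{
    Linknode *nodeptr = (Linknode *)malloc(sizeof(Linknode));

    if (nodeptr == NULL)
        return header;
    nodeptr->val = val;
    nodeptr->next = header;
    return nodeptr;
}

/* Walks the list with a for loop, summing the stored values. */
int list_sum(Linknode *header)
{
    Linknode *p;
    int total = 0;

    for (p = header; p != NULL; p = p->next)
        total += p->val;
    return total;
}

/* Frees every node, saving each 'next' pointer before the node goes. */
void free_list(Linknode *header)
{
    while (header != NULL) {
        Linknode *next = header->next;
        free(header);
        header = next;
    }
}
```

Setting the tag inside ‘make_int’ means the union is never stored to without its “type” being recorded, which is the discipline recommended above.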

8

The C Preprocessor

The C preprocessor makes life much easier for the C programmer by making some of the more tedious tasks simpler. The preprocessor provides facilities for including source code from other files (useful in separate compilation and when using library routines) and for doing conditional compilation (which is used frequently when writing portable code). It also provides a powerful macro facility which can be very useful and which, in its simplest form, compensates rather nicely for C’s lack of proper constants. We will consider only two of the C preprocessor’s capabilities; text inclusion, and simple text substitution. An example of the text inclusion feature is where the “header” file for the standard I/O library is included using the statement: #include The file name is ‘stdio.h’ (‘h’ for a ‘h’eader file) and the fact that it is surrounded by angle brackets (‘’) indicates to the preprocessor that it is to search in the system “include file” directories to find the file. It is also possible to create your own header files and include them. If you use double quotes (‘" ... "’) instead of angle brackets, the preprocessor will search in the directory where it found the source file which is including the header file. If the header file is not found there, it will then search the system include file directories as well. It is possible to reference a header file in a directory which is “related” to the directory containing the source file doing the include as in the following example: 16

16 Some programmers use all uppercase letters.


    #include "../../myheaderfile.h"

When programming in C, you make use of the preprocessor’s ‘#define’ facility to do text substitution. It is important to always remember that this is a very simple-minded facility which simply substitutes one piece of text wherever it finds an occurrence of a given name. It is thus easy to get into trouble if you are not careful. One of the most useful (and safest) ways to use this facility is to define constants. For example, we might choose to code the following if we were using an old version of C which did not support enums:

    #define False 0
    #define True 1
    ...
    typedef short Boolean;
    ...
    Boolean flag;
    ...
    flag = True;

Notice that there are no semicolons at the end of the preprocessor statements. This is because they are not C code statements. A common beginner’s error is to include the semicolons. If this is done (‘#define True 1;’) then wherever you use ‘True’ the preprocessor will insert not ‘1’ but ‘1;’. In some cases this may result in syntax errors. Consider what would happen if you wrote:

    if (flag == True) ...

After substitution the compiler would see ‘if (flag == 1;) ...’, which is not legal C.

Another common mistake is to put an equals sign between the name and its replacement text. If you code ‘#define xyz=abc’ then the preprocessor does not make ‘xyz’ stand for ‘abc’; it treats ‘=abc’ as the replacement text, so every occurrence of ‘xyz’ is replaced by ‘=abc’. Remember, this is simple text substitution! If you want to learn more about the C preprocessor see ‘man cpp’.
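The “simple-minded” nature of text substitution can be seen in a small sketch. The names ‘UNSAFE_SUM’, ‘SAFE_SUM’, ‘unsafe_twice’, ‘safe_twice’ and ‘is_positive’ are hypothetical and exist only for this illustration; the point is that the replacement text is pasted in exactly as written, so operator precedence is decided after the substitution.

```c
#define False 0
#define True  1

#define UNSAFE_SUM 2 + 3      /* pasted in as the raw text "2 + 3" */
#define SAFE_SUM   (2 + 3)    /* parentheses travel with the text  */

typedef short Boolean;

/* Expands to "2 + 3 * 2", which C evaluates as 2 + (3 * 2) = 8. */
int unsafe_twice(void)
{
    return UNSAFE_SUM * 2;
}

/* Expands to "(2 + 3) * 2", which evaluates to 10. */
int safe_twice(void)
{
    return SAFE_SUM * 2;
}

/* The True/False constants from the text used as a Boolean result. */
Boolean is_positive(int n)
{
    return (n > 0) ? True : False;
}
```

A reasonable rule of thumb, on this evidence, is to enclose any replacement text that is more than a single constant in parentheses.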

9

I/O in C

There are no I/O statements in C. All I/O is accomplished with calls to routines in the I/O library. To use the I/O library, we must include a header file which effectively describes the routines available in the library. In the case of the standard I/O library the name of this file is ‘stdio.h’ and it is included by beginning your source code with the statement:

    #include <stdio.h>

The ‘#include’, as we have seen, is a C preprocessor statement. It is normally 17 required to use the I/O functions. To quote from ‘K&R’ (ANSI version):

    A “stream” is a source or destination of data that may be associated with a disk file or other peripheral.

17 Some C compilers automatically include this file for you.


When a program starts, it has access to three “standard” streams: ‘stdin’, ‘stdout’, and ‘stderr’ (the last being used for error reporting). When a program is run interactively (as is the normal case under Unix) these streams are connected to the user’s terminal. 18 Thus data written to ‘stdout’ or ‘stderr’ will appear on the terminal’s screen and data read from ‘stdin’ comes from the keyboard. There are special I/O routines for accessing these streams and similar, but more general, ones for accessing other streams.

A stream is associated with a file or device by opening it, and the association is broken when the file is closed. When a file is opened, a file pointer (type ‘FILE *’) is returned. 19 Since ‘stdin’, ‘stdout’, and ‘stderr’ are already open when the program starts, there is no need to open them.

The following is a list of stream-oriented I/O functions and a brief discussion of what they do. For more detailed information, all the functions are documented online in section 3 of the manual. To get information on the ‘fopen’ routine, for example, type the command ‘man 3 fopen’. Note that in what follows, characters are passed and returned as ‘int’s; this allows ‘EOF’ to be distinguished from every valid character.

FILE *fopen(char *filename,char *mode)
    Opens the file specified by the string pointed to by ‘filename’ for access as specified by the string pointed to by ‘mode’.

int fclose(FILE *stream)
    Closes the previously opened file associated with ‘stream’.

int fputc(int ch,FILE *stream)
    Prints the character ‘ch’ on ‘stream’. It returns the character written, or ‘EOF’ (defined in ‘stdio.h’) for error.

int fgetc(FILE *stream)
    Reads and returns a character from ‘stream’. If end of file occurs, ‘EOF’ is returned.

int fputs(char *sptr,FILE *stream)
    Prints the string pointed to by ‘sptr’ on ‘stream’.

char *fgets(char *sptr,int n,FILE *stream)
    Reads at most the next ‘n-1’ characters into a character array pointed to by ‘sptr’, stopping if a newline is encountered; the newline is included in the array, which is terminated by ‘\0’. It returns ‘sptr’, or ‘NULL’ if end of file or an error occurs.

int putc(int ch,FILE *stream)
    Prints the character ‘ch’ on ‘stream’.

int getc(FILE *stream)
    Reads and returns a character from ‘stream’.

int ungetc(int ch,FILE *stream)
    “Writes” the character ‘ch’ back onto ‘stream’ where it will be the next character to be read. The character pushed back is returned unless there is an error, in which case ‘EOF’ is returned.

int putchar(int ch)
    Prints the character ‘ch’ on ‘stdout’ (equivalent to ‘putc(ch,stdout)’).

int getchar()
    Reads and returns a character from ‘stdin’ (equivalent to ‘getc(stdin)’).

int puts(char *sptr)
    Prints the string pointed to by ‘sptr’ on ‘stdout’.

char *gets(char *sptr)
    Reads the next input line from ‘stdin’ into the character array pointed to by ‘sptr’. It replaces the terminating newline with ‘\0’. It returns ‘sptr’, or ‘NULL’ if end of file or an error occurs.
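To give a feel for how a few of these routines fit together, here is a minimal sketch of two helpers. The names ‘copy_stream’ and ‘read_line’ are hypothetical and are not part of the standard I/O library; they only wrap ‘getc’, ‘putc’ and ‘fgets’ as described in the list above.

```c
#include <stdio.h>

/* Hypothetical helper: copy everything from 'in' to 'out', one character
   at a time.  Returns the number of characters copied, or -1L if a write
   fails. */
long copy_stream(FILE *in, FILE *out)
{
    int ch;             /* int, not char, so that EOF can be recognized */
    long count = 0;

    while ((ch = getc(in)) != EOF) {
        if (putc(ch, out) == EOF)
            return -1L;
        count++;
    }
    return count;
}

/* Hypothetical helper: read one line from 'in' into buf (of size n) and
   remove the trailing newline that fgets keeps.  Returns buf, or NULL at
   end of file or on error. */
char *read_line(char *buf, int n, FILE *in)
{
    int i;

    if (fgets(buf, n, in) == NULL)
        return NULL;
    for (i = 0; buf[i] != '\0'; i++) {
        if (buf[i] == '\n') {
            buf[i] = '\0';
            break;
        }
    }
    return buf;
}
```

For instance, ‘copy_stream(stdin, stdout)’ would copy standard input to standard output. ‘read_line’ is built on ‘fgets’ rather than ‘gets’ because ‘gets’ has no way of limiting how many characters it stores in the array.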

18 Input and output redirection under Unix “redirect” these streams to files specified by the user.
19 By convention, the terms “stream” and “file pointer” are used interchangeably.


int fflush(FILE *stream)
    Since I/O is buffered, some method of forcing output to be written is required. This routine causes any outstanding output to ‘stream’ to be “flush”ed. ‘EOF’ is returned for a write error and zero otherwise. This function is very important when debugging using “print” statements since if you don’t see all your output, you can’t tell where your program failed. You should immediately follow all debugging prints with an ‘fflush’.

int fseek(FILE *stream,long offset,int origin)
    Allows the user to position to a particular byte offset within a file. See the man pages.

long ftell(FILE *stream)
    Returns the current byte offset in ‘stream’. Used with ‘fseek’. See the man pages.

In addition to the functions outlined above, there are six additional I/O functions which are frequently used. These, unlike those previously described, support formatted I/O and have a variable number of arguments. The functions are ‘fprintf’, ‘sprintf’, ‘printf’ and ‘fscanf’, ‘sscanf’, ‘scanf’. Only ‘printf’ (which supports formatted output to ‘stdout’) and ‘scanf’ (which provides formatted input from ‘stdin’) will be described. The format of these functions is:

int printf(char *format, ...)
    Prints the variables denoted by ‘...’ according to the format codes in the string pointed to by ‘format’.

int scanf(char *format, ...)
    Reads the variables pointed to by ‘...’ according to the format codes in the string pointed to by ‘format’.

The format codes used by these functions all begin with a ‘%’ and are followed by certain formatting characters which are selected based on the type of data you wish to read or write and the format it is in (on input) or is to take (on output). The most important format codes and examples of their use are summarized in the following table. For more information consult the man pages.

    Format Code   Data Type   Usage Examples
    -----------   ---------   --------------
    d             int         %d - print taking necessary space
                              %4d - print taking at least four spaces
    e,f           float       %f - print taking necessary space
                              %e - print in scientific notation taking necessary space
                              %6.2f - print ... with 2 digits to the right of the decimal point
    c             char        %c - print a character
    s             string      %s - print a string in the space it needs
                              %-10s - print a string in at least 10 spaces, left justified
    %             N/A         %% - print a percent sign
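The effect of several of the codes in the table can be checked with a short sketch. The function ‘format_sample’ is a hypothetical name used only here, and ‘sprintf’ is used in place of ‘printf’ purely so that the formatted text ends up in a buffer where it can be inspected; the format codes behave identically in both.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one sample value for each of several codes
   into buf, separating the fields with '|'.  buf must have room for at
   least 32 characters.  Returns the length of the resulting string. */
int format_sample(char *buf)
{
    sprintf(buf, "%4d|%6.2f|%-10s|%c|%%", 42, 3.14159, "abc", 'x');
    return (int) strlen(buf);
}
```

With these arguments the buffer should hold ‘  42|  3.14|abc       |x|%’: the integer is padded on the left to four spaces, the real is rounded to two decimal places in a field of six, and the string is padded on the right to ten.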

It is important to remember that since ‘scanf’ is to read into the variables, you must pass it the addresses of those variables. Forgetting to do so is another common C programming mistake. The functions ‘fprintf’ and ‘fscanf’ are the obvious generalizations of ‘printf’ and ‘scanf’ to arbitrary streams. The functions ‘sprintf’ and ‘sscanf’ are extremely useful. They operate not on streams but instead on strings. Thus, these functions provide useful mechanisms for formatting strings in memory.

Consider the following examples of formatted I/O and associated discussions. Any character appearing in a ‘printf’ format string which is not a part of a format code is copied directly to the output. Non-printable characters are obtained by using the corresponding escape sequences. For example we can use ‘\n’ to get a newline in our output.

    printf("Hello World.\n");
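The need to pass addresses to the ‘scanf’ family can be illustrated with ‘sscanf’, which reads from a string rather than a stream. The function ‘sum_of_pair’ below is a hypothetical helper written for this sketch only.

```c
#include <stdio.h>

/* Hypothetical helper: read an integer followed by a real from 'line' and
   store their sum through 'sum'.  Returns the number of items sscanf
   converted; *sum is set only when both items were read. */
int sum_of_pair(const char *line, double *sum)
{
    int n;
    float x;
    int items;

    /* &n and &x: sscanf needs to know WHERE to store what it reads */
    items = sscanf(line, "%d %f", &n, &x);
    if (items == 2)
        *sum = n + x;
    return items;
}
```

Checking the value returned by ‘sscanf’ (the number of items successfully converted) is a simple way to detect input that does not match the format string.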

Format codes may appear anywhere in the format string, allowing sophisticated output to be easily produced.

    printf("The value of a[%d] is %7.3f.\n",idx,a[idx]);

Although it would not make sense to do it, we could achieve exactly the same result by coding the following: 20

    char formatted_string[128];
    ...
    sprintf(formatted_string,"The value of a[%d] is %7.3f.\n",idx,a[idx]);
    printf("%s",formatted_string);

To read an integer followed by a real from an input line into a ‘short’ and a ‘float’ one could use the following code:

    scanf("%hd %f",&short_int_vble, &float_vble);

Note the use of the address operator to pass the addresses of the variables to ‘scanf’. Also note the use of the format code ‘hd’ for reading into a short. The use of such “modified” format codes is necessary on input only. There is a corresponding modifier (‘ld’) for reading longs. ‘scanf’ will skip over blanks in the input when reading numeric values so you need not worry about exact spacing in your input. Format codes of ‘%s’ and ‘%c’ may also be used on input. When reading a string with ‘%s’, leading blanks are skipped and reading stops at the next blank, tab, or newline, or sooner if a maximum length is given (e.g. ‘%12s’).

The easiest way to learn about C I/O is by example and usage. Two simple examples are presented in what follows. This first example implements an ‘echo’ function. It is typical of unformatted, character by character I/O and simply echoes the characters it reads from ‘stdin’ to ‘stdout’. Since Unix systems are almost exclusively full duplex, this is a function that every command processor (‘shell’) must perform. 21

    while (EOF!=(ch=getchar())) {
        putchar(ch);
    }

Here ‘ch’ should be declared as an ‘int’ rather than a ‘char’ so that it can hold ‘EOF’ as well as any character.

A slight modification of the same program might be used to print the contents of a file in uppercase only. In this example, the file name will be hard-coded and the conversion of lowercase letters to uppercase will be done explicitly. We will see later that we could accept the filename in question as an argument to the program and that there is a library routine which does case translation for us.

20 Recall that in the ‘sprintf’ statement, ‘formatted_string’ represents the address of the associated array.
21 Albeit the function is greatly simplified in this example.


FILE *fp; ... if ((fp=fopen("/tmp/dummyfile","r")) == NULL) { printf("Error - cannot open file.\n"); } else { while (EOF!=(ch=getc(fp))) { if ((ch>=’a’) && (ch