Support for data abstraction is one half of C++; the other half is support for object-oriented programming. Why would someone who designs a programming language want it to support data abstraction? Because it is one of the best approaches for handling the increasing complexity of software. For example, not so long ago, the only tool people had for developing a GUI for a given environment was a library of 500 to 800 functions and data structures. To develop the GUI you had to learn which of these functions to call, and how. This is a daunting task: one function called another, which updated a data structure that resided in another file, and you quickly got lost. It was very easy to make errors. This is the problem with procedural programming and the function libraries it encouraged. When you design a GUI you would like to talk, think and code in terms of scroll bars, dialog boxes, windows, etc.
This is where data abstraction comes in: data abstraction means the ability to package, as a type, data structures together with the functions that manipulate them. A language supports data abstraction if it is "easy" for the user to do this. The new types defined by the user are called abstract data types (or, as Stroustrup likes to call them, user-defined types). One of the greatest (the greatest?) achievements of Stroustrup has been to create a language in which user-defined types get support as good as (sometimes even better than) built-in types in every respect: type checking, efficiency, etc.
C++ is a big language, but essentially all of its features can be traced to support for data abstraction (DA) and object-oriented programming (OOP), and to doing both efficiently with all the help the compiler can give you (e.g. strong typing). I would like to be able to define a Small C++ and forget about its "big brother", but I know this is wishful thinking. If you start with C, add support for DA and OOP, and want to keep the "spirit" of C (which is flexibility and efficiency), you will end up with C++ (or, if you are not as talented as Stroustrup, you will almost certainly end up with something much worse). In the rest of this chapter, I will show you how data abstraction is supported in C++ and introduce many of the language features you need in order to write useful user-defined types.
Here is the smallest "interesting" class we can start with:
    class String {
        char * cptr;
    };

class is a reserved word in C++; its purpose is to tell the compiler that the description of a user-defined type begins. The word class is followed by the name of the user-defined type, in our case "String", and this type contains only a pointer to a char. With this declaration you can write a program like this (assuming the declaration of String appears earlier in the file that contains the program):
    int main() {
        String a, b;
        b = a;
    }

This program defines two objects of type String, a and b, and assigns the value of a to b.
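To see what the compiler-generated assignment does for a class that holds only a pointer, here is a minimal sketch. RawString is a hypothetical stand-in for the chapter's String, with a public member so the effect can be observed from outside; the function name is also only illustrative.

```cpp
#include <cassert>

// Hypothetical stand-in for the chapter's String: the member is public here
// only so that the result of the assignment can be inspected.
struct RawString {
    char* cptr;
};

// Returns true when the default assignment left both objects
// pointing at the very same characters.
bool assignment_shares_buffer() {
    char hello[] = "Hello";
    char world[] = "World";
    RawString a = { hello };
    RawString b = { world };
    assert(a.cptr != b.cptr);   // two distinct buffers before the assignment
    b = a;                      // member-wise copy: only the pointer is copied
    return b.cptr == a.cptr;    // no characters were copied at all
}
```

The default assignment copies the pointer and nothing else, which is harmless here only because neither object owns its buffer.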
The previous example shows typical uses of a new user-defined type or class in C++. You want to be able to define new objects of this class and do some operations on them. The following syntax was used to define a new object of a class in the previous example: String a (you use the class name followed by an identifier). This tells the compiler to create a new object and make it available. Except in the most trivial cases, you should not rely on the default operations of the compiler to create and initialize your objects. You should add your own constructors and destructors and this is how it's done:
    #include <string.h>   // strlen, strcpy

    class String {
    private:
        char * cptr;
    public:
        String(char *);   // String constructor
        ~String();        // String destructor
    };

    String::String(char * a_cptr) {
        if (a_cptr) {
            cptr = new char[strlen(a_cptr)+1];
            strcpy(cptr, a_cptr);
        } else {
            cptr = new char[1];
            cptr[0] = '\0';
        }
    }
    String::~String() {
        delete [] cptr;
    }

The constructor and destructor are needed to ensure that only well-formed (and well-behaved) objects are used. Since C++ allows us to define and use arbitrary classes, we need a way to express how their objects are created and destroyed, and this is the purpose of the constructor and destructor of a class (a class can have many constructors but only a single destructor). In our example, the constructor is passed a char * as argument. If this char * is not the null pointer, the data member cptr is assigned the address of a new block of memory with enough room for the string passed as argument (strings are implemented as pointers to characters in C and are null terminated; this is the reason for the +1 in the code). After the memory has been allocated, we call the standard C library function strcpy to copy the content of the string passed as argument into our new object. If the argument is the null pointer, we allocate enough memory for a single character and store the terminating null character in it (so that we always have a string terminated by a null character).
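The two paths through the constructor can be exercised with a small sketch like the one below. It departs from the chapter's code in a few labelled ways: the parameter is a const char * so that string literals are accepted by current compilers, copying is disabled because the chapter has not yet defined it, and c_str() is a hypothetical read-only accessor added only for inspection.

```cpp
#include <cassert>
#include <cstring>

class String {
private:
    char* cptr;
    // Copying is disabled in this sketch (assumption); the chapter adds it later.
    String(const String&) = delete;
    String& operator=(const String&) = delete;
public:
    String(const char* a_cptr);                  // const char*: an assumption of this sketch
    ~String();
    const char* c_str() const { return cptr; }   // hypothetical accessor
};

String::String(const char* a_cptr) {
    if (a_cptr) {
        cptr = new char[std::strlen(a_cptr) + 1];   // +1 for the terminating '\0'
        std::strcpy(cptr, a_cptr);
    } else {
        cptr = new char[1];                         // null pointer: store an empty string
        cptr[0] = '\0';
    }
}

String::~String() {
    delete [] cptr;   // matches the new char[...] in the constructor
}

// Builds one String from text and one from a null pointer. Both objects are
// destroyed automatically when the function returns.
bool both_well_formed() {
    String greeting("Hello");
    String empty(static_cast<const char*>(0));
    assert(empty.c_str() != 0);   // even the "null" String owns a valid buffer
    return std::strcmp(greeting.c_str(), "Hello") == 0
        && std::strlen(empty.c_str()) == 0;
}
```

Whichever branch runs, cptr ends up pointing at memory obtained with new[], so the destructor can always use delete [] without a special case.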
To use the constructor above as a default constructor, we would have to change the signature of the constructor to include a default argument like this:
    class String {
    private:
        char * cptr;
    public:
        String(char * = 0);   //default constructor
        ~String();            //destructor
    };

Now let's try a small example (assuming the code for the String class appears earlier in the file):
    int main() {
        String a("Hello");
        String b("World");
        b = a;
        cout << b.cptr << endl;
        cout << a.cptr << endl;
    }

You might expect this program to compile without any error. It does not: main reads the private member cptr, so the compiler rejects the two output statements (imagine for the moment that cptr is public). But there is a more serious problem with this program: after the assignment statement b = a; b points to the string "Hello" (only the pointers have been copied) and the memory allocated for "World" is now unreachable. Moreover, both objects now hold the same pointer: when one of them goes out of scope its destructor releases the memory, which leaves the pointer in the other object invalid, and the second destructor then tries to delete the same memory again. You definitely don't want this kind of behaviour from your objects. What you need is to define the meaning of assignment for your objects, and this is done by coding an assignment operator. The new String class becomes:
    class String {
    private:
        char * cptr;
    public:
        String(char * = 0);                       //default constructor
        ~String();                                //destructor
        String & operator=(const String & from);  //assignment operator
    };

And the assignment operator can be coded like this:
    String & String::operator=(const String & from) {
        if (this == &from)
            return * this;
        else {
            delete [] cptr;
            cptr = new char[strlen(from.cptr)+1];
            strcpy(cptr, from.cptr);
            return * this;
        }
    }

Let's look at what is going on here. We'll return to the function signature a bit later. The member function operator=(...) is just an ordinary member function that uses = as its name (in C++ you can overload, i.e. give a different meaning to, most operators). It first checks whether we are trying to assign a variable to itself, and if so it returns immediately. The check is needed because the normal behaviour of the assignment operator is to release the memory of the object assigned to (the statement delete [] cptr; above) and then copy the content of the from object into it. Without the check for self-assignment, we would delete the memory just before we need it. In C++, you are often in a situation where you need to refer to the very object you are writing the code for (the object "assigned to" in the previous discussion). To handle such situations C++ provides a constant pointer, this, which always points to "this" object.
Now let's look at the function signature: it takes a single argument of type "reference to a constant String" and returns a "reference to a String". The type of the argument is relatively simple to justify. C++ and C pass arguments "by value" by default. When you pass an argument to a function "by value", the function receives a copy of the actual value of the argument at the time of the call. When dealing with large objects this is an inefficient way to communicate with functions. C allows you to use a pointer if you are not satisfied with the default "pass by value". C++ has introduced a third way of passing arguments: "by reference". It is as efficient as passing a pointer, but it avoids the complications that arise from the unappealing syntax of pointers when dealing with function overloading. Since we don't expect the value of the argument of an assignment operator to change during the assignment, we declare the parameter to be constant (we'll have more to say about the brave new world of constant objects later). The assignment operator is declared to return a reference to a String object. We have to do this if we want to be allowed to write multiple assignments in the same statement, as in a = b = c;. The compiler translates this into the equivalent form a.operator=(b.operator=(c)). Written in this form, we see that the return value of b.operator=(c) has to be usable as the argument of a.operator=(). One way of doing this is to have operator=() return a reference to a String object.
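The points made about the assignment operator (the self-assignment check, the deep copy, and the reference return that allows a = b = c) can be tried out with the sketch below. It is not the chapter's exact class: the constructor takes a const char * so that literals are accepted, a copy constructor is included so the class is safe to copy, and duplicate() and c_str() are hypothetical helpers introduced for this example.

```cpp
#include <cassert>
#include <cstring>

class String {
private:
    char* cptr;
    // Hypothetical helper: a freshly allocated copy of s ("" for a null pointer).
    static char* duplicate(const char* s) {
        if (!s) s = "";
        char* copy = new char[std::strlen(s) + 1];
        std::strcpy(copy, s);
        return copy;
    }
public:
    String(const char* s = 0) : cptr(duplicate(s)) {}
    String(const String& from) : cptr(duplicate(from.cptr)) {}
    ~String() { delete [] cptr; }
    String& operator=(const String& from);
    const char* c_str() const { return cptr; }   // hypothetical accessor
};

String& String::operator=(const String& from) {
    if (this == &from)            // self-assignment: nothing to do
        return *this;
    delete [] cptr;               // release the old characters
    cptr = new char[std::strlen(from.cptr) + 1];
    std::strcpy(cptr, from.cptr); // copy the characters, not the pointer
    return *this;                 // reference to the object assigned to
}

bool chained_assignment_works() {
    String a("one"), b("two"), c("three");
    a = b = c;                    // parsed as a.operator=(b.operator=(c))
    String& alias = a;
    a = alias;                    // self-assignment must leave a intact
    assert(&(b = c) == &b);       // the operator returns its left operand
    return std::strcmp(a.c_str(), "three") == 0
        && std::strcmp(b.c_str(), "three") == 0
        && a.c_str() != b.c_str();   // each object owns its own buffer
}
```

Removing the this == &from test would make the self-assignment read from memory that has just been deleted, which is exactly the failure described above.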
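Assignment is not the only operation that copies an object: an initialization such as String b = a; and passing a String by value both call the copy constructor, and the one generated by the compiler would again copy only the pointer. For our "toy" String class, the copy constructor can be coded as follows: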
    String::String(const String &from) {
        cptr = new char[strlen(from.cptr)+1];
        strcpy(cptr, from.cptr);
    }

Our String class declaration now looks like this:
    class String {
    private:
        char * cptr;
    public:
        String(char * = 0);                  //default constructor
        String(const String &);              //copy constructor
        ~String();                           //destructor
        String & operator=(const String &);  //assignment operator
    };

It would be nice if we could add two String objects together and get their concatenation as a result. We would like to be able to write a segment of code like this:
    String a = "Hello";
    String b = " World";
    String c;
    String d = a + b;   // d contains "Hello World"

If we try to write operator+ as a member function of the class String, we end up with the following problem:
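    // assuming this declaration of operator+ as a member of String:
    //     String operator+(const String &);
    String a = "Hello";
    String b = " World";
    String c = a;
    String d = c + " World";   // works fine: c.operator+(" World")
    String e = b;
    String f = "Hello" + e;    // error: the left operand is not a String

The left operand of a member operator has to be an object of the class itself, so the compiler will not convert "Hello" to a String in the last line. The way out is to make operator+ a global function and to declare it a friend of String so that it can reach cptr: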
    class String {
        friend String operator+(const String &, const String &);
    private:
        char * cptr;
    public:
        String(char * = 0);                  //default constructor
        String(const String &);              //copy constructor
        ~String();                           //destructor
        String & operator=(const String &);  //assignment operator
    };

And operator+(...) can be coded as follows:
    String operator+(const String & a, const String & b) {
        String tmp;
        delete [] tmp.cptr;
        tmp.cptr = new char[strlen(a.cptr)+strlen(b.cptr)+1];
        strcpy(tmp.cptr, a.cptr);
        strcat(tmp.cptr, b.cptr);
        return tmp;
    }

We pass the two arguments as const String & to avoid some inefficient copying (in this toy example it wouldn't matter much, but for a real class it might), but we return the function result by value, since we have no guarantee that an object containing the concatenation of the two arguments already exists (and this is the meaning of a reference: a new name for an already existing object). If you try to return a reference to tmp, the value returned by operator+ will be undefined, because when tmp goes out of scope (which is when control returns to the caller) its destructor is called.
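A compilable sketch of the friend version shows that the conversion from a character string now applies to either operand. As assumptions of this example, the constructor takes a const char * (so literals are accepted by current compilers), and duplicate() and c_str() are hypothetical helpers that are not part of the chapter's class.

```cpp
#include <cassert>
#include <cstring>

class String {
    friend String operator+(const String&, const String&);
private:
    char* cptr;
    // Hypothetical helper: a freshly allocated copy of s ("" for a null pointer).
    static char* duplicate(const char* s) {
        if (!s) s = "";
        char* copy = new char[std::strlen(s) + 1];
        std::strcpy(copy, s);
        return copy;
    }
public:
    String(const char* s = 0) : cptr(duplicate(s)) {}
    String(const String& from) : cptr(duplicate(from.cptr)) {}
    ~String() { delete [] cptr; }
    String& operator=(const String& from) {
        if (this != &from) {
            delete [] cptr;
            cptr = duplicate(from.cptr);
        }
        return *this;
    }
    const char* c_str() const { return cptr; }   // hypothetical accessor
};

// Global function: both operands are ordinary parameters, so the compiler
// may convert either one from a character string to a String.
String operator+(const String& a, const String& b) {
    String tmp;
    delete [] tmp.cptr;
    tmp.cptr = new char[std::strlen(a.cptr) + std::strlen(b.cptr) + 1];
    std::strcpy(tmp.cptr, a.cptr);
    std::strcat(tmp.cptr, b.cptr);
    return tmp;                       // returned by value, never by reference
}

bool concatenation_is_symmetric() {
    String c(" World");
    String left = "Hello" + c;        // left operand converted to a String
    String right = c + "!";           // right operand converted to a String
    assert(left.c_str() != right.c_str());
    return std::strcmp(left.c_str(), "Hello World") == 0
        && std::strcmp(right.c_str(), " World!") == 0;
}
```

Because the result is returned by value, the caller receives its own String object, and the destruction of tmp inside the function does no harm.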
To pass a String to functions that expect a char *, we can add a conversion operator to the class:

    class String {
        friend String operator+(const String &, const String &);
    private:
        char * cptr;
    public:
        String(char * = 0);                  //default constructor
        String(const String &);              //copy constructor
        ~String();                           //destructor
        operator char * () { return cptr; }  //conversion String to char *
        String & operator=(const String &);  //assignment operator
    };

So from now on, when the compiler is in the following situation:
    extern int strlen(char *);
    String a = "Hello";
    int x = strlen(a);   // this is equivalent to strlen(a.operator char *())

it will convert the String object a to a pointer to a char using the operation provided by operator char * ().
A natural operation on objects of type String is to access individual characters. We will implement this operation for our class String and, in doing so, venture just a bit into the "brave new world" of const objects in C++. Access to the characters of a String object will be implemented with the indexing operator[]. This is how it is implemented:
    char & String::operator[] (int i) { return cptr[i]; }   //no range checking!

If you want to define a constant in C, you use #define. #define is handled by the C preprocessor, so the compiler never sees the name of the constant. C++ is a strongly typed language, and you want to enlist the help of the compiler to make safe and legal use of constants; therefore C++ introduced the const keyword. In a C++ program you will see const int x = 5; to initialize a constant integer to 5, and after this definition you expect the compiler to reject any attempt to change the value. C++ hates to distinguish between built-in types and user-defined types, so you are allowed to define a variable of any type as a constant. If an object can be made constant, then you should be able to state that a particular member function may be called on a constant object. You have to be consistent: once you have enlisted the help of the compiler to check the "constness" of an integer, you are led to accept that constant member functions should also be supported. The problem with support for constant objects is the gap between what you and I would like to see as constant and what the compiler can actually check. Let's see a simple example. The operator[] above only returns the character cptr[i], so it seems reasonable to declare this function constant, like this (note the added const keyword after the parameter list):
    char & String::operator[] (int i) const { return cptr[i]; }

Now, if we have the following segment of code:
    const String a = "Hello";
    a[0] = 'z';   //the compiler won't complain, a = "zello" now

What happened here? We have a constant object and the compiler lets us change its value? No: the bits inside the object a are not changed. The String object contains only a pointer, and the value of this pointer is not changed. The compiler did its job. The culprit is not that operator[] returns a reference as such (if it returned its result "by value", the previous example would not even compile); it is that the reference refers to a non-constant char. The way to fix this problem is to change the return type of operator[] to const char &, a reference to a constant char. The compiler will not let you write a character through a reference to a constant character, and the previous segment of code will generate an error. It is easy to go over your code after learning the "benefits" of constants and add const here and there where it obviously makes sense. As the previous example showed, it is also very easy to make mistakes this way. Is the extra effort needed to learn the discipline of const worth it? I'll let you answer this. As far as I am concerned, C++ is consistent, C++ is strongly typed, and it treats user-defined types as "first-class citizens". If I prefer const int over #define, then I have to be willing to deal with constant member functions and the like.
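The gap between "the object is constant" and "the characters are constant" can be seen in a small sketch. Text, loose_at and strict_at are hypothetical names invented for this example; they play the role of the two candidate versions of the constant operator[].

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

class Text {   // hypothetical stand-in for String
private:
    char* cptr;
    Text(const Text&) = delete;              // copying disabled in this sketch
    Text& operator=(const Text&) = delete;
public:
    explicit Text(const char* s) : cptr(new char[std::strlen(s) + 1]) {
        std::strcpy(cptr, s);
    }
    ~Text() { delete [] cptr; }

    // Inside a const member function cptr behaves like a `char * const`:
    // the pointer cannot be changed, but the characters it points to can.
    char& loose_at(int i) const { return cptr[i]; }

    // Returning a reference to a constant char closes the gap.
    const char& strict_at(int i) const { return cptr[i]; }
};

char first_char_after_write() {
    const Text t("Hello");
    t.loose_at(0) = 'z';         // accepted: the pointer inside t is unchanged
    // t.strict_at(0) = 'z';     // would be rejected: assignment to a const char
    static_assert(std::is_same<decltype(t.strict_at(0)), const char&>::value,
                  "strict_at yields a reference to a constant char");
    assert(t.strict_at(1) == 'e');
    return t.strict_at(0);
}
```

The compiler only guarantees that the pointer stored in a constant object keeps its value; protecting the characters behind it is the job of the return type.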
If you have modified your operator[] as mentioned above, you can be confident that the compiler will not let you change the value of an individual character in a constant String. This is all well, but if you don't also define an operator[] for non-constant Strings, the const operator[] will be called when you index a non-constant String, and you'll get the following situation:
    String a = "Hello";
    a[0] = 'z';   //error, you are not allowed to change a constant character!

If there is no operator[] defined for non-constant objects, then the constant member function will do the job (it is safe to call a constant member function on a non-constant object; it is the other way around that the compiler would not appreciate). Because the constant operator[] returns a reference to a constant character, you cannot change that character. So you have no choice: you have to define an operator[] for non-constant Strings as well, like this for example:
    char & String::operator[] (int i) { return cptr[i]; }   //range checking, please

Now that you have some feeling for the kind of support you can get from the compiler, you take a look at our String class as defined so far and wonder whether it wouldn't be a good idea for the conversion operator to convert to a constant character string instead of a character string, as it does right now. You propose the following change:
    operator const char * () { return cptr; }   // instead of operator char *

This is a very good idea because, as defined earlier, the conversion operator was essentially a loophole in the protection given to the member cptr. It wasn't really private anymore, since anyone could change the string pointed to, like this for example:
    String a = "Hello";
    char * any_ptr = a;
    any_ptr[0] = 'z';   // a = "zello"
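The effect of converting to const char * instead of char * can be checked at compile time. The sketch below is only an illustration under some assumptions: the constructor takes a const char *, the conversion operator is additionally marked const so that it also works on constant Strings, copying is disabled to keep the example short, and the checks rely on std::is_convertible from the standard header type_traits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

class String {
private:
    char* cptr;
    String(const String&) = delete;              // copying disabled in this sketch
    String& operator=(const String&) = delete;
public:
    explicit String(const char* s) : cptr(new char[std::strlen(s) + 1]) {
        std::strcpy(cptr, s);
    }
    ~String() { delete [] cptr; }
    operator const char*() const { return cptr; }   // read-only view of the characters
};

// A String can still be handed to code that expects a const char *...
static_assert(std::is_convertible<String, const char*>::value,
              "String converts to const char *");
// ...but no longer to code that wants a writable char *.
static_assert(!std::is_convertible<String, char*>::value,
              "String does not convert to char *");

std::size_t length_via_conversion() {
    String a("Hello");
    const char* view = a;        // implicit conversion through the operator
    // char* any_ptr = a;        // the loophole: this line no longer compiles
    assert(view[0] == 'H');
    return std::strlen(a);       // a is converted to const char * for strlen
}
```

Read-only uses such as strlen keep working, while the assignment through any_ptr shown above is now stopped by the compiler.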
We'll finish our presentation of data abstraction in C++ with a short example on streams and how to use them to input and output objects of type String. This discussion is a bit artificial because, given the conversion operator above, we can already output a String object: << knows how to display the built-in types, and a String object can be converted to a built-in type (using the conversion operator just described). So we can display a String object as easily as if it were an object of a built-in type. As for reading String objects from the keyboard, this is where we have to admit that our String class is a "toy" class, created just to introduce the many features of C++ that support data abstraction. There is no way I can define the length of a String "a priori", and this is probably the reason why there is no built-in string type in C. So I won't show you how to read String objects from the keyboard. But I will define a new operator<< that can be adapted to your own situation.
If you try to define operator<< as a member function of the class String, you will end up having to do this:
    String a = "Hello";
    a << cout;   // "Hello" is sent to the standard output device

Look carefully: this is the wrong way around, but since the left operand of every operator member function has to be an object of the class itself, you have to use this syntax (a << cout is the same as a.operator<<(cout)). This is pretty confusing, so you decide to make operator<< a global function, and since it needs access to the data member of the String class, you have to make it a friend. A possible implementation is as follows:
    ostream & operator<< (ostream & output, const String & a) {
        return output << a.cptr;
    }

It returns a reference for the same reason operator= was defined to return a reference (you want to be able to write cout << a << b << endl;). cout is an object of the class ostream, and this operator<< allows you to send a String object to any object of type ostream.
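Because the operator accepts any ostream, it can be tried out on a string stream instead of cout. The sketch below assumes the standard headers ostream and sstream, disables copying for brevity, and uses a hypothetical helper, render_two, to collect the output.

```cpp
#include <cassert>
#include <cstring>
#include <ostream>
#include <sstream>
#include <string>

class String {
    friend std::ostream& operator<<(std::ostream& output, const String& a);
private:
    char* cptr;
    String(const String&) = delete;              // copying disabled in this sketch
    String& operator=(const String&) = delete;
public:
    explicit String(const char* s) : cptr(new char[std::strlen(s) + 1]) {
        std::strcpy(cptr, s);
    }
    ~String() { delete [] cptr; }
};

// Global friend: the stream is the left operand, the String the right one.
std::ostream& operator<<(std::ostream& output, const String& a) {
    return output << a.cptr;     // returning the stream allows chaining
}

// Hypothetical helper: writes two Strings to a string stream and
// returns everything that was written.
std::string render_two(const String& a, const String& b) {
    std::ostringstream out;
    out << a << ' ' << b;        // chained: each << returns the same stream
    assert(out.good());
    return out.str();
}
```

The same operator works unchanged with cout, with file streams, or with any other class derived from ostream.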
Our String class declaration now looks like this:

    class String {
    private:
        char * cptr;
    public:
        String(char * = 0);                        //default constructor
        String(const String &);                    //copy constructor
        ~String();                                 //destructor
        operator const char * () { return cptr; }  //conversion from String to a const char *
        String & operator=(const String &);        //assignment operator
        char & operator[] (int i) { return cptr[i]; }              //indexing operator
        const char & operator[] (int i) const { return cptr[i]; }  //same, for constant objects
        friend String operator+(const String &, const String &);   //concatenation
        friend ostream & operator<< (ostream & output, const String & a);
        //friend istream & operator>> (istream & input, String & a);
        // left as an exercise :-)
    };

Amazing? Intimidating? How can a simple class like String, with only a pointer to a character as a member, lead us to something so complicated? It is not amazing: Stroustrup started with C and was determined to keep its efficiency, but wanted to give user-defined types the same kind of support the compiler provides for built-in types. Even though the class looks intimidating at first, it quickly becomes routine to write the skeleton once you have grasped the meaning of each part. Is it too complicated? A C programmer would argue that typedef char * String accomplishes almost the same thing as the "big" String class above. This is true for a "toy" class like our String class, but it is *definitely not* true in general. Stroustrup reports in his latest book that the most frequent criticism of C++ is "the language is too big". I have tried to convince you that if you start with C and try to create an efficient and strongly typed language that supports data abstraction and object-oriented programming, you end up with C++.
Once you have digested and mastered this introduction, plus all the concepts that it touched only superficially, you can be sure that you have assimilated half of C++. The other half is support for object-oriented programming, which is introduced in the next chapter.
Copyright (C) 1994 Daniel Perron. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.