CHARACTER SET IN C++
The character set in C++ comprises the characters we can type on a keyboard: letters, digits, special symbols and white space. In a program, a single character from this set is written as a character literal enclosed in single quotes.
Example: 'A', 'b', '6', '#', etc.
The character set in C++ can be divided into the following subcategories:
Letters: A to Z, a to z
Digits: 0 to 9
Special symbols: space, +, -, *, /, &, %, #, $, @, ^, !, _ (underscore), <, >
White spaces: horizontal tab, carriage return, newline
In practice, characters are represented by their ASCII values. ASCII stands for American Standard Code for Information Interchange. It assigns a unique integer code from 0 to 127 to each of its characters.
For example: 32 for space, 48 to 57 for '0' to '9', 65 to 90 for 'A' to 'Z', 97 to 122 for 'a' to 'z', etc.
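A char is a 1-byte entity, i.e. it gets 8 bits of storage in memory. 8 bits can form 2^8 = 256 different patterns, so a char can hold 256 distinct values: -128 to 127 if it is treated as a signed number in 2's complement form, or 0 to 255 if it is unsigned. Standard ASCII defines 128 characters (codes 0 to 127); extended 8-bit character tables use all 256 codes.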
The following tables give the ASCII codes and the corresponding characters:
As shown in the table, there are a total of 256 codes (0 to 255). However, these codes are not always the integer values stored in memory. On compilers where plain char is signed (for example GCC on x86), the maximum value a char can hold is 127, so codes above 127 are stored as negative numbers in the range -128 to -1. The bit pattern is the same; only its interpretation differs. See the sections below for details.
How does the compiler differentiate between a character value and an integer value?
As mentioned earlier, characters in C++ are represented using integers. So how does the compiler know when to treat an integer as a character and when to treat it as an ordinary number? The answer lies in the type. The bits stored in memory carry no type information; instead, the compiler records the declared type of every variable and expression at compile time. When a variable is declared as char, the compiler reserves 1 byte for it and stores the binary form of the character's code there. Whenever that variable is used, the compiler knows from its type that the value is a character and not an integer, and it picks the matching operation, for example printing A rather than 65 with cout.
Another thing to note is that a char always takes exactly 1 byte of memory, while an int typically takes 4 bytes on modern compilers (2 bytes on older 16-bit compilers).
What happens if a value greater than 127 is stored in a char variable?
Storing a value that is out of the range of a variable's type is a common mistake that programmers make. What happens to such a value is not hard to understand. The value is converted to binary, and only the lowest 8 bits (the first 8 bits counting from the right) are kept as the value of the char; the higher bits are discarded. From then on, that truncated value is what the variable holds for the rest of the program. Since C++20 this wrap-around is guaranteed by the standard; earlier standards left it implementation-defined, although mainstream compilers behave this way.
Let us understand this with the help of an example. Consider the following C++ code:
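#include <iostream>
using namespace std;

int main()
{
    // Both initializers are out of range for a signed 8-bit char
    char a = 128, b = 257;
    // Cast to int so the stored numbers are printed instead of the characters
    cout << static_cast<int>(a) << " " << static_cast<int>(b);
    return 0;
}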
Here two char variables, a and b, are created. a is initialized with 128, while b is initialized with 257.
For a: the binary form of 128 is 10000000.
This fits in 8 bits, but note the MSB (most significant bit). The MSB is 1, so where char is signed, the value is read back as a negative number according to 2's complement rules. The value stored in a is therefore -128. That is still within the range of a signed char, so no bits are lost.
For b: the binary form of 257 is 100000001.
This is a 9-bit binary number, but a char has only 8 bits. So only the lowest 8 bits (counting from the right) are kept and the remaining bit is discarded.
The number stored in b is therefore 00000001 and not 100000001. Thus b holds the character with ASCII value 1 (SOH, a non-printable control character).