Now that we know binary numbers, we will learn how to add them. Binary
addition is much like your normal everyday addition (decimal addition).
For example, in decimal addition, if you add 8 + 2 you get ten, which you
write as 10; in the sum this gives a digit 0 and a carry of 1. Something similar
happens in binary addition when you add 1 and 1; the result is two (as
always), but since two is written as 10 in binary, summing 1 + 1 in binary
gives a digit 0 and a carry of 1.
Therefore in binary:
0+0=0
0+1=1
1+0=1
1 + 1 = 10 (which is 0 and carry 1)
Example. Suppose we would like to add the two binary numbers 10 and 11. We
start from the last (rightmost) digit. Adding 0 and 1, we get 1 (no carry). That
means the last digit of the answer will be 1. Then we move one digit to the left:
adding 1 and 1 we get 10, i.e., digit 0 and carry 1; the carry becomes the leading
digit. Hence, the answer is 101. Note that binary 10 and 11 correspond to decimal
2 and 3 respectively, and the binary sum 101 corresponds to decimal 5: binary
addition agrees with our regular addition (2 + 3 = 5).
Add 10 and 11
     10 =  2
  +  11 =  3
    101 =  5

Add 1101 and 1100
    1101 = 13
  + 1100 = 12
   11001 = 25

Add 1111 and 10001
    1111 = 15
 + 10001 = 17
  100000 = 32

Add 10101 and 1001
   10101 = 21
 +  1001 =  9
   11110 = 30

Add 11011, 10011 and 100
   11011 = 27
   10011 = 19
 +   100 =  4
  110010 = 50
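Binary addition is easy to check mechanically. Below is a minimal Python sketch (the function name add_binary is ours, not from any library) that adds binary strings column by column with a carry, exactly as in the hand computations above:

    def add_binary(*numbers):
        """Add binary strings column by column with a carry, as done by hand."""
        result, carry = [], 0
        width = max(len(n) for n in numbers)
        for i in range(1, width + 1):                 # rightmost column first
            column = carry + sum(int(n[-i]) for n in numbers if i <= len(n))
            result.append(str(column % 2))            # digit written in this column
            carry = column // 2                       # carried into the next column
        while carry:                                  # flush any remaining carry
            result.append(str(carry % 2))
            carry //= 2
        return "".join(reversed(result))

    # The worked examples above, checked against Python's own base-2 arithmetic:
    for args in [("10", "11"), ("1101", "1100"), ("1111", "10001"),
                 ("10101", "1001"), ("11011", "10011", "100")]:
        total = add_binary(*args)
        assert int(total, 2) == sum(int(a, 2) for a in args)
        print(" + ".join(args), "=", total)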
Similarly, an unsigned integer using n bits can take 2^n values, ranging between 0 and 2^n − 1:

Bits (n)    Minimum    Maximum (2^n − 1)
2           0          2^2 − 1 = 3
3           0          2^3 − 1 = 7
4           0          2^4 − 1 = 15
8           0          2^8 − 1 = 255
16          0          2^16 − 1 = 65,535
32          0          2^32 − 1 = 4,294,967,295
64          0          2^64 − 1 = 18,446,744,073,709,551,615
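As a quick check, a few lines of Python reproduce the table from the fact that the largest n-bit unsigned value is 2^n − 1:

    # Range of an n-bit unsigned integer: 0 to 2**n - 1.
    for n in [2, 3, 4, 8, 16, 32, 64]:
        print(f"{n:2d} bits: 0 to {2**n - 1:,}")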
Character Representation
In addition to numerical data, a computer must be able to handle alphanumeric
information (numbers, letters, and other special characters). A complete alphanumeric
code would include 26 uppercase letters, 26 lowercase letters, 10 numeric digits, 7
punctuation marks, and 20 to 40 other characters such as #, $, &, etc. So an alphanumeric
code represents all the various characters that are printable from a standard keyboard.
The most widely used alphanumeric code, the American Standard Code for Information
Interchange (ASCII), is used in most microcomputers. The ASCII (pronounced "askee")
code is a 7-bit code, so it has 2^7 = 128 possible code groups. This is more than enough
to represent all the standard keyboard characters.
(ASCII code table: four columns of 32 characters each, codes 0 to 127.)
First, note that the table has four columns, and that the first one differs from the remaining
three in that it contains only control characters, while the rest of the table (with the exception
of the last character in the last column) contains only printable characters. The ASCII
characters with codes in the range 0 to 31 are called control characters
(or control codes) and are "invisible", i.e., they do not represent printable characters on the
screen. Instead they provide a way to give instructions to peripheral devices such as
printers and screens. For example, the character with code 7 (the BEL character, denoted
by Ctrl+G or ^G) will cause the little bell on a terminal to ring when it is sent.
Normally you need not worry about the meaning of the various mnemonics for the control
characters; only if you need to become much more familiar with the ASCII control codes
than the average programmer will you need to concern yourself with such details.
The remaining characters all represent printable characters, except for the last (code 127,
or DEL), which is also a control character. The character with code 32 is a special case.
This one represents the blank space character which is, of course, invisible by its nature,
but is nevertheless regarded as a "printable" character since it moves the cursor one space
to the right, leaving a "blank space" behind.
Here are some other occasionally useful facts about the table:
• The single digits (48-57), the capital letters (A-Z) (65-90), and the lower-case letters
(a-z) (97-122) each form a contiguous group of characters. For example, A = 65 = 1000001.
• The group of capital letters comes before the group of lower-case letters.
• The number of code positions separating any given capital letter from its
corresponding lower-case letter is 32, as the sketch below illustrates.
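Since 32 is a power of two (2^5 = 32), converting a letter between its capital and lower-case forms amounts to changing a single bit (bit 5) of its ASCII code. A small illustrative Python sketch:

    # Capital and lower-case forms of a letter differ only in bit 5 (value 32).
    print(ord('A'), ord('a'), ord('a') - ord('A'))   # 65 97 32
    print(chr(ord('A') | 32))                        # 'a' (set bit 5: to lower case)
    print(chr(ord('a') & ~32))                       # 'A' (clear bit 5: to upper case)
    print(chr(ord('G') ^ 32))                        # 'g' (flip bit 5: toggle case)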
HELP in binary = ?
H =  72 = 1001000
E =  69 = 1000101
L =  76 = 1001100
P =  80 = 1010000
jPs in binary = ?
j = 106 = 1101010
P =  80 = 1010000
s = 115 = 1110011

pYtHoN in binary = ?
p = 112 = 1110000
Y =  89 = 1011001
t = 116 = 1110100
H =  72 = 1001000
o = 111 = 1101111
N =  78 = 1001110
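Conversions like the three above can be verified with a short Python sketch (the helper name to_ascii_binary is ours):

    def to_ascii_binary(text):
        """Return each character with its ASCII code in decimal and 7-bit binary."""
        return [(ch, ord(ch), format(ord(ch), '07b')) for ch in text]

    for word in ["HELP", "jPs", "pYtHoN"]:
        for ch, code, bits in to_ascii_binary(word):
            print(f"{ch} = {code:3d} = {bits}")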
ISCII
ISCII (Indian Script Code for Information Interchange) is an encoding standard for the
Indic scripts. It has not been widely used outside certain government institutions and has
now been rendered largely obsolete by Unicode. Unicode uses a separate block for each
Indic writing system, and largely preserves the ISCII layout within each block.
ISCII is an 8-bit encoding, so a total of 256 characters can be represented. The lower 128
code points are plain ASCII; the upper 128 code points are ISCII-specific.
UNICODE
Fundamentally, computers just deal with numbers. They store letters and other characters
by assigning a number for each one. Before Unicode was invented, there were hundreds of
different systems, called character encodings, for assigning these numbers. These early
character encodings were limited and could not contain enough characters to cover all the
world's languages. Even for a single language like English no single encoding was adequate
for all the letters, punctuation, and technical symbols in common use.
Early character encodings also conflicted with one another. That is, two encodings could
use the same number for two different characters, or use different numbers for
the same character. Any given computer (especially servers) would need to support many
different encodings. However, when data is passed through different computers or between
different encodings, that data runs the risk of corruption.
The Unicode Standard provides a unique number for every character, no matter what
platform, device, application or language. It has been adopted by all modern software
providers and now allows data to be transported through many different platforms, devices
and applications without corruption. Support of Unicode forms the foundation for the
representation of languages and symbols in all major operating systems, search engines,
browsers, laptops, and smart phones—plus the Internet and World Wide Web (URLs,
HTML, XML, CSS, JSON, etc.).
Unlike ASCII, which was designed to represent only basic English characters, Unicode was
designed to support characters from all languages around the world. The standard ASCII
character set only supports 128 characters, while Unicode can support roughly 1,000,000
characters. While ASCII only uses one byte to represent each character, Unicode supports
up to 4 bytes for each character.
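In Python, for instance, the built-in ord() returns a character's Unicode code point, and the values quickly climb past the 128 codes that ASCII can represent:

    # ord() gives the Unicode code point; ASCII covers only the first 128.
    for ch in ['A', 'é', 'अ', '中', '😀']:
        print(ch, ord(ch), hex(ord(ch)))             # e.g. 😀 is 128512 (0x1f600)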
There are several different types of Unicode encodings, though UTF-8 and UTF-16 are the
most common. UTF-8 (Unicode Transformation Format, 8-bit) has become the standard
character encoding used on the Web and is also the default encoding used by
many software programs. While UTF-8 supports up to four bytes per character, it would be
inefficient to use four bytes to represent frequently used characters. Therefore, UTF-8 uses
only one byte to represent common English characters. European (Latin), Hebrew, and
Arabic characters are represented with two bytes, while three bytes are used for Chinese,
Japanese, Korean, and other Asian characters. The remaining Unicode characters are
represented with four bytes. The unit of coding in UTF-8 is 8 bits, called an octet. The
number assigned to each character is termed its code point.
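These per-script byte counts are easy to observe in Python with str.encode:

    # Number of bytes UTF-8 uses for characters from different scripts.
    for ch in ['A',    # basic English letter: 1 byte
               'é',    # accented Latin:       2 bytes
               'ש',    # Hebrew:               2 bytes
               '中',   # Chinese:              3 bytes
               '😀']:  # outside the BMP:      4 bytes
        print(ch, len(ch.encode('utf-8')), 'byte(s)')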