On "Levels" of Programming Languages, and some History

On pages 18 and 19 of the text, there is a short discussion of the three "levels" of programming languages: machine language, assembly language, and high-level language. This note provides some additional information on this topic.

High-Level Languages

A high-level language is designed to be easily writable (and readable) by humans, but must be translated into machine language in order to be executed by a processor. Statements in high-level languages can generally be recognized by the presence of algebraic formulas or English words or both. Here are some examples:

FORTRAN:

      X = 0
      DO 37 J=1,20
   37 X = X + J

COBOL:

MULTIPLY LIST-PRICE BY DISCOUNT GIVING PRICE.
MULTIPLY PRICE BY SALES-TAX-RATE GIVING TOTAL-PRICE.

BASIC (an early dialect):

10 LET A7 = 50
20 LET A8 = A7+13

C++, C, C#, Java, JavaScript (these languages have many similarities):

for (i = 0; i < 17; i = i + 1)
   sum = sum + i*(i - 1);

Machine Languages

A computer program that is to be directly executed by a processor consists of a series of instructions, each of which consists of some number of bits, where each bit is either 1 or 0. In many early computers, each instruction contained the same number of bits; a group of bits that contained one instruction was called a word. Different computer designs used different word lengths. There have been computers designed with word lengths of 12, 16, 18, 24, 32, 36, 48, 60, and 64 bits. Here is a typical instruction for a DEC PDP-8 computer, which had a word length of 12 bits:

      101100101001

Now you may notice that writing machine language in this way quickly gets out of hand, especially with long word lengths. So two types of "shorthand" were invented: the octal system and the hexadecimal system. The octal system was popular for computers whose word length was a multiple of 3, such as 12, 18, 24, or 36. The hexadecimal system was popular for computers whose word length was a multiple of 4. It is also used with current-day processors such as those made by Intel. Instructions for Intel processors do not have a fixed length, but every instruction is a whole number of 8-bit bytes, so its length is always a multiple of 4 bits.

In the octal system, each group of three bits is encoded as one of the digits 0, 1, 2, 3, 4, 5, 6, or 7, thus:

Bits   Code
000    0
001    1
010    2
011    3
100    4
101    5
110    6
111    7

In the hexadecimal system, each group of four bits is encoded as one of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, or F, thus:

Bits   Code     Bits   Code
0000   0        1000   8
0001   1        1001   9
0010   2        1010   A
0011   3        1011   B
0100   4        1100   C
0101   5        1101   D
0110   6        1110   E
0111   7        1111   F

So let's return to our 12-bit example 101100101001. In the octal system, this would be written 5451. In the hexadecimal system, it would be written B29. (You should work out the details of these yourself!)
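
If you want to check your work, the grouping procedure described above can be sketched in a few lines of Python. This is only an illustration; the function names group_bits and encode are made up for this note.

```python
def group_bits(bits, size):
    """Split a bit string into groups of `size` bits, padding with zeros on the left."""
    padded_len = -(-len(bits) // size) * size   # round length up to a multiple of size
    padded = bits.zfill(padded_len)
    return [padded[i:i + size] for i in range(0, len(padded), size)]

def encode(bits, size):
    """Encode each group as one digit: size 3 gives octal, size 4 gives hexadecimal."""
    digits = "0123456789ABCDEF"
    return "".join(digits[int(group, 2)] for group in group_bits(bits, size))

word = "101100101001"
print(group_bits(word, 3), encode(word, 3))   # ['101', '100', '101', '001'] 5451
print(group_bits(word, 4), encode(word, 4))   # ['1011', '0010', '1001'] B29
```

Notice that the same twelve bits are simply cut at different places: four groups of three for octal, three groups of four for hexadecimal.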

Assembly Language

Assembly language is intermediate between machine language and high-level language. Names are given to memory locations, like variables in a high-level language; but each line in an assembly language program corresponds to one instruction in machine language, and mnemonics are used to represent the various operations that the target processor can perform. For example, ADD A,B might indicate an instruction to add the contents of location B to the contents of location A, storing the result in location A. You will learn more about assembly languages and their relation to machine and high-level languages in CMIS 310, Computer Systems and Architecture.
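
The idea that one assembly line becomes one machine instruction can be sketched with a toy assembler written in Python. Everything specific here is invented for illustration: the opcode numbers, the memory locations assigned to A and B, and the 12-bit layout (a 4-bit operation code followed by two 4-bit addresses) do not describe the PDP-8 or any other real processor.

```python
# Hypothetical instruction format: 4-bit opcode, 4-bit address, 4-bit address.
OPCODES = {"ADD": 0b0001, "SUB": 0b0010, "MOV": 0b0011}   # invented opcode numbers
SYMBOLS = {"A": 0b0100, "B": 0b0101}                      # invented memory locations

def assemble_line(line):
    """Translate one assembly line such as 'ADD A,B' into one 12-bit machine word."""
    mnemonic, operands = line.split()
    dest, src = operands.split(",")
    word = (OPCODES[mnemonic] << 8) | (SYMBOLS[dest] << 4) | SYMBOLS[src]
    return format(word, "012b")

# Each line of the program produces exactly one machine word.
for line in ["MOV A,B", "ADD A,B"]:
    print(line, "->", assemble_line(line))
```

With these made-up tables, ADD A,B assembles to 000101000101: the mnemonic and the two names are each replaced by their bit patterns.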

A Historical Note: Why 12, 18, 24, 36, 60?

So why is it that so many early computers were designed with word lengths that are multiples of 6? The answer lies in one's perception of what computers are for. Today, we think of a computer as a text processing machine that incidentally does arithmetic. But in the early days, a computer was thought of as an arithmetic machine that incidentally had to process text, primarily to print labels on huge columns of numbers. To identify columns, you need the 26 (capital) letters of the alphabet, plus 10 digits, plus a few special characters. To encode these, six bits suffice, since six bits give 64 distinct codes. So, for example, a 24-bit word could hold four characters. It simply never occurred to the early designers that there would ever be any requirement for handling lower-case characters!
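
The arithmetic of packing six-bit characters into a word can be sketched in Python. The character set below is hypothetical: early machines each had their own six-bit codes, and the ordering used here is invented purely to show the packing.

```python
# Hypothetical six-bit character set: 26 capitals, 10 digits, and 8 specials.
# The ordering is invented; it is not the code of any actual machine.
CHARSET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 .,-+*/$"

def pack(text):
    """Pack characters into one integer word, six bits per character."""
    word = 0
    for ch in text:
        word = (word << 6) | CHARSET.index(ch)
    return word

def unpack(word, count):
    """Recover `count` characters from a packed word."""
    chars = []
    for _ in range(count):
        chars.append(CHARSET[word & 0b111111])   # lowest six bits hold the last character
        word >>= 6
    return "".join(reversed(chars))

assert len(CHARSET) <= 2 ** 6        # 44 characters fit in the 64 available codes
word = pack("COST")
print(format(word, "024b"))          # 000010001110010010010011
print(unpack(word, 4))               # COST
```

Four characters at six bits each fill a 24-bit word exactly, with no bits left over.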