Assembly language programming tutorial part 1: Getting started
By Petter Holmberg of Enhanced Creations
Edited version (original version posted in QB:tm)
Hello everyone!
This article is written for all of you who want to learn how to program in
assembler in order to enhance your QuickBASIC programs. I know this is a
dream for many QB programmers, but they feel it's too complicated to learn
and they haven't found any good sources of information to get started with.
If you are one of these programmers, this article is written for you. You
will find that it's not easy to learn assembly language programming, but you
will probably also find that it's much easier than you first thought. This
article will not delve too deeply into assembly language programming, but it
will give you a solid start to work on.
So what is assembler then?
The early ancestors of today's computers, developed in the period of about
1940 to 1960, were a real pain to program. The circuits in these computers
could perform simple arithmetic operations, they could take data as input,
write data as output, and do other operations needed to solve problems for
the people that had built them. In order to make the computers understand
what they should do, they needed to be fed with instructions. These
instructions were given to the computers as series of codes. Let's say the
number 1 was the code for adding, the number 2 was the code for subtracting,
and the number 3 was the code for outputting the result. The programmers
would figure out a program, input it into the computer by turning switches
or making holes in paper cards and feed them to the computer. If the program
didn't work, the programmers had to go through each instruction again and
see where the error was and then reprogram the computer. Not very
convenient, especially as the programs were all written as a series of ones
and zeroes. In order to make programming easier, they started writing the
programs as hexadecimal numbers instead of binary numbers. That changed 4
binary digits into one hexadecimal digit, making the programs shorter and easier
to read. (Binary and hexadecimal numbers will be explained in detail later.)
But a program was still just a sequence of numbers, hard to remember and
understand for any programmer. So someone had the great idea of writing the
instructions as short words that could be translated directly into numbers
and fed to the computer. So instead of saying 1 for an
addition, the programmers could say "add", and instead of 2 for subtraction,
they could say "sub". Now they could see more clearly what the program did,
and finding errors was not as hard anymore. Assembly language had been
invented. Later on, computer engineers found out that programming could be
made a lot easier by replacing long sequences of assembly instructions with
code that reads much more like human language. These were called high-level
programming languages, and BASIC was one of the early ones.
Today's microprocessors still perform their duties as a series of simple
instructions, such as "add" and "sub", but programming languages like BASIC
make sure that we usually don't have to worry about it.
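The step from numeric codes to short words can be sketched in a few lines. This is only a toy illustration, written in Python rather than in any language the article uses: the codes 1, 2 and 3 are the made-up ones from the example above, and the word "out" is a hypothetical name for the output instruction. Real processors use different and much larger instruction sets.

```python
# Hypothetical instruction codes, taken from the made-up example in the text.
# "out" is an invented name for "output the result".
OPCODES = {"add": 1, "sub": 2, "out": 3}

def assemble(source):
    """Translate a program with one instruction word per line into codes."""
    codes = []
    for line in source.splitlines():
        word = line.strip().lower()
        if not word:
            continue  # skip blank lines
        if word not in OPCODES:
            raise ValueError("unknown instruction: " + word)
        codes.append(OPCODES[word])
    return codes

program = """
add
sub
out
"""
print(assemble(program))  # [1, 2, 3]
```

The translation is a plain one-to-one lookup, which is the point: each word stands for exactly one numeric code.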
Why do I need to learn assembler?
There are many reasons to use a high-level language like BASIC instead of
assembler: A simple instruction such as PRINT could in assembler be more than
100 lines of code. It is therefore pretty obvious that BASIC programs are
easier to write and debug, and you don't have to worry about what the processor
actually does when it writes a letter on the screen. It just works. Another
reason to use high-level languages is that you could easily convert your BASIC
program on your PC to work on, for example, an Amiga computer, using an Amiga
BASIC compiler. If you had written your program in assembler you would find that
the Amiga wouldn't understand it, because its CPU doesn't work like the CPU
in your PC. There are still reasons to use assembler instead of a high-level
language: QuickBASIC cannot do everything. There are sometimes things you want
to do with the computer that no BASIC instruction can do, and you often find
that your BASIC program needs to do so many calculations that the program gets
slow. The problem is that an instruction such as PRINT takes many possibilities
into account. It makes sure you have a valid string to print, it checks what
screen mode you use and what color you want to print the text in and so on.
Usually you know all these details when you want to print the text, and you
don't need the processor to perform all these checks. The only way to remove
them is to use assembler code instead of PRINT. But there's usually no point
in writing a full program in assembler. Only use it when you need to do
something really fast or something really low-level.
What do I need to know?
When you write a BASIC program, you don't really need to know much about how
the computer works. In assembler you work with the computer on its own level,
and therefore you need to know what you're actually doing. You don't need to
know very much to get started though, and you will learn the rest as you're
learning assembler.
The first thing that you will find useful to know is how to count in the
binary and hexadecimal systems instead of the decimal. This is pretty easy to
learn.
Usually we count in the decimal system. We then have 10 digits, ranging from
0 to 9. The lowest digit we can use is 0, and as we count upwards we use the
digits 1, 2, 3, 4, 5, 6, 7, 8 and 9. Those are all the digits we have, so in
order to continue we need to use two digits. We reset the 9 to 0, and add a 1
to the left of it. The left digit is now worth 10 times the right one. Then
we start all over again with the digit to the right, counting up to 19. Now
we increase the left digit to 2 and reset the right one to zero. We continue
in the same way until we reach the number 99, when all combinations of two
digits have been used. Now we reset both of them and add a third digit, worth
10 times as much as the same digit one position to the right and 100 times as
much as the same digit two positions to the right.
This system of adding new digits means that the number 1234 can be
expressed as 1*10^3 + 2*10^2 + 3*10^1 + 4*10^0. See the pattern?
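If you want to see the pattern worked out, here is a minimal sketch in Python (used only for illustration, the article itself is about QuickBASIC and assembler). The function name expand_decimal is my own invention; it splits a non-negative whole number into its digits and the power of ten each digit is worth.

```python
def expand_decimal(number):
    """Return (digit, power of ten) pairs for a non-negative whole number."""
    digits = str(number)
    terms = []
    for position, char in enumerate(digits):
        # The leftmost digit has the highest power, the rightmost has power 0.
        power = len(digits) - 1 - position
        terms.append((int(char), power))
    return terms

terms = expand_decimal(1234)
print(" + ".join(f"{d}*10^{p}" for d, p in terms))
# 1*10^3 + 2*10^2 + 3*10^1 + 4*10^0

# Adding the terms together gives the original number back.
print(sum(d * 10 ** p for d, p in terms))  # 1234
```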
What if you didn't have 10 digits to play with? Well, it would work just as
well anyway. The binary system, on which computer technology is based, uses
only two digits: 0 and 1. You start counting from 0, and when you reach 1 you
have used all of your digits and need to add a second one, and you get the
number 10. Each new digit is worth 2 times the digit to its right. The binary
number 10110 can thus be expressed as 1*2^4 + 0*2^3 + 1*2^2 + 1*2^1 + 0*2^0, or
22 in decimal. The hexadecimal system works with 16 different digits. Since we
have only invented 10 symbols for digits, we use letters to represent the
higher ones. The hexadecimal system therefore uses the digits 0, 1, 2, 3, 4,
5, 6, 7, 8, 9, A, B, C, D, E and F. The hexadecimal number F3 can therefore be
expressed as 15*16^1 + 3*16^0, or 243 in decimal. It's easier to understand if
we put the three systems in a table for comparison:
Decimal  Hexadecimal  Binary
0        0            0
1        1            1
2        2            10
3        3            11
4        4            100
5        5            101
6        6            110
7        7            111
8        8            1000
9        9            1001
10       A            1010
11       B            1011
12       C            1100
13       D            1101
14       E            1110
15       F            1111
16       10           10000
As you can see, the number F in hexadecimal is the same as the number 1111 in
binary, or the number 15 in decimal. This also shows why the hexadecimal system
is often used in assembly language programming instead of the decimal. If you
want the binary number 1111000011110000, you can write it in hexadecimal as
F0F0. As you can see, it's easy to convert binary numbers to hexadecimal and
hexadecimal numbers to binary. Four binary digits correspond directly to one
hexadecimal digit.
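The four-digits-at-a-time trick can be sketched like this. It is an illustration in Python, not something you need for the assembler parts later on, and the function names binary_to_hex and hex_to_binary are my own.

```python
HEX_DIGITS = "0123456789ABCDEF"

def binary_to_hex(bits):
    """Convert a string of binary digits to hexadecimal, four bits at a time."""
    # Pad with zeroes on the left so the length is a multiple of four.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    result = ""
    for i in range(0, len(padded), 4):
        group = padded[i:i + 4]              # one group of four binary digits
        result += HEX_DIGITS[int(group, 2)]  # becomes one hexadecimal digit
    return result

def hex_to_binary(hex_digits):
    """Convert a string of hexadecimal digits to binary, one digit at a time."""
    return "".join(format(int(h, 16), "04b") for h in hex_digits)

print(binary_to_hex("1111000011110000"))  # F0F0
print(hex_to_binary("F0F0"))              # 1111000011110000
```

Notice that neither function ever needs to know the value of the whole number. Each group of four binary digits is converted on its own, which is why the conversion is so easy to do in your head.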
The number of different digits you can use is called the base of the counting
system. You can use any number as a base. If your number in any counting
system is, say, 3 digits long, it can be expressed as:
a*base^2 + b*base^1 + c*base^0, where a, b, and c are your three digits. The
most important thing when using different bases simultaneously is to keep track
of what system you use for a certain number. For example, is the number 10 the
usual decimal 10, or the binary way of writing the decimal number 2?
If you still haven't understood this, read it again until you do or ask someone
who understands it to explain it to you. It's very useful to know about this
when you program in assembler.
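If it helps, the general formula for any base can be tried out with a small sketch like the following. It is written in Python purely as an illustration, the names from_base and to_base are my own, and it only handles bases from 2 to 16 and non-negative whole numbers.

```python
DIGITS = "0123456789ABCDEF"

def from_base(text, base):
    """Read a string of digits in the given base (2 to 16) as a number."""
    value = 0
    for char in text.upper():
        digit = DIGITS.index(char)  # raises ValueError for unknown symbols
        if digit >= base:
            raise ValueError(char + " is not a digit in base " + str(base))
        # Multiplying by the base at each step gives the same result as
        # a*base^2 + b*base^1 + c*base^0, just computed left to right.
        value = value * base + digit
    return value

def to_base(value, base):
    """Write a non-negative whole number as a string of digits in a base."""
    if value == 0:
        return "0"
    text = ""
    while value > 0:
        text = DIGITS[value % base] + text  # the remainder is the next digit
        value //= base
    return text

print(from_base("10110", 2))  # 22
print(from_base("F3", 16))    # 243
print(from_base("10", 2))     # 2
print(from_base("10", 10))    # 10
print(to_base(243, 16))       # F3
```

The last two from_base calls show why you have to keep track of the base: the same two digits, 10, mean two different numbers.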
The second thing that is necessary to know when programming in assembler is
the PC memory architecture. I'm not going to explain this in detail, because
it's a complicated issue.
A PC has 640 kilobytes of basic memory (usually called conventional memory),
and additional megabytes in special memory circuits that you can insert into
the computer yourself. The terms EMS and XMS refer to this extra memory. That
is not the memory I'm going to talk about here. The interesting thing is the
basic 640 kilobytes that every PC has. To use a certain position in this
memory you need to know how to find it, and that is something every assembly
programmer must be able to do.
Each position in the memory has an address, a number telling the computer
where to read or write data. It would have been easy if this address had
just been a number from 0 to 640k, but that's not the system used. A
memory position is described by two numbers, called the segment address and
the offset address. The actual memory position is a combination of the segment
and the offset address.
The segment address describes the memory as groups of 16 bytes. The first byte
in the memory, byte 0 if you like, has the segment address 0. The segment
address 1 points at byte 16 in memory, and the segment address 2 points at
byte 32. The offset address is a number telling you how far from the
segment position in memory the byte you want is. So if you want to access byte
3 in memory, you use the segment address 0, and the offset address 3. Together
they form a number pointing at an exact memory position. Written as a formula
this can be expressed as: actual memory address = segment*16 + offset. If you
want to access byte 20 in the memory, you use the segment 1, giving you the
position 16, and the offset 4, adding 4 bytes to the position, for the final
number 20. But you can also use a segment address of 0, and the offset 20,
giving you the same memory position! The segment and the offset address
numbers can both range from 0 to 65535, giving you several possible combinations
when you want to use a certain memory position. This system makes memory
addressing a little complicated for beginners to understand. You can see what
segment and offset a certain BASIC variable is located at by using the
functions VARSEG and VARPTR. Try it!
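The segment and offset arithmetic can also be tried out on its own. The sketch below is in Python and only does the calculation; it does not touch real memory. The function names linear_address and aliases are my own, not anything from QuickBASIC or the PC itself.

```python
def linear_address(segment, offset):
    """Combine a segment and an offset into an actual memory position."""
    if not (0 <= segment <= 65535 and 0 <= offset <= 65535):
        raise ValueError("segment and offset must be in the range 0 to 65535")
    return segment * 16 + offset

def aliases(address):
    """List every (segment, offset) pair that points at the given position."""
    pairs = []
    # Start at the highest segment that does not go past the address.
    for segment in range(address // 16, -1, -1):
        offset = address - segment * 16
        if offset > 65535:
            break  # lower segments would need an even bigger offset
        if segment <= 65535:
            pairs.append((segment, offset))
    return pairs

print(linear_address(0, 3))   # 3
print(linear_address(1, 4))   # 20
print(linear_address(0, 20))  # 20
print(aliases(20))            # [(1, 4), (0, 20)]
```

Positions further up in memory have many more combinations than byte 20 does, since every step of 16 bytes adds another segment that can reach them.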
Now you might be wondering how it is possible for both the segment and the
offset to be 65535. That gives you the biggest possible memory position of
65535 * 16 + 65535 = 1114095, which is bigger than 640k. Well, this memory
certainly exists but it is not accessible in the same way as the first 640k
of memory, and I'm not going to delve deeper into this here and now. Later
on, I will discuss
memory access in more detail.
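The arithmetic for the biggest possible position is easy to check. This is a small Python sketch of the calculation only, with variable names of my own choosing, and it assumes that 640k means 640 * 1024 bytes.

```python
def linear_address(segment, offset):
    """Actual memory position = segment*16 + offset, as in the formula above."""
    return segment * 16 + offset

highest = linear_address(65535, 65535)  # both numbers at their maximum
basic_memory = 640 * 1024               # assuming 640k means 640 * 1024 bytes

print(highest)                 # 1114095
print(basic_memory)            # 655360
print(highest > basic_memory)  # True
```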
Again, if you didn't understand this, read it again, and if that didn't help,
ask someone to explain it to you.
This was all for the first part of this article: a very brief introduction to
what's about to come. Next time I will start describing the basics of
assembler and how you use it in QuickBASIC. Make sure you understand the
different numbering systems and the memory addressing scheme by then.
Bye for now!
Petter Holmberg