Assembly language programming tutorial part 6: Closing up
By Petter Holmberg of Enhanced Creations
Edited version (original version posted in QB:tm)

Hello and welcome to the sixth part of my assembly tutorial series. This part
is also the very last one. No!!! I hear you screaming ;-) But the reason for
this is basically that I have covered almost all of the important aspects of
assembly programming. There's more to learn; there's always more to learn. But
I think you'll get more use out of other documents from now on. This series has
been concentrating on the general aspects of asm programming. Now you are
ready to start exploring stuff that interests you in particular. Maybe you
want to learn more about interrupts, or you may be interested only in I/O port
programming. There are lots of good documents to download about every aspect
of assembly programming. It's not necessary for me to explain everything.

So, what should this last part be about? For the first time in this series,
it hasn't been easy for me to decide. Therefore, this part will cover some
miscellaneous stuff, mostly about some of the general aspects of assembly
programming. I'll try to share my experiences of asm programming with you so
you don't have to make all the mistakes I've made.

Numbers in assembler:
When programming in assembler, it's very important to know how numbers are
stored in the registers, the stack and the memory. It's especially important
to know the difference between positive and negative numbers and how they are
stored. I've promised to discuss this earlier, so here it comes!
We begin by looking at positive integer values, i.e. the numbers 0,1,2,..,n:
Positive integer values are the easiest ones to store. When you need to use
a positive integer value in your asm code, the first thing you want to ask is:
How large are the numbers I need to store? It's easy to select the number of
bits that need to be used for storing a certain value. If you use n bits, the
biggest number that can be stored is 2^n - 1, so if you use 8 bits, you can
store any number from 0 up to 2^8 - 1 = 255. With 16 bits you can store any
number up to 65535 and so forth. If you never need to store a number bigger
than 100, you shouldn't use 16 bits to store it, as 8 bits are enough.
The different bits in a binary number are numbered from right to left. The
rightmost bit is called the least significant bit and it has the number 0.
The leftmost bit is called the most significant bit. If it's an 8-bit number,
this bit is bit no. 7.
If you want to calculate what number you get if you set a certain bit to 1,
you can calculate this with the following formula:

n = 2^b

where n is the number you want to know and b is the number of the bit. So if
you set bit 5 to one, the number you get is 2^5 = 32.
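
If you want to check these figures yourself, here's a small sketch in Python
(not part of the original series; the function names are made up for
illustration). It just applies the two formulas above:

```python
def max_unsigned(bits):
    """Largest unsigned value that fits in the given number of bits: 2^n - 1."""
    return 2 ** bits - 1

def bit_value(b):
    """Value you get by setting only bit number b (bit 0 is the rightmost): 2^b."""
    return 2 ** b

def bits_needed(value):
    """Smallest number of bits that can hold the unsigned value."""
    return max(1, value.bit_length())

print(max_unsigned(8))    # 255
print(max_unsigned(16))   # 65535
print(bit_value(5))       # 32
print(bits_needed(100))   # 7, so an 8-bit register is enough
```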

Numbers that can only be positive are called unsigned values. A signed value
is a value that can be both positive and negative. They're called signed numbers
because the most significant bit of such a number tells the sign of the number.
If we have a positive value, the most significant bit is 0, but if we have a
negative value, the most significant bit is 1. So 0 means + and 1 means -.
This means that a signed number needs one bit just to store the sign of
the number, so we get a lower maximum value with a certain number of bits if
we use signed numbers than if we use unsigned numbers. The biggest signed number that
can be stored with n bits is 2^(n-1) - 1. So if we have an 8-bit signed number,
it can have the maximum value of 2^(8-1) - 1 = 127.
The smallest signed number that can be stored with n bits is -(2^(n-1)), so
with 8 bits, you can store any negative integer down to -(2^(8-1)) = -128.
Thus, an 8-bit signed number can store all integer values from -128 to 127.
Positive values are stored in the same way no matter if the number is signed
or unsigned, but negative values are stored in an interesting form: The most
significant bit is set to 1 to indicate it's a negative number, and the rest
of the bits represent the maximum possible value with those bits minus the
absolute value of the negative number plus one. So if you have the 8-bit
number -10, the most significant bit will be a 1 to indicate it's
a negative number, and the rest of the bits will represent the number
127 - 10 + 1 = 118. Taken together, the eight bits are 11110110, which read as
an unsigned number is 128 + 118 = 246 = 256 - 10. This form of storing positive
and negative numbers is called two's complement.
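
The -10 example can be checked with a few lines of Python. This is only a
sketch to illustrate the bit patterns; the helper names are my own and don't
come from any library:

```python
def to_twos_complement(value, bits=8):
    """Return the bit pattern (as an unsigned integer) that stores value."""
    lowest, highest = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    if not lowest <= value <= highest:
        raise ValueError("value does not fit in %d bits" % bits)
    return value & (2 ** bits - 1)

def from_twos_complement(pattern, bits=8):
    """Interpret an unsigned bit pattern as a signed number."""
    sign_bit = 2 ** (bits - 1)
    return pattern - 2 ** bits if pattern & sign_bit else pattern

pattern = to_twos_complement(-10)
print(format(pattern, "08b"))         # 11110110
print(pattern & 0x7F)                 # 118, the seven bits below the sign bit
print(from_twos_complement(pattern))  # -10
```
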
If you want to convert a number from positive to negative or negative to
positive, you could use the SUB instruction to subtract the value from 0, like
this:

MOV BX, 0
SUB BX, AX ; BX = 0 - AX

This would work fine for all numbers. But you could also use the special
assembly instruction NEG. NEG works in just the same way, but it's faster,
takes up less space in memory, and it's easier to understand what happens when
you see a NEG instruction than when you see a SUB instruction. The syntax for
NEG is simply:

NEG destination

The destination operand can be either a register or a memory pointer. So if
you have the number -10 in AL and execute the instruction NEG AL, AL will
become 10.
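
To see what NEG does to the bits, here's a rough Python model of an 8-bit
register (an illustration only, with a made-up function name). It computes
0 minus the value and throws away everything above bit 7:

```python
def neg(value, bits=8):
    """Model NEG: compute 0 - value and keep only the lowest `bits` bits."""
    return (0 - value) & (2 ** bits - 1)

print(neg(0xF6))  # 10: the pattern for -10 becomes 10
print(neg(10))    # 246 (F6 hex): 10 becomes the pattern for -10
print(neg(0x80))  # 128 (80 hex): the pattern for -128 comes back unchanged
```

The last line shows the one special case: +128 doesn't fit in a signed 8-bit
number, so negating -128 gives you the same bit pattern back.
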
Sometimes it's necessary to convert an 8-bit number to a 16-bit number or a
16-bit number to a 32-bit number. This is easy if you have a positive number.
For example: If you have a positive integer in AL and you want that number to
be treated as a 16-bit value in AX, you only need to make sure AH is 0.
But what about negative numbers then? How do you convert a number in the
two's complement form upwards? Luckily, there are two instructions that do
this for you. They're called CBW and CWD, short for Convert Byte to Word and
Convert Word to Doubleword. Their syntaxes are:

CBW
CWD

Note that they don't have any input/output operands. That's because these
instructions only work with fixed registers: CBW reads AL and writes AX, and CWD
reads AX and writes DX:AX. If you have an 8-bit number that
you want to convert to a 16-bit number, you should put it in AL, execute a CBW
instruction, and you'll get the correct 16-bit value in AX, no matter if it's
a positive or a negative integer you've converted.
If you want to convert a 16-bit value to 32 bits, you should put it in AX and
execute a CWD instruction. The result will be stored in DX:AX.
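
What the two instructions do is copy the sign bit into all the new bits. Here's
a small Python sketch of that idea (the functions are only models I've named
after the instructions, working on plain integers instead of real registers):

```python
def cbw(al):
    """Model CBW: sign-extend the 8-bit pattern in AL into a 16-bit AX."""
    ah = 0xFF if al & 0x80 else 0x00   # fill AH with copies of AL's sign bit
    return (ah << 8) | al

def cwd(ax):
    """Model CWD: sign-extend the 16-bit pattern in AX into the pair DX:AX."""
    dx = 0xFFFF if ax & 0x8000 else 0x0000   # fill DX with copies of AX's sign bit
    return dx, ax

print(hex(cbw(0xF6)))    # 0xfff6, which is -10 as a 16-bit value
print(hex(cbw(0x0A)))    # 0xa, positive numbers just get a zero AH
dx, ax = cwd(0xFFF6)
print(hex(dx), hex(ax))  # 0xffff 0xfff6
```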

When to create an assembly routine and how to do it:
A little assembly code can often mean a huge improvement for a BASIC program.
However, it's important to know when to use it and when to avoid it. There's
no use to rewrite everything in assembler. There are two reasons you may have
to write an assembly routine: It will speed up your program, or it will
enable you to do something that you cannot do in QuickBASIC.
The speed issue is the most common reason for using assembly code. It's a
well-known rule of thumb that in an average program, 90% of the execution time
is spent executing 10% of the code. By rewriting that 10% of the code in
assembler, you speed up the part where your program spends most of its time.
It's important to learn how to find that 10% of the source code. Usually, it's
something that is repeatedly
called by the main program loop and involves a lot of calculations. If you
manage to pinpoint the part of your program that slows it down the most, you
should consider rewriting it in assembler.
It's also a good idea to use assembler when you need to do something that is
hard to handle in QuickBASIC. One example of this was the keyboard handling
example in the previous part of this series. The keyboard functions in
QuickBASIC don't let you control the keyboard at the level we often need for
computer games, but with a little assembly, we get all the control we want.

When you know what you want to rewrite in assembler, you may think it's going
to be easy to implement your idea into a working routine, but you may be
surprised at how hard it is to write the routine. Transforming your ideas
directly to assembler can be very hard because you have to take so many
technical issues into account. You may find that you run out of registers,
that you start writing really unstructured code or that you simply don't know
how to do some things in assembly.
My experience in assembly programming has taught me that the best way to go is
this:
First, start writing your routine in BASIC. This way, you'll be sure that you
know every step in the process of writing the routine, and you don't have to
worry about registers and other low-level stuff.
Then, start optimizing your code. Try to push your BASIC code to its limits.
Even if the code still is terribly slow, you will discover how you can make it
faster and better. If you wrote everything in assembler from the beginning,
you probably would have missed these enhancements. Put comments everywhere in
your source code so that you know what each line does.
When you have an optimized BASIC version, start rewriting it in assembler. You
shouldn't try to make the final version right from the beginning though. Start
by writing a pseudo-version, where you use variable names instead of registers
and skip some low-level stuff like setting the memory registers correctly when
you want to get a number from the memory. Make your assembly code similar to
the BASIC version, so that you know exactly what you're doing. Put LOTS of
comments into the code so that you know what every line does.
When you have your pseudo-version finished, start transforming it into a
working version. Change variable names into registers, use the correct code to
get a number from the memory and so forth. Put in even more comments, as the
code will get harder to understand. The commenting should be so extensive that
you can tell what each line of assembly code does just from looking at the
comments. Divide the source into separate parts by using spacing and
commenting, so that you can isolate the different parts of the routine just
by looking at the layout of the source code. Even if you know what you're
doing right now, you may have forgotten it after a few days. Looking at your
own asm source without understanding it is really frustrating. DO NOT try to
optimize your assembly code yet, try to make it work first. You will probably
have a lot of bugs when you first run your routine, and you must fix them all
before continuing. If you haven't hurried too much, these bugs should all be
about low-level issues such as writing data to the wrong position in
memory or not returning to QuickBASIC in the right way. They should not have to do
with the purpose of the routine itself.
Now when you have a working assembly routine, you may want to try optimizing
it even further by using smart assembly code. This should be the final step in
the creation of your routine. If you start optimizing right away, you
risk losing control over your coding, but if you have a working routine
without optimizations to look back on, you will still know what you're doing
and you know that your routine works even if you don't manage to optimize it
so much.
Make sure your comments still explain what the routine does. Your optimized
code will be harder to understand than the original version, so you need to
watch out.
If you follow these steps, you stand a good chance of getting a really well-
written and terribly fast assembly routine!

Optimizing your assembly code:
It's one thing to optimize a BASIC program by using assembly code, but you can
also optimize your assembly code to gain even more speed.
Optimizing assembly code is a whole science in itself, and I could probably write
another tutorial series as long as this one just about asm optimization if I
knew enough about it. There's so much to say about assembly optimization that
I can only cover some of the basics here.
When optimizing an assembly routine, it's not enough to just make the separate
instructions run faster by using smarter code. Many times you also need to
look at the code as a whole and ask yourself what the code's actually doing.
Your code may run at a near-optimal speed based on what it does, but maybe
it's doing more than it has to. For example, you may have a routine that has
a very time-consuming MUL instruction inside of a loop. You manage to make it
run a lot faster by using a SHL instruction instead, and you think you've been
very clever. But if you had taken another look, you might have discovered that
you're actually doing the same multiplication over and over, and it would have
been enough to do it once before the loop and save the result in a register
for later use. By moving the MUL instruction out of the loop you would probably
have gained much more than by changing it to a SHL instruction.
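
The idea of moving a calculation out of a loop is the same in any language, so
here's a sketch of it in Python rather than assembler. The example is made up:
it calculates the memory offsets of the pixels on one screen row, assuming a
row is 320 bytes long, and counts how many multiplications each version needs:

```python
SCREEN_WIDTH = 320  # assumed row length in bytes, as in a 320x200 screen mode

def offsets_mul_inside(row, columns):
    """Multiply inside the loop: one multiplication per pixel."""
    offsets, multiplications = [], 0
    for x in range(columns):
        offsets.append(row * SCREEN_WIDTH + x)  # same product every time
        multiplications += 1
    return offsets, multiplications

def offsets_mul_hoisted(row, columns):
    """Multiply once before the loop and reuse the result."""
    row_start = row * SCREEN_WIDTH
    multiplications = 1
    offsets = [row_start + x for x in range(columns)]
    return offsets, multiplications

slow, slow_count = offsets_mul_inside(10, 320)
fast, fast_count = offsets_mul_hoisted(10, 320)
print(slow == fast)            # True
print(slow_count, fast_count)  # 320 1
```

Both versions give the same offsets, but the second one multiplies once
instead of once per pixel. In assembler, the saved product would stay in a
register while the loop runs.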
Keep in mind though, that because your assembly code is so fast from the
beginning compared to BASIC code, it's often unnecessary to bother with
optimizing it. If your routine runs adequately fast for your program, it's
better to leave it unoptimized since it's more readable that way. But it can
also be really important to optimize your assembly code because it runs so
often. Concentrate on optimizing loops and other assembly code that runs
frequently. If you don't have room for all your variables within the
registers, use the stack or the memory for the values you use the least, and
save the free registers for the code inside loops.
Sometimes you need to exchange the values of two registers. If you only use
MOV to exchange them, you will need a free register for temporary storage of
one of the values. For example, if we have one value in AX and one value in BX
and we want them to switch places, we could do this:

MOV CX, AX ; Store value of AX temporarily in CX
MOV AX, BX ; AX = BX
MOV BX, CX ; BX = CX (which is the old value of AX)

If you need to do this and you have no free register, the only solution seems
to be to push a value to free a register, but there's a better way. You can
use the instruction XCHG, short for eXCHanGe, to do it. The syntax for XCHG
is:

XCHG destination, source

Where the source and destination operands can be registers or memory pointers.
However, the source and the destination cannot both be memory pointers at the
same time. With XCHG you won't need a temporary register, and thus the stack
doesn't need to be used either.

Another way to make your code faster is to avoid jumps in the code. This
includes loops. Let's suppose you want to increase the AL register four times.
The following solution seems obvious:

MOV CX, 4 ; LOOP always counts with the whole CX register
IncLoop:
INC AL
LOOP IncLoop

But what's actually happening when the computer executes this code? Well, this
is how the computer sees it:

Set CX to 4
Increase AL
Decrease CX
Is CX 0? No: Jump back to IncLoop
Increase AL
Decrease CX
Is CX 0? No: Jump back to IncLoop
Increase AL
Decrease CX
Is CX 0? No: Jump back to IncLoop
Increase AL
Decrease CX
Is CX 0? Yes: Continue execution

Now suppose that we rewrote the code into this:

INC AL
INC AL
INC AL
INC AL

When this code is executed, this is what the computer does:

Increase AL
Increase AL
Increase AL
Increase AL

As you can see, a loop isn't always the best choice. Replacing loops with
repeated code is called unrolling. If the loop is executed so many times that
it's not practical to unroll it completely, you can do a partial unrolling.
For example, if you have a loop that executes 80 times, you can change it to
a loop that executes 10 times with 8 copies of the loop code inside it.
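
Here's a rough sketch of that 80-times example, written in Python just to count
what happens (the function names are my own). Each pass through a loop costs
one "decrease the counter and jump" step, and the sketch counts those steps:

```python
def rolled(times):
    """One increment per pass: as many loop checks as increments."""
    al, loop_checks = 0, 0
    for _ in range(times):
        al = (al + 1) & 0xFF
        loop_checks += 1
    return al, loop_checks

def unrolled_by_8(times):
    """Eight copies of the loop body per pass: one loop check per 8 increments."""
    if times % 8 != 0:
        raise ValueError("times must be a multiple of 8 in this sketch")
    al, loop_checks = 0, 0
    for _ in range(times // 8):
        al = (al + 1) & 0xFF
        al = (al + 1) & 0xFF
        al = (al + 1) & 0xFF
        al = (al + 1) & 0xFF
        al = (al + 1) & 0xFF
        al = (al + 1) & 0xFF
        al = (al + 1) & 0xFF
        al = (al + 1) & 0xFF
        loop_checks += 1
    return al, loop_checks

print(rolled(80))         # (80, 80)
print(unrolled_by_8(80))  # (80, 10)
```

The result is the same, but the unrolled version only does the loop
bookkeeping 10 times instead of 80. The price is a longer routine.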

It's not only certain instructions that make an assembly routine slow: Wait
states, bus transfers, the prefetch queue and dynamic RAM refreshes can also
steal time, and avoiding these bad guys can be really hard.
For every new CPU that's released, assembly optimization becomes less
important, not only because the processors get faster and faster, but also
because they execute time consuming assembly instructions more efficiently.
A MUL instruction, for example, isn't as slow compared to an ADD instruction
on a Pentium as it was on the 80286. Also, a Pentium CPU can execute several
assembly instructions simultaneously if they fulfill certain requirements,
such as certain positioning of the instructions in the memory and the order of
multiple operations. So it's possible to optimize asm code specifically for a
Pentium by using these new features.

Finding errors:
One of the biggest pains of writing assembly code is to find and eliminate the
bugs. Once you're done writing an assembly routine and test it for the first
time, it almost never works properly. Since the error often hangs your
computer so that you have to restart it, you'll probably get no hint at where
the error is. All there is to do is to start going through your code and
search for bugs. This can be very hard and time-consuming. Here are a few ways
to make the debugging easier:
First of all, "execute" the code in your head and try to see what the routine
is actually doing. Write it down on paper and you may discover that you're
doing something wrong. Keeping track of the state of the registers is often
important. Maybe some things don't work like you thought they did. Then look
it up in a document somewhere to make sure you know what you're doing!
If you still have problems finding the errors, try to execute only a part of
your program. Take the first snippet of code, insert a JMP to the end of the
routine and return the values that you're working with to the BASIC program so
you can look at them and see if they're correct so far. Then, move down the
JMP instruction a couple of lines and see if everything still is correct. If
you continue doing this, you will eventually find out where the computer hangs
and where the error is.
Also, keep in mind that some assembly instructions don't work with certain
combinations of registers, memory pointers and direct values. If DEBUG won't
accept an instruction that may seem correct, it may be because you're using
a combination of input/output values that cannot be handled by that
instruction.
Also, make sure the call to and return from the assembly routine is done
correctly. Many times, I've struggled to find the errors in completely correct
assembly code that won't run, just to find out that the error was in the
CALL ABSOLUTE line in QuickBASIC. Check this before you start searching for
errors in the assembly code!

Doing nothing:
If you feel an urge to do nothing for a while, the assembly language has an
instruction just for that: NOP. The NOP instruction takes up one byte in the
memory, and when it's executed... Well, nothing happens at all!
The syntax for NOP is:

NOP

Even though it doesn't do anything, it takes up a little amount of time.
Actually, NOP is equivalent to XCHG AX, AX, which, of course, doesn't change
anything.
It may seem silly to have such an instruction in the assembly language, but
there's actually some use for it sometimes, for example if you want to leave
some empty space in a program for data to be initialized later on.

And that was the final thing I had to say. I hope you've enjoyed this tutorial
series and that you've managed to understand everything. Thanks to everyone
who's sent me emails with comments on and suggestions for this series. If you
want to learn more about assembly programming, there's LOTS of good info to
find on the Internet. For example, check out:

http://cs.smith.edu/~thiebaut/ArtOfAssembly/ArtofAsm.html

I thought I could finish with a list of all the assembly instructions we've
covered in this series. This list divides them into groups based on what they
do and explains their different functions. Try to tell what they mean before
looking at the explanations and see if you know them all!

Good luck with your assembly programming!

Petter


List of assembly instructions:

Name               Meaning
-------------------------------------------------------------------

Data transfer:     
MOV                Copy values between registers and the memory
PUSH               Put values on the stack     
POP                Get values from the stack
LODS               Load string
STOS               Store string
MOVS               Move string
IN                 Read values from I/O ports
OUT                Write values to I/O ports
NOP                Do nothing

Jumps:
CALL               Call subroutine
RET                Return from subroutine
RETF               Return from subroutine in another segment
JMP                Jump to another offset address
LOOP               Perform a loop
INT                Call interrupt routine
IRET               Return from interrupt routine

Conditional:
CMP                Compare two values (using subtraction)
TEST               Compare two values (using logical AND)
JB                 Jump if Below
JBE                Jump if Below or Equal
JE                 Jump if Equal
JAE                Jump if Above or Equal
JA                 Jump if Above
JL                 Jump if Less (signed)
JLE                Jump if Less or Equal (signed)
JGE                Jump if Greater or Equal (signed)
JG                 Jump if Greater (signed)
JNB                Jump if Not Below
JNBE               Jump if Not Below or Equal
JNE                Jump if Not Equal
JNAE               Jump if Not Above or Equal
JNA                Jump if Not Above
JNL                Jump if Not Less (signed)
JNLE               Jump if Not Less or Equal (signed)
JNGE               Jump if Not Greater or Equal (signed)
JNG                Jump if Not Greater (signed)

Data manipulation:
ADD                Add two values together
SUB                Subtract value from another
INC                Increase value by one
DEC                Decrease value by one
MUL                Multiply two values together (unsigned)
IMUL               Multiply two values together (signed)
DIV                Divide value by another (unsigned)
IDIV               Divide value by another (signed)
NEG                Negate value (invert its sign)
SHL                Shift bits to the left (unsigned)
SHR                Shift bits to the right (unsigned)
SAL                Shift bits to the left (signed)
SAR                Shift bits to the right (signed)
ROL                Rotate bits to the left
ROR                Rotate bits to the right
AND                Logical AND (1 if both bits are 1)
OR                 Logical OR (1 if one or both bits are 1)
XOR                Logical XOR (1 if one bit is 1 and one bit is 0)
NOT                Logical NOT (invert bit)
CBW                Convert byte value to word value (signed)
CWD                Convert word value to doubleword value (signed)