Learning Assembly
{ If you have any comments or questions regarding this tutorial, please E-mail me }
While learning ASM, I found many
tutorials to be very confusing, and they did not cover assembly in the
detail that's necessary for such a complicated programming
language as this one. So, I wrote this rudimentary tutorial in
order to ease the pain others may have learning ASM.
The problem with most beginner level tutorials is that they
assume the reader has previous programming knowledge in one
language or another. While I'll make comments that draw
connections between programming in BASIC and ASM, I hope to write
this in such a way that you can skip these remarks without
affecting your learning, therefore making this a completely
newbie-level tutorial.
I am also very much a beginner, so I have recent memory of
learning a lot of the stuff I'm gonna cover.
First off, I believe it is very difficult
to learn programming without programming as you learn.
So, I suggest you have a copy of TASM, a necessary utility for writing
assembly programs.
Also, before you start, it's imperative that you understand
hexadecimal and binary.
2.1 - Introduction to programming
[Those with programming experience in any other language may
want to ignore this section]
So what is programming anyway? Well, the basic idea is that a
computer program is made up of a bunch of "instructions"
that a computer follows. For the most part, a program is made by
typing in a bunch of instructions that make much more sense to us
than they do to the computer. Then, they are translated ("compiled"
or "assembled") into a program that the computer can
understand. This is why you need to download and install the
software higher up on this page.
For our purposes, we can type these commands into a simple, standard
text editor such as "Notepad". Actually, this is
preferred - if you use a more advanced program like Microsoft
Word, you'll have to make sure that you save it as "text
only". So, if you can, use Notepad. It's standard with all
versions of Windows.
2.2 - Your first program
Open up notepad, or whatever you happen to have decided to
type with.
For a start, your programs should always have this skeleton:
.MODEL SMALL
.STACK 200H
.CODE
START:
END START
That is, all your programs should include these lines. Your
whole program will go in lines between "start" and
"end start".
It's very important that, if you type these lines into your file
instead of using Copy+Paste, you notice the periods at the beginning of
the first few lines. And, notice the colon after START. Even the
smallest dot is a very important piece in programming, so never
overlook them.
Now, start and end start don't really mean much to a computer.
But, to us, start is the beginning of something. And end start
doesn't make a lot of logical sense to us, but that's how it
goes, so just grin and bear it. End Start tells the assembler
where the program ends, and that it should begin running at START.
But, right now, our program does absolutely nothing! So,
we may want to learn about the different commands we can use in
assembly:
2.3 - Interrupts
We can write a very simple program that puts a single
character of text on the screen using just "interrupts".
If you're familiar with any higher level languages, you can think
of interrupts as essentially commands.
Interrupts each have some complicated operation(s) they perform,
and all they require is that you give them a small amount of
information.
In this case, we'll be using an interrupt that can put text
characters on the screen. Because one interrupt may have many
other functions it can perform, we must tell it which one to do.
Then, we give it the required information, and tell it to do
whatever it may do. We can therefore do very complicated
operations while being totally oblivious to how they work. Here
is our example program:
.MODEL SMALL
.STACK 200H
.CODE
START:
Mov ah, 2
Mov dl, 1
Int 21h
mov ah, 4ch
mov al,00h
int 21h
END START
It may seem like a collection of completely
arbitrary words and numbers, but only at first. We soon realize that
it is a VERY concrete concept. Every part along the way does its
important part. The tiny pieces of code result in one big program
that does exactly what we expected. Here's a breakdown, line by
line, of what the program does:
1 ) We put the number 2 in a specific storage place in the
computer. Later, the computer will look at this number and, in this
case, this number tells which "function number" the
interrupt should do. As mentioned before, most interrupts can do
a variety of functions. So, we must tell it which one to do. In
this case, we want the DISPLAY OUTPUT function. This is function
number 2. So, we put the number 2 in a specific place, just
waiting for the computer to look it up later.
2 ) We put the number 1 in a different specific
place. We've already specified that we want to use
function 2 of the specific interrupt, which is DISPLAY OUTPUT.
But what should it display? Well, different characters of text
have different number codes assigned to them (this is unrelated
to the base-whatever numbering stuff we talked about earlier,
just to let you know). This code is called ASCII (pronounced
ass-key).
So, if we're going to be displaying text, we should specify what
text. On the PC, character code 1 happens to be displayed as a
little smiley face.
So, after all this, we've so far established that we want the
computer to do some text displaying, and the text we want to display
is a smiley face.
3 ) The two pieces of information we gave the
computer would be worthless if we didn't do something with them. In
this 3rd line, we tell the computer to use interrupt #21. As soon
as this happens, it looks at the place called "ah"
and sees what's there, because it must know which of its numerous
functions it should do. It ends up figuring out that it should
display text, and ultimately, it should display a smiley face.
Note that there's an "h" after the number 21. If we put an h after a number, it
means that the number is not 21 in decimal. It's 21 in
hexadecimal. Remember, don't think that this means twenty-one in both
cases. Think of this hexadecimal number as "Two one";
And, if we convert it to decimal, we find that it's 33.
But, it's common programming practice to use hexadecimal when
referring to interrupts, rather than their decimal equivalents.
So, unless you're a devout non-conformist, make it easy on
yourself and think of this as "Int twenty-one h"
not "Int thirty three". It'll make it much easier for
you to communicate about assembly, as everyone else calls
interrupts by their hexadecimal numbers.
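4 ) The final commands end the program. Function 4Ch of interrupt 21h is the one that ends a program, and the 0 we put in al is the exit code handed back to DOS. This is necessary at the end of all your programs, unless you want awful things to happen. If you forget it, the computer keeps running whatever happens to come after your code in memory, with random effects that will more than likely freeze it up.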
There we have it! We've effectively written a
program doing exactly what we expected from the outset.
A couple of things you should note:
Firstly, the blank lines are just my style of separating code to
make it easier to read. The assembler, which we'll
explain using in just a second, doesn't care one way or the other
if there are blank lines, as long as they don't actually hurt the
code in some way: They generally don't. You can take them out if
you like - You can add more at will - It doesn't
matter, because the only important part is the code; The commands
involved in our program.
Secondly, in Mov ah....... ah is not the hexadecimal number A
(which is 10 in decimal). In this instance, it's the name of a place
where the computer keeps a number. The resemblance is just a
coincidence. More is explained in the next section.
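To make the number notation concrete, here is a small sketch in the same TASM syntax as the rest of this tutorial. It only loads values into a register and then exits, so it shows nothing on screen; the point is how the numbers are written. Two details are my own additions rather than something covered above: as far as I know, TASM wants a hexadecimal number to begin with a digit (so A is written 0Ah), which is how it tells numbers apart from names like ah, and it accepts a "b" suffix for binary numbers.

```asm
.MODEL SMALL
.STACK 200H
.CODE
START:
mov al, 33          ; decimal 33
mov al, 21h         ; hexadecimal 21, the same value as decimal 33
mov al, 00100001b   ; binary, again the same value as decimal 33
mov al, 0Ah         ; the hexadecimal NUMBER A (decimal 10)
mov al, ah          ; the REGISTER ah: copies whatever ah holds into al
mov ah, 4ch         ; function 4Ch = end the program
mov al, 00h         ; exit code 0
int 21h
END START
```

Each mov simply overwrites al, so only the last value survives. The leading zero in 0Ah does not change the value; it is only there so the assembler reads it as a number.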
2.4 - The Registers
In the last section there was a strange part left unexplained.
Primarily, these two lines:
Mov ah, 2
Mov dl, 1
First, let's explain "Mov". It is
shorthand for the word "Move". This makes a lot of
sense.
This instruction, unlike an interrupt, does a tiny, simple operation.
However, it's a very very very very very important instruction in
ASM.
The first one takes the number 2 and "Moves" it into a
place the computer explicitly calls "ah". So we can
deduce that the next command moves the number 1 into a place
called "dl". It does.
The MOV instruction can be used in other ways. For example, we
could say:
Mov ah, dl
The computer would take whatever is in dl and move it into ah.
Well, to say "move" is misleading, because it's not
actually moved. Whatever's in dl stays there. But now, it's also
in ah. Likewise, this would work:
Mov dl, ah
So what are ah and dl anyway? We know from many previous mentions
that they're specific places for holding numbers. They're called
registers, and they sit inside the processor itself rather than in
the computer's main memory.
The ones we're mainly concerned with right now are AX, BX,
CX, and DX.
They're made up of two 'pieces' each - hence, smaller registers.
Ah, which we've already encountered, is one of the parts of ax. Ax
also has another part called al. The h and l in ah and al mean
"High" and "Low". They make up the higher and
lower parts of the register ax. For example, if we did this:
Mov ah, 1
Mov al, 0FFh
Then, if we looked at what is in ax, we would see it contained:
01FF
Why? Because the "High" part contains 1, or 01, and
the "low" part contains FF. So, combined into the
bigger register, they make 01FF.
So, we conclude that many registers, or at least the ones we
care about, are made of 2 smaller parts. And, to find their
values, we combine them (Don't add them, though: 01 + FF = 100,
not 01FF)
ah + al = ax
bh + bl = bx
ch + cl = cx
dh + dl = dx
One final thing to mention - al, bl, ah, bh, and so on, can each have a value of 0 to 255. So, when combined to make ax, bx, and so on, the total range possible for those is 0 to 65,535.
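The high/low split can be watched in action with a short sketch. It is only an illustration under a few assumptions of mine: it uses bx and cx instead of ax because, as far as I know, function 2 of Int 21h may change al, and it picks the ASCII codes 4Fh (the letter O) and 4Bh (the letter K) so that each half shows up as a readable letter.

```asm
.MODEL SMALL
.STACK 200H
.CODE
START:
mov bh, 4Fh         ; high half of bx: ASCII code for the letter O
mov bl, 4Bh         ; low half of bx: ASCII code for the letter K
                    ; bx as a whole now holds 4F4Bh
mov cx, bx          ; copy all 16 bits at once, so ch = 4Fh and cl = 4Bh
mov dl, ch          ; take the high half of cx
mov ah, 2           ; function 2 = DISPLAY OUTPUT
int 21h
mov dl, cl          ; take the low half of cx
mov ah, 2
int 21h
mov ah, 4ch         ; end the program
mov al, 00h
int 21h
END START
```

If it behaves as intended, the two halves of cx come out one after the other as the letters O and K, even though we never put anything into ch or cl directly.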
2.5 - Compiling our programs
We left off part 2.3 with a finished program. But, we never
actually made a program out of it. Well, making a program is
really quite simple:
First save your program as something like "First.asm".
Then, go to the folder where you have TASM and type this at a DOS
prompt, or into the address bar that you should have on your screen
(unless you use Windows 3, in which case I pity you):
Tasm.exe First First.obj
As long as your program has no problems, this will make a file
in the same directory called "first.obj". Then, type
this in the same place:
Tlink First.obj
Finally, this will make a program called "First.exe"! Hoorah! Our first successful compile! (hopefully). Now, click on it to run it. If you have problems seeing it run because it opens and closes itself too fast, try running it from a DOS prompt instead, so the window stays open.
3.1 - Memory
True, MOV, interrupts, and registers are very
important, as you just read. However, there's not a whole
lot that can be done using only them. To move on, we'll
need to understand a little bit about the computer's memory. And
to do this, we also need to know about memory in general.
We'll start with how memory is divided up.
This can become quite complex, so just read through slowly, and
go back over it if something confuses you.
Basically, a computer's memory is a piece of
circuitry; most of the time many pieces. It has small points in
the circuits, built from "transistors", that can either have
an electric charge (5v, say), or no charge. The millions of these
that the computer has are where it stores everything.
Taking into account our previous knowledge of binary, we remember
that in binary a digit can only be either 0 or 1. So, we could
think of a point with a charge, or one with no
charge, as the same as 1 and 0 in binary. This turns out to be
true. 1 and 0 represent each of these points of memory.
Each one is called a "bit". This is short for
"BInary digiT".
Well, hexadecimal is also important in our discussion. You see,
if we wanted to look at memory and it was all in binary form, it
would be very cryptic - 1010111010000111001100011100011100111110....
and so on. So, to make memory easier to read, we can read it in
hexadecimal numbers.
Well, recall that in hexadecimal the highest
digit is F - which has a decimal equivalent of 15. In binary,
that takes four digits to show:
1111
is the same as F in hexadecimal. So one hexadecimal digit always
stands for exactly four bits.
Since our previous unit was called a "Bit", to keep in
the same naming theme, 4 bits are called a "Nibble".
Then, it just goes up from there. All in all, it works like this:
8 bits = Byte
2 Bytes = Word
2 Words = Double Word (DWORD for short)
--- The rest I'm not sure of, but I think there are names for
things up to 80 bits ---
Then, for really big numbers, there's these:
1024 bytes = Kilobyte (KB)
1024 KB = Megabyte (MB)
1024 MB = Gigabyte (GB)
1024 GB = Terabyte
1024 Terabytes = Petabyte
1024 Petabytes = Exabyte
I'm sure Terabyte can be TB, petabyte PB, and exabyte EB; But, I've never seen this used. A terabyte is such a high amount of memory that most people don't ever need to know the term.
In any case, we just need to deal with the terms "bit",
"byte", and "word". They're the ones that'll
come up most often in conversation.
As I mentioned briefly in the last section, the registers ah, al,
and so on can only have a maximum value of 255. This may seem
arbitrary at first - why not 999? Because they can only hold one
byte. One byte is 8 bits, and the highest number we can make in
binary with 8 bits is this:
11111111
which is FF in hexadecimal, or 255 in decimal.
So when we put together ah and al, the highest number is 65535.
Why? Well, each register can hold 1 byte, or 2 nibbles, or 8 bits
- it all means the same thing. So, with two registers we have 2
bytes, or 4 nibbles, or 16 bits. Assuming we made the highest
possible hexadecimal number with 4 nibbles, it would look like
this:
FFFF
Punch that into your computer's calculator and convert it to
decimal, and, surprise surprise: It equals 65535
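The same limits can be written down as a sketch in TASM syntax. It shows nothing on screen; it is only meant to show which numbers fit into which size of register. The commented-out line rests on an assumption of mine: I would expect the assembler to refuse it with an error, but I have left it disabled so the rest can be assembled either way. The "b" suffix for binary numbers is also assumed here.

```asm
.MODEL SMALL
.STACK 200H
.CODE
START:
mov al, 11111111b   ; eight 1 bits: the largest value one byte can hold
mov al, 0FFh        ; the same value in hexadecimal
mov al, 255         ; the same value in decimal
mov ax, 0FFFFh      ; sixteen 1 bits: the largest value one word can hold
mov ax, 65535       ; the same value in decimal
; mov al, 256       ; left disabled: 256 needs 9 bits, so it cannot
;                     fit into a one-byte register like al
mov ah, 4ch         ; end the program
mov al, 00h
int 21h
END START
```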
3.2 - Addressing
In order to use the computer's memory - i.e.
store numbers, text, etc - we have to understand how the computer
goes about organizing it.
It does this with something called Segments and Offsets.
These are how we tell the computer
where things should be put and where they should be gotten from.
Whenever we want to read or write to memory, we must use numbers
pointing to the exact location of the BYTE we want to read;
Specifically, 2 numbers, called the Segment and Offset.
Usually these two numbers are WORDs [16 bits each]. One points to
the general area of memory, the "Segment". And the
other says how many bytes into that segment, known as the "offset".
Since the offset is 16 bits, one segment lets us reach up to
64KB of memory at once [65536 bytes].
For example, say these numbers [hexadecimal] were stored somewhere
in memory:
00 AB D2 AC 98 4E 67
and so on.. Now, say we wanted to read those numbers. Well, the
computer has millions of bytes of memory - so we must have some way
of specifying what part of memory they're in. This is called their
"Address". Think of a real life address, with the street name and
the house number. Essentially the street name is what "part" of the
city you live in, and the number is how far along the street you
are. We do the same with computer memory, but both parts are
numbers. So, that data above may be located at:
FE00:0000
FE00 would be the "segment", or part.
And 0000 would be how far in the data starts. So the address FE00:0000
would "point" to the hexadecimal number 00.
In that case, FE00:0001 would point to the hexadecimal number AB.
FE00:0002 points to D2, and so on.
Bear this in mind as we cover just one more section, before
making use of what we now know about memory.
3.3 - The Register DS
The registers we've covered: AX, BX, CX, DX, and
their smaller parts, are all called General Purpose Registers.
There is another kind of register called Segment
Registers.
In this case we're discussing DS - not to be confused with
DX - which stands for "Data Segment". Segment registers
are used, not surprisingly, to point to segments of memory. They
aren't usually used for holding data like the General Purpose
registers are.
Going back to the previous sections, say we wanted to print some
text to the screen. We learned one method, but that would require
that we print each individual character one after another!!
There's a better way.
There is another function of Int 21h that will print an entire
string (a string is a bunch of text characters one after another).
For the sake of simplicity, say that the text we want is stored
at FE00:0000.
This program will allow us to print it out to the screen:
.MODEL SMALL
.STACK 200H
.CODE
START:
Mov ax, 0fe00h
Mov ds, ax
Mov dx, 0
Mov ah, 09
Int 21h
mov ah, 4ch
mov al,00h
int 21h
END START
So, what does this program do? Well, first, it
puts fe00, the segment of the text, into AX. We use the MOV
instruction to do this, which in this case 'moves' (or rather
copies) fe00 into the ax register.
But, we wanted ds to have the segment. Well, that's one quirk of
the segment registers - you're not allowed to put a number into them
directly. So, you can't just put a number right into ds. You can,
however, put another register into them.
So, we just put the segment number first into ax. Now, we move it
into ds, thereby bypassing this charming little fact of DS.
Then we put 0 into dx. This interrupt function requires that we
have the segment of the text in DS and the offset in dx. Since
the offset is 0, we put 0 into dx.
Next, we put 9 into ah. Since int 21 has a lot of different
functions it can do, we must specify which one we want. The one to
print text, by specifying a segment and offset, is #9. Then
Int 21h itself does the printing.
And finally, we use int 21 again, but this time to end the
program.
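The detour through ax described above can be isolated into a tiny sketch. The disabled line is the form that is not allowed; I'm assuming the assembler reports it as an error rather than quietly accepting it. The sketch only loads ds and then exits, so it is meant for reading rather than for running.

```asm
.MODEL SMALL
.STACK 200H
.CODE
START:
; mov ds, 0fe00h    ; NOT allowed: a plain number cannot go straight into ds
mov ax, 0fe00h      ; so the number goes into a general purpose register first
mov ds, ax          ; and then from that register into ds
mov ah, 4ch         ; end the program
mov al, 00h
int 21h
END START
```

Any of the general purpose registers would do for the middle step; ax is used here only because the program above uses it.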
In theory, this is just great. But, memory
doesn't work like that. We generally don't just put whatever we
want, where we want. At least not at this stage.
For example, when you run a program like this one [if you were to
compile and run it, which I don't recommend], the computer picks
out a free space in memory to load the program itself. You don't
specify this. So, what have we accomplished then with Segments
and Offsets if we can't choose them ourselves? We can still use
them, as you will see.
3.4 - Variables
DS was important to introduce in the previous
section, because when you write a program, you can have things
called "variables". And whatever you put in these
variables is usually put in the "Data Segment", which
is what DS points to.
When the compiler/assembler is done changing your program into
something the computer can actually read, the result doesn't
contain variable names any more, only addresses. But the names make
life a lot easier for programmers.
So, what are variables and how do we use them? Well, a variable
is a place where you can store data. Strings (text), numbers, etc.
They're called variables because, well, they can vary. Not only
can they contain a number or something like that, but you can change
them as much as you need during your program. This makes programs
much more versatile and useful.
For example, let's rewrite that last program so that it does
actually work:
.MODEL SMALL
.STACK 200H
.DATA ; <--------- This is a new part! Make sure to include this
Textstring db "I'm a string$"
.CODE
START:
Mov ax, SEG Textstring
Mov ds, ax
Mov dx, OFFSET Textstring
Mov ah, 09
Int 21h
mov ah, 4ch
mov al,00h
int 21h
END START
Wow. A lot of things to explain here. Let's start
from the top downward. You'll notice there's a new part that
should be included in the beginning. The part called .DATA
declares what variables we have.
As always, the period in front of DATA is very important. Also,
make sure that .DATA comes before .CODE, because .CODE says that
everything after it is part of the code.
Again, we put the segment into ax first, since we can't move it
straight into ds. One very convenient feature of the assembler is
that we don't have to figure out the segment and offset that our
variable is at; Which is good, because as we said, the computer
decides where to load things and we can't predict it - it would
make it tough to find where our
variables are in memory. So, by saying SEG Textstring, we
move the segment of that variable into ax instead of what's
actually in the variable. The same goes for OFFSET Textstring. It puts
the offset of the variable Textstring into the register, instead
of the actual variable.
One more unexplained part - What's with that line after .DATA:
Textstring db "I'm a string$"
Well, Textstring is the name of the variable - we
must specify the name we want to call the variable first.
Next, db stands for "Define Byte(s)". It can be used
whether we want our variable to be one byte long, or multiple
bytes. In this case, it's multiple bytes, because each character
of text takes up one byte.
Finally, we tell the assembler what we want to be in the variable.
This can be changed by your program, but we just tell it what we
want the variable to start out as.
One more little detail of int 21, function 9 is that the text
you're printing must have a dollar sign at the end. It doesn't
actually print a dollar sign on the screen, it just tells where
the text ends.
Go ahead and compile and run this program. Unlike the last one,
it should work.
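Since a variable can also be just one byte long, and can be changed while the program runs, here is a second sketch along the same lines. The variable name Face is made up for this example. It starts out holding character code 1 (the smiley from section 2.3), gets displayed with function 2, is then changed to 2, and is displayed again. I'm assuming that code 2 shows up as a second, filled-in smiley on a PC, and that TASM in its default mode accepts a variable name written directly in a mov like this.

```asm
.MODEL SMALL
.STACK 200H
.DATA
Face db 1           ; a one-byte variable that starts out as 1
.CODE
START:
mov ax, SEG Face
mov ds, ax          ; ds now points to the segment our variable is in
mov dl, Face        ; copy what is IN the variable (1) into dl
mov ah, 2           ; function 2 = DISPLAY OUTPUT
int 21h
mov Face, 2         ; change the variable: it now holds 2
mov dl, Face        ; copy the new contents into dl
mov ah, 2
int 21h
mov ah, 4ch         ; end the program
mov al, 00h
int 21h
END START
```

Notice the difference from the string program: there we wanted to know where the variable is, so we used OFFSET, while here we want what is inside it, so we use the name on its own.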