Assembly language programming tutorial part 1: Getting started
By Petter Holmberg of Enhanced Creations
Edited version (original version posted in QB:tm)

Hello everyone! This article is written for all of you who want to learn how to program in assembler in order to enhance your QuickBASIC programs. I know this is a dream for many QB programmers, but they feel it's too complicated to learn and they haven't found any good sources of information to get started with. If you are one of these programmers, this article is written for you. You will find that it's not easy to learn assembly language programming, but you will probably also find that it's much easier than you first thought. This article will not delve too deeply into assembly language programming, but it will give you a solid start to work on.

So what is assembler then?

The early ancestors of today's computers, developed in the period from about 1940 to 1960, were a real pain to program. The circuits in these computers could perform simple arithmetic operations, take data as input, write data as output, and do the other operations needed to solve problems for the people who had built them. In order to make the computers understand what they should do, they needed to be fed with instructions. These instructions were given to the computers as series of codes. Let's say the number 1 was the code for adding, the number 2 was the code for subtracting, and the number 3 was the code for outputting the result. The programmers would figure out a program and input it into the computer by turning switches, or by punching holes in paper cards and feeding them to the computer. If the program didn't work, the programmers had to go through each instruction again, see where the error was, and then reprogram the computer. Not very convenient, especially as the programs were all written as a series of ones and zeroes.
In order to make programming easier, they started writing the programs as hexadecimal numbers instead of binary numbers. That turned every four binary digits into one hexadecimal digit, making the programs shorter and easier to read. (Binary and hexadecimal numbers will be explained in detail later.) But a program was still just a sequence of numbers, hard for any programmer to remember and understand. So someone had the great idea of writing the instructions as short words that could be translated directly into numbers and fed to the computer. Instead of saying 1 for an addition, the programmers could say "add", and instead of 2 for subtraction, they could say "sub". Now they could see more clearly what the program did, and finding errors was not as hard anymore. Assembly language was invented.

Later on, computer engineers found out that you could make programming a lot easier by replacing long sequences of assembly instructions with code that looks much more like human language. These were called high-level programming languages, and BASIC was one of the early ones. Today's microprocessors still perform their duties as a series of simple instructions, such as "add" and "sub", but programming languages like BASIC make sure that we usually don't have to worry about it.

Why do I need to learn assembler?

There are many reasons to use a high-level language like BASIC instead of assembler. A simple instruction such as PRINT could be more than 100 lines of code in assembler. It is therefore pretty obvious that BASIC programs are easier to write and debug, and you don't have to worry about what the processor actually does when it writes a letter on the screen. It just works. Another reason to use high-level languages is that you could easily convert your BASIC program on your PC to work on, for example, an Amiga computer, using an Amiga BASIC compiler.
If you had written your program in assembler, you would find that the Amiga wouldn't understand it, because its CPU doesn't work like the CPU in your PC.

There are still reasons to use assembler instead of a high-level language. QuickBASIC cannot do everything. Sometimes there are things you want to do with the computer that no BASIC instruction can do, and you often find that your BASIC program needs to do so many calculations that it gets slow. The problem is that an instruction such as PRINT takes many possibilities into account. It makes sure you have a valid string to print, it checks what screen mode you use and what color you want to print the text in, and so on. Usually you know all these details when you want to print the text, and you don't need the processor to perform all these checks. The only way to remove them is to use assembler code instead of PRINT. But there's usually no point in writing a full program in assembler. Only use it when you need to do something really fast or something really low-level.

What do I need to know?

When you write a BASIC program, you don't really need to know much about how the computer works. In assembler you work with the computer on its own level, and therefore you need to know what you're actually doing. You don't need to know very much to get started though, and you will learn the rest as you're learning assembler.

The first thing that you will find useful to know is how to count in the binary and hexadecimal systems instead of the decimal. This is pretty easy to learn.

Usually we count in the decimal system. We then have 10 digits, ranging from 0 to 9. The lowest digit we can use is 0, and as we count upwards we use the digits 1, 2, 3, 4, 5, 6, 7, 8 and 9. Those are all the digits we have, so in order to continue we need to use two digits. We reset the 9 to 0 and add a 1 to the left of it. The left digit is now worth 10 times the right one.
Then we start all over again with the digit to the right, counting up to 19. Now we increase the left digit to 2 and reset the right one to zero. We continue in the same way until we reach the number 99, when all combinations of two digits have been used. Now we reset both of them and add a third digit, worth 10 times as much as the same digit one position to the right and 100 times as much as the one two positions to the right. This system of adding new digits means that the number 1234 can be expressed as 1*10^3 + 2*10^2 + 3*10^1 + 4*10^0. See the pattern?

What if you didn't have 10 digits to play with? Well, it would work just as well anyway. The binary system, on which computer technology is based, uses only two digits: 0 and 1. You start counting from 0, and when you reach 1 you have used all of your digits and need to add a second one, and you get the number 10. Each new digit is worth 2 times the digit to its right. The binary number 10110 can thus be expressed as 1*2^4 + 0*2^3 + 1*2^2 + 1*2^1 + 0*2^0, or 22.

The hexadecimal system works with 16 different digits. Since we have only invented 10 symbols for digits, we use letters to represent the higher ones. The hexadecimal system therefore uses the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E and F. The hexadecimal number F3 can therefore be expressed as 15*16^1 + 3*16^0, or 243.

It's easier to understand if we put the three systems in a table for comparison:

Decimal  Hexadecimal  Binary
0        0            0
1        1            1
2        2            10
3        3            11
4        4            100
5        5            101
6        6            110
7        7            111
8        8            1000
9        9            1001
10       A            1010
11       B            1011
12       C            1100
13       D            1101
14       E            1110
15       F            1111
16       10           10000

As you can see, the number F in hexadecimal is the same as the number 1111 in binary, or the number 15 in decimal. This also shows why the hexadecimal system is often used in assembly language programming instead of the decimal. If you want the binary number 1111000011110000, you can write it in hexadecimal as F0F0.
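
The expansions and conversions above are easy to check with a few lines of code. The following is only a sketch, written in Python rather than QuickBASIC because it is easy to run almost anywhere. The names DIGITS and to_decimal are made up for this illustration and are not part of any library.

```python
# Illustration only: DIGITS and to_decimal are hypothetical names.
DIGITS = "0123456789ABCDEF"

def to_decimal(text, base):
    """Read a digit string in the given base, one position at a time."""
    value = 0
    for ch in text.upper():
        digit = DIGITS.index(ch)          # A..F stand for 10..15
        if digit >= base:
            raise ValueError(f"{ch!r} is not a valid digit in base {base}")
        # Moving one position left makes everything so far worth `base` times more.
        value = value * base + digit
    return value

print(to_decimal("1234", 10))   # 1*10^3 + 2*10^2 + 3*10^1 + 4*10^0
print(to_decimal("10110", 2))   # 22
print(to_decimal("F3", 16))     # 243

# Binary to hexadecimal: cut the binary number into groups of four digits
# and translate each group into one hexadecimal digit.
bits = "1111000011110000"
groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
hex_text = "".join(DIGITS[to_decimal(group, 2)] for group in groups)
print(groups, hex_text)         # ['1111', '0000', '1111', '0000'] F0F0
```

The grouping step only works this directly when the number of binary digits is a multiple of four. A shorter number would first need zeroes added on the left.
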
As you can see, it's easy to convert binary numbers to hexadecimal and hexadecimal numbers to binary. Four binary digits correspond directly to one hexadecimal digit.

The number of different digits you can use is called the base of the counting system. You can use any whole number from 2 and up as a base. If your number in any counting system is, say, 3 digits long, it can be expressed as: a*base^2 + b*base^1 + c*base^0, where a, b, and c are your three digits. The most important thing when using different bases simultaneously is to keep track of what system you use for a certain number. For example, is the number 10 the usual decimal 10, or the binary version of the decimal number 2? If you still haven't understood this, read it again until you do, or ask someone who understands it to explain it to you. It's very useful to know about this when you program in assembler.

The second thing that is necessary to know when programming in assembler is the PC memory architecture. I'm not going to explain this in detail, because it's a complicated issue. A PC has 640 kilobytes of basic memory, and additional megabytes in special memory circuits that you can insert into the computer yourself. The terms EMS and XMS refer to this extra memory. That is not the memory I'm going to talk about here. The interesting thing is the basic 640 kilobytes that every PC has. You need to know how to find a certain position in memory in order to use it, and this is essential knowledge for an assembly programmer.

Each position in memory has an address, a number telling the computer where to read or write data. It would have been easy if this address had just been a number from 0 to 640k, but that's not the system used. A memory position is described by two numbers, called the segment address and the offset address. The actual memory position is a combination of the segment and the offset address. The segment address describes the memory in groups of 16 bytes.
The first byte in memory, byte 0 if you like, has the segment address 0. The segment address 1 points at byte 16, and the segment address 2 points at byte 32. The offset address is a number telling you how far from the segment position the byte you want is. So if you want to access byte 3 in memory, you use the segment address 0 and the offset address 3. Together they form a number pointing at an exact memory position. Written as a formula, this can be expressed as:

actual memory address = segment*16 + offset

If you want to access byte 20 in memory, you can use the segment 1, giving you the position 16, and the offset 4, adding 4 bytes to the position, for the final number 20. But you can also use the segment address 0 and the offset 20, giving you the same memory position! The segment and the offset address can both range from 0 to 65535, giving you several possible combinations when you want to use a certain memory position. This system makes memory addressing a little complicated for beginners to understand. You can see what segment and offset a certain BASIC variable is located at by using the functions VARSEG and VARPTR. Try it!

Now you might be wondering how it is possible for both the segment and the offset to be 65535. That gives you the biggest possible memory position of 65535 * 16 + 65535 = 1114095, which is bigger than 640k (655360 bytes). Well, this memory certainly exists, but it is not accessible in the same way as the first 640k of memory, and I'm not going to delve deeper into this here and now. Later on, I will discuss memory access in more detail. Again, if you didn't understand this, read it again, and if that doesn't help, ask someone to explain it to you.

That was all for the first part of this article: a very brief introduction to what's to come. Next time I will start describing the basics of assembler and how you use it in QuickBASIC.
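
As a recap, the segment and offset arithmetic described above can be tried out in a few lines of code. This is only a sketch in Python, not QuickBASIC or assembler, and the function names linear_address and pairs_for are made up for this illustration. It simply applies the formula segment*16 + offset with both values limited to the range 0 to 65535.

```python
def linear_address(segment, offset):
    """Combine a segment and an offset into one memory position."""
    for name, value in (("segment", segment), ("offset", offset)):
        if not 0 <= value <= 65535:
            raise ValueError(f"{name} must be in the range 0 to 65535")
    return segment * 16 + offset

print(linear_address(0, 3))          # byte 3
print(linear_address(1, 4))          # byte 20
print(linear_address(0, 20))         # byte 20 again, via a different pair
print(linear_address(65535, 65535))  # 1114095, the largest value the formula gives

def pairs_for(address):
    """List every (segment, offset) pair that points at the given position."""
    return [(seg, address - seg * 16)
            for seg in range(0, 65536)
            if 0 <= address - seg * 16 <= 65535]

print(pairs_for(20))                 # [(0, 20), (1, 4)]
```

Trying pairs_for with larger positions shows how quickly the number of equivalent segment and offset combinations grows.
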
Until next time, make sure you understand the different numbering systems and the memory addressing scheme.

Bye for now!

Petter Holmberg