Vol. 4 Iss. 1 - January 2004


By Ethan Winer <http://www.ethanwiner.com>

Procedures in Assembly Language

All of the discussions so far have focused on how to write the instructions for an assembly language subroutine. However, none have described how these routines are added to a BASIC program, or how a complete procedure is defined. Furthermore, the previous examples have not shown a key step that is needed with all such external routines: establishing the code and data segments.

Before an external routine can be linked to a BASIC program you must establish a public procedure name that LINK can identify. I will first show the formal method for defining a procedure and its segments, and then show the newer, simplified methods that were introduced with MASM version 5.0. The simplified syntax is used for all of the remaining examples in this chapter (so don't worry if the setup details for this first example appear overwhelming).

The simplest complete subprogram you are likely to encounter is probably the PrtSc routine that follows--all it does is call Interrupt 5 to send the contents of the current display screen to LPT1.

Code    Segment Word Public 'Code'
Assume  CS:Code
Public  PrtSc
PrtSc   Proc Far       ;this is equivalent to SUB PrtSc STATIC in BASIC

Int  5                 ;call BIOS interrupt 5
Ret                    ;return to BASIC

PrtSc   Endp           ;this is equivalent to BASIC's END SUB
Code    Ends
End                    ;this is the end of the source file

The first three lines tell the assembler that the code is to be placed in the segment named Code, and that the name PrtSc is to be made public. The fourth line defines the start of a procedure. The actual code occupies the next two lines. Of course, you must tell the assembler where the procedure ends, which in this case is also the end of the code segment. Had several procedures been included within the same block of code, each procedure would show a start and end point, but there would only be a single code segment. The final End statement is needed to tell the assembler that this is the end of listing, although you might think that MASM would be smart enough to figure that out by itself!

Notice that there are two kinds of procedures: Far and Near. External routines that are called from BASIC are always Far, because BASIC uses what is called a *medium model*. This means the procedure does not necessarily have to be within the same code segment as the main BASIC program. The medium model allows the combined programs to exceed the usual 64k limit when linked to a final .EXE file.

When BASIC executes a CALL command, it uses a two-word address as the location to jump to. One of the words contains a segment, and the other an address within that segment. Then when your program finally returns, the 8088 must know to remove two words from the stack--a segment and an address--to find where to return to in the calling BASIC program.

A near procedure, on the other hand, is called using an address that is only one word long. And when the procedure returns, only a single word is popped from the stack. Again, the assembler does the bulk of the dirty work for you. You just have to remember to use the word Far.

Simplified Directives

Fortunately, Microsoft realized what a pain dealing with segments and procedures and offsets from BP can be, and they enhanced MASM beginning with version 5.0 to handle these details automatically for you. Rather than require the programmer to define the various code and data segments, all that is needed are a few simple key words.

The first is .Model Medium, which tells MASM that the procedures that follow will be Far. Used in conjunction with .Code and .Data, .Model Medium tells MASM that any data you define should be placed into a group named DGROUP. Adding ,Basic after the .Model directive also declares your procedures as Public automatically, so BASIC can access them when your program is linked.

By using the name DGROUP, the linker automatically gathers all of your DB and DW data variables, and places them into the same segment that BASIC uses. While this has the disadvantage of impinging on BASIC's near data space, it also means that on entry to the routine the DS register (which BASIC sets to hold the DGROUP segment) holds the correct segment value for your variables as well.

To show the advantages of simplified directives, contrast the earlier PrtSc with this version that does exactly the same thing:

.Model Medium, Basic
.Code

PrtSc Proc
  Int 5
  Ret
PrtSc Endp
End

MASM 5.1 introduced additional simplified directives that let you access incoming parameters by name, rather than as offsets from BP. All of the remaining examples in this chapter take advantage of simplified directives, as the following revised listing for GetDrive illustrates.

;Syntax: CALL GetDrive(Drive%)

.Model Medium, Basic
.Data
   ;-- if variables were needed they would be placed here
.Code

GetDrive Proc, Drive:Word

  Mov  AH,19h      ;tell DOS we want the default drive
  Int  21h         ;call DOS to do it
  Mov  BX,Drive    ;put the address of Drive% into BX
  Cbw              ;clear AH to make a full word
  Mov  [BX],AX     ;then store the answer into Drive%
  Ret              ;return to BASIC

GetDrive Endp      ;indicate the end of the procedure
End                ;and the end of the source file

As you can see, this looks remarkably like a BASIC SUB or FUNCTION procedure, with the incoming parameter listed by name and type as part of the procedure declaration. This greatly simplifies maintaining the code, especially if you add or remove parameters during development. If incoming parameters are defined as shown here using Drive%, code to push BP and then move SP into BP is added for you automatically. When you refer to one of the parameters, the assembler substitutes [BP+##] in the code it generates. Note, however, that the Word identifier for Drive refers to the 2-byte size of its address, and not the fact that Drive% is a 2-byte integer.

Also notice the new Cbw command, which is used here to clear the AH register. Cbw (Convert Byte to Word) expands the byte value held in AL to a full word in AX. A full word is needed to ensure that both the high- and low-byte portions of Drive% are assigned, in case it held a previous value. If the value in AL is positive (between 0 and 127), AH is simply cleared to zero. And if AL is negative (between -128 and -1 or between 128 and 255), Cbw instead sets all of the bits in AH to be on. Thus, the sign of the original number in AL is preserved.

A complementary statement, Cwd (Convert Word to Double Word), converts the word in AX to a double-word in DX:AX. Again, if AX is positive when considered as a signed number, DX is cleared to zero. And if AX is currently negative, DX is set to FFFFh (-1) to preserve the sign. Cbw and Cwd are both one-byte instructions. So whenever the value in AL or AX is known to be positive as a signed number, they are smaller and faster for clearing AH or DX than Mov AH,0 and Mov DX,0 which require two bytes and three bytes respectively. They cannot be used this way when the high bit of AL or AX might be set.
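
As a quick illustration of both instructions, the fragment below loads a few values and notes the expected register contents in the comments. The values are arbitrary and were chosen only for this sketch.

   Mov  AL,5        ;AL holds 05h, a positive byte
   Cbw              ;AX is now 0005h
   Mov  AL,0FEh     ;AL holds -2 when viewed as a signed byte
   Cbw              ;AX is now 0FFFEh, which is still -2
   Cwd              ;DX:AX is now 0FFFFh:0FFFEh, again -2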

Finally, the Ret command that exits the procedure is translated by MASM to include the correct stack adjustment value, based on the number of incoming parameters. If you have multiple exit points from the procedure (equivalent to EXIT SUB), the exit code will be generated multiple times. That is, each occurrence of Ret is replaced with a code sequence to pop the saved registers, and perform the 3-byte Ret # instruction. Therefore, you should always use a single exit point in a routine, and jump to that when you need to exit from more than one place.
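
One way to arrange a single exit point is sketched below. ClampPos is a hypothetical routine made up for this illustration and is not part of any library. It sets Value% to zero if it is negative, and both paths through the code leave by way of the same Ret.

;Syntax: CALL ClampPos(Value%)   (hypothetical routine)

.Model Medium, Basic
.Code

ClampPos Proc, Value:Word

  Mov  BX,Value           ;get the address of Value%
  Cmp  Word Ptr [BX],0    ;compare Value% to zero
  Jge  Done               ;if greater or equal (signed) we're done
  Mov  Word Ptr [BX],0    ;otherwise force Value% to zero

Done:
  Ret                     ;the only Ret, so exit code appears once

ClampPos Endp
End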

Calling Interrupts

Chapter 11 explained how interrupts work, and mentioned that only assembly language can call an interrupt directly. An assembler program uses the Int instruction, and this tells the 8088 to look in the interrupt vector table in low memory to obtain the interrupt procedure's segment and address. Then the procedure is called as if it were a conventional subroutine.

All of the DOS and BIOS services are accessed using interrupts, though there are so many different services that you also have to pass a service number to many of them. Most of the DOS services are accessed through interrupt 21h. Where BASIC uses the &H prefix to indicate a hexadecimal value, assembly language uses a trailing letter H. If you specify a number without an H it is assumed by MASM to be regular decimal. Note that MASM doesn't care if you use upper- or lowercase letters, and knows that either means hexadecimal.

When specifying hexadecimal values to MASM, the first character must always be a digit. That is, 1234h is acceptable, but &HB800 must be entered as 0B800h. Using B800h will generate a syntax error.
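
The rules for numeric constants can be summarized with a few sample lines. The values themselves are arbitrary and serve only as an illustration.

   Mov  AX,1234h    ;hexadecimal, the same as &H1234 in BASIC
   Mov  AX,1234     ;decimal, because there is no trailing H
   Mov  AX,0B800h   ;the leading 0 is needed because B is a letter
   Mov  AX,0b800H   ;upper- and lowercase are treated the same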

DOS and BIOS Services

You have already seen how to call the BIOS routine that prints the screen and the DOS routine that returns the current drive. Let's continue and see how to call some of the other useful routines in the BIOS and DOS.

The next example program, DosVer, shows how to call the DOS service that returns the DOS version number. Like many of the assembler routines that you can use with BASIC, DosVer relies on an existing DOS service to do the real work. In this program you will also learn how to push and pop values on the stack.

The syntax for DosVer is CALL DosVer(Version%), where Version% returns with the DOS version number times 100. That is, if your PC is running DOS version 3.30, then Version% will be assigned the value 330. Manipulating floating point numbers is much more difficult than integers, and the added complexity is not justified for this routine.

The DOS service that retrieves the version number returns with two separate values--the major version number (3 in this case) and the minor number (30). These values are returned in AL and AH respectively. The strategy here is to first multiply AL by 100, and then add AH. The last step is to assign the result to the incoming parameter Version%.

Unfortunately, when you use AL for multiplication, the value 100 must be in a register or memory location. You can't just use MUL AL,100 though it would sure be nice if you could. Further, whenever AL is multiplied the result is placed into the entire AX register. Therefore, DosVer also uses BX to temporarily store the original contents of AX before the two are added together.

As you already have learned, the only register that can be multiplied is AX, or its low-byte portion, AL. MASM knows if you plan to multiply AX or AL based on the size of the argument. For example, Mul BX means to multiply AX by BX and leave the result in DX:AX. Mul CL instead multiplies AL by CL and leaves the answer in AX.
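
The two forms of Mul can be compared in the short sketch below. The operands are arbitrary values picked for the example, and the comments show the results that should be in the registers afterward.

   Mov  AL,3        ;the byte to multiply
   Mov  CL,100      ;a byte-sized argument
   Mul  CL          ;byte multiply: AX is now 300

   Mov  AX,1000     ;the word to multiply
   Mov  BX,500      ;a word-sized argument
   Mul  BX          ;word multiply: DX:AX is now 0007h:0A120h (500000)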

The complete DosVer routine is shown following, and comments explain each step.

;DOSVER.ASM, retrieves the DOS version number

.Model Medium, Basic
.Code

DOSVer Proc, Version:Word

  Mov  AH,30h      ;service 30h gets the version
  Int  21h         ;call DOS to do it

  Push AX          ;save a copy of the version for later
  Mov  CL,100      ;prepare to multiply AL by 100
  Mul  CL          ;AX is now 300 if running DOS 3.xx

  Pop  BX          ;retrieve the version, but in BX
  Mov  BL,BH       ;put the minor part into BL for adding
  Mov  BH,0        ;clear BH, we don't want it anymore
  Add  AX,BX       ;add the major and minor portions

  Mov  BX,Version  ;get the address for Version%
  Mov  [BX],AX     ;assign Version% from AX
  Ret              ;return to BASIC

DOSVer Endp
End

Notice the extra switch that is done with BH and BL. AX is saved onto the stack because multiplying the byte in AL leaves the result as a full word in AX, thus destroying AH. When the version is popped into BX, the minor part is in BH. But you are not allowed to add registers that are different sizes (AX and BH). Further, any number in the high half of a register is by definition 256 times the value of the same number in a low half. Therefore, BH is first copied to BL to reflect its true value. BH is then cleared so it won't affect the result, and finally AX and BX are added.

A better way to save AX and then restore it to BX would be to simply use Mov BX,AX immediately after the call to Interrupt 21h. I used Push and Pop just to show how this is done. As you can see, it is not necessary to pop the same register that was pushed. However, every Push instruction must always have a corresponding Pop, to keep the stack balanced. If a register or other value is on the stack when the final Ret is encountered, that value will be used as the return address which is of course incorrect.
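
The shorter alternative just described might look like the following sketch of the central portion of DosVer. Only the Push and Pop have been replaced with a single Mov; everything else is taken from the listing above.

  Mov  AH,30h      ;service 30h gets the version
  Int  21h         ;call DOS to do it

  Mov  BX,AX       ;copy the version: BL is major, BH is minor
  Mov  CL,100      ;prepare to multiply AL by 100
  Mul  CL          ;AX is now the major version times 100

  Mov  BL,BH       ;put the minor part into BL for adding
  Mov  BH,0        ;clear BH, we don't want it anymore
  Add  AX,BX       ;add the major and minor portions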

Division also acts on AX, or the combination of DX:AX. When you use the command Div BL, the 8088 knows you want to divide AX because BL is a byte-sized argument. It then leaves the result in AL and the remainder, if any, is placed into AH. Similarly, Div CX means that you are dividing the long integer in DX:AX, because CX is a word. The result of this division is assigned to AX, with the remainder in DX.
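
Both forms of division are shown in the sketch that follows. The numbers are arbitrary and were chosen so the results are easy to check by hand.

   Mov  AX,330      ;the word to divide
   Mov  BL,100      ;a byte-sized divisor
   Div  BL          ;AL is now 3, and the remainder 30 is in AH

   Mov  DX,0007h    ;DX:AX holds 0007A120h, which is 500000
   Mov  AX,0A120h
   Mov  CX,1000     ;a word-sized divisor
   Div  CX          ;AX is now 500, and the remainder 0 is in DX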

Accessing BASIC strings in Assembly Language

As Chapter 2 explained, strings are stored very differently than regular numeric variables. BASIC lets you find the address of any variable with the VARPTR function. For integer or floating point numbers, the value VARPTR returns is the address of the actual data. But for strings, VARPTR instead returns the address of a string descriptor.

DOS employs a different method entirely for its strings, using a CHR$(0) to mark the end. This is described separately later in the section "DOS Strings."

BASIC Near Strings

A BASIC string descriptor is a table containing information about the string--that is, its length and address. In Microsoft compiled BASIC a string descriptor is comprised of two words of information. For QuickBASIC and near strings when using BASIC PDS, the first word contains the length of the string and the second holds the address of the first character. Consider the following BASIC instructions:

   X$ = "Assembler"
   V = VARPTR(X$)

V now holds the starting address of the four-byte descriptor for X$. For the sake of argument, let's say that V is now 1234. Addresses 1234 and 1235 will together contain the length of X$ which is 9, and addresses 1236 and 1237 will contain yet another address--that of the first character in X$. You can therefore find the length of X$ using this formula:

   Length = PEEK(V) + 256 * PEEK(V + 1)

And the first character "A" can be located with this:

   Addr = PEEK(V + 2) + 256 * PEEK(V + 3)

You could then print the string on the screen like this:

   FOR C = Addr TO Addr + 8
     PRINT CHR$(PEEK(C));
   NEXT

Therefore, this is a BASIC model for how strings are located by an assembly language program. When you call an assembler routine with a string argument, BASIC first pushes the address of the descriptor onto the stack, before calling the routine. The next example is called Upper, because it capitalizes all of the characters in a string. Even though BASIC offers the UCASE$ and LCASE$ functions, these are relatively slow because they return a copy of the data that has been manipulated. Upper instead capitalizes the data in place very quickly.

The strategy is to first get the descriptor address from the stack. Then Upper puts the length into BX and the address of the string data into SI. Upper steps through the string starting at the end, decrementing BX by one for each character. When BX crosses zero, it is done. A BASIC version is shown first, followed by the assembly language equivalent.

Upper in BASIC:

SUB Upper(Work$) STATIC

  '-- load SI with the address of Work$ descriptor
  SI = VARPTR(Work$)

  '-- assign LEN(Work$) to BX
  BX = PEEK(SI) + 256 * PEEK(SI + 1)

  '-- the address of the first character goes in SI
  SI = PEEK(SI + 2) + 256 * PEEK(SI + 3)

More:
  BX = BX - 1                'point to the end of Work$
  IF BX < 0 GOTO Exit        'no more characters to do
  AL = PEEK(SI + BX)         'get the current character
  IF AL < ASC("a") GOTO More 'skip conversion if too low
  IF AL > ASC("z") GOTO More 'or if too high
  AL = AL - 32               'convert to upper case
  POKE SI + BX, AL           'put character back in Work$
  GOTO More                  'go do it all again

Exit:                        'return to caller
END SUB


Upper in assembly language:

.Model Medium, Basic
.Code

Upper Proc, Work:Word

  Mov  SI,Work    ;load SI with Work$'s descriptor address
  Mov  BX,[SI]    ;put LEN(Work$) into BX
  Mov  SI,[SI+2]  ;SI holds address of the first character

More:
  Dec  BX         ;point to the next prior character
  Js   Exit       ;if sign is negative BX is less than 0
  Mov  AL,[BX+SI] ;put the current character into AL
  Cmp  AL,"a"     ;compare it to ASC("a")
  Jb   More       ;jump if below to More
  Cmp  AL,"z"     ;compare AL to ASC("z")
  Ja   More       ;jump if above to More
  Sub  AL,32      ;convert AL to upper case
  Mov  [BX+SI],AL ;put AL back into Work$
  Jmp  More       ;jump to More

Exit:
  Ret             ;return to BASIC

Upper Endp
End

What's Your Sign?

Notice that for expediency, these routines work backwards from the end of the string. There are a number of shortcuts that you can use in assembly language, and one important one is being able to quickly test the result of the most recent numeric operation. If the program worked forward through the string, it would take three lines of code to advance to the next character, and also require saving the string length separately:

   Inc  BX           ;point to the next character
   Cmp  BX,Length    ;are we done yet?
   Jne  More         ;no, continue

Notice the use of a new form of conditional jump--Js which stands for *Jump if Signed*. Here the code tests the sign of the number in BX, and jumps if it is negative. Though I haven't mentioned this yet, a conditional jump doesn't always have to follow a compare. Although a comparison will set the flags in the 8088 that indicate whether a particular condition is true, so will several other instructions. Some of these are Add, Sub, Dec, and Inc, but not Mov. So instead of having to include an explicit comparison:

   Dec  BX           ;decrement BX
   Cmp  BX,0         ;compare it to zero
   Jl   More         ;jump if less to More

All that is really needed is this:

   Dec  BX
   Js   More

The Dec instruction sets the Sign Flag automatically, just as if a separate compare had been performed.

Conditional Jump Instructions

Besides Je, Jne, and Js, there are a few other forms of conditional jump instructions you should understand. Figure 12-6 lists all of the ones you are likely to find useful.

Command   Meaning
═══════   ══════════════════════════════════════
  Je      Jump if equal
  Jne     Jump if not equal
  Ja      Jump if above (unsigned basis)
  Jna     Jump if not above (unsigned basis)
  Jb      Jump if below (unsigned basis)
  Jnb     Jump if not below (unsigned basis)
  Jg      Jump if greater (signed basis)
  Jng     Jump if not greater (signed basis)
  Jl      Jump if less (signed basis)
  Jnl     Jump if not less (signed basis)
  Jc      Jump if Carry Flag is set
  Jnc     Jump if Carry Flag is clear
  Js      Jump if sign flag is set
  Jns     Jump if sign flag is not set
  Jcxz    Jump if CX is zero
Figure 12-6: The 8088 conditional jump instructions.

You should know that Je and Jne also have an alias command name: Jz and Jnz. These stand for *Jump if Zero* and *Jump if Not Zero* respectively, and they are identical to Je and Jne. In fact, though I didn't mention this earlier, the Repe and Repne string repeat prefixes are sometimes called Repz and Repnz.

Because Je and Jz cause MASM to generate the identical machine code bytes, they may be used interchangeably. In some cases you may want to use one instead of the other, depending on the logic in your program. For example, after comparing two values you would probably use Je or Jne to branch if they are equal or not equal. But after testing for a zero or non-zero value using Or AX,AX you would probably use Jz or Jnz. This is really just a matter of semantics, and either version can be used with the same results.

Also, please understand that Jnb is not the same as Ja. Rather, the case of being Not Below is the same as being Above Or Equal. In fact, MASM recognizes Jae (Jump if Above or Equal) to mean the same thing as Jnb. Likewise, Jbe (Jump if Below or Equal) is the same as Jna, Jge (Jump if Greater or Equal) is the same as Jnl, and Jle (Jump if Less or Equal) is identical to Jng. Again, which form of these instructions you use will depend on how you are viewing the data and comparisons.

Note the special form of conditional jump, Jcxz. Jcxz stands for Jump if CX is Zero, and it combines the effects of Cmp CX,0 and Je label into a single fast instruction. Jcxz is also commonly used prior to a Loop instruction. When you use Loop to perform an operation repeatedly, CX must be assigned initially to the number of times the loop is to be executed. But if CX is zero the loop will execute 65536 times! Thus, adding Jcxz Exit avoids this undesirable behavior if zero was passed accidentally.
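
A loop guarded this way might be sketched as follows. Count is a hypothetical word variable assumed to be defined elsewhere in DGROUP, and the loop body simply adds the numbers from Count down to 1 so there is something to repeat.

   Mov  AX,0        ;AX accumulates the total
   Mov  CX,Count    ;load the number of passes to make
   Jcxz Done        ;if Count is zero skip the loop entirely
Again:
   Add  AX,CX       ;add the current count to the total
   Loop Again       ;decrement CX and repeat until it reaches zero
Done: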

Finally, you must be aware that a conditional jump cannot be used to branch to a label that is more than 128 bytes earlier, or 127 bytes farther ahead in the code. A conditional jump instruction is only two bytes, with the first indicating the instruction and the other holding the branch distance. If you need to jump to a label farther away than that you must reverse the sense of the condition, and jump to a near label that skips over another, unconditional jump:

   Cmp  AX,BX             ;we want to jump to Label: if AX is greater
   Jna  NearLabel         ;so jump to NearLabel if it's NOT greater
   Jmp  Label             ;this goes to Label: which is farther away
NearLabel:                ;execution continues here otherwise

As used here, the unconditional Jmp instruction can branch to any location within the current code segment. There is also a short form of Jmp, which requires only two bytes of code instead of three. If you are jumping backwards in the program and the address is within 128 bytes, MASM uses the shorter form automatically. But if the jump is forward, you should specify Short explicitly: Jmp Short Label. Some non-Microsoft assemblers do not require you to specify Short; the newest MASM version 6.x also adjusts its generated code to avoid the extra wasted byte.

DOS Strings

When string information is passed to a DOS routine, for example when giving a file or directory name, the string must end with a CHR$(0). In DOS terminology this is called an ASCIIZ string. (Do not confuse this with a CHR$(26) Ctrl-Z which marks the end of a file.) Unlike BASIC, DOS does not use string descriptors, so this is the only way DOS can tell when it has reached the end. By the same token, when DOS returns a string to a calling program, it marks the end with a trailing zero byte.

When passing a string to a DOS service from BASIC you must either concatenate a CHR$(0) manually, or add extra code within the assembler routine to copy the name into local storage and add a zero byte to the copy. From BASIC you would therefore use something like this:

   CALL Routine(FileName$ + CHR$(0))
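
On the assembly language side, a routine that receives such a string only needs to hand DOS the address of the first character. The sketch below assumes a near string as used by QuickBASIC, and SetDir is a hypothetical name chosen for this example. It calls DOS service 3Bh, which changes the current directory to the ASCIIZ path that DS:DX points to. For brevity it ignores the Carry Flag that DOS sets when the call fails.

;Syntax: CALL SetDir(Path$ + CHR$(0))   (hypothetical routine)

.Model Medium, Basic
.Code

SetDir Proc, Path:Word

  Mov  BX,Path     ;get the address of the descriptor
  Mov  DX,[BX+2]   ;DS:DX now points to the first character
  Mov  AH,3Bh      ;service 3Bh changes the current directory
  Int  21h         ;call DOS to do it
  Ret              ;return to BASIC

SetDir Endp
End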

BASIC Fixed-Length Strings

Fixed-length strings and the string portion of a TYPE variable do not use a string descriptor, which you might think would require a different strategy to access them. But whenever a fixed-length string is used as an argument to an assembler routine or BASIC subprogram, BASIC first copies it into a temporary conventional string, and it is the temporary string that is passed to the routine. When the routine returns, BASIC copies the characters back into the original fixed-length string. Thus, any routine written in assembly language that expects a descriptor will work correctly, regardless of the type of string being sent.

Of course, this copying requires BASIC to generate many extra bytes of assembler code for each call. If you do not want BASIC to create a temporary string copy from one of a fixed-length, you must first define the string as a TYPE like this:

   TYPE FLen
     S AS STRING * 20
   END TYPE
   DIM FString AS FLen

Though this appears to be the same as defining FString as a string with a fixed length of 20, there is an important difference: declaring it as a TYPE tells BASIC not to make a copy. That is, BASIC does not treat FString as a string, as long as the ".S" portion that identifies it as a string is not used. Here's an example based on the FLen TYPE that was defined above:

   DIM FString AS FLen           'FString is a TYPE variable
   FString.S = "This is a test"  'assign the string portion
   CALL Routine(FString)         'call the routine without .S

Here, the address of the first character in the string is passed to the routine, as opposed to the address of a temporary string descriptor. We have told BASIC to call Routine, and pass it the entire FString TYPE but without interpreting the .S string component. This next example does cause BASIC to create a temporary copy:

   CALL Routine(FString.S)

The short assembly language routine that follows expects the address of a fixed-length string with a length of 20, as opposed to the address of a string descriptor. The routine then copies the characters to the upper-left corner of a color monitor.

   Push BP         ;access the stack as usual
   Mov  BP,SP
   Mov  SI,[BP+6]  ;SI points to the first character
   Mov  DI,0       ;the first address in screen memory
   Mov  AX,0B800h  ;color monitor segment when in text mode
   Mov  ES,AX      ;move into ES through AX
   Mov  CX,20      ;prepare to copy 20 characters
   Cld             ;clear the direction flag to copy forward

More:
   Movsb           ;copy a byte to screen memory
   Inc  DI         ;skip over the attribute byte
   Loop More       ;loop until done
   Pop  BP         ;restore BP
   Ret  2          ;return to BASIC

Recall that the color monitor segment value of 0B800h must be assigned to ES through AX, because it is not legal to assign a segment register from a constant. Also, notice the way that DI is cleared to zero. Although Mov DI,0 indeed moves a zero into DI, this is not the most efficient way to clear a register. Any time a numeric value is used in a program (0 in this case), that much extra space is needed to store the actual value as part of the instruction. A preferred method for clearing a register is with the Xor instruction. That is, Xor DI,DI gives the same result as Mov DI,0 except it is one byte shorter and slightly faster.

When Xor is performed on any two values, only those bits that are different are set to 1. But since the same register is used here for both operands, all of the result bits will be cleared to 0. The code for using Xor is decidedly less obvious, but you'll see Xor used this way very often in assembly listings in magazines and books. Another, equally efficient way to clear a register is to subtract it from itself using Sub AX,AX.

Far Strings in BASIC PDS

Accessing near strings in QuickBASIC and BASIC PDS is a relatively simple task, because both the descriptor and the string data are known to be in near DGROUP memory. But BASIC PDS also supports far strings, where the data may be in a different segment. The composition of a far string descriptor was shown in Chapter 2; however, you do not need to manipulate these descriptors yourself directly.

BASIC PDS includes two routines--StringLength and StringAddress--that do the work of locating far strings for you. Further, because Microsoft could change the way far strings are organized in the future, it makes the most sense to use the routines Microsoft supplies. If the layout of far string descriptors changes, your program will still work as expected.

StringLength and StringAddress expect the address of the string descriptor, and they return the string's length and segmented address respectively. Note that while far string data may be in nearly any segment, the descriptors themselves are always in DGROUP. Also note that these routines are not very well-behaved. In particular, registers you may be using are changed by the routines. To solve this problem and also to let you get all of the information in a single call, I have written the StringInfo routine. StringInfo is contained in the FAR$.ASM file on the accompanying disk.

;from an idea originally by Jay Munro
.Model Medium, Basic
  Extrn StringAddress:Proc ;these are part of PDS
  Extrn StringLength:Proc

.Code

StringInfo Proc Uses SI DI BX ES

  Pushf                    ;save the flags manually

  Push ES                  ;save ES for later
  Push SI                  ;pass incoming descriptor
  Call StringAddress       ;call the PDS routine

  Pop  ES                  ;restore ES for StringLength
  Push AX                  ;save offset and segment
  Push DX                  ;  returned by StringAddress

  Push SI                  ;pass incoming descriptor
  Call StringLength        ;get the length
  Mov  CX,AX               ;copy the length to CX

  Pop  DX                  ;retrieve the saved Segment
  Pop  AX                  ;and the address

  Popf                     ;restore the flags manually
  Ret                      ;restore registers and return

StringInfo Endp
End

StringInfo is called with DS:SI pointing to the string descriptor, and it returns the length in CX and the address of the string data in DX:AX. Although StringInfo could be designed to return the segment in DS or ES, it is safer to assign the segment registers yourself manually.

Notice the Uses clause--this tells MASM that the named registers must be preserved, and generates additional code to push those registers upon entry to the procedure, and pop them again upon exit.
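
To make the effect of Uses more concrete, the fragment below sketches the kind of code MASM can be expected to generate for the StringInfo declaration shown above. The exact sequence and order are an assumption based on the description here, so check an assembler listing if the details matter to you.

   Push SI          ;generated on entry, in the order listed
   Push DI
   Push BX
   Push ES
    .               ;the body of the procedure goes here
    .
   Pop  ES          ;generated where Ret appears, in reverse order
   Pop  BX
   Pop  DI
   Pop  SI
   Ret              ;then the far return to the caller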

Also notice the new Extrn directive at the beginning of the source file. It tells the assembler that the stated routines are not in the current source file. MASM then places the external name in the object file header, with instructions for LINK to fill in the address portion of the Call. Data must also be declared as external if it is not in the same source file as the routine being assembled. When a data item is to be made available to other modules, you must also have a corresponding Public statement in that file for the same reason:

   .Model Medium, Basic
     Public MyData
   .Data
     MyData DW 12345

Accessing Arrays

As you have seen, a conventional variable is passed to an assembly language subroutine by placing its address onto the stack. If the variable is a string, then the address passed is that of its descriptor, and the string data address is read from there. Accessing array elements is only slightly more involved, because array elements are always stored in adjacent memory locations. Let's look first at integer arrays.

When BASIC encounters the statement DIM X%(100) in your program, it allocates a contiguous block of memory 202 bytes long. (Unless you first used the statement OPTION BASE 1, dimensioning an array to 100 means 101 elements.) The first two bytes in this block hold the data for X%(0), the next two bytes hold X%(1), and so forth. When you ask VARPTR to find X%(0), the address it returns is the start of this block of memory.

The address of subsequent array elements may then be easily computed from this base address. But with a dynamic array, the segment that holds the array may not be the same as the segment where regular variables are stored. Also, huge arrays that span more than 64K require extra care when crossing a 64K segment boundary.

String arrays are structured in a similar fashion, in that each element follows the previous one in memory. For each string array element that is dimensioned, four bytes are set aside. These bytes comprise a table of descriptors which contain the length and address words for each element in the array. But the important point is that once you know where one element or string descriptor is located, it is easy to find all of those that are adjacent. Following is a QuickBASIC example that shows how to locate Array$(15), based on the VARPTR address of Array$(0).

DIM Array$(100)
Array$(15) = "Find me"

Descriptor = VARPTR(Array$(0))
Descriptor = Descriptor + (4 * 15)

Length = PEEK(Descriptor) + 256 * PEEK(Descriptor + 1)
PRINT "Length ="; Length

Addr = PEEK(Descriptor + 2) + 256 * PEEK(Descriptor + 3)
PRINT "String = ";
FOR X = Addr TO Addr + Length - 1
  PRINT CHR$(PEEK(X));
NEXT
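
The same arithmetic carries over directly to assembly language. What follows is only a minimal sketch: it assumes near strings, a routine that was called with Array$(0) as its only argument (so the address of that element's descriptor is at [BP+6]), and that BP has already been set up to address the stack.

   Mov  BX,[BP+6]      ;BX = address of the Array$(0) descriptor
   Add  BX,4 * 15      ;step over fifteen four-byte descriptors
   Mov  CX,[BX]        ;CX = length of Array$(15)
   Mov  SI,[BX+2]      ;SI = address of its first character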

Dynamic Arrays

Most of the routines shown so far manipulated variables that are located in near memory. BASIC can store numeric, TYPE, and fixed-length string arrays in far memory, and additional steps are needed to read from and write to those arrays.

When an assembly language routine receives control after a call from BASIC, it can access your regular variables because they are in the default data segment. Most memory accesses assume the data is in the segment held in the DS register. For example, the statement Mov [BX],AX assigns the value in AX to the memory location identified by BX within the segment held in DS. Likewise, Sub [DI+10],CX subtracts the value held in CX from the memory address expressed as DI+10, where that address is again in the default data segment.

It is also possible to specify a segment other than the current default. One way is with a *segment override* command, like this:

   Mov ES:[BX],AX

Here, the segment held in ES is used instead of DS. A segment override adds only one byte of code, so it is quite efficient. If you plan to access data in a different segment many times, you can optionally set DS to that segment. However, it is mandatory that you reset DS to its original value before returning to BASIC. You must also understand that once DS has been changed you no longer have direct access to DGROUP. In that case you could use the stack segment as an override, since the stack segment is always the same as the data segment in a BASIC program. The next short example shows this in context.

   Push DS                ;save DS
   Mov DS,FarSegment      ;now DS points to your far data
    .                     ;access that far data here
   Mov AX,SS:[Variable]   ;access Variable in DGROUP
    .                     ;access more far data here
   Pop DS                 ;restore DS before returning

When Microsoft introduced QuickBASIC version 2.0, one of the most exciting new features it offered was support for dynamic numeric arrays. Unlike QuickBASIC near strings, string arrays, and non-array variables, these arrays are always located outside of BASIC's near 64K data segment. This means that an assembler routine needs some way to know both the address and the segment for an array element that is passed to it.

In general, routines you design that work on an entire array will be written to expect a particular starting element. The routine can then assume that all of the subsequent elements lie before or after it in memory. Unfortunately, this does not always work unless you add extra steps. If you call an assembly language routine passing one element of a far-memory dynamic array like this:

   CALL Routine(Array(1))

BASIC makes a copy of the array element into a temporary variable in near memory, and then passes the address of that copy to the routine. Thus, while the routine can still receive an array element's value, it has no way to determine its true address. And without the address, there is no way to get at the rest of the array.

Since being able to pass an entire array is obviously important, BASIC supports two options to the CALL command--SEG and BYVAL. The SEG keyword indicates that both the address and the segment are to be passed on the stack, and it also tells BASIC not to make a copy of the array element. SEG is used with an array element (or any variable, for that matter) like this:

   CALL Routine(SEG Array%(1))

You could also send the segment and address manually, like this:

   CALL Routine(BYVAL VARSEG(Array%(1)), BYVAL VARPTR(Array%(1)))

In both cases, BASIC first pushes the segment where the element resides onto the stack, followed by the element's address within that segment. By pushing them in this order the routine can conveniently use either Lds (Load DS) or Les (Load ES) to get both the segment and address in one operation:

   Les DI,[BP+6]       ;if using manual stack addressing


   Les BX,[StackArg]   ;if using MASM's simplified directives

Les loads four bytes in one operation, placing the lower word at [BP+6] into the named register (DI in the first example case), and the higher word at [BP+8] into ES. Lds works the same, except the higher word is instead moved into DS. Once the segment and address are loaded, you can access all of the array elements:

   Push DS              ;save DS
   Lds  SI,[BP+6]       ;now DS:SI points at first element
   Mov  [SI],AX         ;assign Array%(1) from AX
   Add  SI,2            ;now SI points at the next element
   Mov  [SI],BX         ;assign Array%(2) from BX
   Pop  DS              ;restore DS
    .                   ;continue

If Les were used instead of Lds, then an ES: override would be needed to assign the elements. Although you must always preserve the contents of DS regardless of the version of BASIC, some registers need to be saved only when using BASIC PDS far strings. Other registers do not need to be saved at all. Figure 12-7 shows which registers must be preserved based on the version of BASIC.

 QuickBASIC and       BASIC PDS
PDS near strings     far strings
═══════════════      ══════════
      DS                 DS
      SS                 SS
      BP                 BP
      SP                 SP
Figure 12-7: The registers that must be preserved in an assembly language subroutine.

Besides having to save and restore the registers shown in Figure 12-7, you must also be sure that the Direction Flag is cleared to forward before returning to BASIC. The Direction Flag affects the 8088 string operations, and is by default set to forward. You can usually ignore the direction flag unless you set it to backwards explicitly with the Std instruction. In that case, you must use a corresponding Cld command.
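
For instance, a backwards block copy might be bracketed as shown here. This is only a sketch, and it assumes that DS:SI and ES:DI already point at the last byte of the source and destination, and that CX holds the number of bytes to copy:

   Std               ;string instructions now count downward
   Rep  Movsb        ;copy CX bytes, highest address first
   Cld               ;put the Direction Flag back to forward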

Huge Arrays

A huge array is one that spans more than one 64K segment, and as you can imagine, it requires extra steps to access all of the elements. That is, the assembler routine must know which elements are in what segment, and manually load those segments as needed. The following code fragment shows how to walk through all of the elements in a huge integer array, and just for the sake of the example adds each element to determine the sum of all of them.

A simple setup example and call syntax for this routine is as follows:

   REDIM Array&(1 TO 30000)
   FOR X% = 1 TO 30000
     Array&(X%) = X%
   NEXT

   CALL SumArray(SEG Array&(1), 30000, Sum&)
   PRINT "Sum& ="; Sum&

And here's the code for the SumArray routine:

.Model Medium, Basic
.Code

SumArray Proc Uses SI, Array:DWord, NumEls:Word, Sum:Word

  Push DS          ;save DS so we can restore it later
  Push SI          ;PDS far strings require saving SI too

  Xor  AX,AX       ;clear AX and DX which will accumulate
  Mov  DX,AX       ; the total

  Mov  BX,NumEls   ;get the address for NumElements%
  Mov  CX,[BX]     ;read NumElements% before changing DS
  Lds  SI,Array    ;load the address of the first element
  Jcxz Exit        ;exit if NumElements = 0

Do:
  Add  AX,[SI]     ;add the value of the low word
  Adc  DX,[SI+2]   ;and then add the high word
  Add  SI,4        ;point to the next array element

  Or   SI,SI       ;are we beyond a 32k boundary?
  Jns  More        ;no, continue

  Sub  SI,8000h    ;yes, subtract 32k from the address
  Mov  BX,DS       ;copy DS into BX
  Add  BX,800h     ;adjust the segment to compensate
  Mov  DS,BX       ;copy BX back into DS

More:
  Loop Do          ;loop until done

Exit:
  Pop  SI          ;restore SI for BASIC
  Pop  DS          ;restore DS and gain access to Sum&
  Mov  BX,Sum      ;get the DGROUP address for Sum&
  Mov  [BX],AX     ;assign the low word
  Mov  [BX+2],DX   ;and then the high word

  Ret              ;return to BASIC

SumArray Endp
End

The segment bounds checking is handled by the six lines that start with Or SI,SI. The idea is to see if the address is beyond 32767, subtract 32768 if it is, and then adjust the segment to compensate. The most direct way would have been with Cmp SI,32767 and then Jbe More, but Cmp used this way generates four bytes of code, whereas Or creates only two bytes. Since Or sets the Sign flag if the number is negative (above 32767), you can use it to know when the address adjustment is needed.

Because it is not legal to add or subtract a segment register, DS is first copied to BX, 800h is added to that, and the result is then copied back to DS. 800h is used instead of 8000h (32768) because a new segment begins every 16 bytes. [That is, adding 800h to a segment value is the same as adding 8000h to the address.]
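
To see that the adjustment leaves the routine pointing at the same byte, it may help to work through one case by hand. The starting values 1234h and 8000h below are made up purely for illustration:

   ;before:  DS = 1234h, SI = 8000h
   ;         1234h * 10h + 8000h = 12340h + 8000h = 1A340h

   Sub  SI,8000h    ;SI = 0000h
   Mov  BX,DS       ;BX = 1234h
   Add  BX,800h     ;BX = 1A34h
   Mov  DS,BX       ;DS = 1A34h

   ;after:   DS = 1A34h, SI = 0000h
   ;         1A34h * 10h + 0000h = 1A340h, the same location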

SumArray also introduces a new instruction: Adc means Add with Carry, and it is used to add long integer values that by definition span two words. When you add two registers--say, AX and BX--if the result exceeds 65535 only the remainder is saved. However, the Carry Flag is set to indicate the overflow condition. Adc takes this into account, and adds one extra to its result if the Carry Flag is set. Therefore, whenever two long integers are added you'll use Add to combine the lower words, and Adc for the high words. Similarly, subtracting long integers requires that you use Sub to subtract the lower words and then Sbb (Subtract with Borrow) on the upper words.
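
As a short sketch of the subtraction case, assume that SI and DI each hold the address of a long integer in the default data segment. These register assignments are my own choice for the example:

   Mov  AX,[SI]      ;load the low word of the first value
   Mov  DX,[SI+2]    ;and then its high word
   Sub  AX,[DI]      ;subtract the low words, which may borrow
   Sbb  DX,[DI+2]    ;subtract the high words and that borrow
                     ;DX:AX now holds the long integer result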

Although the details are hidden from you, when more than one parameter is passed to an assembly language routine it is the last in the list that is at [BP+6] on the stack. The previous argument is at [BP+8], and the one before that is at [BP+10]. Because the stack grows downward as new items are pushed onto it, each subsequent item is at a lower address.
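
If SumArray had been written with manual stack addressing instead, its three arguments would be found roughly as follows. Treat this as a sketch of the layout rather than a replacement for the listing above; the one thing to notice is that a SEG argument occupies four bytes, not two.

   Push BP            ;save BASIC's BP
   Mov  BP,SP         ;and address the stack through BP
   Mov  BX,[BP+8]     ;address of the element count
   Mov  CX,[BX]       ;CX = number of elements
   Les  SI,[BP+10]    ;ES:SI = Array&(1), segment is at [BP+12]
   Mov  BX,[BP+6]     ;address of Sum&, the last argument
    .                 ;do the work here
   Pop  BP            ;restore BP
   Ret  8             ;discard 2 + 2 + 4 bytes of arguments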

Finally, in a real program this routine would probably be designed as a function. Using a function avoids having to pass the Sum& parameter to receive the returned value, and helps reduce the size of the program.

Assembler Functions

Designing a procedure as a function lets you return information to a program, but without the need for an extra passed parameter. Functions are also useful because BASIC performs any necessary data type conversion automatically. For example, if you have written a function that returns an integer value, you can freely assign the result to a single precision variable.

You can also test the result of a function directly using IF, display it directly with PRINT, or pass it as a parameter to another procedure. Some typical examples are shown here:

   SingleVar! = MyFunction%

   IF YourFunction&(Argument%) > 1004 THEN ...

   PRINT HisFunction$(Any$)

Beginning with QuickBASIC version 4.0, functions written in assembly language may be added to a BASIC program. To have a function return an integer value, simply place the value into the AX register before returning to BASIC. If the function is to return a long integer, both DX and AX are used. In that case, DX holds the higher word and AX holds the lower one.
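
Here is a minimal sketch of a long integer function. LongMul and its parameter names are hypothetical, made up for this example. It relies on the fact that a word-sized IMul leaves its 32-bit product in DX:AX, which is exactly where BASIC expects a long integer result.

;DECLARE FUNCTION LongMul&(Num1%, Num2%)
;Product& = LongMul&(X%, Y%)

.Model Medium, Basic
.Code

LongMul Proc, Num1:Word, Num2:Word

  Mov  BX,Num1         ;get the address for Num1%
  Mov  AX,[BX]         ;load its value into AX
  Mov  BX,Num2         ;get the address for Num2%
  IMul Word Ptr [BX]   ;signed multiply, DX:AX = AX * Num2%

  Ret                  ;return to BASIC with DX:AX set

LongMul Endp
End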

String Functions

String functions are only slightly more complicated to design. A string function also uses AX as a return value, but in this case AX holds the address of a string descriptor you have created. The complete short string function that follows accepts an integer argument, and returns the string "False" if the argument is zero or "True" if it is not.

;DECLARE FUNCTION TrueFalse$(Argument%)
;Answer$ = TrueFalse$(Argument%)

.Model Medium, Basic
.Data
  DescLen DW 0
  DescAdr DW 0
  True    DB "True"
  False   DB "False"

.Code
TrueFalse Proc, Argument:Word

  Mov  DescLen,4            ;assume true
  Mov  DescAdr,Offset True

  Mov  BX,Argument          ;get the address for Argument%
  Cmp  Word Ptr [BX],0      ;is it zero?
  Jne  Exit                 ;no, so we were right
  Inc  DescLen              ;yes, return five characters
  Mov  DescAdr,Offset False ;and the address of "False"

Exit:
  Mov  AX,Offset DescLen    ;show where the descriptor is
  Ret                       ;return to BASIC

TrueFalse Endp
End

Although the function is declared using a dollar sign in the name, the actual procedure omits that. [The dollar sign merely tells BASIC what type of information will be returned. It is not part of the actual procedure name.] TrueFalse begins by defining a string descriptor in the .Data segment. It is also possible to store strings and other data in the code segment and access it with a CS: segment override. However, data that is returned as a function must be in DGROUP, and so must the descriptor.

The first two statements assign the descriptor to an output string length of four characters, and the address of the message "True". Then, the address of Argument is obtained from the stack, and its value is compared to zero. If it is not zero, then the descriptor is already correct and the function can proceed. Otherwise, the descriptor length is incremented to reflect the correct length, and the address portion is reassigned to show where the string "False" begins in memory. In either case, the final steps are to load AX with the address of the descriptor, and then return to BASIC.

MASM also lets you access data using simple arithmetic. For example, the descriptor could have been defined as a single pair of words with one name, and the second word could be accessed based on the address of the first one like this:

     Descriptor DW 0, 0
     True       DB "True"
     False      DB "False"

     Inc  Descriptor
     Mov  Descriptor+2,Offset False

Far String Functions

Far string functions require more work to write than near string functions, because of the added overhead needed to support far strings. Fortunately, BASIC includes routines that simplify the task for you. Actually, the routines to create and assign strings have always been included; it's just that Microsoft never documented how to do it before BASIC 7.0. Later in this chapter I'll show code to create strings that works with all versions of BASIC 4.0 or later.

The StringAssign routine expects six arguments on the stack, for the segment, address, and length of both the source and destination strings. StringAssign can assign from or to any combination of fixed- and variable-length strings. If the length argument for either string is zero, then StringAssign knows that the address is that of a descriptor. Otherwise, the address is of the data in a fixed-length string.

Because of the added overhead of obtaining values and pushing them on the stack, I have created a short wrapper program that does this for you. MakeString accepts the same arguments as StringAssign, but they are passed using registers rather than on the stack. Of course, calling one routine that in turn calls another takes additional time. But the savings in code size when MakeString is called repeatedly will overshadow the very slight additional delay.

MakeString is called with DX:AX holding the segmented address of the source string, and CX holding its fixed length. If the source is a conventional string, CX is set to zero to indicate that. The destination address is identified with DS:DI, using BX to hold the length. Again, BX holds zero if the destination is not a fixed-length string.

;from an idea originally by Jay Munro
.Model Medium, Basic
  Extrn STRINGASSIGN:Proc ;BASIC's string assignment routine

.Code
MakeString Proc Uses DS

  Push DX           ;push the segment of the source string
  Push AX           ;push the address of the source string
  Push CX           ;push the string length
  Push DS           ;push the segment of the destination
  Push DI           ;push the address of the destination
  Push BX           ;push the destination length

  Call STRINGASSIGN ;call BASIC to assign the string
  Ret               ;return to the caller

MakeString Endp
End

Now, with the assistance of MakeString, TrueFalse$ can be easily modified to work with BASIC 7 far strings:

.Model Medium, Basic
  Extrn MakeString:Proc        ;this is in FAR$.ASM

.Data
  Descriptor DW 0, 0           ;the output string descriptor
  True       DB "True"
  False      DB "False"

.Code
TrueFalse Proc Uses ES DS SI DI, Argument:Word

  Mov  CX,4             ;assume true
  Mov  AX,Offset True

  Mov  BX,Argument      ;get the address for Argument%
  Cmp  Word Ptr [BX],0  ;is it zero?
  Jne  @F               ;no, so we were right

  Inc  CX               ;yes, assign five characters
  Mov  AX,Offset False  ;and use the address of "False"

@@:
  Mov  DX,DS                ;DX:AX = segment and address of source
  Mov  DI,Offset Descriptor ;DS:DI = the destination descriptor
  Xor  BX,BX                ;assign to a descriptor
  Call MakeString           ;let MakeString do the work

  Mov  AX,DI            ;AX = address of output descriptor
  Ret                   ;return to BASIC

TrueFalse Endp
End

Notice the introduction of the new at-symbol (@) assembler directive. The at-symbol and double at-symbol label are quite useful, because they let you avoid having to create unique label names each time you specify the target of a jump. As with BASIC, creating many different label names is a nuisance, and also impinges on the assembler's working memory. When a label is defined using @@: as a name, you can jump forward to it using @F or backwards using @B. Multiple @@: labels may be used in the same program, and @F and @B always branch to the nearest one in the stated direction.
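
A backward jump works the same way. The short sketch below clears ten words of memory, and assumes only that BX already holds the address of the first word in the default data segment:

   Mov  CX,10        ;ten words to clear
   Xor  AX,AX        ;AX = 0
@@:
   Mov  [BX],AX      ;clear the current word
   Add  BX,2         ;point to the next one
   Loop @B           ;back to the nearest @@: above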

Floating Point Functions

Single and double precision functions are handled in yet another manner. Although a single precision value could be returned in the DX:AX register combination, a double precision result would need four registers, which is impractical. Further, a floating point number is most useful to BASIC if it is stored in a memory location, rather than in registers.

When BASIC invokes a floating point function it adds an extra, dummy parameter to the end of the list of arguments you pass. If no parameters are being used, it creates one. This parameter is the address into which your routine is to place the outgoing result. Because of this added parameter, it is essential that you account for it when returning to BASIC. Thus, a function without arguments must use Ret 2, a function with one argument needs Ret 4, and so forth. Since we're using MASM's simplified directives, all that is needed is to create an extra parameter name.
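
With manual stack addressing, a floating point function that accepts one argument would therefore be laid out roughly like this. It is only a sketch of the stack offsets, not a complete routine:

   Push BP            ;save BASIC's BP
   Mov  BP,SP         ;and address the stack through BP
   Mov  BX,[BP+8]     ;address of the one declared argument
    .                 ;compute the result here
   Mov  BX,[BP+6]     ;address BASIC added for the result
    .                 ;store the result there
   Pop  BP            ;restore BP
   Ret  4             ;two addresses to remove, not one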

The short double precision function that follows squares a double precision number much faster than using Value# ^ 2, and also shows how to perform simple floating point math using assembly language. You will declare and invoke Square like this:

   DECLARE FUNCTION Square#(Variable#)
   Result = Square#(Variable#)

;SQUARE.ASM, squares a double precision number
;WARNING: This file must be assembled using /e (emulator).

.Model Medium, Basic
.8087                   ;allow 8087 instructions
.Code

Square Proc, InValue:Word, OutValue:Word

  Mov  BX,InValue       ;get the address for InValue
  FLd  QWord Ptr [BX]   ;load InValue onto the 8087 stack
  FMul QWord Ptr [BX]   ;multiply InValue by itself

  Mov  BX,OutValue      ;get the address for OutValue
  FStp QWord Ptr [BX]   ;store the result there
  FWait                 ;wait for the 8087 to finish

  Mov  AX,BX            ;return DX:AX holding the full
  Mov  DX,DS            ;  address of the output value
  Ret                   ;return to BASIC

Square Endp
End

This Square function illustrates several important points. The first is the use of MASM's /e switch, which lets an assembly language routine share BASIC's floating point emulator. When a BASIC program begins, it looks to see if an 8087 coprocessor is installed in the host PC. If so, it uses one set of library routines; otherwise it uses another.

The library routines that use an 8087 simply modify the caller's code to change the floating point interrupts that BASIC generates into actual 8087 instructions. They then return to the instruction just created and execute it. Although this adds to the time needed to perform a floating point operation, the code is patched only once. Thus, statements within a FOR or DO loop operate very quickly after the first iteration. This is very much like the method used by the BRUN library described in Chapter 1.

When no coprocessor is detected, the floating point interrupts that BASIC generates are used to invoke routines in BASIC's floating point software emulator. As its name implies, an emulator imitates the behavior of a coprocessor using assembly language commands. A coprocessor can perform a variety of floating point operations, including addition, multiplication, and rounding, as well as some transcendental functions such as logarithms and arctangents.

When you use the /e switch, MASM adds extra information to the object file header that tells LINK where to patch your 8087 instructions. LINK can then change your code to the equivalent floating point interrupts, similar to the way BASIC patches its own code to change the interrupts to 8087 instructions. Therefore, when you write floating point code that will be called from BASIC, your routine can tie into BASIC's emulator, and use it automatically if no coprocessor is installed.

Also, notice the .8087 directive which tells MASM not to issue an error message when it sees those instructions. Other, similar directives are .80287 and .80387, and also .80286 and .80386. These directives inform MASM that you are intentionally using advanced commands that require these processors, and have not made a typing error.

The actual body of the Square function is fairly simple. First, the address of the incoming value is retrieved from the system stack, and then the data at that address is loaded onto the coprocessor's stack using the FLd (Floating point Load) instruction. Since this is a double precision value, QWord Ptr (Quad Word Pointer) is needed to indicate the size of the data. Had the incoming value been single precision, DWord Ptr (Double Word Pointer) would be used instead. One important feature of an 8087 or software emulator is that a number may be converted from one numeric format to another simply by loading it as one data type, and then saving it as another.
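
As a small sketch of that last point, the three instructions below convert a single precision value to double precision. The example assumes SI holds the address of the four-byte source, DI holds the address of an eight-byte destination, and both are in the default data segment:

   FLd  DWord Ptr [SI]   ;load a single precision value
   FStp QWord Ptr [DI]   ;store the same value as double precision
   FWait                 ;wait for the store to complete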

The next instruction, FMul (Floating point Multiply), multiplies the value currently on the 8087 stack by the value at the same address. Since the original value is still present in memory, there's no need to make a new copy. Next, the destination address is placed into BX, and the result now on the 8087 stack is stored there. The trailing letter p in the FStp instruction specifies that the value loaded earlier is to be popped from the coprocessor stack.

A complete discussion of 8087 instructions and how the coprocessor stack operates goes beyond what I can hope to cover here. When in doubt about what instruction is needed, I suggest that you code a similar sample in BASIC, and then examine the code BASIC generates using CodeView. There are also several books that focus on writing floating point instructions in assembly language.

The last 8087 instruction is FWait, and it tells the 8088 to wait until the coprocessor has finished, before continuing. Because an 8087 is a true coprocessor, it operates independently of the main 8088 CPU. Once a value is loaded and the 8087 is instructed to perform an operation, the 8087 returns immediately to the program that issued the instruction and continues to process the numbers in the background. If Square exited immediately and BASIC read the returned value, there's a good chance that the 8087 did not finish and the value has not yet been stored! In that case, whatever happened to be in memory at that time would be the value that BASIC uses, which is obviously incorrect.

Experienced 8087 programmers know how long the various coprocessor instructions take to complete, and with careful planning the number of FWait commands can be kept to a minimum. However, the code that BASIC generates always finishes with an FWait. Of course, there is no need to wait when the emulator is in use. In fact, an FWait is patched by BASIC to do nothing (Mov AX,AX), rather than waste time invoking an empty interrupt handler repeatedly.

As shown, Square can be added to a Quick Library for use with either QuickBASIC or BASIC PDS. Unfortunately, the information LINK needs to patch 8087 instructions is available only with BASIC PDS. Therefore, the following file is included in the libraries on the accompanying disk, to supply the external data that LINK requires.

;FIXUPS.ASM, deciphered by Paul Passarelli

  FIARQQ  Equ 0FE32h
  FJARQQ  Equ 04000h
  FICRQQ  Equ 00E32h
  FJCRQQ  Equ 0C000h
  FIDRQQ  Equ 05C32h
  FIERQQ  Equ 01632h
  FISRQQ  Equ 00632h
  FJSRQQ  Equ 08000h
  FIWRQQ  Equ 0A23Dh

  Public  FIARQQ
  Public  FJARQQ
  Public  FICRQQ
  Public  FJCRQQ
  Public  FIDRQQ
  Public  FIERQQ
  Public  FISRQQ
  Public  FJSRQQ
  Public  FIWRQQ

These values are added to the floating point instruction bytes during the linking process, and the addition converts those statements into equivalent BASIC floating point interrupt commands. For example, the 8087 statement Fld DWord Ptr [1234h] is represented in memory as the following series of hexadecimal bytes:

   9B D9 06 34 12

After LINK adds the value FIDRQQ (5C32h) to the first two bytes of this command the result is:

   CD 35 06 34 12

And when disassembled back to assembler mnemonics, the CD35h displays as Int 35h. The three bytes that follow are always left unchanged, and they specify the type of operation--DWord Ptr on a memory location--and the address of that location.
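
The addition LINK performs can be mimicked with two ordinary instructions, which makes the arithmetic easy to check by hand. This is just a way to illustrate the numbers, not something your own code needs to do:

   Mov  AX,0D99Bh    ;the bytes 9B D9, read low byte first
   Add  AX,5C32h     ;add FIDRQQ, the carry out of the top bit is lost
                     ;AX is now 35CDh, stored in memory as CD 35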

Floating Point Comparisons

At the core of any sorting or searching routine is an appropriate comparison function. Previous chapters showed how to compare string data, and as you can imagine comparing floating point values is much more complex. But now that you know how to tap into BASIC's floating point routines it is almost trivial to effect a floating point comparison. The routines that follow let you compare either single or double precision values, by passing them as arguments.

;COMPAREFP.ASM, compares floating point values

;WARNING: This file must be assembled using /e (emulator)

.Model Medium, Basic
  Extrn B$FCMP:Proc   ;BASIC's FP compare routine

.8087                 ;allow coprocessor instructions
.Code

CompareSP Proc, Var1:Word, Var2:Word

  Mov  BX,Var2        ;get the address of Var2
  Fld  DWord Ptr [BX] ;load it onto the 8087 stack
  Mov  BX,Var1        ;same for Var1
  Fld  DWord Ptr [BX]
  FWait               ;wait until the 8087 says it's okay
  Call B$FCMP         ;compare the values, (and pop both)

  Mov  AX,0           ;assume they're the same
  Je   Exit           ;we were right
  Mov  AL,1           ;assume Var1 is greater
  Ja   Exit           ;we were right
  Dec  AX             ;Var1 must be less than Var2
  Dec  AX             ;decrement AX to -1

Exit:
  Ret                 ;return to BASIC

CompareSP Endp

CompareDP Proc, Var1:Word, Var2:Word

  Mov  BX,Var2        ;as above
  Fld  QWord Ptr [BX]
  Mov  BX,Var1
  Fld  QWord Ptr [BX]
  FWait
  Call B$FCMP

  Mov  AX,0
  Je   Exit
  Mov  AL,1
  Ja   Exit
  Dec  AX
  Dec  AX

Exit:
  Ret

CompareDP Endp
End

Like the Compare3 function shown in Chapter 8, CompareSP and CompareDP are integer functions that return -1, 0, or 1 to indicate if the first value is less than, equal to, or greater than the second. Therefore, to use these from BASIC you would invoke them like this:

   IF CompareSP%(Value1!, Value2!) = -1 THEN
     'the first value is smaller than the second
   END IF

And to test if the first is equal to or greater than the second you would instead do this:

   IF CompareSP%(Value1!, Value2!) >= 0 THEN
     'the first value is equal or greater
   END IF

You can also use these functions from assembly language. But if you do this, I suggest a simple modification. A comparison routine meant to be called from another assembler routine would not generally return the result in the registers. Rather, it would leave the flags set appropriately for a subsequent Ja or Jne branch.

Fortunately, BASIC's B$FCMP routine already does this. Therefore, you will make a copy of the COMPAREF.ASM source file, and delete the six lines between the call to B$FCMP and the Ret instruction. You can also remove the Exit: label if you like, although its presence causes no harm. Of course, the code itself is so simple that the best solution may be to simply duplicate the same instructions inline in your routine.
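
Duplicated inline, the comparison might look something like the sketch below. It follows the loading order used in CompareSP, and assumes that DI and SI hold the addresses of the first and second single precision values, that B$FCMP has been declared with Extrn, and that the file is assembled with /e:

   Fld  DWord Ptr [SI]  ;load the second value
   Fld  DWord Ptr [DI]  ;and then the first
   FWait                ;wait until the 8087 says it's okay
   Call B$FCMP          ;compare the values, (and pop both)
   Ja   @F              ;skip ahead if the first is greater
    .                   ;handle less than or equal here
@@: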

Exploiting MASM'S Features

Each example I have shown so far introduced another useful MASM feature. For example, you learned how MASM lets you establish data memory with an initial value, so you don't have to assign it explicitly. But there are several other features you should know about as well. One is conditional assembly.

Conditional Assembly

With conditional assembly you can specify that only certain portions of a file are to be assembled. This makes it easier to maintain two different versions of a routine, for example one for near strings and one for far strings. If you had to create two separate copies of the source file, any improvements or bug fixes that you add would have to be done twice.

There are two ways that a section of code can be optionally included or excluded. One is to define a constant at the beginning of the source file, and then test that constant using a form of IF and ELSE test. Like BASIC, MASM lets you define constant values using meaningful names. The problem with this method--albeit a minor one--is that you must alter the code prior to assembling each version. The example that follows shows how this kind of conditional assembly is employed.

   MyConst = 1
   IF MyConst
          ;do whatever you want here
   ELSE   ;the ELSE is optional
          ;do whatever else you want here
   ENDIF

The idea is that if you want the code that follows the IF test to be assembled, you would use a non-zero value for MyConst. If you wanted to create an alternate version using the code within the optional ELSE block, you would change the value to be zero.

You can also use IFE (If Equal to zero) to test if a constant is zero. And this brings up another interesting MASM feature. There are actually two types of constants you can define. The constant MyConst shown above is called a *redefinable* constant, because you can actually change its value during the course of a program. The other type of constant is defined using the Equ (Equate) directive, and may not be changed:

   YourConst Equ 100

Redefinable constants are often used in repeating macros, and macros are discussed later in this section.
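
As a brief sketch of what redefinable means in practice, the constant below is given a new value partway through the data. Value is a hypothetical name:

   Value = 1
   DW Value           ;assembles as DW 1
   Value = Value * 2  ;legal, because Value was created with =
   DW Value           ;assembles as DW 2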

The other way to tell MASM that it is to assemble just a portion of the file is with IFDEF. IFDEF (If Defined) tests if a constant has been defined at all, as opposed to comparing for a specific value. The value of this approach is that you can define a constant on the MASM command line when you run it. The first example below tells MASM to assemble the code within the IFDEF block, and the second tells it not to.

   C:\ASM\> masm program /Dmyconst ;

   C:\ASM\> masm program ;

Here's the portion of the routine that is being assembled conditionally:

   IFDEF MyConst
     ;do something optional here
   ENDIF

Likewise, IFNDEF (If Not Defined) tests if a constant has not been defined when reversing the logic is more sensible to you. MASM includes a great number of such conditional tests, and only by reading that section of the MASM manual will you become familiar with those that are the most useful.
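
Applied to the near and far string versions mentioned earlier, the idea might look roughly like this. FarStrings is a hypothetical name that would be defined on the MASM command line, and the register usage simply follows the two TrueFalse listings shown earlier. The far branch assumes the registers have already been set up the way MakeString expects.

   IFDEF FarStrings
     Call MakeString        ;far strings: let BASIC assign it
   ELSE
     Mov  DescLen,CX        ;near strings: fill in the
     Mov  DescAdr,AX        ;  descriptor directly
   ENDIF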

Comment Blocks

Another useful MASM feature that I personally would love to see added to BASIC is multi-line comment blocks. The Comment command accepts any single character you choose as a delimiter, and considers everything thereafter to be comments until the same character is encountered. Many programmers use a vertical bar, because it is not a common character:

   Comment |
   This program is intended to blah blah blah, and it works
   by loading AX with blah blah blah.
   |

Besides avoiding the need to place an explicit semicolon on each comment line, this also makes it easy to remark out large sections of code while you are debugging a routine.

Quoted Strings

Yet another useful feature is MASM's willingness to use either single or double quotes to indicate ASCII text and individual characters. In BASIC, if you want to specify a double quote you must use CHR$(34)--it simply is not legal to use """, where the quote in the middle is the character being defined. [With the introduction of VB/DOS triple quotes may now be used for this purpose.] If you need to define a double quote simply surround it with apostrophes like this:

   SomeData DB '"'
   Mov  AH, '"'

Or you can place a single quote within double quotes like this:

   Add DL, "'"

MASM can use either convention as needed, which is a feature I personally like a lot.

Length and Address Self-Calculation

Whenever MASM sees the dollar sign ($) operator it interprets that to mean *here*, or the current address. This can be used both for data and code, though it is more common with data as the example below illustrates.

     Descriptor DW MsgLen, Address
     Message    DB "This is a message."
     Address =  Offset Message
     MsgLen  =  $ - Address

The expression $ - Address tells the assembler to take the current data address, and subtract from that the address where Message begins. This is a very powerful concept because it frees the programmer from many tedious calculations. In particular, if the string contents are changed at a later time, the new length is recalculated by MASM automatically.
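
The same idea can count the entries in a table rather than the bytes in a string. In this sketch Table and NumItems are invented names; dividing the byte count by two gives the number of words, so NumItems stays correct when entries are added or removed.

```asm
   Table    DW 10, 20, 30, 40    ;four words of data
   NumItems =  ($ - Table) / 2   ;bytes so far, divided by 2 bytes per word
```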

Defining Data Structures

To assist you in manipulating data structures, MASM offers the Struc directive. This is identical to BASIC's TYPE statement, whereby you define the organization of a collection of related data items. The example below shows how to define a custom data structure using BASIC, followed by an equivalent MASM Struc definition.


   TYPE MyType
     LastName  AS STRING * 15
     FirstName AS STRING * 12
     ZipCode   AS STRING * 5
     RecordPtr AS LONG
   END TYPE
   DIM MyVar AS MyType

   MyStruc Struc
     LastName  DB 15 Dup (?)
     FirstName DB 12 Dup (?)
     ZipCode   DB  5 Dup (?)
     RecordPtr DD  ?
   MyStruc Ends
   MyVar DB Size MyStruc Dup (?)

As in BASIC, defining a structure merely establishes the number and type of data items that will be stored; memory is not actually set aside until you do that manually. In BASIC, you must use DIM to establish the memory that will hold the TYPE variable. In assembly language you instead use DB in conjunction with the Size operator, to set aside the appropriate number of bytes.

Each component of the Structure is defined using an identifying name and a corresponding data type. Then, whenever a structure member is referenced in your assembler routine, MASM replaces it with a number that shows how far into the structure that member is located. MASM uses the same syntax as BASIC, with a period between the data name and the structure identifier. Here are a few examples:

   Mov  AL,[BX+MyVar.FirstName]  ;FirstName begins 15 bytes into MyVar
   Les  DI,[MyVar.RecordPtr]     ;loads ES:DI from RecordPtr
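
When a structure is reached through a pointer register rather than by name, the member names can serve as plain offsets. The sketch below assumes the MyStruc definition shown above and uses the [BX].member form. The offsets in the comments follow from the member sizes: 15 + 12 = 27 for ZipCode, and 27 + 5 = 32 for RecordPtr.

```asm
   Mov  BX,Offset MyVar             ;point BX to the structure
   Mov  AL,[BX].ZipCode             ;first ZipCode byte, same as [BX+27]
   Mov  DX,Word Ptr [BX].RecordPtr  ;low word of RecordPtr, at [BX+32]
```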

Minimizing DGROUP Usage

In many cases you will store the variables your routines need in DGROUP using the .Data directive. As with static subprograms and functions in BASIC, this data will not change between subroutine calls. But this also means that these variables are combined into the same 64k segment that is shared with BASIC. When there are many variables or many different routines each with their own variables, this can significantly reduce the amount of near memory available to BASIC. There are two effective solutions to this problem.

Local Variables

One way to reduce the DGROUP impact of many variables is to place some of them onto the system stack. MASM lets you do this automatically with its Local directive, or you can do it manually by subtracting the requisite number of bytes from SP. Of course, there is only so much room on the stack, so this approach is most useful when there are many routines and each has less than 1K or so of data. Stack variables are also useful when programming for OS/2 or Windows. These operating systems require that all of your procedures be reentrant, so static variables cannot be used.

The example below creates room for fifty words of local storage on the stack, and then clears the variables to zero.

   Routine Proc Uses ES DI, Param1:Word, Param2:Word
     Sub  SP,100         ;50 words = 100 bytes
     Push SS             ;assign ES from SS
     Pop  ES
     Mov  DI,SP          ;point DI to the start of storage
     Xor  AX,AX          ;fill with zeros
     Mov  CX,50          ;clear fifty words
     Rep  Stosw          ;store AX CX times at ES:[DI]
      .                  ;the routine continues
     Add  SP,100         ;restore SP to what it had been
     Ret                 ;return to BASIC
   Routine Endp

MASM can also do this automatically for you using Local like this:

   Routine Proc Uses ES DI, Param1:Word, Param2:Word
     Local Buffer [100]:Byte
     Lea  DI,Buffer      ;clear the stack variables here
      .                  ;the routine continues
     Ret                 ;return to BASIC
   Routine Endp

As you can see, Local lets you refer to the start of the local stack data area by name. Notice how Lea is required here, because the address of Buffer is expressed as an offset from BP. That is, MASM translates the Lea instruction to Lea DI,[BP-100]. You cannot use Mov DI,Offset Buffer because Buffer's address (which is based on the current setting of the stack pointer) is not known when the routine is assembled or linked.

In this case only one local block is defined, so you could also use Mov DI,SP to set DI to point to the start of the data. It is not strictly necessary to clear the stack space before using it, but it is important to understand that whatever junk happened to be in memory at that time will still be there after using Local.
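
For completeness, here is one way the "clear the stack variables here" step might be filled in, reusing the instructions from the manual version. Treat it as a sketch: the Cld instruction is added as a precaution so Stosw moves forward through memory, and the .Model Medium, Basic setup used elsewhere in this chapter is assumed.

```asm
   Routine Proc Uses ES DI, Param1:Word, Param2:Word
     Local Buffer[100]:Byte
     Push SS             ;the buffer is in the stack segment
     Pop  ES             ;so assign ES from SS
     Lea  DI,Buffer      ;ES:DI now points to the buffer
     Xor  AX,AX          ;fill with zeros
     Mov  CX,50          ;100 bytes = 50 words
     Cld                 ;ensure Stosw counts upward
     Rep  Stosw          ;store AX CX times at ES:[DI]
      .                  ;the routine continues
     Ret                 ;return to BASIC
   Routine Endp
```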

It is also important to be aware of a number of bugs with the Local directive. I have found that limiting the use of Local to a single set of data as shown here is safe with all MASM versions through 5.1. Using multiple Local directives defined with data structures can result in the wrong part of the stack being written to when a structure member is accessed by name.

Storing Data in the Code Segment

Another time-honored technique for conserving DGROUP memory is to place selected variables into the code segment. In most cases storing data for a routine in the code segment will make your programs slightly larger and slower, because of the need for an added CS: segment override. But when large amounts of data must be accommodated, this can be very valuable indeed. One advantage to using the code segment is that you can establish initial values for the data, which is not possible when using the stack.

As an example of this technique, I have written a string function called Message$ that stores a series of messages in the code segment. In this case only a single CS: segment override is needed, so the impact of using the code segment for data is insignificant. Message$ is designed to be declared and invoked as follows:

   DECLARE FUNCTION Message$(BYVAL MsgNumber%)
   Result$ = Message$(AnyInt%)

Message$ is table driven, which makes it simple to modify the routine to change or add messages without having to make any changes to the function's structure. As shown here, Message$ is designed to return the name of a weekday, given a value between one and seven. You can easily modify it to return other strings of nearly any length.

.Model Medium, Basic
  Extrn B$ASSN:Proc         ;BASIC's assignment routine

.Data
  Descriptor DD 0           ;the output string descriptor
  Null$      DD 0           ;use this to return a null
                            ;  (needed for BASIC PDS only,
.Code                       ;  but okay with QuickBASIC)

Message Proc Uses SI, MsgNumber:Word

  Mov  SI,Offset Messages   ;point to start of messages
  Xor  AX,AX                ;assume an invalid value

  Mov  CX,MsgNumber         ;load the message number
  Cmp  CX,NumMsg            ;does this message exist?
  Ja   Null                 ;no, return a null string
  Jcxz Null                 ;ditto if they pass a zero

Do:                         ;walk through the messages
  Lods Word Ptr CS:0        ;load and skip over this message's length
  Dec  CX                   ;show that we read another
  Jz   Done                 ;this is the one we want

  Add  SI,AX                ;skip over the message text
  Jmp  Short Do             ;continue until we're there

Done:
  Or   AX,AX                ;are we returning a null?
  Jz   Null                 ;yes, handle that differently
  Push CS                   ;no, pass the source segment

Done2:
  Push SI                   ;and the source address
  Push AX                   ;and the source length

  Push DS                   ;pass the destination segment
  Mov  AX,Offset Descriptor ;and the destination address
  Push AX
  Xor  AX,AX                ;0 means assign a descriptor
  Push AX                   ;pass that as well

  Call B$ASSN               ;let B$ASSN do the dirty work
  Mov  AX,Offset Descriptor ;show where the output is
  Ret                       ;return to BASIC

Null:
  Push DS                   ;pass the address of Null$
  Mov  SI,Offset Null$
  Jmp  Short Done2

Message Endp

;----- DefMsg macro that defines messages
DefMsg Macro Message
  LOCAL MsgStart, MsgEnd    ;;local address labels
  NumMsg = NumMsg + 1       ;;show we made another one
  IFB <Message>             ;;if no text is defined
    DW 0                    ;;just create an empty zero
  ELSE                      ;;else create the message
    DW MsgEnd - MsgStart    ;;first write the length
    MsgStart:               ;;identify the starting address
      DB Message            ;;define the message text
    MsgEnd Label Byte       ;;this marks the end
  ENDIF
Endm

Messages Label Byte         ;the messages begin here
NumMsg = 0                  ;tracks number of messages
                            ;DO NOT MOVE this constant
DefMsg "Sunday"
DefMsg "Monday"
DefMsg "Tuesday"
DefMsg "Wednesday"
DefMsg "Thursday"
DefMsg "Friday"
DefMsg "Saturday"
End

After declaring BASIC's B$ASSN routine as being external, Message$ defines two string descriptors in the Data segment. The first is used for the function output when returning a normal message, and the second is used only when returning a null string. In truth, the separate descriptor and the few added steps that detect the special case of a null output string are needed only with BASIC PDS far strings. And this brings up an important point.

It is impossible to write one assembly language subroutine that can work with both QuickBASIC and BASIC PDS far strings using the normal, documented methods. To create a string function for use with QuickBASIC and PDS near strings, you define and fill in a string descriptor in DGROUP, and return its address in AX to BASIC. Returning a far string from a function in PDS instead requires calling the internal STRINGASSIGN routine that Microsoft provides with PDS. STRINGASSIGN works with both near and far strings in PDS, but is not available in QuickBASIC.

The trick is to use the *undocumented* name B$ASSN, which is really the same thing as STRINGASSIGN. The big difference, though, is that B$ASSN is available in all versions of BASIC 4.0 and later. When near strings are used the B$ASSN routine is extracted from the near strings library. When linking with far strings a different version is used, extracted by LINK from the far strings library. This is a powerful concept to be sure, and one we will use again for other examples later on in this chapter.

Message$ begins by loading SI with the starting address of a table of messages. These messages are located at the end of the source file in the code segment, and each is preceded with the length of the text. Although it may not be obvious from looking at the source listing, the message data is actually structured like this:

   DW 6
   DB "Sunday"
   DW 6
   DB "Monday"

Next, AX is cleared to zero just in case the incoming message number is illegal. Later in the program AX holds the length of the output string; clearing it here simply makes the program's logic more direct.

CX is then loaded with the message number the caller asked for. If CX is either higher than the available number of messages or zero, the program jumps to the code that returns a null string. Otherwise, a small loop is entered that walks through each message, decrementing CX as it goes. When CX reaches zero, SI is pointing at the correct message and AX is holding its length. Otherwise, the current length is added to SI, thus skipping over that data.

Notice the unusual form of the Lodsw statement, to allow it to work with a CS: override. MASM has a number of quirks that are less than intuitive, and this is but one of them. Normally you would use either Lodsb or Lodsw, to indicate loading either a byte into AL or a word into AX. But when you use a segment override MASM requires omitting the "b" or "w" Lods suffix, and you must state Byte Ptr or Word Ptr explicitly. Then, a dummy argument must be placed after the override colon.
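
If the Lods form seems too obscure, the same work can be written out longhand. The sketch below is roughly equivalent, assuming the direction flag is clear so that SI counts upward. Unlike Lods it does change the flags, which is harmless in Message$ because the Dec CX that follows sets them again before they are tested.

```asm
   Mov  AX,CS:[SI]     ;load this message's length from the code segment
   Inc  SI             ;then skip over the length word
   Inc  SI
```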

MASM Macros

The last new feature this listing introduces is the use of macros. The most basic use of MASM macros is to define a block of code once, and then repeat it multiple times with a single statement. This is not unlike keyboard macro programs such as Borland's SuperKey, that let you assign a string of text to a single key. For example, you could press Alt-S and SuperKey will type "Very truly yours", five Enter keys, and then your name.

MASM macros also offer many other interesting and useful capabilities, including the ability to accept arguments. [I should mention that the main point of the DefMsg macro is to make this function easy to modify, so you can create other, similar string functions from this same routine.] Before attempting to explain the DefMsg (Define Message) macro I designed for use with Message$, let's consider some macro basics.

Say, for example, you find that a particular routine needs to push the same five registers many times during the course of a procedure. To simplify this task you could define a macro--perhaps named PushRegs--that performs the code sequence for you. Such a macro definition would look like this:

   PushRegs Macro
     Push AX
     Push BX
     Push SI
     Push DS
     Push ES
   Endm

Now, each time you want to execute this series of instructions you would simply use the command PushRegs. Please understand that a macro is not the same as a called subroutine. The assembler still places each Push command in sequence into your source code each time the macro is invoked. But a simple macro like this can reduce the amount of typing you must do, and minimize errors such as pushing registers in the wrong order. And in some cases macros also make your code easier to read.

As I mentioned, a MASM macro can accept arguments, and it can even be designed to accept a varying number of them. If you need to push three registers but which ones may change, you would define PushRegs like this:

   PushRegs Macro Reg1, Reg2, Reg3
     Push Reg1
     Push Reg2
     Push Reg3
   Endm

Then to push AX, SI, and DI you would invoke PushRegs as follows:

   PushRegs AX, SI, DI

Of course, a corresponding PopRegs macro would be defined similarly. Once a macro has been defined you can pass any legal argument to it. For example, you could also use this:

   PushRegs AX, Word Ptr [BP-20], IntVar

Here, you are pushing AX, the word 20 bytes below where BP points to on the stack, and the integer variable named IntVar.
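
One possible form of the complementary PopRegs macro is sketched below. It pops its arguments in reverse, so it can be invoked with the same list that was given to PushRegs; this arrangement is an assumption made for the example, not the only way to write it.

```asm
   PopRegs Macro Reg1, Reg2, Reg3
     Pop  Reg3            ;;pop in the reverse order they were pushed
     Pop  Reg2
     Pop  Reg1
   Endm
```

With this definition, PopRegs AX, SI, DI restores DI first, then SI, and finally AX.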

A useful enhancement to this macro would let you pass it a varying number of parameters. The PushM macro that follows accepts any number of arguments (up to eight), and pushes each in sequence.

   PushM Macro A,B,C,D,E,F,G,H     ;;add more place-holders to suit
     IRP CurArg, <A,B,C,D,E,F,G,H> ;;repeat for each argument
       IFNB <CurArg>               ;;if this arg is not blank
         Push CurArg               ;;push it
       ENDIF
     Endm                          ;;end of repeat block
   Endm                            ;;end of this macro

From this you can create a complementary PopM macro by changing the name, and also changing the Push instruction to Pop.

The IRP command works much like a FOR/NEXT loop in BASIC, and tells MASM to repeat the following statements for each argument that was given. IFNB (If Not Blank) then tests each argument to see if it was in fact present in the incoming list of parameters. In this case, CurArg assumes the name of the argument, and the Push instruction is expanded to specify that name.
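
A PopM macro made that way might look like the sketch below. Because it pops in the order the arguments are listed, the registers must be given in the reverse of the order used with PushM.

```asm
   PopM Macro A,B,C,D,E,F,G,H
     IRP CurArg, <A,B,C,D,E,F,G,H> ;;repeat for each argument
       IFNB <CurArg>               ;;if this arg is not blank
         Pop CurArg                ;;pop it
       ENDIF
     Endm
   Endm

     PushM AX, BX, SI              ;save three registers
      .
     PopM  SI, BX, AX              ;restore them, last one first
```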

There is no disputing that the syntax of a MASM macro is confusing at best. Having to enclose some arguments in angle brackets but not others requires frequent visits to the MASM manual. Further, a MASM macro is virtually impossible to debug. If you write a macro incorrectly or create a syntax error, MASM reports an error at the line where the macro was invoked, rather than at the line containing the error in the macro. It is not uncommon to receive a number of errors all pointing to the same source line, with no indication whatsoever where the error really is.

Now consider how the DefMsg macro operates. DefMsg begins by defining a single incoming parameter named Message. Two local labels--MsgStart and MsgEnd--are defined, and these are needed so MASM can calculate the length of the messages. Although labels within a macro do not have to be declared as local, you would get an error if the macro were used more than once. Like BASIC, the assembler requires that each label have a unique name. By using local labels MASM generates a new, unique internal name for each macro invocation, instead of the actual label name given.

The next statement increments a MASM variable named NumMsg. To avoid an error caused by calling Message$ with an invalid message number, the procedure compares the number you pass to the number of messages that are defined. This test occurs in the fourth instruction of the procedure, at the Cmp CX,NumMsg statement. NumMsg is a constant, except it may be redefined as the file is assembled. (When a constant is assigned using the Equ directive, its value may not be changed by either your source code or by a macro.) But when a variable is defined using an equals sign (=), MASM allows it to be altered as it assembles your program. Understand that the resulting number is added to your program as a constant; it is only during the course of assembly that its value can change. Therefore, each time DefMsg is invoked, it increments NumMsg. MASM places the final value into the Cmp instruction, as if you had defined it using a fixed known value.

The IFB (If Blank) test checks to see if DefMsg was given a parameter when it was invoked. In most cases you will probably want to define a series of consecutive messages. As it is used here, seven different day names are returned in sequence. But there may be times when you want to leave a particular message number blank. For example, you could create a series of messages that correspond to BASIC's error numbers. BASIC file error numbers range from 50 through 76, but there are no message numbers 60, 65, or 66. You could therefore leave those blank, and invoke a modified copy of Message$ like this, so that error 50 selects the first message:

   PRINT DOSMessage$(ERR - 49)

When DefMsg is used with no argument, it merely creates a zero word at that point in the code segment. Otherwise, the length of the message is stored, followed by the message text. The statement DW MsgEnd - MsgStart is replaced with the difference between the addresses, which MASM calculates for you. This is similar to the earlier example that showed how a dollar sign ($) can simplify defining strings that may change.
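
As a small illustration, the three invocations below leave the second message blank. The message texts are placeholders, and the comments show the data each line produces.

```asm
   DefMsg "Message one"      ;stores DW 11 and the 11 characters
   DefMsg                    ;stores only DW 0
   DefMsg "Message three"    ;stores DW 13 and the 13 characters
```

A function built from this table would return a null string when asked for message number two, because the zero length sends it to the code that handles a null.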

The last macro I will describe here is Rept, which means "Repeat the following statements a given number of times". In the simplest sense, Rept could be used to generate a series of the same instructions:

   Rept 100
     Xor  AX,AX
     Push AX
     Call SomeProc
   Endm

A Rept macro is not invoked by name; rather, it is added inline to a program (or included within a macro that is called by name). In most cases you would use a coding loop to repeat a block of code, since a Rept macro actually generates the same code repeatedly in the program. But there are situations where timing is very critical, and a loop is always somewhat slower than a sequence of inline instructions.

Another good use for Rept is in conjunction with redefinable equates, such as this example which defines the letters of the alphabet:

   Char = 0
   Rept 26                ;;do this 26 times
     DB "A" + Char        ;;define ASC("A") + Char
     Char = Char + 1      ;;increment Char
   Endm

Although the MASM manual states that you must use double semicolons for remarks within a macro as shown here, I have used a single semicolon without problems.
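
The same pattern can build other tables at assembly time. This sketch generates the first eight powers of two as byte values; Value is a name invented for the example.

```asm
   Value = 1
   Rept 8                 ;;do this 8 times
     DB Value             ;;define 1, 2, 4, 8, 16, 32, 64, 128
     Value = Value * 2    ;;double Value for the next pass
   Endm
```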

There are other macro commands and features I will not describe here, because I have not found them to be particularly useful. However, macros can be recursive, multiple macros may be nested, and a macro can even be redefined on the fly. I urge you to refer to the documentation that Microsoft provides for more information on those advanced features.