QB CULT MAGAZINE
Vol. 2 Iss. 4 - October 2001

BASIC Techniques and Utilities, Chapter 9
Program Optimization

Throughout the preceding chapters I have shown a variety of tips and techniques that can help to improve the efficiency of your programs. For example, Chapter 6 explained that processing files in large pieces reduces the time needed to save and load data. Likewise, Chapter 8 discussed the improvement that SWAP often provides over conventional assignments. Some optimizations, however, do not fit into any of the well-defined categories that have been used to organize this book. In this chapter I will share several general optimization techniques you can employ to reduce the size of your programs and make them run faster.

The material in this chapter is organized into three principle categories: programming shortcuts and speed improvements, miscellaneous tips and techniques, and benchmarking. Each section addresses BASIC programming ideas and methods that are not immediately obvious in most cases.

Programming Shortcuts and Speed Improvements

Chapter 3 discussed the use of AND, OR, and the other logical operations that can be used for both logical (IF and CASE) tests and also bit operations. But there are a few other related points that are worth mentioning here. When you need to know if a variable is zero or not, you can omit an explicit test for zero like this:

   IF Variable THEN...

You might be tempted to think that two variables could be tested for non- zero values at one time in the same way, using code such as this:

   IF Var1 AND Var2 THEN...

However, that will very likely fail. The expression Var1 AND Var2 combines the bits in these variables, which could result in a value of zero even when both variables are non-zero. As an example, if Var1 currently holds a value of 1, its bits will be set as follows:

   0000 0000 0000 0001

Now, if Var2 is assigned the value 2, its bits will be set like this:

   0000 0000 0000 0010

Since no two bits are set in the same position in each variable, the result of Var1 AND Var2 is zero. An effective solution is IF Var1 * Var2 THEN to ensure that neither variable is zero. And to test if either variable is non-zero you'd use OR. Whatever follows the test IF Var1 OR Var2 THEN will be executed as long as one (or both) variables are not zero. These are important short cuts to understand, because the improvement in code size and execution speed can be significant.

Each of the AND, OR, and multiplication tests shown here generates only 11 bytes of code. Contrast that to the 28 bytes that BC creates for the alternative: IF Var1 <> 0 AND Var2 <> 0 THEN. Because of the improved method of expression evaluation in BASIC PDS, this last example generates only 14 bytes when using that version of BC. None the less, if you can avoid explicit comparisons to zero you will go a long way toward improving the efficiency of your code.

This short cut is equally appropriate with LOOP comparisons as well as IF tests. In the BufIn function shown in Chapter 6, INSTR was used to see if a CHR$(13) carriage return was present in the buffer. In the statement CR = INSTR(BufPos, Buffer$, CR$), CR receives either 0 if that character is present, or a non-zero position in the string where it was found. The LOOP statement that surrounded the buffer searching uses LOOP WHILE CR, which continues looping as long as CR is not zero.

When an integer variable is compared in a LOOP WHILE condition, seven bytes of code are generated whether it is compared to zero or not. But when a long integer is used to control the LOOP WHILE condition, omitting the explicit test for zero results in 11 bytes of compiled code where including it creates 20 bytes. Note that with floating point values identical code is generated in either case, because an explicit comparison to zero is required and added by the compiler.

Predefining Variables

Another important point is illustrated in the same code fragment that uses INSTR to search for a carriage return. There, the CR$ string variable had been assigned earlier to CHR$(13). Although the BufIn code could have used CR = INSTR(BufPos, Buffer$, CHR$(13)) instead of a previously defined string variable to replace the CHR$(13), that would take longer each time the statement is executed. Since CHR$ is a function, it must be called each time it is used. If CR$ is defined once ahead of time, only its address needs to be passed to INSTR. This can be done with four bytes of assembly language code.

If CHR$(13) will be used only once in a program, then the only savings afforded by predefining it will be execution speed. But when it is needed two or more times, several bytes can be saved at each occurrence by using a replacement string variable. Other common CHR$ values that are used in BASIC programs are the CHR$(34) quote character, and CHR$(0) which is often used when accessing DOS services with CALL Interrupt.

Likewise, you should avoid calling any functions more than is absolutely necessary. I have seen many programmers use code similar to the following, to see if a drive letter has been given as part of a file name.

   IF INSTR(Path$, ":") THEN 
     Drive$ = LEFT$(INSTR(Path$, ":") - 1)
   END IF

A much better approach is to invoke INSTR only once, and save the results for subsequent testing:

   Found% = INSTR(Path$, ":")   'save the result from INSTR
   IF Found% THEN
     Drive$ = LEFT$(Path$, Found%) - 1)
   END IF

The same situation holds true for UCASE$, MID$, and all of the other BASIC functions. Rather than this:

   IF INSTR(UCASE$(MID$(Work$, 3, 22)), "/A") THEN A = True
   IF INSTR(UCASE$(MID$(Work$, 3, 22)), "/B") THEN B = True
   IF INSTR(UCASE$(MID$(Work$, 3, 22)), "/C") THEN C = True

Instead use this:

   Temp$ = UCASE$(MID$(Work$, 3, 22))
   IF INSTR(Temp$, "/A") THEN A = True
   IF INSTR(Temp$, "/B") THEN B = True
   IF INSTR(Temp$, "/C") THEN C = True

Where the first example generates 138 bytes of code, the second uses only 111. The time savings will be even more significant, because BASIC's UCASE$ and MID$ functions allocate and deallocate memory by making further calls to BASIC's string memory management routines.

Indeed, it is always best to avoid creating new strings whenever possible, precisely because of the overhead needed to assign and erase string data. Each time a string is assigned, memory must be found to hold it; add to that the additional code needed to release the older, abandoned version of the string.

This has further ramifications with simple string tests as well. As Chapter 3 explained, testing for single characters or the first character in a string is always faster if you isolate the ASCII value of the character first, and then use integer comparisons later. In the example below, the first series of IF tests generates 60 bytes of code. This is much less efficient than the second which generates only 46, even though the steps to obtain and assign the ASCII value of Answer$ comprise 12 of those bytes.

PRINT "Abort, Retry, or Fail? (A/R/F) ";
DO
  Answer$ = UCASE$(INKEY$)
LOOP UNTIL LEN(Answer$)

'----- Method 1:
IF Answer$ = "A" THEN
  REM
ELSEIF Answer$ = "R" THEN
  REM
ELSEIF Answer$ = "F" THEN
  REM
END IF


'----- Method 2:
A% = ASC(Answer$)
IF A% = 65 THEN
  REM
ELSEIF A% = 82 THEN
  REM
ELSEIF A% = 70 THEN
  REM
END IF

Another prime candidate for speed enhancement is when you need to create a string from individual characters. The first example below reads the 80 characters in the top row of display memory, and builds a new string from those characters.

   Scrn$ = ""
   FOR X = 1 TO 80
     Scrn$ = Scrn$ + CHR$(SCREEN(1, X))
   NEXT

Since we already know that 80 characters are to be read, a much better method is to preassign the destination string, and insert the characters using the statement form of MID$, thus:

   Scrn$ = SPACE$(80)
   FOR X% = 1 TO 80
     MID$(Scrn$, X%, 1) = CHR$(SCREEN(1, X%))
   NEXT

An informal timing test that executed these code fragments 100 times using QuickBASIC 4.5 showed that the second example is nearly twice as fast as the first. Moreover, since BASIC's SCREEN function is notoriously slow, the actual difference between building a new string and inserting characters into an existing string is no doubt much greater.

Integer and Long Integer Assignments

Another facet of compiled BASIC that is probably not immediately obvious is the way that integer and long integer assignments are handled by the compiler. When many variables are to be assigned the same value--perhaps cleared to zero--it is often more efficient to assign one of them from that value, and then assign the rest from the first. To appreciate why this is so requires an understanding of how BASIC compiles such assignments.

Normally, assigning an integer or long integer variable from a numeric constant requires the same amount of code as assigning from another variable. The BASIC statement X% = 1234 is compiled to the following 6- byte assembly language statement.

   C7063600D204  MOV WORD PTR [X%],1234

Assigning the long integer variable Y& requires two such 6-byte instructions--one for the low word and another for the high word:

   C7063600D204  MOV  WORD PTR [Y&],1234 ;assign the low word
   C70638000000  MOV  WORD PTR [Y&+2],0  ;then the high word

The 80x86 family of microprocessors does not have direct instructions for moving the contents of one memory location to another. Therefore, the statement X% = Y% is compiled as follows, with the AX register used as an intermediary.

   A13800  MOV  AX,WORD PTR [Y%]     ;move Y% into AX
   A33600  MOV  WORD PTR [X%],AX     ;move AX into X%

Assigning one long integer from another as in X& = Y& is handled similarly:

   A13A00    MOV  AX,WORD PTR [Y&]     ;move AX from Y& low
   8B163C00  MOV  DX,WORD PTR [Y&+2]   ;move DX from Y& high
   A33600    MOV  WORD PTR [X&],AX     ;move X& low from AX
   89163800  MOV  WORD PTR [X&+2],DX   ;move X& high from DX

You may have noticed that instructions that use the AX registers require only three bytes to access a word of memory, while those that use DX (or indeed, any register other than AX) require four. But don't be so quick to assume that BASIC is not optimizing your code. The advantage to using separate registers is that the full value of Y& is preserved. Had AX been used both times, the low word would be lost when the high word was transferred from Y& to X&.

When assigning one variable to many in a row, BASIC is smart enough to remember which values are in which registers, and it reuses those values for subsequent assignments. The combination BASIC and assembly language code shown below was captured from a CodeView session and edited slightly for clarity. It shows the actual assembly language code bytes generated for a series of assignments.

Plain integer assignments:

   A% = 1234
       C7063600D204   MOV  WORD PTR [A%],&H04D2
   B% = 1234
       C7063800D204   MOV  WORD PTR [B%],&H04D2
   C% = 1234
       C7063A00D204   MOV  WORD PTR [C%],&H04D2
   D% = 1234
       C7063C00D204   MOV  WORD PTR [D%],&H04D2
   E% = 1234
       C7063E00D204   MOV  WORD PTR [E%],&H04D2

Plain long integer assignments:

   V& = 1234
       C7064000D204   MOV  WORD PTR [V&],&H04D2
       C70642000000   MOV  WORD PTR [V&+2],0
   W& = 1234
       C7064400D204   MOV  WORD PTR [W&],&H04D2
       C70642000000   MOV  WORD PTR [W&+2],0
   X& = 1234
       C7064800D204   MOV  WORD PTR [X&],&H04D2
       C70642000000   MOV  WORD PTR [X&+2],0
   Y& = 1234
       C7064C00D204   MOV  WORD PTR [Y&],&H04D2
       C70642000000   MOV  WORD PTR [Y&+2],0
   Z& = 1234
       C7065000D204   MOV  WORD PTR [Z&],&H04D2
       C70642000000   MOV  WORD PTR [Z&+2],0

Assigning multiple integers from another:

   A% = 1234
       C7063600D204   MOV  WORD PTR [A%],&H04D2
   B% = A%
       A13600         MOV  AX,WORD PTR [A%]
       A33800         MOV  WORD PTR [B%],AX
   C% = A%
       A33A00         MOV  WORD PTR [C%],AX
   D% = A%
       A33C00         MOV  WORD PTR [D%],AX
   E% = A%
       A33E00         MOV  WORD PTR [E%],AX

Assigning multiple long integers from another:

   V& = 1234
       C7064000D204   MOV  WORD PTR [V&],&H04D2
       C70642000000   MOV  WORD PTR [V&+2],0
   W& = V&
       A14000         MOV  AX,WORD PTR [V&]
       8B164200       MOV  DX,WORD PTR [V&+2]
       A34400         MOV  WORD PTR [W&],AX
       89164600       MOV  WORD PTR [W&+2],DX
   X& = V&
       A34800         MOV  WORD PTR [X&],AX
       89164A00       MOV  WORD PTR [X&+2],DX
   Y& = V&
       A34C00         MOV  WORD PTR [Y&],AX
       89164E00       MOV  WORD PTR [Y&+2],DX
   Z& = V&
       A35000         MOV  WORD PTR [Z&],AX
       89165200       MOV  WORD PTR [Z&+2],DX

The first five statements assign the value 1234 (04D2 Hex) to integer variables, and each requires six bytes of code. The next five instructions assign the same value to long integers, taking two such instructions for a total of 12 bytes for each assignment. Note that a zero is assigned to the higher word of each long integer, because the full Hex value being assigned is actually &H000004D2. Simple multiplication shows that the five integer assignments generates five times six bytes, for a total of 30 bytes. The long integer assignments take twice that at 60 bytes total.

But notice the difference in the next two statement blocks. The first integer assignment requires the usual six bytes, and the second does as well. But thereafter, any number of additional integer variables will be assigned with only three bytes apiece. Likewise, all but the first two long integer assignments are implemented using only seven bytes each. Remembering what values are in each register is yet one more optimization that BASIC performs as it compiles your program.

Short Circuit Expression Evaluation

Many programming situations require more than one test to determine if a series of instructions are to be executed or a branch taken. The short example below tests that a string is not null, and also that the row and column to print at are legal.

   IF Work$ <> "" AND Row <= 25 AND Column <= 80 THEN
     LOCATE Row, Column
     PRINT Work$
   END IF

When this program is compiled with QuickBASIC, all three of the tests are first performed in sequence, and the results are then combined to see if the LOCATE and PRINT should be performed. The problem is that time is wasted comparing the row and column even if the string is null. When speed is the primary concern, you should test first for the condition that is most likely to fail, and then use a separate test for the other conditions:

   IF Work$ <> "" THEN
     IF Row <= 25 AND Column <= 80 THEN
       LOCATE Row, Column
       PRINT Work$
     END IF
   END IF

This separation of tests is called *short circuit expression evaluation*, because you are bypassing--or short circuiting--the remaining tests when the first fails. Although it doesn't really take BASIC very long to determine if a string is null, the principle can be applied to other situations such as those that involve file operations like EOF and LOF. Further, as you learned in Chapter 3, a better way to test for a non-null string is IF LEN(Work$) THEN. However, the point is to perform those tests that are most likely to fail first, before others that are less likely or will take longer.

Another place where you will find it useful to separate multiple tests is when accessing arrays. If you are testing both for a legal element number and a particular element value, QuickBASIC will give a "Subscript out of range" error if the element number is not valid. This is shown below.

   IF Element <= MaxEls AND Array(Element) <> 0 THEN

Since QuickBASIC always performs both tests, the second will cause an error if Element is not a legal value. In this case, you *have* to implement the tests using two separate statements:

   IF Element <= MaxEls THEN
     IF Array(Element) <> 0 THEN
      .
      .
   END IF
 END IF

You may have noticed the I have referred to QuickBASIC here exclusively in this discussion. Beginning with BASIC 7.0, Microsoft has added short circuit testing to the compiler as part of its built-in decision making process. Therefore, when you have a statement such as this one:

   IF X > 1 AND Y = 2 AND Z < 3 THEN

BASIC PDS substitutes the following logic automatically:

   IF X <= 1 THEN GOTO SkipIt
   IF Y <> 2 THEN GOTO SkipIt
   IF Z >= 3 THEN GOTO SkipIt
    .
    .
   SkipIt:

Speaking of THEN and GOTO, it is worth mentioning that the keyword THEN is not truly necessary when the only thing that follows is a GOTO. That is, IF X < 1 GOTO Label is perfectly legal, although the only savings is in the program's source code.

This next and final trick isn't technically a short circuit expression test, but it can reduce the size of your programs in a similar fashion. Chapter 3 compared the relative advantages of GOSUB routines and called subprograms, and showed that a subprogram is superior when passing parameters, while a GOSUB is much faster and smaller. An ideal compromise in some situations is to combine the two methods.

If you have a called subprogram (or function) that requires a large number of parameters and it is called many times, you can use a single call within a GOSUB routine. Since a GOSUB statement generates only three bytes of code each time it is used, this can be an ideal way to minimize the number of times that the full CALL is required. Of course, GOSUB does not accept parameters, but many of them may be the same from call to call. In particular, some third-party add-on libraries require a long series of arguments that are unlikely to change. This is shown below.

Row = 10
Column = 20
Message$ = "Slap me five"
GOSUB DisplayMsg
 .
 .
DisplayMsg:
CALL ManyParams(Row, Column, Message$, MonType, NoSnow, FGColr, BGColr, _
  HighlightFlag, VideoMode, VideoPage)
RETURN

In many cases you would have assigned permanent values for the majority of these parameters, and it is wasteful to have BASIC create code to pass them repeatedly. Here, the small added overhead of the three assignments prior to each GOSUB results in less code than passing all ten arguments repeatedly.

Miscellaneous Tips and Techniques

There are many tricks that programmers learn over the years, and the following are some of the more useful ones I have developed myself, or come across in magazines and other sources.

Formatting and Rounding

One frequent requirement in many programs is having control over how numbers are formatted. Of course, BASIC has the PRINT USING statement which is adequate in most cases. And Chapter 6 also showed how to trick BASIC's file handling statements into letting you access a string formatted by PRINT USING. But there are other formatting issues that are not handled by BASIC directly.

One problem for many programmers is that BASIC adds leading and trailing blanks when printing numbers on the screen or to a disk file. The leading blank is a placeholder for a possible minus sign, and is not added when the number is in fact negative. Avoiding the trailing blank is easy; simply use PRINT STR$(Number). And the easiest way to omit the leading blank for positive numbers is to use LTRIM$: PRINT LTRIM$(STR$(Number)).

PRINT USING is notoriously slow, because examines each character in a string version of the number, and reformat the digits while interpreting the many possible options specified in a separate formatting string. But in many cases all that is really needed is simple right justification. To right-align an integer value (or series of values) you can use RSET to assign the numbers into a string, and then print that string as shown below.

   Work$ = SPACE$(10)
   REST Work$ = STR$(Number)
   PRINT TAB(15); Work$

In this case, Work$ could also have been dimensioned as a fixed-length string. Adding leading zeros to a number is also quite easy using RIGHT$ like this:

   PRINT RIGHT$("00000" + LTRIM$(STR$(Number)), 6)

You will need at least as many zeros in the string as the final result requires, less one since STR$ always returns at least one digit. Trailing digits are handled similarly, except you would use LEFT$ instead of RIGHT$.

Rounding numbers is an equally common need, and there are several ways to handle this. Of course, INT and FIX can be used to truncate a floating point value to an integer result, but neither of these perform rounding. For that you should use CINT or CLNG, which do round the number to the closest integer value. For example, Value = CINT(3.59) will assign 4 to Value, regardless of whether Value is an integer, single precision, or whatever.

Some BASICs have a CEIL function, which returns the next *higher* integer result. That is, CEIL(3) is 3, but CEIL(3.01) returns the value 4. This function can be easily simulated using Ceil = -INT(-Number).

Rounding algorithms are not quite so simple to implement, as you can see in the short DEF FN function below.

   DEF FnRound# (Value#, Digits%)
     Mult% = 10 ^ Digits%
     FnRound# = FIX((Mult% * Value#) + (SGN(Value#)) * .5#) / Mult%
   END DEF

Another important math optimization is to avoid exponentiation whenever possible. Whether you are using integers or floating point numbers, using Number ^ 2 and Number ^ 3 are many times slower than Number * Number and Number * Number * Number respectively.

String Tricks and Minimizing Parameters

There are a few string tricks and issues worth mentioning here too. The fastest and smallest way to clear a string without actually deleting it is with LSET Work$ = "". Another clever and interesting string trick lets you delete a string with only nine bytes of code, instead of the usual 13.

In Chapter 6 you learned that the assembly language routines within BASIC's runtime library are accessible if you know their names. You can exploit that by using the B$STDL (string delete) routine, which requires less code to set up and call than the more usual Work$ = "". When a string is assigned to a null value, two parameters--the address of the target string and the address of the null--are passed to the string assignment routine. But B$STDL needs only the address of the string being deleted. You might think that BASIC would be smart enough to see the "" null and call B$STDL automatically, but it doesn't. Here is how you would declare and call B$STDL:

   DECLARE SUB DeleteStr ALIAS "B$STDL" (Work$)
   CALL DeleteStr(Any$)

As with the examples that let you call GET # and PUT # directly, DeleteStr will not work in the QB environment unless you first create a wrapper subprogram written in BASIC, and include that wrapper in a Quick Library. And this brings up an important point. Why bother to write a BASIC subprogram that in turn calls an internal routine, when the BASIC subprogram could just as easily delete the string itself? Therefore, the best solution--especially because it's also the easiest--is to write DeleteStr in BASIC thus:

   SUB DeleteStr(Work$)
     Work$ = ""
   END SUB

This is an important concept to be sure, because it shows how to reduce the number of parameters when a particular service is needed many times. Other similar situations are not hard to envision, whereby multiple parameters that do not change from call to call can be placed into a subprogram that itself requires only one or two arguments.

This technique can be extended to several BASIC statements that use more parameters than might otherwise be apparent. For example, whenever you use LOCATE, additional hidden parameters are passed to the B$LOCT routine beyond those you specify. The statement LOCATE X, Y generates 22 bytes of code, even though other called routines that take two parameters need only 13. (Every passed parameter generates four bytes of code, and the actual CALL adds five more. This is the same whether the routine being called is an internal BASIC statement, a BASIC subprogram or function, or an assembly language routine.) Therefore, if you use LOCATE with two arguments frequently in a program, you can save nine bytes for each by creating a BASIC subprogram that performs the LOCATE:

   SUB LocateIt(Row, Column) STATIC
     LOCATE Row, Column
   END SUB

Similarly, if you frequently turn the cursor on and off, you should create two subprograms--perhaps called CursorOn and CursorOff--that invoke LOCATE. Since no parameters are required, the savings will add up quickly. Calling either of the subprograms below generates only five bytes of code, as opposed to 18 for the statement LOCATE , , 1 and 20 for LOCATE , , 0.

   SUB CursorOn STATIC
     LOCATE , , 1
   END SUB

   SUB CursorOff STATIC
     LOCATE , , 0
   END SUB

The COLOR statement also requires more parameters than the number of arguments you give. Where COLOR FG, BG generates 22 bytes of compiled code, CALL ColorIt(FG, BG) creates only 13. CLOSE is yet another BASIC statement that accepts multiple arguments, and it too requires hidden parameters. Using CLOSE #X compiles to 13 bytes, and CALL CloseIt(X) is only nine.

The reason that BASIC sends more parameters than you specify is because these routines need extra information to know which and how many arguments were given. In the case of LOCATE, each argument is preceded with a flag that tells if the next one was given. CLOSE is similar, except the last parameter tells how many file numbers were specified. Remember, you can use CLOSE alone to close all open files, or CLOSE 1, 3, 4 to close only those files numbers. Therefore, BASIC requires some way to tell the CLOSE statement how many file numbers there are.

Another place where several statements can be consolidated within a single procedure is when peeking and poking memory. BASIC's PEEK and POKE are limited because they can access only one byte in memory at a time. But many useful memory locations are in fact organized as a pair of bytes, as you will see in Chapter 10. Instead of using code to combine or separate the bytes each time memory is accessed, you can use the following short routines that let you peek and poke two bytes at once.

DECLARE FUNCTION PeekWord%(Address%)
  PeekWord% = PEEK(Address%) + 256 * PEEK(Address% + 1)
END FUNCTION

DECLARE SUB PokeWord(Address%, Value%)
  POKE Address%, Value% AND 255
  POKE Address% + 1, Value% \ 256
END SUB

Because these routines use BASIC's PEEK and POKE, you still need to use DEF SEG separately. Of course, the segment could be added as another parameter, and assigned within the routines:

DECLARE FUNCTION PeekWord%(Segment%, Address%)
  DEF SEG = Segment%
  PeekWord% = PEEK(Address%) + 256 * PEEK(Address% + 1)
END FUNCTION

Word Wrapping

A string handling technique you will surely find useful is implementing word wrapping. There are a number of ways to do this, and the following code shows one that I have found to be very efficient.

DEFINT A-Z
SUB WordWrap (X$, Wide, LeftMargin)

  Length = LEN(X$)   'remember the length
  Pointer = 1        'start at the beginning of the string
  IF LeftMargin = 0 THEN LeftMargin = 1

  'Scan a block of Wide characters backwards, looking for a blank.  Stop
  '  at the first blank, or upon reaching the beginning of the string.
  DO
    FOR X = Pointer + Wide TO Pointer STEP -1
      IF MID$(X$, X, 1) = " " OR X = Length + 1 THEN
        LOCATE , LeftMargin
        PRINT MID$(X$, Pointer, X - Pointer);
        Pointer = X + 1
        WHILE MID$(X$, Pointer, 1) = " "
          Pointer = Pointer + 1
        WEND
        IF POS(0) > 1 THEN PRINT
        EXIT FOR
      END IF
    NEXT
  LOOP WHILE Pointer < Length
END SUB

The WordWrap subprogram expects the text for display to be in a single long string. You pass it that text, a left margin, and a width. You could certainly add enhancements to this routine such as a color parameter, or the ability to format the text and send it to a printer or disk file.

Unusual Ways to Access Display Memory

If you ever tried to print a character in the lower-right corner of the display screen, you probably discovered that it cannot be done [with many BASIC versions] without causing the screen to scroll up. The only solution I am aware of is to use POKE to assign the character (and optionally its color) to display memory directly as shown below.

   DEF SEG = &HB800     'use &HB000 for a monochrome display
   POKE 3998, 65        'ASCII code for the letter "A"
   POKE 3999, 9         'bright blue on black

The second trick also uses display memory in an unconventional manner. All video adapters contain at least 4096 bytes of on-board memory. Even though a 25 line by 80 column text mode screen uses only 4000 bytes (2000 characters plus 2000 colors), memory chips are built in multiples of 1,024 bytes. Therefore, you can use the last 96 bytes on the display adapter in your programs. If the adapter supports multiple video pages, then you can use the last 96 bytes in each 25-line page.

One use for this memory is to provide a way to communicate small amounts of information between separate programs. When you don't want to structure an application to use CHAIN, the only other recourse is to use a disk file to pass information between the programs. But if all that is needed is a file name or drive letter, using a file can be awkward and slow, especially if the program is running from a floppy disk.

One way to access this video memory is with PEEK and POKE. But PEEK and POKE are awkward too, and can access only one byte at a time. A better approach is to use an assembly language routine to copy one contiguous memory block to another location. The MemCopy routine below is designed to do exactly this.

;MEMCOPY.ASM, copies a block of memory from here to there

.Model Medium, Basic
.Code

MemCopy Proc Uses DS ES SI DI, FromAdr:DWord, ToAdr:DWord, NumBytes:Word

  Cld               ;copy in the forward direction

  Mov  SI,NumBytes  ;get the address for NumBytes%
  Mov  CX,[SI]      ;put it into CX for copying below

  Les  DI,FromAdr   ;load ES:DI with the source address
  Lds  SI,ToAdr     ;load DS:SI with destination address

  Shr  CX,1         ;copy words instead of bytes for speed
  Rep  Movsw        ;do the copy
  Adc  CX,CX        ;this will set CX to either 0 or 1
  Rep  Movsb        ;copy the odd byte if necessary

  Ret               ;return to BASIC

MemCopy Endp
End

MemCopy may be declared and called in two different ways. The first uses SEG and is most appropriate when you are copying data between variables, for example from a group of elements in one array to elements in another. The second lets you specify any arbitrary segment and address, and it requires the BYVAL modifier either in the DECLARE statement, the CALL, or both. Each method is shown below.

   DECLARE SUB MemCopy(SEG AnyVar1, SEG AnyVar2, Numbytes%)
   CALL MemCopy(AnyVar1, AnyVar2, NumBytes%)

   DECLARE SUB MemCopy(BYVAL Seg1%, BYVAL Adr1%, BYVAL Seg2%, _
     BYVAL Adr2%, NumBytes%)
   CALL MemCopy(SourceSeg%, SourceAdr%, DestSeg%, DestAdr%, NumBytes%)

You may also use a combination of these, perhaps with SEG for the source argument and BYVAL for the second. For example, to copy a 20-byte TYPE variable to the area just past the end of video memory on a color display adapter you would do this:

   CALL MemCopy(SEG TypeVar, BYVAL &HB800, BYVAL 4000, 20)

In many cases you may need to use MemCopy in more than one way in the same program. For this reason it is probably better not to declare it at all. Once a subprogram or function has been declared, BASIC will refuse to let you change the number or type of parameters. But if you don't include a declaration at all, you are free to use any combination of SEG and BYVAL, and also any type of variable.

It is important to understand that numeric and TYPE variables should be specified using SEG, so MemCopy will know the full address where the variable resides. You could use a combination of BYVAL VARSEG(Variable) and BYVAL VARPTR(Variable), but that is not quite as efficient as SEG. Copying to or from a conventional string using QuickBASIC requires SADD (string address) instead of VARPTR; far strings in BASIC 7 require SADD, and also SSEG (string segment) instead of VARSEG.

Rebooting a PC

Another simple trick that is not obvious to many programmers is how to reboot a PC. Although most PC technical reference manuals show an interrupt service for rebooting, that simply does not work with most computers. However, every PC has a BIOS routine that is at a fixed address, and which may be called directly like this:

   DEF SEG = &HFFFF
   CALL Absolute(0)

The Absolute routine is included in thee QB and QBX libraries that come with BASIC. If a cold boot with the full memory test and its attendant delay is acceptable, then the code shown above is all that you need. Otherwise, you must poke the special value &H1234 in low memory as a flag to the BIOS routine, so it will know that you want a warm boot instead:

   DEF SEG = 0
   POKE &H473, &H12
   POKE &H472, &H34
   DEF SEG = &HFFFF
   CALL Absolute(0)

Integer Values Greater Than 32K

As you learned in Chapter 2, an integer variable can hold any value between -32768 and 32767. When this range of numbers is considered, the integer is referred to as being a signed number. But the same range of values can also be treated as unsigned numbers spanning from 0 through 65535. Since BASIC does not support unsigned integers, additional trickery is often needed to pass values between 32768 and 65535 to assembler routines and DOS and BIOS services you invoke with CALL Interrupt. One way to do this is to use a long integer first, and add an explicit test for values higher than 32767:

   Temp& = NumBytes&
   IF Temp& > 32767 THEN
     IntBytes% = Temp& - 65536
   ELSE
     IntBytes% = Temp&
   END IF

To reverse the process you would test for a negative value:

   IF IntBytes% < 0 THEN
     NumBytes& = IntBytes% + 65536
   ELSE
     NumBytes& = IntBytes%
   END IF

Although this method certainly works, it is inefficient because of the added IF testing. When you merely need to pass a variable to a called routine, you can skip this testing and simply pass the long integer directly. This may appear counter to the rule that you must always pass the same type of variable that a subroutine expects. But as long as the arguments are not being passed by value using BYVAL, this method works and adds no extra code.

When a parameter is passed to a subprogram or function, BASIC sends the address of its first byte as shown in Figure 9-1.

Figure 9-1: Passing a long integer where a regular integer is expected.

Here, B1, B2, and so forth refer to the Bytes 1 through 4 of a long integer variable. Since the assembly language routine is expecting a regular integer, it looks at just the first two bytes of the variable. Thus, a long integer can be used even when a conventional integer is expected. Of course, any excess greater than 65535 will be ignored by the routine, since the bits that hold the excess are in the third and fourth bytes.

Benchmarking

Throughout this book I have emphasized the importance of writing code that is as small and fast as possible. And these goals should be obvious to all but the most novice programmer. But it is not always obvious how to determine for yourself which of several approaches yields code that is the smallest or fastest. One way is to use Microsoft CodeView, which lets you count the bytes of assembler code that are generated. This is how I obtained the byte counts stated throughout this book.

But smaller is not always faster. Further, the code that BASIC generates is not the whole story. In many cases BASIC makes calls to its runtime library routines, and you would have to trace through those as well to know the total byte count for a given statement. It is not impossible to trace through the BASIC runtime using CodeView, but it certainly can be tedious. Many of BASIC's internal routines are very convoluted--especially those that allocate and deallocate string and other memory. Often it is simpler to devise a test that executes a series of statements many times, and then time how long the test took.

As an example for this discussion, I will compare two different ways to print three strings in succession and show how to tell which produces less code, and which is faster. The first statement below prints each string separately, and the second combines the strings and then prints them as one.

   1: PRINT X$; Y$; Z$
   2: PRINT X$ + Y$ + Z$

Since the length of each string will certainly influence how long it takes to print them, each of the strings is first initialized to 80 characters as follows:

   X$ = STRING$(80, "X")
   Y$ = STRING$(80, "Y")
   Z$ = STRING$(80, "Z")

It is important to understand that the PRINT statement itself will be a factor, since it takes a certain amount of time to copy the characters from each string to display memory. Worse, if the screen needs to be scrolled because the text runs past the bottom of the display, that will take additional time. To avoid the overhead of scrolling, the test program uses LOCATE to start each new print statement at the top of the screen. Of course, using LOCATE adds further to the overhead, but in this case much less than scrolling would. To prove this to yourself, disable the line that contains the LOCATE statement. Here's the complete benchmark program:

CLS
X$ = STRING$(80, "X")    'create the test string
Y$ = STRING$(80, "Y")
Z$ = STRING$(80, "Z")

Synch! = TIMER           'synchronize to TIMER
DO
  Start! = TIMER
LOOP WHILE Start! = Synch!

FOR X = 1 TO 1000        '1000 times is adequate
  LOCATE 1
  PRINT X$; Y$; Z$
NEXT

Done! = TIMER            'calculate elapsed time
Test1! = Done! - Start!

Synch! = TIMER           'as above
DO
  Start! = TIMER
LOOP WHILE Start! = Synch!

FOR X = 1 TO 1000
  LOCATE 1
  PRINT X$ + Y$ + Z$
NEXT

Done! = TIMER
Test2! = Done! - Start!

PRINT USING "##.## seconds using three strings"; Test1!
PRINT USING "##.## seconds using concatenation"; Test2!

Notice the extra step that synchronizes the start of each test to BASIC's TIMER function. As you probably know, the PC's system time is updated approximately 18 times per second. Therefore, it is possible that the test loop could begin just before the timer is about to be incremented. In that case the elapsed time would appear to be 1/18th second longer than the actual time. To avoid this potential inaccuracy, the DO loop waits until a new time period has just begun. There is still a similar accuracy loss at the end of the test when Done! is assigned from TIMER. But by synchronizing the start of the test, the error is limited to 1/18th second instead of twice that.

When you compile and run this program using QuickBASIC 4.5, it will be apparent that the first test is more than three times faster than the second. However, with BASIC 7.1--using either near or far strings--the second is in fact slightly faster. Therefore, which is better depends on the version of your compiler, and there is no single best answer. Now let's compare code size.

The disassemblies shown below are valid for both QuickBASIC 4.5 and BASIC 7.1. By counting bytes you can see that printing the strings using a semicolon generates 27 bytes, while first concatenating the strings requires 29 bytes.

PRINT X$; Y$; Z$
  B83600          MOV   AX,X$   ;get the address for X$
  50              PUSH  AX      ;pass it on
  9AD125FF4A      CALL  B$PSSD  ;print with a semicolon
  B83A00          MOV   AX,Y$   ;as above for Y$
  50              PUSH  AX
  9AD125FF4A      CALL  B$PSSD
  B83E00          MOV   AX,Z$
  50              PUSH  AX
  9AD625FF4A      CALL  B$PESD  ;print with end of line

PRINT X$ + Y$ + Z$
  B83600          MOV   AX,X$   ;get the address for X$
  50              PUSH  AX      ;pass it on
  B83A00          MOV   AX,Y$   ;get the address for Y$
  50              PUSH  AX      ;pass that on too
  9AD728FF4A      CALL  B$SCAT  ;call String Concatenate
  50              PUSH  AX      ;pass the combined result
  B83E00          MOV   AX,Z$   ;get the address for Z$
  50              PUSH  AX      ;pass it on
  9AD728FF4A      CALL  B$SCAT  ;combine that too
  50              PUSH  AX      ;pass X$ + Y$ + Z$
  9AD625FF4A      CALL  B$PESD  ;print with end of line

Even though the first example uses a single PRINT statement, BASIC treats it as three separate commands:

   PRINT X$;
   PRINT Y$;
   PRINT Z$

The second example that concatenates the strings requires slightly more code because of the repeated calls to the B$SCAT (string concatenate) routine. Therefore, if you are using QuickBASIC it is clear that printing the strings separately is both smaller and faster. BASIC PDS users must decide between slightly faster performance, or slightly smaller code.

These tests were repeated 1000 times to minimize the inaccuracies introduced by the timer's low resolution. Since this method of timing can be off by as much as 1/18th second (55 milliseconds), for test results to be accurate to 1% the test must take at least 5.5 seconds to complete. In most cases that much precision is not truly necessary, and other factors such as the time to use LOCATE will prevent absolute accuracy anyway.

It is important that any timing tests you perform be done after compiling the program to an .EXE file. The BASIC editor is an interpreter, and is generally slower than a stand-alone program. Further, the reduction in speed is not consistent; some statements are nearly as fast as in a compiled program, and some are much slower.

To obtain more accurate results than those shown here requires some heavy ammunition; I recommend the Source Profiler from Microsoft. This is a utility program that times procedure calls within a running program to an accuracy of one microsecond. The Source Profiler supports all Microsoft languages including QuickBASIC and BASIC PDS.

To time a program you must compile and link it using the /zi and /co CodeView switches. This tells BASIC and LINK to add symbolic information that shows where variables and procedures are located, and also relates each logical line of source code to addresses in the .EXE file. The Source Profiler then uses this information to know where each source-language statement begins and ends.

You should also understand that there's a certain amount of overhead associated with the timing loop itself. Any FOR/NEXT loop requires a certain amount of time just to increment the counter variable and compare it to the ending value. Fortunately, this overhead can be easily isolated, using an empty loop with the same number of iterations. The short complete program that follows shows this in context.

Synch! = TIMER
DO
  Start! = TIMER
LOOP WHILE Start! = Synch!

FOR X& = 1 TO 50000
NEXT

Done! = TIMER
Empty! = Done! - Start!
PRINT USING "##.## seconds for the empty loop"; Empty!

Synch! = TIMER
DO
  Start! = TIMER
LOOP WHILE Start! = Synch!

FOR X& = 1 TO 50000
  X! = -Y!
NEXT

Done! = TIMER
Assign! = Done! - Start!
PRINT USING "##.## seconds for the assignments"; Assign!

Actual! = Assign! - Empty!
PRINT USING "##.## seconds actually required"; Actual!

Summary

In this chapter you learned a variety of programming shortcuts and other techniques. You saw firsthand how it is more efficient to avoid using CHR$ and other BASIC functions repeatedly, in favor a single call ahead of time when possible. In a similar vein, you can reduce the size of your programs by consolidating multiple instances of UCASE$, MID$, LTRIM$, and other functions once before a series of IF tests, rather than use them each time for each test.

You also learned that assigning multiple variables in succession from another often results in smaller code than assigning from the same numeric constant. Short circuit expression evaluation was described, and examples showed you how that technique can improve the efficiency of a QuickBASIC program. But since BASIC PDS already employs this optimization, multiple AND conditions are not needed when using that version of compiler.

This chapter explained the importance of reducing the number of parameters you pass to a subprogram or function, and showed how you can use GOSUB to invoke a central handler that in turn calls the routine. Likewise, when using BASIC statements such as LOCATE, COLOR, and CLOSE that require additional arguments beyond those you specify, a substantial amount of code can be saved by creating a BASIC subprogram wrapper. Examples for turning the cursor on and off were shown, and these can save 13 and 15 bytes per use respectively.

Several programming techniques were shown, including a word wrap subprogram, a numeric rounding function, and a simple way to reboot the PC. You also learned how small amounts of data can be safely stored in the last 96 bytes of video memory, perhaps for use as a common data area between separately run [non-chained] programs.

Finally, this chapter included a brief discussion of some of the issues surrounding benchmarking, and explained how to obtain reasonably accurate statement timings. To determine the size of the compiler-generated code requires disassembling with CodeView.

Chapter 10 continues with a complete list of key addresses in low memory you are sure to find useful, and discusses each in depth along with accompanying examples.

�

QB CULT MAGAZINEVol. 2 Iss. 4 - October 2001

BASIC Techniques and Utilities, Chapter 9Program Optimization