QB CULT MAGAZINE
Issue #8 - January 2001

BASIC Techniques and Utilities, Chapter 4
Debugging Strategies

There are many individual components which contribute to a completed application. The logical flow of the program must be determined, the user interface must be designed, and appropriate algorithms must be selected. But no matter how much effort you devote to the design and implementation of a program, the bottom line is it must also work correctly.

In an ideal scenario, you would begin writing a program by first jotting down some notes that describe its operation. Next, you would create an outline listing each of the program's major components. You would then determine all of the subroutines and functions that are needed, and perhaps even create a flow chart showing each of the paths that could be taken. Properly prepared for any situation that might arise, you finally write the actual code and find that it works perfectly. Now, what's wrong with this picture? Few people actually program that way!

In practice, many programmers simply start coding with little forethought and no detailed plan. They begin with the first statement and continue to the last, occasionally reworking portions into subroutines as necessary. After all, planning is not nearly as much fun as programming, and everyone knows that fun is the most important part. Believe it or not, I agree. There's nothing really wrong with plodding through a program, stabbing here and there until it works. Indeed, some great algorithms developed out of aimless doodling. I have personally never drawn a flow chart, and I have no plans to start now.

What I will address here is how to find and correct problems when they do occur. There are more things that can go wrong with a program than can go right, and tracking down an elusive "Illegal function call" error that appears only occasionally is definitely not much fun. How quickly you can solve these problems is directly related to your understanding of programming in general, and to your familiarity with the tools available.

In this chapter you will learn how to identify problems in your programs, and also how to solve them. Programming errors, or bugs, can be as simple as a misspelled variable name, and as complex and ornery as an internal flaw in BASIC itself. The BASIC editing environment provides a wealth of powerful debugging features, and understanding how to use them will help you produce programs that are reliable and error free.

Common Programming Errors

There are three distinct types of programming errors: simple misspellings and other naming or syntax errors, incorrect logic such as misunderstanding or incorrectly coding an algorithm, and failing to understand some of the finer points of the BASIC language. No matter how carefully you type, no matter how much forethought you apply to a particular problem, and no matter how often you read the BASIC manuals, it is impossible to completely avoid making mistakes.

The first category includes those errors caused by simple mistakes such as misspelling a variable or procedure name. Trying to call a subprogram that doesn't exist will be immediately obvious, because BASIC gives you an error message before the program can be run. But an incorrect variable name will return the wrong results with no warning.

Passing the wrong number of arguments to a procedure may or may not be reported, depending on whether the routine has been declared. Assembly language routines in a Quick Library can be particularly pesky in this regard. Although BASIC automatically generates a DECLARE statement for BASIC subprograms and functions you have loaded in source form, it does not do this for routines in a Quick Library. If you call an assembly language routine incorrectly, you will probably crash the PC. However, it is also possible to corrupt string memory and not know it. Worse, a "String space corrupt" error is often not reported until much later in the program. If you run the short program below in the QuickBASIC 4.5 editor, it will appear to operate correctly.

X$ = SPACE$(1000)       'create a string
POKE SADD(X$) - 2, 100  'corrupt string memory
PRINT "Testing"
X% = 1
PRINT "More testing"
X% = 2
PRINT "Yet more testing"
X% = 3

Here, the POKE statement is overwriting the back pointer that belongs to X$, which is one type of string corruption that can occur. But QuickBASIC doesn't know that this has happened, because it has no reason to check the integrity of its string memory until another string assignment is made. However, adding the statement PRINT FRE("") anywhere after the POKE command causes BASIC to check string memory, and report the error. Even if your program does not use POKE, calling a procedure incorrectly can cause it to overwrite memory in this fashion.
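
As a sketch of that suggestion, here is the same listing with a single added line; only the PRINT FRE("") statement is new. Like the original, it deliberately damages string memory, so try it only in a session you do not mind restarting.

X$ = SPACE$(1000)       'create a string
POKE SADD(X$) - 2, 100  'corrupt string memory
PRINT FRE("")           'forces BASIC to check string memory now
PRINT "Testing"         'not reached once the error is reported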

Another simple error is inadvertently using the same variable name twice, or omitting a type declaration character from a variable name. For example, if you are using a variable named Bytes& to track how many bytes of a file have been read, accidentally using Bytes later on will give the wrong results. If a DEFINT statement is in effect, then Bytes will be an integer variable. Otherwise, it will be single precision, which is also incorrect. Unless you use the DIM...AS statement to declare a variable explicitly, BASIC lets you have different variables with the same name. That is, Var%, Var!, and Var# can all coexist in the same program, and each is a unique variable.
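
The short fragment that follows is a minimal illustration of the Bytes& mistake, assuming no DEFINT or other DEFtype statement is in effect. The amounts are arbitrary values chosen for the example.

Bytes& = 100000         'long integer byte counter
Bytes = Bytes + 512     'typo: a separate single precision variable
PRINT Bytes&            'the real counter was never updated
PRINT Bytes             'the stray variable holds only the 512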

Similarly, using the wrong variable entirely will cause your program to operate incorrectly, and again with no error message displayed. More than once I have had a program with one FOR loop nested within another, and used the outer loop counter variable when I meant to use the inner one.

Another common situation is caused by changing the name of a variable during the course of writing a program. For example, you may have a variable named BPtr that tracks where you are reading within a buffer. If you later decide to change that name to BufPointer because it is more meaningful, you must also remember to change all occurrences of the name. Of course, BASIC's search and replace feature minimizes that problem. More important, though, you must make a mental note to use the new name as you continue to develop the program.

Forgetting to declare a function can also lead to incorrect results that produce no warning. If an integer function is not declared, then BASIC will dimension an array with that name if the function expects a numeric argument. When BASIC encounters the statement X = FuncName%(Y%) it assumes that FuncName% is an integer array, and creates an array containing the default 11 elements. In this case X will be assigned a value of zero, or you will receive a "Subscript out of range" error if Y% is not between 0 and 10. I once observed an unexplainable "Out of string space" error that was caused by the statement Size = ScreenSize%(ULRow, ULCol, LRRow, LRCol). ScreenSize% was a function present in a Quick Library, but without a DECLARE statement BASIC created a 4-dimensional integer array.
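
To make the role of the DECLARE statement concrete, here is a small sketch using a hypothetical function named Twice%, which is not part of any library. With the DECLARE present, BASIC treats Twice%(Y%) as a function call rather than as a reference to an array element.

DECLARE FUNCTION Twice% (Value%)

Y% = 5
X = Twice%(Y%)          'a function call, because Twice% is declared
PRINT X

FUNCTION Twice% (Value%)
  Twice% = Value% * 2
END FUNCTION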

Logic Errors

The second cause of bugs is logic errors, and these include adding when you meant to subtract, or using the wrong variable altogether. Programs that manipulate pointers (variables that hold the addresses of other variables) are particularly prone to errors in logic. Another common logic error is forgetting to trim the leading or trailing blanks from a file or directory name before using it. If the operator enters " c:\thisfile.dat" and you try to open that file, BASIC will report a "Bad file name" error.
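
A minimal sketch of the trimming step is shown below. The file name is simply the example from the text with an added trailing blank, and the braces are there only to make any remaining blanks visible.

FileName$ = " c:\thisfile.dat "
FileName$ = LTRIM$(RTRIM$(FileName$))   'strip leading and trailing blanks
PRINT "{"; FileName$; "}"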

Another cause of logic errors is failing to consider all of the things a user may enter. An inexperienced operator is likely to enter data that you as the programmer would never consider, or select menu items in an order that makes no sense. Indeed, never underestimate the value of beta testers. After you have exhausted all of the possibilities you can think of, give the program to a 4 year old child, and ask him or her to try it while you watch. Your uncle Ernie would be a good beta tester too, and the less he knows about your program, the more valuable his contribution will be. People who know absolutely nothing about computers have an uncanny knack for creating "Illegal function call" errors in a program that you just know is perfect.

Similarly, you must consider all of the possible error conditions that could happen in a program. In an error handler that has a CASE statement for each possibility you anticipate, also include a CASE ELSE clause for those you haven't thought of. The short listing that follows shows a typical error handler that incorporates this added safety measure.

ON ERROR GOTO HandleErr
  ...
  ...
HandleErr:
  SELECT CASE ERR
    CASE 7, 14
      PRINT "Out of memory"
    CASE 24, 25, 27
      PRINT "Fix the printer"
    CASE 53
      PRINT "File not found"
    CASE ELSE
      PRINT "Error number"; ERR
  END SELECT
  ...
  ...

The CASE ELSE clause lets you accommodate any possibility, and your user can then at least report to you what the error number was. This simple example doesn't include all of the possibilities, but you can certainly see the general concept.
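
For readers who want to try the idea, the following is one possible complete version. The file name NOSUCH.DAT is hypothetical and is assumed not to exist, and RESUME NEXT is used only to keep the sketch short.

ON ERROR GOTO HandleErr
OPEN "NOSUCH.DAT" FOR INPUT AS #1   'assumed to be missing
PRINT "Continuing after the OPEN"
END

HandleErr:
  SELECT CASE ERR
    CASE 53
      PRINT "File not found"
    CASE ELSE
      PRINT "Error number"; ERR     'anything not anticipated
  END SELECT
  RESUME NEXT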

Another common logic error is using the same file number twice. When a file has been opened as #1, that number remains in use until the file is closed. This can be problematical when writing reusable modules, since there is no way to know which files may be in use by the main program. Some programmers use #99 or another unlikely number in a routine that will be reused in many programs. But even that approach is flawed, because you have to remember which numbers are used by which routines.

BASIC's FREEFILE function is intended to solve this problem, and it returns the next available file number. Be sure to save the results FREEFILE returns, however, since the value will change as soon as the next file is opened. The code below shows both the wrong and right ways to use FREEFILE.

Wrong:

  OPEN "accounts.dat" FOR INPUT AS #FREEFILE
  INPUT #FREEFILE, X$    'FREEFILE has changed! 
  CLOSE #FREEFILE

Right:

  FileNum = FREEFILE    'get and save the number
  OPEN "accounts.dat" FOR INPUT AS #FileNum
  INPUT #FileNum, X$
  CLOSE #FileNum

In the first example if FREEFILE returns, say, a value of 2, then it will return 3 at the INPUT statement which is of course incorrect. Therefore, you must save the value FREEFILE returns, and use that for all subsequent file accesses. This situation also occurs with INKEY$, because once a character has been returned it is no longer available unless you saved it.
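
The same principle applied to INKEY$ might look like the sketch below, where the key is read once into a variable (here called K$, an arbitrary name) and every later test uses the saved copy.

DO
  K$ = INKEY$           'read the keyboard once and save the result
LOOP UNTIL LEN(K$)

IF K$ = CHR$(27) THEN
  PRINT "Escape was pressed"
ELSE
  PRINT "You pressed "; K$
END IF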

Two other frequent problems are attempting to use LSET to assign characters into a string that does not exist, and failing to clear a counter variable within a static subprogram or function. The second problem can be especially frustrating, because the routine will work correctly the first time it is invoked. In the function below, a counter returns the number of embedded control characters it finds in a string.

FUNCTION CtrlCount%(Work$) STATIC

  FOR X% = 1 TO LEN(Work$)
    IF ASC(MID$(Work$, X%, 1)) < 32 THEN
      Count% = Count% + 1
    END IF
  NEXT

  CtrlCount% = Count%    'return the count

END FUNCTION

The problem here is that Count% retains its value between function invocations. Therefore, each time CtrlCount% is used it will return ever higher values. One solution is to add the statement Count% = 0 at the beginning of the function. Another is to omit the STATIC option from the function definition.
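
Here is a sketch of the first fix, along with two calls that show the count no longer accumulates. The test string is an arbitrary example that contains two control characters.

DECLARE FUNCTION CtrlCount% (Work$)

Test$ = "A" + CHR$(9) + "B" + CHR$(13)
PRINT CtrlCount%(Test$)     'two control characters are found
PRINT CtrlCount%(Test$)     'the same answer on the second call

FUNCTION CtrlCount% (Work$) STATIC
  Count% = 0                'clear the total left from the last call
  FOR X% = 1 TO LEN(Work$)
    IF ASC(MID$(Work$, X%, 1)) < 32 THEN Count% = Count% + 1
  NEXT
  CtrlCount% = Count%       'return the count
END FUNCTION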

Understanding BASIC's Quirks

The third type of error is caused by not understanding some of BASIC's finer points and quirks. For example, some people do not realize that omitting the third argument from MID$ causes it to return all of the remaining characters in a string. To see if a drive letter was given as part of a file name and if so extract it, you might use a statement such as IF MID$(FileName$, 2) = ":" THEN Drive$ = LEFT$(FileName$, 1). But since the number of characters is not specified to MID$, it returns all but the first character in the string. Unless the string is a drive letter and colon only ("C:"), the test for a colon can never work. The solution, of course, is to use MID$(FileName$, 2, 1).
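
The difference is easy to see by printing both forms. The path used here is a made-up example.

FileName$ = "C:\DATA\ACCOUNTS.DAT"  'hypothetical path
PRINT MID$(FileName$, 2)            'everything from the colon onward
PRINT MID$(FileName$, 2, 1)         'just the colon
IF MID$(FileName$, 2, 1) = ":" THEN Drive$ = LEFT$(FileName$, 1)
PRINT Drive$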

Another instance in which an intimate knowledge of BASIC's idiosyncrasies comes into play can affect the earlier example of a file name that contains leading blanks. Most programmers do not use INPUT to accept information, unless the program is very simple and it will be used only occasionally. However, asking for a file name with INPUT is one way to avoid that problem, because INPUT strips all leading and trailing blank spaces, as well as CHR$(9) tab characters. The more useful LINE INPUT, on the other hand, does not strip leading blanks and tabs. Most programmers would never be so foolish as to enter a file name with leading blanks. So this is yet another situation where it is important to consider all of the possibilities.

It is also possible to crash a program by using the ASC function when the string might be null. Again, *you* would never press Enter alone in response to a prompt for a file name or other mandatory information, but someone else might.
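
One way to guard against both problems is sketched below: test the length before calling ASC, and only then look at the first character. The prompt and messages are placeholders.

LINE INPUT "File name: ", FileName$
IF LEN(FileName$) = 0 THEN
  PRINT "No file name was entered."     'ASC is never called on a null string
ELSEIF ASC(FileName$) = 32 OR ASC(FileName$) = 9 THEN
  PRINT "The name begins with a blank or tab."
ELSE
  PRINT "First character code:"; ASC(FileName$)
END IF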

Another BASIC quirk is caused by rounding errors. As you saw in Chapter 2, adding or multiplying many numbers in succession can produce results that are not precisely correct. Instead of checking to see if a value is zero, it is often better to compare it to a very small number. That is, instead of IF Value# = 0 you would use IF ABS(Value#) < .000001 or IF Value# < .000001 AND Value# > -.000001 or something similar. Also, some numbers simply cannot be represented at all. If you try to enter the statement X# = .00000000001 in the QuickBASIC 4.5 editor, the value will be converted to 9.999999999999999D-12 as soon as you press Enter.
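
The fragment below is a small experiment along these lines. It adds one tenth ten times and subtracts 1; because one tenth has no exact binary representation, the exact test for zero will typically fail, while the tolerance test succeeds.

Total# = 0
FOR X% = 1 TO 10
  Total# = Total# + .1#         'accumulate one tenth ten times
NEXT
Value# = Total# - 1

IF Value# = 0 THEN PRINT "Exactly zero"
IF ABS(Value#) < .000001 THEN PRINT "Close enough to zero"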

Although not technically a BASIC quirk, many programmers forget that variables within a DEF FN function are by default global. Unless you include an explicit STATIC statement listing each variable that is to be local to the function, it is likely that an unexpected change will be made to a variable in the main program.
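
A short sketch of the STATIC remedy is shown here, using a hypothetical function named FNSum% and a work variable named Temp%. If the STATIC line is removed, the function changes the main program's Temp% instead.

DEF FNSum% (A%, B%)
  STATIC Temp%          'makes Temp% local to the function
  Temp% = A% + B%
  FNSum% = Temp%
END DEF

Temp% = 99
PRINT FNSum%(1, 2)
PRINT Temp%             'the main program's Temp% is unchanged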

Some programming situations require that you obtain the address of a string variable using SADD. However, SADD is not legal for use with a fixed-length string or the string portion of a TYPE variable. More important, when using BASIC PDS far strings you must also remember to use SSEG to get the string's data segment. Using VARSEG will not create an error; however, the program will not work correctly.

Related to that, it is important to remember that strings and dynamic arrays move around in memory--often at unexpected times. The program below appends a zero character to one string for each zero that is found in another string. Since BASIC may move Work$ during the course of assigning Zero$, this code will fail eventually:

Address = SADD(Work$)
FOR Y = Address TO Address + LEN(Work$) - 1
  IF PEEK(Y) = 48 THEN Zero$ = Zero$ + "0"
NEXT
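
One safe alternative, sketched below with an arbitrary test string, is to avoid addresses entirely and let BASIC locate the string each time through the loop. This is not the only fix; counting the zeros first and building Zero$ afterward would also work.

Work$ = "10203040"
FOR Y% = 1 TO LEN(Work$)
  IF ASC(MID$(Work$, Y%, 1)) = 48 THEN Zero$ = Zero$ + "0"
NEXT
PRINT Zero$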

Another particularly insidious bug can result if you inadvertently add parentheses around a variable that is passed to a subprogram or function. In the example below, a subprogram that intentionally modifies a parameter has been declared and is then called without the CALL keyword.

DECLARE SUB Square(Param%)
Square (Value%)

SUB Square(Value%) STATIC
  Value% = Value% * Value%
END SUB

Because of the unnecessary and incorrect use of parentheses, a copy of the argument is sent to Square instead of the argument itself, with the result that Value% is never actually changed. The fix is to either remove the parentheses, or add the word CALL. Another, related issue is placing a DEFINT after DECLARE statements. In the example below, the parameters X, Y, and Z are assumed by BASIC to be single precision, even though this is clearly not what was intended.

DECLARE SUB MySub (X, Y, Z)  'X, Y, and Z are singles!
DEFINT A-Z
 .
 .
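
For comparison, here is a sketch with the DEFINT placed first, so that the parameters in both the DECLARE and the SUB are integers. The name ShowSum is hypothetical.

DEFINT A-Z
DECLARE SUB ShowSum (X, Y, Z)   'X, Y, and Z are now integers

CALL ShowSum(1, 2, 3)

SUB ShowSum (X, Y, Z)
  PRINT X + Y + Z
END SUB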

The final issue I want to address here is potential overflow errors. The statement IF IntVar% * 14 > 1000000 can never be true, because BASIC performs integer math assuming an integer range only. Unless you compile your program using the /d debug option, the error will be unreported in a compiled program. If this statement is executed within the QB environment, BASIC will report an overflow error, even though the instruction certainly appears to be legal. But since integer math assumes an integer result, the product of IntVar% times 14 will overflow the range of integer values if IntVar% is greater than 2,340.

One solution is to use a long integer for IntVar, and BASIC will then use the range of long integers for the comparison. Using a long integer wastes memory, however, and calculations on long integers are slower and require more code to implement. A much better solution is to use CLNG (Convert to Long), which tells BASIC to assume a long integer result.

The statement IF CLNG(IntVar%) * 14 > 1000000 will create a long integer version of IntVar%, and then multiply the result by 14 and use that for the subsequent comparison. Unlike the copies that BASIC makes which steal DGROUP memory, the long integer conversion in this instance is handled within the CPU's registers. CLNG when used this way is really just a compiler directive, as opposed to a called library routine. Another solution is to add an ampersand after the constant 14, thus: IF IntVar% * 14& > 1000000. Again, no additional DGROUP memory is used to handle 14 as a long integer value.
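
Both forms can be tried with the fragment below. The values 3000 and 40000 are arbitrary, chosen so that the product (42000) exceeds the integer range but the comparison is still true.

IntVar% = 3000
IF CLNG(IntVar%) * 14 > 40000 THEN PRINT "CLNG: product exceeds 40000"
IF IntVar% * 14& > 40000 THEN PRINT "14&: product exceeds 40000"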

Another interesting use of CLNG and CINT, unrelated to debugging but worth mentioning nonetheless, is to reduce the size of comparison code. When you use a statement such as IF X% > VAL(Some$), a floating point comparison is performed even if Some$ holds an integer value. By replacing that example with IF X% > CINT(VAL(Some$)) 6 bytes of code can be saved. The CINT tells BASIC that it will not have to perform any floating point rounding when it compares the two values.

Debugging and Testing Techniques

When you are developing a large application that is comprised of many individual modules, there are several useful debugging techniques you can employ. One is to create short test-bed programs that exercise each subprogram and function. Finding an error in a complex program with many interdependencies between subroutines can be a tedious prospect at best. If you instead create a small program whose sole purpose is to test a particular subprogram, you will be better able to focus on just that routine.

Another useful technique for detecting and preventing sporadic errors is to test your code on "boundary conditions". If you have a routine that reads and processes a file in 4K (4096 byte) increments, test it with a file that is exactly 4096 bytes long, as well as with other test files that are 4095 and 4097 bytes long.
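
A quick way to create such test files is sketched below. The file names are made up, and the code assumes that no files with those names already exist, because opening an existing file for BINARY does not shorten it.

FOR Size% = 4095 TO 4097
  Buffer$ = STRING$(Size%, "X")         'filler data of the exact size
  FileName$ = "TEST" + MID$(STR$(Size%), 2) + ".DAT"
  FileNum% = FREEFILE
  OPEN FileName$ FOR BINARY AS #FileNum%
  PUT #FileNum%, , Buffer$              'writes LEN(Buffer$) bytes
  PRINT FileName$; LOF(FileNum%)        'confirm the file length
  CLOSE #FileNum%
NEXT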

Perhaps nothing is more frustrating than having a program fail with the message "xxx at line No line number". This message is a throw-back to the days when all BASIC programs had to use line numbers. Now that line numbers are not required in modern compiled BASIC, most programmers do not use them, opting instead for more descriptive line labels when labels are needed at all. When an error does occur and the program has been compiled with /d, BASIC reports the number of the nearest numbered line preceding the line in which the error occurred.

A good solution to track down the cause of such errors is to use a variant on a hardware debugging technique known as the "cut in half" method. In a complex electronic circuit that does not work, using this technique means that the circuit is first checked at its mid-point for the correct signal. If the circuit tests correctly at that point, then the error is in the second half. Therefore, the test engineer would "cut in half" again, and test at a point halfway between the middle and the end. If the test fails there, then the problem must lie between the middle of the circuit and that point.

In a purely software situation, you would add a line number to a line that falls approximately half-way through the program. If that number is reported, then the problem is occurring in the second half of the program. An enhancement to this technique that I recommend is to add, say, ten line numbers in evenly spaced increments throughout the program. This will let you quickly isolate the problem to a much smaller portion of the program.
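
As a small illustration, the contrived program below has a few numbered lines sprinkled through it. The statements themselves are placeholders; the point is that the division by zero will be reported at line 40 rather than at "No line number".

10  DIM Table%(1 TO 10)
20  FOR X% = 1 TO 10
      Table%(X%) = X% * 100
    NEXT
30  Divisor% = Table%(5) - 500
40  PRINT Table%(10) \ Divisor%     'fails because Divisor% is zero
50  PRINT "Done"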

Besides the line number (or lack of line number) that BASIC reports, the segment and address at which the error occurred are also reported. This information is frankly useless in a purely BASIC environment. You must either use CodeView to identify the line that is associated with the error, or view the assembly language output that BC can optionally generate. These will be described in the section on advanced debugging later in this chapter.

Finally, it is important to point out that you should never use ON ERROR while a program is being developed. ON ERROR can hide programming errors that you need to know about. As an example, a LOCATE statement with incorrect values will generate an "Illegal function call" error. But if ON ERROR is in effect and your program uses RESUME NEXT for errors it is not expecting, you may never even know that an error occurred. If you run the complete program below you can see that there is no indication that an error occurred at the obviously illegal LOCATE statement.

CLS
ON ERROR GOTO HandleErr
LOCATE 100, -90
PRINT "My program seems to work fine."
END

HandleErr:
RESUME NEXT

Using The QB and QBX Editing Environments

The single most powerful debugging feature that is available to you is the BASIC editing environment. More than just an editor that you can use to enter program statements, the QB environment is exactly that: a complete editing environment for developing and testing BASIC programs. The BASIC editor lets you enter program statements, single-step through a program, examine variable values, and much more. Besides being able to execute commands singly and in sequence, you can also trace into subroutines and functions, and even run your program in reverse.

The primary advantage of using the QB environment instead of a separate editor is the enhanced debugging capabilities. In most high-level languages, you first write a program using an editor, and then compile and run it to see if it works correctly. If an error occurs, you must start the editor again, load your program, and study the code to see what went wrong. In contrast, QB lets you run your program at the same time it is being edited. You can even modify the program while it is running and then resume execution, view and change variable values, and change the order in which statements are executed.

Further, BASIC can be instructed to stop and return to the edit mode when the program reaches a certain statement, or when a particular logical condition becomes true. For example, you can tell BASIC to halt the program when a variable takes on a specified value. These are extremely powerful debugging tools which have no equal in any other language. In the sections that follow, I will describe each of these capabilities in detail.

Step and Trace Debugging

Early versions of Microsoft BASIC offered a very primitive trace capability that displayed the line numbers of the currently executing statements. Although this was better than nothing, interpreting a blur of line numbers flashing by on the screen required a lot of mental effort. When Microsoft introduced QuickBASIC version 3.0 they added greatly improved debugging in the form of a step and trace feature. To activate step and trace you would enter a STOP statement at a selected point in the source code. When the program reached that point you could then execute each statement in sequence by pressing a function key. QuickBASIC 3 also provided the ability to display continuously the value of a single variable in a window at the top of the screen.

QuickBASIC 4.0 offered an improved version of this feature, using additional function keys to control how a program proceeds. This method has been continued with little change through current versions of QuickBASIC and BASIC PDS. Of course, the primary reason you would want to step through a program one statement at a time is to determine why it is not working. For example, if you have code that opens a file for output but the file is never created, you would step through that portion of the code to see which statements are being executed and which are not. In particular, stepping through a program lets you see which path an IF or CASE test is taking.

Two function keys are used to single-step through a program, and four additional options are available to assist program debugging. Each time the F10 key is pressed, the current statement is executed and the program advances to the next statement. If you have just loaded the program being tested, you will press F10 once to get to the first instruction. Pressing F10 again executes that statement, and continues to the next one. If the current statement is related to screen activity, the screen is switched momentarily to display the program's output rather than the source code. The screen is also switched during a CALL statement or function invocation, in case that routine performs screen output. You can optionally toggle between viewing the output and edit screens manually by pressing F4.

In some cases you may want to treat a subroutine as a single statement, which is what F10 does. That is, CALL MySub is handled as a single statement, and all of the statements within the routine are executed as one operation. In other cases, however, you may need to trace into a subprogram, GOSUB routine, DEF FN, or function, to step through its statements as well. This is what F8 is for. When F8 is pressed at a CALL or GOSUB statement or function invocation, BASIC traces into the procedure and lets you watch as it executes each statement individually.

Two additional capabilities let you navigate a program more quickly. Pressing F7 tells BASIC to execute all of the statements up to the current cursor location. This way, you are spared from having to watch a long sequence of commands that you know are working correctly. For example, stepping through a FOR/NEXT loop that initializes 1000 elements in an array is usually pointless. Therefore, when you reach that spot in the program you would manually move the cursor to the statement following the NEXT, and press F7.

It is also possible to force execution to a particular point in the program using the "Set next statement" option of the Debug menu. Unlike F7, though, the statements that precede the selected line will not be executed. Therefore, this option is equivalent to adding a temporary GOTO to the program, causing it to jump to the specified line.

One of the most powerful features of the BASIC editor is that you can actually modify your program, then resume execution. In earlier versions of QuickBASIC, making even the slightest change to a program, even if only to a single comment, meant that the entire program had to be recompiled. BASIC can now preserve variable values and indeed the entire program state during most types of editing operations.

The last important step operation I want to mention now is the History feature. This too must be selected from a menu, and using it will slow your program's operation considerably. When the History option is selected from the Debug menu, BASIC remembers the last 20 program statements, and lets you step through your program in reverse. For example, if a variable has taken on an incorrect value, you can walk backwards through the program to see what statements caused that to happen. Where F8 steps forward through your program, Shift-F8 instead steps backward.

Watch Variables and Break Points

As powerful as BASIC's single-step feature is, it is only half of the story. Equally important is the Watch capability that lets you view a program's variables in real time. One or more variables may be placed into a special Watch window at the top of the editing screen, and their values will be displayed and updated after each statement is executed. Between the Step and Watch features, you can observe all aspects of your program's operation as it is executing.

Besides watching variable values, you can also monitor complex expressions and function results. For example, you could watch the value of X% * Y% + Z%, ASC(Work$), or the result of a function such as StrFunction$(Array$(), Count%). Because each variable or expression is updated after every program statement, your program will run more slowly when many items are displayed in the watch window. However, this is seldom a problem in a debugging situation, and the ability to see precisely what is happening far outweighs the minor speed penalty.

Being able to watch the results of expressions as well as simple variables offers some useful and interesting techniques. As an example, suppose you are watching a string variable named Buffer$. If Buffer$ is very long, you can use LEFT$ or MID$ to watch just a portion of the string: MID$(Buffer$, CurPointer%, 70). This expression displays the 70-character portion of Buffer$ that is currently pointed to by CurPointer% (assuming, of course, you are using variables with those names).

Likewise, if you are observing a string but nothing is showing in the watch window, you could watch "{" + Work$ + "}". This displays "{}" if the string is null, and shows if there are leading or trailing blanks or CHR$(0) bytes. Adding braces also lets you see if the string contains characters that begin past the edge of the visible window.

One particularly powerful use of BASIC's Watch capability is related to the fact that all of the expressions are evaluated anew at each statement. Earlier I mentioned how insidious "String space corrupt" errors can be, because BASIC checks the integrity of its string memory only when a string is being assigned. Therefore, watching the expression FRE(Any$) tells BASIC to evaluate string memory after every source line. Thus, as soon as string memory is corrupted it will be immediately reported. This technique can be extended to identify a "Far heap corrupt" error as well, by watching the expression FRE(-1).

Besides the Step and Watch capabilities, there are two additional features you should understand: Break Points and Watch Points. When a program is very large and complex, it becomes impractical to step and trace through every statement. Also, in some cases you may not know at which statement an error is occurring.

Pressing F9 sets up a Break Point which tells BASIC to halt when it reaches that point in the program, regardless of how it arrived there. You can have multiple break points, and the program will run normally until the specified statement is about to be executed. Simply place the cursor on the line at which the program is to stop, and press F9. That line will be highlighted to show that it is currently a Break Point. Pressing F9 again removes the Break Point.

A Watch Point tells BASIC to execute the program until a certain condition becomes true. Some examples of Watch Points are X% = 100, ABS(Total#) > 1000, and FRE("") < 1000. In the first example you are telling BASIC to stop the program and return to the editor when X% equals 100. The second example will stop the program when the absolute value of Total# exceeds 1000, and the third halts it when there are fewer than 1000 bytes of string space remaining.
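
If you prefer to see the condition in the source itself, a temporary IF...STOP line gives a roughly similar effect for one spot in the program. The loop below is only a placeholder for real code, and unlike a true Watch Point the test is made only where you place it.

FOR X% = 1 TO 200
  Total& = Total& + X%
  IF X% = 100 THEN STOP     'halts here, much like the Watch Point X% = 100
NEXT
PRINT Total&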

Considered together, these debugging features are extremely powerful. You can tell BASIC, in effect, "Run until the value of Count% hits 14; then stop the program, and let me walk backwards through the program to see how that happened."

Using /D to Detect Errors

Another very powerful debugging solution at your disposal is to compile your program with the /d debug option. When creating an .EXE file in the BASIC environment from the Run menu, you would select the "Produce debug code" option. Compiling with /d tells BC to add three important safeguards to the code it generates. Some of these debugging issues were described in Chapter 1, but they deserve elaboration here.

The first code addition is a call to a central event handler prior to every BASIC program statement, to detect if Ctrl-Break was pressed. Normally, a compiled BASIC program is immune from pressing Ctrl-Break and Ctrl-C, unless the program is processing an INPUT statement. BASIC adds break checking to let you get out of an endless loop or other similar situation, without having to reboot your computer.

The second addition is an overflow test following each integer and long integer addition, subtraction, and multiplication, to detect results that exceed the range of legal values. If you have a statement such as X% = Y% * Z% and the result after multiplying is greater than 32767, the overflow test will detect that and produce an error message. Otherwise, X% would be assigned an erroneous value and your program would have no way to detect it. Floating point operations do not need any additional testing, because overflows are detected and reported whether or not /d is used.
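
The overflow test can be seen with a small sketch like the one below. It assumes the program is run in the editor or compiled with /d; it also assumes the BC /x switch is given, which the compiler requires when a program uses RESUME NEXT. The variable names and the label Trap are made up for the example.

```basic
' Demonstrates integer overflow detection.
' Assumes: run in the QB editor, or compiled with /d and /x.
DEFINT A-Z
ON ERROR GOTO Trap

Y = 200
Z = 200
X = Y * Z            ' 40000 does not fit in a two-byte integer
PRINT "X ="; X
END

Trap:
  ' ERR should be 6 (Overflow) when the test catches the bad result.
  PRINT "Error"; ERR; "was trapped"
  RESUME NEXT
```

Without /d the multiplication is not checked, so X would simply hold whatever sixteen bits remain and the error handler would never be entered.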

The last additional code that BASIC adds when /d is used is array element bounds checking. If you have dimensioned an array and attempt to assign an element that doesn't exist, a compiled BASIC program will normally ignore the error. For example, if an array has been dimensioned using DIM Array%(1 TO 100) and you then have the statement Array%(200) = 12, BASIC will store the value 12 at what would have been the 200th element. This can lead to disastrous consequences such as overwriting an element in another array, or corrupting string memory. When /d is used BASIC adds additional code to check every array element referenced, and reports an error if that element does not exist.
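
A similar sketch shows the bounds check at work. It assumes the program is run in the editor or compiled with /d, plus the BC /e switch that ON ERROR requires. A variable is used for the subscript so the reference is resolved at run time; the names Element and Trap are invented for the example.

```basic
' Demonstrates array bounds checking.
' Assumes: run in the QB editor, or compiled with /d and /e.
' Do not run this without bounds checking; the stray write
' could overwrite other data.
DEFINT A-Z
DIM Array(1 TO 100)
ON ERROR GOTO Trap

Element = 200
Array(Element) = 12  ' there is no element 200
PRINT "Assignment completed without an error"
END

Trap:
  ' ERR should be 9 (Subscript out of range).
  PRINT "Error"; ERR; "while assigning element"; Element
  END
```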

Because of the added checking for overflow errors and illegal element numbers, a program compiled with /d will be larger and run more slowly than one in which /d is not used. Therefore, you should not release a program for general use that has been compiled with the debug option. One exception worth noting is that QuickBASIC versions 4.0 and 4.5 contain a bug that generates incorrect code for certain long integer array operations. The only solution when that happens is to use /d. This way, the routine that calculates element addresses and checks for illegal element numbers is used, rather than the incorrect in-line code that BC produces directly.

You could also compile with the /ah (huge array) switch, which uses the same routine to calculate and check array element addresses. Using /ah has an advantage over /d in this case, because your program will not be halted if Ctrl-Break is pressed. Using /ah also avoids the extra code and time to check for overflow errors. However, /ah affects dynamic arrays only, and errors with static arrays will not be prevented.

When a program is run in the BASIC editor, the same protection that /d provides is employed. This added debug testing within the editor is one more contributor to its slowness when compared to a fully compiled program.

Advanced Debugging

Although being able to step through your program and watch its variables in the BASIC editing environment is very powerful, there are still some limitations inherent in that process. For example, it is possible that a program will work perfectly in the editor, but not when it has been compiled to an .EXE program. Microsoft has tried to make the BASIC editor as compatible with BC as possible, but the editor is an interpreter and not a true compiler. There are bound to be some differences in how the program runs. Another limitation is that some programs are just too large to be run within the editor. Finally, if you receive an error message from an executable program that lists only a segment and address, there is no way to determine where the error occurred using the editor.

In these cases you will need to work with the actual compiled program. To relate an error address to the original BASIC source statement you must be able to see the assembly language code that BC generates, along with the original BASIC source. One way to do this is with the Microsoft CodeView debugger. CodeView comes with BASIC PDS [and VB/DOS Professional Edition] as well as with Microsoft's Macro Assembler. CodeView provides a debugging environment that is similar to the QB editor, except it is intended for tracing through a program that has already been compiled.

Another way is to instruct BC to generate an assembly language source listing as it compiles your program. This listing shows a mix of BASIC source statements and the resultant assembly language code and addresses. However, the listing is not as clear or easy to follow as the display that CodeView presents. But if you do not have CodeView, this is your only choice. I will describe this method first.

Creating an Assembly Language Source Listing

To create an assembly language list file you use the compiler's /a switch, and then specify a list file name. The syntax is shown below, followed by a sample list file that is generated.

You enter this:

bc program /a [/other options] , , listfile;

LISTFILE.LST contains this:

                                       PAGE   1
                                       25 June 91
                                       14:28:08
  Microsoft (R) QuickBASIC Compiler Version 4.50

Offset Data  Source Line

 0030  0006  CLS
 0030  0006  INPUT Count%
 0030   **     I00002: mov   ax,0FFFFh
 0033   **             push  ax
 0034   **             call  B$SCLS
 0039   **             mov   ax,offset <const>
 003C   **             push  ax
 003D   **             call  0000h
 0040   **             pop   ax
 0041   **             add   ax,000Dh
 0044   **             push  cs
 0045   **             push  ax
 0046   **             call  B$INPP
 004B   **             jmp   $+04h
 004D   **             dw    0002h
 004F   **             db    00h
 0050   **             db    02h
 0051   **             mov   bx,offset COUNT%
 0054   **             push  ds
 0055   **             pop   es
 0056   **             push  es
 0057   **             push  bx
 0058   **             call  B$RDI2
 005D  0008  IF Count% < 100 THEN
 005D  0008     Count% = 100
 005D  0008  END IF
 005D   **             call  B$PEOS
 0062   **             cmp   word ptr COUNT%,64h
 0067   **             jl    $+03h
 0069   **             jmp   I00003
 006C   **             mov   COUNT%,0064h
 0072  0008  PRINT Count%
 0072  0008  END
 0072  0008
 0072  0008
 0072   **     I00003: push  COUNT%
 0076   **             call  B$PEI2
 007B   **             call  B$CEND
 0080   **             call  B$CENP
 0085  0008

43981 Bytes Available
43643 Bytes Free

    0 Warning Error(s)
    0 Severe  Error(s)

Here, the list file shows the original BASIC source code, as well as the generated assembly language instructions. The column at the left holds the code addresses, and these correspond to the addresses that BASIC displays when a program crashes with an error message. Unfortunately, several BASIC statements are grouped together, so it is not immediately apparent which address goes with which source statement. For example, after the BASIC statement INPUT Count%, the earlier assembly language instructions that clear the screen are shown. Similarly, the call to B$PEOS is actually part of the INPUT code, although it is listed following the IF test.

When BASIC displays an error message and ends your program by displaying a segmented address, only the address portion is meaningful. The segment in which a program is running will depend on many factors, including the DOS version (and thus its size), the FILES= and BUFFERS= values specified in CONFIG.SYS, and whether TSR programs and device drivers are loaded. Each of these factors causes the program to be loaded at a higher segment, although the addresses within that segment never change. Also, in a multi-module program, a different segment is used for each module's source file. Therefore, if the message is "Illegal function call in module XYZ at address 3456:1234", you would compile XYZ.BAS to create a list file instead of the main program. The code in the vicinity of address 1234 will be where the error occurred.

Using Microsoft CodeView

Although compiling with the /a switch lets you view the assembly language code that BASIC creates, there is little you can actually do with that information. CodeView is a much more powerful debugging tool, and it lets you step through an .EXE file as it is running. This lets you follow the compiled program's execution path, and also view its assembly language instructions. Further, CodeView can trace into BASIC's library routines, as well as calls to C or assembly language routines that you have written.

CodeView can also be used to see how many bytes of code are generated for each BASIC statement. This is a good way to compare the relative efficiency of different programming methods, to see which ones produce less code. It is important to understand that the size of the assembly language code generated for a given BASIC statement is a combination of two factors: the number of bytes the compiler generates for each occurrence of the statement, and the size of the called routine within BASIC's runtime library. Of course, the called routine is added to your program only once. However, the code that sets up and calls the routine is added each time the statement is encountered.

Compiling a program for use with CodeView is very simple, and merely requires the addition of special compiler and linker option switches. Note that you cannot compile a program for CodeView from within the QuickBASIC editor; you must compile and link manually from the DOS command line, as shown below. Also notice that the BASIC program must be saved as ASCII text, and not with the special "Fast Load" method that QB optionally uses.

bc program /zi [/other options];
link program /co [/other options];
cv program

The /zi option tells BC to write additional information into the object file, which is used by LINK and CodeView to relate each line of BASIC source code to its resultant assembly code. The more meaningfully named /co switch is required so LINK will know to do likewise. You may be interested to know that /zi is named after Microsoft legend Mark Zbikowski, whose initials (MZ) also appear as the first two bytes in every DOS .EXE file.

Once the program has been compiled and linked, start CodeView by entering CV followed by the file's first name (that is, without the .BAS or .EXE extension). You will then be presented with a screen very similar to that of the QB editor. Most versions of CodeView initially show the BASIC source code. In other versions, you must press Alt-R-R to "restart" the program and bring it to the first source line. I should point out that CodeView is a quirky program, and it is often referred to as the program that people "love to hate". It has some glaring omissions, many aspects of its interface are inconsistent and downright obnoxious, and I personally would be lost without it.

When the BASIC source is displayed, you may press F4, F7, F8, and F10, which perform the same functions as their BASIC editor counterparts. One important difference, however, is that you may also press F3 to show a mix of BASIC and assembly language code. Stepping through the program with F8 and F10 will execute either a single BASIC statement or a single assembler command, depending on the context. That is, if you are in the BASIC view mode, then you will step through the BASIC code. If the assembly language code is being displayed, then you will step through that instead.

Figure 4-1 [not available here, sorry] shows a screen snapshot of a short sample program as displayed by CodeView when it is first started in the BASIC view mode. Figure 4-2 [also unavailable] shows the same program after pressing F10 to execute up to the first statement, followed by F3 to view a mix of BASIC and assembly language. This screen is in a 50-line mode to allow the entire program to be displayed. Although it is not shown here, CodeView can continuously display the processor's registers in a small window at the right side of the screen. The register display is alternately activated and deactivated by pressing F2.

FIG4-1: The CodeView display when using the BASIC view mode.

FIG4-2: The CodeView display for the same program, but using the assembly language view mode.

Notice in Figure 4-2 that CodeView displays each BASIC statement indented and with a line number. This lets you identify where each BASIC command starts, and also which block of assembly language code it is associated with. The numbers at the left edge of the display show the segment and address of each instruction in hexadecimal notation. The segment value never changes within a single program module, although the addresses increase based on the number of bytes in each assembly language instruction. As you can see, some assembly language commands are as short as one byte, and others are as long as six.

In the first instruction, CLS, a value of -1 (FFFF hex) is passed to the CLS routine as a flag to show that no argument was given. Had the BASIC statement been CLS 2, then a value of 2 would have been moved into AX instead. Nine bytes of code are generated each time CLS is used, not counting the code within B$SCLS. Besides showing the B$SCLS routine name, CodeView also shows the segment and address at which B$SCLS resides. Knowing the routine's address is of little practical use in this situation, and it is displayed solely for informational purposes.

The INPUT statement is fairly complicated to set up, and I won't belabor what every assembly language instruction does. But several items are worth discussing. The first is that CodeView attempts to relate every number it encounters to a variable or procedure address. In many cases this is confusing, because some numbers are simply that, and have no relationship to a variable or procedure address.

For example, at address 39 the assembly language command MOV AX,40 is shown as MOV AX,b$STRTAB_END+10 (0040), as if there was some significance to the fact that the value 40 is an address ten bytes past the end of an internal string table. Likewise, two instructions later the value 40 is represented as being 31 bytes past the beginning of the B$LENDRW procedure. Two instructions past that the value 13 (0D hex) is added to AX, and again CodeView tries to establish a significance where none exists.

In not one of these cases are the values shown related to the named address, and you should therefore treat those named labels with skepticism. The only symbolic names that are meaningful in most cases are variable and procedure names that do not have an extra value added to them. In the instruction MOV Word Ptr [COUNT% (0036)],b$HEAP_FIRST (0064) at address 6C, the address for Count% (36) is valid, while the value 64 named b$HEAP_FIRST is meaningless. In this case, 64 hex represents the value 100 in the BASIC statement Count% = 100. Whatever b$HEAP_FIRST may represent, it has no meaning here.

I suggest that you enter this short program and then step through it one statement at a time, just to get a feel for how CodeView operates. You should also try tracing into some of the BASIC library calls, as well as into a simple subprogram or two of your own. Again, you may use either F10 or F8 to step through the code, but only F8 will trace into code that is being called. You can also use F8 to trace into some BIOS interrupts, but you should never try to trace through a DOS interrupt (21 hex). Many DOS services never return, or return in a non-standard manner, and a locked-up PC is the likely result. You will not hurt anything if you do trace into a DOS interrupt, but be prepared to press Ctrl-Alt-Del.

Besides being able to view and step through the assembly language code that BASIC creates, you can also view and modify your program's data directly. If you have pressed F2 to display the CPU's registers, CodeView will show the value currently in every memory address that is about to be accessed. For example, if the next statement to be executed is MOV Word Ptr [COUNT%],10, CodeView will show the current contents of the variable COUNT%.

A range of memory addresses may be displayed by entering commands into the immediate window at the bottom of the screen. When CodeView is first started, the cursor is placed at the bottom line in that window. As with the BASIC editor, the F6 key is used to toggle between the code output and immediate windows. Unlike the BASIC editor, however, you may type commands regardless of which window is active.

The three primary commands you will find useful are D, U, and R. The D (Dump) command tells CodeView to display a range of memory, starting at a given address. For example, D 0 means to show the 32 bytes that start at address 0 in the default data segment. Likewise, D ES:100 means to start at address 100 in the segment held in the ES register. Unfortunately, CodeView is particularly obtuse in this regard, because in some cases the numbers you enter are assumed to be decimal while in others it assumes hexadecimal. Which is which depends on your view perspective (selected with F3), and I won't even begin to offer a reason or explain the confusing rules. If you don't get what you expect, try adding an "&H" prefix to the number. And if you start by using &H and CodeView reports a syntax error, then try it without the &H.

When the contents of memory are displayed, they are shown as individual bytes, rather than as integer words, which would generally be more useful. In the listing below, two string constants have been displayed in response to the command D &H40. For space reasons, the segment and address which CodeView adds to the left of each row of values are instead shown above the rows.

>D &H40

5676:0040
02 00 44 00 48 69 23 00 4A 00 41 42 43 44 45 46
5676:0050
47 48 49 4A 4B 4C 4D 4E 4F 50 51 52 53 54 55 56

As you learned in Chapter 2, BASIC near strings have a 4-byte descriptor, with the first two bytes holding the string's current length, and the second two bytes its current address. Beginning with the first two numbers displayed, the 02 00 represents the length of a 2-character string, and the 44 00 indicates the address which is 44. The data itself is a CHR$(&H48) followed by a CHR$(&H69) ("Hi"), and it immediately follows the string descriptor. When two bytes are used to store an integer word, the least significant byte is kept in the lower memory address. Therefore, the value 0002 is actually listed as 02 00 (CodeView adds an extra blank between bytes for clarity).

Immediately following the six bytes for the string "Hi" and its descriptor is another descriptor. This one shows that the string has a length of 23 Hex (35) bytes, and its data starts at address 4A Hex. Again, the value 0023 is shown as 23 00, and the address 004A is displayed as 4A 00. The 32 bytes displayed are enough to show only the first 22 characters of this string, "ABCDEFGHIJKLMNOPQRSTUV".
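
The same descriptor layout can be examined from BASIC itself. The sketch below assumes QuickBASIC 4.x near strings (it does not apply to the far strings used by QBX); the variable names are invented for the example. It reads the four descriptor bytes with PEEK and compares the results with what LEN and SADD report.

```basic
' Reads a near string's 4-byte descriptor directly.
' Assumes QuickBASIC 4.x near strings; names are hypothetical.
DEFINT A-Z
Work$ = "Hi"

DescAddr& = VARPTR(Work$)            ' offset of the descriptor
IF DescAddr& < 0 THEN DescAddr& = DescAddr& + 65536

DEF SEG                              ' PEEK from BASIC's data segment

' The low byte is stored first, so the high byte is scaled by 256.
Length = PEEK(DescAddr&) + 256 * PEEK(DescAddr& + 1)
DataAddr& = PEEK(DescAddr& + 2) + 256& * PEEK(DescAddr& + 3)

PRINT "Length from descriptor: "; Length
PRINT "Length from LEN:        "; LEN(Work$)
PRINT "Address from descriptor: "; HEX$(DataAddr&)
PRINT "Address from SADD:       "; HEX$(SADD(Work$))
```

If the layout is as described, each pair of lines should agree. No string assignments are made between the PEEK and SADD calls, because BASIC may move string data whenever a string is assigned.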

The U (Unassemble) command can be used to show the assembly language source code at any arbitrary segment and address. The command U 2000:1000 will unassemble the code at address 2000:1000, though again you may need to use U &H2000:&H1000 in some view modes. The U command is not used that frequently, since CodeView is used most often to step through code in sequence, rather than to examine an arbitrary block of instructions.

The R command lets you change the contents of a register, and this might be useful when debugging your own assembly language subroutines. When you type, for example, RCX and press Enter, the current value of the CX register is displayed and you are prompted for a new value. Pressing Enter alone cancels the command and leaves the current register contents intact. Otherwise, the value you enter will be assigned to CX. This is similar to BASIC's immediate window, in which you can assign new values to a variable.

The last CodeView features worth describing here are Watch Variables and Watch Points, which are similar to the same features in QB. Unlike QB, though, you cannot use an expression as the target of a Watch; it must be a simple variable name, array element, or address. Watch Variables may be added using the pull-down menu, or by pressing Alt-W and then typing the variable name. If you are in the BASIC view mode you may add only BASIC variables; in the assembly language view mode you can add only assembly language variables. To monitor the contents of a memory address requires the W command. For example, W 40 will set up address 40 as the target of a Watch.

Although CodeView does support Watch Points, whereby the program will run continuously until a given expression is true, you won't want to use that feature. Asking CodeView to stop when, say, CX becomes greater than 100 will cause your program to run at less than one thousandth its normal speed. Therefore, I have never found using Watch Points effective in any situation--it is always too slow.

I have avoided discussing the latest versions of CodeView, in favor of focusing on those features which are common to all versions. CodeView 3.10, which is included with BASIC 7.1, has several new convenience features, and a few new bugs as well. Many of the commands that in earlier versions have to be entered manually are now available by simply typing new values onto the display. For instance, where older versions of CodeView required you to enter Dump commands repeatedly, the new version updates the displayed values in a range of addresses constantly. And to change the address range, you may now simply move the cursor to the segment and address numbers and type new ones. An option to display memory values as words or even single and double precision values is also present in version 3.10.

Now that you have seen what CodeView is all about and how to use it, I want to conclude this chapter with a practical example. As I mentioned in Chapter 3, the amount of stack memory that is needed in a non-static subprogram or function can be difficult to determine. The calculation itself is trivial: simply add up the number of bytes needed by every variable in the routine. Each integer requires two bytes; single precision, long integer, and string variables need four bytes each; and so forth. The problem, of course, is who wants to do all that counting, especially when there may be hundreds of variables. Counting is what computers are for, no?

The solution is that BASIC knows how many bytes are needed for the subprogram, and the very first thing a subprogram does when it is invoked is to call another routine that allocates the necessary stack space. So rather than use trial and error methods to increase the stack in small increments, you can use CodeView to directly see how many bytes of stack space are being requested. Here's how that's done, using the example program shown below.

DEFINT A-Z
DECLARE SUB StackTest (Dummy)
Test = 10
CALL StackTest(Test)
END

SUB StackTest(AnyVar)
  X = 100
  Y = 10
  Z = AnyVar
END SUB

Save this program as an ASCII file using the name TEST.BAS, and then compile it with the /o and /zi options. Next, link TEST.OBJ for CodeView using the /co option. Then start CodeView by entering CV TEST. Once you are in CodeView and viewing the BASIC source, press F10 to skip past BASIC's start-up code. At this point the cursor should be on the first statement, Test = 10. Finally, press F3 to show a mix of BASIC and assembly language source code. The display should look similar to that shown in Figure 4-3 [unavailable].

FIG4-3: How to determine the amount of stack memory needed for a non-static procedure.

Notice the first statement within the StackTest subprogram at line 7, where the value 6 (erroneously labeled b$STRTAB+6) is assigned to the CX register. This is the number of bytes of stack space being requested from the B$ENRA routine which is called in the next instruction. B$ENRA is the routine that actually allocates the stack memory, and it uses the value BASIC sends in CX to know how many bytes are needed. StackTest has three local variables and each is a two-byte integer, hence six bytes are required to store them on the stack.

For a very large program, the value assigned to CX will of course be much larger. Further, if one subprogram calls another, it will be up to you to add up all of the CX values to determine the total stack memory requirements. But this is very much easier than counting variables.
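
As a rough cross-check on the CX value, the FRE(-2) function, which reports unused stack space, can be printed from inside and outside the procedure. The sketch below is a variation on the same test program and is only an approximation: the figure seen inside the subprogram also reflects the return address, the passed parameter, and the call to PRINT itself, and the exact behavior of FRE(-2) may differ between BASIC versions.

```basic
' Rough cross-check of stack use with FRE(-2).
' The reported difference includes call overhead, so it will be
' somewhat larger than the six bytes requested in CX.
DEFINT A-Z
DECLARE SUB StackTest (AnyVar)

Test = 10
PRINT "Stack free before the call:"; FRE(-2)
CALL StackTest(Test)
END

SUB StackTest (AnyVar)
  X = 100
  Y = 10
  Z = AnyVar
  PRINT "Stack free inside the SUB: "; FRE(-2)
END SUB
```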

Summary

In this chapter you have learned how to identify and correct common programming errors. You have also learned the importance of understanding BASIC's various quirks, and how some statements do not always do exactly what you thought they would. I have shown several debugging strategies, including a software adaptation of the "cut in half" hardware technique.

Perhaps your most powerful debugging allies are the QuickBASIC and QBX editing environments. These powerful editors let you single step through a program, monitor variable values and function results, and halt your program when a specified condition occurs.

When BASIC terminates a program prematurely with an error message and a segmented address, you can either use the BC compiler's /a option to generate a source listing, or use CodeView to see where the error occurred. CodeView can also be used to step and trace through a program at the assembly language source level, and to determine the number of bytes of stack memory a non-static procedure requires.

In Chapter 5 you will learn about compiling and linking BASIC programs. I will present a complete overview of the many BC and LINK options that are available, and discuss the relative merits of each.

