The QBNews                                                     Page 15
Volume 1, Number 3                                        May 22, 1990

----------------------------------------------------------------------
                     U n d e r   T h e   H o o d
----------------------------------------------------------------------

Fast File I/O in QuickBASIC by Ethan Winer

[EDITOR'S NOTE]
This article first appeared in the March 1990 issue of Programmer's Journal. Back issues can be ordered by calling 1-800-234-0386.

Without doubt, one of the slowest operations a program can perform is saving and loading disk data files. In many cases, this is dictated by the physical access delay of the disk device, and the time required to transfer data based on its rotation speed. One exception, however, is when many reads or writes must be performed on small pieces of data. For example, it is quite common to save or load an entire numeric array. In the case of an integer array that contains, say, ten thousand elements, that many individual calls to the DOS file I/O services will be needed. Even though DOS is written in assembly language, it still takes a fair amount of time to process each read or write request.

One obvious solution is to process the file operation as a single large read or write. Indeed, I have written assembly language routines to do just that for use in my company's QuickPak Professional add-on library product. But it is also possible to call QuickBASIC's internal PUT and GET routines directly. By bypassing the QuickBASIC compiler and its syntax checking, you can coerce these routines into reading and writing up to 64K of data in a single operation. Larger files can be accommodated by processing the file in pieces.

The trick is to determine the names of these routines, and the number and type of parameters they expect to receive. QuickBASIC versions 4.0 and later contain four different internal routines for reading and writing binary data. Two of these are meant for reading data from a file, with one using the current DOS SEEK location and the other accepting a long integer SEEK argument.
Similarly, there are two separate routines for writing data to disk.

Most of QuickBASIC's internal routines begin with the characters "B$", which are illegal in a subroutine name. Fortunately, the ALIAS keyword allows you to declare a procedure with two different names -- the name you will use when calling it from the program, and the actual name that is made public for the linker. When Microsoft introduced inter-language calling capabilities in QuickBASIC version 4.0, it needed a way to allow access to routines written in C. The names of those routines always start with an underscore character, which is also illegal in a QuickBASIC procedure name.

The example program shown in Figure 1 declares the four internal routines as follows: BigSave writes data using the current DOS file pointer position, and BigSaveS expects a SEEK argument. Likewise, BigLoad reads from the current file position, and BigLoadS requires an offset to SEEK to before reading.

All four of these routines require the parameters to be passed "by value", as opposed to "by address", which is BASIC's usual method of passing parameters. Passing by value results in code that is both faster and smaller, because an extra level of indirection is avoided. That is, the routines can obtain the values directly from the stack, rather than having to first determine an address, and then go to that address for the actual value. Even though BYVAL and SEG *look* like they would result in additional code being added to a program, they are really just directives to the compiler.

Before any of these routines may be called, you must open the file to be read or written for BINARY operation. Then, the first parameter that each routine expects is the BASIC file number that was used to open the file. The address parameter is passed as a SEG value, which means that both a segment and offset are required.
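As a rough sketch of how the pieces described so far fit together, the fragment below declares the two routines that use the current file position, then saves and reloads an integer array with them. The alias strings "B$PUT3" and "B$GET3" are assumptions that do not appear in the text above, and the file name and array size are purely illustrative; check the names against the listing in FASTFILE.ZIP before relying on them.

```basic
DEFINT A-Z

' ASSUMPTION: "B$PUT3" and "B$GET3" are believed to be the internal
' routine names; verify them against the listing in FASTFILE.ZIP.
DECLARE SUB BigSave ALIAS "B$PUT3" (BYVAL FileNum, SEG Address, BYVAL NumBytes)
DECLARE SUB BigLoad ALIAS "B$GET3" (BYVAL FileNum, SEG Address, BYVAL NumBytes)

CONST NumEls = 10000                  ' illustrative array size
DIM Array(1 TO NumEls)                ' integer array, 2 bytes per element
NumBytes = NumEls * 2                 ' 20000 bytes, within integer range

FOR X = 1 TO NumEls                   ' fill the array with test data
    Array(X) = X
NEXT

OPEN "TEST.DAT" FOR BINARY AS #1      ' hypothetical file name
CALL BigSave(1, Array(1), NumBytes)   ' one write for the whole array
CLOSE #1

ERASE Array                           ' clear the array before reloading

OPEN "TEST.DAT" FOR BINARY AS #1
CALL BigLoad(1, Array(1), NumBytes)   ' one read for the whole array
CLOSE #1

PRINT Array(1), Array(NumEls)         ' spot-check the first and last elements
```

Passing Array(1) for the SEG parameter hands the routine the segment and offset of the first element, so the whole array is transferred as one contiguous block.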
Notice that a file may be loaded to or saved from any area of memory, by replacing [SEG Address] with [BYVAL Segment, BYVAL Address]. When SEG is used as part of a CALL statement, the "value" of the variable's segment is pushed on the stack, followed by the value of its address. Substituting two separate arguments "by value" is functionally the same thing as far as the routines are concerned.

Also notice that the internal routine names are not available within the QuickBASIC editing environment. Therefore, this example program must be compiled to disk before it may be tested.

In my own informal tests, I have found this technique to be as much as ten times faster than reading or writing individual array elements using a BASIC FOR/NEXT loop. The actual savings will of course depend on the number of elements being processed and their length in bytes.

Unfortunately, this method cannot be used with QuickBASIC string arrays, because they are not kept in consecutive memory locations. However, numeric arrays may be accommodated, as well as any fixed-length or user-defined TYPE array.

It is important to understand that when manipulating a fixed-length string array, the SEG operator must not be used. Whenever a fixed-length string or array element is used as an argument to an external routine, QuickBASIC first makes a copy of it into a regular string variable. Then, the address of the copy is passed instead. Since the address of a copy of an array element has no relevance to the address of the actual array, we must use a different approach.

In fact, there are two possible solutions. One is to create a TYPE definition that consists solely of a fixed-length string portion. Although the example below assumes a string length of twenty characters, you would of course use whatever is appropriate for your program.

     TYPE FLen
        S AS STRING * 20
     END TYPE
     DIM Array(1 TO 10000) AS FLen

The second solution is to use a combination of BYVAL VARSEG and BYVAL VARPTR, to pass the segment and address of the starting array element directly. When QuickBASIC sees VARSEG or VARPTR, it realizes that you do in fact want the actual segment and address of the specified array element. Thus, you would use the following syntax when calling BigSave to save a fixed-length string array:

     CALL BigSave(FileNumber, BYVAL VARSEG(Array(First)), BYVAL _
                  VARPTR(Array(First)), NumBytes)

One final note concerns saving or loading more than 32767 bytes. QuickBASIC does not support unsigned integers, so you must instead use an equivalent negative value. This is quite easy to determine, by simply subtracting 65536 from the required number of bytes.

It is a common trick to avoid negative numbers when calling assembly language routines by instead substituting a long integer number or variable. However, that will not work in this case, because a long integer passed with BYVAL pushes two extra bytes onto the stack. Therefore, it is essential that you specify the correct type of parameters when calling these routines.

**********************************************************************
Ethan Winer is the president of Crescent Software, and the author of QuickPak Professional and P.D.Q. He can be reached by calling Crescent Software at (203) 846-2500.
**********************************************************************

[EDITOR'S NOTE]
Source code for this article is contained in FASTFILE.ZIP.
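
As a brief addendum to the final note on byte counts above 32767, the adjustment might be sketched as follows. The variable names and the 40000-byte figure are hypothetical, chosen only to illustrate the arithmetic.

```basic
DEFINT A-Z

Wanted& = 40000                   ' hypothetical byte count above 32767
IF Wanted& > 32767 THEN
    NumBytes = Wanted& - 65536    ' 40000 - 65536 = -25536
ELSE
    NumBytes = Wanted&            ' small counts need no adjustment
END IF
PRINT NumBytes                    ' an integer, so BYVAL pushes two bytes
```

NumBytes can then be passed as the final argument in place of the count; -25536 has the same 16-bit pattern as 40000 treated as an unsigned value.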