I wrote an answer, but I'm dumb and tired so I'm revising it.
Short answer to your problem: you don't have to search all folders and keep track of the whole directory tree starting from the root IF you parse your folders linearly (one scope at the time.) That means you only need to keep track of folders located inside parents. In your example you try to search all folders and subfolders of both "x:/qb/[directories]" and "x:/win/[directories]", but you don't need to because you won't be parsing both directories at the same time.
Keep track of the currently searched folder via an absolute path string (something that would look like X:/QB/PROJECTS/FINISHED/). This way, you know where you are and most importantly, you know the name of the folder you leave when going DOWN. For instance, if there's nothing of interest in FINISHED/, you just go down to X:/QB/PROJECTS/ and search for the next folder after FINISHED/ on your list and enter it. Here's what I'd do:
- When you enter a new folder/begin your search (going UP:)
1. Find First with your file.
- If a file exists, add to result array.
2. Now, Find First for all objects (discard files, retain folders.)
- For as long as you have folders, add them to the folder list for this scope.
3. When no more folders are found
- You found no folder at all. You already searched for your file so go DOWN to parent.
- You have some folders, go UP via the first on your list (update absolute path)
- When you leave a folder (going DOWN:)
1. Search for the folder you left in this scope's folder list.
- If there's another folder available, go UP via that folder.
- If there's no more folder available, go DOWN to parent.
Since you search for your file upon entering (going UP) a new folder, the only thing left to do is listing all directories contained within and visit each one of them. When you're going DOWN to the parent directory, you can trash the directory list: going DOWN means you parsed all directories and already looked for your file. This should keep the array size in check (each entry will be at most 12 bytes - I mean, it's DOS) and since there's only one list per scope, you don't have exponentially growing directory names.
Does it make sense?
EDIT: Alright, I gave it a fair shake and it seems to work under DOSBox although I haven't tested it extensively. There's a 20,000 bytes buffer for directories and an extra 384 bytes to keep track of parent directories composition. There's a debug system to show where the program is searching, and the buffer content (with colors so you can see how each scope is processed.)
Code: Select all
' The basics:
' 1. Only when going UP (or on initialization):
' > Get all folders from current directory, seek file
' 2. Seek next folder we should search
' > If there's a folder, enter it (going UP) and repeat all steps.
' > If there's no folder left, go to parent (going DOWN), repeat all steps.
' You're done when you're no longer allowed to go DOWN.
'$INCLUDE: 'QB.BI'
DEFINT A-Z
CONST debugLevel = 1 ' set to 0 for silent, 1 for path, 2 for path & buffer
DECLARE SUB searchFile (where AS STRING, query AS STRING)
DECLARE SUB getObjects (where AS STRING, query AS STRING)
DECLARE SUB debugBuffer ()
DECLARE FUNCTION scopeDown% ()
DECLARE FUNCTION scopeUp$ ()
DECLARE FUNCTION scopeAppend% (folder AS STRING)
DECLARE FUNCTION INSTRREV% (source AS STRING, find AS STRING)
CONST moveROOT = 0
CONST moveDOWN = -1
CONST moveUP = 1
TYPE scopeInfo ' OFS SZE DESCRIPTION
start AS INTEGER ' 000 ..2 Starting offset in pathLst (starts at 1)
count AS INTEGER ' 002 ..2 Number of folders
visit AS INTEGER ' 004 ..2 Index of last visited folder (0 for none)
END TYPE ' 6 BYTES - Scope descriptor
DIM SHARED pathLst AS STRING * 20000 ' Parent and current sub-folders
DIM SHARED pathOfs AS INTEGER ' Writing offset for pathLst
DIM SHARED levlNfo(0 TO 63) AS scopeInfo ' Root plus 63 levels deep
DIM SHARED levlNow AS INTEGER ' How deep we are
' Search for all files named "duke3d.grp" starting from "c:/games/"
' Note: the program assumes more than one file can have this name; finding a
' file won't stop the program. There's not user escape implemented.
searchFile "c:/games/", "duke3d.grp"
''
'' Display folder name buffer, DEBUG PURPOSE ONLY
''
SUB debugBuffer
DIM length AS INTEGER, offset AS INTEGER, scopeId AS INTEGER
offset = 1
DO
IF (offset = levlNfo(scopeId).start) THEN
COLOR 1 + (scopeId AND &HF)
scopeId = scopeId + 1
END IF
length = ASC(MID$(pathLst, offset, 1))
PRINT MID$(pathLst, offset + 1, length); " ";
offset = offset + length + 1
LOOP WHILE (offset < pathOfs)
COLOR 8: PRINT pathOfs
END SUB
''
'' Get all objects in current directory
''
SUB getObjects (where AS STRING, query AS STRING)
DIM DTA AS STRING * 44, MaskZ AS STRING
DIM regs AS RegTypeX, objName AS STRING
'' SETUP DTA SO WE DON'T DESTROY COMMAND$ ''
regs.ax = &H1A00 ' Set DTA function
regs.dx = VARPTR(DTA) ' DS:DX points to our DTA
regs.ds = -1 ' Use current value for DS
CALL INTERRUPTX(&H21, regs, regs) ' Do the interrupt
MaskZ = where + "*.*" + CHR$(0) ' Mask (search all)
regs.ax = &H4E00 ' FindFirst
regs.cx = 16 ' Get all object types
regs.dx = SADD(MaskZ) ' DS:DX points to ASCIIZ file mask
regs.ds = -1 ' Use current DS
'' PARSE ALL OBJECTS ''
DO
CALL INTERRUPTX(&H21, regs, regs) ' Do the interrupt
IF (regs.flags AND &H1) THEN EXIT DO ' No object left
objName = UCASE$(MID$(DTA, 31, INSTR(31, DTA, CHR$(0)) - 31))
' Folder, append to scope
IF (ASC(MID$(DTA, &H15 + 1, 1)) AND &H10) THEN
IF ((objName <> ".") AND (objName <> "..")) THEN
IF scopeAppend%(objName) THEN PRINT "Buffer overflow!": END
END IF
' File, compare with query
ELSE
IF (objName = query) THEN
PRINT "Found file in " + CHR$(34) + where + CHR$(34) + " there might be more! (PRESS ANY KEY)": SLEEP
END IF
END IF
' FindNext
regs.ax = &H4F00
LOOP
END SUB
''
'' Last occurence of a string
''
FUNCTION INSTRREV% (source AS STRING, find AS STRING)
DIM ofs AS INTEGER
DO
INSTRREV% = ofs
ofs = INSTR(ofs + 1, source, find)
LOOP WHILE ofs
END FUNCTION
''
'' Append subfolder to scope. One byte is reserved to provide the length in
'' bytes of the folder name. This takes less memory than assuming that all
'' folders have 12-byte long names. This function returns -1 if the memory
'' buffer is saturated. Returns 0 if the sub-folder was successfully added.
''
FUNCTION scopeAppend% (folder AS STRING)
DIM tmp AS STRING
IF (levlNfo(levlNow).start = 0) THEN levlNfo(levlNow).start = pathOfs
levlNfo(levlNow).count = levlNfo(levlNow).count + 1
tmp = LTRIM$(RTRIM$(UCASE$(folder)))
tmp = CHR$(LEN(tmp)) + tmp
IF ((pathOfs + LEN(tmp)) >= LEN(pathLst)) THEN
scopeAppend% = -1
ELSE
MID$(pathLst, pathOfs, LEN(tmp)) = tmp
pathOfs = pathOfs + LEN(tmp)
scopeAppend% = 0
END IF
END FUNCTION
''
'' Free current scope and go back to parent. Freeing means that we no longer
'' need the sub-folder list for this scope so we can rewrite it. We also clear
'' the descriptor so it can be re-used. This function returns -1 if root as
'' been reached (there's no parent directory.)
''
FUNCTION scopeDown%
IF levlNfo(levlNow).start THEN pathOfs = levlNfo(levlNow).start
levlNfo(levlNow).start = 0
levlNfo(levlNow).count = 0
levlNfo(levlNow).visit = 0
scopeDown% = (levlNow = 0)
END FUNCTION
''
'' Search for the next sub-folder in this scope we should be browsing. The
'' "folder" argument returns the name of the next folder to enter (variable is
'' never read, only written.) If there is no more folder to visit, "folder" is
'' empty (you have to scopeDown.)
''
FUNCTION scopeUp$
DIM offset AS INTEGER, length AS INTEGER
levlNfo(levlNow).visit = levlNfo(levlNow).visit + 1
IF (levlNfo(levlNow).visit > levlNfo(levlNow).count) THEN
scopeUp$ = "" ' nothing left to see here
ELSE
' get next folder in line to browse
offset = levlNfo(levlNow).start
FOR i% = 1 TO levlNfo(levlNow).visit
length = ASC(MID$(pathLst, offset, 1))
offset = offset + length + 1
NEXT i%
scopeUp$ = MID$(pathLst, offset - length, length)
END IF
END FUNCTION
''
'' Where is the base directory (where we start looking for the file,) it must
'' be an absolute path and be terminated with a forward slash ("/".) Query is
'' the file name we're looking for.
''
SUB searchFile (where AS STRING, query AS STRING)
DIM pathNxt AS STRING, pathAll AS STRING
DIM move AS INTEGER
pathOfs = 1 ' Always set to 1 before starting
move = moveROOT ' Pretend we're moving up (for folder list)
levlNow = 0 ' We start at level 0
pathAll = where
query = LTRIM$(RTRIM$(UCASE$(query)))
DO
IF (move <> moveDOWN) THEN
if (debugLevel = 1) then PRINT CHR$(34) + pathAll + CHR$(34)
' Append directories, search for file
IF (move) THEN levlNow = levlNow + 1
CALL getObjects(pathAll, query)
END IF
if (debugLevel = 2) then
CLS : PRINT CHR$(34) + pathAll + CHR$(34); TAB(75); levlNow: debugBuffer
end if
' Enter next sub-directory
pathNxt = scopeUp$
IF LEN(pathNxt) THEN
pathAll = pathAll + pathNxt + "/"
move = moveUP
ELSE
IF (scopeDown%) THEN EXIT DO
levlNow = levlNow - 1
pathAll = LEFT$(pathAll, INSTRREV%(LEFT$(pathAll, LEN(pathAll) - 1), "/"))
move = moveDOWN
END IF
LOOP
END SUB
This can be optimized. For instance, searching for the next folder via an index is slower than using an offset since we're parsing the same strings over and over again. I also suspect searching for the file separately could be a tiny bit faster than using the same code for both folders and file match. Using fixed-length folder names will be a lot faster, but will require more memory...
EDIT AGAIN: I made the tiny optimization I mentioned earlier (using an offset rather than an index for string parsing) and it really speeds things up when directories have many sub folders (it's easy to add and really worth it!) I tried a compiled version of the program on a disk with 43,902 files and 2,699 folders, it didn't crash. At most, only 10% of the string buffer was used (2,139 bytes out of 20,000 allocated - so really, no need for XMS or virtual memory,) and the deepest the program had to go was 11 levels (it can go up to 63 - there's no check for that so beware.) It found "ultramid.ini" 10 times in 44 seconds (DOSBox at 3,000 cycles,) and took about 3 seconds with 100% cycles.
Attaching a file containing the dirty (yet somewhat usable) program. Source included, use at your own risks, etc.
RE-EDIT AGAIN: adding another version, it supports file masking and displays the result in a somewhat orderly fashion (including file size and time stamp.) It's usable. Tested under DOSBox only.