Help - How 2 open and read large files over 32,768 bytes?

Posted: Sat Apr 01, 2006 7:29 pm
by Android
Can anyone advise how to open and read a file larger than 32.7K that contains multiple records with one carriage return at the end of the file? The file is field-delimited with tildes (~) and asterisks (*), which are showing up as "square boxes" in the sample. Here's a sample of the data...

ISA00 00 1234567 AA8901234 5678901234A004012345678900P:GSMD123456789012200603241234567X004010X091A1... etc.

I realize the 32768 size limit is an inherent QB limitation (it bombs out after reaching 32768 characters), but I'm hoping to figure out a way to have QuickBasic do it so I don't have to shell out to some other program (that I haven't yet found) to do it.

Any help will be greatly appreciated!

Android

Posted: Sat Apr 01, 2006 9:42 pm
by moneo
You mention a file greater than 32K, but I assume that the problem is that you have records that are over 32K each. If this is true, you have a real problem.

Even if you opened the file for binary and read the records in smaller chunks, you still may have problems if you need to have an entire record in memory in order to scan it and process it.

You mention delimiters. Is it possible that each physical record has delimiters within it to separate the logical records that need to be processed individually? In other words, maybe you don't need to process an entire record all at once, only the delimited pieces one at a time.

By the way, your sample doesn't shed any light on the problem.

Where did you get this file from? Would it be possible to get the file with smaller records?
*****
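
To illustrate the "delimited pieces one at a time" idea, here is a minimal sketch rather than a tested solution. It reads the file in small chunks in BINARY mode and hands over each segment as soon as its terminator turns up, so the whole record never has to sit in one string. The file name, the chunk size of 4096, and the "~" terminator are assumptions for illustration, and it assumes no single segment is longer than about 28,000 bytes. As far as I know, QuickBASIC 4.5 also keeps all variable-length strings in one pool of roughly 64K, which is another reason not to hold the whole file in memory.

```basic
' Sketch: stream a large single-record file one segment at a time.
' FileName$, ChunkSize and SegTerm$ are illustrative assumptions.
CONST ChunkSize = 4096

DIM Remaining AS LONG           ' bytes not yet read (may exceed 32767)
DIM P AS INTEGER

FileName$ = "SAMPLE.DAT"        ' hypothetical file name
SegTerm$ = "~"                  ' assumed segment terminator
Pending$ = ""                   ' data read but not yet terminated

OPEN FileName$ FOR BINARY AS #1
Remaining = LOF(1)

DO WHILE Remaining > 0
    IF Remaining < ChunkSize THEN N% = Remaining ELSE N% = ChunkSize
    Chunk$ = SPACE$(N%)         ' GET fills a string to its current length
    GET #1, , Chunk$
    Remaining = Remaining - N%
    Pending$ = Pending$ + Chunk$

    ' Hand over every complete segment now sitting in Pending$
    P = INSTR(Pending$, SegTerm$)
    DO WHILE P > 0
        Segment$ = LEFT$(Pending$, P - 1)
        PRINT Segment$          ' placeholder for the real processing
        Pending$ = MID$(Pending$, P + 1)
        P = INSTR(Pending$, SegTerm$)
    LOOP
LOOP
CLOSE #1

' Anything left over has no terminator (for example the final CR)
IF LEN(Pending$) > 0 THEN PRINT Pending$
```

Only Remaining needs to be a LONG here; every string stays far below the 32767-character limit as long as the segments themselves are short.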

Posted: Sun Apr 02, 2006 11:47 am
by {Nathan}
You could try putting it into a QB file as DATA statements and '$INCLUDE-ing it... but then you wouldn't be able to have it all in memory at one time...

*cough* *freebasic for dos* *cough*

Problem: Crash Opening a file w/1-record larger than 32767.

Posted: Sun Apr 02, 2006 12:29 pm
by Android
Here is the actual code we're using. The problem starts when ZX is greater than 32767 and we try to execute the FIELD statement with ZX.

OPEN "R",1,file2$,1
FIELD 1,1 AS MEF$
ZX=LOF(1) ' this gets the file size which is also the record size since it contains only 1 record
CLOSE 1

OPEN "R",#1,file2$,zx ' this open with ZX which is the size

FIELD #1,ZX AS MEF$ ' this reads with ZX as the amount to read. If ZX is larger than 32767 it bombs
GET #1, 1

I can't post a sample file because the delimiters don't show up correctly. However, the delimiters are specified in the first header segment, a 105-byte fixed-length record.

The data element separator is byte number 4; the component element separator is byte number 105; and the segment terminator is the byte that immediately follows the component element separator.

The delimiters are:
* Asterisk Data Element Separator
: Colon Subelement Separator
~ Tilde Segment Terminator
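
Going by the byte positions described above, the three delimiters could be read straight from the header instead of being hard-coded. This is only a sketch: it assumes file2$ already holds the file name (as in the code above) and that the file really starts with the 105-byte header followed by the segment terminator. The variable names are made up for illustration.

```basic
' Sketch: pull the three delimiters out of the header segment.
' Assumes file2$ is already set; all other names are illustrative.
DIM Header AS STRING * 106      ' 105 header bytes plus the terminator

OPEN file2$ FOR BINARY AS #1
IF LOF(1) < 106 THEN
    PRINT "File is too short to contain the header"
    CLOSE #1
    END
END IF
GET #1, 1, Header               ' read bytes 1 to 106 of the file
CLOSE #1

ElemSep$ = MID$(Header, 4, 1)   ' data element separator (byte 4)
SubSep$ = MID$(Header, 105, 1)  ' component element separator (byte 105)
SegTerm$ = MID$(Header, 106, 1) ' segment terminator (byte 106)

' Print the character codes, since the characters themselves
' may not display properly
PRINT "Element separator code:"; ASC(ElemSep$)
PRINT "Component separator code:"; ASC(SubSep$)
PRINT "Segment terminator code:"; ASC(SegTerm$)
```

Printing the ASCII codes rather than the characters sidesteps the "square boxes" display problem mentioned earlier.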

I hope this helps to understand the problem and (maybe) someone can come up with a solution!

Android

Posted: Sun Apr 02, 2006 4:18 pm
by MystikShadows
Maybe this could solve your problem:

Code:

'$DYNAMIC
DIM Counter AS LONG          ' LONG: an INTEGER would overflow past 32767
DIM LineCounter AS INTEGER
DIM Character AS STRING * 1

OPEN file2$ FOR BINARY AS #1
Counter = 0
REDIM WorkString(1 TO 10) AS STRING
LineCounter = 1
DO WHILE NOT EOF(1)
    GET #1, , Character
    IF EOF(1) THEN EXIT DO   ' the GET ran past the end: nothing was read
    Counter = Counter + 1
    IF Counter > 32767 THEN
        LineCounter = LineCounter + 1
        Counter = 1
    END IF
    WorkString(LineCounter) = WorkString(LineCounter) + Character
LOOP
CLOSE #1

This should effectively get the contents of a "record" bigger than 32767 bytes by splitting the characters read into lines of 32767 characters each. You're reading character by character, so performance might go down, but it will do the job.

However, from what I'm reading here, it seems you don't need to read everything at the same time. It seems you only need to read between the :, * and ~ and do whatever with them. If so, it's easy to do that and you don't need a bigger variable.

Just let me know what happens with text between the :, * and ~ and I'm sure we can come up with a solution.
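
As a rough idea of what "reading between the delimiters" could look like once a single segment is in hand, here is a small sketch that walks through one segment with INSTR and prints each element. The sample segment and the "*" separator are made up for illustration, not taken from the real file.

```basic
' Sketch: split one segment into its data elements.
' Segment$ and ElemSep$ hold made-up sample values.
DIM Start AS INTEGER, P AS INTEGER, Count AS INTEGER

ElemSep$ = "*"
Segment$ = "SEG*AAA*BBB**DDD"

Start = 1
Count = 0
DO
    P = INSTR(Start, Segment$, ElemSep$)
    IF P = 0 THEN P = LEN(Segment$) + 1    ' last element runs to the end
    Count = Count + 1
    PRINT Count; ": "; MID$(Segment$, Start, P - Start)
    Start = P + 1
LOOP WHILE Start <= LEN(Segment$) + 1
```

Two separators in a row give an empty element, which is kept so that the element positions stay in step with the segment layout.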

Hope this helps

Posted: Sun Apr 02, 2006 5:23 pm
by Android
Hey there MystikShadows! Thanks for your idea. This looks like it might work, but I'm not clear on some of it and hesitate to use code I don't understand (in case it needs altering for some reason later on). Could I trouble you to explain to me what each line of code is doing here?

Thanks in advance,
Android

Posted: Sun Apr 02, 2006 6:27 pm
by MystikShadows
Of course you can :-) here goes:

Code:

' The $DYNAMIC metacommand below makes arrays dynamic, so the
' WorkString array can be created and resized with REDIM further down.
'$DYNAMIC
DIM Counter AS LONG         ' Characters read into the current line
                            ' (LONG, so that it can go past 32767)
DIM LineCounter AS INTEGER  ' Will count lines needed for reading.
DIM Character AS STRING * 1 ' Our one-character buffer to read from file

' This opens the data file as a binary file (where the data type
' of the variable used determines the number of bytes read
' in a GET statement).
OPEN file2$ FOR BINARY AS #1

' We start the counter at 0.  It will increment by one every time
' we read a character from the file.
Counter = 0

' Change this REDIM statement if you need more lines to work with
REDIM WorkString(1 TO 10) AS STRING
' LineCounter starts at 1
LineCounter = 1

DO WHILE NOT EOF(1)
    GET #1, , Character                ' Read 1 character from the file
    IF EOF(1) THEN EXIT DO             ' Nothing was read: end of file
    Counter = Counter + 1              ' Increment Counter by one
    IF Counter > 32767 THEN            ' If Counter is greater than 32767
                                       ' it's time to change line number
                                       ' and reset Counter to 1
        LineCounter = LineCounter + 1
        Counter = 1
    END IF
    ' We add the character read to the WorkString array
    ' at index LineCounter.
    WorkString(LineCounter) = WorkString(LineCounter) + Character
LOOP
CLOSE #1

Basically, I read one character at a time from the file and add that character to the string array WorkString(). The best way to explain it is to think of the WorkString array as a matrix of 32767 columns by X number of lines, but I just used string variables, which can each be 32767 characters long, instead of actually creating a string matrix. :-)

Once I have read 32767 characters, I've reached the limit of the STRING data type, so I move to the next element of the array and set Counter back to 1, ready to start reading into a new line of string until Counter reaches 32767 again or I reach the end of the file.
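
To make the matrix picture concrete, here is a small sketch of how a byte position in the file maps onto the array, assuming WorkString() was filled 32767 characters per element as described. The variable names and the example position are made up for illustration.

```basic
' Sketch: find which array element and column hold byte N of the file,
' assuming WorkString() was filled 32767 characters per element.
DIM N AS LONG                   ' file positions need a LONG
DIM LineNo AS INTEGER, Col AS INTEGER

N = 40000                       ' example: the 40,000th byte of the file
LineNo = (N - 1) \ 32767 + 1    ' integer division picks the element
Col = (N - 1) MOD 32767 + 1     ' the remainder picks the column
PRINT "Byte"; N; "is at element"; LineNo; "column"; Col

' With the array filled, the byte itself would then be:
' Byte$ = MID$(WorkString(LineNo), Col, 1)
```

For byte 40000 this works out to element 2, column 7233, since the first element holds bytes 1 to 32767.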

Let me know if I can clarify this further for you.

Posted: Sun Apr 02, 2006 7:39 pm
by moneo
Mystic, Just out of curiosity, what does "OR" mean on the following statement that you provided?

DO WHILE NOT EOF (#1) OR

*****

Posted: Sun Apr 02, 2006 7:48 pm
by MystikShadows
I corrected it. I'm not even sure where that OR came from; I might have pasted something by typing too fast ;-).

Thanks for pointing it out...:-)