OUT OF STRING SPACE

Post by **Geminga** » Fri Sep 24, 2010 4:17 pm

Hi All!
New to QBasic. Running the QB included w/ W98|| on a W98|| system.
The quest:

Process fixed-width astronomical database files so that they load in Aladin.

I feel no attachment to QB, it just seems like a language I can handle. With a little help
from other topics in this forum, I produced a 15900-line 38-column CSV file. Great...
The full source file runs to 2,501,314 lines (aka records) at 210 bytes/line. Converting
that to CSV runs out of swap :-(
ISTR something about loading large text files as binary to skate around LINE INPUT's limits...
The database is distributed as 30 files of at least 34000 lines each. They were put
into a directory and from DOS prompt:
COPY *.TXT I-280B.TXT
30 filenames slooowly scroll up... Done :-)

Thanks in Advance!

PS- The file at hand is The All-Sky Compiled Catalog of the 2.5 million Brightest Stars V3.
This box has 768MB RAM.

Post by **burger2227** » Fri Sep 24, 2010 5:19 pm

So what's the question?

Nice story, but lacking in detail.

Post by **Geminga** » Sat Sep 25, 2010 9:28 am

Sorry for the ambiguity.

Is there a workaround for the LINE INPUT string space problem?
Should the file be processed as BINARY?
----------------
Since that post, I tried cutting the file into 32000-line chunks (with SPLIT.EXE). Got
79 files, each 6,720,000 Bytes. Still too large:(
Even the 3,617,044 Byte file #79 was too large.

Post by **burger2227** » Sat Sep 25, 2010 10:45 am

Why are you using LINE INPUT # to read a CSV file? Normally they are read using INPUT #.

Not sure how many variables you'd need. Depends on number of values separated by commas. String and number type values can be mixed in one line of data. Each line of data should have identical data types.

I.e., read the file back the same way you used WRITE #1 to create it.

WRITE #1, a, b, c, d, e, f, ....
INPUT #1, g, h, i, j, k, l, m...
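
A minimal round-trip sketch of that idea, assuming a scratch file called TEST.CSV and made-up placeholder values (neither comes from the catalog). WRITE # puts quotes around strings and commas between values, which is the layout INPUT # expects when reading them back:

```basic
' Sketch: write one record with WRITE #, then read it back with INPUT #.
' TEST.CSV and the three values are placeholders, not real catalog data.
OPEN "TEST.CSV" FOR OUTPUT AS #1
WRITE #1, "abc", 42, 1.5          ' strings are quoted, values comma-separated
CLOSE #1

OPEN "TEST.CSV" FOR INPUT AS #1
INPUT #1, label$, whole%, frac!   ' variable types mirror the written values
CLOSE #1

PRINT label$; whole%; frac!
```

The variable types on the INPUT # side should match what WRITE # produced, field by field.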

What are you doing with the data? QB's size and memory limitations will probably not allow you to store it all in arrays, so you would need to work with chunks of data, not all of it at once. INPUT$(bytes, 1) can read at most 32,767 bytes at a time.
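
A rough sketch of reading in chunks, assuming the file is opened FOR BINARY and an arbitrary chunk size of 16384 bytes (any value under the string limit would do). The path is the one from the commented-out OPEN line posted later in this thread:

```basic
' Sketch: walk through a large file in fixed-size chunks with INPUT$.
' The 16384-byte chunk size is an illustrative assumption.
DIM remaining AS LONG, total AS LONG
DIM n AS INTEGER

OPEN "c:\280b\i280b.txt" FOR BINARY AS #1
remaining = LOF(1)                ' file size in bytes
DO WHILE remaining > 0
    n = 16384                     ' keep each read well under 32767 bytes
    IF remaining < n THEN n = remaining
    chunk$ = INPUT$(n, #1)        ' replaces the previous chunk
    total = total + LEN(chunk$)
    remaining = remaining - n
LOOP
CLOSE #1
PRINT "Bytes read:"; total
```

Note that a chunk boundary will usually fall in the middle of a record, so real processing code would have to carry the leftover partial line into the next chunk. This sketch only counts bytes.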

QB64 is the same as QB, but does not have those size and memory limitations. LINK is in my signature. It's written in C code for newer machines. Not 98's.

Ted

Post by **Geminga** » Sat Sep 25, 2010 12:39 pm

Thanks for the help :)

My main reference is a book called 'QBasic by Example'. It shows how INPUT can be used to
read the fields from a CSV.
I'm trying to *make* a CSV from a very large fixed-width file. LINE INPUT can do that, but it seems to choke when there are too many lines in the source file. Here's the code:
CLS
col1 = 12
col2 = 12
col3 = 4
col4 = 4
col5 = 6
col6 = 5
col7 = 7
col8 = 7
col9 = 5
col10 = 5
col11 = 5
col12 = 5
col13 = 4
col14 = 4
col15 = 4
col16 = 1
col17 = 1
col18 = 1
col19 = 1
col20 = 2
col21 = 1
col22 = 1
col23 = 1
col24 = 1
col25 = 20
col26 = 4
col27 = 5
col28 = 1
col29 = 6
col30 = 6
col31 = 8
col32 = 7
col33 = 5
col34 = 4
col35 = 5
col36 = 4
col37 = 5
col38 = 4
'OPEN "c:\280b\i280b.txt" FOR INPUT AS #1
OPEN "c:\windows\desktop\str1.txt" FOR INPUT AS #1
DO
OPEN "c:\windows\desktop\catalog.txt" FOR APPEND AS #2
LINE INPUT #1, jn$
PRINT #2, MID$(jn$, 1, col1) + "," + MID$(jn$, 14, col2) + "," + MID$(jn$, 27, col3) + "," + MID$(jn$, 32, col4) + "," + MID$(jn$, 37, col5) + "," + MID$(jn$, 44, col6) + "," + MID$(jn$, 50, col7) + "," + MID$(jn$, 58, col8) + "," + MID$(jn$, 66, col9) + "," + MID$(jn$, 72, col10) + "," + MID$(jn$, 78, col11) + "," + MID$(jn$, 84, col12) + "," + MID$(jn$, 90, col13) + "," + MID$(jn$, 95, col14) + "," + MID$(jn$, 100, col15) + "," + MID$(jn$, 105, col16) + "," + MID$(jn$, 106, col17) + "," + MID$(jn$, 107, col18) + "," + MID$(jn$, 108, col19) + "," + MID$(jn$, 109, col20) + "," + MID$(jn$, 111, col21) + "," + MID$(jn$, 112, col22) + "," + MID$(jn$, 113, col23) + "," + MID$(jn$, 114, col24) + "," + MID$(jn$, 116, col25) + "," + MID$(jn$, 137, col26) + "," + MID$(jn$, 141, col27) + "," + MID$(jn$, 146, col28) + "," + MID$(jn$, 148, col29) + "," + MID$(jn$, 155, col30) + "," + MID$(jn$, 162, col31) + "," + MID$(jn$, 171, col32) + "," + MID$(jn$, 179, col33) + "," + MID$(jn$, 185, col34) + "," + MID$(jn$, 190, col35) + "," + MID$(jn$, 196, col36) + "," + MID$(jn$, 201, col37) + "," + MID$(jn$, 207, col38)
CLOSE #2
LOOP UNTIL EOF(1)
CLOSE #1
PRINT
PRINT "Done"
---------------------
The colN = X lines hold the desired column widths.
This code works to produce a 15900-line (aka record) CSV file... but not a
2501314-line CSV.

Post by **burger2227** » Sat Sep 25, 2010 5:30 pm

Are you sure that LINE INPUT is the culprit? Did you try running just LINE INPUT # without any PRINT #?
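
One way to run that test, as a sketch: read every line without writing anything and print the free string space as you go. FRE("") returns the available string space in bytes. The reporting interval of 10000 lines is an arbitrary choice, and the path is the one from the commented-out OPEN line in the code above:

```basic
' Sketch: LINE INPUT # only, no PRINT #, to see whether reading alone fails.
DIM count AS LONG

OPEN "c:\280b\i280b.txt" FOR INPUT AS #1
DO UNTIL EOF(1)
    LINE INPUT #1, jn$
    count = count + 1
    ' Every 10000 lines, show the line count and free string space in bytes.
    IF count MOD 10000 = 0 THEN PRINT count; FRE("")
LOOP
CLOSE #1
PRINT "Lines read:"; count
```

If the free-space figure stays steady and the loop reaches the end of the file, LINE INPUT # is probably not the culprit.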

Why are you opening and closing #2 every loop? That can't be good.

When you start numbering variable names, it's time to think about using an array. Create an Array to hold the number of characters needed in colX.

Code: Select all


DIM SHARED col(38) AS INTEGER ' SHARED makes the array visible in every SUB without passing it as a parameter

col(1) = 12
col(2) = 12
col(3) = 4
' ...and so on, through col(38) = 4
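
As an alternative to 38 separate assignments, the widths could be loaded with READ and DATA. This is only a sketch; the values are copied from the col1 to col38 assignments posted earlier in the thread:

```basic
' Sketch: fill the width array from DATA statements.
DIM SHARED col(38) AS INTEGER
DIM i AS INTEGER

FOR i = 1 TO 38
    READ col(i)                   ' takes the next value from the DATA list
NEXT i

' Column widths 1..38, in order.
DATA 12, 12, 4, 4, 6, 5, 7, 7, 5, 5
DATA 5, 5, 4, 4, 4, 1, 1, 1, 1, 2
DATA 1, 1, 1, 1, 20, 4, 5, 1, 6, 6
DATA 8, 7, 5, 4, 5, 4, 5, 4
```

Keeping the widths in one DATA list makes it easier to check them against the catalog's format description.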

Do you know how to make SUB programs? Put the PRINT #2 code into a SUB. Place parentheses around the col numbers, as shown in the statement below, because you are now reading from the array:

Code: Select all

SUB CreateCSV (Jn$) 

PRINT #2, MID$(jn$, 1, col(1)) + "," + MID$(jn$, 14, col(2)) + "," + MID$(jn$, 27, col(3)) + "," + MID$(jn$, 32, col(4)) + "," + MID$(jn$, 37, col(5)) + "," + MID$(jn$, 44, col(6)) + "," + MID$(jn$, 50, col(7)) + "," + MID$(jn$, 58, col(8)) + "," + MID$(jn$, 66, col(9)) + "," + MID$(jn$, 72, col(10)) + "," + MID$(jn$, 78, col(11)) + "," + MID$(jn$, 84, col(12)) + "," + MID$(jn$, 90, col(13)) + "," + MID$(jn$, 95, col(14)) + "," + MID$(jn$, 100, col(15)) + "," + MID$(jn$, 105, col(16)) + "," + MID$(jn$, 106, col(17)) + "," + MID$(jn$, 107, col(18)) + "," + MID$(jn$, 108, col(19)) + "," + MID$(jn$, 109, col(20)) + "," + MID$(jn$, 111, col(21)) + "," + MID$(jn$, 112, col(22)) + "," + MID$(jn$, 113, col(23)) + "," + MID$(jn$, 114, col(24)) + "," + MID$(jn$, 116, col(25)) + "," + MID$(jn$, 137, col(26)) + "," + MID$(jn$, 141, col(27)) + "," + MID$(jn$, 146, col(28)) + "," + MID$(jn$, 148, col(29)) + "," + MID$(jn$, 155, col(30)) + "," + MID$(jn$, 162, col(31)) + "," + MID$(jn$, 171, col(32)) + "," + MID$(jn$, 179, col(33)) + "," + MID$(jn$, 185, col(34)) + "," + MID$(jn$, 190, col(35)) + "," + MID$(jn$, 196, col(36)) + "," + MID$(jn$, 201, col(37)) + "," + MID$(jn$, 207, col(38))

END SUB

Place the SUB code after the main program code, or create it from the Edit menu: choose New SUB and type a name in the box.
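
For comparison, here is a sketch of the same conversion with the SUB written as a loop. It assumes a second shared array, colStart(), which is a hypothetical name introduced here to hold the MID$ start positions from the posted code. Each field is printed on its own, so no long string is ever concatenated:

```basic
' Sketch only. colStart() is a hypothetical companion array to col();
' its values are the MID$ start positions used in the posted code.
DECLARE SUB CreateCSV (jn$)
DIM SHARED col(38) AS INTEGER, colStart(38) AS INTEGER
DIM i AS INTEGER

FOR i = 1 TO 38: READ colStart(i): NEXT i
FOR i = 1 TO 38: READ col(i): NEXT i

OPEN "c:\windows\desktop\str1.txt" FOR INPUT AS #1
OPEN "c:\windows\desktop\catalog.txt" FOR APPEND AS #2
DO UNTIL EOF(1)
    LINE INPUT #1, text$
    CALL CreateCSV(text$)
LOOP
CLOSE
PRINT "Done"
END

' Start positions first, then widths, both in field order.
DATA 1, 14, 27, 32, 37, 44, 50, 58, 66, 72
DATA 78, 84, 90, 95, 100, 105, 106, 107, 108, 109
DATA 111, 112, 113, 114, 116, 137, 141, 146, 148, 155
DATA 162, 171, 179, 185, 190, 196, 201, 207
DATA 12, 12, 4, 4, 6, 5, 7, 7, 5, 5
DATA 5, 5, 4, 4, 4, 1, 1, 1, 1, 2
DATA 1, 1, 1, 1, 20, 4, 5, 1, 6, 6
DATA 8, 7, 5, 4, 5, 4, 5, 4

SUB CreateCSV (jn$)
    DIM i AS INTEGER
    ' Fields 1..37: the trailing semicolon keeps output on the same line.
    FOR i = 1 TO 37
        PRINT #2, MID$(jn$, colStart(i), col(i)); ",";
    NEXT i
    ' Field 38 has no trailing semicolon, so it ends the CSV line.
    PRINT #2, MID$(jn$, colStart(38), col(38))
END SUB
```

The file names are the ones from the posted code. Whether this avoids the string space error on the full catalog is untested here.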

Now instead of the PRINT # code in the loop, place the SUB call after the LINE INPUT.

Code: Select all

 
OPEN "c:\windows\desktop\catalog.txt" FOR APPEND AS #2  'NOT in loop!

DO UNTIL EOF(1)  ' you cannot read it if it is empty
LINE INPUT #1, text$  
CALL CreateCSV (text$)  'use a different variable name, SUB won't care

LOOP ' using EOF here might cause an error because an empty file would be read once

CLOSE   ' closes all files!

PRINT 
PRINT "Done"

END   ' or SYSTEM, which also closes the program. Place the SUB code after this line.

SUB calls can help avoid string errors like "Out of String Space" because a SUB's local variables are created fresh on every call and released when it returns (unless the SUB is declared STATIC).

If parsing the string values does not work, then just use INPUT # and try reading it as a CSV file. You'll need a long statement, but values of any type can be read directly. The code below assumes that the data is all of one numeric or string type! If the types are mixed, use variables of the appropriate types instead of an array. Put this in the loop instead:

Code: Select all

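DIM fld(38) AS STRING  ' or INTEGER, DOUBLE, SINGLE, LONG, whichever matches the data (DATA is a reserved word, so the array is named fld)

DO UNTIL EOF(1)
INPUT #1, fld(1), fld(2), fld(3) ' ...continue the list through fld(38)
WRITE #2, fld(1), fld(2), fld(3) ' ...continue the list through fld(38)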
LOOP

If this doesn't work then QB can't work with that size of files. Try QB64 on a newer machine.

Post by **Geminga** » Sat Sep 25, 2010 8:19 pm

Excellent!
Thank you so much:)

Post by **Anonymous** » Tue Oct 05, 2010 2:03 am

Couldn't really get this to work properly.
Some backup files seem to produce an error and I don't know why.

howtodealwithdepression.org

Post by **burger2227** » Tue Oct 05, 2010 2:07 am

Post your code and list the errors. You can get many errors with files.

When an error occurs, note the line of code it stops at. It will almost always point at the error!

Post by **Theunis** » Mon Oct 18, 2010 8:46 am

I did not notice how old this was. So I am deleting my reply. Sorry.

But as Burger said: post your code (snippet) and the errors. Without them it is not possible to help you.

Pete's QBASIC Site
