Posted: Mon 12 Aug 2019, 08:56
I'm trying to find HUGESLIB.BBC. I think I have a use for it.
It was once available via
http://tech.groups.yahoo.com/group/bb4w ... GESLIB.BBC
But this download link doesn't work anymore.
Can someone send me the latest version of the library, or a working link?
Posted: Mon 12 Aug 2019, 12:41
I don't, but do you need it any more? BB4W 6 and (I think) BBC-SDL now natively support huge strings.
Edit: actually I found a copy. I'm not sure what its status is (i.e. whether it would work with recent versions of BBC BASIC). It's not on the list of libraries in the "help", so I presume it's not part of the current release. I'm reluctant to post it if Richard doesn't supply it (since it is his!) If you still think you need it, I'll contact Richard to see if it's OK to post it.
Posted: Mon 12 Aug 2019, 16:50
What I'd like to do is:
read an .xls file with a size close to 1 MB (is that even considered huge?), then treat it as a string to count the number of occurrences of a substring, and maybe do other operations on it later.
I found a partial description of HUGESLIB on email@example.com
but was unable to retrieve the library itself. I was hoping to read more on how to convert a file to a string and manipulate it.
If BB4W already supports huge strings natively, then maybe this library is now obsolete. But it may still contain some useful functionality and programming examples.
Posted: Tue 13 Aug 2019, 07:53
Being able to store a whole file in a string is indeed tempting. However, you have to realize that, in order to find a substring, INSTR() will always make a copy of the string before it does the search, and it needs work space to do that. So if you use the default(?) 2 MB of BB4W work space, INSTR() on a 1 MB string will result in a "No room" error, because the string and its copy together need more than the space available.
I once needed to search sequentially for all the commas in a 1 MB CSV file in order to find all the individual values, and used INSTR(). It took 4435 milliseconds in total. When I replaced it with assembly language to search for a comma, it took 15 milliseconds.
Just thought I'd give you this info for your consideration.
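Timings like these can be taken with the TIME pseudo-variable, which counts in centiseconds. The sketch below is a minimal illustration, not Mike's actual test: it assumes a synthetic string of 5000 "1234567," records rather than a real CSV file, and walks the commas using INSTR's optional start-position argument.

```bbcbasic
REM Sketch: time a comma scan with TIME (centiseconds, so results are
REM only accurate to about 10 ms). The data is a synthetic string of
REM 5000 "1234567," records, not Mike's 1 MB CSV file.
csv$ = STRING$(5000, "1234567,")
T% = TIME
n% = 0
p% = INSTR(csv$, ",")
WHILE p% > 0
  n% += 1
  REM Start the next search just after the comma found last time
  p% = INSTR(csv$, ",", p% + 1)
ENDWHILE
PRINT STR$(n%) + " commas found in " + STR$((TIME - T%) * 10) + " ms"
```

Scaling the record count up towards 1 MB needs BB4W 6 (huge strings) and enough heap space.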
Posted: Tue 13 Aug 2019, 08:34
Very useful considerations. Thank you!
I guess I can claim more workspace by raising HIMEM. I've never had a need for that before, so I'll read some more about it before doing so.
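For reference, the usual BB4W idiom for claiming more work space is to raise HIMEM at the very start of the program. This is a minimal sketch; the 20000000 (about 20 MB) figure is an arbitrary example value, not a recommendation from the thread.

```bbcbasic
REM Sketch: raise HIMEM so the heap has room for a ~1 MB string plus
REM any temporary copies made by string functions. 20000000 is an
REM arbitrary example value; put this line at the start of the program.
HIMEM = PAGE + 20000000
REM END used as a function gives the current top of the heap
PRINT "Free heap space (bytes): "; HIMEM - END
```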
I'm completely illiterate in assembly language, so I'll have to make do with the classic string operations plus the STRINGLIB library. I've already done the work to solve my problem, but not by reading the file as a whole and treating it as a string. Instead I read the Excel file cell by cell with COMLIB (FNgetvaluestr()), then test each cell for the occurrence of the substring. Not blindingly fast, but quite satisfactory nevertheless: it takes 5.5 s for 350 KB, but my file will grow to 1 MB later on.
Still, to take it up a notch, I was curious about the HUGESLIB functionality. I have no idea whether it uses assembly language, but coming from Richard I imagine it has been coded in the best and fastest way possible, with good functionality. I've read that the library is documented with REM statements in the library file itself, so without the file I can't check.
I'd like to have a look at it and see if I can speed up my program. Also I'm always curious to look at the code used (not assembly) and learn from it.
Posted: Tue 13 Aug 2019, 10:34
Assembly language is not my strong point either; however, searching for a specific byte such as the comma (ASCII 44), given the memory location to start from, isn't very difficult. Searching for a sequence of several characters is a little more tricky, but doable.
If INSTR() gives you too much delay then let me know and I can provide you with the assembly code.
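The logic of such a byte search can be sketched in plain BASIC using the ? indirection operator. This is only an illustration of what an assembly routine would do, not Mike's code: FNfindbyte is a hypothetical name, and a BASIC loop like this is far slower than machine code.

```bbcbasic
REM Sketch, in BASIC rather than assembly, of a byte search: scan len%
REM bytes from address addr% for the value byte% (44 = comma) and
REM return the offset of the first match, or -1 if there is none.
REM FNfindbyte is a hypothetical name used only for this illustration.
DIM buf% 15
$buf% = "abc,def"
PRINT FNfindbyte(buf%, 7, 44)
END
DEF FNfindbyte(addr%, len%, byte%)
LOCAL i%, found%
i% = 0
found% = -1
WHILE i% < len% AND found% = -1
  IF addr%?i% = byte% THEN found% = i%
  i% += 1
ENDWHILE
= found%
```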
Posted: Tue 13 Aug 2019, 11:41
I think it should be possible to read the whole file in as a single string in BB4W6 (memory permitting!), and then just process with the standard string handling routines. I think that will probably be the best solution. The routines provided in HUGESLIB largely mirror the standard ones, plus some conversion ones. Have a look in the manual under GET/GET$, in the section on "reading from a file" - in particular, with the "BY" version, you should be able to tell BB4W to read the full extent of the file (found with EXT#).
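A minimal sketch of reading a whole file into one string this way, assuming BB4W 6 (or, I believe, BBC BASIC for SDL) so that files over 65535 bytes fit; "data.csv" is a hypothetical filename.

```bbcbasic
REM Sketch: read a whole file into a single string with GET$# ... BY,
REM using EXT# for the file's length. "data.csv" is hypothetical.
file$ = @dir$ + "data.csv"
F% = OPENIN(file$)
IF F% = 0 THEN ERROR 100, "Could not open " + file$
a$ = GET$#F% BY EXT#F%
CLOSE #F%
PRINT "Read " + STR$(LEN(a$)) + " bytes from " + file$
```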
One concern I have about HUGESLIB is that the string format changed in BB4W 6: there are now 4 bytes for the string length. Before that there were only 2 bytes, hence the limit of 65,535 characters. I suspect that the library will not work with BB4W 6, since it will handle the string format "wrong". I'm guessing that's why Richard has removed all mention of it from nearly everywhere (wiki, list of libraries, groups.io, etc.).
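A quick way to see which side of that change a given interpreter is on is to build a string longer than a 2-byte length field allows. A minimal sketch; the error text on older versions is expected to be "String too long", but treat that as an assumption.

```bbcbasic
REM Sketch: check for huge-string support. A 2-byte length field caps
REM strings at 65535 characters, so older versions should stop here
REM with an error; BB4W 6, with its 4-byte length field, should not.
a$ = STRING$(100000, "x")
PRINT "Length: " + STR$(LEN(a$))
```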
Another option you could consider is using *LOAD to load the file into memory, and then just processing it yourself.
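A minimal sketch of the *LOAD route, assuming a hypothetical "data.csv": find the size with EXT#, DIM a block that large, load the file into it, then read individual bytes with the ? operator.

```bbcbasic
REM Sketch: *LOAD a file into a DIMmed block of memory and access the
REM raw bytes with ?. "data.csv" is a hypothetical filename.
file$ = @dir$ + "data.csv"
F% = OPENIN(file$)
IF F% = 0 THEN ERROR 100, "Could not open " + file$
size% = EXT#F%
CLOSE #F%
DIM buf% size%
REM *LOAD takes the load address in hex, hence STR$~
OSCLI "LOAD """ + file$ + """ " + STR$~buf%
REM buf%?0 is the first byte, buf%?(size%-1) the last
IF size% > 0 THEN PRINT "First byte: " + STR$(buf%?0)
```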
The speed question depends a bit on what you are doing and how often: if I just want a program to do something once, I'd probably accept a 4-5 second delay, but if it had to do it between screen refreshes it would obviously be a non-starter!
If you are really keen to see the library, post again, and I'll contact Richard. He might be happy for you to look at the code, even if it won't work with recent versions of BB4W/BBC-SDL.
Hope that's helpful.
Posted: Wed 14 Aug 2019, 12:18
Hi DDRM and Mike,
Based on your advice, I coded the following as a proof of concept:
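A=OPENIN(name$) : a$=GET$#A BY EXT#A : CLOSE #A : REM name$ = path of the file (hypothetical name)
o$="Toevoegen" : c=0
REPEAT i=INSTR(a$,o$) : IF i THEN a$=LEFT$(a$,i-1)+MID$(a$,i+LEN(o$)) : c+=1
UNTIL i=0 : PRINT "occurrence=";c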
This works fine. Compared with the way I counted the string "Toevoegen" before (reading cell by cell with FNgetvaluestr() from COMLIB), this is about 60 times faster, and also simpler to code. I've already integrated it into my program.
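A possible refinement, sketched under the assumption that a$ already holds the file contents: instead of cutting each match out of a$ and rebuilding the string, pass INSTR a start position just past the previous match. The example string below is made up for illustration.

```bbcbasic
REM Sketch: count occurrences of o$ in a$ without rebuilding a$ after
REM every match. INSTR's third argument sets where each search starts.
REM The example string is made up; normally a$ comes from GET$#.
a$ = "Toevoegen;x;Toevoegen;y;Toevoegen"
o$ = "Toevoegen" : REM must not be empty, or the loop never ends
c% = 0
i% = INSTR(a$, o$)
WHILE i% > 0
  c% += 1
  i% = INSTR(a$, o$, i% + LEN(o$))
ENDWHILE
PRINT "occurrence=" + STR$(c%)
```

Besides avoiding a new copy of the 1 MB string for each match, this counts only matches present in the original text; deleting a match can, in rare cases, join two fragments into a new one.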
One slight complication is that I needed to work with two versions of my file, one .xlsx version and one .csv version.
The .csv version is created only to search for strings quickly; the .xlsx version is required for the more complex operations done with COMLIB. The code in the example above, when run on the .xlsx version of the file, just gives occurrence=0. When I take a random extract of the file, say B$=MID$(Myfile$,500,100), B$ is filled with all sorts of special characters belonging to Excel's internal format.
Still, I think I can do with just the .csv version. I'm making progress using COMLIB with the .csv version. I have a few problems to solve, but I think I can make it work.
Thank you both, DDRM and Mike, for your advice.
Posted: Thu 15 Aug 2019, 08:55
Interesting! Yes, it turns out the XLSX files are zip-compressed, so it's not surprising they are junk if you read them directly. It's probably just going to be easiest to save the file as CSV, which should also load into Excel, as long as you don't need to preserve fancy stuff (like formulae, if you consider them fancy!).
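One way to confirm this from BASIC: ZIP archives (and therefore .xlsx files) normally begin with the two bytes "PK". A minimal sketch, assuming a hypothetical "data.xlsx":

```bbcbasic
REM Sketch: an .xlsx file is a ZIP archive, and ZIP files normally
REM start with the bytes "PK". "data.xlsx" is a hypothetical filename.
file$ = @dir$ + "data.xlsx"
F% = OPENIN(file$)
IF F% = 0 THEN ERROR 100, "Could not open " + file$
sig$ = GET$#F% BY 2
CLOSE #F%
IF sig$ = "PK" THEN PRINT "ZIP container (e.g. .xlsx)" ELSE PRINT "Not a ZIP file (e.g. .csv)"
```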
Posted: Thu 15 Aug 2019, 11:14
Hi D,
as long as you don't need to preserve fancy stuff
Well yes, it so happens that I do need some fancy stuff: the .xlsx file is the result of an Excel web data query.
The query starts on opening the xlsx file. I then close the file and launch my BB4W program to crunch the downloaded data. That works fine.
After saving the file in CSV format, the data query is lost.
So searching the CSV file as a string to find occurrences of a substring is not (in this specific case!!) a practical solution.
On the positive side: I now know how to read a file into a huge string and how to work with it. That will be useful in some of my other programs.