BBC BASIC for Windows

Format of Data in Memory



Memory map

BBC BASIC for Windows is a 32-bit program and uses 32-bit 'flat' (unsegmented) memory addressing. The user's program, data, stack and libraries occupy a contiguous block of memory, reserved by Windows™ when BASIC is started. Absolute memory addresses therefore cannot be guaranteed to be the same each time BASIC is run, and your programs should only ever access memory allocated with the DIM statement, allocated by a Windows™ API function or at addresses relative to PAGE, LOMEM, HIMEM etc.

The current BASIC program starts at PAGE and ends at the byte immediately below TOP (i.e. the length of the program is TOP-PAGE bytes). The 'dynamic data structures' (variables, arrays etc.) start at LOMEM which by default is set equal to TOP (i.e. they follow immediately after your program). This area of memory, which is called the 'heap', grows upwards as more variables are created. The stack grows downwards from HIMEM which by default is set a little less than one Megabyte (two Megabytes in BBC BASIC for Windows version 6.00a or later) above PAGE, but may be raised above this value by the user (memory permitting) either using a program statement or the Customize menu command.

Pointer      Region             Description
             Libraries          (grow upwards)
HIMEM                           Top of stack
             Stack              (grows downwards)
                                Current limit of stack
             (unused)
END                             Current limit of heap
             Heap               (grows upwards)
LOMEM/TOP                       Heap base/end of program
             User's Program
PAGE                            Start of program

As your program runs, the heap expands upwards towards the stack and the stack expands downwards towards the heap. If the two should meet, you get a 'No room' error. Fortunately, there is a limit to the amount by which the stack and the heap expand.

In general, the heap only expands whilst new variables, arrays or structures are being declared. However, altering the length of string variables can result in 'dead' string space which also causes the heap to expand.

In addition to storing the 'return addresses' and other information about the nested structures in your program (loops, procedures, functions etc.), the stack is also used 'internally' by the BBC BASIC for Windows interpreter. Its size fluctuates but, in general, it expands every time you increase the depth of nesting of your program structure and every time you increase the number of local variables in use.
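
As a rough illustration of the memory map, the short program below prints each of the pointers and the sizes of the regions between them. It is a sketch only: the variable name heaptop% is arbitrary, and END is used here as a function returning the current limit of the heap.

```bbcbasic
REM Report the memory map pointers and the space they enclose
heaptop% = END
PRINT "PAGE  (start of program) = &"; STR$~(PAGE)
PRINT "TOP   (end of program)   = &"; STR$~(TOP)
PRINT "LOMEM (heap base)        = &"; STR$~(LOMEM)
PRINT "END   (limit of heap)    = &"; STR$~(heaptop%)
PRINT "HIMEM (top of stack)     = &"; STR$~(HIMEM)
PRINT "Program length = "; TOP - PAGE; " bytes"
PRINT "Heap in use    = "; heaptop% - LOMEM; " bytes"
PRINT "Heap to HIMEM  = "; HIMEM - heaptop%; " bytes"
```

The last figure overstates the free space slightly, because part of the region below HIMEM is already occupied by the stack.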


Memory management

Since BBC BASIC for Windows can make much more memory available to user programs than was possible with BBC BASIC (86), memory management is less of a concern. You will often not need to worry about running out of RAM: if the default amount allocated for the user's program and data is insufficient, the value of HIMEM can be increased.
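
A minimal sketch of raising HIMEM from within a program is shown below. The figure of 10000000 bytes is an arbitrary example and the request can only succeed if sufficient memory is available. Since the stack grows downwards from HIMEM, it is probably safest to do this once, near the start of the program and outside any loop, procedure or function.

```bbcbasic
REM Request roughly 10 Mbytes for the program, heap and stack
REM (the size is an arbitrary example)
HIMEM = PAGE + 10000000
```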

However, there may still be situations where the amount of memory used by your program needs to be kept to a minimum. With careful design of your program, the size of both the stack and the heap can be reduced. Growth of the stack can be reduced by avoiding deeply nested structures and re-entrant routines. The number of LOCAL variables should be kept to a minimum (especially local arrays, which can use up a lot of stack space). Growth of the heap can be controlled by limiting the number of variables you use and by good string variable management. Again, arrays are particularly hungry for memory.

Reducing stack usage

Some problems naturally lend themselves to a recursive solution. The classic example is the calculation of the factorial, which can be defined in a recursive fashion:
DEF FN_Factorial(N)
REM Testing for N <= 1 also ends the recursion when N = 0
IF N <= 1 THEN = 1 ELSE = N * FN_Factorial(N-1)
Each time the function 'calls itself' the size of the stack increases, so the larger the number whose factorial is required the greater the amount of stack usage. By restructuring the problem in a non-recursive way the amount of stack used can be reduced, but at the expense of the size and readability of the program:
DEF FN_Factorial(N)
LOCAL I,F
REM A FOR loop always executes at least once, so deal with N < 2 first
IF N < 2 THEN = 1
F = 1
FOR I = N TO 1 STEP -1
  F = F * I
NEXT
= F
It must be stressed that this example is illustrative only. It is very unlikely that the amount of stack used in this case would be significant.

Limiting the number of variables

Each new variable occupies room on the heap. Restricting the length of the names of variables and limiting the number of variables used will limit the size of the heap. However, of the techniques available to you, this is the least rewarding. In addition, it leads to incomprehensible programs because your variable names become meaningless. If you compile your program to a standalone EXE, variable names are (by default) automatically abbreviated, so you gain the benefits without needing to modify your program.


Program storage in memory

The program is stored in memory in the format shown below. The first program line commences at PAGE.

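Byte(s)        Contents
1              Line length
2 to 3         Line number (LS byte first, MS byte second)
4 onwards      Program line: keyword tokens and other text, with statements separated by ':'
Last           &0D (CR) line terminator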

Line length

The line length includes the line length byte itself. The address of the start of the next line is found by adding the line length to the address of the start of the current line. The end of the program is indicated by a line length of zero and a line number of &FFFF.

Line number

The line number is stored in two bytes, LSB first. The end of the program is indicated by a line number of &FFFF and a line length of zero. Valid line numbers are from 1 (&0001) to 65535 (&FFFF); a line number of zero signifies that the line is unnumbered.

Statements

Statements begin with a keyword, one of the symbols '*', '=' or '[', or a variable name (in the case of an implied LET). Keywords are encoded as one-byte tokens wherever they occur; these have values in the ranges &80 to &FF and &01 to &10. Statements within a line are separated by colons.

Line terminator

Each program line is terminated by a carriage-return (&0D).
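
The line format described above can be explored with a short routine that steps through the stored program from PAGE, using the line length byte to find the start of each following line. This is a sketch only; the variable names ptr%, length% and lineno% are arbitrary, and it assumes the program is being run in its normal (uncompiled) form.

```bbcbasic
REM Walk the stored program, one line at a time
ptr% = PAGE
WHILE ?ptr% <> 0
  length% = ?ptr%
  REM The line number is held in two bytes, LSB first
  lineno% = (ptr%?1) + 256 * (ptr%?2)
  PRINT "Address &"; STR$~(ptr%); ": length "; length%; ", line number "; lineno%
  ptr% += length%
ENDWHILE
PRINT "End marker (zero length) at &"; STR$~(ptr%)
```

Unnumbered lines are reported with a line number of zero.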


Variable storage in memory

Variables are held within memory as linked lists (chains). The first variable in each chain is accessed via an index which is maintained by BBC BASIC for Windows. There is an entry in the index for each of the characters permitted as the first letter of a variable name. Each entry in the index has a double-word (four bytes) address field which points to the first variable in the linked list with a name starting with its associated character. If there are no variables with this character as the first character in the name, the pointer is zero. The first four bytes of every variable hold the address of the next variable in the chain. The address in the last variable in the chain is zero. All addresses are held in the standard 80x86 format, LSB first.

The first variable created for each starting character is accessed via the index and subsequently created variables are accessed via the index and the chain. Consequently, there is some speed advantage to be gained by arranging for your variables to start with different characters. If you compile your program to a standalone EXE, variable names are (by default) automatically distributed across the alphabet, so you gain the benefits without needing to modify your program.

Integer variable storage

Integers are held in two's complement format. They occupy 4 bytes or 8 bytes with the LSB first. Bit 7 of the MSB is the sign bit. To make up the complete variable, the address (link), the name and a separator (zero) byte are added to the value. The format of the memory occupied by a 32-bit integer variable called 'NUMBER%' is shown below. Note that since the first character of the name is found via the index, it is not stored with the variable.

Bytes   Contents
4       Address of next variable starting with the same letter (LS byte first)
6       Rest of name: U M B E R %
1       Separator: &00
4       Value (LS byte first)
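
The byte order of the value, and the position of the separator and name immediately below it, can be checked with the address-of operator (^) and byte indirection. The sketch below assumes the layout described above; the test value &12345678 is arbitrary.

```bbcbasic
NUMBER% = &12345678
A% = ^NUMBER%
REM A% is the address of the value, which is stored LSB first
FOR I% = 0 TO 3
  PRINT STR$~(A%?I%); " ";
NEXT
PRINT
REM The byte below the value is the separator; below that is the name
PRINT "Separator: &"; STR$~(?(A% - 1))
PRINT "Last character of name: "; CHR$(?(A% - 2))
```

The first line printed is 78 56 34 12. If the layout is as described, the separator is reported as &0 and the last character of the name as %.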

The format of the memory occupied by a 64-bit integer variable called 'NUMBER%%' is shown below.

Bytes   Contents
4       Address of next variable starting with the same letter (LS byte first)
7       Rest of name: U M B E R % %
1       Separator: &00
8       Value (LS byte first)

The smallest amount of space is taken up by a variable with a single letter name. The static integer variables, which are not included in the variable chains, use the names A% to Z%. Thus, the only single character names available for dynamic integer variables are a% to z% plus _% and `% (CHR$(96)). As shown below, 32-bit integer variables with these names will occupy 10 bytes:

Bytes   Contents
4       Address of next variable starting with the same letter (LS byte first)
1       Rest of name: %
1       Separator: &00
4       Value (LS byte first)

Byte variable storage

Byte variables are unsigned; they occupy one byte. To make up the complete variable, the address (link), the name and a separator (zero) byte are added to the value. The format of the memory occupied by a byte variable called 'NUMBER&' is shown below. Note that since the first character of the name is found via the index, it is not stored with the variable.

Bytes   Contents
4       Address of next variable starting with the same letter (LS byte first)
6       Rest of name: U M B E R &
1       Separator: &00
1       Value

Variant numeric storage (40 bit)

BBC BASIC for Windows version 5.95a or earlier only
Real numbers are held in binary floating point format. In the default (40-bit) mode the mantissa is held as a 4 byte binary fraction in sign and magnitude format. Bit 7 of the MSB of the mantissa is the sign bit. When working out the value of the mantissa, this bit is assumed to be 1 (a decimal value of 0.5). The exponent is held as a single byte in 'excess 127' format. In other words, if the actual exponent is zero, the value stored in the exponent byte is 127. To make up the complete variable, the address word, the name and a separator (zero) byte are added to the number. The format of the memory occupied by a real variable called 'NUMBER' is shown below.

Bytes   Contents
4       Address of next variable starting with the same letter (LS byte first)
5       Rest of name: U M B E R
1       Separator: &00
4       Mantissa (LS byte first)
1       Exponent

As with integer variables, variables with single character names occupy the least memory (however, the names A to Z are available for variant numeric variables). Whilst a real variable requires an extra byte to store the number, the '%' character is not needed in the name. Thus, integer and real variables with the same name occupy the same amount of memory. However, this does not hold for arrays, since the name is only stored once.

In the following examples, the bytes are shown in the more human-readable manner with the MSB on the left.

The value 5.5 would be stored as shown below.

            Mantissa                                  Exponent
Binary:     .0011 0000 00000000 00000000 00000000     1000 0010
             ^ sign bit
Hex:        &30000000                                 &82
Because the sign bit is assumed to be 1, this would become:
            Mantissa                                  Exponent
Binary:     .1011 0000 00000000 00000000 00000000     1000 0010
Hex:        &B0000000                                 &82
The equivalent in decimal is:
(0.5+0.125+0.0625) * 2^(130-127)
=0.6875 * 2^3
=0.6875 * 8
=5.5

BBC BASIC for Windows uses variant numeric variables which can hold either integers or floating-point values, allowing the faster integer arithmetic routines to be used if appropriate. The presence of an integer value in a variant numeric variable is indicated by the stored exponent being zero. Thus, if the stored exponent is zero, the 4 byte mantissa holds the number in normal integer format.

Depending on how it is put there, an integer value can be stored in a variant numeric variable in one of two ways. For example,

number=5
will set the exponent to zero and store the integer &00 00 00 05 in the mantissa. On the other hand,
number=5.0
will set the exponent to &82 and the mantissa to &20 00 00 00.

The two ways of storing an integer value are illustrated in the following four examples. Each stored value is shown as the exponent byte followed by the four mantissa bytes.

Example 1
number=5       & 00 00000005   Integer 5

Example 2
number=5.0     & 82 20000000   Real 5.0
This is treated as & 82 A0000000, because the sign bit is assumed to be 1:
  (0.5+0.125)*2^(130-127)
= 0.625*8
= 5

Example 3
number=-5      & 00 FFFFFFFB   Integer -5
The 2's complement gives & 00 00000005.

Example 4
number=-5.0    & 82 A0000000   Real -5.0
(The sign bit is already 1.)
Magnitude:
  (0.5+0.125)*2^(130-127)
= 0.625*8
= 5

If all this seems a little complicated, try using the program below to accept a number from the keyboard and display the way it is stored in memory. The program displays the 4 bytes of the mantissa in 'human readable order' followed by the exponent byte. Look at what happens when you input first 5 and then 5.0 and you will see how this corresponds to the explanation given above. Then try -5 and -5.0 and then some other numbers. The program is an example of the use of the byte indirection operator. See the Indirection section for details.

The layout of the variable 'NMBR' in memory is shown below.

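Bytes   Contents                                       Address
4       Address of next variable (LS byte first)
3       Rest of name: M B R
1       Separator: &00
4       Mantissa (LS byte first)                       A% to A%+3
1       Exponent                                       A%+4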

REPEAT
  INPUT "Enter a number: " NMBR
  PRINT "& ";
  :
  A% = ^NMBR
  REM Step through mantissa from MSB to LSB
  FOR I% = 3 TO 0 STEP -1
    REM Look at value at address A%+I%
    num$ = STR$~(A%?I%)
    IF LEN(num$)=1 num$="0"+num$
    PRINT num$;" ";
  NEXT
  :
  REM Look at exponent at address A%+4
  num$ = STR$~(A%?4)
  IF LEN(num$)=1 num$="0"+num$
  PRINT " & "+num$''
UNTIL NMBR=0

Variant numeric storage (64 bit)

BBC BASIC for Windows version 5.95a or earlier only
In *FLOAT 64 mode variant variables are stored as 64-bit numbers in 8 bytes of memory (bits 0-7 in the first byte and bits 56-63 in the last byte). Bit 63 is the sign bit (0 for positive, 1 for negative). Bits 62 to 52 inclusive are the 11-bit exponent in offset-binary ('excess 1024') format, thus a value of 1024 (&400) represents an exponent of zero; exponent values of 0 (&000) and 2047 (&7FF) are not permitted. Bits 51 to 0 inclusive are the least-significant 52 bits of the mantissa (the MSB of the mantissa, bit 52, is not stored and is assumed to be a '1').

To make up the complete variable, the address word, the name and a separator (zero) byte are added to the number. The format of the memory occupied by a 64-bit real variable called 'NMBR' is shown below.

Bytes   Contents
4       Address of next variable starting with the same letter (LS byte first)
4       Rest of name: M B R #
1       Separator: &00
8       Value (LS byte first)

64-bit variant numeric variables are distinguished from 40-bit variant numeric variables by the addition of a # suffix character. This character is added automatically in *FLOAT 64 mode, but can be explicitly specified by the user when references to 64-bit variables are made in *FLOAT 40 mode.

If the most-significant 24 bits of the 64-bit value are all zero then the variable is assumed to contain a 40-bit real number. If the most-significant 32 bits of the 64-bit value are all zero then the variable is assumed to contain a 32-bit signed integer. In these cases the data format accords with those described earlier for 40-bit variants or 32-bit integers respectively.

Variant numeric storage (80 bit)

In BBC BASIC for Windows version 6.00a or later variant variables are stored as 80-bit numbers occupying 10 bytes of memory. Floating-point values are stored in 'extended precision' format (a 64-bit mantissa with an explicit MSB, a 15-bit 'excess 16383' exponent and a sign bit). If the most-significant 16 bits (exponent plus sign bit) are all zero then the variable is assumed to contain a 64-bit signed integer in the 'mantissa'.

To make up the complete variable, the address word, the name and a separator (zero) byte are added to the number. The format of the memory occupied by an 80-bit real variable called 'NUMBR' is shown below.

Bytes   Contents
4       Address of next variable starting with the same letter (LS byte first)
4       Rest of name: U M B R
1       Separator: &00
10      Value (LS byte first)
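
The earlier byte-dump technique can be adapted to the 80-bit format. The sketch below is intended for version 6.00a or later only, since it reads ten bytes from the address of the value; the variable name number and the test value 5.5 are arbitrary.

```bbcbasic
number = 5.5
A% = ^number
REM Show the ten bytes of the value, most-significant byte first
FOR I% = 9 TO 0 STEP -1
  num$ = STR$~(A%?I%)
  IF LEN(num$)=1 num$ = "0" + num$
  PRINT num$; " ";
NEXT
PRINT
```

If the value is held in the extended-precision layout described above, 5.5 (1.375 * 2^2) would be expected to appear as 40 01 B0 00 00 00 00 00 00 00: a sign bit of zero, an exponent of 16383+2 = &4001, and a mantissa whose top bits are 1011.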

String variable storage

String variables are stored as the string of characters. Since the length of the string is stored in memory an explicit terminator for the string is unnecessary. As with numeric variables, the first double-word is the address of the next variable starting with the same character. However, since BBC BASIC for Windows needs information about the length of the string and the address in memory where it starts, the overheads for a string are more than for a numeric. The format of a string variable called 'NAME$' is shown below.

BBC BASIC for Windows version 5.95a or earlier:
Bytes   Contents
4       Address of next variable starting with the same letter (LS byte first)
4       Rest of name: A M E $
1       Separator: &00
4       String start address (LS byte first)
2       Length (LS byte first)

BBC BASIC for Windows version 6.00a or later:
Bytes   Contents
4       Address of next variable starting with the same letter (LS byte first)
4       Rest of name: A M E $
1       Separator: &00
4       Address (LS byte first)
4       Length (LS byte first)

The amount of memory allocated for the string depends on the current length of the string, according to the following formula:

allocated_length = 2^INT(LOG2(current_length)+1)-1
So long as the length of the string is compatible with the allocated length, the string will be stored at the same address. If the variable is set to a string longer than this maximum length there will be insufficient room in the original position for the characters of the string. When this happens, the new string will be placed in a block of the correct length taken from the string free list or, failing that, on the top of the heap; its new start address will be loaded into the address bytes. The previous space occupied by the string is added to the free list. The same thing happens when the length of the string is reduced below the minimum value corresponding to the allocated length, otherwise wasted memory ('garbage') would result.

For example if the allocated length is 2047 bytes (2^11-1) the string will be stored at the same address so long as its current length remains between 1024 and 2047 characters. If its length is reduced below 1024 characters or increased above 2047 characters the string will be moved to a new address and the 2047-byte block of memory added to the string free list.
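
The allocation formula can be tabulated with a small function. The sketch below is an illustration rather than the interpreter's own method: it avoids floating-point logarithms by doubling a power of two until it exceeds the current length, which gives the same result as 2^INT(LOG2(current_length)+1)-1 for lengths of 1 or more. The names FN_allocated, n% and a% are arbitrary.

```bbcbasic
PRINT FN_allocated(1)    : REM 1
PRINT FN_allocated(4)    : REM 7
PRINT FN_allocated(1024) : REM 2047
PRINT FN_allocated(2047) : REM 2047
PRINT FN_allocated(2048) : REM 4095
END
:
DEF FN_allocated(n%)
LOCAL a%
a% = 1
REM Double a% until it exceeds the current length
WHILE a% <= n%
  a% *= 2
ENDWHILE
= a% - 1
```

The results for 1024 and 2047 agree with the example above: any length in that range shares a single 2047-byte allocation.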

Structure storage

The format of a structure in memory consists of three parts: the structure header, the format block and the data block.

The format of the structure header is very similar to that of an ordinary variable, except that instead of a data value it includes two 32-bit pointers (links) to the format and data blocks. The header of a structure called STRU{} would be stored as follows:

Bytes   Contents
4       Address of next variable starting with the same letter (LS byte first)
4       Rest of name: T R U {
1       Separator: &00
4       Format block address (LS byte first)
4       Data block address (LS byte first)

Note that the stored name includes the left brace and it is this which identifies the heap entry as a structure.

The format block is of variable length, and consists of a 32-bit value containing the total data size (in bytes) followed by a linked-list of structure members. The linked list has the same format as the main variable lists (chains) except that instead of actual data values it contains 32-bit offsets into the data block. The format block for the structure STRU{a,b} would be stored in memory as follows (in BBC BASIC for Windows version 6.00a or later):

Bytes   Contents                         Value
4       Total data size                  20 0 0 0
4       Link to next member              (address, LS byte first)
2       Name                             &61 &00
4       Offset into data                 0 0 0 0
4       Final link                       &00 &00 &00 &00
2       Name                             &62 &00
4       Offset into data                 10 0 0 0
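
The total data size and the member offsets held in the format block can be checked from BASIC without reading the block directly. The sketch below uses DIM as a function to return the size of the structure, and the address-of operator to find the distance between the two members.

```bbcbasic
DIM STRU{a,b}
PRINT "Total data size: "; DIM(STRU{})
PRINT "Offset of b from a: "; ^STRU.b - ^STRU.a
```

In version 6.00a or later, where each variant member occupies 10 bytes, this should report a size of 20 and an offset of 10, in agreement with the format block shown above.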

Fixed strings

You can place a string starting at a given location in memory using the indirection operator '$'. For example,
DIM S% 256
$S% = "This is a string"
would place &54 (T) at address S%, &68 (h) at address S%+1 etc. Because the string is placed at a predetermined location in memory it is called a 'fixed' string. Fixed strings are not included in the variable chains and they do not have the overheads associated with a string variable. However, since the length of the string is not stored, an explicit terminator (&0D) is used. Consequently, in the above example, byte S%+16 would be set to &0D. Fixed strings are restricted in length to 65535 bytes (65536 bytes including the terminating &0D).

If you use $$ rather than $ the string in memory is NUL-terminated (&00) rather than CR-terminated (&0D). So for example:

DIM S% 256
$$S% = "This is a string"
would set byte S%+16 to &00.
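
The two kinds of terminator can be confirmed by examining the byte that follows the last character of the string. The following lines are a minimal sketch using the same 16-character string as above.

```bbcbasic
DIM S% 256
$S% = "This is a string"
PRINT "Length: "; LEN($S%)
PRINT "First byte: &"; STR$~(?S%)
PRINT "Byte at S%+16 after $:  &"; STR$~(S%?16)
$$S% = "This is a string"
PRINT "Byte at S%+16 after $$: &"; STR$~(S%?16)
```

This reports a length of 16, a first byte of &54, and terminators of &D and &0 respectively.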

Array storage

The format of an array in memory consists of three parts: the array header, the parameter block and the array data.

The format of the array header is very similar to that of an integer variable, except that instead of a 32-bit data value it includes a 32-bit pointer (link) to the parameter block. The header of an array called ARRAY%() would be stored as follows:

Bytes   Contents
4       Address of next variable starting with the same letter (LS byte first)
6       Rest of name: R R A Y % (
1       Separator: &00
4       Parameter block address (LS byte first)

Note that the stored name includes the left bracket (parenthesis) and it is this which identifies the heap entry as an array.

The parameter block is of variable length, and consists of a single byte containing the number of dimensions (suffixes) and four bytes for each of the dimensions, containing the size of that dimension (number of rows, columns etc). The parameter block for the array ARRAY%(10,20) would be stored in memory as follows:

Bytes   Contents                 Value
1       Number of dimensions     2
4       No. of rows              11 0 0 0
4       No. of cols              21 0 0 0

Note that the size of each dimension is equal to one greater than the suffix specified in the DIM statement, since the index can take any value from zero to the specified maximum suffix.

The array data follows immediately after the parameter block, the number of elements being equal to the product of the sizes of each dimension. For example in the case of ARRAY%(10,20) the data consists of 11*21 = 231 values. Each data value consists of either one byte (in the case of a byte array), four bytes (in the case of a 32-bit integer numeric array), five bytes (in the case of a 40-bit variant numeric array), six bytes (in the case of a version 5 string array), eight bytes (in the case of a 64-bit integer array, 64-bit variant/double numeric array, structure array or version 6 string array) or ten bytes (in the case of an 80-bit variant numeric array).
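
Since the array data follows immediately after the parameter block, the block can be located by stepping back from the address of the first element. The sketch below assumes the layout described above, in which the parameter block of a two-dimensional array is 1 + 2*4 = 9 bytes long; the variable name pb% is arbitrary.

```bbcbasic
DIM ARRAY%(10,20)
REM Step back over the 9-byte parameter block
pb% = ^ARRAY%(0,0) - 9
PRINT "Dimensions: "; ?pb%
PRINT "Rows:       "; pb%!1
PRINT "Columns:    "; pb%!5
PRINT "Data bytes: "; (pb%!1) * (pb%!5) * 4
```

If the layout is as described, this should report 2 dimensions, 11 rows, 21 columns and 924 bytes of data (231 elements of four bytes each). In ordinary programs the DIM function is the supported way to obtain this information.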

© Richard Russell 2016