BBC BASIC for Windows

The Assembler



Introduction to the assembler

BBC BASIC for Windows includes an 80386/80486 assembler (it also accepts some Pentium instructions). This assembler is similar to the 6502 assembler on the BBC Micro and the Z80 assembler in BBC BASIC(Z80) and it is entered in the same way. That is, '[' enters assembler mode and ']' exits assembler mode. Unlike the 6502 or Z80 assemblers, the 80x86 assembler attempts to detect multiply-defined labels. If a label is found to have an existing non-zero value during the first pass of an assembly (OPT 0, 1, 8, 9), a 'Multiple label' error is reported (error code 3).

Assembler statements

An assembly language statement consists of three elements: an optional label, an instruction opcode and an operand. A comment may follow the operand field. If an instruction opcode follows a label, the two must be separated by at least one space. Similarly, the operand must be separated from the instruction opcode by a space. Opcodes are not case-sensitive.

Assembly language statements are terminated by a colon (:) or end of line (<RET>). When a statement is terminated by a colon, you must leave a space between the colon and a preceding segment register name; otherwise the colon may be misinterpreted as a segment override. See the Segment Override sub-section for details.

Labels

Labels are defined by preceding them with a full stop (.). When the assembler encounters such a label, a numeric variable is created containing the current value of the Program Counter (P%). Such variables are accessible in the normal way outside of the assembler.

In the example shown later under the heading The assembly process, two labels are defined and used. Labels follow the same rules as standard BBC BASIC for Windows variable names: they should start with a letter and should not start with a keyword.
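
As a minimal sketch of how a label becomes an ordinary BASIC variable, the following assembles a two-instruction routine and then uses its label from BASIC. The names code%, pass% and entry are illustrative only, and the sketch assumes that USR returns the contents of EAX, as it does in BBC BASIC for Windows.

```bbcbasic
REM Illustrative sketch: code%, pass% and entry are hypothetical names
DIM code% 15                    : REM reserve 16 bytes for the routine
FOR pass% = 0 TO 2 STEP 2       : REM two passes: OPT 0 then OPT 2
  P% = code%
  [OPT pass%
  .entry                        ; label: creates the BASIC variable 'entry'
  mov eax,42                    ; value handed back to BASIC
  ret
  ]
NEXT pass%
PRINT "entry is at &"; ~entry   : REM the label is a normal numeric variable
PRINT USR(entry)                : REM run the routine and print EAX
```

The address printed will differ from run to run; the value returned by USR should be 42.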

Comments

You can insert comments into assembly language programs by preceding them with a semi-colon (;). In assembly language, a comment ends at the end of the statement. Thus, the following example will work (but it's a bit untidy):
[;start assembly language program
etc
MOV EAX,ECX ;In-line comment : POP EBX ;start add
JNZ loop ;Go back if not finished : RET ;Return
etc
;end assembly language program:]

Differences from Intel syntax

The assembler generally conforms to Intel assembly language syntax. However, there are a number of minor differences which are described below.

Jumps, calls and returns

Unconditional jumps, calls and returns are assumed to be within the current code segment. Short (8 bit displacement) jumps, far (inter segment) calls, far jumps and far returns must be explicitly specified by using the following mnemonics:
Short jump    JMPS or JMP SHORT
Far call      CALLF or CALL FAR
Far jump      JMPF or JMP FAR
Far return    RETF
Note that since BBC BASIC for Windows is a 32-bit program, and the assembler will normally be used to generate 32-bit code in a 'flat' address space, the segment size is 2^32 bytes (4 Gbytes!). You are therefore most unlikely to want to perform inter-segment jumps or calls.

Conditional jumps are assumed to be short (8-bit displacement). Near conditional jumps (32-bit displacement) must be explicitly specified by adding the NEAR prefix, for example:

JZ NEAR dest
JNC NEAR label
Note that the LOOP and JECXZ instructions (and their variants) can use only 8-bit displacements. You must ensure that the destination is within range.

Memory operands

Memory operands must be placed in square brackets in order to distinguish them from immediate operands. For example,
MOV EAX,[store]
will load the EAX register with the contents of memory location 'store'. However,
MOV EAX,store
will load the EAX register with the 32-bit value of the BASIC variable 'store', i.e. the address of the memory location.
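
The difference can be seen by running both forms. This is only a sketch: store%, code%, fetch_value and fetch_addr are hypothetical names, and it assumes USR returns EAX.

```bbcbasic
REM Illustrative sketch: all names here are hypothetical
DIM code% 31, store% 3          : REM code area, plus 4 bytes of data
!store% = 1234                  : REM put a known 32-bit value in memory
FOR pass% = 0 TO 2 STEP 2
  P% = code%
  [OPT pass%
  .fetch_value
  mov eax,[store%]              ; brackets: load the CONTENTS of the location
  ret
  .fetch_addr
  mov eax,store%                ; no brackets: load the ADDRESS (immediate)
  ret
  ]
NEXT pass%
PRINT USR(fetch_value)          : REM contents of the location
addr% = USR(fetch_addr)
IF addr% = store% THEN PRINT "Without brackets EAX holds the address"
```

The first PRINT should show 1234, the value stored at the location; the second routine returns the address itself.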

String operations

The string operations must have the data size (byte, word or double-word) explicitly specified in the instruction mnemonic as listed below.
Compare memory - byte            CMPSB
Compare memory - word            CMPSW
Compare memory - double-word     CMPSD
Compare AL (byte)                SCASB
Compare AX (word)                SCASW
Compare EAX (double-word)        SCASD
Load from memory - byte          LODSB
Load from memory - word          LODSW
Load from memory - double-word   LODSD
Store to memory - byte           STOSB
Store to memory - word           STOSW
Store to memory - double-word    STOSD
Move byte                        MOVSB
Move word                        MOVSW
Move double-word                 MOVSD

Segment override

When segment overrides are necessary, they must always be entered explicitly. The assembler will not insert them automatically. For example,
MOV EAX,CS:[data]
will load the EAX register with the contents of the address 'data' in the code segment. Since BBC BASIC for Windows is a 32-bit program and the assembler will normally be used to generate 32-bit code in a 'flat' address space, segment overrides will very rarely be required.

When assembly language statements are separated by colons, it is necessary to leave a space between the colon and a preceding segment register name. If the space is missing, the assembler will misinterpret the colon as a segment override. For example,

PUSH CS:MOV EAX,0
will give rise to an error, but
PUSH CS :MOV EAX,0
will be accepted.

Data size ambiguities

Some assembly language instructions are ambiguous as to whether a byte, word or double-word value is to be acted upon. When this is so, an explicit 'byte ptr', 'word ptr' or 'dword ptr' operator must be used. These can be simplified to 'byte', 'word' or 'dword' respectively. For example:
INC BYTE PTR [EBX]
MOV WORD PTR [count],0
ADD DWORD [ESI],offset
If this operator is omitted, BBC BASIC for Windows will issue a 'Size needed' error message (error code 2).

Loop instructions

The 'loop' instructions (loop, loope, loopne, loopnz, loopz) by default decrement (and test) the 32-bit ECX register. To specify that they should instead decrement the 16-bit CX register use the opcodes loopw, loopew, loopnew, loopnzw and loopzw. For completeness, the opcodes loopd, looped, loopned, loopnzd and loopzd are also accepted; these behave in an identical fashion to the opcodes without the final 'd' (they use the ECX register).
LOOP label
LOOPW label
LOOPD label
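
A short sketch of the default (ECX) form in use: the routine below adds the numbers 10 down to 1 using LOOP. The names code%, total and again are illustrative, and the sketch assumes USR returns EAX.

```bbcbasic
REM Illustrative sketch: code%, total and again are hypothetical names
DIM code% 31
FOR pass% = 0 TO 2 STEP 2
  P% = code%
  [OPT pass%
  .total
  mov ecx,10                    ; loop counter (32-bit ECX)
  xor eax,eax                   ; running total = 0
  .again
  add eax,ecx                   ; add the current counter value
  loop again                    ; decrement ECX, jump back while non-zero
  ret
  ]
NEXT pass%
PRINT USR(total)                : REM 10+9+...+1
```

The result should be 55. The destination 'again' is only a few bytes back, so it is well within the 8-bit displacement range that LOOP requires.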

Based-indexed operands

The 16-bit composite based-indexed operands are only accepted in the preferred forms with the base register specified first. For example,
[bp+di], [bp+si], [bx+di], [bx+si]
are accepted, but
[di+bp], [si+bp], [di+bx], [si+bx]
are not.

This restriction does not apply to the 32-bit memory operands. For example, all the following are accepted:

[eax+2*ecx], [edx+ebx*4], [eax*3], [ebp+esi]

Indexed memory operands

Indexed memory operands with constant offsets are accepted in the following formats:
[index]+offset
[index+offset]
offset[index]
Where 'index' is an index or base register such as 'ebx', 'ebp+esi', etc, and 'offset' is a numeric expression.

Floating-point operands

(BBC BASIC for Windows version 2.00a or later only)
Locations in the floating-point register stack are referred to as ST0, ST1, ST2, ST3, ST4, ST5, ST6 and ST7. When a stack operand is required it must always be explicitly specified.

In addition to dword (or dword ptr) the data-size modifiers qword (or qword ptr) and tbyte may also be specified. These refer to a 64-bit (double float or long long integer) or 80-bit (temporary float or 18-digit BCD) data size respectively. The only instructions which can take a tbyte operand are fbld, fbstp, fld and fstp.

With BBC BASIC for Windows versions earlier than 3.00a the fadd, fsub, fmul and fdiv instructions are not accepted in their no-operand forms. Use the following equivalents instead:

Instead of fadd use faddp st1,st0
Instead of fsub use fsubp st1,st0
Instead of fmul use fmulp st1,st0
Instead of fdiv use fdivp st1,st0

Numeric and string constants

You can store constants within your assembly language program using the define byte (DB), define word (DW) and define double-word (DD) pseudo-operation commands. These will create 1 byte, 2 byte and 4 byte items respectively. Define byte (DB) may alternatively be followed by a string operand, in which case the bytes comprising the string will be placed in memory at the current assembly location. As discussed later, this location is governed by P% or O%, depending on the OPT value used.

Be careful if you use DB, DW or DD to define locations in which to store variable data rather than constants. On some modern processors writing data to a memory location in close proximity to the code can dramatically reduce execution speed (this is to support self-modifying code). If speed is important ensure that any data storage locations to which you write frequently are at least 2 Kbytes away from the code accessing them.

Define byte - DB

Byte constant

DB can be used to set one byte of memory to a particular value. For example,
.data DB 15
      DB 9
will set two consecutive bytes of memory to 15 and 9 (decimal). The address of the first byte will be stored in the variable 'data'.

String constant

DB can be used to load a string of ASCII characters into memory. For example,
JMPS continue; jump round the data
.string DB "This is a test message"
DB &D
.continue; and continue the process
will load the string 'This is a test message' followed by a carriage-return into memory. The address of the start of the message is loaded into the variable 'string'. This is equivalent to the following program segment:
JMPS continue; jump round the data
.string; leave assembly and load the string
]
$P%="This is a test message" : REM starting at P%
P%=P%+LEN($P%)+1 : REM adjust P% to next free byte
[
OPT opt; reset OPT
.continue; and continue the program

Define word - DW

DW can be used to set two bytes of memory to a particular value. The first byte is set to the least significant byte of the number and the second to the most significant byte. For example,
.data DW &90F
will have the same result as the Byte constant example above.

Define double-word - DD

DD can be used to set four bytes of memory to a particular value. The first byte is set to the least significant byte of the number and the fourth to the most significant byte. For example,
.data DD &90F0D10
will have the same result as,
.data DB 16       .data DB &10
      DB 13   or        DB &D
      DB 15             DB &F
      DB 9              DB &9
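
The byte order described above can be checked from BASIC by assembling the constants and reading the memory back one byte at a time. This is a sketch only; bytes% is a hypothetical name, and a single pass is enough here because no labels are involved.

```bbcbasic
REM Illustrative sketch: bytes% is a hypothetical name
DIM bytes% 7                    : REM room for one DW and one DD
P% = bytes%
[OPT 0
DW &90F                         ; two bytes, least significant first
DD &90F0D10                     ; four bytes, least significant first
]
FOR I% = 0 TO 5
  PRINT bytes%?I%               : REM read each byte back with ? indirection
NEXT I%
```

The six values printed should be 15, 9 (from the DW) followed by 16, 13, 15, 9 (from the DD).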

Opcodes

The following opcodes are accepted by the assembler. Opcodes are not case-sensitive, they may be given in capitals or lower-case:

aaa       aad       aam       aas
adc       add       and       bound
bsf       bsr       bswap     bt
btc       btr       bts       call
cbw       cdq       clc       cld
cli       cmc       cmp       cmpsb
cmpsd     cmpsw     cmpxchg   cpuid
cwd       cwde      daa       das
dec       div       enter     hlt
idiv      imul      in        inc
insb      insd      insw      int
into      invd      invlpg    iret
iretd     jae       ja        jbe
jb        jc        je        jge
jg        jle       jl        jmp
jnae      jna       jnbe      jnb
jnc       jne       jnge      jng
jnle      jnl       jno       jnp
jns       jnz       jo        jpe
jpo       jp        js        jz
lahf      lds       lea       leave
les       lfs       lgs       lock
lodsb     lodsd     lodsw     loop
loope     loopne    loopnz    loopz
loopd     looped    loopned   loopnzd
loopzd    loopw     loopew    loopnew
loopnzw   loopzw    lss       mov
movsb     movsd     movsw     movsx
movzx     mul       neg       nop
not       or        out       outsb
outsd     outsw     pop       popa
popad     popf      popfd     push
pusha     pushad    pushf     pushfd
rcl       rcr       rdtsc     rep
repe      repne     repnz     repz
ret       retf      retn      rol
ror       sahf      sal       sar
sbb       scasb     scasd     scasw
setae     seta      setbe     setb
setc      sete      setge     setg
setle     setl      setnae    setna
setnbe    setnb     setnc     setne
setnge    setng     setnle    setnl
setno     setnp     setns     setnz
seto      setpe     setpo     setp
sets      setz      shl       shld
shr       shrd      stc       std
sti       stosb     stosd     stosw
sub       test      wait      wbinvd
xadd      xchg      xlat      xor

Floating-point opcodes

(BBC BASIC for Windows version 2.00a or later only) The following floating-point opcodes are accepted by the assembler:

f2xm1     fabs      fadd      faddp
fbld      fbstp     fchs      fclex
fcom      fcomp     fcompp    fcos
fdecstp   fdiv      fdivp     fdivr
fdivrp    ffree     fiadd     ficom
ficomp    fidiv     fidivr    fild
fimul     fincstp   finit     fist
fistp     fisub     fisubr    fld
fld1      fldl2e    fldl2t    fldlg2
fldln2    fldpi     fldz      fldcw
fldenv    fmul      fmulp     fnclex
fninit    fnop      fnsave    fnstcw
fnstenv   fnstsw    fpatn     fprem
fprem1    fptan     frndint   frstor
fsave     fscale    fsin      fsincos
fsqrt     fst       fstp      fstcw
fstenv    fstsw     fsub      fsubp
fsubr     fsubrp    ftst      fucom
fucomp    fucompp   fxam      fxch
fxtract   fyl2x     fyl2xp1

MMX opcodes

(BBC BASIC for Windows version 4.00a or later only) The following MMX (Multimedia extension) opcodes are accepted by the assembler:

emms        maskmovq    movd        movntq
movq        packssdw    packsswb    packuswb
paddb       paddw       paddd       paddsb
paddsw      paddusb     paddusw     pand
pandn       pavgb       pcmpeqb     pcmpeqw
pcmpeqd     pcmpgtb     pcmpgtw     pcmpgtd
pextrw      pinsrw      pmaddwd     pmaxsw
pmaxub      pminsw      pminub      pmovmskb
pmulhuw     pmulhw      pmullw      por
psadbw      pshufw      psllw       pslld
psllq       psraw       psrad       psrlw
psrld       psrlq       psubb       psubw
psubd       psubsb      psubsw      psubusb
psubusw     punpckhbw   punpckhwd   punpckhdq
punpcklbw   punpcklwd   punpckldq   pxor


Using BASIC input/output

An assembly language program may access some of BASIC's input/output routines (e.g. the VDU drivers) by calling the following routines by name:
CALL "osbget" ; Read byte from file to AL, EBX contains channel number
CALL "osbput" ; Write byte from AL to file, EBX contains channel number
CALL "osrdch" ; Read keyboard character to AL
CALL "osasci" ; Write AL to the VDU drivers (plus LF if CR)
CALL "osnewl" ; Write LF,CR
CALL "oswrch" ; Write AL to the VDU drivers
CALL "osword" ; Read character dot pattern, EDX addresses buffer
CALL "osbyte" ; Read character at cursor position to AL
CALL "oscli" ; Various OS commands, EDX addresses string
CALL "oskey" ; Equivalent to INKEY, EAX contains timeout value
In the case of 'oscli', the EDX register should point to a CR-terminated string containing the command to be executed. In the case of 'oskey' the carry flag is cleared if no key was pressed within the timeout period.

The following assembly-language program would clear the screen (text viewport):

.clrscn
MOV AL,12 ; VDU 12 is CLS
CALL "oswrch"
RET

Calling the Windows API

An assembly language program may call Windows™ API routines by name. For example, the following program would generate the system warning sound:
.beep
push 48 ; Put the parameter on the stack
call "MessageBeep"
ret
When passing multiple parameters you must be careful to push them in 'reverse order' so they end up in the correct sequence on the stack. So for example:
SYS "SetWindowPos", @hwnd%, 0, xpos%, ypos%, 0, 0, 5
would become in assembly language:
push 5
push 0
push 0
push ypos%
push xpos%
push 0
push @hwnd%
call "SetWindowPos" 
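
When an API routine returns a value, it comes back in EAX like any other function result. The sketch below is an illustration under assumptions: it supposes that "GetTickCount" (a Windows API function that takes no parameters) can be called by name in the same way as "MessageBeep", and that USR returns EAX. The names code% and ticks are hypothetical.

```bbcbasic
REM Illustrative sketch: code% and ticks are hypothetical names
DIM code% 31
FOR pass% = 0 TO 2 STEP 2
  P% = code%
  [OPT pass%
  .ticks
  call "GetTickCount"           ; no parameters to push; result left in EAX
  ret
  ]
NEXT pass%
PRINT USR(ticks)                : REM milliseconds since Windows started
```

The number is a 32-bit count, so it may print as a negative value once the system has been running for long enough.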

Reserving memory

The program counter

Machine code instructions are assembled as if they were going to be placed in memory at the addresses specified by the program counter, P%. Their actual location in memory may be determined by O% depending on the value of OPT used. You must make sure that P% (or O%) is pointing to a free area of memory before your program begins assembly. In addition, you need to reserve the area of memory that your machine code program will use so that it is not overwritten at run time. You can reserve memory by using a special version of the DIM statement.

Using DIM to reserve memory

Using the special version of the DIM statement to reserve an area of memory is the simplest and safest way (see the keyword DIM for more details). For example,
DIM code 20: REM Note the absence of brackets
will reserve 21 bytes of memory (byte 0 to byte 20) and load the variable 'code' with the start address of the reserved area. You can then set P% (or O%) to the start of that area. The example below reserves an area of memory 100 bytes long and sets P% to the first byte of the reserved area.
DIM sort% 99
P%=sort%

Length of reserved memory

You must reserve an area of memory which is sufficiently large for your machine code program before you assemble it, but you may have no real idea how long the program will be until after it is assembled. How then can you know how much memory to reserve? Unfortunately, the answer is that you can't. However, you can add code to your program to measure the length actually used and then change the amount of memory reserved by the DIM statement to match.

In the example below, a large amount of memory is initially reserved. To begin with, a single pass is made through the assembly code and the length needed for the code is calculated (lines 100 to 120). After a CLEAR, the correct amount of memory is reserved (line 140) and a further two passes of the assembly code are performed as usual. Your program should not, of course, subsequently try to use variables set before the CLEAR statement. If you use a similar structure to the example and place the program lines which initiate the assembly function at the start of your program, you can place your assembly code anywhere you like and still avoid this problem.

100 DIM free -1, code HIMEM-free-2000
110 PROC_ass(0)
120 L%=P%-code
130 CLEAR
140 DIM code L%
150 PROC_ass(0)
160 PROC_ass(2)

- - -
Put the rest of your program here.
- - -

1000 DEF PROC_ass(opt)
10010 P%=code
10020 [OPT opt
- - -
Assembler code program.
- - -

11000 ]
11010 ENDPROC

Initial setting of the program counter

The program counters, P% and O%, are initialised to zero. Using the assembler without first setting P% (and O%) is liable to crash BBC BASIC for Windows.


The assembly process

OPT

The only assembly directive is OPT. As with the 6502 assembler, 'OPT' controls the way the assembler works, whether a listing is displayed and whether errors are reported. OPT should be followed by a number in the range 0 to 15. The way the assembler functions is controlled by the four bits of this number in the following manner.

Bit 0 - LSB

Bit 0 controls the listing. If it is set, a listing is displayed.

Bit 1

Bit 1 controls error reporting. If it is set, the 'No such variable' and 'Jump out of range' errors are reported as normal; otherwise they are suppressed. This bit should be clear (0) in the first pass and set (1) in the second pass.

Bit 2

Bit 2 controls where the assembled code is placed. If bit 2 is set, code is placed in memory starting at the address specified by O%. However, the program counter (P%) is still used by the assembler for calculating the instruction addresses.

Bit 3

(BBC BASIC for Windows version 3.00a or later only)
Bit 3 controls limit checking. If bit 3 is set the code address (P% or O% as appropriate) is checked against the current value of L%. If greater than or equal to L% the Address out of range error results. For example:
DIM P% 99, L% -1
[OPT 8
will cause the assembler to issue an error if the code size exceeds 100 bytes.
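
Limit checking combines naturally with the usual two-pass loop. The following is a sketch rather than a definitive recipe: code%, pass% and start are hypothetical names, and it relies on the same 'DIM ... , L% -1' idiom shown above to set L% to the first address beyond the reserved area. It needs version 3.00a or later.

```bbcbasic
REM Illustrative sketch: code%, pass% and start are hypothetical names
DIM code% 99, L% -1             : REM 100 bytes of code space; L% = limit
FOR pass% = 8 TO 10 STEP 2      : REM OPT 8 then OPT 10: bit 3 set in both
  P% = code%
  [OPT pass%
  .start
  mov eax,1
  ret
  ]
NEXT pass%
PRINT "Code uses "; P% - code%; " of 100 bytes"
```

If the assembled code ever grew beyond the reserved area, the assembler would stop with 'Address out of range' instead of silently overwriting whatever follows.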

Assembly at a different address

In general, machine code will only run properly if it is in memory at the addresses for which it was assembled. With BBC BASIC for Windows the memory addresses occupied by your program and data (and thus your assembly-language program) are allocated by Windows™ each time BASIC is run, so you cannot assume that they will always be the same. Thus the option of assembling to a different area of memory is of little use. However, this facility has been retained for compatibility with other versions of BBC BASIC and for special purposes (e.g. for assembling 'stand-alone' code which will eventually be programmed into PROM).

OPT summary

OPT value  Limit check  Code stored at  Errors reported  Listing generated
0          No           P%              No               No
1          No           P%              No               Yes
2          No           P%              Yes              No
3          No           P%              Yes              Yes
4          No           O%              No               No
5          No           O%              No               Yes
6          No           O%              Yes              No
7          No           O%              Yes              Yes
8          Yes          P%              No               No
9          Yes          P%              No               Yes
10         Yes          P%              Yes              No
11         Yes          P%              Yes              Yes
12         Yes          O%              No               No
13         Yes          O%              No               Yes
14         Yes          O%              Yes              No
15         Yes          O%              Yes              Yes
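
Each row of the summary follows directly from the four bits of the OPT value. As a rough sketch (opt% is a hypothetical name), the table can be regenerated in BASIC by testing each bit with AND:

```bbcbasic
REM Illustrative sketch: decode each OPT value into its four settings
FOR opt% = 0 TO 15
  PRINT "OPT "; opt%; ": ";
  IF opt% AND 8 THEN PRINT "limit check, "; ELSE PRINT "no limit check, ";
  IF opt% AND 4 THEN PRINT "code at O%, "; ELSE PRINT "code at P%, ";
  IF opt% AND 2 THEN PRINT "errors reported, "; ELSE PRINT "errors suppressed, ";
  IF opt% AND 1 THEN PRINT "listing" ELSE PRINT "no listing"
NEXT opt%
```

For example, OPT 11 is 8 + 2 + 1, so it selects limit checking, code stored at P%, errors reported and a listing.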

How the assembler works

The assembler works line by line through the assembly-language code. When it finds a label declaration it creates a BBC BASIC for Windows variable with that name and loads it with the current value of the program counter (P%). This is fine so long as labels are declared before they are used. However, labels are often used for forward jumps, and no variable with that name exists when the label is first encountered. When this happens, a 'No such variable' error occurs. If error reporting has not been disabled, this error is reported and BBC BASIC for Windows returns to the direct mode in the normal way. If error reporting has been disabled, the current value of the program counter is used in place of the address which would have been found in the variable, and assembly continues.

By the end of the assembly process the variable will exist (assuming the code is correct), but this is of little use since the assembler cannot 'back track' and correct the errors. However, if a second pass is made through the assembly code, all the labels will exist as variables and errors will not occur.

The example below shows the result of two passes through a (completely futile) demonstration program. The code occupies twelve bytes; DIM code 12 reserves thirteen (if the program were run, it would 'doom-loop' from line 50 to 70 and back again). The first pass disables error reporting by using OPT 1; the second pass uses OPT 3.
10 DIM code 12
20 FOR opt=1 TO 3 STEP 2
30 P%=code
40 [OPT opt
50 .jim JMP fred
60 DW &2345
70 .fred JMP jim
80 ]
90 NEXT
This is the first pass through the assembly process (note that the 'JMP fred' instruction jumps to itself):
RUN
030A18A9                                OPT opt
030A18A9 E9 FB FF FF FF       .jim      JMP fred
030A18AE 45 23                          DW &2345
030A18B0 E9 F4 FF FF FF       .fred     JMP jim
This is the second pass through the assembly process (note that the 'JMP fred' instruction now jumps to the correct address):
030A18A9                                OPT opt
030A18A9 E9 02 00 00 00       .jim      JMP fred
030A18AE 45 23                          DW &2345
030A18B0 E9 F4 FF FF FF       .fred     JMP jim
Generally, if labels have been used, you must make two passes through the assembly language code to resolve forward references. This can be done using a FOR...NEXT loop. Normally, the first pass should be with OPT 0 (or OPT 4, 8, 12) and the second pass with OPT 2 (or 6, 10, 14). If you want a listing, use OPT 3 (or 7, 11, 15) for the second pass. During the first pass, a table of variables giving the address of the labels is built. Labels which have not yet been included in the table (forward references) will generate the address of the current opcode. The correct address will be generated during the second pass.
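
The displacement bytes in the two listings above can be checked by hand. A near JMP (opcode E9) is five bytes long, and its 32-bit displacement is measured from the address of the instruction that follows it. The sketch below repeats that arithmetic in BASIC, using the addresses from the listing; jim% and fred% are simply copies of those addresses.

```bbcbasic
REM Illustrative sketch: addresses copied from the listing above
jim%  = &030A18A9               : REM address of .jim
fred% = &030A18B0               : REM address of .fred
REM displacement = target - (address of JMP + 5)
PRINT ~(jim%  - (jim%  + 5))    : REM first pass: 'fred' unknown, so P% is used
PRINT ~(fred% - (jim%  + 5))    : REM second pass: 'fred' resolved
PRINT ~(jim%  - (fred% + 5))    : REM backward jump, same in both passes
```

The three results are FFFFFFFB, 2 and FFFFFFF4, which match the bytes FB FF FF FF, 02 00 00 00 and F4 FF FF FF in the listings (stored least significant byte first).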


Conditional assembly and macros

Introduction

Most machine code assemblers provide conditional assembly and macro facilities. The assembler does not directly offer these facilities, but it is possible to implement them by using other features of BBC BASIC for Windows.

Conditional assembly

You may wish to write a program which makes use of special facilities and which will be run on different types of computer. The majority of the assembly code will be the same, but some of it will be different. In the example below, different output routines are assembled depending on the value of 'flag'.
DIM code 200
FOR pass=0 TO 3 STEP 3
  P%=code
  [OPT pass
  .start     - - -
             - - - code - - -
             - - - :]
  :
  IF flag [OPT pass: - code for routine 1 - :]
  IF NOT flag [OPT pass: - code for routine 2 - :]
  :
  [OPT pass
  .more_code - - -
             - - - code - - -
             - - - :]
NEXT
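
A filled-in version of the same pattern might look like the sketch below. The names code%, flag% and start are hypothetical, and the sketch assumes USR returns EAX. Note that flag% should hold TRUE or FALSE, because NOT is a bitwise operator and NOT 1 is still non-zero.

```bbcbasic
REM Illustrative sketch: code%, flag% and start are hypothetical names
DIM code% 63
flag% = TRUE                    : REM choose which routine is assembled
FOR pass% = 0 TO 2 STEP 2
  P% = code%
  [OPT pass%
  .start
  ]
  IF flag% [OPT pass% : mov eax,1 :]
  IF NOT flag% [OPT pass% : mov eax,2 :]
  [OPT pass%
  ret
  ]
NEXT pass%
PRINT USR(start)                : REM 1 if flag% is TRUE, 2 if FALSE
```

Only one of the two MOV instructions is ever placed in memory, so the unused variant costs nothing at run time.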

Macros

Within any machine code program it is often necessary to repeat a section of code a number of times and this can become quite tedious. You can avoid this repetition by defining a macro which you use every time you want to include the code. The example below uses a macro to pass a character to the screen or the auxiliary output. Conditional assembly is used within the macro to select either the screen or the auxiliary output depending on the value of op_flag.

It is possible to suppress the listing of the code in a macro by forcing bit 0 of OPT to zero for the duration of the macro code. This can most easily be done by ANDing the value passed to OPT with 14. This is illustrated in PROC_screen and PROC_aux in the example below.

DIM code 200
op_flag=TRUE
FOR pass=0 TO 3 STEP 3
  P%=code
  [OPT pass
  .start   - - -
           - - - code - - -
           - - -
: 
  OPT FN_select(op_flag); Include code depending on op_flag
:
           - - -
           - - - code - - -
           - - -:]
NEXT
END
:
:
REM Include code depending on value of op_flag
:
DEF FN_select(op_flag)
IF op_flag PROC_screen ELSE PROC_aux
=pass
REM Return original value of OPT.  This is a
REM bit artificial, but necessary to insert
REM some BASIC code in the assembly code.
:
DEF PROC_screen
[OPT pass AND 14
...code...
]
ENDPROC
:
DEF PROC_aux
[OPT pass AND 14
...code...
]
ENDPROC
Using a function call to include the code is a neat way of embedding the macro within the program, and it allows parameters to be passed to the macro. The function should return the original value of OPT.
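
As a sketch of a macro that takes a parameter, the hypothetical FN_addconst below emits an ADD instruction with whatever constant it is given. The names code%, start and FN_addconst are illustrative, and the sketch assumes USR returns EAX.

```bbcbasic
REM Illustrative sketch: FN_addconst, code% and start are hypothetical names
DIM code% 63
FOR pass% = 0 TO 2 STEP 2
  P% = code%
  [OPT pass%
  .start
  xor eax,eax                   ; EAX = 0
  OPT FN_addconst(3)            ; macro expands to: add eax,3
  OPT FN_addconst(4)            ; macro expands to: add eax,4
  ret
  ]
NEXT pass%
PRINT USR(start)                : REM 0 + 3 + 4
END
:
DEF FN_addconst(n%)
[OPT pass% AND 14               ; bit 0 cleared, so no listing inside the macro
add eax,n%
]
=pass%                          : REM hand the original OPT value back
```

The result should be 7. Each use of the macro assembles a fresh copy of the instruction at the current value of P%, which is why no label is defined inside it.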

© Richard Russell 2007