User Tools

Site Tools


optimising_20integer_20division

Optimising integer division

by Richard Russell, July 2010

The integer division instructions div (unsigned) and idiv (signed) are the slowest of all the integer instructions, sometimes taking tens of clock cycles. When the divisor is a constant (that is, it is known when the code is assembled) a significantly faster execution can be achieved by using a multiplication (effectively by the reciprocal of the divisor) followed by a shift.

A difficulty with this approach is that it is non-trivial to devise an instruction sequence which is guaranteed to produce the identical result to the division instruction it replaces. Listed at the end of this article are two assembler macro functions FN_udiv and FN_sdiv which automatically generate the optimum code sequence for any divisor, giving identical results to the div and idiv instructions respectively.

It is probably easiest to illustrate the use of these macros by example. Suppose you want to divide the unsigned value in the ebx register by 123; the conventional code would be something like this:

00B4174C 8B C3                          mov eax,ebx
00B4174E 33 D2                          xor edx,edx
00B41750 B9 7B 00 00 00                 mov ecx,123
00B41755 F7 F1                          div ecx

This will return the unsigned quotient in the eax register.

To speed up the operation use the FN_udiv macro as follows:

00B423E2 B8 53 08 34 85 F7 E3
         05 53 08 34 85 83 D2
         00 C1 EA 06                    OPT FN_udiv("ebx", 123)

Although the code is longer (18 bytes compared with 11) the execution will be significantly faster. Note that the quotient is returned in edx not in eax.

The actual code generated by the macro is as follows:

00B423E2 B8 53 08 34 85                 mov eax,&85340853
00B423E7 F7 E3                          mul ebx
00B423E9 05 53 08 34 85                 add eax,&85340853
00B423EE 83 D2 00                       adc edx,0
00B423F1 C1 EA 06                       shr edx,6


Suppose instead that one wishes to divide the signed value stored in the memory location [edi] by 123. The conventional code would be something like this:

00B42424 8B 07                          mov eax,[edi]
00B42426 99                             cdq
00B42427 B9 7B 00 00 00                 mov ecx,123
00B4242C F7 F9                          idiv ecx

This will return the signed quotient in the eax register.

To speed up the operation use the FN_sdiv macro as follows:

00B423EE B8 15 02 4D 21 F7 2F
         8B 07 C1 FA 04 C1 E8
         1F 03 D0                       OPT FN_sdiv("dword [edi]", 123)

Again the the quotient is returned in edx not in eax.

The actual code generated by the macro in this case is as follows:

00B423EE B8 15 02 4D 21                 mov eax,&214D0215
00B423F3 F7 2F                          imul dword [edi]
00B423F5 8B 07                          mov eax,dword [edi]
00B423F7 C1 FA 04                       sar edx,4
00B423FA C1 E8 1F                       shr eax,31
00B423FD 03 D0                          add edx,eax


In the case of FN_udiv the dividend can be any register (except eax) or a memory location. For FN_sdiv the dividend can be any register (except eax and edx) or a memory location. The divisor can be any 32-bit value except zero.

Here is the code of the macros:

        REM Generate assembler code for fast unsigned division by a constant
        REM Dividend may be a memory location or a register other than eax
        DEF FN_udiv(dividend$, d%)
        LOCAL opt%, i%, l%, f#, i#
 
        opt% = ?419 >> 4
 
        l% = FNlog2(d%)
        f# = 2.0#^(l%+32) / FNuint(d%)
        i# = INT(f#)
 
        IF f# = i# THEN
          PROCasm(opt%, "mov edx,"+dividend$)
        ELSEIF (f# - i#) < 0.5# THEN;
          i% = FNintu(i#)
          WHILE (i% AND 1)=0 AND l%>0
            i% = i% >>> 1
            l% -= 1
          ENDWHILE
          PROCasm(opt%, "mov eax,&"+STR$~i%)
          PROCasm(opt%, "mul "+dividend$)
          PROCasm(opt%, "add eax,&"+STR$~i%)
          PROCasm(opt%, "adc edx,0")
        ELSE
          i% = FNintu(i#+1)
          WHILE (i% AND 1)=0 AND l%>0
            i% = i% >>> 1
            l% -= 1
          ENDWHILE
          PROCasm(opt%, "mov eax,&"+STR$~i%)
          PROCasm(opt%, "mul "+dividend$)
        ENDIF
        IF l% PROCasm(opt%, "shr edx,"+STR$l%)
        = opt%
 
        REM Generate assembler code for fast *signed* division by a constant
        REM Dividend may be a memory location or a register other than eax or edx
        DEF FN_sdiv(dividend$, d%)
        LOCAL opt%, e%, i%, l%, j#, k#, l#, m#
 
        opt% = ?419 >> 4
 
        e% = d%
        d% = ABS(d%)
        l% = FNlog2(d%)
 
        IF (d% AND (d%-1)) = 0 THEN
          PROCasm(opt%, "mov eax,"+dividend$)
          PROCasm(opt%, "cdq")
          PROCasm(opt%, "and edx,&"+STR$~(d%-1))
          PROCasm(opt%, "add edx,eax")
          IF l% PROCasm(opt%, "sar edx,"+STR$l%)
          IF e%<0 PROCasm(opt%, "neg edx")
          = opt%
        ENDIF
 
        j# = 2.0#^31 / d%
        j# = INT(d% * (j# - INT(j#)))
        k# = INT(2.0#^(l%+32) / (2.0#^31 - j#))
        l# = INT(2.0#^(l%+32) / d%)
        m# = INT((2.0#^(l%+32) + k#) / d%)
 
        WHILE INT(l#/2)<INT(m#/2) AND l%>0
          l# = INT(l#/2)
          m# = INT(m#/2)
          l% -= 1
        ENDWHILE
 
        i% = FNintu(m#)
        PROCasm(opt%, "mov eax,&"+STR$~i%)
        PROCasm(opt%, "imul "+dividend$)
        PROCasm(opt%, "mov eax,"+dividend$)
        IF (i% AND &80000000) PROCasm(opt%, "add edx,eax")
        IF l% PROCasm(opt%, "sar edx,"+STR$l%)
        PROCasm(opt%, "shr eax,31")
        PROCasm(opt%, "add edx,eax")
        IF e%<0 PROCasm(opt%, "neg edx")
        = opt%
 
        DEF FNlog2(i%)
        LOCAL t%
        REPEAT
          i% = i% >>> 1
          t% += 1
        UNTIL i% = 0
        = t% - 1
 
        DEF PROCasm(opt%, code$)
        LOCAL chan%
        code$ = "[OPT " + STR$(3) + " : " + code$ + " : ]"
        chan% = OPENOUT(@tmp$+"ASMDIV")
        PRINT #chan%, CHR$(LEN(code$)+4) + CHR$0 + CHR$0 + code$ : BPUT #chan%,0
        CLOSE #chan%
        CALL @tmp$+"ASMDIV"
        ENDPROC
 
        DEF FNintu(N) = ((N / 2) << 1) - (INT(N / 2) <> N / 2)
        DEF FNuint(N%) = (N% >>> 1) * 2 + (N% AND 1)
This website uses cookies for visitor traffic analysis. By using the website, you agree with storing the cookies on your computer.More information
optimising_20integer_20division.txt · Last modified: 2018/04/13 10:45 by richardrussell