Topic: Half-Float Block conversions

Charles Pegge

  • Admin Support Member
  • *****
  • Posts: 4330
    • Oxygen Basic
Half-Float Block conversions
« on: May 24, 2018, 07:03:54 AM »
These procedures are included in MathUtil.inc.

Half-floats are 16-bit floats, useful for storing large amounts of float data where limited precision is acceptable. Recent graphics hardware (including Intel) can use them for colors, normal maps, and vector arrays.

This is an update of previous work, with additional support for +/- infinities and NaNs.

When converting from single to half float, any overflows are converted to infinities.
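
As a quick illustration of the 1/5/10 bit layout and of the infinity encoding mentioned above, here is a minimal Python sketch. The helper names `half_fields` and `half_value` are made up for this example and are not part of MathUtil.inc; the `'e'` format of Python's `struct` module is its built-in IEEE half-precision codec.

```python
import struct

def half_fields(h):
    """Split a 16-bit half-float pattern into (sign, exponent, fraction)."""
    return (h >> 15) & 0x1, (h >> 10) & 0x1F, h & 0x3FF

def half_value(h):
    """Decode a 16-bit pattern with Python's own half-float codec."""
    return struct.unpack('<e', struct.pack('<H', h))[0]

print(half_fields(0x3C00), half_value(0x3C00))  # the value 1.0
print(half_fields(0x7BFF), half_value(0x7BFF))  # largest finite half
print(half_fields(0x7C00), half_value(0x7C00))  # +infinity
print(half_fields(0xFC00), half_value(0xFC00))  # -infinity
```

The largest finite half is 65504, so a single whose magnitude is 65536 or more has nowhere to go but the infinity pattern 0x7c00 (plus the sign bit).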

Code:
'HALF-FLOAT

'see IEEE 754
'https://en.wikipedia.org/wiki/IEEE_754-1985
'
'TYPE     SIZE   SIGN  EXPONENT  FRACTION
'========================================
'HALF     16     1     5         10
'SINGLE   32     1     8         23
'DOUBLE   64     1     11        52
'EXTENDED 80     1     15        63


function FloatToHalfFloat(sys pf,ph,n) as sys
=============================================
'
'USE OF REGISTERS:
'edx sign transform
'eax exponent transform
'ecx fraction transform
'rsi source pointer
'rdi dest pointer
'
mov rsi,pf
mov rdi,ph
(
  dec dword n 'iterator down count
  jl exit
  mov eax,[rsi]
  mov edx,eax
  and edx,0x80000000 'hold sign bit only
  shr edx,0x10       'shift sign bit down 16
  and eax,0x7fffffff 'remove sign bit
  mov ecx,eax        'for significand
  '
  '
  'TEST FOR ZERO
  (
   cmp eax,0
   jz fwd nzero
  )
  'TEST FOR NAN
  (
   cmp eax,0x7f800000
   jle exit
   mov eax, 0x7fff 'set NAN
   jmp fwd nzero
  )
  'TEST FOR INFINITY
  (
   jl exit
   mov eax, 0x7c00 'set infinity
   jmp fwd nzero
  )
  shr eax,0x17       'shift exponent down 23
  shr ecx,0x0d       'reduce fraction 23 bits to 10 bits
  sub eax,0x70       'adjust exponent bias -112 == (15-127)
  '
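  'TEST NEGATIVE EXPONENT BIAS (LOSING PRECISION)
  (
    jg exit      'exclude positive exponents (normal range)
    cmp eax,-10
    (
      jg exit
      mov eax,0 'SET ZERO
      jmp fwd nzero
    )
    and ecx,0x3ff 'keep the 10 fraction bits only
    or  ecx,0x400 'restore the implicit leading 1
    xchg ecx,eax
    neg ecx      'make positive
    inc ecx      'shift count is 1 - exponent
    shr eax,cl   'downshift fraction
    mov ecx,eax
    mov eax,0    'zero exponent
  )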
  '
  'TEST EXPONENT FOR OVERFLOW
  (
    cmp eax,0x1f
    jl exit        'exponent 31 is reserved for infinity/NaN
    mov eax,0x7c00 'CLAMP INFINITY
    jmp fwd nzero
  ) 
  shl eax,0x0a       'place exponent 10 bits up
  and ecx,0x3ff      'mask significand 10 bits
  or  eax,ecx        'combine exponent and significand
  nzero:
  or  eax,edx        'combine sign
  mov [rdi],ax       'store
  add rsi,4          'stride next float
  add rdi,2          'stride next half-float
  repeat
)
return ph
end function


function HalfFloatToFloat(sys ph,pf,n) as sys
=============================================
'
'USE OF REGISTERS:
'edx sign transform
'eax exponent transform
'ecx significand transform
'rsi source pointer
'rdi dest pointer
'
mov rsi,ph
mov rdi,pf
(
  dec dword n 'iterator down count
  jl exit
  xor eax,eax
  mov ax,[rsi]
  mov edx,eax
  and edx,0x8000     'hold sign bit only
  shl edx,0x10       'shift sign bit up 16
  and eax,0x7fff     'remove sign bit
  mov ecx,eax        'for significand
  '
  '
  'TEST FOR ZERO
  (
   cmp eax,0
   jz fwd nzero
  )
  'TEST FOR NAN
  (
   cmp eax,0x7c00
   jle exit
   mov eax, 0x7fffffff 'set NAN
   jmp fwd nzero
  )
  'TEST FOR INFINITY
  (
   jl exit
   mov eax, 0x7f800000 'set infinity
   jmp fwd nzero
  )
  'NOTE: subnormal inputs (zero exponent, nonzero fraction) are not renormalised here
  shr eax,0x0A       'shift exponent down 10
  add eax,0x70       'adjust exponent bias +112 == (127-15)
  shl eax,23         'shift exponent into final position
  and ecx,0x3ff      'mask significand 10 bits
  shl ecx,0x0d       'shift significand from 10 bits to 23 bits (13)
  '
  or  eax,ecx        'combine exponent and significand
  nzero:
  or  eax,edx        'combine sign
  mov [rdi],eax      'store
  add rsi,2          'stride next half-float
  add rdi,4          'stride next float
  repeat
)
return pf
end function
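
For cross-checking the half-to-single direction, a scalar reference can be sketched in Python. This is not a transcription of the routine above: the function names are hypothetical, and the sketch additionally renormalises subnormal halves (zero exponent, nonzero fraction), which is an assumption about the desired behaviour rather than something the listing does.

```python
import struct

def half_bits_to_float_bits(h):
    """Widen one 16-bit half pattern to a 32-bit single pattern."""
    sign = (h & 0x8000) << 16
    mag = h & 0x7FFF
    if mag == 0:
        return sign                    # signed zero
    if mag > 0x7C00:
        return sign | 0x7FFFFFFF       # NaN
    if mag == 0x7C00:
        return sign | 0x7F800000       # infinity
    exp = mag >> 10
    frac = mag & 0x3FF
    if exp == 0:
        # Subnormal half: the value is frac * 2**-24, so normalise it.
        k = frac.bit_length() - 1      # index of the highest set bit (0..9)
        exp32 = k + 103                # (k - 24) + 127
        frac32 = (frac << (23 - k)) & 0x7FFFFF
        return sign | (exp32 << 23) | frac32
    # Normal half: rebias the exponent (+112) and widen the fraction (10 -> 23 bits).
    return sign | ((exp + 112) << 23) | (frac << 13)

def halves_to_floats(halves):
    """Convert a sequence of half patterns to Python floats."""
    return [struct.unpack('<f', struct.pack('<I', half_bits_to_float_bits(h)))[0]
            for h in halves]

print(halves_to_floats([0x3C00, 0xC000, 0x0001, 0x7C00]))
```

Because every half value is exactly representable as a single, the result can be compared bit for bit against any other widening routine.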

« Last Edit: May 26, 2018, 04:52:59 AM by Charles Pegge »

Mike Lobanovsky

  • Hero Member
  • *****
  • Posts: 1993
Re: Half-Float Block conversions
« Reply #1 on: May 24, 2018, 02:42:17 PM »
Here is a manual on the practical implementation of a fast algorithm for storing and retrieving large amounts of single-precision floating-point data in half-precision format. It is built around LUTs and is usable in game development contexts. It also sheds some light on what is required to handle corner cases such as infinities and NaNs.
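
The linked manual is not reproduced here, so the following is only a minimal Python sketch of one table-driven scheme for the single-to-half direction. It is not claimed to be the algorithm from that manual. It assumes two 512-entry tables indexed by the sign and exponent bits of the single, and it truncates rather than rounds. All names (`build_tables`, `BASE`, `SHIFT`, `float_to_half_bits`) are hypothetical.

```python
import struct

def build_tables():
    base = [0] * 512    # half sign+exponent bits per single sign+exponent
    shift = [0] * 512   # right shift applied to the 23-bit fraction
    for i in range(256):
        e = i - 127
        if e < -24:               # too small: flush to signed zero
            b, s = 0x0000, 24
        elif e < -14:             # lands in the subnormal half range
            b, s = 0x0400 >> (-e - 14), -e - 1
        elif e <= 15:             # normal half
            b, s = (e + 15) << 10, 13
        elif e < 128:             # too large: signed infinity
            b, s = 0x7C00, 24
        else:                     # single infinity or NaN
            b, s = 0x7C00, 13
        base[i], base[i | 0x100] = b, b | 0x8000
        shift[i] = shift[i | 0x100] = s
    return base, shift

BASE, SHIFT = build_tables()

def float_to_half_bits(x):
    """Convert a Python float (via single precision) to a half pattern."""
    f = struct.unpack('<I', struct.pack('<f', x))[0]
    idx = (f >> 23) & 0x1FF
    return BASE[idx] + ((f & 0x007FFFFF) >> SHIFT[idx])

print(hex(float_to_half_bits(1.0)), hex(float_to_half_bits(1e6)))
```

The per-element work is two table lookups, one shift and one add, with no branches. One known limitation of this sketch: a single NaN whose payload sits entirely in the low 13 fraction bits comes out as infinity.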

I once re-implemented it in FBSL dynamic assembly, and it ran some 10 to 15% faster in both directions than the suggested C implementation on the same type of Intel Core 2 Duo CPU.

Note that this particular algo is significantly more precise than Microsoft's own implementation of the same conversion in their system Direct3D math library.
Mike
(3.6GHz Intel Core i5 Quad w/ 16GB RAM, nVidia GTX 1060Ti w/ 6GB VRAM, Windows 7 Ultimate Sp1)

Charles Pegge

Re: Half-Float Block conversions
« Reply #2 on: May 26, 2018, 05:00:31 AM »
I've added some more code (above) to handle subnormal values more gracefully, instead of abruptly setting them to zero as soon as the exponent falls below the minimum.

Code:
  sub eax,0x70       'adjust exponent bias -112 == (15-127)
  '
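  'TEST NEGATIVE EXPONENT BIAS (LOSING PRECISION)
  (
    jg exit      'exclude positive exponents (normal range)
    cmp eax,-10
    (
      jg exit
      mov eax,0 'SET ZERO
      jmp fwd nzero
    )
    and ecx,0x3ff 'keep the 10 fraction bits only
    or  ecx,0x400 'restore the implicit leading 1
    xchg ecx,eax
    neg ecx      'make positive
    inc ecx      'shift count is 1 - exponent
    shr eax,cl   'downshift fraction
    mov ecx,eax
    mov eax,0    'zero exponent
  )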
  '
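
The arithmetic behind a gradual-underflow path like this can be sketched in Python. It illustrates the general technique and is not a transcription of the assembly: the helper name `subnormal_half_fraction` is hypothetical, and the sketch assumes truncation, restoring the implicit leading 1 and shifting by one more than the exponent deficit.

```python
import struct

def subnormal_half_fraction(x):
    """Truncated 10-bit half fraction for a single below 2**-14 in magnitude.

    Returns 0 when the value underflows completely.
    """
    bits = struct.unpack('<I', struct.pack('<f', x))[0] & 0x7FFFFFFF
    e = (bits >> 23) - 112               # exponent rebased to the half bias
    if e > 0:
        raise ValueError("value fits a normal half, not a subnormal")
    if e <= -10:
        return 0                         # below 2**-24: nothing survives
    frac = ((bits >> 13) & 0x3FF) | 0x400  # 10 fraction bits plus implicit 1
    return frac >> (1 - e)               # variable shift, 1 to 10 places

print(hex(subnormal_half_fraction(2.0 ** -15)))
print(hex(subnormal_half_fraction(2.0 ** -24)))
```

The shift count depends on the data, which is why this step needs a variable shift where the rest of the conversion gets by with fixed ones.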


Mike Lobanovsky

Re: Half-Float Block conversions
« Reply #3 on: May 26, 2018, 10:39:00 AM »
Hi Charles,

My interest is purely academic: is there any way whatsoever to avoid the shr eax,cl instruction in your code? All your shifts are immediate, which means very fast (1.5 clocks latency+throughput on modern Intel Core CPUs) and pairable, while this one and only instruction is comparatively slow (3.0 clocks latency+throughput) and non-pairable due to the 8-bit register used.

Is there also a way to avoid non-pairable xchg whose latency+throughput is also 3.0 clocks for the reg,reg case?
Mike

Charles Pegge

Re: Half-Float Block conversions
« Reply #4 on: May 26, 2018, 01:06:37 PM »
We can lose the xchg instruction by swapping the roles of eax and ecx. But in any case, only the subnormals, about 9 in 256 cases for random float exponents, require a cl-based shift. This is the only kind of computed shift available.
« Last Edit: May 26, 2018, 01:24:59 PM by Charles Pegge »