Topic: Half-Float Block conversions

Charles Pegge

• Posts: 4486
Half-Float Block conversions
« on: May 24, 2018, 07:03:54 am »
These procedures are included in MathUtil.inc.

Half-floats are 16-bit floats, useful for storing large amounts of float data at reduced precision. Recent graphics hardware (including Intel) can use them for colors, normal maps, and vector arrays.

This is an update of previous work, with additional support for ±infinities and NaNs.

When converting from single to half-float, any value that overflows the half-float range is converted to an infinity of the same sign.
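
For readers who prefer to follow the bit manipulation in a high-level language, here is a minimal C sketch of the single-to-half direction. The function name `float_to_half` is hypothetical and is not part of MathUtil.inc. It truncates the fraction rather than rounding, and it assumes `float` is an IEEE 754 single.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar helper, not part of MathUtil.inc.
   Converts one IEEE 754 single to half-float bits, truncating the fraction. */
uint16_t float_to_half(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);               /* read the raw bit pattern */

    uint32_t sign = (bits >> 16) & 0x8000u;       /* sign moved to bit 15 */
    uint32_t mag  = bits & 0x7FFFFFFFu;           /* magnitude without sign */

    if (mag > 0x7F800000u)                        /* NaN */
        return (uint16_t)(sign | 0x7FFFu);
    if (mag == 0x7F800000u)                       /* infinity */
        return (uint16_t)(sign | 0x7C00u);

    int32_t  e    = (int32_t)(mag >> 23) - 112;   /* rebias exponent: 127 -> 15 */
    uint32_t frac = (mag >> 13) & 0x3FFu;         /* top 10 fraction bits */

    if (e >= 0x1F)                                /* too large: clamp to infinity */
        return (uint16_t)(sign | 0x7C00u);
    if (e <= 0) {                                 /* below the normal range */
        if (e < -10)
            return (uint16_t)sign;                /* underflows to signed zero */
        frac = (frac | 0x400u) >> (1 - e);        /* restore leading 1, then denormalise */
        return (uint16_t)(sign | frac);
    }
    return (uint16_t)(sign | ((uint32_t)e << 10) | frac);
}
```

With this sketch, 65504.0 (the largest finite half-float) maps to 0x7BFF, while 65536.0 and anything larger maps to 0x7C00, i.e. +infinity.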

Code:
```
'HALF-FLOAT
'see IEEE 754
'https://en.wikipedia.org/wiki/IEEE_754-1985
'
'TYPE     SIZE   SIGN  EXPONENT  FRACTION
'========================================
'HALF     16     1     5         10
'SINGLE   32     1     8         23
'DOUBLE   64     1     11        52
'EXTENDED 80     1     15        63

function FloatToHalfFloat(sys pf,ph,n) as sys
=============================================
'
'USE OF REGISTERS:
'edx sign transform
'eax exponent transform
'ecx fraction transform
'rsi source pointer
'rdi dest pointer
'
mov rsi,pf
mov rdi,ph
(
  dec dword n 'iterator down count
  jl exit
  mov eax,[rsi]
  mov edx,eax
  and edx,0x80000000 'hold sign bit only
  shr edx,0x10       'shift sign bit down 16
  and eax,0x7fffffff 'remove sign bit
  mov ecx,eax        'for significand
  '
  '
  'TEST FOR ZERO
  (
   cmp eax,0
   jz fwd nzero
  )
  'TEST FOR NAN
  (
   cmp eax,0x7f800000
   jle exit
   mov eax, 0x7fff 'set NAN
   jmp fwd nzero
  )
  'TEST FOR INFINITY (uses flags from the compare above)
  (
   jl exit
   mov eax, 0x7c00 'set infinity
   jmp fwd nzero
  )
  shr eax,0x17       'shift exponent down 23
  shr ecx,0x0d       'reduce fraction 23 bits to 10 bits
  sub eax,0x70       'adjust exponent bias -112 == (15-127)
  '
  'TEST NEGATIVE EXPONENT BIAS (LOSING PRECISION)
  (
    jg exit      'exclude positive exponent (normal range)
    cmp eax,-10
    (
      jg exit
      mov eax,0 'SET ZERO
      jmp fwd nzero
    )
    and ecx,0x3ff 'keep the 10 fraction bits only
    or  ecx,0x400 'restore the implicit leading 1
    xchg ecx,eax
    neg ecx      'make positive
    inc ecx      'subnormals have no implicit 1: one extra shift
    shr eax,cl   'downshift fraction
    mov ecx,eax
    mov eax,0    'zero exponent
  )
  '
  'TEST EXPONENT FOR OVERFLOW
  (
    cmp eax,0x1f
    jl exit        'exponent 0x1f is reserved for infinity/NAN
    mov eax,0x7c00 'CLAMP INFINITY
    jmp fwd nzero
  )
  shl eax,0x0a       'place exponent 10 bits up
  and ecx,0x3ff      'mask significand 10 bits
  or  eax,ecx        'combine exponent and significand
  nzero:
  or  eax,edx        'combine sign
  mov [rdi],ax       'store
  add rsi,4          'stride next float
  add rdi,2          'stride next half-float
  repeat
)
return ph
end function


function HalfFloatToFloat(sys ph,pf,n) as sys
=============================================
'
'USE OF REGISTERS:
'edx sign transform
'eax exponent transform
'ecx significand transform
'rsi source pointer
'rdi dest pointer
'
mov rsi,ph
mov rdi,pf
(
  dec dword n 'iterator down count
  jl exit
  xor eax,eax
  mov ax,[rsi]
  mov edx,eax
  and edx,0x8000     'hold sign bit only
  shl edx,0x10       'shift sign bit up 16
  and eax,0x7fff     'remove sign bit
  mov ecx,eax        'for significand
  '
  '
  'TEST FOR ZERO
  (
   cmp eax,0
   jz fwd nzero
  )
  'TEST FOR NAN
  (
   cmp eax,0x7c00
   jle exit
   mov eax, 0x7fffffff 'set NAN
   jmp fwd nzero
  )
  'TEST FOR INFINITY (uses flags from the compare above)
  (
   jl exit
   mov eax, 0x7f800000 'set infinity
   jmp fwd nzero
  )
  shr eax,0x0A       'shift exponent down 10
  add eax,0x70       'adjust exponent bias +112 == (127-15)
  shl eax,23         'shift exponent into final position
  and ecx,0x3ff      'mask significand 10 bits
  shl ecx,0x0d       'shift significand from 10 bits to 23 bits (13)
  '
  or  eax,ecx        'combine exponent and significand
  nzero:
  or  eax,edx        'combine sign
  mov [rdi],eax      'store
  add rsi,2          'stride next half-float
  add rdi,4          'stride next float
  repeat
)
return ph
end function
```
« Last Edit: May 26, 2018, 04:52:59 am by Charles Pegge »
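
As a cross-check for the reverse direction, the following C sketch expands one half-float to a single. The name `half_to_float` is hypothetical, and the code assumes `float` is an IEEE 754 single. It also covers subnormal half-float inputs (zero exponent, non-zero fraction), which need a short normalisation loop, and it keeps the NaN payload rather than writing a fixed NaN pattern.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar helper: expand half-float bits to an IEEE 754 single. */
float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;  /* sign moved to bit 31 */
    uint32_t e    = (h >> 10) & 0x1Fu;              /* 5-bit exponent */
    uint32_t frac = h & 0x3FFu;                     /* 10-bit fraction */
    uint32_t bits;

    if (e == 0x1Fu) {
        /* infinity (frac == 0) or NaN (frac != 0) */
        bits = sign | 0x7F800000u | (frac << 13);
    } else if (e != 0) {
        /* normal number: rebias 15 -> 127, widen fraction to 23 bits */
        bits = sign | ((e + 112u) << 23) | (frac << 13);
    } else if (frac == 0) {
        bits = sign;                                /* signed zero */
    } else {
        /* subnormal half: shift until the leading 1 reaches bit 10 */
        e = 113u;
        while ((frac & 0x400u) == 0) {
            frac <<= 1;
            e--;
        }
        bits = sign | (e << 23) | ((frac & 0x3FFu) << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);                    /* write the raw bit pattern */
    return f;
}
```

Every half-float value is exactly representable as a single, so this direction never loses precision.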

Mike Lobanovsky

• Hero Member
• Posts: 1993
Re: Half-Float Block conversions
« Reply #1 on: May 24, 2018, 02:42:17 pm »
Here is a manual on the practical implementation of a fast algorithm for storing and retrieving large amounts of single-precision floating-point data in half-precision format. It is built around LUTs and is usable in game-dev contexts. It also sheds some light on what is required to handle corner cases such as infinities and NaNs.
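
To give an idea of what a LUT-based conversion can look like, here is a rough C sketch of one table-driven float-to-half scheme. It is not necessarily the algorithm from the manual, and all names (`base_table`, `shift_table`, `init_half_tables`, `float_to_half_lut`) are hypothetical. The 9-bit sign-plus-exponent field indexes two small tables, so the conversion itself has no branches. The sketch truncates the fraction, and a NaN whose payload sits only in the low 13 fraction bits collapses to infinity.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical tables, indexed by the float's sign and exponent (9 bits). */
static uint16_t base_table[512];   /* sign and exponent of the result */
static uint8_t  shift_table[512];  /* right shift applied to the 23-bit fraction */

/* Fill both tables once before converting anything. */
void init_half_tables(void)
{
    for (int i = 0; i < 256; ++i) {
        int e = i - 127;                       /* unbiased float exponent */
        uint16_t base;
        uint8_t  shift;
        if (e < -24) {                         /* too small: becomes zero */
            base = 0x0000; shift = 24;
        } else if (e < -14) {                  /* becomes a subnormal half */
            base  = (uint16_t)(0x0400 >> (-e - 14));
            shift = (uint8_t)(-e - 1);
        } else if (e <= 15) {                  /* normal half */
            base  = (uint16_t)((e + 15) << 10);
            shift = 13;
        } else if (e < 128) {                  /* too large: becomes infinity */
            base = 0x7C00; shift = 24;
        } else {                               /* infinity or NaN stays so */
            base = 0x7C00; shift = 13;
        }
        base_table[i]          = base;
        base_table[i | 0x100]  = (uint16_t)(base | 0x8000u);  /* negative half */
        shift_table[i]         = shift;
        shift_table[i | 0x100] = shift;
    }
}

/* Convert one single to half-float bits using two table lookups. */
uint16_t float_to_half_lut(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    uint32_t idx = bits >> 23;                 /* sign and exponent */
    return (uint16_t)(base_table[idx] +
                      ((bits & 0x007FFFFFu) >> shift_table[idx]));
}
```

Call `init_half_tables()` once at start-up; after that each conversion costs two lookups, a mask, a shift and an add.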

I once re-implemented it in FBSL dynamic assembly, and it ran some 10 to 15% faster in both directions than the suggested C implementation on the same type of Intel Core 2 Duo CPU.

Note that this particular algo is significantly more precise than MS' own implementation of the same in their system Direct3D math library.
Mike
(3.6GHz Intel Core i5 Quad w/ 16GB RAM, nVidia GTX 1060Ti w/ 6GB VRAM, Windows 7 Ultimate Sp1)

Charles Pegge

• Posts: 4486
Re: Half-Float Block conversions
« Reply #2 on: May 26, 2018, 05:00:31 am »
I've added some more code (above) to handle subnormal values more gracefully, instead of abruptly setting zero when the exponent falls below the minimum limit.

Code:
```
  sub eax,0x70       'adjust exponent bias -112 == (15-127)
  '
  'TEST NEGATIVE EXPONENT BIAS (LOSING PRECISION)
  (
    jg exit      'exclude positive exponent (normal range)
    cmp eax,-10
    (
      jg exit
      mov eax,0 'SET ZERO
      jmp fwd nzero
    )
    and ecx,0x3ff 'keep the 10 fraction bits only
    or  ecx,0x400 'restore the implicit leading 1
    xchg ecx,eax
    neg ecx      'make positive
    inc ecx      'subnormals have no implicit 1: one extra shift
    shr eax,cl   'downshift fraction
    mov ecx,eax
    mov eax,0    'zero exponent
  )
  '
```

Mike Lobanovsky

• Hero Member
• Posts: 1993
Re: Half-Float Block conversions
« Reply #3 on: May 26, 2018, 10:39:00 am »
Hi Charles,

My interest is purely academic: is there any way whatsoever to avoid the `shr eax,cl` instruction in your code? All your shifts are immediate, which means very fast (1.5 clocks latency+throughput on modern Intel Core CPUs) and pairable, while this one and only instruction is comparatively slow (3.0 clocks latency+throughput) and non-pairable due to the 8-bit register used.

Is there also a way to avoid the non-pairable `xchg`, whose latency+throughput is also 3.0 clocks for the reg,reg case?
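
One known way to sidestep a variable shift in the subnormal branch is to let the floating-point adder do the alignment: adding 0.5 to a small non-negative single leaves the value, rounded to a multiple of 2^-24, in the low bits of the sum. The C sketch below only illustrates the idea. The name `subnormal_half_bits` is hypothetical, and the code assumes IEEE 754 singles, the default round-to-nearest mode, and an input that is non-negative and below 2^-14. It makes no claim about being faster than the shift, and unlike a plain shift it rounds to nearest even instead of truncating.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: half-float bits for a non-negative single x < 2^-14,
   computed with one floating-point add instead of a variable shift. */
uint16_t subnormal_half_bits(float x)
{
    const uint32_t magic_bits = 126u << 23;     /* bit pattern of 0.5f */
    float magic;
    memcpy(&magic, &magic_bits, sizeof magic);

    /* In [0.5, 1) one unit in the last place is 2^-24, the same step as a
       subnormal half, so the add aligns and rounds x in a single operation. */
    float sum = x + magic;

    uint32_t bits;
    memcpy(&bits, &sum, sizeof bits);
    return (uint16_t)(bits - magic_bits);       /* low bits hold round(x * 2^24) */
}
```

The sign bit has to be stripped before the add and OR-ed back in afterwards, as the listing already does through `edx`.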