How to perform inference on hardware with alpha='auto' #134

lcit · 2024-06-13T09:05:24Z

I'm trying to implement inference on hardware, using the Xilinx ap_fixed type, for a model quantized with alpha='auto'. With alpha=1 it is straightforward: the weights (after applying the quantizer) can be exported directly to the hardware. With alpha='auto' it is more challenging. I have not found an explanation of how to compute the weights and the scale, so I have analyzed the code.
This is an extract of quantized_bits for alpha='auto':

m = K.pow(2.0, K.cast_to_floatx(unsigned_bits))
m_i = K.pow(2.0, K.cast_to_floatx(self.integer))
x = x / m_i
levels = (2**(self.bits-1)-1) * 2 if self.symmetric else (2**self.bits)-1
scale = (K.max(abs(x), axis=axis, keepdims=True) * 2) / levels
v = tf.floor(tf.abs(x) / scale + 0.5)
mask = v < levels / 2
z = tf.sign(x) * tf.where(mask, v, tf.ones_like(v) * levels / 2)
xq = m_i * z / m
xq2 = scale * xq

My understanding is that z contains the integer representation of the weights, which uses the entire range of the type; that is, the scale is optimal. xq is the floating-point representation of z, and xq2 holds the quantized weights, in floating point, that are actually used in the convolution during training. The values in xq2 can exceed the range of the type.
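To check this reading, here is a minimal NumPy sketch that transcribes the extract above line by line and returns z and scale alongside xq2. It is not the library's API: the function name `quantize_auto`, the definition of `unsigned_bits` (bits minus one sign bit), the default arguments, and the choice of `axis=0` for a 2-D weight matrix are all assumptions made for illustration.

```python
import numpy as np

def quantize_auto(x, bits=8, integer=0, keep_negative=True, symmetric=True, axis=0):
    """Hypothetical helper: NumPy transcription of the extract, returning (z, scale, xq2)."""
    # Assumption: unsigned_bits is bits minus the sign bit (not shown in the extract).
    unsigned_bits = bits - int(keep_negative)
    m = 2.0 ** unsigned_bits
    m_i = 2.0 ** integer
    x = np.asarray(x, dtype=np.float64) / m_i
    levels = (2 ** (bits - 1) - 1) * 2 if symmetric else 2 ** bits - 1
    # One scale per slice along the reduced axis; an all-zero slice would give scale == 0.
    scale = np.max(np.abs(x), axis=axis, keepdims=True) * 2 / levels
    v = np.floor(np.abs(x) / scale + 0.5)
    # Same as tf.where(v < levels / 2, v, levels / 2): clip the magnitude at levels / 2.
    z = np.sign(x) * np.minimum(v, levels / 2)
    xq = m_i * z / m
    xq2 = scale * xq
    return z, scale, xq2

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 4))      # hypothetical weights: 16 inputs, 4 output channels
z, scale, xq2 = quantize_auto(w)  # bits=8, so levels / 2 == 127
```

With these settings every entry of z is a whole number in [-127, 127], and each column reaches 127 in magnitude, which matches the idea that the full range of the type is used. Following the extract exactly as quoted, xq2 equals z multiplied by scale * m_i / m; if the library rescales `scale` somewhere outside the quoted lines, the constant would differ accordingly.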

To implement this in hardware, I have to save z as the weights and compute scale, a constant that has to be applied after the convolution. For alpha='po2' it would be the same, but the scale can be applied as a bit shift.
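The scheme in the paragraph above can be sketched in NumPy with a matrix product standing in for the convolution. All values here are made up for illustration: `z`, `a`, `mult`, and `k` are hypothetical, and `mult` plays the role of whatever per-channel constant relates the integer weights to the float weights used in training.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data, as it might be stored on the device.
z = rng.integers(-127, 128, size=(16, 4))      # integer weights, one column per output channel
a = rng.integers(-128, 128, size=(3, 16))      # integer activations
mult = rng.uniform(0.001, 0.01, size=(1, 4))   # assumed per-channel float constant

acc = a @ z            # integer multiply-accumulate, no scale involved
y_hw = acc * mult      # apply the constant once per output channel, after the sum
y_ref = a @ (z * mult) # reference: float weights with the scale folded in

# Power-of-two case: the constant is 2**-k, so it becomes an arithmetic right shift.
k = np.array([[7, 8, 9, 10]], dtype=np.int64)  # hypothetical shift amounts per channel
y_shift = acc >> k                             # floor(acc / 2**k) on signed integers
y_exact = acc * np.exp2(-k.astype(np.float64)) # same rescale without dropping bits
```

Because the constant is shared by every term of each output channel's sum, `y_hw` and `y_ref` agree up to floating-point rounding. The right shift drops the fractional bits (it floors); with a fixed-point type such as ap_fixed, one would presumably move the binary point instead of discarding those bits.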

If this is true, it would be nice to have a function that returns z and scale, since quantized_bits does not.
Thanks
