
Inverse temperature #10

Open
DavidMrd opened this issue Dec 21, 2022 · 5 comments

@DavidMrd

Hello, in the original article they say: "In order to avoid very soft decisions in the tree, we introduced an inverse temperature β to the filter activations prior to calculating the sigmoid." I am not sure, but I think you did not implement this temperature, did you? Thanks!

@xuyxu
Owner

xuyxu commented Dec 22, 2022

Hi @DavidMrd, beta is not implemented in the current version; it may suffice to add a discount factor here.
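
For reference, a minimal sketch of what that could look like, assuming the inner nodes are a single linear layer followed by a sigmoid (`InnerNodes` and `beta` are illustrative names, not the repository's code):

```python
import torch
import torch.nn as nn

class InnerNodes(nn.Module):
    """Routing probabilities for all inner nodes of the soft tree."""

    def __init__(self, input_dim, inner_node_num, beta=1.0):
        super().__init__()
        self.fc = nn.Linear(input_dim, inner_node_num)
        # Inverse temperature from the paper: scales the filter
        # activations before the sigmoid so decisions are less soft.
        self.beta = beta

    def forward(self, X):
        # (batch_size, inner_node_num) routing probabilities
        return torch.sigmoid(self.beta * self.fc(X))
```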

@YaserGholizade

YaserGholizade commented Aug 11, 2023

Many thanks for your implementation. It works well.
But I am a little bit confused.
According to the original paper, "Distilling a Neural Network Into a Soft Decision Tree", and its equation 2, each leaf should hold N values that sum to one (N: number of classes), i.e. each leaf contains a probability vector.
To make a prediction, the model uses the maximum path probability to select a leaf and then outputs that leaf's probability vector.
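
As a minimal sketch of that rule (with hypothetical names: `mu` holds each leaf's path probability, `Q` each leaf's probability vector):

```python
import torch

def predict_paper_style(mu, Q):
    """Select the leaf with the maximum path probability and return
    that leaf's class distribution, as described in the paper.

    mu: (batch_size, n_leaf) path probabilities
    Q:  (n_leaf, n_classes) per-leaf probability vectors (rows sum to 1)
    """
    best_leaf = mu.argmax(dim=1)  # (batch_size,)
    return Q[best_leaf]           # (batch_size, n_classes)
```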
But in your code, you treat the path probability as the final value for each leaf and then feed these values through a fully connected layer to compute the model's final prediction.

Am I right?
Thanks

@xuyxu
Owner

xuyxu commented Aug 12, 2023

Hi @YaserGholizade, the fully connected layer holds the final values of all leaf nodes. You can see that its dimension (code here) is (n_leaf_node, n_classes) for classification.
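
A quick way to see this (a sketch with made-up sizes, not code from the repository):

```python
import torch.nn as nn

n_leaf, n_classes = 8, 10  # e.g. a depth-3 tree on MNIST
leaf_nodes = nn.Linear(n_leaf, n_classes, bias=False)

# The layer's weight stores one learned output vector per leaf:
# leaf_nodes.weight has shape (n_classes, n_leaf), so column j holds
# the n_classes output values of leaf j.
print(leaf_nodes.weight.shape)  # torch.Size([10, 8])
```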

@YaserGholizade

Thanks @xuyxu
I completely understand how your code works, and it works very well.
But my question was about the difference between your code and the algorithm in the original paper.

According to the paper, for MNIST classification each leaf should have ten values (a probability vector); there is no fully connected layer.
After training, to predict the class of an image, the model selects the leaf with the maximum path probability and then makes the final decision according to that leaf's probability vector.

But in your code, you assign the path probability to each leaf (`_mu = _mu * _path_prob`) and then feed these values into a fully connected layer, `self.leaf_nodes = nn.Linear(self.leaf_node_num_, self.output_dim, bias=False)`.

@xuyxu
Owner

xuyxu commented Aug 13, 2023

The algorithm here is the same as the one in the original paper, except that we use matrices for faster computation. Instead of computing path probabilities one by one, which is very slow, `_mu = _mu * _path_prob` lets us compute the path probabilities of all nodes in one layer at the same time. Furthermore, using the fully connected layer to simulate all leaf nodes also lets us compute the weighted sum of the leaf node outputs (weights determined by the path probabilities) more quickly.
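
A self-contained sketch of both tricks (dimensions and names are illustrative, not the repository's exact code):

```python
import torch
import torch.nn as nn

depth, input_dim, n_classes = 3, 784, 10
n_inner, n_leaf = 2 ** depth - 1, 2 ** depth

inner_nodes = nn.Linear(input_dim, n_inner)          # all inner-node filters at once
leaf_nodes = nn.Linear(n_leaf, n_classes, bias=False)

X = torch.randn(4, input_dim)
p = torch.sigmoid(inner_nodes(X))                    # (batch, n_inner) routing probs

mu = torch.ones(X.size(0), 1)                        # path probability of the root
start = 0
for layer in range(depth):
    n_nodes = 2 ** layer
    path_prob = p[:, start:start + n_nodes]          # this layer's routing probs
    start += n_nodes
    # One broadcasted multiply updates the path probability of every
    # node in the layer: the left child gets p, the right child 1 - p.
    children = torch.stack((path_prob, 1 - path_prob), dim=2)
    mu = (mu.unsqueeze(2) * children).flatten(1)     # (batch, 2 * n_nodes)

# mu is now (batch, n_leaf); the linear layer returns the weighted sum
# of all leaf outputs in a single matrix multiplication.
y = leaf_nodes(mu)                                   # (batch, n_classes)
```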
