pos_inds.numel() == 0 is not allowed in fact #25

Open
MzeroMiko opened this issue Apr 9, 2022 · 2 comments

@MzeroMiko
MzeroMiko commented Apr 9, 2022

In fcos3d.py, FCOS3DLoss currently reads:

        if pos_inds.numel() == 0:
            losses = {
                "loss_box3d_quat": box3d_quat.sum() * 0.,
                "loss_box3d_proj_ctr": box3d_ctr.sum() * 0.,
                "loss_box3d_depth": box3d_depth.sum() * 0.,
                "loss_box3d_size": box3d_size.sum() * 0.,
                "loss_conf3d": box3d_conf.sum() * 0.
            }
            return losses

        if len(labels) != len(box3d_targets):
            raise ValueError(
                f"The size of 'labels' and 'box3d_targets' does not match: a={len(labels)}, b={len(box3d_targets)}"
            )

        num_classes = self.num_classes if not self.class_agnostic else 1

        box3d_quat_pred = cat([x.permute(0, 2, 3, 1).reshape(-1, 4, num_classes) for x in box3d_quat])
        box3d_ctr_pred = cat([x.permute(0, 2, 3, 1).reshape(-1, 2, num_classes) for x in box3d_ctr])
        box3d_depth_pred = cat([x.permute(0, 2, 3, 1).reshape(-1, num_classes) for x in box3d_depth])
        box3d_size_pred = cat([x.permute(0, 2, 3, 1).reshape(-1, 3, num_classes) for x in box3d_size])
        box3d_conf_pred = cat([x.permute(0, 2, 3, 1).reshape(-1, num_classes) for x in box3d_conf])

but it should be changed to:

        if len(labels) != len(box3d_targets):
            raise ValueError(
                f"The size of 'labels' and 'box3d_targets' does not match: a={len(labels)}, b={len(box3d_targets)}"
            )

        num_classes = self.num_classes if not self.class_agnostic else 1

        box3d_quat_pred = cat([x.permute(0, 2, 3, 1).reshape(-1, 4, num_classes) for x in box3d_quat])
        box3d_ctr_pred = cat([x.permute(0, 2, 3, 1).reshape(-1, 2, num_classes) for x in box3d_ctr])
        box3d_depth_pred = cat([x.permute(0, 2, 3, 1).reshape(-1, num_classes) for x in box3d_depth])
        box3d_size_pred = cat([x.permute(0, 2, 3, 1).reshape(-1, 3, num_classes) for x in box3d_size])
        box3d_conf_pred = cat([x.permute(0, 2, 3, 1).reshape(-1, num_classes) for x in box3d_conf])

        # The original code gets this wrong: it puts this check above the cat() calls.
        if pos_inds.numel() == 0:
            losses = {
                "loss_box3d_quat": box3d_quat_pred.sum() * 0.,
                "loss_box3d_proj_ctr": box3d_ctr_pred.sum() * 0.,
                "loss_box3d_depth": box3d_depth_pred.sum() * 0.,
                "loss_box3d_size": box3d_size_pred.sum() * 0.,
                "loss_conf3d": box3d_conf_pred.sum() * 0.
            }
            return losses

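Otherwise .sum() fails: before cat(), box3d_quat and the other inputs are still lists of per-level tensors (they are iterated with `for x in ...`), and a list has no .sum().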

In fcos2d.py, FCOS2DLoss has:

        if pos_inds.numel() == 0:
            losses = {
                "loss_cls": loss_cls,
                "loss_box2d_reg": box2d_reg_pred.sum() * 0.,
                "loss_centerness": centerness_pred.sum() * 0.,
            }
            return losses, {}

        # NOTE: The rest of losses only consider foreground pixels.
        box2d_reg_pred = box2d_reg_pred[pos_inds]
        box2d_reg_targets = box2d_reg_targets[pos_inds]

        centerness_pred = centerness_pred[pos_inds]

        # Compute centerness targets here using 2D regression targets of foreground pixels.
        centerness_targets = compute_ctrness_targets(box2d_reg_targets)

        # Denominator for all foreground losses.
        ctrness_targets_sum = centerness_targets.sum()
        loss_denom = max(reduce_sum(ctrness_targets_sum).item() / num_gpus, 1e-6)

But in practice, if one GPU returns early here (pos_inds.numel() == 0), the other GPUs are still waiting in reduce_sum, so training just hangs in the CUDA reduction.
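
To make the hang concrete, here is a minimal sketch that uses only the Python standard library. It is not the real distributed code: each "rank" is a thread, and a `threading.Barrier` with a timeout stands in for the collective inside reduce_sum. The names `run_ranks`, `early_return` and `timeout` are made up for this illustration.

```python
import threading


def run_ranks(num_pos_per_rank, early_return, timeout=0.5):
    """Simulate one loss computation per rank and report what each rank did.

    num_pos_per_rank: number of positive locations seen by each simulated rank.
    early_return: if True, a rank with zero positives returns before the collective.
    """
    world_size = len(num_pos_per_rank)
    # Stand-in for the collective: every rank must arrive before anyone proceeds.
    barrier = threading.Barrier(world_size)
    status = [None] * world_size

    def step(rank, num_pos):
        if early_return and num_pos == 0:
            # Mirrors `return losses, {}` placed before reduce_sum.
            status[rank] = "returned early"
            return
        try:
            barrier.wait(timeout=timeout)  # stand-in for reduce_sum(...)
            status[rank] = "reduced"
        except threading.BrokenBarrierError:
            # The timeout exists only so the demo ends; a real collective
            # would keep waiting for the missing rank.
            status[rank] = "stuck"

    threads = [
        threading.Thread(target=step, args=(rank, num_pos))
        for rank, num_pos in enumerate(num_pos_per_rank)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return status


if __name__ == "__main__":
    print(run_ranks([3, 0], early_return=True))   # rank 1 skips the collective
    print(run_ranks([3, 0], early_return=False))  # every rank reaches it
```

With `early_return=True`, the rank that has positives never gets past the collective, which matches the behaviour described above. Once every rank reaches the collective, it completes.
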
In the end, I modified it like this:

        # Always define ctrness_targets_sum: reduce_sum below is an all-reduce, so every rank must call it.
        ctrness_targets_sum = centerness_pred.sum() * 0.
        _loss_box2d_reg = box2d_reg_pred.sum() * 0.
        loss_centerness = centerness_pred.sum() * 0.
        if pos_inds.numel() != 0:
            # NOTE: The rest of losses only consider foreground pixels.
            box2d_reg_pred = box2d_reg_pred[pos_inds]
            box2d_reg_targets = box2d_reg_targets[pos_inds]
            centerness_pred = centerness_pred[pos_inds]
        
            # Compute centerness targets here using 2D regression targets of foreground pixels.
            centerness_targets = compute_ctrness_targets(box2d_reg_targets)

            # Denominator for all foreground losses.
            ctrness_targets_sum = centerness_targets.sum()

            # ----------------------
            # 2D box regression loss
            # ----------------------
            _loss_box2d_reg = self.box2d_reg_loss_fn(box2d_reg_pred, box2d_reg_targets, centerness_targets)

            # ---------------
            # Centerness loss
            # ---------------
            loss_centerness = F.binary_cross_entropy_with_logits(
                centerness_pred, centerness_targets, reduction="sum"
            ) / num_pos_avg
        # else:
            # print(".ret", end=" ", flush=True)

        # print(".red", end=" ", flush=True)
        loss_denom = max(reduce_sum(ctrness_targets_sum).item() / num_gpus, 1e-6)
        loss_box2d_reg = _loss_box2d_reg / loss_denom

        loss_dict = {"loss_cls": loss_cls, "loss_box2d_reg": loss_box2d_reg, "loss_centerness": loss_centerness}
        extra_info = {"loss_denom": loss_denom, "centerness_targets": centerness_targets} if pos_inds.numel() != 0 else {}

        # print(".f", end=" ", flush=True)
        return loss_dict, extra_info
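
As a small worked example of the denominator, here is a sketch in plain Python. It assumes reduce_sum adds the per-rank values together, which is how the code above uses it; `loss_denom_from_ranks` is a hypothetical helper written for this illustration, not a function from the repository.

```python
def loss_denom_from_ranks(ctrness_sums_per_rank, eps=1e-6):
    """Mimic max(reduce_sum(ctrness_targets_sum).item() / num_gpus, 1e-6).

    ctrness_sums_per_rank: one centerness-target sum per rank; a rank with
    no positives contributes 0.0 (the `centerness_pred.sum() * 0.` default).
    """
    num_gpus = len(ctrness_sums_per_rank)
    total = sum(ctrness_sums_per_rank)  # assumed result of reduce_sum on every rank
    return max(total / num_gpus, eps)


if __name__ == "__main__":
    # One rank has positives, the other has none: both ranks get the same denominator.
    print(loss_denom_from_ranks([2.0, 0.0]))
    # No positives anywhere: the 1e-6 floor avoids a division by zero.
    print(loss_denom_from_ranks([0.0, 0.0]))
```

A rank without positives still takes part in the reduction with a zero contribution, and its own regression loss stays zero after dividing by the shared denominator.
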
@ChonghaoSima

Thank you for your code! I struggled with the NaN loss for a long time...

@xingshuohan

But I still have the same issue...
