
how to implement cdist #378

Open
KukumavMozolo opened this issue Jul 23, 2024 · 3 comments

@KukumavMozolo

KukumavMozolo commented Jul 23, 2024

Hi there, thanks a million for this library.
I am trying to figure out how to compute distances between every row (or column) of two matrices, e.g. like torch.cdist.

Here is one way:

def sparse_cdist(a: SparseTensor, b: SparseTensor):
    a_repeated = cat([a] * a.size(0), dim=0)
    b_repeated = cat(
        [cat([b[i, :]] * b.size(0), dim=0) for i in range(b.size(0))], dim=0
    )
    distances = sparse_distance(a_repeated, b_repeated)
    distances.requires_grad = False
    return distances.view((a.size(0), b.size(0)))

and another:

def sparse_cdist2(a: SparseTensor, b: SparseTensor):
    with torch.no_grad():
        distances = torch.zeros(
            (a.size(0), b.size(0)),
            device=a.device(),
            dtype=a.dtype(),
            requires_grad=False,
        )
        for i in range(a.size(0)):
            for j in range(b.size(0)):
                distances[i, j] = sparse_distance(a[i, :], b[j, :])
    return distances

and another:

def sparse_cdist3(a: SparseTensor, b: SparseTensor):
    with torch.no_grad():
        distances = torch.zeros(
            (a.size(0), b.size(0)),
            device=a.device(),
            dtype=a.dtype(),
            requires_grad=False,
        )
        if a.size(0) <= b.size(0):
            for i in range(a.size(0)):
                # keep the index tensor on the same device as the data
                idx = torch.tensor([i], device=a.device())
                idx = idx.expand(b.size(0))
                a_repeated = a.index_select(0, idx)
                distances[i, :] = sparse_distance(a_repeated, b)
        else:
            for j in range(b.size(0)):
                idx = torch.tensor([j], device=b.device())
                idx = idx.expand(a.size(0))
                b_repeated = b.index_select(0, idx)
                distances[:, j] = sparse_distance(a, b_repeated)
    return distances

where sparse_distance is defined as follows

def sparse_distance(a: SparseTensor, b: SparseTensor):
    c = a + b.mul_nnz(torch.tensor(-1).to(a.device()), "coo")
    c = c * c
    c = reduction(c, dim=1, reduce="sum")
    return torch.sqrt(c + 1e-9)
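
For context, here is a small usage sketch for the snippets above (the matrices, sizes and density are made up; it assumes SparseTensor.from_dense from torch_sparse and uses a CUDA device if one is available):

import torch
from torch_sparse import SparseTensor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# two random sparse matrices with matching feature dimension
x = torch.rand(128, 64, device=device)
y = torch.rand(256, 64, device=device)
x[x < 0.9] = 0.0
y[y < 0.9] = 0.0
a = SparseTensor.from_dense(x)
b = SparseTensor.from_dense(y)

distances = sparse_cdist3(a, b)  # dense result of shape (128, 256)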

The first one has the disadvantage that it creates huge matrices and eats a lot of memory, while the second one doesn't benefit from GPU parallelism. The third seems to be okay-ish, but maybe there is an even better solution? Ideally one would only load matrices a and b onto the GPU and only reserve additional memory for the result.
Maybe somebody has an idea how to do that with what is currently possible with torch_sparse?
Or would it be necessary to write a specific CUDA kernel for that?
Any suggestions are very welcome.
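
One further idea along these lines, only a sketch: it uses the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2<x, y>, reuses the same elementwise multiply and reduction calls as sparse_distance above, and assumes torch_sparse's matmul supports a sparse-sparse product (spspmm), so that only the dense (N, M) result gets materialised:

import torch
from torch_sparse import SparseTensor, matmul

def sparse_cdist_gram(a: SparseTensor, b: SparseTensor, eps: float = 1e-9):
    with torch.no_grad():
        # row-wise squared norms, dense vectors of length N and M
        a_sq = reduction(a * a, dim=1, reduce="sum")
        b_sq = reduction(b * b, dim=1, reduce="sum")
        # cross term a @ b^T as a sparse-sparse product, densified only for the result
        gram = matmul(a, b.t()).to_dense()
        d2 = a_sq.view(-1, 1) + b_sq.view(1, -1) - 2 * gram
    return torch.sqrt(d2.clamp_min(0) + eps)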

@rusty1s
Owner

rusty1s commented Jul 27, 2024

Interesting problem. The best way to implement this is indeed a custom CUDA kernel, with which you can drop the for-loop of options 2/3 and avoid the memory blowup of option 1. I think option 3 is okay if you don't want to go down the rabbit hole of custom CUDA.

@KukumavMozolo
Author

KukumavMozolo commented Aug 13, 2024

So I am currently busy implementing a CUDA kernel that computes the Euclidean distance between two matrices, e.g. similar to torch.cdist.

Unfortunately I ran into a problem while implementing the backward pass, when computing the local gradients:

Some definitions:

Let $X \in \mathbb{R}^{N \times C}$ and $Y \in \mathbb{R}^{M \times C}$; then $\mathrm{cdist}(X,Y) \in \mathbb{R}^{N \times M}$ and $\mathrm{cdist}(X,Y)_{k,l} = \sqrt{\sum_j (X_{k,j} - Y_{l,j})^2}$

The derivative:
Here is the derivative of cdist with respect to an entry of $X$: $\frac{\partial\, \mathrm{cdist}(X,Y)_{k,l}}{\partial X_{k,n}} = \frac{X_{k,n} - Y_{l,n}}{\sqrt{\sum_j (X_{k,j} - Y_{l,j})^2}}$

Then there are 4 cases that need to be treated differently with sparse matrices (ignoring for now that the Euclidean distance is not differentiable at zero):
Case 1: $X_{k,n} \neq 0 \text{ and } Y_{l,n} \neq 0 \;\rightarrow\; \frac{\partial\, \mathrm{cdist}(X,Y)_{k,l}}{\partial X_{k,n}} = \frac{X_{k,n} - Y_{l,n}}{\sqrt{\sum_j (X_{k,j} - Y_{l,j})^2}}$
Case 2: $X_{k,n} \neq 0 \text{ and } Y_{l,n} = 0 \;\rightarrow\; \frac{\partial\, \mathrm{cdist}(X,Y)_{k,l}}{\partial X_{k,n}} = \frac{X_{k,n}}{\sqrt{\sum_j (X_{k,j} - Y_{l,j})^2}}$
Case 3: $X_{k,n} = 0 \text{ and } Y_{l,n} = 0 \;\rightarrow\; \frac{\partial\, \mathrm{cdist}(X,Y)_{k,l}}{\partial X_{k,n}} = 0$
Case 4: $X_{k,n} = 0 \text{ and } Y_{l,n} \neq 0 \;\rightarrow\; \frac{\partial\, \mathrm{cdist}(X,Y)_{k,l}}{\partial X_{k,n}} = \frac{-Y_{l,n}}{\sqrt{\sum_j (X_{k,j} - Y_{l,j})^2}}$

The problematic case is case 4!
Here, when computing the derivatives, the gradient with respect to X has non-zeros at positions where X itself is zero, so it cannot be stored on X's sparsity pattern. This would require also changing the size and contents of rowPtr and colIndex together with the value array of the CSR matrix.
Hence my question: how could this be solved?
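
(As a sanity check on case 4, a small dense example with torch.cdist shows a non-zero gradient at a position where X is zero but Y is not; the numbers here are made up:)

import torch

X = torch.tensor([[0.0, 2.0]], requires_grad=True)  # X[0, 0] == 0
Y = torch.tensor([[3.0, 0.0]])                       # Y[0, 0] != 0
d = torch.cdist(X, Y)                                # sqrt(3^2 + 2^2) = sqrt(13)
d.sum().backward()
print(X.grad)  # first entry is (0 - 3) / sqrt(13) ≈ -0.8321, i.e. non-zero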

Here is the autograd boilerplate I am currently using:

#include <torch/extension.h>  // assuming this lives in a PyTorch C++ extension

using torch::autograd::AutogradContext;
using torch::autograd::Variable;
using torch::autograd::variable_list;

class SparseCdist : public torch::autograd::Function<SparseCdist> {
public:
  static variable_list forward(
    AutogradContext *ctx,
    torch::Tensor a_rowptr_data,
    torch::Tensor a_col_data,
    torch::Tensor a_value_data,
    torch::Tensor b_rowptr_data,
    torch::Tensor b_col_data,
    torch::Tensor b_value_data,
    int dim_a,
    int dim_b
    ) {
    auto out = sparse_cdist(a_rowptr_data, a_col_data, a_value_data, b_rowptr_data, b_col_data, b_value_data, dim_a, dim_b);
    ctx->saved_data["dim_a"] = dim_a;
    ctx->saved_data["dim_b"] = dim_b;
    ctx->save_for_backward({a_rowptr_data, a_col_data, a_value_data, b_rowptr_data, b_col_data, b_value_data, out});
    return {out};
  }

  static variable_list backward(AutogradContext *ctx, variable_list grad_outs) {
    auto dim_a = ctx->saved_data["dim_a"].toInt();
    auto dim_b = ctx->saved_data["dim_b"].toInt();
    auto grad_out = grad_outs[0];
    auto saved = ctx->get_saved_variables();
    auto a_rowptr_data = saved[0], a_col_data = saved[1], a_value_data = saved[2], b_rowptr_data = saved[3],
         b_col_data = saved[4], b_value_data = saved[5], distance = saved[6];  // distance is the saved forward output

    auto grad_value_a = Variable();
    if (torch::autograd::any_variable_requires_grad({a_value_data})){
      std::cout << "grad_outs is: " << grad_out;
      grad_value_a = sparse_bw_cdist(a_rowptr_data, a_col_data, a_value_data, b_rowptr_data, b_col_data, b_value_data, grad_out, distance, dim_a, dim_b);
    }
    
    auto grad_value_b = Variable();
    if (torch::autograd::any_variable_requires_grad({b_value_data})){
      std::cout << "grad_outs is: " << grad_out;
      grad_value_b = sparse_bw_cdist(b_rowptr_data, b_col_data, b_value_data,a_rowptr_data, a_col_data, a_value_data, grad_out, distance, dim_b, dim_a);
    }    
    return {Variable(), Variable(), grad_value_a,
            Variable(), Variable(), grad_value_b, Variable(), Variable()};
  }
};

What I would want to do is overwrite a_rowptr_data and a_col_data, but it is unclear to me how this can be done using this AutogradContext concept.
Help would be much appreciated. Also, if this starts to work, would there be interest in integrating cdist into this project?

@KukumavMozolo
Author

Maybe one workaround is to explicitly introduce zeros into X where X is implicitly zero and Y is non-zero?
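
A rough sketch of that padding idea (untested, and assuming SparseTensor addition keeps explicitly stored zeros rather than pruning them): add an explicit zero to every row of X at each column in which Y has at least one non-zero, so the gradient with respect to X's value array can hold the case-4 entries without changing rowPtr/colIndex:

import torch
from torch_sparse import SparseTensor

def pad_with_explicit_zeros(x: SparseTensor, y: SparseTensor) -> SparseTensor:
    # columns in which Y has at least one non-zero entry
    _, y_col, _ = y.coo()
    y_cols = torch.unique(y_col)
    # explicit zeros at those columns, in every row of X
    n_rows = x.size(0)
    row = torch.arange(n_rows, device=y_cols.device).repeat_interleave(y_cols.numel())
    col = y_cols.repeat(n_rows)
    zeros = SparseTensor(
        row=row,
        col=col,
        value=torch.zeros(row.numel(), dtype=x.dtype(), device=x.device()),
        sparse_sizes=(n_rows, x.size(1)),
    )
    # union of both sparsity patterns, with X's original values
    return x + zeros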
