This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
As I understand it, the Embedding symbol lets me extract rows of a table (a weight matrix) using a set of indices, and I should also be able to enable the sparse-gradient option even though the indices and the table themselves are dense.
I made the simplest relevant test I could (I'm using the symbol API because I work in C++). Basically, we make a table symbol and an indices symbol. To make it more like an actual use case, I run an optimiser to drive the table values to 0, using one batch of random indices each iteration.
This works fine when we don't use sparse_grad on the Embedding, but fails if we do. The error message is: "Cannot use sparse_grad = 1, while stype of gradients w.r.t embedding weight is default".
I'm using MXNet 1.9.1.
```cpp
#include <iostream>
using std::cout;
using std::endl;

#include "mxnet-cpp/MxNetCpp.h"
namespace mx = mxnet::cpp;

int main( int argc, char *argv[] )
{
    auto ctx = mx::Context::gpu(0);

    // settings
    int numInds = pow(2,4);
    int tableRows = pow(2,10);
    int tableCols = 4;

    // stupidly simple graph
    auto indices = mx::BlockGrad( mx::Symbol::Variable("inds") );
    auto table = mx::Symbol::Variable("table");
    // last argument is sparse_grad - set it to true to reproduce the error
    auto sample = mx::Embedding( indices, table, tableRows, tableCols, mx::EmbeddingDtype::kFloat32, false );

    // and a simple loss - solve table values to 0
    auto loss = mx::square( sample );

    // execution graph is...
    auto graph = mx::Symbol::Group(
        {
            mx::MakeLoss(loss),
            mx::BlockGrad(mx::mean(loss))
        }
    );

    // state size of inds.
    std::map< std::string, mx::NDArray > args;
    args[ "inds" ] = mx::NDArray( mx::Shape( numInds ), ctx );

    // infer other sizes
    graph.InferArgsMap(ctx, &args, args);

    // bind...
    auto exec = graph.SimpleBind(ctx, args);

    // initialise table to some random values.
    auto ui = mx::Uniform( 100 );
    ui( "table", &args["table"] );

    // adam optimiser, which is meant to support sparse grads.
    mx::Optimizer *opt = mx::OptimizerRegistry::Find("adam");
    opt->SetParam("lr", 0.1 );

    for( unsigned c = 0; c < 1000000; ++c )
    {
        // make a batch of random indices....
        std::vector< float > inds(numInds);
        for( unsigned i = 0; i < numInds; ++i )
        {
            inds[i] = rand() % (tableRows - 1);
        }
        auto indsND = mx::NDArray( inds, mx::Shape( numInds ), mx::Context::cpu() );
        indsND.CopyTo( &args["inds"] );

        exec->Forward(true);
        exec->Backward( {exec->outputs[0]} );
        for( unsigned ac = 0; ac < exec->arg_arrays.size(); ++ac )
        {
            opt->Update(ac, exec->arg_arrays[ac], exec->grad_arrays[ac]);
        }

        mx::NDArray errND = exec->outputs[1].Copy( mx::Context::cpu() );
        mx::NDArray::WaitAll();
        cout << "err: " << *errND.GetData() << endl;
    }
}
```