A script for inference using T5 in Java #17432
-
Hello, I am having a little trouble building a script to perform inference in Java using the T5 model. I am using T5 for the summarization use case.

1️⃣ Model
Luckily the

2️⃣ The code I am using currently

```java
import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtException;
import ai.onnxruntime.OrtSession;

import java.nio.LongBuffer;
import java.util.Arrays;
import java.util.Map;

public class t5_onnx {
    public static void main(String[] args) throws OrtException {
        // Load the encoder and decoder models and create the inference sessions
        System.out.println("Loading the models");
        String encoder = "encoder_model.onnx";
        String decoder = "decoder_model.onnx";
        OrtEnvironment env = OrtEnvironment.getEnvironment();
        OrtSession encoderSession = env.createSession(encoder);
        OrtSession decoderSession = env.createSession(decoder);
        String prompt = "yo!";
        String generatedText = generate(prompt, env, encoderSession, decoderSession);
    }

    static String generate(String prompt, OrtEnvironment env, OrtSession encoderSession, OrtSession decoderSession) throws OrtException {
        // Get the input and output names for the encoder and decoder
        String encoderInputName = encoderSession.getInputNames().iterator().next();
        String encoderOutputName = encoderSession.getOutputNames().iterator().next();
        String decoderInputName = decoderSession.getInputNames().iterator().next();
        String decoderOutputName = decoderSession.getOutputNames().iterator().next();

        // Encoding
        // INPUT IDS (random placeholder ids until a real tokenizer is wired in)
        long[] inputData = new long[10];
        for (int i = 0; i < inputData.length; i++) {
            inputData[i] = (long) (Math.random() * 10);
        }
        long[] input_ids_shape = new long[]{1, inputData.length}; // Shape of the input data

        // ATTENTION MASK FOR INPUT IDS (all ones: no padding)
        long[] attention_mask = new long[inputData.length];
        Arrays.fill(attention_mask, 1);
        long[] attention_mask_shape = new long[]{1, inputData.length}; // Shape of the mask

        // OnnxTensors for the encoder inputs
        OnnxTensor inputTensor = OnnxTensor.createTensor(env, LongBuffer.wrap(inputData), input_ids_shape);
        OnnxTensor attentionTensor = OnnxTensor.createTensor(env, LongBuffer.wrap(attention_mask), attention_mask_shape);
        Map<String, OnnxTensor> encoder_inputs = Map.of("input_ids", inputTensor, "attention_mask", attentionTensor);
        OrtSession.Result encoderOutput = encoderSession.run(encoder_inputs);
        OnnxTensor encoderOutputTensor = (OnnxTensor) encoderOutput.get(encoderOutputName).get();

        // TODO: run the decoder in a loop to generate tokens
        // OrtSession.Result decoderOutput = decoderSession.run(inputMap);
        // OnnxTensor decoderOutputTensor = (OnnxTensor) decoderOutput.get(decoderOutputName).get();
        return "STATIC RETURN";
    }
}
```

As you can see, the decoding step is still stubbed out.
🙏🏻 A small request
Can you please share the correct way to go about this? In causal LM models we loop until the maximum number of tokens is reached, but I am not sure what I should do for this seq-to-seq model, or how "doSample" and "top K" can be applied here. Please share a script to do this... I would appreciate your help.
-
You'll need an implementation of the T5 tokenizer, which looks like it's SentencePiece, so either sentencepiece-jni or the wrapper in DJL will work. I'm working on a pure Java implementation of SentencePiece which will live in Tribuo, but it's not finished yet.

The decoder expects the input ids for the decoder prompt and the hidden state output from the encoder to generate the first token; then you want to use decoder_with_past_model to generate subsequent tokens, keeping the key & value cache. If you inspect the models you can see the expected input and output names.

Once you get logits out you can run the sampling procedure yourself: either greedy (taking the largest logit), nucleus sampling, or any other sampling procedure. It's not baked into the model, so you need to do it by hand. However, there are tools which will bake a beam search and other operations into an ONNX model (e.g. this one for GPT-2 & T5), so you don't need to do the sampling in Java, but I've not used them.
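The greedy case is simple to do by hand in plain Java: take the logits row for the last sequence position and pick the argmax as the next token id. A minimal sketch (the class and method names here are illustrative; in practice the `float[]` comes out of the decoder's logits tensor):

```java
public class GreedySampler {
    // Greedy sampling: return the index of the largest logit, which is the
    // next token id. Ties resolve to the lowest index.
    public static int greedyNextToken(float[] logits) {
        int best = 0;
        for (int i = 1; i < logits.length; i++) {
            if (logits[i] > logits[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        float[] logits = {0.1f, 2.5f, -1.0f, 2.4f};
        System.out.println("next token id = " + greedyNextToken(logits));
    }
}
```

Top-k or nucleus sampling would replace the argmax with a sort/filter over the logits followed by a draw from the renormalized distribution, but the shape of the loop is the same.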
For the next step the input ids should be your start-of-sequence token (which you can get by loading the sentencepiece protobuf and querying it for the start-of-sequence id), then whatever token you sampled from the logits. The encoder_hidden_states should be the same; those won't change. You'll want the decoder version that accepts past_key_values, and supply those from the outputs of the previous decoder run.
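The overall loop can be sketched without the ONNX plumbing. This is a sketch under assumptions: `decodeStep` is a hypothetical stand-in for running the decoder (plus past_key_values) on the tokens so far and sampling one new id from the logits; the toy lambda in `main` exists only to make the sketch runnable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class GenerationLoop {
    // Autoregressive decoding for a seq2seq model: start from the decoder
    // start token, append each sampled token, and stop at EOS or maxNewTokens.
    public static List<Long> generate(long startTokenId, long eosTokenId, int maxNewTokens,
                                      Function<List<Long>, Long> decodeStep) {
        List<Long> tokens = new ArrayList<>();
        tokens.add(startTokenId);
        for (int i = 0; i < maxNewTokens; i++) {
            long next = decodeStep.apply(tokens); // decoder run + sampling goes here
            if (next == eosTokenId) {
                break; // model signalled end of sequence
            }
            tokens.add(next);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Toy decode step that just counts up, hitting the EOS id 4 on step 4.
        List<Long> out = generate(0L, 4L, 10, toks -> (long) toks.size());
        System.out.println(out); // [0, 1, 2, 3]
    }
}
```

With the cached-decoder variant, the first iteration runs the plain decoder and every later iteration runs decoder_with_past_model, feeding the previous step's key/value outputs back in, so each step only processes the single newest token.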
It's a little messy, as you don't want to reallocate the buffer every time. It might be better to allocate a single direct LongBuffer (with
ByteBuffer.allocateDirect(seqLength*8).order(ByteOrder.nativeOrder()).asLongBuffer()
) then set the position to 0 and increment the limit each time you wrap it int…
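A minimal sketch of that buffer-reuse pattern, using only `java.nio` (the `appendAndSnapshot` name and the simulated token ids are illustrative; the commented-out line marks where the tensor wrap would go):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;
import java.util.Arrays;

public class TokenBuffer {
    // Allocate one direct LongBuffer up front; each step, grow the limit by
    // one, write the new token with an absolute put (which leaves position
    // untouched), and rewind position to 0 so a tensor wrap sees the whole
    // prefix [0, limit). Returns the tokens visible after the last step.
    public static long[] appendAndSnapshot(int maxSeqLength, long[] tokensToAppend) {
        LongBuffer ids = ByteBuffer.allocateDirect(maxSeqLength * 8) // 8 bytes per long
                .order(ByteOrder.nativeOrder())
                .asLongBuffer();
        for (int step = 0; step < tokensToAppend.length; step++) {
            ids.limit(step + 1);                 // expose one more token slot
            ids.put(step, tokensToAppend[step]); // absolute put: write new token id
            ids.position(0);                     // rewind before wrapping
            // OnnxTensor.createTensor(env, ids, new long[]{1, step + 1}) would go here
        }
        long[] out = new long[ids.limit()];
        ids.get(out);
        return out;
    }

    public static void main(String[] args) {
        long[] seen = appendAndSnapshot(8, new long[]{100, 101, 102});
        System.out.println(Arrays.toString(seen)); // [100, 101, 102]
    }
}
```

The absolute `put(index, value)` is the key detail: it avoids disturbing the position, so the only per-step bookkeeping is bumping the limit and rewinding before each wrap, with no reallocation until the sequence outgrows `maxSeqLength`.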