
Stand-alone text generation and scoring scripts #2

Open
binhvq opened this issue Mar 29, 2019 · 7 comments

Comments

binhvq commented Mar 29, 2019

Hi lopuhin!
I'm using your code and it works very well; training and generating samples during training both run nicely. Could you please also write code to load the model and weights and generate text from a separate Python file?
Thank you!

lopuhin (Owner) commented Mar 29, 2019

Thank you @binhvq 👍
This is something I would love to have. It's possible to do by extracting some code from the training script, since it already generates samples during training, but I would love to have this as a stand-alone feature.

Right now, though, I'm moving to TensorFlow 2.0 to support multi-GPU training easily; I plan to get back to easy text scoring and generation once that's done.

@binhvq binhvq closed this as completed Mar 29, 2019
@lopuhin lopuhin changed the title Test after trainning finish Stand-alone text generation and scoring scripts Mar 29, 2019
@lopuhin lopuhin reopened this Mar 29, 2019
lopuhin (Owner) commented Mar 29, 2019

@binhvq keeping it open since this is still to be done; I hope you don't mind the title change.

binhvq (Author) commented Mar 29, 2019

Hi lopuhin!
I wrote code to load the model and generate text. It's very simple.

Thanks

lopuhin (Owner) commented Apr 18, 2019

Scoring is implemented in https://github.com/lopuhin/transformer-lm/blob/master/lm/inference.py

virgulvirgul commented
@binhvq @lopuhin Can you please share the text generation script?

lopuhin (Owner) commented May 8, 2019

@virgulvirgul I don't have it, but if you have the log probabilities for the next token, you can take the exponent to get real probabilities, select, say, the top 40 of them, renormalize so they sum to 1, and then sample the next token according to these probabilities (e.g. by passing them as the p parameter of np.random.choice).
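That suggestion can be sketched as a small self-contained helper, independent of any model; the function name and the use of NumPy's Generator API are my own choices here, not part of the repository:

```python
import numpy as np

def sample_top_k(logprobs, k=40, rng=None):
    """Sample a token index from the top-k entries of a log-probability vector."""
    rng = rng or np.random.default_rng()
    logprobs = np.asarray(logprobs, dtype=float)
    top = np.argsort(logprobs)[-k:]      # indices of the k largest log probs
    probs = np.exp(logprobs[top])        # exponentiate to get real probabilities
    probs /= probs.sum()                 # renormalize so the k probs sum to 1
    return int(rng.choice(top, p=probs))

# with k=1 this degenerates to greedy decoding: always the argmax
assert sample_top_k(np.log([0.1, 0.2, 0.7]), k=1) == 2
```

The renormalization step matters: after keeping only the top k probabilities they no longer sum to 1, and np.random.choice rejects a p vector that doesn't.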

gooofy (Contributor) commented Jul 16, 2019

I implemented this approach; it works great for my model so far:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from pathlib import Path

import numpy as np

from lm import inference

MODEL_PATH = Path('run-root')
TOKENS_TO_GENERATE = 32
TOP_K = 8

mw = inference.ModelWrapper.load(MODEL_PATH)

txt = "Die Forschung an der künstlichen Intelligenz"
tokens = mw.tokenize(txt)

for i in range(TOKENS_TO_GENERATE):

    # get the TOP_K most likely next tokens with their log probabilities
    ntk = mw.get_next_top_k(tokens, TOP_K)

    # convert log probs to real probs, renormalized over the top k
    logprobs = np.array([a[0] for a in ntk])
    probs = np.exp(logprobs) / np.exp(logprobs).sum()

    # pick the next token randomly according to the probs distribution
    next_token_n = np.random.choice(TOP_K, p=probs)
    next_token = ntk[next_token_n][1]
    print(next_token)

    tokens.append(next_token)

print(mw.sp_model.DecodePieces(tokens))

gsuszka added a commit to iambotHQ/transformer-lm that referenced this issue Aug 9, 2019