
I would like a longer text result #31

Closed
r23 opened this issue Apr 17, 2021 · 2 comments

Comments


r23 commented Apr 17, 2021

Hello,

Thank you very much for the wonderful project. It works great! I just have one question about the text length.

I use the following script
From #2 (comment)


#!/usr/bin/env python
# -*- coding: utf-8 -*-

from pathlib import Path
from lm import inference

import numpy as np

MODEL_PATH = Path('/..../pytorch_models/de345-root/')

TOKENS_TO_GENERATE = 38

TOP_K = 8

mw = inference.ModelWrapper.load(MODEL_PATH)

txt = "Die Forschung an der künstlichen Intelligenz"

tokens = mw.tokenize(txt)

for i in range(TOKENS_TO_GENERATE):

    # generate TOP_K potential next tokens
    ntk = mw.get_next_top_k(tokens, TOP_K)

    # convert log probs to real probs
    logprobs = np.array([a[0] for a in ntk])
    probs = np.exp(logprobs) / np.exp(logprobs).sum()

    # pick next token randomly according to probs distribution
    next_token_n = np.random.choice(TOP_K, p=probs)
    next_token = ntk[next_token_n][1]
    # print (next_token)

    tokens.append(next_token)

print(mw.sp_model.DecodePieces(tokens))
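
For reference, the log-prob to probability conversion inside the loop is just a softmax over the k candidate tokens. A standalone check with made-up log-prob values (the numbers are illustrative, not from the model):

```python
import numpy as np

# Hypothetical top-k log-probs, shaped like the first elements of the
# (logprob, token) pairs that get_next_top_k returns in the script above.
logprobs = np.array([-1.0, -2.0, -3.0])

# Same conversion as in the loop: exponentiate, then normalize.
probs = np.exp(logprobs) / np.exp(logprobs).sum()

print(probs)        # a valid probability distribution over the 3 candidates
print(probs.sum())  # sums to 1
```

Higher (less negative) log-probs get proportionally more probability mass, which is what makes `np.random.choice(TOP_K, p=probs)` favor the model's preferred tokens while still allowing variety.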

The result

Die Forschung an der künstlichen Intelligenz, die sich mit der künstlichen Intelligenz befassen und die Entwicklung der künstlichen Intelligenz vorantreiben will, soll in der Zukunft fortgesetzt werden. Das berichtet Technology Review in seiner aktuellen Ausgabe (online zu bestellen). Das

Great

However, I'd like a longer text, comparable to what
python3 src/interactive_conditional_samples.py
https://github.com/openai/gpt-2/blob/master/src/interactive_conditional_samples.py
produces.

GPT-2 generates sample texts there with 4 paragraphs (2338 characters, 406 words).

What do I have to change in the above script for a longer result text?

I look forward to any hints and tips, and thank you in advance.

Ralf

r23 (Author) commented Apr 17, 2021

Hello,

you only have to change the TOKENS_TO_GENERATE value used in the line
for i in range(TOKENS_TO_GENERATE):

:lol:

sorry
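
In other words, the loop count is the knob for output length. A minimal sketch of the same top-k sampling loop factored into a function with an explicit length parameter, so the length is a single argument instead of a module-level constant (the `get_next_top_k` callable stands in for `mw.get_next_top_k`; the function and parameter names here are illustrative, not part of the lm API):

```python
import numpy as np

def sample_continuation(get_next_top_k, tokens, n_tokens, top_k=8, rng=None):
    """Append n_tokens sampled tokens to a copy of `tokens`.

    get_next_top_k(tokens, k) must return a list of (logprob, token)
    pairs, like mw.get_next_top_k in the script above.
    """
    rng = rng if rng is not None else np.random.default_rng()
    tokens = list(tokens)
    for _ in range(n_tokens):
        ntk = get_next_top_k(tokens, top_k)
        logprobs = np.array([lp for lp, _ in ntk])
        # Softmax over the candidates; subtracting the max is a small
        # numerical-stability tweak, not in the original script.
        probs = np.exp(logprobs - logprobs.max())
        probs /= probs.sum()
        tokens.append(ntk[rng.choice(len(ntk), p=probs)][1])
    return tokens
```

With this, generating a GPT-2-length sample is just a matter of passing a larger `n_tokens` (e.g. 400 instead of 38) and decoding the result with `mw.sp_model.DecodePieces(...)` as before.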

r23 closed this as completed Apr 17, 2021
lopuhin (Owner) commented Apr 18, 2021

Nice, glad you figured it out! Fixed formatting in the original post to make the script render.
