Similar task has some problem #3
@babyhuzi111 One possibility may be the tokenizer. If you can share your training Chinese text file with me, I can try it with my models and let you know.
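Since the tokenizer is a suspect: a whitespace-based tokenizer produces almost no splits on Chinese text, so the vocabulary collapses to a handful of long "words". A character-level tokenizer is a common minimal baseline (this is a hedged sketch, not the repository's code; all names are illustrative, and a proper segmenter such as jieba may work better):

```python
# Minimal character-level tokenizer for Chinese (illustrative sketch).

def char_tokenize(sentence):
    """Split a Chinese sentence into individual characters, dropping spaces."""
    return [ch for ch in sentence if not ch.isspace()]

def build_vocab(sentences, specials=("PAD", "UNK", "START", "END")):
    """Map every character seen in the corpus to an integer id,
    reserving the first ids for special tokens."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for s in sentences:
        for ch in char_tokenize(s):
            vocab.setdefault(ch, len(vocab))
    return vocab

vocab = build_vocab(["今天天气很好", "天气不错"])
ids = [vocab.get(ch, vocab["UNK"]) for ch in char_tokenize("今天不错")]
```

With ids produced this way, out-of-vocabulary characters at test time map to `UNK` instead of silently distorting the sequence.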
The data looks like this: each line has two sentences, where the left is the raw sentence and the right is the target sentence. Thank you.
Large attachment sent via QQ Mail:
data_clean.raw (133.01M, available until 2018-05-17 09:43). Download page: http://mail.qq.com/cgi-bin/ftnExs_download?k=276133357cd5cbc755df400a4530561f0157040d510253064900550c061d500500041e0d0254061d54525153065150005252520c633e64540515526a005c01510a4f4154143059&t=exs_ftn_download&code=da35c0d0
I forgot to mention that some lines have only one sentence; just ignore those.
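The described format (one raw sentence and one target sentence per line, with occasional lines holding only one sentence) could be loaded with a sketch like the one below. The tab separator and the function name are assumptions, not confirmed in the thread:

```python
# Hedged sketch of loading line-per-pair data; the tab separator is an
# assumption. Lines that do not contain exactly two fields are skipped,
# which also drops the single-sentence lines mentioned above.

def load_pairs(path, sep="\t"):
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split(sep)
            if len(parts) != 2:  # skip lines with only one sentence
                continue
            raw, target = parts
            pairs.append((raw, target))
    return pairs
```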
@babyhuzi111 In my tests I also get a single word repeated over and over. Have you solved this problem?
Hi,
I am working on a task that is very similar to yours, except that its inputs and outputs are Chinese. The framework is also seq2seq, and I wrote the code the same way as yours. When I run the code, the training accuracy gets very high, but when I test, the decoded output is always "的的的的的的的的的的的的的的", "哪哪哪哪哪哪哪哪哪哪哪哪", or "PADPADPADPADPADPADPADPADPADPADPAD". I have no idea why.
The model code looks like this:
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model

# encoder model
embedding_size = 50
encoder_inputs = Input(shape=(None,))
en_x = Embedding(vocab_size, embedding_size)(encoder_inputs)
encoder = LSTM(50, return_state=True)
encoder_outputs, state_h, state_c = encoder(en_x)
encoder_states = [state_h, state_c]

# decoder model
decoder_inputs = Input(shape=(None,))
dex = Embedding(vocab_size, embedding_size)
final_dex = dex(decoder_inputs)
decoder_lstm = LSTM(50, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(final_dex, initial_state=encoder_states)
decoder_dense = Dense(vocab_size, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
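One common cause of a decoder that emits one repeated token is reusing the teacher-forced training model unchanged at test time: decoding has to happen one token at a time, feeding each predicted token and the LSTM states back in. Below is a self-contained sketch of the standard two-model Keras inference setup under stated assumptions (toy sizes, `tensorflow.keras` imports, untrained weights, illustrative names; this is not the thread's actual code):

```python
# Hedged sketch: training graph plus the separate encoder/decoder
# inference models needed for step-by-step greedy decoding.
import numpy as np
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

vocab_size, embedding_size, latent_dim = 20, 8, 8  # toy values

# --- training graph (mirrors the model above) ---
encoder_inputs = Input(shape=(None,))
en_x = Embedding(vocab_size, embedding_size)(encoder_inputs)
_, state_h, state_c = LSTM(latent_dim, return_state=True)(en_x)
encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(None,))
dex = Embedding(vocab_size, embedding_size)
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_dense = Dense(vocab_size, activation="softmax")
dec_out, _, _ = decoder_lstm(dex(decoder_inputs), initial_state=encoder_states)
train_model = Model([encoder_inputs, decoder_inputs], decoder_dense(dec_out))

# --- inference graph: reuse the trained layers, feed states back ---
encoder_model = Model(encoder_inputs, encoder_states)

state_in_h = Input(shape=(latent_dim,))
state_in_c = Input(shape=(latent_dim,))
dec_out2, h2, c2 = decoder_lstm(dex(decoder_inputs),
                                initial_state=[state_in_h, state_in_c])
decoder_model = Model([decoder_inputs, state_in_h, state_in_c],
                      [decoder_dense(dec_out2), h2, c2])

def greedy_decode(source_ids, start_id=1, end_id=2, max_len=10):
    """Decode one token at a time, carrying the LSTM states forward."""
    states = encoder_model.predict(source_ids, verbose=0)
    token = np.array([[start_id]])
    out = []
    for _ in range(max_len):
        probs, h, c = decoder_model.predict([token] + states, verbose=0)
        next_id = int(np.argmax(probs[0, -1]))
        if next_id == end_id:
            break
        out.append(next_id)
        token = np.array([[next_id]])
        states = [h, c]
    return out
```

Calling `train_model.predict` directly on a test pair, by contrast, only scores teacher-forced inputs and never exercises the feedback loop, which can make high training accuracy coexist with degenerate decoded output.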
and the batch data is generated like this:
from keras.utils import to_categorical
import numpy as np

def mygenerator(batch_size):
    max_batch_index = len(trainx) // batch_size
    i = 0
    while 1:
        batch_trainy_categ = to_categorical(
            trainy[i*batch_size:(i+1)*batch_size].reshape(batch_size*max_sentB_len),
            num_classes=vocab_size)
        batch_trainy_categ = np.array(batch_trainy_categ).reshape(-1, max_sentB_len, vocab_size)
        batch_trainx = trainx[i*batch_size:(i+1)*batch_size]
        batch_trainy = trainy[i*batch_size:(i+1)*batch_size]
        i += 1
        i = i % max_batch_index
        # print('batch data:')
        # print(batch_trainx[:1])
        # print(batch_trainy[:1])
        # print(batch_trainy_categ[:1])
        yield ([batch_trainx, batch_trainy], batch_trainy_categ)
model.fit_generator(mygenerator(128), steps_per_epoch=len(trainx) // 128, epochs=1, verbose=1,
validation_data=([testx, testy], testy_catey))
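Two things worth checking in that generator: the one-hot targets are built from the very same array that is fed as the decoder input (with no one-step shift, the decoder can reach high training accuracy by simply copying its input), and the full `vocab_size`-wide one-hot tensor is expensive for a large vocabulary. A NumPy-only sketch (illustrative names, an assumption rather than the thread's code) that shifts the targets and keeps them as integer ids, which pairs with `loss='sparse_categorical_crossentropy'`:

```python
import numpy as np

def batch_generator(trainx, trainy, batch_size):
    """Yield ([encoder_in, decoder_in], shifted_targets) batches forever.
    With loss='sparse_categorical_crossentropy' the targets can stay as
    integer ids, so no vocab_size-wide one-hot tensor is needed."""
    max_batch_index = len(trainx) // batch_size
    i = 0
    while True:
        lo, hi = i * batch_size, (i + 1) * batch_size
        batch_x = trainx[lo:hi]
        batch_y = trainy[lo:hi]
        # Targets are the decoder inputs shifted one step left, so the
        # model predicts the NEXT token rather than echoing the current one.
        target = np.zeros_like(batch_y)
        target[:, :-1] = batch_y[:, 1:]
        i = (i + 1) % max_batch_index
        yield [batch_x, batch_y], target[..., np.newaxis]
```

The trailing `np.newaxis` gives the `(batch, timesteps, 1)` shape that the sparse loss expects for a `return_sequences=True` decoder.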
Can you give me some advice about how to debug this, or what the reason might be? Thank you.