data preprocessing #177
Comments
As mentioned, SOURCE is the original text and TARGET is the corrected text.
For example, I downloaded the FCE dataset, which contains an M2 file and a JSON file. These files do not separate the correct and incorrect sentences. How should I pass this data to the preprocessing script? I would greatly appreciate your guidance.
Only the downloaded synthetic dataset has separate correct and incorrect sentences. Do we have to use the synthetic data as input?
You can take a look at the M2scorer repository, specifically the edit_creator.py script.
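For readers in the same situation, here is a minimal sketch of how an M2 file could be turned into SOURCE/TARGET sentence pairs. It is not the repository's own preprocessing code and not edit_creator.py. It assumes the usual M2 layout (an `S` line with the tokenized original sentence, followed by `A start end|||type|||correction|||...|||annotator` lines) and that the edits for one annotator are sorted and non-overlapping. The function name `m2_to_parallel`, the `annotator_id` parameter, and the sample data are made up for illustration.

```python
# Hypothetical helper: rebuild corrected sentences from M2 annotations.
# Assumes edits per annotator are sorted by start offset and do not overlap.

SKIPPED_TYPES = {"noop", "UNK", "Um"}  # edit types that carry no usable correction

# Made-up sample in M2 format, for illustration only.
SAMPLE_M2 = """S This are a sentence .
A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0

S I like apples very .
A 3 4|||U:ADV||||||REQUIRED|||-NONE-|||0
A 4 4|||M:ADV|||a lot|||REQUIRED|||-NONE-|||0

S This one is fine .
A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||0
"""


def m2_to_parallel(m2_text, annotator_id=0):
    """Return a list of (source, target) sentence pairs from M2 text."""
    pairs = []
    for block in m2_text.strip().split("\n\n"):
        lines = block.strip().split("\n")
        if not lines or not lines[0].startswith("S "):
            continue
        source_tokens = lines[0][2:].split()
        target_tokens = list(source_tokens)
        offset = 0  # shift in token positions caused by earlier edits
        for line in lines[1:]:
            if not line.startswith("A "):
                continue
            fields = line[2:].split("|||")
            start, end = (int(x) for x in fields[0].split())
            edit_type, correction = fields[1], fields[2]
            if int(fields[-1]) != annotator_id:
                continue  # keep only the chosen annotator's edits
            if start < 0 or edit_type in SKIPPED_TYPES:
                continue  # sentence left unchanged by this annotation
            replacement = [] if correction == "-NONE-" else correction.split()
            target_tokens[start + offset:end + offset] = replacement
            offset += len(replacement) - (end - start)
        pairs.append((" ".join(source_tokens), " ".join(target_tokens)))
    return pairs


if __name__ == "__main__":
    for source, target in m2_to_parallel(SAMPLE_M2):
        print(source, "->", target)
```

The source sentences and the target sentences can then be written to two line-aligned text files, one sentence per line, which is the SOURCE/TARGET shape described in this thread.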
Thank you sincerely for your answer. I will try it.
Yes, this is a specific training format that stores only the input tokens and their corresponding tags.
What do SOURCE and TARGET stand for in data preprocessing? Could you explain them? Thank you for your reply.