There is no way to tell a segment linebreak from a real linebreak in `verbose_json` #2381

C0rn3j · 2024-08-23T21:13:09Z

Regular JSON:

[{"text": " Hello, this message is long enough to have a line break, so it will have a line break.\n"}]

verbose_json:

"text": " Hello, this message is long enough to have a line break, so\n it will have a line break.\n"

Actual full output:

[{"task": "transcribe", "language": "english", "duration": 6.383999824523926, "text": " Hello, this message is long enough to have a line break, so\n it will have a line break.\n", "segments": [{"id": 0, "text": " Hello, this message is long enough to have a line break, so", "start": 0.0, "end": 4.3, "tokens": [2425, 11, 341, 3636, 307, 938, 1547, 281, 362, 257, 1622, 1821, 11, 370], "words": [{"word": " Hello", "start": 0.01, "end": 0.43, "t_dtw": -1, "probability": 0.9765891432762146}, {"word": ",", "start": 0.43, "end": 0.54, "t_dtw": -1, "probability": 0.9967133402824402}, {"word": " this", "start": 0.87, "end": 0.9400000000000001, "t_dtw": -1, "probability": 1.0}, {"word": " message", "start": 0.9400000000000001, "end": 1.55, "t_dtw": -1, "probability": 1.0}, {"word": " is", "start": 1.55, "end": 1.72, "t_dtw": -1, "probability": 1.0}, {"word": " long", "start": 1.72, "end": 2.0, "t_dtw": -1, "probability": 1.0}, {"word": " enough", "start": 2.09, "end": 2.58, "t_dtw": -1, "probability": 1.0}, {"word": " to", "start": 2.58, "end": 2.75, "t_dtw": -1, "probability": 1.0}, {"word": " have", "start": 2.75, "end": 3.09, "t_dtw": -1, "probability": 1.0}, {"word": " a", "start": 3.09, "end": 3.17, "t_dtw": -1, "probability": 1.0}, {"word": " line", "start": 3.17, "end": 3.5, "t_dtw": -1, "probability": 1.0}, {"word": " break", "start": 3.5, "end": 3.83, "t_dtw": -1, "probability": 1.0}, {"word": ",", "start": 4.0, "end": 4.11, "t_dtw": -1, "probability": 0.9934980869293213}, {"word": " so", "start": 4.11, "end": 4.25, "t_dtw": -1, "probability": 1.0}], "temperature": 0.20000000298023224, "avg_logprob": -0.0022336323745548725}, {"id": 1, "text": " it will have a line break.", "start": 4.3, "end": 6.38, "tokens": [309, 486, 362, 257, 1622, 1821, 13], "words": [{"word": " it", "start": 4.3, "end": 4.45, "t_dtw": -1, "probability": 1.0}, {"word": " will", "start": 4.45, "end": 4.79, "t_dtw": -1, "probability": 1.0}, {"word": " have", "start": 4.79, "end": 5.13, "t_dtw": 
-1, "probability": 1.0}, {"word": " a", "start": 5.13, "end": 5.21, "t_dtw": -1, "probability": 1.0}, {"word": " line", "start": 5.21, "end": 5.38, "t_dtw": -1, "probability": 1.0}, {"word": " break", "start": 5.54, "end": 5.98, "t_dtw": -1, "probability": 1.0}, {"word": ".", "start": 5.98, "end": 6.38, "t_dtw": -1, "probability": 1.0}], "temperature": 0.20000000298023224, "avg_logprob": 0.0}]}]

This becomes a problem when trying to identify the actual newlines in order to filter out hallucinations:

"text": " Hello, this message is long enough to have a line break, so\n it will have a line break.\n Thank you.\n"

[{"task": "transcribe", "language": "english", "duration": 7.684000015258789, "text": " Hello, this message is long enough to have a line break, so\n it will have a line break.\n Thank you.\n", "segments": [{"id": 0, "text": " Hello, this message is long enough to have a line break, so", "start": 0.0, "end": 3.36, "tokens": [2425, 11, 341, 3636, 307, 938, 1547, 281, 362, 257, 1622, 1821, 11, 370], "words": [{"word": " Hello", "start": 0.32, "end": 0.32, "t_dtw": -1, "probability": 0.9946319460868835}, {"word": ",", "start": 0.35000000000000003, "end": 0.45, "t_dtw": -1, "probability": 0.8468424081802368}, {"word": " this", "start": 0.45, "end": 0.71, "t_dtw": -1, "probability": 1.0}, {"word": " message", "start": 0.71, "end": 0.88, "t_dtw": -1, "probability": 1.0}, {"word": " is", "start": 1.18, "end": 1.29, "t_dtw": -1, "probability": 1.0}, {"word": " long", "start": 1.29, "end": 1.54, "t_dtw": -1, "probability": 1.0}, {"word": " enough", "start": 1.56, "end": 1.94, "t_dtw": -1, "probability": 1.0}, {"word": " to", "start": 1.94, "end": 2.07, "t_dtw": -1, "probability": 1.0}, {"word": " have", "start": 2.07, "end": 2.2600000000000002, "t_dtw": -1, "probability": 1.0}, {"word": " a", "start": 2.37, "end": 2.39, "t_dtw": -1, "probability": 1.0}, {"word": " line", "start": 2.39, "end": 2.65, "t_dtw": -1, "probability": 1.0}, {"word": " break", "start": 2.65, "end": 2.97, "t_dtw": -1, "probability": 1.0}, {"word": ",", "start": 2.97, "end": 3.1, "t_dtw": -1, "probability": 0.9999771118164062}, {"word": " so", "start": 3.1, "end": 3.17, "t_dtw": -1, "probability": 1.0}], "temperature": 0.20000000298023224, "avg_logprob": -0.01144307479262352}, {"id": 1, "text": " it will have a line break.", "start": 3.36, "end": 4.8, "tokens": [309, 486, 362, 257, 1622, 1821, 13], "words": [{"word": " it", "start": 3.36, "end": 3.36, "t_dtw": -1, "probability": 1.0}, {"word": " will", "start": 3.48, "end": 3.62, "t_dtw": -1, "probability": 1.0}, {"word": " have", "start": 3.62, "end": 
3.88, "t_dtw": -1, "probability": 1.0}, {"word": " a", "start": 3.88, "end": 3.94, "t_dtw": -1, "probability": 1.0}, {"word": " line", "start": 3.94, "end": 4.2, "t_dtw": -1, "probability": 1.0}, {"word": " break", "start": 4.2, "end": 4.5200000000000005, "t_dtw": -1, "probability": 1.0}, {"word": ".", "start": 4.5200000000000005, "end": 4.79, "t_dtw": -1, "probability": 1.0}], "temperature": 0.20000000298023224, "avg_logprob": 0.0}, {"id": 2, "text": " Thank you.", "start": 4.8, "end": 6.8, "tokens": [1044, 291, 13], "words": [{"word": " Thank", "start": 4.8, "end": 5.7, "t_dtw": -1, "probability": 0.9997387528419495}, {"word": " you", "start": 5.7, "end": 6.24, "t_dtw": -1, "probability": 1.0}, {"word": ".", "start": 6.24, "end": 6.78, "t_dtw": -1, "probability": 0.9999942779541016}], "temperature": 0.20000000298023224, "avg_logprob": -5.340576171875e-05}]}]

In this case I can no longer split on newlines, because doing so leaves me with broken-up sentences:

Hello, this is a test of verbose JSON. This message is
meant to be really long so it can line break, but it won't.
 Thank you.

And if I read the text of the segments instead, I lose the newlines altogether:
Hello, this message is long enough to have a line break, so it will have a line break. Thank you.
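
To make the two failure modes concrete, here is a minimal sketch that runs both approaches on a trimmed copy of the response quoted above. Only the `text` and `segments[].text` fields are kept; the variable names are illustrative and not part of any API.

```python
# Trimmed copy of the verbose_json response quoted above (only the fields used here).
response = {
    "text": " Hello, this message is long enough to have a line break, so\n"
            " it will have a line break.\n Thank you.\n",
    "segments": [
        {"id": 0, "text": " Hello, this message is long enough to have a line break, so"},
        {"id": 1, "text": " it will have a line break."},
        {"id": 2, "text": " Thank you."},
    ],
}

# Approach 1: split the full text on newlines. Segment breaks and real line
# breaks are both "\n", so the first sentence is cut in two.
lines_from_text = [
    line.strip() for line in response["text"].splitlines() if line.strip()
]

# Approach 2: join the segment texts. No sentence is cut, but every newline
# is gone, including the real one before "Thank you."
joined_segments = "".join(seg["text"] for seg in response["segments"]).strip()

print(lines_from_text)
print(joined_segments)
```

Neither result separates "Thank you." from the preceding sentence while keeping that sentence whole, which is the information needed to filter it out.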

I would prefer that the full text in `verbose_json` not contain the inserted newlines. On top of that, the segments could carry a marker indicating whether they end a line or whether more text follows, so that the text is available in the full text and can also be reconstructed correctly from the segment texts.
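
As a rough illustration of the marker idea, the sketch below assumes a hypothetical boolean field `line_end` on each segment. This field does not exist in the current output; the name and the `rebuild_text` helper are made up for the example.

```python
def rebuild_text(segments):
    """Join segment texts, adding a newline only after segments flagged as ending a line."""
    parts = []
    for seg in segments:
        parts.append(seg["text"])
        # "line_end" is a hypothetical field, not part of the current verbose_json output.
        if seg.get("line_end", False):
            parts.append("\n")
    return "".join(parts)


# Segments from the example above, with the hypothetical marker filled in by hand.
segments = [
    {"id": 0, "text": " Hello, this message is long enough to have a line break, so", "line_end": False},
    {"id": 1, "text": " it will have a line break.", "line_end": True},
    {"id": 2, "text": " Thank you.", "line_end": True},
]

rebuilt = rebuild_text(segments)
lines = [line.strip() for line in rebuilt.splitlines()]
print(lines)
```

With such a marker, splitting on newlines would yield whole sentences, and a trailing "Thank you." line could be inspected on its own.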

Relevant code to reproduce this:

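    import requests
    # f is the audio file path and cpp_url the server URL, both defined elsewhere
    files = {'file': (f, open(f, 'rb'))}
    data = {'temperature': '0.2', 'response_format': 'verbose_json'}
    try:
        response = requests.post(cpp_url, files=files, data=data)
        response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    except requests.RequestException as e:
        print(f'Request failed: {e}')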
