This repository has been archived by the owner on Oct 13, 2022. It is now read-only.

WIP: add compute-post. #210

Open · wants to merge 4 commits into master

Conversation

@csukuangfj (Collaborator) commented Jun 10, 2021

Usage:

$ snowfall net compute-post -m /ceph-fj/model-jit.pt -f exp/data/cuts_test-clean.json.gz -o exp

I found one issue with using a TorchScript module: we have to know the signature of the model's forward function as well as its subsampling factor.
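One way to sidestep this is to store the interface details next to the scripted model, e.g. in a JSON sidecar file. A minimal sketch under that assumption (the file name and field names are hypothetical, not part of snowfall):

```python
import json
from pathlib import Path

def save_model_metadata(model_dir: Path, subsampling_factor: int,
                        forward_args: list) -> None:
    # Hypothetical sidecar: records the scripted model's forward()
    # argument names and its subsampling factor at export time.
    meta = {
        'subsampling_factor': subsampling_factor,
        'forward_args': forward_args,
    }
    (model_dir / 'model-meta.json').write_text(json.dumps(meta))

def load_model_metadata(model_dir: Path) -> dict:
    # Read the sidecar back so compute-post does not need to
    # hard-code the model's interface.
    return json.loads((model_dir / 'model-meta.json').read_text())
```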


I am working on compute-ali and will submit them together.

@csukuangfj (Collaborator, Author)

I just created a pull request in Lhotse (lhotse-speech/lhotse#319) to add posteriors to the Cut class. The motivation is to reuse Lhotse's serialization and dataset code.


Also, I find the alignment information contained in the supervision too simple; see
https://github.com/lhotse-speech/lhotse/blob/ef7a037426f1b602a54f4d9ea43e711007e85719/lhotse/supervision.py#L24

    symbol: str    
    start: Seconds    
    duration: Seconds

Can we move the alignment class from snowfall to lhotse?

    class Alignment:
        # The key of the dict indicates the type of the alignment,
        # e.g., ilabel, phone_label, etc.
        #
        # The value of the dict is the actual alignment.
        # If the alignment is frame-wise and the sampling rate
        # is available, it can be converted to the CTM-like format
        # used in Lhotse.
        value: Dict[str, Union[List[int], List[str]]]
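For illustration, a frame-wise alignment of this kind could be cast to Lhotse's CTM-like (symbol, start, duration) triples once a frame shift is known. A sketch, not actual snowfall or Lhotse code:

```python
from itertools import groupby
from typing import List, Tuple, Union

Symbol = Union[int, str]

def framewise_to_ctm(labels: List[Symbol],
                     frame_shift: float) -> List[Tuple[Symbol, float, float]]:
    # Collapse each run of identical frame labels into a
    # (symbol, start, duration) triple, with times in seconds.
    out = []
    frame = 0
    for symbol, run in groupby(labels):
        n = len(list(run))
        out.append((symbol, frame * frame_shift, n * frame_shift))
        frame += n
    return out
```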

@csukuangfj (Collaborator, Author)

The usage of compute-ali:

$ snowfall ali compute-ali -l data/lang_nosp -p ./exp/cuts_post.json --max-duration=500 -o exp

    phone_ids_with_blank = [0] + phone_ids
    ctc_topo = k2.arc_sort(build_ctc_topo(phone_ids_with_blank))

    if not (lang_dir / 'HLG.pt').exists():
Collaborator:

I think this could be refactored into a function and reused across this script and the decode scripts (and possibly others):

    def load_or_compile_HLG(lang_dir: Path) -> k2.Fsa: ...
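The suggested helper would follow the usual load-or-compute-and-cache shape. The real implementation would deserialize with k2.Fsa.from_dict as this script already does; the pattern itself can be sketched generically, with pickle standing in for torch.save/torch.load:

```python
import pickle
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar('T')

def load_or_compute(cache_path: Path, compute: Callable[[], T]) -> T:
    # Generic sketch of the load-or-compile pattern: return the cached
    # object if it exists, otherwise compute it once and cache it.
    if cache_path.exists():
        with open(cache_path, 'rb') as f:
            return pickle.load(f)
    obj = compute()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with open(cache_path, 'wb') as f:
        pickle.dump(obj, f)
    return obj
```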

Collaborator (Author):

Thanks! Will refactor it and add options to enable/disable LM rescoring.

    HLG = k2.Fsa.from_dict(d)

    HLG = HLG.to(device)
    HLG.aux_labels = k2.ragged.remove_values_eq(HLG.aux_labels, 0)
@pzelasko (Collaborator) commented Jun 10, 2021:

What is this line doing? It looks like it is "sparsifying" the aux_labels (word IDs), but how does HLG know which labels correspond to which aux_labels after that?

Contributor:

This just removes 0s from the word sequences. Actually, it may no longer be necessary, because we changed some defaults of what happens when you call remove_epsilons and convert linear attributes to ragged ones.
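Conceptually, the call drops every 0 from each word sequence while preserving the ragged row structure. In plain Python (illustrative only, not the k2 implementation):

```python
from typing import List

def remove_values_eq(ragged: List[List[int]], value: int) -> List[List[int]]:
    # Drop every element equal to `value` from each sub-list,
    # keeping the ragged structure (a row of only zeros becomes empty).
    return [[x for x in row if x != value] for row in ragged]
```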

    supervision_segments,
    allow_truncate=sf - 1)

    lattices = k2.intersect_dense_pruned(HLG, dense_fsa_vec, 20.0,
Collaborator:

The pruning-related arguments here could be function parameters.
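For example, the hard-coded literals could be lifted into a small config object whose fields become CLI options. The field names below mirror the arguments of intersect_dense_pruned; the defaults (other than the 20.0 visible in the snippet) are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class PruningConfig:
    # Pruning-related arguments for the lattice intersection,
    # exposed as parameters instead of inline literals.
    # Defaults here are assumptions, not snowfall's actual values.
    search_beam: float = 20.0
    output_beam: float = 8.0
    min_active_states: int = 30
    max_active_states: int = 10000
```

A decode function could then accept a `PruningConfig` and forward its fields, so scripts share one definition of the knobs.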

    output_dir.mkdir(exist_ok=True)
    storage_path = output_dir / 'posts'

    posts_writer = lhotse.NumpyFilesWriter(storage_path=storage_path)
Collaborator:

This is going to create a lot of files; I'm not sure whether NumpyHdf5Writer would be preferable.

@pzelasko (Collaborator)

> Also, I find the alignment information contained in the supervision is too simple

Can you describe the issue more? I'm not sure I understand what's missing there. We could move Snowfall's frame-wise alignment to Lhotse, but I'm not sure how to make the two representations compatible with each other (the CTM-like description seems more general to me, as you can cast it to a frame-wise representation with different frame shifts).

@pzelasko (Collaborator)

BTW, I wonder if we should support piping these programs together, Kaldi-style. Click easily allows doing that with file-type arguments.

We could do that by writing/reading JSONL-serialized manifests in a streaming manner. Since most operations on a CutSet reduce to per-Cut operations, this seems feasible without rewriting too much code. There is a function in Lhotse that tries to figure out the right manifest type from a dict, which can be used to parse individual lines (BTW @csukuangfj, I just realized that you might need to extend that function to handle the posterior manifests in your Lhotse PR).
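The streaming idea can be sketched with plain line-by-line JSONL I/O; the helper names below are hypothetical, and Lhotse's own incremental JSONL code would presumably take their place:

```python
import json
from typing import Dict, Iterable, TextIO

def read_jsonl(stream: TextIO) -> Iterable[Dict]:
    # Yield one manifest dict per line, so each cut can be
    # processed as soon as it arrives on the pipe.
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

def write_jsonl(items: Iterable[Dict], stream: TextIO) -> None:
    # Write each manifest dict on its own line and flush
    # immediately, so a downstream process sees it right away.
    for item in items:
        stream.write(json.dumps(item) + '\n')
        stream.flush()
```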

WDYT?

@pzelasko (Collaborator)

... there is also some code for line-by-line incremental JSONL writing in Lhotse that could be extended to support this.

@danpovey (Contributor)

This is cool; I'm afraid I'm not following it in detail.
Just a reminder: this is more of an "experimental direction" at this point. We'll have to learn from experience whether these kinds of command-line utilities are actually useful.

@pzelasko (Collaborator)

Fair enough. The idea is to allow something like:

snowfall net compute-post <some-inputs-args..> - | snowfall net compute-ali - <some-more-args..>

but I just realized that, with the current way things are done in Lhotse, we would have to store the actual arrays/tensors on disk and just pass the manifests around, which might not be optimal. Maybe it's not relevant for now; we can see how to do that in the future, if it's needed at all.

@danpovey (Contributor)

BTW, I tend to think that being able to do something at all tends to be more important than that thing being efficient (premature optimization being the root of all evil, etc.), although I did plenty of it in Kaldi. I don't know what the optimal solution is here; I'm afraid I have not been following this PR closely enough.

@pzelasko (Collaborator)

Agreed. But for the record, the full quote is actually:

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."
