Huge temporary files when running gretel on very large contigs #31
Comments
Although this is not desired behaviour, it is not high priority as it is off-label use of gretel. |
Hi. Do you think this behavior will be resolved or managed at some point? At the very least it should be documented; I almost crashed my computer trying Gretel just minutes ago. It would also be good to specify that the VCF file has to be bgzipped, otherwise you get an uninformative PyVCF error. Thanks. |
Hi @jsgounot, thanks for the comment and I'm sorry about locking up all your storage! I don't intend to resolve this any time soon, as Gretel is designed for local haplotyping on "short" regions (intuition here: https://www.biorxiv.org/content/10.1101/2020.08.10.244848v1). I would love to find the time in future to improve the storage requirements for Hansel to help with this problem, but I can't promise anything. Locking up your machine is totally undesired behaviour though, and I should try to catch this use case with a warning (perhaps one that can be overridden with …).

On your second point, I note the requirement is stressed in the README, but you are absolutely right that the CLI should raise an error if the file looks like the wrong format. Thanks. (#33) |
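As an illustration of the kind of up-front format check discussed above, here is a minimal sketch that tests whether a file starts with a BGZF (bgzip) block header before handing it to a VCF parser. It is not Gretel's actual code: the function names `looks_bgzipped` and `check_vcf_path` are hypothetical, and the check is a heuristic that assumes the `BC` extra subfield comes first in the gzip header, as it does in files written by bgzip.

```python
def looks_bgzipped(path):
    """Heuristic: True if the file begins with a BGZF (bgzip) block header.

    BGZF blocks are gzip members with the FEXTRA flag set and a 'BC'
    extra subfield. This sketch assumes 'BC' is the first subfield.
    """
    with open(path, "rb") as handle:
        header = handle.read(16)
    if len(header) < 16:
        return False
    is_gzip = header[:2] == b"\x1f\x8b" and header[2] == 8  # magic + deflate
    has_extra = bool(header[3] & 0x04)                       # FLG.FEXTRA
    has_bc_subfield = header[12:14] == b"BC"                 # BGZF marker
    return is_gzip and has_extra and has_bc_subfield


def check_vcf_path(path):
    """Hypothetical CLI guard: fail early with a readable message."""
    if not looks_bgzipped(path):
        raise ValueError(
            f"{path} does not look bgzip-compressed; "
            "compress it with bgzip and index it with tabix first."
        )
```

A plain-text VCF, or one compressed with ordinary gzip, fails this check, so the user sees a clear message instead of a parser error further down the line.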
Thanks for the reply. Well, I guess it far exceeded what we can call a short region. I will try again with a real, shorter one (I used a far too large, random test BAM file spanning hundreds of kb). |
No problem - thanks for taking the time to report. Good luck! |
Hi @SamStudio8, |
Gretel is a proposal to the local MIH problem (defined in our manuscript
here https://academic.oup.com/bioinformatics/article/37/10/1360/5988481)
and is designed to find shorter regions of interest within metagenomic data.
In theory it could recover genomes but in practice those regions are
probably too large and will lead to very large intractable matrices. The
longest regions I've recovered with it are more on the order of kilobases
rather than megabases!
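To give a rough feel for why long regions become intractable, the sketch below estimates the size of a dense pairwise matrix over variant sites. The layout is an assumption made for illustration (one cell per pair of symbols per pair of sites, 8 bytes per cell), as are the function names and default values; it is not taken from Hansel's source. What matters is the quadratic growth in the number of sites.

```python
def estimate_pairwise_matrix_bytes(n_sites, n_symbols=6, bytes_per_cell=8):
    """Back-of-envelope size of a dense symbol-by-symbol, site-by-site matrix.

    Assumed layout (hypothetical): n_symbols x n_symbols x n_sites x n_sites.
    """
    cells = (n_symbols ** 2) * (n_sites ** 2)
    return cells * bytes_per_cell


def human_readable(n_bytes):
    """Format a byte count using binary units."""
    size = float(n_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


if __name__ == "__main__":
    # Ten times more variant sites costs a hundred times more storage.
    for n_sites in (100, 1_000, 10_000, 100_000):
        print(n_sites, human_readable(estimate_pairwise_matrix_bytes(n_sites)))
```

Under these assumptions, a region with a few hundred variant sites fits comfortably in memory, while one with tens of thousands of sites runs to hundreds of gigabytes.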
On Tue, Oct 19, 2021 at 3:03 PM kangxiongbin wrote:
Hi @SamStudio8,
How large a haplotype can Gretel recover? Can I use Gretel to recover
bacterial genomes from metagenomic data? The genomes of these bacteria
may be 2-7 Mb in size.
|
Although Gretel is not designed for recovering large haplotypes, it should at least try its best. Apparently very large contigs will cause Gretel to write very large temporary files and lead to an OSError. First reported by @mherold1 in #30.