Problem creating seq files when running setup_clade_ap.py. #48

teagerv · 2022-05-21T17:21:31Z

Question: Where is the -s parameter (SEQGZFOLDER) for setup_clade_ap.py meant to point?

Issue: I seem to be having a problem populating the gzip directory with sequences. The .table file is fully populated from the NCBI database, but the sequences aren't being found. Maybe I'm pointing the -s parameter at the wrong place? ~/ is where all the compressed NCBI files from phlawd_db_maker are.
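The traceback below shows that PyPHLAWD opens `gzdir + "/" + filename` with `gzip.open`, so -s has to name a directory that actually contains the compressed sequence files. A quick sanity check, assuming those files end in `.gz` (an assumption about phlawd_db_maker's output, not something this thread confirms), is to list what is really in that folder. `list_gz_files` is a hypothetical helper, not part of PyPHLAWD:

```python
import os
import sys


def list_gz_files(seq_gz_folder):
    """Return the sorted names of *.gz files directly inside seq_gz_folder."""
    folder = os.path.expanduser(seq_gz_folder)
    if not os.path.isdir(folder):
        raise NotADirectoryError(f"-s folder does not exist: {folder}")
    return sorted(name for name in os.listdir(folder) if name.endswith(".gz"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_seq_folder.py <the path you pass to -s>
    gz_files = list_gz_files(sys.argv[1])
    print(f"{len(gz_files)} .gz files found")
    for name in gz_files[:10]:
        print("  ", name)
```

If this prints zero files, -s is almost certainly pointing at the wrong directory.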

snail@snailbuntu:~/PyPHLAWD/src$ python3 setup_clade_ap.py -t Architaenioglossa -b /media/snail/RED1/ncbi/inv.db -o ~/Desktop/ -s ~/ -l ~/Desktop/logfile
STARTING PYPHLAWD *。ヾ(｡>ｖ<｡)ﾉﾞ*。
MAKING TREE Architaenioglossa ٩(๑꒦ິȏ꒦ິ๑)۶
MAKING DIRS IN /home/snail/Desktop ヽ(*´∀`)ﾉﾞ
PROBLEM CREATING /home/snail/Desktop/Architaenioglossa_75116 (´；ω；`)
POPULATING DIRS /home/snail/Desktop ヽ/❀o ل͜ o\ﾉ
Traceback (most recent call last):
  File "/home/snail/PyPHLAWD/src/populate_dirs_first.py", line 47, in <module>
    mfid_in(tid,DB,dirl+dirr+"/"+orig+".fas",dirl+dirr+"/"+orig+".table",gzfileloc,True,limitlist = taxalist) 
  File "/home/snail/PyPHLAWD/src/get_subset_genbank.py", line 275, in make_files_with_id_internal
    idstoseq = get_seqs_from_gz(gzfileloc,fn,files_ids[fn])
  File "/home/snail/PyPHLAWD/src/get_subset_genbank.py", line 24, in get_seqs_from_gz
    fl = gzip.open(gzdir+"/"+filename,"r")
  File "/usr/lib/python3.8/gzip.py", line 58, in open
    binary_file = GzipFile(filename, gz_mode, compresslevel)
  File "/usr/lib/python3.8/gzip.py", line 173, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/home/snail//seqs.Viviparus subpurpureus voucher USNM 1292588 histone 3 (H3) gene, partial cds.'
CREATED TEMPDIR_44273/
CLUSTERING SINGLE /home/snail/Desktop/Architaenioglossa_75116/Cyclophoroidea_75117/Megalomastomatidae_928797/Acroptychia_928777 ヽ(｡´･д･)ﾉ
Traceback (most recent call last):
  File "/home/snail/PyPHLAWD/src/cluster_tree.py", line 38, in <module>
    tablename = [x for x in files if ".table" in x][0]
IndexError: list index out of range
PYPHLAWD DONE ヽ(^□^｡)ノ
Total time (H:M:S): 0:00:00.638717 ٩(º౪º๑)۶
(⌐■_■)
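The FileNotFoundError comes from the string concatenation in `get_seqs_from_gz` (line 24 of get_subset_genbank.py in the traceback above). Reproducing that concatenation outside PyPHLAWD shows two things: the doubled slash from passing `-s ~/` is harmless on POSIX systems, and the name being opened (`seqs.` followed by a GenBank description line) does not look like a `.gz` file at all. `build_gz_path` and `diagnose` are hypothetical names used only for this sketch:

```python
import os


def build_gz_path(gzdir, filename):
    # Same concatenation as get_seqs_from_gz in the traceback:
    # gzip.open(gzdir + "/" + filename, "r")
    return gzdir + "/" + filename


def diagnose(gzdir, filename):
    """Describe whether the path PyPHLAWD would open exists and looks like a .gz file."""
    path = build_gz_path(gzdir, filename)
    if os.path.exists(path):
        return "OK: " + path
    if not filename.endswith(".gz"):
        return "MISSING: " + path + " (name does not look like a .gz file)"
    return "MISSING: " + path


# "-s ~/" expands to "/home/snail/", hence the doubled slash in the error;
# POSIX treats "//" inside a path like "/", so the slash is not the problem.
print(os.path.normpath(build_gz_path("/home/snail/", "seqs.example")))
print(diagnose("/home/snail/", "seqs.Viviparus subpurpureus voucher USNM 1292588"))
```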

Steps taken: I followed the steps on the Install page, built phlawd_db_maker and all its dependencies without errors, and built the database with phlawd_db_maker, also without errors. I then followed the directions on the Runs page for a clustering analysis. Python version is 3.8.10.

I know Python pretty well, so if I find a fix I'll make a pull request.


hmarx · 2022-07-19T22:18:11Z

I'm having this same issue on Python 3.9.13. Have there been any updates?

teagerv · 2022-07-23T18:07:53Z

Solution: I figured it out. If you're subsetting taxa, you have to make a file with the NCBI IDs you want to include; otherwise no sequences are populated (this is described in the 'Runs' doc). I don't know why I decided that wasn't relevant the last time I looked at this...
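Since a missing or empty list means nothing gets populated, it may help to check the file before passing it with `-f`. A minimal sketch, assuming the one-ID-per-line layout that the Biopython script in this comment writes (whether PyPHLAWD expects taxon IDs or accessions is not settled in this thread); `read_taxa_list` is a hypothetical helper, not part of PyPHLAWD:

```python
import sys


def read_taxa_list(path):
    """Return the non-empty, de-duplicated lines of a one-ID-per-line file, in order."""
    seen = set()
    ids = []
    with open(path) as fh:
        for line in fh:
            item = line.strip()  # drop newlines and stray spaces
            if item and item not in seen:
                seen.add(item)
                ids.append(item)
    return ids


if __name__ == "__main__" and len(sys.argv) > 1:
    ids = read_taxa_list(sys.argv[1])
    print(f"{len(ids)} unique IDs; first few: {ids[:5]}")
```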

There is a helper script if you already have a file with all the names, but I just used a quick Biopython script to pull the IDs, and it's running now:

from Bio import Entrez

def main():
    # NCBI asks for a contact address on E-utilities requests; use your own
    Entrez.email = "your.name@example.com"
    db_type = 'nucleotide'
    search_terms = '(Architaenioglossa[Orgn])'
    output_file = '/home/snail/Desktop/architaenioglossa_taxalist.txt'

    returned_ids = esearch(search_terms, db_type)
    make_taxalist(returned_ids, output_file)

def esearch(search_terms, db_type):
    # retmax must be at least the number of matching records, or IdList is truncated
    handle = Entrez.esearch(db=db_type, term=search_terms, idtype="acc", retmax=10000)
    record = Entrez.read(handle)
    handle.close()
    print('Search returned %s results.\n' % record["Count"])

    return record["IdList"]

def make_taxalist(ids, output):
    # Opens in append mode, so delete an old list before re-running the same search
    with open(output, 'a') as fh:
        for i in ids:
            fh.write(f'{i}\n')

if __name__ == '__main__':
    main()

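Just set your search terms to the subset you want, set retmax to at least the number of matching records (this searches the nucleotide database, so that is the number of sequences, not taxa), and put in an email address (NCBI asks for one with E-utilities requests).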
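Rather than guessing a retmax large enough for every record, ESearch's `retstart` parameter can be used to page through the results. A sketch under the assumption that the same nucleotide/accession search as above is wanted; `esearch_all` and `page_offsets` are hypothetical names, and NCBI may limit how far paging goes for very large result sets, so check the E-utilities documentation for your database:

```python
def page_offsets(total, page_size):
    """Return the retstart values needed to cover `total` records in pages of `page_size`."""
    return list(range(0, total, page_size))


def esearch_all(term, db="nucleotide", page_size=5000):
    """Collect every matching accession by paging with retstart/retmax.

    Set Entrez.email before calling. Biopython is imported here so that
    page_offsets can be used without Biopython installed.
    """
    from Bio import Entrez

    # retmax=0 returns only the total hit count
    handle = Entrez.esearch(db=db, term=term, retmax=0)
    total = int(Entrez.read(handle)["Count"])
    handle.close()

    ids = []
    for start in page_offsets(total, page_size):
        handle = Entrez.esearch(db=db, term=term, idtype="acc",
                                retstart=start, retmax=page_size)
        ids.extend(Entrez.read(handle)["IdList"])
        handle.close()
    return ids
```

The returned list can be handed to `make_taxalist` in place of the single-call `esearch` result.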

YingyingYang2019 · 2022-11-14T05:47:04Z

Hi, I have the same problem! I have provided the taxa list, but it still doesn't work. Can anyone help? Thanks!
The code and results are shown here:

yang@bdchxy-PowerEdge-M630-VRTX:~$ python application/PyPHLAWD-master/src/setup_clade_ap.py -t Fagales -b /storage/phlawd_db_maker-master/DB/pln.db -s /storage/phlawd_db_maker-master/DB -o application/PyPHLAWD-master/examples/clustered/ -l application/PyPHLAWD-master/examples/clustered/ -f ncbi_sp_ids_938.txt

STARTING PYPHLAWD (⌯꒪͒ ꌂ̇ ꒪͒)
LIMITING TO TAXA IN ncbi_sp_ids_938.txt
MAKING TREE Fagales (✧ ꒪◞౪◟꒪)
MAKING DIRS IN application/PyPHLAWD-master/examples/clustered ヾ(≧∪≦*)ノ〃
PROBLEM CREATING application/PyPHLAWD-master/examples/clustered/Fagales_3502 (゜´Д｀゜)
POPULATING DIRS application/PyPHLAWD-master/examples/clustered ₊·◟(˶╹̆ꇴ╹̆˵)◜‧･
Traceback (most recent call last):
  File "/home/yang/application/PyPHLAWD-master/src/populate_dirs_first.py", line 47, in <module>
    mfid_in(tid,DB,dirl+dirr+"/"+orig+".fas",dirl+dirr+"/"+orig+".table",gzfileloc,True,limitlist = taxalist)
  File "/home/yang/application/PyPHLAWD-master/src/get_subset_genbank.py", line 275, in make_files_with_id_internal
    idstoseq = get_seqs_from_gz(gzfileloc,fn,files_ids[fn])
  File "/home/yang/application/PyPHLAWD-master/src/get_subset_genbank.py", line 24, in get_seqs_from_gz
    fl = gzip.open(gzdir+"/"+filename,"r")
  File "/home/yang/anaconda3/envs/python3.8/lib/python3.8/gzip.py", line 58, in open
    binary_file = GzipFile(filename, gz_mode, compresslevel)
  File "/home/yang/anaconda3/envs/python3.8/lib/python3.8/gzip.py", line 173, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/storage/phlawd_db_maker-master/DB//seqs.Ticodendron incognitum chloroplast rbcL gene for ribulose-1,5-bisphosphate carboxylase large subunit, partial cds.'
CREATED TEMPDIR_69418/
CLUSTERING SINGLE application/PyPHLAWD-master/examples/clustered/Fagales_3502/Fagaceae_3503/Chrysolepis_21022 (ノ′Дヾ)
Traceback (most recent call last):
  File "/home/yang/application/PyPHLAWD-master/src/cluster_tree.py", line 38, in <module>
    tablename = [x for x in files if ".table" in x][0]
IndexError: list index out of range
PYPHLAWD DONE ٩(๑˃́ꇴ˂̀๑)۶
Total time (H:M:S): 0:00:06.033473 ◦°˚(*❛‿❛)/˚°◦ (⌐■_■)
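The IndexError from cluster_tree.py is a follow-on symptom: `[x for x in files if ".table" in x][0]` fails because no `.table` file exists in that directory, most likely because the POPULATING DIRS step above crashed first. A hypothetical helper (not part of PyPHLAWD) that keeps the same matching rule but fails with a message pointing at the earlier step might look like:

```python
def find_table_file(files, directory="."):
    """Return the first name containing ".table", or raise a descriptive error.

    Uses the same match rule as cluster_tree.py line 38, but replaces the bare
    IndexError with a message that points back at the populate step.
    """
    matches = [x for x in files if ".table" in x]
    if not matches:
        raise FileNotFoundError(
            f"no .table file in {directory}; "
            "the POPULATING DIRS step probably failed earlier"
        )
    return matches[0]
```

With this, the real fix is still upstream: the FileNotFoundError in get_subset_genbank.py has to be resolved first.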

bheimbu · 2023-01-02T13:00:27Z

Hi and a happy new year,

I'm experiencing the same issue; any help would be highly appreciated!

It would also be nice if the website (https://fephyfofum.github.io/PyPHLAWD/) could be updated, since setup_clade.py no longer exists (it is now called setup_clade_ap.py).

Cheers Bastian

YingyingYang2019 · 2023-01-03T09:04:46Z

Hi bheimbu! Happy new year!
Regarding your question (" I'm experiencing the same issue, any help would be highly appreciated?! It would also be nice if the website (https://fephyfofum.github.io/PyPHLAWD/) could be updated as there is no more setup_clade.py (which is now called setup_clade_ap.py)."): mine works with the old version of PyPHLAWD, so if you have an old version, you could try it. The new version doesn't work well at the moment. Good luck!

Yingyya

bheimbu · 2023-01-03T10:05:24Z

Hi @YingyingYang2019,

you made my day! It's working with the old version (downloaded as source code from here).

Cheers Bastian

harsimranpadam · 2023-11-23T21:53:41Z

Hi. I'd just like to add that I was having the same trouble. If anyone figures something out, please keep me updated. I also couldn't understand how to get the genus and sequence for this; if that is possible, please let me know.
Here is the command I'm having trouble with:

python3 setup_clade_ap.py -t Laurales -b /Users/administrator_ge/Desktop/pln.db -s /Users/administrator_ge/Desktop/seq -o /Users/administrator_ge/Desktop/output -l /Users/administrator_ge/Desktop/logfile.md.gz -f /Users/administrator_ge/Desktop/taxalist.txt

STARTING PYPHLAWD ٩(⚙ȏ⚙)۶
LIMITING TO TAXA IN /Users/administrator_ge/Desktop/taxalist.txt
MAKING TREE Laurales ╰(✧∇✧)╯
MAKING DIRS IN /Users/administrator_ge/Desktop/output Σ(ノ°▽°)ノ
PROBLEM CREATING /Users/administrator_ge/Desktop/output/Laurales_3432 （；へ：）
POPULATING DIRS /Users/administrator_ge/Desktop/output Σ(*ﾉ´>ω<｡`)ﾉ
Traceback (most recent call last):
  File "/Users/administrator_ge/apps/PyPHLAWD/src/populate_dirs_first.py", line 47, in <module>
    mfid_in(tid,DB,dirl+dirr+"/"+orig+".fas",dirl+dirr+"/"+orig+".table",gzfileloc,True,limitlist = taxalist)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/administrator_ge/apps/PyPHLAWD/src/get_subset_genbank.py", line 275, in make_files_with_id_internal
    idstoseq = get_seqs_from_gz(gzfileloc,fn,files_ids[fn])
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/administrator_ge/apps/PyPHLAWD/src/get_subset_genbank.py", line 24, in get_seqs_from_gz
    fl = gzip.open(gzdir+"/"+filename,"r")
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/gzip.py", line 58, in open
    binary_file = GzipFile(filename, gz_mode, compresslevel)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/gzip.py", line 174, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/Users/administrator_ge/Desktop/seq//seqs.Hernandia nymphaeifolia trnL-trnF intergenic spacer region and trnF gene, partial sequence; chloroplast gene for chloroplast product.'
CREATED TEMPDIR_77128/
CLUSTERING SINGLE /Users/administrator_ge/Desktop/output/Laurales_3432/Hernandiaceae_22009/Gyrocarpus_13552 (ノдヽ)
Traceback (most recent call last):
  File "/Users/administrator_ge/apps/PyPHLAWD/src/cluster_tree.py", line 38, in <module>
    tablename = [x for x in files if ".table" in x][0]
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
PYPHLAWD DONE ୧༼✿ ͡◕ д ◕͡ ༽୨
Total time (H:M:S): 0:01:01.869942 ヽ(^o^)丿
(⌐■_■)
