hyb analyse fails to run #8

Open
SreeniEadara opened this issue Aug 16, 2022 · 17 comments

Comments

@SreeniEadara

SreeniEadara commented Aug 16, 2022

Hi,

I'm trying to run hyb on the example data under macOS on a 2018 MacBook Air.

I've installed all dependencies besides flexbar 2.5 using Conda (edit: flexbar 2.5 was installed manually). My list of installed packages is as follows:

blast                     2.6.0               boost1.64_2    bioconda
blat                      35                            1    bioconda
bowtie2                   2.4.5            py39he245752_2    bioconda
bzip2                     1.0.8                h0d85af4_4    conda-forge
ca-certificates           2022.6.15            h033912b_0    conda-forge
certifi                   2022.6.15        py39h6e9494a_0    conda-forge
expat                     2.4.8                h96cf925_0    conda-forge
fastqc                    0.11.9               hdfd78af_1    bioconda
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
fontconfig                2.14.0               h676cef8_0    conda-forge
freetype                  2.12.1               h3f81eb7_0    conda-forge
libcxx                    14.0.6               hce7ea42_0    conda-forge
libffi                    3.4.2                h0d85af4_5    conda-forge
libpng                    1.6.37               h5481273_4    conda-forge
libsqlite                 3.39.2               h5a3d3bf_1    conda-forge
libzlib                   1.2.12               hfe4f2af_2    conda-forge
lz4-c                     1.9.3                he49afe7_1    conda-forge
ncurses                   6.3                  h96cf925_1    conda-forge
oligoarrayaux             3.8                  h770b8ee_0    bioconda
openjdk                   17.0.3               hfa58983_1    conda-forge
openssl                   3.0.5                hb81d4ab_1    conda-forge
perl                      5.32.1          2_h0d85af4_perl5    conda-forge
pip                       22.2.2             pyhd8ed1ab_0    conda-forge
python                    3.9.13          hf8d34f4_0_cpython    conda-forge
python_abi                3.9                      2_cp39    conda-forge
readline                  8.1.2                h3899abd_0    conda-forge
setuptools                65.0.1           py39h6e9494a_0    conda-forge
sqlite                    3.39.2               hd9f0692_1    conda-forge
tbb                       2020.2               h940c156_4    conda-forge
tk                        8.6.12               h5dbffcc_0    conda-forge
tzdata                    2022c                h191b570_0    conda-forge
viennarna                 2.1.9                         0    bioconda
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
xz                        5.2.6                h775f41a_0    conda-forge
zlib                      1.2.12               hfe4f2af_2    conda-forge
zstd                      1.5.2                hb844be6_4    conda-forge

I've also configured my Conda environment to set a few useful paths on activation as follows. The paths are all unset prior to deactivation:

export DYLD_LIBRARY_PATH=$CONDA_PREFIX/flexbin/
export HYB_DB=$CONDA_PREFIX/data/db
export HYB_HOME=$CONDA_PREFIX
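
Conda runs any shell scripts it finds in `$CONDA_PREFIX/etc/conda/activate.d` and `$CONDA_PREFIX/etc/conda/deactivate.d` when an environment is activated or deactivated, which is one way to set and unset the variables above. A minimal sketch, assuming the same three variables; the helper name `write_hyb_env_hooks` and the file name `hyb_vars.sh` are made up for illustration and are not part of hyb or Conda:

```shell
#!/bin/sh
# Write activate/deactivate hook scripts into a conda environment prefix.
# write_hyb_env_hooks and hyb_vars.sh are illustrative names only.
write_hyb_env_hooks() {
    prefix=$1
    mkdir -p "$prefix/etc/conda/activate.d" "$prefix/etc/conda/deactivate.d"

    # Quoted heredoc: $CONDA_PREFIX is expanded at activation time, not now.
    cat > "$prefix/etc/conda/activate.d/hyb_vars.sh" <<'EOF'
export DYLD_LIBRARY_PATH="$CONDA_PREFIX/flexbin/"
export HYB_DB="$CONDA_PREFIX/data/db"
export HYB_HOME="$CONDA_PREFIX"
EOF

    # Undo the exports when the environment is deactivated.
    cat > "$prefix/etc/conda/deactivate.d/hyb_vars.sh" <<'EOF'
unset DYLD_LIBRARY_PATH HYB_DB HYB_HOME
EOF
}
```

With the environment active, this could be called once as `write_hyb_env_hooks "$CONDA_PREFIX"`.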

I've also configured sra-tools, and changed the shebang line at the top of sam2blast to #!/usr/bin/env python3 so it can work on macOS.
All of the contents of hyb's source, including the scripts in bin, the entry in man, data, and lib have been moved to the corresponding folders in the path of the Conda environment so that they can easily be accessed upon activation.
I also used make all to make the included hOH7 database.

I am able to run all steps of the pipeline, including preprocess, check, and detect without error. Upon trying to run hyb analyse, however, I am met with the following output and am not sure what is causing this problem:

hyb: Tue Aug 16 15:47:38 EDT 2022
analyse
in=testdata.txt id=testdata format=fastq code= miss=0 qc=flexbar qual=33 link=TGGAATTCTCGGGTGCCAAGGC min=4 len=17 trim=0 filt=0 pc=0 align=bowtie2 db=hOH7 word=11 eval=0.1 ref= anti=0 type=all fold=UNAfold pref=mim hval=0.1 hmax=10 gmax=4

/usr/local/Caskroom/miniconda/base/envs/hyb/bin/hyb2fasta_bits_allRNAs.awk /usr/local/Caskroom/miniconda/base/envs/hyb/data/db/hOH7.tab testdata_comp_hOH7_hybrids_ua.hyb
/usr/local/Caskroom/miniconda/base/envs/hyb/bin/hybrid-min testdata_comp_hOH7_hybrids_ua.bit_1.fasta testdata_comp_hOH7_hybrids_ua.bit_2.fasta 2>&1 > /dev/null
testdata_comp_hOH7_hybrids_ua.bit_1.fasta: No such file or directory
make: *** [testdata_comp_hOH7_hybrids_ua.bit_1.fasta-comp_hOH7_hybrids_ua.bit_2.fasta.ct] Error 1

Could you please help me understand what is causing this problem?

Thanks!

Sincerely,
Sreenivas

@tony-travis
Collaborator

tony-travis commented Aug 17, 2022 via email

@SreeniEadara
Author

SreeniEadara commented Aug 17, 2022

Hi Tony,

Awesome! Happy to hear from you.
I can definitely open a pull request containing the Conda env setup once it has been validated.

I think attachments from email replies may not make it onto GitHub Issues, would you be able to add it in a development branch on this repository?

Thanks for your help!

Sincerely,
Sreenivas

edit: removed my email, don't want it to be found by bots :)

@tony-travis
Collaborator

tony-travis commented Aug 17, 2022 via email

@SreeniEadara
Author

SreeniEadara commented Aug 20, 2022

Hi Tony,

I was able to run hyb analyse on testdata.txt and didn't encounter any errors! Would you be able to send me the expected output so I can compare it against what I have?

I ended up using WSL to install Ubuntu 20.04 LTS and followed all installation steps - further debugging didn't work on macOS when using the Conda environment.
One thing to note is that I had to install manually. I used git to clone the repository, and upon running INSTALL, it found the existing files and cleared them, but subsequently failed to get the files for hyb.

Running hyb on my own data produced the following error:

sreenieadara@DESKTOP:/mnt/d/hyb/SRR959751$ hyb preprocess qc=flexbar trim=30 len=17 min=4 check detect align=bowtie2 word=11 analyse fold=vienna in=SRR959751.fastq.gz db=hOH7
hyb: Fri Aug 19 18:31:41 PDT 2022
preprocess check detect analyse
in=SRR959751.fastq.gz id=SRR959751 format=fastq code= miss=0 qc=flexbar qual=33 link=TGGAATTCTCGGGTGCCAAGGC min=4 len=17 trim=30 filt=0 pc=0 align=bowtie2 db=hOH7 word=11 eval=0.1 ref= anti=0 type=all fold=vienna pref=mim hval=0.1 hmax=10 gmax=4
gunzip -c SRR959751.fastq.gz > SRR959751.fastq
/usr/bin/flexbar -t SRR959751_clipped_qf -r SRR959751.fastq -q 30 -as TGGAATTCTCGGGTGCCAAGGC -ao 4 -u 3 -m 17 -n 1
flexbar: the given value '30' is not in the list of allowed values [TAIL, WIN, BWA]

Available on github.com/seqan/flexbar

make: *** [/home/sreenieadara/hyb/bin/hyb:1029: SRR959751_clipped_qf.fastq] Error 1

It looks like the -q parameter may not be the correct one to use in this case. I've changed it to -qt within bin/hyb and it is currently running. I will see if this works!
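
The error above suggests that the Flexbar installed from the Ubuntu 20.04 repositories expects a trimming mode (TAIL, WIN or BWA) after `-q` and takes the numeric threshold through `-qt`, whereas hyb was written against Flexbar 2.5. A rough sketch of choosing the flag by major version; the cutoff at version 3 is an assumption based on this thread rather than on Flexbar's changelog, and both function names are hypothetical:

```shell
#!/bin/sh
# Extract the major version from a version string such as "3.5.0".
# flexbar_major is a hypothetical helper; the version string must be supplied by the caller.
flexbar_major() {
    echo "${1%%.*}"
}

# Pick the quality-threshold flag for a given Flexbar major version.
# Assumption: 2.x takes the threshold via -q, 3.x and later via -qt.
flexbar_qual_flag() {
    major=$1
    if [ "$major" -ge 3 ]; then
        echo "-qt"
    else
        echo "-q"
    fi
}
```

For example, `flexbar_qual_flag "$(flexbar_major 2.5)"` prints `-q`, which matches the command hyb generated in the log above.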

@SreeniEadara
Author

Hi Tony,

Unfortunately, the analysis is frozen at one step (over 20 hours without a change). Could you please let me know if this is expected or unexpected behavior?
I am running the following on a fastq.gz of SRR959751 received via fastq-dump.

This is running in Ubuntu 20.04 LTS.

sreenieadara@DESKTOP:/mnt/d/hyb/SRR959751$ hyb preprocess qc=flexbar trim=30 len=17 min=4 check detect align=bowtie2 word=11 analyse fold=vienna in=SRR959751.fastq.gz db=hOH7
hyb: Fri Aug 19 19:07:52 PDT 2022
preprocess check detect analyse
in=SRR959751.fastq.gz id=SRR959751 format=fastq code= miss=0 qc=flexbar qual=33 link=TGGAATTCTCGGGTGCCAAGGC min=4 len=17 trim=30 filt=0 pc=0 align=bowtie2 db=hOH7 word=11 eval=0.1 ref= anti=0 type=all fold=vienna pref=mim hval=0.1 hmax=10 gmax=4
/usr/bin/flexbar -t SRR959751_clipped_qf -r SRR959751.fastq -qt 30 -as TGGAATTCTCGGGTGCCAAGGC -ao 4 -u 3 -m 17 -n 1
/home/sreenieadara/hyb/bin/solexa2fasta.awk SRR959751_clipped_qf.fastq | /home/sreenieadara/hyb/bin/fasta2tab.awk > SRR959751_clipped_qf.tab
/home/sreenieadara/hyb/bin/make_comp_fasta.pl SRR959751_clipped_qf.tab > SRR959751_comp.fasta
/usr/bin/fastqc -q -k 8 --noextract --contaminants /home/sreenieadara/hyb/data/fastqc/Contaminants SRR959751_clipped_qf.fastq
awk '{if(NR%4==2) print length($1)}' SRR959751_clipped_qf.fastq | /home/sreenieadara/hyb/bin/histogram.pl -n > SRR959751_clipped_qf.hist

Thanks!

Sincerely,
Sreenivas

@tony-travis
Collaborator

tony-travis commented Aug 22, 2022 via email

@SreeniEadara
Author

Hi Tony,

Looks like the bug fix for Vienna worked! I have the vienna package as well as the python3, python, and perl bindings installed. Not sure if those were necessary or not.

Here are the first 10 lines of the result file SRR959751_comp_hOH7_hybrids_ua_dg.hyb:

1215_2879	AAGAGGGACGGCCGGGGGCATTCGTATTGCTCCCTGGTGGTCTAGTGGTTAGGAT	-16.60	ENSG000000XXXXX_NR003286-2_RN18S1_rRNA	1	33	919	951	3.4e-08	ENSG_ENST_chr1-trna116-GluCTC_tRNA	31	55	1	25	2e-05	
1577_2209	AAGAGGGACGGCCGGGGGCTATTGCACTTGTCCCGGCCTGT	-17.68	ENSG000000XXXXX_NR003286-2_RN18S1_rRNA	1	19	919	937	0.023	MIMAT0000092_MirBase_miR-92a_microRNA	20	41	1	22	0.0005	
2046_1671	AGAGGGACAAGTGGCGTTCTATTGCACTTGTCCCGGCCTGT	-18.99	ENSG000000XXXXX_NR003286-2_RN18S1_rRNA	1	19	1446	1464	0.023	MIMAT0000092_MirBase_miR-92a_microRNA	20	41	1	22	0.0005	
3050_1082	ACTGCATTATGAGCACTTAAAGTTAAAGTGCTTATAGTGCAGGTAG	-24.37	MIMAT0004493_MirBase_miR-20a*_microRNA	1	22	1	22	0.00066	MIMAT0000075_MirBase_miR-20a_microRNA	24	46	1	23	0.00018	
3068_1076	GGAAGATAACTATACAACCTACTGCCTTCCTGAGGTAGTAGGTTGTGTGGTTTCA	-30.53	MIMAT0004482_MirBase_let-7b*_microRNA	10	30	1	21	0.0034	MIMAT0000063_MirBase_let-7b_microRNA	31	52	1	22	0.00094	
3532_922	AAGAGGGACGGCCGGGGGCATTCGTATTGCTCCCTGTGGTCTAGTGGTTAGGATT	-9.76	ENSG000000XXXXX_NR003286-2_RN18S1_rRNA	1	33	919	951	3.4e-08	ENSG_ENST_chr1-trna64-GluTTC_tRNA	32	53	1	22	0.00094	
3746_872	GCCCCTGGGCCTATCCTAGAACTTTGGGTTCCGGGGGGAGTATGGTTGC	-17.15	MIMAT0000760_MirBase_miR-331-3p_microRNA	1	21	1	21	0.0027	ENSG000000XXXXX_NR003286-2_RN18S1_rRNA	22	49	1153	1180	3.5e-07	
4016_814	AGAGGGACAAGTGGCGTTTATTGCACTTGTCCCGGCCTGT	-18.99	ENSG000000XXXXX_NR003286-2_RN18S1_rRNA	1	18	1446	1463	0.079	MIMAT0000092_MirBase_miR-92a_microRNA	19	40	1	22	0.00047	
4521_718	CGGAAGATAACTATACAACCTACTGCCTTCCTGAGGTAGTAGGTTGTGTGGTTTC	-30.53	MIMAT0004482_MirBase_let-7b*_microRNA	11	31	1	21	0.0034	MIMAT0000063_MirBase_let-7b_microRNA	32	53	1	22	0.00094	
4766_680	TCCCTGAGACCCTAACTTGTGAGTGATGGGGATCGGGGATTGC	-19.82	MIMAT0000423_MirBase_miR-125b_microRNA	1	22	1	22	0.00056	ENSG000000XXXXX_NR003286-2_RN18S1_rRNA	23	43	1598	1618	0.002	

How does this compare to the result you received?

Also, one additional question - say a miRNA is listed first, and an mRNA is listed second in a single row.
Does that mean that the chimera was a miRNA-first chimera, or are they ordered differently (i.e. alphabetical order)?

@gkudla
Owner

gkudla commented Aug 28, 2022 via email

@SreeniEadara
Author

SreeniEadara commented Aug 30, 2022

Hi Greg,

Awesome! Glad to hear that the results file is similar, and good to know that the list order indicates order in the chimera.
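
Given that order, the biotypes of the two fragments can be pulled out of a .hyb file to tell, say, microRNA-first chimeras from microRNA-second ones. A minimal sketch, assuming tab-separated rows laid out like the ten lines I posted earlier (fragment names in columns 4 and 10, with the biotype as the last underscore-separated part of each name); the column meanings are inferred from that sample, and `hyb_fragment_order` is a made-up name rather than a hyb command:

```shell
#!/bin/sh
# Print, for each row of a .hyb file, the read ID followed by the biotypes of
# the first and second fragment, in the order they are listed in the row.
# Assumes fragment names sit in columns 4 and 10 and end in "_<biotype>".
hyb_fragment_order() {
    awk -F '\t' '{
        n1 = split($4, a, "_")     # a[n1] is the last part of the first name
        n2 = split($10, b, "_")    # b[n2] is the last part of the second name
        print $1 "\t" a[n1] "\t" b[n2]
    }' "$@"
}
```

It reads the files given as arguments, or standard input if none are given, for example `hyb_fragment_order SRR959751_comp_hOH7_hybrids_ua_dg.hyb`.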

I'm a bit confused about how to build the databases needed to analyse against a different reference genome. I can rename the target filenames in the Makefile and use 'make all' to build hg38.fasta.gz (human genome) alongside the provided hOH7-microRNA.fasta.gz, but running hyb against the resulting database returns only hits between genomic loci.

Renaming hOH7-microRNA.fasta.gz to hg38-microRNA.fasta.gz, modifying the Makefile accordingly, and remaking the database produced the same result.

How would you recommend I set up the files before building the database? I am also trying to rename both files to start with hOH7 and I will see how it goes. Is there something here that I might be missing?

@gkudla
Owner

gkudla commented Aug 31, 2022 via email

@SreeniEadara
Author

Hi Greg,

Thanks!
I think I understand the process for building the databases a bit better now.

Also wanted to add that in Ubuntu 20.04 LTS within Windows Subsystem for Linux, the following line worked a bit better for BLAT installation within the INSTALL script:
make MACHTYPE=$MACHTYPE

@SreeniEadara
Author

Hi Greg,

I'm running into issues trying to use INSTALL on new Ubuntu 20.04 installations. I am able to get Hyb to work, but this involved building BLAT from source and making the default databases using 'make all'.
I believe this is because rsync isn't installed by default on 20.04 LTS, so after INSTALL clears the existing directory it cannot fetch the latest source. A modified INSTALL script worked better:

#!/bin/bash
#@(#)INSTALL  2022-08-22  A.J.Travis

#
# Install "hyb" under Ubuntu 20.04 LTS
#

# GitHub repository
export GITHUB=https://github.com/gkudla/hyb

# installation directory
if [ $USER == root ]; then
    export HYB_HOME=/usr/local/hyb
else
    export HYB_HOME=${HOME}/hyb
fi

# set PATH for "hyb" test run
export PATH=${HYB_HOME}/bin:$PATH
echo "Please add ${HYB_HOME}/bin to your PATH after running the INSTALL script"
echo "(press any key to continue...)"
read -n 1 key; echo

# download directory must be writeable
dir=$(pwd)
if [ ! -w ${dir} ]; then
    echo "$0: can't write to ${dir}"
    exit 1
fi

# check if "hyb" is already installed
if [ -e ${HYB_HOME} ]; then
    echo "$0: ${HYB_HOME} already exists - replace it?"
    read -n 1 key; echo
    if [ "$key" == "y" ]; then
        echo "alright . . . "
    else
        echo "$0: installation cancelled"
        exit 1
    fi
fi

# libpng-dev (required to compile BLAT)
if [ ! -r "/usr/include/libpng16/png.h" ]; then
    if [ $USER == root ]; then
        apt install libpng-dev
    else
        echo "$0: install libpng-dev to test hyb"
        exit 1
    fi
fi

# download and compile BLAT
wget -nc http://users.soe.ucsc.edu/~kent/src/blatSrc35.zip
unzip blatSrc35.zip
export MACHTYPE=$(arch)
mkdir -p ${HOME}/bin/${MACHTYPE}
cd blatSrc
make MACHTYPE=$MACHTYPE

# move to BLAT installation directory
if [ $USER == root ]; then
    mv -i ${HOME}/bin/${MACHTYPE}/* /usr/local/bin/
else
    export PATH=${HOME}/bin/${MACHTYPE}:${PATH}
fi

# build databases
cd ${HYB_HOME}/data/db
make

# Flexbar
if [ ! -x "$(which flexbar)" ]; then
    if [ $USER == root ]; then
        apt install flexbar
    else
        echo "$0: install flexbar to test hyb"
        exit 1
    fi
fi

# bowtie2
if [ ! -x "$(which bowtie2)" ]; then
    if [ $USER == root ]; then
        apt install bowtie2
    else
        echo "$0: install bowtie2 to test hyb"
        exit 1
    fi
fi

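# UNAfold
if [ ! -x "$(which hybrid-min)" ]; then
    if [ $USER == root ]; then
        wget http://www.unafold.org/download/oligoarrayaux-3.8.tar.bz2
        tar xf oligoarrayaux-3.8.tar.bz2
        # configure and build before installing; the subshell leaves the
        # working directory unchanged for the steps that follow
        (cd oligoarrayaux-3.8 && ./configure && make && make install)
    else
        echo "$0: install bio-linux-oligoarrayaux to test hyb"
        exit 1
    fi
fi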

# Vienna RNA
if [ ! -x "$(which RNAfold)" ]; then
    if [ $USER == root ]; then
        wget https://www.tbi.univie.ac.at/RNA/download/ubuntu/ubuntu_20_04/viennarna_2.5.1-1_amd64.deb
        gdebi viennarna_2.5.1-1_amd64.deb
    else
        echo "$0: install Vienna RNA to test hyb"
        exit 1
    fi
fi

# test
cd ${HYB_HOME}/data/fastq
hyb analyse in=testdata.txt db=hOH7

# finished
exit 0

It seems there may be a decent number of packages that have to be installed outside of the INSTALL script, including rsync, wget, make, and unzip. The steps I followed during installation are here:

Ubuntu 20.04 LTS can be installed on Windows with the following command in PowerShell (run as an administrator):

wsl --install -d Ubuntu-20.04

Upon restart, an empty Linux shell will appear. You may need to press Enter to continue the installation.
Hyb was installed as follows on Ubuntu 20.04 LTS.
First, hyb source is cloned from GitHub:

git clone https://github.com/gkudla/hyb.git

Dependencies available on apt are installed:

sudo apt update
sudo apt install wget libpng-dev flexbar bowtie2 make gcc unzip ncbi-blast+ fastqc gdebi-core rnahybrid rsync
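
Since a fresh 20.04 installation can lack several of these tools, a quick check before running INSTALL shows what still needs installing. A small sketch using the shell's `command -v`; `missing_tools` is an illustrative helper, not part of hyb:

```shell
#!/bin/sh
# Print each named command that is not found on the PATH, one per line.
# Prints nothing when every command is available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, `missing_tools rsync wget make unzip gcc` prints only the names that still need to be installed.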

Package oligoarrayaux version 3.8 is installed as follows:

wget http://www.unafold.org/download/oligoarrayaux-3.8.tar.gz
gunzip oligoarrayaux-3.8.tar.gz
tar -xvf oligoarrayaux-3.8.tar
cd oligoarrayaux-3.8
./configure
make
make check
sudo make install
make clean

The SRA (Sequence Read Archive) tools must be downloaded and unzipped:

wget --output-document sratoolkit.tar.gz https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-ubuntu64.tar.gz
tar -vxzf sratoolkit.tar.gz

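In order for the SRA tools to work, they must be added to the PATH. An export like the one below lasts only for the current shell session, so it has to be repeated in each new session or added to a startup file such as ~/.bashrc.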

export PATH=$PATH:$PWD/sratoolkit.3.0.0-ubuntu64/bin
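
To avoid retyping the export, the line can be appended to a shell startup file once. A minimal sketch that skips the append if the exact line is already present; `persist_path_entry` is a hypothetical helper name, and the choice of startup file is left to the caller:

```shell
#!/bin/sh
# Append an "export PATH" line for a directory to a shell startup file,
# unless that exact line is already there.
persist_path_entry() {
    dir=$1
    rcfile=$2
    # \$PATH stays literal so it is expanded when the startup file is read.
    line="export PATH=\$PATH:$dir"
    touch "$rcfile"
    grep -qxF "$line" "$rcfile" || echo "$line" >> "$rcfile"
}
```

For the toolkit unpacked above, the call would be something like `persist_path_entry "$PWD/sratoolkit.3.0.0-ubuntu64/bin" "$HOME/.bashrc"`.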

The SRA tools must then be configured. This only needs to be performed once. Running the following command will launch the interactive SRA tools configuration utility. Under the “Cache” tab, the directory for local file caching should be set to an empty directory.

vdb-config -i

The viennaRNA package should then be installed:

wget https://www.tbi.univie.ac.at/RNA/download/ubuntu/ubuntu_20_04/viennarna_2.5.1-1_amd64.deb
wget https://www.tbi.univie.ac.at/RNA/download/ubuntu/ubuntu_20_04/python3-rna_2.5.1-1_amd64.deb
wget https://www.tbi.univie.ac.at/RNA/download/ubuntu/ubuntu_20_04/perl-rna_2.5.1-1_amd64.deb
sudo gdebi viennarna_2.5.1-1_amd64.deb
sudo gdebi python3-rna_2.5.1-1_amd64.deb
sudo gdebi perl-rna_2.5.1-1_amd64.deb

BLAT is installed as follows:

wget -nc http://users.soe.ucsc.edu/~kent/src/blatSrc35.zip
unzip blatSrc35.zip
export MACHTYPE=$(arch)
mkdir -p ${HOME}/bin/${MACHTYPE}
cd blatSrc
make MACHTYPE=$MACHTYPE
sudo mv -i ${HOME}/bin/${MACHTYPE}/* /usr/local/bin

Hyb includes a human transcriptome and miRNA database (hOH7) by default. Databases can be built as follows:

cd data/db
make all

You can test that Hyb was installed correctly with the following:

cd ..
cd fastq
hyb analyse in=testdata.txt db=hOH7

You can check the resulting .hyb files to verify that Hyb was installed successfully. There should be four, with names ending as follows:

  • “_hybrids.hyb”
  • “_hybrids_ua.hyb”
  • “_hybrids_ua_dg.hyb”
  • “_hybrids_ua_merged.hyb”
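
The check for these four files can be scripted. A minimal sketch that looks in a directory for one file per suffix; `check_hyb_outputs` is an illustrative name, not a hyb command, and it only tests that the files exist, not that their contents are correct:

```shell
#!/bin/sh
# Report which of the four expected .hyb result files are missing from a
# directory (default: current directory). Prints one line per missing suffix
# and returns non-zero if any is absent.
check_hyb_outputs() {
    dir=${1:-.}
    rc=0
    for suffix in _hybrids.hyb _hybrids_ua.hyb _hybrids_ua_dg.hyb _hybrids_ua_merged.hyb; do
        found=0
        # An unmatched glob stays literal, so test that the path really exists.
        for f in "$dir"/*"$suffix"; do
            if [ -e "$f" ]; then
                found=1
            fi
        done
        if [ "$found" -eq 0 ]; then
            echo "missing: *$suffix"
            rc=1
        fi
    done
    return "$rc"
}
```

Run from data/fastq after the test, `check_hyb_outputs .` prints nothing and returns 0 when all four files are present.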

This procedure works well for me but may not be ideal for all users. Do you think you could post these instructions or modify the INSTALL script so that it works better on 20.04 LTS? Please let me know if there is something I am missing and INSTALL should be working normally. If you would like, I can also open a pull request to update the README with these instructions.

@tony-travis
Collaborator

tony-travis commented Sep 11, 2022 via email

@SreeniEadara
Author

Hi Tony,

Sounds good! Let me know if I can help validate a new install script on WSL.

Sincerely,
Sreenivas

@tony-travis
Collaborator

Hi, SreeniEadara.

Sorry it's taken me so long to respond: I've just updated the INSTALL script, to include the missing dependencies that you suggested. Please let me know about any issues if you try it out.

Thanks for your interest in "hyb",

Tony.

@gkudla
Owner

gkudla commented Dec 14, 2022 via email

@tony-travis
Collaborator

tony-travis commented Dec 14, 2022 via email
