Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scaffold2fasta: Assertion `expectedOverlap < (int)contigString.length()' failed #125

Open
sjackman opened this issue Jul 30, 2016 · 18 comments

Comments

@sjackman
Copy link
Contributor

Any suggestions?

sga scaffold2fasta -m 500 --write-unplaced --use-overlap -o hsapiens-scaffolds.fa -a hsapiens-graph.asqg.gz hsapiens.scaf
Graph best: 0
Graph unique: 1
Find overlaps: 4
Deleted edges for 0 super repetitive vertices
Warning: removed 2900 duplicate edges
sga: ScaffoldRecord.cpp:317: bool ScaffoldRecord::introduceGap(int, const string&, const ScaffoldLink&, std::__cxx11::string&) const: Assertion `expectedOverlap < (int)contigString.length()' failed.
Vertices: 6703921 Edges: 28309816 Islands: 136547 Tips: 625936 Monobranch: 3553099 Dibranch: 1249436 Simple: 5058208
@jts
Copy link
Owner

jts commented Jul 30, 2016

I haven't seen that one before - did you use the -mean setting for DistanceEst?

@sjackman
Copy link
Contributor Author

I did at first, and I had suspected that's what had gone wrong. But, no, I repeated the estimates with --mle and got the same error.

@jts
Copy link
Owner

jts commented Jul 30, 2016

I'm at a bit of a loss - could you send the .scaf file and .asqg.gz (or contig fasta, if the error appears with that input too). I won't be able to look until next week though.

@sjackman
Copy link
Contributor Author

sjackman commented Jul 31, 2016

Here you go:

  • ftp://ftp.bcgsc.ca/public/sjackman/hsapiens-graph.asqg.gz
  • ftp://ftp.bcgsc.ca/public/sjackman/hsapiens.scaf
sga scaffold2fasta -m 500 --write-unplaced --use-overlap -o hsapiens-scaffolds.fa -a hsapiens-graph.asqg.gz hsapiens.scaf

@sjackman
Copy link
Contributor Author

sjackman commented Oct 27, 2016

Any chance you've had some time to look into this issue, Jared?

@jts
Copy link
Owner

jts commented Oct 28, 2016

Sorry, no. My time is short these days I'm afraid. Is this blocking you from something?

@sjackman
Copy link
Contributor Author

No worries. I am sympathetic to time constraints. We wanted to include SGA in an assembly comparison of GIAB HG004. SGA will be included in the comparison of contigs. Including scaffold results hinges on this issue.

@sjackman
Copy link
Contributor Author

sjackman commented Nov 1, 2016

What's the format of the .scaf file? Perhaps I could use MergeContigs from ABySS as a hack work around to create the scaffolds FASTA file.

contig-6181580  contig-4954829,228,151.2,0,0,D  contig-197051,2178,122.2,0,1,D

@sjackman
Copy link
Contributor Author

sjackman commented Nov 1, 2016

The format is:

std::ostream& operator<<(std::ostream& out, const ScaffoldLink& link)
{
    out << link.endpointID << "," << link.distance << "," << link.stdDev << ","
        << link.edgeData.getDir() << "," << link.edgeData.getComp() << "," << link.getTypeCode();
    return out;
}

https://github.com/jts/sga/blob/master/src/Scaffold/ScaffoldLink.cpp#L118

@sjackman
Copy link
Contributor Author

sjackman commented Nov 1, 2016

Could I convert this to an ABySS path file like so, or is it not quite so simple? How do I convert the Dir and Comp components to a + / - orientation?

1    contig-6181580+ 100N contig-4954829+ 100N contig-197051-

@sjackman
Copy link
Contributor Author

sjackman commented Nov 2, 2016

Is the .scaf file the final list of which contigs go in which scaffolds with one scaffold per line, or is it a list of distance estimates that still need to be merged into larger scaffolds?

@jts
Copy link
Owner

jts commented Nov 2, 2016

The former - it is the final list of contigs that go into scaffolds.

@sjackman
Copy link
Contributor Author

sjackman commented Nov 2, 2016

Can you give me a brief description of how to convert a SGA scaf file to an ABySS path file? If you are able to convert the following .scaf record manually, I can figure it out and write the converter.

contig-1297628  contig-1106169,-24,25.3,0,1,D   contig-4496238,-47,17.3,1,1,D   contig-2839914,-92,242.9,0,1,D  contig-1247260,-62,97.3,1,1,D   contig-2437944,3840,177,0,0,D   contig-368915,112,186.4,0,1,D   contig-4976804,-306,22.8,1,0,D  contig-3455841,371,216.8,1,1,D  contig-3969805,201,94.5,0,0,D   contig-742829,-92,135.6,0,1,D   contig-3111798,11,28.9,1,1,D    contig-1750901,-79,27.5,0,0,D

Uses cases like this one are why I so badly want to a GFA standard! It would be great if scaffold2fasta and MergeContigs accepted the same file formats and were interchangeable. Please do comment on the proposed GFA 2 spec if you find the time. The format of the path record is one of the file items to nail down. GFA-spec/GFA-spec#48

@jts
Copy link
Owner

jts commented Nov 2, 2016

Hi Shaun,

Agreed about GFA2.

I describe the file format here:

https://groups.google.com/forum/#!msg/sga-users/0pqCJwpPe8A/RRvCVPKU25QJ;context-place=forum/sga-users

The orientation bit describes whether the scaffold should be built left-to-right (the first contig in the record is the first contig in the scaffold) or right-to-left (the first contig is the last record). See here:

https://github.com/jts/sga/blob/master/src/Scaffold/ScaffoldRecord.cpp#L45

I suggest you don't spend much time on this - SGA is no longer under active development and there are better short read assemblers out there.

Sorry I can't help more!

Jared

@sjackman
Copy link
Contributor Author

sjackman commented Nov 2, 2016

No worries, Jared. Quick comments are helpful. We're including SGA in this comparison to ABySS 2.0 due to its low memory usage. We're also comparing to DISCOVARdenovo, and it yields great contiguity, but takes a lot of memory and time.

From the code it appears that only the direction bit of the first link matters, and the rest are ignored. Is that correct? https://github.com/jts/sga/blob/master/src/Scaffold/ScaffoldRecord.cpp#L60

@jts
Copy link
Owner

jts commented Nov 2, 2016

Yes I believe that is correct but verify :)

On Wed, Nov 2, 2016 at 1:48 PM, Shaun Jackman [email protected]
wrote:

No worries, Jared. Quick comments are helpful. We're including SGA in this
comparison due to its low memory usage.

From the code it appears that only the direction bit of the first link
matters, and the rest are ignored. Is that correct?
https://github.com/jts/sga/blob/master/src/Scaffold/ScaffoldRecord.cpp#L60


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#125 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAXxn0Uo125sAzXYLlyxrrUIKqEG5onfks5q6MzWgaJpZM4JYrqM
.

@sjackman
Copy link
Contributor Author

sjackman commented Nov 2, 2016

Here's a hacktastic script to convert a SGA .scaf file to an ABySS .path file. Thanks for your help, Jared.

#!/bin/sh
set -eu
exec gawk -e '
        !/\t/ { next }
        {
            # Determine whether the contigs are in reverse order.
            reverse = /^[^\t]*\t[^\t]*,1,[01],D/
            # Orient the contigs.
            $1 = $1 "+"
            rc = 0
            for (i = 2; i <= NF; ++i) {
                if ($i ~ /,1,D/)
                    rc = !rc
                sub(/,.*/, rc ? "-" : "+", $i)
            }
            # Print the contig IDs and orientations.
            printf "%u", id++
            if (reverse) {
                for (i = NF; i > 0; --i)
                    printf " %s", $i
                printf "\n"
            } else {
                print " " $0
            }
        }' "$@" \
    | gsed -e 's/ /\t/;s/ / 199N /g'

sjackman added a commit to bcgsc/abyss-2.0-giab that referenced this issue Nov 2, 2016
sga scaffold2fasta blows an assertion.
See jts/sga#125
@sjackman sjackman changed the title Assertion `expectedOverlap < (int)contigString.length()' failed scaffold2fasta: Assertion `expectedOverlap < (int)contigString.length()' failed Nov 3, 2016
@jts
Copy link
Owner

jts commented Nov 3, 2016

Great, thanks for the script.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants