
Commit

Merge branch 'main' of https://github.com/theiagen/theiavalidate into main
sage-wright committed Jan 29, 2024
2 parents e802ff9 + ac70f85 commit fd4414a
Showing 49 changed files with 831 additions and 20 deletions.
Binary file added .DS_Store
Binary file not shown.
7 changes: 6 additions & 1 deletion .gitignore
@@ -160,4 +160,9 @@ cython_debug/
#.idea/

# IDE
.vscode/
.vscode/
.devcontainer

# testing files
sandbox/
file_diffs/
14 changes: 8 additions & 6 deletions README.md
@@ -72,16 +72,15 @@ column2 SET
column3 0.01
```

Currently implmented validation criteria include:
Currently implemented validation criteria include:

| validation_criteria | explanation |
| --- | --- |
| EXACT | the values in the two columns must be exactly the same; in this case `[foo,bar] != [bar,foo]` |
| SET | the values in the two columns must be the same set of values; in this case `[foo,bar] == [bar,foo]` |
| \<FLOAT\> | the values in the two columns must be within `<FLOAT>*100` of each other; e.g., 0.3 -> 30% difference allowed |
| IGNORE | the values in the two columns are assumed to match; in this case `foo == bar` |
| EXACT | The values in the two columns must be exactly the same; in this case `[foo,bar] != [bar,foo]`. When applied to columns referencing files, file contents will be compared to check if they are identical.|
| SET | The values in the two columns must be the same set of values; in this case `[foo,bar] == [bar,foo]`. When applied to columns referencing files, the lines within the files will be sorted alphabetically before comparing.|
| \<FLOAT\> | The values in the two columns must be within `<FLOAT>*100` of each other; e.g., 0.3 -> 30% difference allowed. |
| IGNORE | The values in the two columns are assumed to match; in this case `foo == bar`. |
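The table above defines four value-level criteria. As a minimal sketch (not the theiavalidate implementation), the hypothetical helper `values_match` below applies one criterion to a pair of cell values. It assumes list-valued cells are comma-separated, as in the example tables, and it assumes the `<FLOAT>` tolerance is measured relative to the first table's value; the README does not say which value the percentage is taken against.

```python
from typing import Optional


def values_match(value1: Optional[str], value2: Optional[str], criterion: str) -> bool:
    """Hypothetical helper: compare two cell values under a single criterion."""
    if criterion == "IGNORE":
        # IGNORE: the two values are assumed to match.
        return True
    if value1 is None or value2 is None:
        # A missing cell only matches another missing cell.
        return value1 == value2
    if criterion == "EXACT":
        # EXACT: order matters, so "foo,bar" != "bar,foo".
        return value1 == value2
    if criterion == "SET":
        # SET: compare comma-separated items as unordered sets.
        return set(value1.split(",")) == set(value2.split(","))
    # Any other criterion is read as a float tolerance, e.g. "0.3" -> 30%.
    tolerance = float(criterion)
    a, b = float(value1), float(value2)
    # Assumption: the difference is taken relative to the first table's value.
    return abs(a - b) <= tolerance * abs(a)


print(values_match("4783605", "4783610", "0.01"))  # True
print(values_match("tet(A),aph(6)-Id", "aph(6)-Id,tet(A)", "SET"))  # True
print(values_match("tet(A),aph(6)-Id", "aph(6)-Id,tet(A)", "EXACT"))  # False
```

With the example data in this commit, sample03's assembly lengths (4719410 vs 5287603) differ by about 12%, so they would fail a `0.01` tolerance under this sketch.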

Future comparisons to include `FILE-EXACT`, `FILE-SET`, `FILE-<FLOAT>`.
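The updated EXACT and SET rows extend both criteria to columns that reference files. A minimal sketch of that behavior follows, assuming the referenced files (for example the `gs://` paths in the example tables) have already been copied to local disk. `contents_match` and `files_match` are hypothetical names, not functions from theiavalidate.

```python
from pathlib import Path


def contents_match(text1: str, text2: str, criterion: str) -> bool:
    """Hypothetical helper: compare two file contents under EXACT or SET."""
    if criterion == "EXACT":
        # EXACT: the contents must be identical, character for character.
        return text1 == text2
    if criterion == "SET":
        # SET: sort the lines alphabetically first, so line order is ignored.
        return sorted(text1.splitlines()) == sorted(text2.splitlines())
    raise ValueError(f"unsupported file criterion: {criterion!r}")


def files_match(path1: str, path2: str, criterion: str) -> bool:
    """Read two local files and compare them with contents_match."""
    return contents_match(Path(path1).read_text(), Path(path2).read_text(), criterion)


# Mirrors tests/table1_files/sortmatch1-1.txt vs tests/table2_files/sortmatch1-1.txt
print(contents_match("foo\nbar\nbaz", "baz\nfoo\nbar", "SET"))    # True
print(contents_match("foo\nbar\nbaz", "baz\nfoo\nbar", "EXACT"))  # False
```

Because SET in this sketch splits with `splitlines()`, it also ignores whether a file ends with a trailing newline; EXACT does not.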

#### Optional: `column_translation`

@@ -149,3 +148,6 @@ This file (available as an HTML and PDF) is a summary of the differences between
- the number of samples failing the validation criteria

If a `validation_criteria.tsv` file was provided, definitions of the currently implemented validation criteria are provided at the bottom of the table.

#### `<sample>_<column>_diff.txt`
Shows the differing lines within mismatching files for a given sample and column. Each pair of mismatching files generates a separate file.
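The `<sample>_<column>_diff.txt` files contain unified diffs. Python's standard `difflib.unified_diff` produces that format, so the sketch below uses it to build and write one diff per mismatching pair. The helper names and the `out_dir` argument are hypothetical and not taken from theiavalidate.

```python
import difflib
from pathlib import Path
from typing import List


def make_diff(lines1: List[str], lines2: List[str], label1: str, label2: str) -> str:
    """Return a unified diff of two line lists, or "" if they are identical."""
    diff = list(difflib.unified_diff(lines1, lines2, fromfile=label1, tofile=label2, lineterm=""))
    # lineterm="" plus an explicit join keeps every header and hunk line on its own line.
    return "\n".join(diff) + "\n" if diff else ""


def write_diff_file(sample: str, column: str, path1: str, path2: str, out_dir: str = ".") -> Path:
    """Write <sample>_<column>_diff.txt for one pair of mismatching local files."""
    lines1 = Path(path1).read_text().splitlines()
    lines2 = Path(path2).read_text().splitlines()
    out_path = Path(out_dir) / f"{sample}_{column}_diff.txt"
    out_path.write_text(make_diff(lines1, lines2, path1, path2))
    return out_path


print(make_diff(["foo", "bar"], ["eggs", "spam"], "table1_files/mismatch1-1.txt", "table2_files/mismatch1-1.txt"))
```

The example diff files in this commit show the `---`, `+++` and `@@` header lines run together on a single line, which is consistent with writing `lineterm=""` output without separators; joining with newlines as above keeps each header on its own line.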
2 changes: 2 additions & 0 deletions __init__.py
@@ -0,0 +1,2 @@
__VERSION__ = "v0.0.1"
import os, sys; sys.path.append(os.path.dirname(os.path.realpath(__file__)))
@@ -0,0 +1,7 @@
column criteria
assembly_length 0.01
gambit_predicted_taxon EXACT
amrfinderplus_amr_core_genes SET
extra_column IGNORE
file_column EXACT
sort_file_column SET
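The criteria file above is a two-column table with a `column`/`criteria` header. A minimal parsing sketch follows; it assumes tab delimiters (this rendering shows spaces) and rejects any criterion that is neither a keyword nor a number. `parse_criteria` and `KEYWORDS` are hypothetical names.

```python
import csv
import io
from typing import Dict

KEYWORDS = {"EXACT", "SET", "IGNORE"}


def parse_criteria(tsv_text: str) -> Dict[str, str]:
    """Hypothetical helper: map each column name to its validation criterion."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader)  # skip the "column<TAB>criteria" header row
    criteria = {}
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        column, criterion = row[0], row[1].strip()
        if criterion not in KEYWORDS:
            float(criterion)  # raises ValueError unless it is a numeric tolerance
        criteria[column] = criterion
    return criteria


example = "column\tcriteria\nassembly_length\t0.01\nsort_file_column\tSET\n"
print(parse_criteria(example))  # {'assembly_length': '0.01', 'sort_file_column': 'SET'}
```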
@@ -0,0 +1,2 @@
amrfinderplus_amr_genes amrfinderplus_amr_core_genes
extra_column2 extra_column
@@ -0,0 +1 @@
"assembly_length,gambit_predicted_taxon,amrfinderplus_amr_core_genes,extra_column,file_column,sort_file_column"
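The two small files above appear to be a column translation (table 2 names on the left, table 1 names on the right) and a quoted, comma-separated list of columns to compare. Their file names are not shown in this view, and the direction of the translation is an assumption drawn from the example tables, where table 2 uses `amrfinderplus_amr_genes` and `extra_column2`. A minimal sketch of applying both follows; `parse_translation` and `select_columns` are hypothetical names.

```python
import csv
import io
from typing import Dict, List


def parse_translation(tsv_text: str) -> Dict[str, str]:
    """Map an old column name to a new one; assumes no header row, tab-separated."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def select_columns(header: List[str], translation: Dict[str, str], wanted: str) -> List[str]:
    """Rename header columns, then keep only those named in the comma-separated list."""
    renamed = [translation.get(name, name) for name in header]
    keep = set(wanted.strip().strip('"').split(","))
    return [name for name in renamed if name in keep]


translation = parse_translation("amrfinderplus_amr_genes\tamrfinderplus_amr_core_genes\nextra_column2\textra_column\n")
header2 = ["amrfinderplus_amr_genes", "assembly_length", "extra_column2", "file_column"]
print(select_columns(header2, translation, '"assembly_length,extra_column,amrfinderplus_amr_core_genes"'))
# ['amrfinderplus_amr_core_genes', 'assembly_length', 'extra_column']
```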
6 changes: 6 additions & 0 deletions examples/file_comparison/file_comparison_table1.tsv
@@ -0,0 +1,6 @@
entity:table1_with_files_id amrfinderplus_amr_core_genes assembly_length extra_column file_column gambit_predicted_taxon sort_file_column
sample01 tet(A),aph(6)-Id,aph(3'')-Ib 4783605 extra_value gs://path/to/table1_files/match1-1.txt Salmonella enterica gs://path/to/table1_files/match1-1.txt
sample02 glpT_E448K,gyrA_D87G,gyrA_S83L,sat2,dfrA1,parC_S80I,blaCTX-M-27 5226301 gs://path/to/table1_files/mismatch1-1.txt Shigella sonnei gs://path/to/table1_files/mismatch1-1.txt
sample03 4719410 extra_value gs://path/to/table1_files/mismatch2-1.txt Shigella gs://path/to/table1_files/sortmatch1-1.txt
sample04 sul1,aadA7,parC_S87L,gyrA_T83I 6674526 gs://path/to/table1_files/mismatch2-1.txt Pseudomonas aeruginosa gs://path/to/table1_files/mismatch1-1.txt
sample05 parC_S80Y,tet(38),mecR1,murA_G257D,fosB,gyrA_S84L,mecA 2773544 Staphylococcus aureus
6 changes: 6 additions & 0 deletions examples/file_comparison/file_comparison_table2.tsv
@@ -0,0 +1,6 @@
entity:table2_with_files_id amrfinderplus_amr_genes assembly_length extra_column2 file_column gambit_predicted_taxon sort_file_column
sample01 aph(3'')-Ib,aph(6)-Id,tet(A) 4783610 extra_value gs://path/to/table2_files/match1-1.txt Salmonella enterica gs://path/to/table2_files/match1-1.txt
sample02 glpT_E448K,gyrA_D87G,gyrA_S83L,parC_S80I,blaCTX-M-27,sat2,dfrA1 5274928 gs://path/to/table2_files/mismatch1-1.txt Shigella sonnei gs://path/to/table2_files/mismatch1-1.txt
sample03 glpT_E448K,gyrA_D87G,gyrA_S83L,sat2 5287603 gs://path/to/table2_files/mismatch2-1.txt Shigella sonnei gs://path/to/table2_files/sortmatch1-1.txt
sample04 parC_S87L,gyrA_T83I,sul1,aadA7 6674503 extra_value Pseudomonas aeruginosa
sample05 parC_S80Y,tet(38),fosB,gyrA_S84L,mecA,mecR1,murA_G257D 2771914 Staphylococcus aureus
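Both example tables are keyed by sample in their first column (`entity:table1_with_files_id` and `entity:table2_with_files_id`), so rows can be matched on that ID even though the ID headers differ. A minimal sketch of loading such a table and counting the samples whose values differ in one column follows. For brevity it uses plain equality (EXACT) rather than per-column criteria, and `load_table` and `count_mismatches` are hypothetical names.

```python
import csv
import io
from typing import Dict

Table = Dict[str, Dict[str, str]]


def load_table(tsv_text: str) -> Table:
    """Index a tab-separated table by the values in its first column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    id_column = reader.fieldnames[0]
    return {row[id_column]: {k: v for k, v in row.items() if k != id_column} for row in reader}


def count_mismatches(table1: Table, table2: Table, column: str) -> int:
    """Count samples present in both tables whose values differ in `column`."""
    shared = set(table1) & set(table2)
    return sum(1 for s in shared if table1[s].get(column) != table2[s].get(column))


t1 = load_table("entity:table1_id\tgambit_predicted_taxon\nsample03\tShigella\nsample05\tStaphylococcus aureus\n")
t2 = load_table("entity:table2_id\tgambit_predicted_taxon\nsample03\tShigella sonnei\nsample05\tStaphylococcus aureus\n")
print(count_mismatches(t1, t2, "gambit_predicted_taxon"))  # 1
```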
@@ -0,0 +1,5 @@
--- table1_files/path/to/table1_files/mismatch1-1.txt+++ table2_files/path/to/table2_files/mismatch1-1.txt@@ -1,3 +1,3 @@-foo
-bar
+eggs
+spam

@@ -0,0 +1,5 @@
--- table1_files/path/to/table1_files/mismatch1-1.txt+++ table2_files/path/to/table2_files/mismatch1-1.txt@@ -1,3 +1,3 @@-foo
-bar
+eggs
+spam

@@ -0,0 +1,3 @@
--- table1_files/path/to/table1_files/mismatch2-1.txt+++ table2_files/path/to/table2_files/mismatch2-1.txt@@ -1,2 +1 @@-1 2 3
-
+1 2
@@ -0,0 +1,4 @@
--- table1_files/path/to/table1_files/sortmatch1-1.txt+++ table2_files/path/to/table2_files/sortmatch1-1.txt@@ -1,3 +1,3 @@+baz
foo
bar
-baz
@@ -0,0 +1,8 @@
amrfinderplus_amr_core_genes amrfinderplus_amr_core_genes assembly_length assembly_length extra_column extra_column gambit_predicted_taxon gambit_predicted_taxon sort_file_column sort_file_column file_column file_column
table1_with_files.tsv table2_with_files.tsv table1_with_files.tsv table2_with_files.tsv table1_with_files.tsv table2_with_files.tsv table1_with_files.tsv table2_with_files.tsv table1_with_files.tsv table2_with_files.tsv table1_with_files.tsv table2_with_files.tsv
samples
sample01 tet(A),aph(6)-Id,aph(3'')-Ib aph(3'')-Ib,aph(6)-Id,tet(A) 4783605 4783610
sample02 glpT_E448K,gyrA_D87G,gyrA_S83L,sat2,dfrA1,parC_S80I,blaCTX-M-27 glpT_E448K,gyrA_D87G,gyrA_S83L,parC_S80I,blaCTX-M-27,sat2,dfrA1 5226301 5274928 gs://path/to/table1_files/mismatch1-1.txt gs://path/to/table2_files/mismatch1-1.txt gs://path/to/table1_files/mismatch1-1.txt gs://path/to/table2_files/mismatch1-1.txt
sample03 glpT_E448K,gyrA_D87G,gyrA_S83L,sat2 4719410 5287603 extra_value Shigella Shigella sonnei gs://path/to/table1_files/sortmatch1-1.txt gs://path/to/table2_files/sortmatch1-1.txt gs://path/to/table1_files/mismatch2-1.txt gs://path/to/table2_files/mismatch2-1.txt
sample04 sul1,aadA7,parC_S87L,gyrA_T83I parC_S87L,gyrA_T83I,sul1,aadA7 6674526 6674503 extra_value gs://path/to/table1_files/mismatch1-1.txt gs://path/to/table1_files/mismatch2-1.txt
sample05 parC_S80Y,tet(38),mecR1,murA_G257D,fosB,gyrA_S84L,mecA parC_S80Y,tet(38),fosB,gyrA_S84L,mecA,mecR1,murA_G257D 2773544 2771914
Binary file not shown.
@@ -0,0 +1,7 @@
Column assembly_length assembly_length gambit_predicted_taxon gambit_predicted_taxon amrfinderplus_amr_core_genes amrfinderplus_amr_core_genes file_column file_column sort_file_column sort_file_column
Table table1_with_files.tsv table2_with_files.tsv table1_with_files.tsv table2_with_files.tsv table1_with_files.tsv table2_with_files.tsv table1_with_files.tsv table2_with_files.tsv table1_with_files.tsv table2_with_files.tsv
sample01
sample02 gs://path/to/table1_files/mismatch1-1.txt gs://path/to/table2_files/mismatch1-1.txt gs://path/to/table1_files/mismatch1-1.txt gs://path/to/table2_files/mismatch1-1.txt
sample03 4719410.0 5287603.0 Shigella Shigella sonnei glpT_E448K,gyrA_D87G,gyrA_S83L,sat2 gs://path/to/table1_files/mismatch2-1.txt gs://path/to/table2_files/mismatch2-1.txt
sample04 gs://path/to/table1_files/mismatch2-1.txt gs://path/to/table1_files/mismatch1-1.txt
sample05
@@ -0,0 +1,6 @@
samples amrfinderplus_amr_core_genes assembly_length extra_column file_column gambit_predicted_taxon sort_file_column
sample01 tet(A),aph(6)-Id,aph(3'')-Ib 4783605 extra_value gs://path/to/table1_files/match1-1.txt Salmonella enterica gs://path/to/table1_files/match1-1.txt
sample02 glpT_E448K,gyrA_D87G,gyrA_S83L,sat2,dfrA1,parC_S80I,blaCTX-M-27 5226301 gs://path/to/table1_files/mismatch1-1.txt Shigella sonnei gs://path/to/table1_files/mismatch1-1.txt
sample03 4719410 extra_value gs://path/to/table1_files/mismatch2-1.txt Shigella gs://path/to/table1_files/sortmatch1-1.txt
sample04 sul1,aadA7,parC_S87L,gyrA_T83I 6674526 gs://path/to/table1_files/mismatch2-1.txt Pseudomonas aeruginosa gs://path/to/table1_files/mismatch1-1.txt
sample05 parC_S80Y,tet(38),mecR1,murA_G257D,fosB,gyrA_S84L,mecA 2773544 Staphylococcus aureus
@@ -0,0 +1,6 @@
samples amrfinderplus_amr_core_genes assembly_length extra_column file_column gambit_predicted_taxon sort_file_column
sample01 aph(3'')-Ib,aph(6)-Id,tet(A) 4783610 extra_value gs://path/to/table2_files/match1-1.txt Salmonella enterica gs://path/to/table2_files/match1-1.txt
sample02 glpT_E448K,gyrA_D87G,gyrA_S83L,parC_S80I,blaCTX-M-27,sat2,dfrA1 5274928 gs://path/to/table2_files/mismatch1-1.txt Shigella sonnei gs://path/to/table2_files/mismatch1-1.txt
sample03 glpT_E448K,gyrA_D87G,gyrA_S83L,sat2 5287603 gs://path/to/table2_files/mismatch2-1.txt Shigella sonnei gs://path/to/table2_files/sortmatch1-1.txt
sample04 parC_S87L,gyrA_T83I,sul1,aadA7 6674503 extra_value Pseudomonas aeruginosa
sample05 parC_S80Y,tet(38),fosB,gyrA_S84L,mecA,mecR1,murA_G257D 2771914 Staphylococcus aureus
2 changes: 2 additions & 0 deletions tests/__init__.py
@@ -0,0 +1,2 @@
__VERSION__ = "v0.0.1"
import os, sys; sys.path.append(os.path.dirname(os.path.realpath(__file__)))
3 changes: 3 additions & 0 deletions tests/table1_files/match1-1.txt
@@ -0,0 +1,3 @@
foo
bar

3 changes: 3 additions & 0 deletions tests/table1_files/match1-2.txt
@@ -0,0 +1,3 @@
baz
eggs

3 changes: 3 additions & 0 deletions tests/table1_files/match1-3.txt
@@ -0,0 +1,3 @@
spam
monty

2 changes: 2 additions & 0 deletions tests/table1_files/match2-1.txt
@@ -0,0 +1,2 @@
1 2 3

2 changes: 2 additions & 0 deletions tests/table1_files/match2-2.txt
@@ -0,0 +1,2 @@
4 5 6

2 changes: 2 additions & 0 deletions tests/table1_files/match2-3.txt
@@ -0,0 +1,2 @@
7 8 9

3 changes: 3 additions & 0 deletions tests/table1_files/mismatch1-1.txt
@@ -0,0 +1,3 @@
foo
bar

2 changes: 2 additions & 0 deletions tests/table1_files/mismatch1-2.txt
@@ -0,0 +1,2 @@
foo

4 changes: 4 additions & 0 deletions tests/table1_files/mismatch1-3.txt
@@ -0,0 +1,4 @@

spam
eggs

2 changes: 2 additions & 0 deletions tests/table1_files/mismatch2-1.txt
@@ -0,0 +1,2 @@
1 2 3

2 changes: 2 additions & 0 deletions tests/table1_files/mismatch2-2.txt
@@ -0,0 +1,2 @@
5 6 6

2 changes: 2 additions & 0 deletions tests/table1_files/mismatch2-3.txt
@@ -0,0 +1,2 @@
hello, world

3 changes: 3 additions & 0 deletions tests/table1_files/sortmatch1-1.txt
@@ -0,0 +1,3 @@
foo
bar
baz
3 changes: 3 additions & 0 deletions tests/table2_files/match1-1.txt
@@ -0,0 +1,3 @@
foo
bar

3 changes: 3 additions & 0 deletions tests/table2_files/match1-2.txt
@@ -0,0 +1,3 @@
baz
eggs

3 changes: 3 additions & 0 deletions tests/table2_files/match1-3.txt
@@ -0,0 +1,3 @@
spam
monty

2 changes: 2 additions & 0 deletions tests/table2_files/match2-1.txt
@@ -0,0 +1,2 @@
1 2 3

2 changes: 2 additions & 0 deletions tests/table2_files/match2-2.txt
@@ -0,0 +1,2 @@
4 5 6

2 changes: 2 additions & 0 deletions tests/table2_files/match2-3.txt
@@ -0,0 +1,2 @@
7 8 9

3 changes: 3 additions & 0 deletions tests/table2_files/mismatch1-1.txt
@@ -0,0 +1,3 @@
eggs
spam

3 changes: 3 additions & 0 deletions tests/table2_files/mismatch1-2.txt
@@ -0,0 +1,3 @@
foo
foo

3 changes: 3 additions & 0 deletions tests/table2_files/mismatch1-3.txt
@@ -0,0 +1,3 @@
spam

eggs
1 change: 1 addition & 0 deletions tests/table2_files/mismatch2-1.txt
@@ -0,0 +1 @@
1 2
2 changes: 2 additions & 0 deletions tests/table2_files/mismatch2-2.txt
@@ -0,0 +1,2 @@
4 5 6

1 change: 1 addition & 0 deletions tests/table2_files/mismatch2-3.txt
@@ -0,0 +1 @@
hello, world!
3 changes: 3 additions & 0 deletions tests/table2_files/sortmatch1-1.txt
@@ -0,0 +1,3 @@
baz
foo
bar