00 Accepted input formats
Variable name | Column name or prefix | Description | Mandatory |
---|---|---|---|
Protein ID | Majority.protein.IDs | unique identifier | yes |
Gene name | Gene.names | | yes |
LFQ intensity prefix | LFQIntensity_ | MaxQuant's (MQ's) 'LFQ intensity' columns | no |
Imputed intensity prefix | ImputedIntensity_ | Imputed (and potentially re-normalized) intensities | yes |
razor unique count | razorUniqueCount | MQ's 'razor+unique count' column | no |
razor unique prefix | razorUniqueCount | MQ's 'razor+unique count' column per sample | no |
p-value prefix | P.Value_ | e.g. P.Value_group1__vs__group2 | no |
adj. p-value prefix | adj.P.Val_ | e.g. adj.P.Val_group1__vs__group2 | no |
Log2 fold change prefix | logFC_ | e.g. logFC_group1__vs__group2 | yes |
avg. expression prefix | AveExpr_ | e.g. AveExpr_group1__vs__group2 | no |
comparison infix | __vs__ | see below | yes |
Quantified column | quantified | see below | no |
Potential contaminant column | Potential.contaminant | MQ's 'Potential contaminant' column | no |
- IntensityPrefix, ImputedIntensityPrefix and abundancePrefix columns are log2-transformed, and all 0s need to be converted to NaNs. No infinite (Inf) values are allowed. amica searches the column names for all intensity prefixes, so you can provide more than the default intensities. However, all intensity prefixes must have the same number of samples in order to be processed.
- ImputedIntensityPrefix should only contain filtered, imputed and normalized values.
- Quantified column: All proteins that have been quantified and pass the filters on valid values, spectraCount and razorUniqueCount thresholds are set to "+" in this column. Otherwise no value ("") is written in the column. If no quantified column is provided, complete cases (i.e., rows with no missing values) across all ImputedIntensity columns and all columns containing the group comparison infix `__vs__` are set to be quantified.
- comparisonInfix: The infix is needed to retrieve the group IDs from a group comparison (e.g. for downstream visualizations like heatmaps). The groups before and after the `__vs__` infix need to match the groups defined in the uploaded experimental design.
- razorUniqueCount is a column, razorUniquePrefix is the prefix of the count per sample, but they may very well have the same value (just like in MaxQuant's proteinGroups.txt).
- Proteins inferred from reverse hits and peptides "only identified by site modifications" are not written into amica's output. Additional columns may be added in the future, but at the moment they are not considered when uploaded.
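The first bullet above describes log2-transformed intensities in which zeros are stored as NaN and no infinite values occur. A minimal sketch of such a conversion in plain Python follows. The helper name `to_log2` is hypothetical and not part of amica, and the sketch assumes the raw intensities are on a linear scale.

```python
import math

def to_log2(raw_intensities):
    """Log2-transform linear-scale intensities; zeros and missing values become NaN."""
    transformed = []
    for value in raw_intensities:
        if value is None or math.isnan(value) or value == 0:
            # log2(0) would be -inf, which is not allowed, so store NaN instead
            transformed.append(math.nan)
        elif value < 0 or math.isinf(value):
            # negative or infinite raw intensities cannot be represented
            raise ValueError(f"invalid intensity: {value}")
        else:
            transformed.append(math.log2(value))
    return transformed

print(to_log2([1024.0, 0.0, 8.0]))  # [10.0, nan, 3.0]
```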
For MaxQuant label-free quantification (LFQ) output the following columns are parsed:
Variable name | Column name/Prefix | Comment |
---|---|---|
proteinId | Majority protein IDs | |
geneName | Gene names | |
intensityPrefix | LFQ Intensity <sample> | |
Imputed Int. prefix | gets calculated | |
abundancePrefix | iBAQ <sample> | |
razorUniqueCount | Razor + unique peptides | specific column of summarized razor+unique count |
razorUniquePrefix | Razor + unique peptides <sample> | corresponds to razor+unique count of a sample |
spectraCount | MS/MS count | |
contaminantCol | Potential contaminant | |
amica automatically filters out reverse hits and proteins only identified by site.
For FragPipe/Philosopher LFQ output the following columns are parsed:
Variable name | Column name/Prefix or Suffix | Comment |
---|---|---|
Default parameters | | |
proteinId | Protein ID | |
geneName | Gene Names | |
intensityPrefix | <sample> Razor Intensity | |
Imputed Int. prefix | gets calculated | |
abundancePrefix | | |
razorUniqueCount | Unique Stripped Peptides | |
razorUniquePrefix | <sample> Razor Spectral Count | |
spectraCount | Summarized Razor Spectral Count | |
FragPipe v16 (MSFragger v3.3, Philosopher v4.0.0) | | |
proteinId | Protein ID | |
geneName | Gene Names | |
intensityPrefix | <sample> Intensity | |
Imputed Int. prefix | gets calculated | |
abundancePrefix | | |
razorUniqueCount | Combined Total Peptides | |
razorUniquePrefix | <sample> Razor Spectral Count | |
spectraCount | Combined Spectral Count | |
FragPipe v17 (MSFragger v3.4, Philosopher v4.1.0) | | |
proteinId | Protein ID | |
geneName | Gene | |
intensityPrefix | <sample> MaxLFQ Intensity | |
Imputed Int. prefix | gets calculated | |
abundancePrefix | | |
razorUniqueCount | Combined Total Peptides | |
razorUniquePrefix | <sample> Razor Spectral Count | |
spectraCount | Combined Spectral Count | |
For FragPipe/Philosopher TMT [abundance/ratio]_protein_[normalization].tsv output the following columns are parsed:
Variable name | Column name/Prefix or Suffix | Comment |
---|---|---|
proteinId | ProteinID | |
geneName | Index | |
intensityPrefix | <sample> | There is no prefix. |
spectraCount | NumberPSM | |
For Spectronaut's PG report the following columns are parsed:
Variable name | Column name/Prefix or Suffix | Comment |
---|---|---|
proteinId | PG ProteinAccessions | |
geneName | PG Genes | |
intensityPrefix | PG Quantity <sample> | |
razorUniqueCount | PG RunEvidenceCount | non-mandatory |
razorUniquePrefix | PG NrOfPrecursorsIdentified <sample> | non-mandatory |
For DIA-NN's PG matrix the following columns are parsed:
Variable name | Column name/Prefix or Suffix | Comment |
---|---|---|
proteinId | Protein Group | |
geneName | Genes | |
intensityPrefix | <sample> | There is no prefix. |
The design file has two columns: groups and samples. The sample names in the samples column must match the sample names in the column names of the input file and must appear in the same order as in the input file.
groups | samples |
---|---|
group1 | group1_sample_1 |
group1 | group1_sample_2 |
group1 | group1_sample_3 |
group2 | group2_sample_1 |
group2 | group2_sample_2 |
group2 | group2_sample_3 |
group3 | group3_sample_1 |
group3 | group3_sample_2 |
group3 | group3_sample_3 |
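The ordering requirement on the samples column can be checked before uploading. The following is a minimal sketch in plain Python. The function `check_design_order` is a hypothetical helper, not an amica function, and it assumes the intensity columns carry a known prefix such as `LFQIntensity_`; how amica performs this matching internally is not described here.

```python
def check_design_order(column_names, design_samples, prefix):
    """Return True if the prefixed intensity columns appear in the design's sample order."""
    # Keep only columns carrying the prefix and strip it to recover the sample names
    samples_in_file = [c[len(prefix):] for c in column_names if c.startswith(prefix)]
    return samples_in_file == list(design_samples)

columns = [
    "Majority.protein.IDs", "Gene.names",
    "LFQIntensity_group1_sample_1", "LFQIntensity_group1_sample_2",
    "LFQIntensity_group2_sample_1",
]
design = ["group1_sample_1", "group1_sample_2", "group2_sample_1"]

print(check_design_order(columns, design, "LFQIntensity_"))        # True
print(check_design_order(columns, design[::-1], "LFQIntensity_"))  # False
```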
The contrast matrix tells amica which group comparisons to perform. The column names of this file can be freely chosen, but column names must be provided. For each row in this file the comparison group1-group2 is performed. To change the sign of the fold changes, switch the position of the groups in the file (e.g. group2-group1 instead of group1-group2).
group1 | group2 |
---|---|
group1 | group2 |
group1 | group3 |
group2 | group3 |
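To illustrate why the order of the groups in a row determines the sign of the fold change, here is a small sketch with made-up log2 intensities for a single protein. It only computes a difference of group means and does not reproduce amica's actual statistical testing; `log_fold_change` is a hypothetical helper.

```python
from statistics import mean

def log_fold_change(log2_first, log2_second):
    """Difference of mean log2 intensities: first group minus second group."""
    return mean(log2_first) - mean(log2_second)

# Toy log2 intensities for one protein (made-up numbers)
values = {
    "group1": [20.0, 21.0, 22.0],
    "group2": [18.0, 19.0, 20.0],
}

# Each tuple mimics one row of a contrast matrix
contrasts = [("group1", "group2"), ("group2", "group1")]
for first, second in contrasts:
    label = f"logFC_{first}__vs__{second}"
    print(label, log_fold_change(values[first], values[second]))
# logFC_group1__vs__group2 2.0
# logFC_group2__vs__group1 -2.0
```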
The specification file needs to be uploaded if a custom tab-delimited file is analyzed. The file has two columns, Variable and Pattern, which are used to set the prefixes (or suffixes) that identify the relevant columns in your data.
The following columns can be parsed:
Variable | Pattern | Mandatory |
---|---|---|
proteinId | ... | yes |
geneName | ... | yes |
intensityPrefix | ... | yes |
abundancePrefix | ... | no |
razorUniqueCount | ... | no |
razorUniquePrefix | ... | no |
spectraCount | ... | no |
contaminantCol | ... | no |
The proteinId column must only contain unique entries. If razorUniqueCount is missing, some functionality will be lost (DEqMS). It is important that the provided intensities are not log2-transformed. An example format is provided in the examples.zip file.
An example specification file is provided here (the corresponding custom file can be downloaded in the amica Input tab or from the file examples.zip):
Variable | Pattern |
---|---|
proteinId | Majority.protein.IDs |
geneName | Gene.names |
spectraCount | spectraCount |
razorUniqueCount | razorUniqueCount |
razorUniqueCountPrefix | razorUniqueCount_ |
abundancePrefix | iBAQ |
intensityPrefix | LFQIntensity_ |
contaminantCol | Potential.contaminant |
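A specification file like the one above can be sanity-checked before upload. The sketch below parses a tab-delimited specification with Python's `csv` module and verifies that the variables marked as mandatory in the table of parsable columns (proteinId, geneName, intensityPrefix) are present. `read_specification` is a hypothetical helper for illustration, not part of amica.

```python
import csv
import io

# Variables listed as mandatory in the specification table
MANDATORY = {"proteinId", "geneName", "intensityPrefix"}

def read_specification(text):
    """Parse a tab-delimited specification (Variable, Pattern) into a dict."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    spec = {row["Variable"]: row["Pattern"] for row in reader}
    missing = MANDATORY - spec.keys()
    if missing:
        raise ValueError(f"missing mandatory variables: {sorted(missing)}")
    return spec

example = (
    "Variable\tPattern\n"
    "proteinId\tMajority.protein.IDs\n"
    "geneName\tGene.names\n"
    "intensityPrefix\tLFQIntensity_\n"
)
spec = read_specification(example)
print(spec["intensityPrefix"])  # LFQIntensity_
```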
If you want to upload data into amica that has already been analyzed with a different tool or in a different context (e.g. data from RNA-Seq), you need to rename the columns of your file to amica's column names.
The following example demonstrates how to do this:
uniqueID | Gene | logExpr_sample_1 | logExpr_sample_2 | ... | logExpr_sample_n | pval_trtmt/ctrl | padj_trtmt/ctrl | logfc_trtmt/ctrl |
---|---|---|---|---|---|---|---|---|
id_1 | Gene_1 | 30 | 30.5 | ... | 28.2 | 0.00012 | 0.002 | 1.7 |
id_2 | Gene_2 | 28.6 | 28.5 | ... | 26.9 | 0.0002 | 0.003 | 1.68 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
id_p | Gene_p | 20 | 20.3 | ... | 18 | 0.99 | 0.99 | -0.02 |
The `uniqueID` column needs to be renamed to `Majority.protein.IDs`, the `Gene` column to `Gene.names`, and all `logExpr_` prefixes need to be replaced by `ImputedIntensity_` (e.g. `ImputedIntensity_sample_1, ImputedIntensity_sample_2, ..., ImputedIntensity_sample_n`).
Columns containing the results from the differential expression analysis (`pval_trtmt/ctrl, padj_trtmt/ctrl, logfc_trtmt/ctrl`) need to be adapted so that they contain the correct prefixes and the `__vs__` infix. `pval_trtmt/ctrl` has to be changed to `P.Value_trtmt__vs__ctrl`, `padj_trtmt/ctrl` to `adj.P.Val_trtmt__vs__ctrl` and `logfc_trtmt/ctrl` to `logFC_trtmt__vs__ctrl`.
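The renaming rules just described can be written as a small mapping function. This is a sketch for the example table only: `to_amica_name` is a hypothetical helper, and the source prefixes (`logExpr_`, `pval_`, `padj_`, `logfc_`) and the `/` separator are specific to this example file.

```python
def to_amica_name(column):
    """Map a column name from the example table to amica's naming scheme."""
    fixed = {"uniqueID": "Majority.protein.IDs", "Gene": "Gene.names"}
    if column in fixed:
        return fixed[column]
    if column.startswith("logExpr_"):
        return "ImputedIntensity_" + column[len("logExpr_"):]
    prefixes = {"pval_": "P.Value_", "padj_": "adj.P.Val_", "logfc_": "logFC_"}
    for old, new in prefixes.items():
        if column.startswith(old):
            # "trtmt/ctrl" becomes "trtmt__vs__ctrl"
            return new + column[len(old):].replace("/", "__vs__")
    return column  # leave unknown columns untouched

header = ["uniqueID", "Gene", "logExpr_sample_1",
          "pval_trtmt/ctrl", "padj_trtmt/ctrl", "logfc_trtmt/ctrl"]
for name in header:
    print(name, "->", to_amica_name(name))
```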
Furthermore, you could specify a `quantified` column that contains for each entry a `+` if it has been quantified, else it needs to be left empty. If none is provided, amica automatically creates one and sets a `+` in the `quantified` column for all entries that do not contain NAs in the `ImputedIntensity` and `__vs__` infix columns.
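If you prefer to create the `quantified` column yourself, the default rule described above (a `+` for complete cases across the `ImputedIntensity_` and `__vs__` columns) can be sketched as follows. This mirrors the description only and is not amica's implementation; `add_quantified` is a hypothetical helper operating on rows represented as dictionaries.

```python
import math

def add_quantified(rows):
    """Set 'quantified' to "+" for rows without missing values in the relevant columns."""
    for row in rows:
        # Relevant columns: imputed intensities and all group comparison columns
        relevant = [value for name, value in row.items()
                    if name.startswith("ImputedIntensity_") or "__vs__" in name]
        complete = all(v is not None and not math.isnan(v) for v in relevant)
        row["quantified"] = "+" if complete else ""
    return rows

rows = [
    {"ImputedIntensity_sample_1": 30.0, "logFC_trtmt__vs__ctrl": 1.7},
    {"ImputedIntensity_sample_1": math.nan, "logFC_trtmt__vs__ctrl": -0.02},
]
print([row["quantified"] for row in add_quantified(rows)])  # ['+', '']
```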
The data now looks like this:
Majority.protein.IDs | Gene.names | ImputedIntensity_sample_1 | ImputedIntensity_sample_2 | ... | ImputedIntensity_sample_n | P.Value_trtmt__vs__ctrl | adj.P.Val_trtmt__vs__ctrl | logFC_trtmt__vs__ctrl | quantified |
---|---|---|---|---|---|---|---|---|---|
id_1 | Gene_1 | 30 | 30.5 | ... | 28.2 | 0.00012 | 0.002 | 1.7 | + |
id_2 | Gene_2 | 28.6 | 28.5 | ... | 26.9 | 0.0002 | 0.003 | 1.68 | + |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
id_p | Gene_p | 20 | 20.3 | ... | 18 | 0.99 | 0.99 | -0.02 | + |
Save this file as a tab-separated txt/tsv file (you can choose any file name; amica's own output file is named `amica_protein_groups.txt` by default).
Finally, we need to create a tab-separated experimental design that assigns the samples to their appropriate groups. Here it is important to link the samples to the p-value and fold-change columns of the group comparison infixes (e.g. `logFC_trtmt__vs__ctrl` corresponds to the group comparison `trtmt` vs `ctrl`). All groups from the group comparison infixes need to be defined in the experimental design. If you have multiple `Intensity` prefixes in your amica file, it is important that all of them have the same number of samples. The sample names in the samples column of the design need to match the column names of the input file in the order of the input file.
groups | samples |
---|---|
trtmt | sample_1 |
trtmt | sample_2 |
trtmt | sample_3 |
ctrl | sample_4 |
ctrl | sample_5 |
ctrl | sample_6 |
Save this file as a tab-separated txt/tsv file (you can choose a file name of your choice). Now you can upload both files and analyze and visualize your data in amica.
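Before uploading, it can help to verify that every group named in a comparison column is defined in the design. The sketch below extracts the groups around the `__vs__` infix and reports any that are missing. Both functions are hypothetical helpers, and the list of prefixes is taken from the accepted column names listed at the top of this page.

```python
def comparison_groups(column, infix="__vs__"):
    """Split e.g. 'logFC_trtmt__vs__ctrl' into ('trtmt', 'ctrl'); None if not a comparison."""
    known_prefixes = ("P.Value_", "adj.P.Val_", "logFC_", "AveExpr_")
    for prefix in known_prefixes:
        if column.startswith(prefix) and infix in column:
            first, second = column[len(prefix):].split(infix, 1)
            return first, second
    return None

def undefined_groups(columns, design_groups):
    """Groups named in comparison columns but absent from the experimental design."""
    found = set()
    for column in columns:
        pair = comparison_groups(column)
        if pair:
            found.update(pair)
    return sorted(found - set(design_groups))

columns = ["Majority.protein.IDs", "logFC_trtmt__vs__ctrl", "P.Value_trtmt__vs__ctrl"]
print(undefined_groups(columns, ["trtmt", "ctrl"]))  # []
print(undefined_groups(columns, ["trtmt"]))          # ['ctrl']
```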