quantifying phosphopeptides/sites #37

Open
sooheon opened this issue Aug 8, 2024 · 9 comments

Comments

@sooheon

sooheon commented Aug 8, 2024

Could directLFQ be adjusted to output phosphopeptide/site level aggregations?

I'm curious how much effort this would take.

@ammarcsj
Member

ammarcsj commented Aug 12, 2024

Hi, if you provide a table that looks roughly like the one below, directLFQ will also work for site-level aggregation. In the 'protein' column, put the collapsed ptmsite_ID, and in the 'ion' column, put all the precursors that map to that site. You can then simply run directLFQ on this table:

lfq_manager.run_lfq(input_file, input_type_to_use="directlfq")

[image: example input table with 'protein' and 'ion' columns]
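The table layout described above can be sketched as follows. This is a minimal illustration, not directLFQ's own code: the site IDs, precursor names, run names, intensities, and output file name are all made up, and only the 'protein' and 'ion' column names and the `run_lfq` call come from the comment. The directLFQ README is the place to check for any file-naming or missing-value requirements of the generic input format.

```python
import csv
import os
import tempfile

# Hypothetical site-level input: one row per precursor ("ion"), grouped by
# the site ID it maps to ("protein"). All values are invented.
rows = [
    {"protein": "SITE_A", "ion": "precursor_1", "run_1": 1200.0, "run_2": 1500.0},
    {"protein": "SITE_A", "ion": "precursor_2", "run_1": 800.0, "run_2": 950.0},
    {"protein": "SITE_B", "ion": "precursor_3", "run_1": 300.0, "run_2": 280.0},
]


def write_site_table(rows, path):
    """Write rows as a tab-separated table with 'protein' and 'ion' first."""
    sample_columns = [key for key in rows[0] if key not in ("protein", "ion")]
    fieldnames = ["protein", "ion"] + sample_columns
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
    return fieldnames


# The output location is arbitrary here.
out_path = os.path.join(tempfile.mkdtemp(), "site_level.tsv")
header = write_site_table(rows, out_path)

# With directLFQ installed, the call from the comment would then be
# (left commented out so the sketch runs without the package):
# import directlfq.lfq_manager as lfq_manager
# lfq_manager.run_lfq(out_path, input_type_to_use="directlfq")
```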

Let me know if you have further questions

@sooheon
Author

sooheon commented Sep 13, 2024

Sorry, the notation inside your protein / ion columns is a little unfamiliar to me. What would be the equivalent columns from the DIA-NN report.parquet or report.tsv?

@ammarcsj
Member

Could you share a small snippet of the .tsv file here? Then I could have a look.

@sooheon
Author

sooheon commented Sep 18, 2024

Thank you, that'd be very helpful. This is report.parquet:

https://pastebin.com/raw/hf72a22j

0,0,Run0,"",AAAAAAAAAVSR2,AAAAAAAAAVSR2,AAAAAAAAAVSR,AAAAAAAAAVSR,2,0,1,500.780334,Q96JP5,Q96JP5,ZFP91_HUMAN,ZFP91,11.710395,-15.289568,11.671574,-15.114703,0.0,0.0,0.0,0.0,13325.707031,10599.387695,0.000000e+00,0.000000e+00,0.000000e+00,0.795409,0.138446,0.0,-0.116217,0.000000,0.043776,0.984055,0.784129,3.182355e+09,3.109464e+09,11.655906,11.737933,0.021625,71119.804688,71274.015625,71119.804688,71274.015625,71274.015625,0.0,0.006495,0.024844,0.017991,0.008975,0.0,0.017991,0.008975,0.0,0.0,"","",1.0,0.0,0.0,0.000785,0.002551,0.000785,0.000791,0.000660,0.000334
1,6,Run6,"",AAAAAAAAAVSR2,AAAAAAAAAVSR2,AAAAAAAAAVSR,AAAAAAAAAVSR,2,0,1,500.780334,Q96JP5,Q96JP5,ZFP91_HUMAN,ZFP91,11.710683,-15.289568,11.705577,-15.296646,0.0,0.0,0.0,0.0,13183.543945,5253.577637,1.651021e+05,6.579239e+04,7.926380e+04,0.398495,0.077357,0.0,-0.087740,0.399012,0.043776,1.137393,1.125110,3.086060e+09,3.670532e+09,11.683486,11.765830,0.038491,60222.667969,57854.832031,60222.667969,57854.832031,57854.832031,0.0,0.000533,0.003127,0.017991,0.008975,0.0,0.017991,0.008975,0.0,0.0,"","",1.0,0.0,0.0,0.000368,0.000707,0.000369,0.000371,0.000660,0.000334
2,11,Run11,"",AAAAAAAAAVSR2,AAAAAAAAAVSR2,AAAAAAAAAVSR,AAAAAAAAAVSR,2,0,1,500.780334,Q96JP5,Q96JP5,ZFP91_HUMAN,ZFP91,11.773827,-15.289568,11.740934,-15.152036,0.0,0.0,0.0,0.0,17052.261719,7179.111816,5.380993e+04,2.265433e+04,5.380993e+04,0.421006,0.899323,0.0,0.693147,0.592591,0.114301,2.006989,2.094320,2.603881e+09,2.366191e+09,11.746141,11.814918,0.022796,30615.044922,31799.070312,30615.044922,31799.070312,31799.070312,0.0,0.002044,0.011925,0.017991,0.008975,0.0,0.017991,0.008975,0.0,0.0,"","",1.0,0.0,0.0,0.000361,0.000679,0.000361,0.000364,0.000660,0.000334
3,0,Run0,"",AAAAAAALQAK2,AAAAAAALQAK2,AAAAAAALQAK,AAAAAAALQAK,2,1,1,478.779816,P36578,P36578,RL4_HUMAN,RPL4,12.151029,-12.683520,12.159771,-12.742603,0.0,0.0,0.0,0.0,275764.593750,221661.890625,5.560834e+06,4.469844e+06,3.836576e+06,0.803808,0.970348,0.0,-0.131470,0.968869,1.000000,6.345813,2.645251,2.415437e+09,2.287253e+09,12.095872,12.192030,0.042254,250738.312500,220139.078125,250738.312500,220139.078125,220139.078125,0.0,0.000256,0.000474,0.000123,0.000007,0.0,0.000076,0.000007,0.0,0.0,"","",1.0,0.0,0.0,0.000508,0.000785,0.000508,0.000510,0.000385,0.000334
4,1,Run1,"",AAAAAAALQAK2,AAAAAAALQAK2,AAAAAAALQAK,AAAAAAALQAK,2,1,1,478.779816,P36578,P36578,RL4_HUMAN,RPL4,12.154157,-12.683520,12.154436,-12.769872,0.0,0.0,0.0,0.0,129210.875000,123853.781250,2.339438e+06,2.242444e+06,1.521159e+06,0.958540,0.889785,0.0,-0.128458,0.889055,1.000000,5.630449,1.835465,1.987031e+09,1.821948e+09,12.099172,12.195401,0.041893,131105.531250,150049.578125,131105.531250,150049.578125,150049.578125,0.0,0.000515,0.000897,0.000123,0.000007,0.0,0.000076,0.000007,0.0,0.0,"","",1.0,0.0,0.0,0.000498,0.000963,0.000499,0.000501,0.000385,0.000334
5,2,Run2,"",AAAAAAALQAK2,AAAAAAALQAK2,AAAAAAALQAK,AAAAAAALQAK,2,1,1,478.779816,P36578,P36578,RL4_HUMAN,RPL4,12.148050,-12.683520,12.156453,-12.763556,0.0,0.0,0.0,0.0,217075.906250,195377.171875,4.961820e+06,4.465841e+06,3.254191e+06,0.900041,0.939773,0.0,0.008931,0.972394,1.000000,6.165475,2.350549,2.791409e+09,2.538605e+09,12.093519,12.189644,0.044612,195377.171875,183727.328125,195377.171875,183727.328125,183727.328125,0.0,0.000303,0.000453,0.000123,0.000007,0.0,0.000076,0.000007,0.0,0.0,"","",1.0,0.0,0.0,0.000427,0.000837,0.000427,0.000430,0.000385,0.000334
6,3,Run3,"",AAAAAAALQAK2,AAAAAAALQAK2,AAAAAAALQAK,AAAAAAALQAK,2,1,1,478.779816,P36578,P36578,RL4_HUMAN,RPL4,12.166359,-12.683520,12.188883,-12.822777,0.0,0.0,0.0,0.0,725813.125000,285200.156250,1.629776e+07,6.404024e+06,1.131683e+07,0.392939,0.974862,0.0,-0.192207,0.983279,1.000000,6.673508,2.873714,2.606206e+09,2.241895e+09,12.111776,12.221659,0.042378,285200.156250,203638.171875,285200.156250,203638.171875,203638.171875,0.0,0.000123,0.000246,0.000123,0.000007,0.0,0.000076,0.000007,0.0,0.0,"","",1.0,0.0,0.0,0.000379,0.000690,0.000380,0.000382,0.000385,0.000334
7,4,Run4,"",AAAAAAALQAK2,AAAAAAALQAK2,AAAAAAALQAK,AAAAAAALQAK,2,1,1,478.779816,P36578,P36578,RL4_HUMAN,RPL4,12.166615,-12.683520,12.175744,-12.792743,0.0,0.0,0.0,0.0,475374.125000,287772.906250,9.914825e+06,6.002048e+06,6.455768e+06,0.605361,0.976235,0.0,0.008973,0.961680,1.000000,6.642337,2.678076,1.680121e+09,1.750266e+09,12.112012,12.221948,0.042381,287772.906250,220811.140625,287772.906250,220811.140625,220811.140625,0.0,0.000149,0.000284,0.000123,0.000007,0.0,0.000076,0.000007,0.0,0.0,"","",1.0,0.0,0.0,0.000386,0.000714,0.000386,0.000389,0.000385,0.000334
8,5,Run5,"",AAAAAAALQAK2,AAAAAAALQAK2,AAAAAAALQAK,AAAAAAALQAK,2,1,1,478.779816,P36578,P36578,RL4_HUMAN,RPL4,12.194984,-12.683520,12.199633,-12.745639,0.0,0.0,0.0,0.0,130070.906250,160494.765625,2.350802e+06,2.900659e+06,1.208516e+06,1.233902,0.904731,0.0,-0.197559,0.867766,0.996941,5.576844,1.941219,1.407027e+09,1.371925e+09,12.140252,12.236103,0.048164,160494.765625,186142.187500,160494.765625,186142.187500,186142.187500,0.0,0.001539,0.002379,0.000123,0.000007,0.0,0.000076,0.000007,0.0,0.0,"","",1.0,0.0,0.0,0.000523,0.000929,0.000524,0.000527,0.000385,0.000334
9,6,Run6,"",AAAAAAALQAK2,AAAAAAALQAK2,AAAAAAALQAK,AAAAAAALQAK,2,1,1,478.779816,P36578,P36578,RL4_HUMAN,RPL4,12.178555,-12.683520,12.196507,-12.847833,0.0,0.0,0.0,0.0,682241.250000,280371.875000,1.458802e+07,5.995049e+06,9.695989e+06,0.410957,0.975438,0.0,-0.109065,0.985305,0.998437,6.771354,2.657729,1.650238e+09,1.608308e+09,12.123475,12.233718,0.040760,280371.875000,210758.421875,280371.875000,210758.421875,210758.421875,0.0,0.000211,0.000469,0.000123,0.000007,0.0,0.000076,0.000007,0.0,0.0,"","",1.0,0.0,0.0,0.000368,0.000707,0.000369,0.000371,0.000385,0.000334

@ammarcsj
Member

ammarcsj commented Sep 18, 2024

Thanks. However, I would need it with headers and with all available columns. Did you search with phosphorylation?

@sooheon
Author

sooheon commented Sep 18, 2024

My bad, I didn't check the columns. Yes, you can see in the Modified.Sequence output that UniMod:21 is included.

https://pastebin.com/vPmHgpz0

@ammarcsj
Member

Thanks! From what I can see, you can use Precursor.Id as the "ion" column and Protein.Sites as the "protein" column, and then reformat the table accordingly. Additional filters might be necessary, e.g. on PTM.Q.Value or PTM.Site.Confidence. It's best to check the DIA-NN manual for that.
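As a rough sketch of that reformatting step, the snippet below pivots a long-format report into a wide table with one row per (site, precursor) and one column per run. Several things here are assumptions rather than anything confirmed in this thread: the use of `Precursor.Quantity` as the intensity column, the two filter thresholds, leaving missing values as empty cells, and the invented peptide records. The DIA-NN manual should be consulted for suitable cutoffs.

```python
import csv
import io
from collections import defaultdict

# Tiny invented stand-in for a DIA-NN long-format report (one row per
# precursor per run). Only the columns used below are included.
COLUMNS = ["Run", "Precursor.Id", "Protein.Sites", "PTM.Q.Value",
           "PTM.Site.Confidence", "Precursor.Quantity"]
RECORDS = [
    ["Run0", "PEPS(UniMod:21)IDEK2", "Q96JP5:S69", "0.001", "0.95", "1000.0"],
    ["Run1", "PEPS(UniMod:21)IDEK2", "Q96JP5:S69", "0.002", "0.90", "1200.0"],
    ["Run0", "PEPS(UniMod:21)IDEK3", "Q96JP5:S69", "0.001", "0.99", "800.0"],
    ["Run1", "PEPS(UniMod:21)IDEK3", "Q96JP5:S69", "0.200", "0.40", "700.0"],
    ["Run0", "AAAAAAALQAK2", "", "0.0", "0.0", "5000.0"],  # no site assigned
]
report_text = "\n".join("\t".join(fields) for fields in [COLUMNS] + RECORDS)


def long_to_wide(handle, quantity_column="Precursor.Quantity",
                 max_ptm_q=0.01, min_site_confidence=0.75):
    """Pivot to rows of {'protein': site, 'ion': precursor, <run>: quantity}.

    The default thresholds are placeholders, not recommended values.
    """
    table = defaultdict(dict)  # (site, precursor) -> {run: quantity}
    runs = []                  # run names in order of first appearance
    for record in csv.DictReader(handle, delimiter="\t"):
        site = record["Protein.Sites"]
        if not site:
            continue  # skip precursors without a site annotation
        if float(record["PTM.Q.Value"]) > max_ptm_q:
            continue
        if float(record["PTM.Site.Confidence"]) < min_site_confidence:
            continue
        run = record["Run"]
        if run not in runs:
            runs.append(run)
        table[(site, record["Precursor.Id"])][run] = record[quantity_column]
    wide = []
    for (site, ion), per_run in table.items():
        row = {"protein": site, "ion": ion}
        for run in runs:
            row[run] = per_run.get(run, "")  # empty cell when not quantified
        wide.append(row)
    return wide


wide = long_to_wide(io.StringIO(report_text))
```

The resulting rows can be written out as a tab-separated file and passed to directLFQ in the same way as any other table with "protein" and "ion" columns.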

@sooheon
Author

sooheon commented Sep 19, 2024

Thank you. Is there any special notation for one observed peptide with multiple modifications, or are they just treated as a single unit?

@ammarcsj
Member

I believe this is accounted for in the Protein.Sites column. In your example, one listed entry is Q96JP5:S69. If there were another modified site close by, the entry would probably look something like Q96JP5:S69:Y71. Maybe you can check your table for such examples.
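One way to look for such entries is sketched below. It assumes the colon-separated form guessed at in this comment (protein accession first, then one or more sites); that format is not confirmed here, so the helper `split_site_id` is purely illustrative and should be checked against real Protein.Sites values.

```python
def split_site_id(site_id):
    """Split an assumed 'PROTEIN:SITE[:SITE...]' string into (protein, sites)."""
    protein, *sites = site_id.split(":")
    return protein, sites


# The two example strings from the comment above.
examples = ["Q96JP5:S69", "Q96JP5:S69:Y71"]

# Entries that would carry more than one site under the assumed format.
multi_site = [s for s in examples if len(split_site_id(s)[1]) > 1]
```

Under this assumption, each distinct Protein.Sites string is treated as its own unit in the "protein" column, so a doubly modified entry would be quantified separately from its singly modified counterparts.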
