Using Universe #2

lilyminium · 2019-08-31T06:07:53Z

Story: As a molecular dynamics scientist who is familiar with packages such as mdtraj and cpptraj, I want to quickly get information such as atom properties and coordinates, so that I can analyse my data.

Acceptance criteria:

I understand the differences between, and the hierarchical relationships among, a Topology, Trajectory, AtomGroup, and Universe.
~~I understand that there is a difference between a parser and a reader~~ not important (see below)
I can find a page that details the supported formats and the information MDAnalysis reads and guesses from each.
I understand important arguments for creating a Universe, e.g. guess_bonds, all_coordinates, in_memory.
I know how to use auxiliary readers, which are supported, and how to access auxiliary properties.
I understand how positions work with a trajectory and how to access specific frames.
I understand certain limitations, e.g. Universe cannot support varying numbers of atoms.
I have a link to the AtomGroup and Topology sections.

richardjgowers · 2019-09-04T19:09:57Z

Parser vs Reader is an annoyance: you can "parse coordinates" and "read a topology", so the verb doesn't disambiguate. One option is to rename these (at least in all documentation) to TopologyParser and CoordinateReader so it's extra clear what is getting parsed/read.

The Topology object itself isn't well documented, but this is partly because it's not currently a public part of the API. I don't think there's anything a user ever has to do that directly touches the object; everything is done via AtomGroup. This is for historical reasons: originally things were actually stored in AtomGroups (technically a list of Atom objects), rather than accessed via AtomGroup as they are now.

So I think you might find that the Topology object doesn't need documenting for users...

see also:

MDAnalysis/mdanalysis#2199

orbeckst · 2019-09-05T21:38:07Z

Parsers vs Readers gets even more confusing when we use a single file for both topology and coordinate information.

On the other hand, anyone doing MD knows about "topology files", so perhaps the difficulty is more in making clear what our "static" data are (atom identities, bonds, charges, ...) and what our "dynamic" ones are (positions, velocities, forces, box information, ... and auxiliaries for the advanced crowd).

I agree to drop the Topology object for right now. More important is how to make use of what the topology enables, namely bonds(), angles() etc.; this is woefully underdocumented.
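The static/dynamic split can be illustrated with a toy sketch in plain Python. This is not MDAnalysis code: `StaticData` and `ToyReader` are hypothetical names made up for illustration, and the coordinates are invented. The point is only that per-atom identities stay fixed while positions depend on which frame the reader currently points at.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticData:
    """Per-atom information that does not change between frames."""
    names: tuple
    charges: tuple
    bonds: tuple  # pairs of atom indices


@dataclass
class ToyReader:
    """Holds every frame but exposes only the one it currently points at."""
    frames: list    # one list of (x, y, z) positions per frame
    frame: int = 0  # index of the current frame

    def __len__(self):
        return len(self.frames)

    def __getitem__(self, index):
        # Indexing moves the reader to another frame; it does not copy data.
        if not 0 <= index < len(self.frames):
            raise IndexError(f"frame {index} out of range")
        self.frame = index
        return self

    @property
    def positions(self):
        # "Dynamic" data: the answer depends on the current frame.
        return self.frames[self.frame]


# Hypothetical three-site water with two frames of made-up coordinates.
static = StaticData(
    names=("OW", "HW1", "HW2"),
    charges=(-0.834, 0.417, 0.417),
    bonds=((0, 1), (0, 2)),
)
reader = ToyReader(frames=[
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0), (0.5, 1.0, 0.0)],
])

first_oxygen = reader[0].positions[0]
second_oxygen = reader[1].positions[0]
```

After `reader[1]`, `reader.positions` returns the second frame's coordinates, while `static.names` and `static.bonds` are the same no matter which frame is selected.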

orbeckst · 2019-09-05T21:57:00Z

hierarchical relationships

In MDAnalysis we talk of a hierarchy of containers: Segment > Residue > Atom and then we have containers that can span different levels: AtomGroup is "just a bunch of Atoms" and Fragment is "a bunch of atoms connected by bonds".
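As a rough illustration of the container hierarchy, here is a plain Python sketch. It assumes nothing about how MDAnalysis stores things internally; the `Atom` dataclass, the `nest` helper, and the atom names are all hypothetical. It shows a strict Segment > Residue > Atom nesting alongside an arbitrary group that cuts across those levels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    index: int
    name: str
    resid: int   # residue the atom belongs to
    segid: str   # segment that residue belongs to


# Made-up system: two segments, three residues, six atoms.
atoms = [
    Atom(0, "N", 1, "A"), Atom(1, "CA", 1, "A"),
    Atom(2, "N", 2, "A"), Atom(3, "CA", 2, "A"),
    Atom(4, "N", 1, "B"), Atom(5, "CA", 1, "B"),
]


def nest(atoms):
    """Build {segid: {resid: [atom, ...]}} from a flat list of atoms."""
    tree = {}
    for atom in atoms:
        tree.setdefault(atom.segid, {}).setdefault(atom.resid, []).append(atom)
    return tree


hierarchy = nest(atoms)

# "Just a bunch of atoms": a group that spans residues and segments.
alpha_carbons = [atom for atom in atoms if atom.name == "CA"]
```

Every atom appears exactly once in `hierarchy`, whereas a free-form group such as `alpha_carbons` can pick atoms from any residue or segment.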

lilyminium · 2019-09-09T14:28:24Z

I agree that users are unlikely to interact with a Topology.

Parsers vs Readers gets even more confusing when we use a single file for both topology and coordinate information.

One option is we could rename these (at least in all documentation) to TopologyParser and CoordinateReader so it's extra clear what is getting parsed/read.

This seems like a good solution. I think I thought it was important to include this distinction because a trajectory is usually just some kind of Reader object pointing to a frame.

In MDAnalysis we talk of a hierarchy of containers: Segment > Residue > Atom and then we have containers that can span different levels: AtomGroup is "just a bunch of Atoms" and Fragment is "a bunch of atoms connected by bonds".

@orbeckst Are fragments used anywhere but in methods for periodic boundary conditions?

orbeckst · 2019-09-09T16:23:41Z

Lily Wang wrote: "@orbeckst Are fragments used anywhere but in methods for periodic boundary conditions?"

@jbarnoud used them extensively for various things, IIRC. You should also be able to group by fragments. But I don't think they are part of the selection language. (That's another area where harmonization or at least documentation would be good: how can I do X with (1) select_atoms(), (2) methods, (3) pandas-style slicing such as ag[ag.masses < 2]?)

jbarnoud · 2019-09-09T18:30:58Z

I indeed use fragments on a regular basis because segments are very ill-defined. The meaning of a fragment varies depending on the input format, so fragment may be the most reliable way of identifying a molecule.

jbarnoud · 2019-09-11T15:36:03Z

I realized I mistyped. I meant to say that the meaning of a segment varies from one format to the other.

lilyminium · 2019-09-28T15:46:44Z

@orbeckst @jbarnoud Thanks for summarising fragments and segments for me. There's a third concept in MDAnalysis: molecules. Am I correct that fragments and molecules are synonymous in MD theory but independent in Python implementation: segments are defined by segid in the topology, molecules are defined by molnum in the topology, and fragments are defined by connectivity?

I'm unfamiliar with MD segments. In theory, are they subsets of molecules, or can segments overlap different molecules? Is it the same case in MDAnalysis' implementation?

Do the relationships in this diagram make sense? Each monospace greyscale shape is a real class in MDAnalysis, while the orange Helvetica fragment and molecule are just convenient concepts. In this diagram, a molecule is not a collection of segments, but rather a collection of residues.

These are the methods that use fragments:

AtomGroup.fragments
AtomGroup.groupby('fragments') -> This results in TypeError: Can't perform '__eq__' between objects: 'AtomGroup' and 'tuple', so not really
AtomGroup.accumulate(compound='fragments')
AtomGroup.center(compound='fragments')
AtomGroup.center_of_geometry(compound='fragments')
AtomGroup.centroid(compound='fragments')
"same fragment as xxx"

Methods that use molecules:

AtomGroup.split('molecule') --> singular
AtomGroup.accumulate(compound='molecules')
AtomGroup.center(compound='molecules')
AtomGroup.center_of_geometry(compound='molecules')
AtomGroup.centroid(compound='molecules')
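For readers unfamiliar with the `compound` keyword, the following plain Python sketch shows the idea of a per-compound center of geometry: atoms are bucketed by a compound index (a fragment or molecule number) and positions are averaged within each bucket. It is an illustration under stated assumptions, not the MDAnalysis implementation; the function name, positions, and indices are invented.

```python
from collections import defaultdict


def centers_of_geometry(positions, compound_indices):
    """Return {compound index: mean position} for atoms grouped by compound."""
    grouped = defaultdict(list)
    for position, compound in zip(positions, compound_indices):
        grouped[compound].append(position)
    return {
        compound: tuple(sum(axis) / len(members) for axis in zip(*members))
        for compound, members in grouped.items()
    }


# Five made-up atoms: the first two form compound 0, the last three compound 1.
positions = [
    (0.0, 0.0, 0.0), (2.0, 0.0, 0.0),
    (0.0, 4.0, 0.0), (0.0, 6.0, 0.0), (0.0, 8.0, 0.0),
]
fragment_indices = [0, 0, 1, 1, 1]

centers = centers_of_geometry(positions, fragment_indices)
```

Swapping `fragment_indices` for a list of molecule numbers gives the per-molecule variant; only the grouping key changes.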

jbarnoud · 2019-09-28T17:33:29Z

In principle you are right and segment, fragment, and molecule should be synonymous. In practice, however, they are not.

A fragment is, indeed, defined by the connectivity. A molecule is, for now at least, a Gromacs only concept: it describes what is defined as a molecule in a Gromacs topology. A Gromacs molecule is, in most cases, a connected ensemble of atoms but it does not have to be. The meaning of a segment is different from one file format to another.

Here is an example where all of these concepts are the same: take a multimeric protein where each monomer is attached to a ligand; you read the topology from a Gromacs TPR file. Here, each monomer and each ligand is a fragment, a molecule, and a segment.

The segments match the definition of the molecules because that is how we read them from TPR files. If we read the segments from a PDB file, then the segments correspond to the chains, so it is very likely that each segment will consist of a monomer and its ligand.

It can happen that a multimeric protein is defined in a Gromacs topology as a single molecule. While this is not the default, a user can choose to do so if they need to create specific interactions between the monomers or to make fixing periodic artefacts a little bit easier.

Finally, fragments will be clear-cut in most cases. However, it can happen that some atoms are defined as virtual particles. In such a case, these atoms will not be connected to the rest of the molecule and will appear as their own fragments. This last case can most likely count as a bug, though: MDAnalysis/mdanalysis#1954.
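The virtual-particle case can be made concrete with a small sketch. It treats fragments as connected components of the bond graph (computed here with a simple union-find) and molecules as labels taken on trust from the topology. This is a plain Python illustration, not how MDAnalysis computes fragments; `find_fragments`, the four-site water, and its `molnums` are hypothetical.

```python
def find_fragments(n_atoms, bonds):
    """Group atom indices into connected components of the bond graph."""
    parent = list(range(n_atoms))

    def root(i):
        # Follow parent links to the representative, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in bonds:
        parent[root(a)] = root(b)

    fragments = {}
    for atom in range(n_atoms):
        fragments.setdefault(root(atom), []).append(atom)
    return sorted(fragments.values())


# Hypothetical four-site water: O, H, H are bonded; the virtual site (atom 3)
# has no bonds, yet the topology labels all four atoms as one molecule.
bonds = [(0, 1), (0, 2)]
molnums = [0, 0, 0, 0]

fragments = find_fragments(4, bonds)
n_molecules = len(set(molnums))
```

Here the topology reports one molecule, but connectivity alone yields two fragments: the bonded atoms and the lone virtual site.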

So, yes, in principle, your schema is correct. But...

jbarnoud · 2019-09-28T17:34:32Z

Also, you can do atoms.groupby('molnums').

lilyminium · 2019-09-28T22:46:22Z

Thank you, @jbarnoud . Just to clarify: the difference between

take a multimeric protein where each monomer is attached to a ligand; you read the topology from a Gromacs TPR file. Here, each monomer and each ligand is a fragment, a molecule, and a segment.

and

It happens that a multimeric protein is defined in a Gromacs topology as a single molecule.

is what is included in the moleculetype definition?

richardjgowers · 2019-09-29T00:06:08Z

Fragments are defined by MDAnalysis based on bonds, so they're something we calculate as a derived quantity. Molecules are something read from Gromacs, so they're more of a primary source where we're blindly trusting the topology file. I think...

jbarnoud · 2019-09-29T07:54:56Z

@lilyminium Yes, "molecule" is based on the "moleculetype" section of a Gromacs topology.

@richardjgowers I'd say so, yes.

lilyminium · 2019-12-28T11:54:23Z

Closed by #14 and #30.

lilyminium self-assigned this Aug 31, 2019

lilyminium added data structures universe Relates to universes and removed data structures labels Aug 31, 2019

lilyminium added this to the Page on core data structures milestone Aug 31, 2019

lilyminium mentioned this issue Oct 23, 2019

Data structures #14

Merged

6 tasks

lilyminium closed this as completed Dec 28, 2019
