Using Universe #2

Closed
6 of 8 tasks
lilyminium opened this issue Aug 31, 2019 · 14 comments
Labels: universe (Relates to universes)

@lilyminium
Member

lilyminium commented Aug 31, 2019

Story: As a molecular dynamics scientist who is familiar with packages such as mdtraj and cpptraj, I want to quickly get information such as atom properties and coordinates, so that I can analyse my data.

Acceptance criteria:

  • I understand the difference between and hierarchical relationships of a Topology, Trajectory, AtomGroup, and Universe.
  • I understand that there is a difference between a parser and a reader (struck out as not important; see below)
  • I can find a page that details supported formats, and the information MDAnalysis reads and guesses from each
  • I understand important arguments for creating a Universe, e.g. guess_bonds, all_coordinates, in_memory.
  • I know how to use auxiliary readers, which are supported, and how to access auxiliary properties.
  • I understand how positions work with a trajectory and how to access specific frames.
  • I understand certain limitations, e.g. Universe cannot support varying numbers of atoms.
  • I have a link to the AtomGroup and Topology sections.
@lilyminium lilyminium self-assigned this Aug 31, 2019
@lilyminium lilyminium added data structures universe Relates to universes and removed data structures labels Aug 31, 2019
@richardjgowers
Member

Parser vs Reader is an annoyance: you can "parse coordinates" and "read a topology", so the verb doesn't disambiguate. One option is we could rename these (at least in all documentation) to TopologyParser and CoordinateReader so it's extra clear what is getting parsed/read.

The Topology object itself isn't well documented, but this is partly because it's not currently a public part of the API. I don't think there's anything that a user ever has to do which directly touches the object; all things are done via AtomGroup. (This is for historical reasons: originally things were actually stored in AtomGroups (technically a list of Atom objects), rather than accessed via AtomGroup as they are now.)

So I think you might find that the Topology object doesn't need documenting for users...

see also:

MDAnalysis/mdanalysis#2199

@orbeckst
Member

orbeckst commented Sep 5, 2019

Parsers vs Readers gets even more confusing when we use a single file for both topology and coordinate information.

On the other hand, anyone doing MD knows about "topology files", so perhaps the difficulty is more in making clear what our "static" data are (atom identities, bonds, charges, ...) and what our "dynamic" ones are (positions, velocities, forces, box information, ... and auxiliaries for the advanced crowd).

I agree to drop the Topology object for right now. More important is how to make use of what the topology enables, namely bonds(), angles() etc.; this is woefully underdocumented.
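
The static/dynamic split described above can be sketched in plain Python. This is only an illustration of the idea, not MDAnalysis code: `StaticTopology`, `FrameReader`, and `goto` are hypothetical names invented for this sketch, and the coordinates are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticTopology:
    """Hypothetical container for data that never changes between frames."""
    names: tuple
    charges: tuple
    bonds: tuple  # pairs of atom indices


@dataclass
class FrameReader:
    """Hypothetical reader that exposes one frame of dynamic data at a time."""
    frames: list  # one list of (x, y, z) positions per frame
    index: int = 0

    def goto(self, i):
        """Point the reader at frame i and return that frame's positions."""
        if not 0 <= i < len(self.frames):
            raise IndexError(f"frame {i} out of range")
        self.index = i
        return self.positions

    @property
    def positions(self):
        """Positions of the frame the reader currently points to."""
        return self.frames[self.index]


# Static data: defined once for the whole system.
topology = StaticTopology(
    names=("OW", "HW1", "HW2"),
    charges=(-0.8, 0.4, 0.4),
    bonds=((0, 1), (0, 2)),
)

# Dynamic data: one set of positions per frame (made-up numbers).
reader = FrameReader(frames=[
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0), (0.1, 1.0, 0.0)],
])

first = reader.positions   # positions of frame 0
second = reader.goto(1)    # same atoms, new positions
```

The number of atoms is fixed by the static part, which is also why a varying number of atoms per frame does not fit this model.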

@orbeckst
Member

orbeckst commented Sep 5, 2019

> hierarchical relationships

In MDAnalysis we talk of a hierarchy of containers: Segment > Residue > Atom and then we have containers that can span different levels: AtomGroup is "just a bunch of Atoms" and Fragment is "a bunch of atoms connected by bonds".

@lilyminium
Member Author

I agree that users are unlikely to interact with a Topology.

> Parsers vs Readers gets even more confusing when we use a single file for both topology and coordinate information.

> One option is we could rename these (at least in all documentation) to TopologyParser and CoordinateReader so it's extra clear what is getting parsed/read.

This seems like a good solution. I think I thought it was important to include this distinction because a trajectory is usually just some kind of Reader object pointing to a frame.

> In MDAnalysis we talk of a hierarchy of containers: Segment > Residue > Atom and then we have containers that can span different levels: AtomGroup is "just a bunch of Atoms" and Fragment is "a bunch of atoms connected by bonds".

@orbeckst Are fragments used anywhere but in methods for periodic boundary conditions?

@orbeckst
Member

orbeckst commented Sep 9, 2019 via email

@jbarnoud

jbarnoud commented Sep 9, 2019

I indeed use fragments on a regular basis because segments are very ill-defined. The meaning of a fragment varies depending on the input format, so fragment may be the most reliable way of identifying a molecule.

@jbarnoud

I realized I mistyped. I meant to say that the meaning of a segment varies from one format to the other.

@lilyminium
Member Author

@orbeckst @jbarnoud Thanks for summarising fragments and segments for me. There's a third concept in MDAnalysis: molecules. Am I correct that fragments and molecules are synonymous in MD theory but independent in Python implementation: segments are defined by segid in the topology, molecules are defined by molnum in the topology, and fragments are defined by connectivity?

I'm unfamiliar with MD segments. In theory, are they subsets of molecules, or can segments overlap different molecules? Is it the same case in MDAnalysis' implementation?

Do the relationships in this diagram make sense? Each monospace greyscale shape is a real class in MDAnalysis, while the orange Helvetica fragment and molecule are just convenient concepts. In this diagram, a molecule is not a collection of segments, but rather a collection of residues.

[Diagram: classes]

These are the methods that use fragments:

  • AtomGroup.fragments
  • AtomGroup.groupby('fragments') -> This results in TypeError: Can't perform '__eq__' between objects: 'AtomGroup' and 'tuple' so not really
  • AtomGroup.accumulate(compound='fragments')
  • AtomGroup.center(compound='fragments')
  • AtomGroup.center_of_geometry(compound='fragments')
  • AtomGroup.centroid(compound='fragments')
  • “same fragment as xxx”

Methods that use molecules:

  • AtomGroup.split('molecule') --> singular
  • AtomGroup.accumulate(compound='molecules')
  • AtomGroup.center(compound='molecules')
  • AtomGroup.center_of_geometry(compound='molecules')
  • AtomGroup.centroid(compound='molecules')
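
As I understand it, the compound keyword in the methods above asks for one result per fragment or molecule rather than a single result for the whole group. A plain Python sketch of that idea follows; it is not MDAnalysis code, `centers_by_compound` is a hypothetical helper, and the positions and ids are made up.

```python
def centers_by_compound(positions, compound_ids):
    """Return {compound id: centroid} for atoms grouped by compound id."""
    sums, counts = {}, {}
    for (x, y, z), cid in zip(positions, compound_ids):
        sx, sy, sz = sums.get(cid, (0.0, 0.0, 0.0))
        sums[cid] = (sx + x, sy + y, sz + z)
        counts[cid] = counts.get(cid, 0) + 1
    # Divide each coordinate sum by the number of atoms in that compound.
    return {cid: tuple(s / counts[cid] for s in sums[cid]) for cid in sums}


# Four atoms: the first two belong to compound 0, the last two to compound 1.
positions = [
    (0.0, 0.0, 0.0),
    (2.0, 0.0, 0.0),
    (10.0, 4.0, 0.0),
    (10.0, 6.0, 0.0),
]
fragment_ids = [0, 0, 1, 1]

centers = centers_by_compound(positions, fragment_ids)
```

The same grouping works whether the ids come from connectivity (fragments) or from the topology (molecules); only the source of the ids differs.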

@jbarnoud

In principle you are right and segment, fragment, and molecule should be synonymous. In practice, however, they are not.

A fragment is, indeed, defined by the connectivity. A molecule is, for now at least, a Gromacs only concept: it describes what is defined as a molecule in a Gromacs topology. A Gromacs molecule is, in most cases, a connected ensemble of atoms but it does not have to be. The meaning of a segment is different from one file format to another.

Here is an example where all of these concepts are the same: take a multimeric protein where each monomer is attached to a ligand; you read the topology from a Gromacs TPR file. Here, each monomer and each ligand is a fragment, a molecule, and a segment.

The segments match the definition of the molecules because that is how we read them from TPR files. If we read the segments from a PDB file, then the segments correspond to the chains, so it is very likely that each segment will consist of a monomer and its ligand.

It can happen that a multimeric protein is defined in a Gromacs topology as a single molecule. While it is not the default, a user can choose to do so if they need to create specific interactions between the monomers or to make fixing periodic artefacts a little bit easier.

Finally, the fragments will be clear-cut in most cases. However, it can happen that some atoms are defined as virtual particles. In such a case, these atoms will not be connected to the rest of the molecule and will appear as their own fragments. This last case can most likely count as a bug, though: MDAnalysis/mdanalysis#1954.

So, yes, in principle, your schema is correct. But...
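
To make "defined by the connectivity" concrete, here is a minimal plain Python sketch that groups atoms into connected components with a union-find. It is an illustration only, not how MDAnalysis computes fragments; `find_fragments` and the bond list are hypothetical.

```python
def find_fragments(n_atoms, bonds):
    """Group atom indices into fragments, i.e. sets of atoms connected by bonds."""
    parent = list(range(n_atoms))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the two components joined by each bond.
    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect atoms by the root of their component.
    groups = {}
    for atom in range(n_atoms):
        groups.setdefault(find(atom), []).append(atom)
    return sorted(groups.values())


# Atoms 0-2: a bonded three-atom molecule.
# Atom 3: a virtual particle with no bonds.
# Atoms 4-5: a bonded pair.
bonds = [(0, 1), (0, 2), (4, 5)]
fragments = find_fragments(6, bonds)
```

Here the unbonded atom 3 ends up as a fragment of its own, which is the virtual-particle situation described above.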

@jbarnoud

Also, you can do atoms.groupby('molnums').

@lilyminium
Member Author

Thank you, @jbarnoud . Just to clarify: the difference between

> take a multimeric protein where each monomer is attached to a ligand; you read the topology from a Gromacs TPR file. Here, each monomer and each ligand is a fragment, a molecule, and a segment.

and

> It happens that a multimeric protein is defined in a Gromacs topology as a single molecule.

is what is included in the moleculetype definition?

@richardjgowers
Member

richardjgowers commented Sep 29, 2019 via email

@jbarnoud

@lilyminium Yes, "molecule" is based on the "moleculetype" section of a Gromacs topology.

@richardjgowers I'd say so, yes.

@lilyminium lilyminium mentioned this issue Oct 23, 2019
6 tasks
@lilyminium
Member Author

Closed by #14 and #30.
