Handling ModelCIF (modelarchive.modelcif)

The modelcif module helps the ModelArchive team prepare deposition data before it is stored in the database. For the most part, this involves converting files from the legacy PDB format into ModelCIF, as well as refining submitted mmCIF/ModelCIF files.

Functionality can be broadly divided into accessing and editing ModelCIF files.

One note on performance: by the nature of the task addressed here, code clarity is preferred over raw efficiency. Scripts that translate user data into ModelCIF for a deposition are run once and offline, so it does not matter whether execution takes one minute or five. Preparing the data usually takes far longer than running the code itself.

Keep this in mind when implementing ModelCIF support in your own tool: you may prefer to draw inspiration from this module rather than use it directly, or use the python-modelcif package straight away.

Accessing ModelCIF (modelarchive.modelcif.access)

Functionality to access data in a ModelCIF file.

modelarchive.modelcif.access.get_table(block, category, items=None)

Get a gemmi.cif.Table from a gemmi.cif.Block for a category.

It is much more convenient to work with gemmi.cif.Table objects than with Gemmi’s loops and pairs directly. Imagine a ModelCIF file in which a certain category is represented as a loop, while another ModelCIF file stores the same category as a list of pairs. Both may be valid ModelCIF files, yet they would require two separate handlers for essentially the same data.

By using gemmi.cif.Table as a wrapper, loops and pairs can be treated uniformly, allowing you to handle both cases through a single code base.

Gemmi provides two functions to retrieve tables, find_mmcif_category() and find(). The first needs only a category name, while the second requires a category name and a list of columns to fetch. get_table() hides this difference and returns a table whether or not you provide a list of items. If a list of items is given, the resulting table contains only those columns. In addition, if the category cannot be found in block, None is returned, which feels more Pythonic than getting back an empty table of length 0.
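
To make the loop-versus-pairs point concrete, here is a plain-Python stand-in that does not use Gemmi at all. The dictionary layout and the helper name as_rows() are hypothetical and exist only for this illustration. The real work is done by gemmi.cif.Table, and this sketch merely shows why one table view removes the need for two handlers.

```python
def as_rows(category):
    """Return a category as a list of {item: value} rows.

    `category` is a hypothetical stand-in for parsed CIF data, either
    {"pairs": {item: value}} or
    {"loop": {"tags": [...], "rows": [[...], ...]}}.
    """
    if category is None:
        # Category missing: mirror get_table() returning None.
        return None
    if "pairs" in category:
        # Named pairs are simply a table with exactly one row.
        return [dict(category["pairs"])]
    tags = category["loop"]["tags"]
    return [dict(zip(tags, row)) for row in category["loop"]["rows"]]


# The same category, stored once as pairs and once as a one-row loop.
as_pairs = {"pairs": {"id": "1", "description": "test_score"}}
as_loop = {
    "loop": {"tags": ["id", "description"], "rows": [["1", "test_score"]]}
}

# One handler now covers both representations.
descriptions = [
    [row["description"] for row in as_rows(cat)]
    for cat in (as_pairs, as_loop)
]
```

Both inputs end up as the same list of rows, so downstream code never needs to know how the category was stored.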

Example

>>> from gemmi import cif
>>> from modelarchive.modelcif import access
>>> # get sample CIF data
>>> cif_data = '''data_test
... _ma_qa_metric.id 1
... _ma_qa_metric.description test_score
... loop_
... _ma_qa_metric_local.ordinal_id
... _ma_qa_metric_local.metric_value
... _ma_qa_metric_local.metric_id
... 1 1.0 1
... 2 1.5 1
... '''
>>> block = cif.read_string(cif_data).sole_block()
>>> table = access.get_table(block, "_ma_qa_metric")
>>> len(table)
1
>>> table[-1]["description"]
'test_score'
>>> table = access.get_table(
...             block,
...             "_ma_qa_metric_local",
...             items=["metric_id", "metric_value"],
...         )
>>> # table should have 2 columns and 2 rows
>>> table
<gemmi.cif.Table 2 x 2>
>>> # columns are sorted as requested, not as stored
>>> table.tags[0]
'_ma_qa_metric_local.metric_id'
>>> table.tags[1]
'_ma_qa_metric_local.metric_value'
Parameters:
  • block (gemmi.cif.Block) – CIF data block holding the categories of the CIF document.

  • category (str) – Category to fetch from block. Only a single category is supported, joins are not. Gemmi requires category names to end with ., so this function adds it if missing.

  • items (list[str]) – List of items to fetch as columns. The order of the columns follows the provided list. If None, the whole category is fetched with all its items as columns, in the order in which they appear in the CIF document.

Returns:

The requested table if category can be found, otherwise None.

Return type:

gemmi.cif.Table | None

Editing ModelCIF (modelarchive.modelcif.edit)

Functionality to extend and modify ModelCIF files.

exception modelarchive.modelcif.edit.MoveIdxToFarError(category, idx)

Bases: RuntimeError

Exception raised when a target position lies beyond the end of the category list of a block.

Primarily used by move_category() when attempting to move a category to a position that does not exist within the corresponding gemmi.cif.Block. For example, if the gemmi.cif.Block object contains 10 categories, trying to move a category to position 15 fails and raises this exception.

Parameters:
  • category (str) – Name of the category that could not be moved.

  • idx (int) – Target position to which the category was to be moved.

exception modelarchive.modelcif.edit.NotFoundCategoryError(category)

Bases: RuntimeError

Exception raised if a category cannot be found.

This exception should be raised when a function expects a specific category to exist in the corresponding gemmi.cif.Block, but the category cannot be retrieved.

Parameters:

category (str) – Name of the category that could not be found.

modelarchive.modelcif.edit.add_category(block, category, item_data, index=None, mod_cat_itms=None, raw=False)

Introduce a new category to a gemmi.cif.Block and populate it.

Add category to block using data from item_data. item_data is a dictionary with CIF item names as keys and the data for each item as values. If every item carries a single value, named pairs are created; if the items carry lists with more than one value, a loop is created. index can be used to place the category at a certain position: use an integer for a specific place in the category list, or a string of the form [after|before]:<CATEGORY> for relative positioning.
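
The pairs-versus-loop rule can be sketched in plain Python. The helper plan_category() below is hypothetical and not part of this module. It only decides which representation a given item_data would lead to and does not touch a gemmi.cif.Block. Rejecting items of unequal length is an assumption of the sketch, not documented behaviour of add_category().

```python
def plan_category(category, item_data):
    """Decide whether item_data yields named pairs or a loop."""
    # Wrap single values so every item maps to a list of values.
    columns = {
        item: list(value) if isinstance(value, (list, tuple)) else [value]
        for item, value in item_data.items()
    }
    lengths = {len(values) for values in columns.values()}
    if len(lengths) != 1:
        # Assumption of this sketch: all items need the same number of
        # values (this also rejects an empty item_data).
        raise ValueError(f"items of {category} differ in length")
    tags = [f"{category}.{item}" for item in columns]
    rows = [list(row) for row in zip(*columns.values())]
    # A single value per item means named pairs, anything longer a loop.
    kind = "pairs" if lengths == {1} else "loop"
    return kind, tags, rows


entities = plan_category(
    "_entity",
    {"id": [1, 2, 3], "type": ["polymer", "non-polymer", "water"]},
)
entry = plan_category("_entry", {"id": "1FOO"})
```

Here entities is planned as a loop with three rows, while entry becomes a single named pair, matching the two add_category() calls in the example of this section.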

Example

>>> from gemmi import cif
>>> from modelarchive.modelcif import edit
>>> # start with an empty CIF document
>>> cif_data = '''data_test
... '''
>>> block = cif.read_string(cif_data).sole_block()
>>> # let's add entities
>>> _ = edit.add_category(
...         block,
...         "_entity",
...         {
...             "id": [1, 2, 3],
...             "type": ["polymer", "non-polymer", "water"],
...         }
...     )
>>> print(block.as_string())
data_test
loop_
_entity.id
_entity.type
1 polymer
2 non-polymer
3 water

>>> # let's add an "_entry" ID before the entities
>>> _ = edit.add_category(
...         block, "_entry", {"id": "1FOO"}, index="before:_entity"
...     )
>>> print(block.as_string())
data_test
_entry.id 1FOO

loop_
_entity.id
_entity.type
1 polymer
2 non-polymer
3 water
Parameters:
  • block (gemmi.cif.Block) – CIF data block holding the categories of the CIF document.

  • category (str) – Name of the new category to be created.

  • item_data (dict[str, list[Any]|Any]) – Attributes and values to be added to the new category. Dictionary with item names as keys. Values are either a list of values or a single value. If a single value is provided (or a list containing only one element), a named key-value pair is created instead of a loop.

  • index (int|str) – Placement of the new category within block. This can be an integer for exact positioning, or a string of the form [after|before]:<CATEGORY> for relative positioning. In relative positioning, <CATEGORY> specifies the name of the category before or after which category will be placed.

  • mod_cat_itms (dict[str, set]) – A record of what has been modified: a dictionary mapping each category to the set of its items that changed. Items that already have the value of the update are not recorded. This is meant for the revision history; most likely you can ignore it.

  • raw (bool) – If True, strings containing whitespace are not quoted.

Returns:

A record of what has been modified. It is meant to be used with a revision history; most likely you can ignore it.

Return type:

dict[str, set]

Raises:

MoveIdxToFarError – If the target position is outside block. For example, if block contains 10 categories, trying to create a category at position 15 will raise this error.

modelarchive.modelcif.edit.add_column(block, category, item, callback, pos=-1, raw=False)

Extend a category with a new item and populate it using a callback.

Thinking of ModelCIF categories as tables, this function adds a new column (item) to a table that already exists in block. A caller-supplied callback is executed for each row to compute the value of the new column, so the values do not have to come from a static list.

make_res_per_chain_counter() is an example of a stateful implementation of a working callback.

The callback must have the form function(row) and return the value to be set for the item in the given row.
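
The interplay of callback and pos can be sketched with plain lists. The function add_column_sketch() is hypothetical. It works on a list of tags plus a list of rows instead of a gemmi.cif.Block, and it passes each row to the callback as a dictionary, whereas the real callback receives a gemmi.cif.Table.Row. It follows the positions described in this section: pos is 1-based and -1 appends.

```python
def add_column_sketch(tags, rows, item, callback, pos=-1):
    """Insert column `item` into a table given as tag list plus rows."""
    # pos is 1-based; -1 appends after the last existing column.
    idx = len(tags) if pos == -1 else pos - 1
    for row in rows:
        # The callback sees the row as it was before the new column
        # exists and returns the value for the new cell.
        value = callback(dict(zip(tags, row)))
        row.insert(idx, value)
    # Register the new tag only after all rows have been extended.
    tags.insert(idx, item)


tags = ["asym_id", "mon_id"]
rows = [["C", "ATP"], ["D", "HEM"]]

# A stateless callback that derives its value from existing columns.
add_column_sketch(
    tags,
    rows,
    "label",
    lambda row: f'{row["asym_id"]}:{row["mon_id"]}',
    pos=1,
)
```

With pos=1 the new column label ends up first, in front of asym_id.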

Example

>>> # Add "ndb_seq_num" to "_pdbx_nonpoly_scheme" including values
>>> # Reminder: "ndb_seq_num" -> column, "_pdbx_nonpoly_scheme" -> table
>>> from gemmi import cif
>>> from modelarchive.modelcif import edit
>>> cif_data = '''data_test
... loop_
... _pdbx_nonpoly_scheme.asym_id
... _pdbx_nonpoly_scheme.entity_id
... _pdbx_nonpoly_scheme.mon_id
... _pdbx_nonpoly_scheme.pdb_seq_num
... C 1 ATP 1
... D 2 HEM 1
... E 3 HOH 1
... E 3 HOH 2
... '''
>>> block = cif.read_string(cif_data).sole_block()
>>> edit.add_column(
...     block,
...     "_pdbx_nonpoly_scheme",
...     "ndb_seq_num",
...     edit.make_res_per_chain_counter("asym_id"),
...     pos=-1,
... )
>>> print(block.as_string())
data_test
loop_
_pdbx_nonpoly_scheme.asym_id
_pdbx_nonpoly_scheme.entity_id
_pdbx_nonpoly_scheme.mon_id
_pdbx_nonpoly_scheme.pdb_seq_num
_pdbx_nonpoly_scheme.ndb_seq_num
C 1 ATP 1 1
D 2 HEM 1 1
E 3 HOH 1 1
E 3 HOH 2 2

>>> # "ndb_seq_num" was appended as last column according to pos=-1
Parameters:
  • block (gemmi.cif.Block) – CIF data block holding the categories of the CIF document.

  • category (str) – The CIF category (table) to add the item to.

  • item (str) – The item (column) to be added.

  • callback (Callable[[gemmi.cif.Table.Row], int]) – Function to be executed to compute values for each row of the new column.

  • pos (int) – Position to insert the column at. Default is at the end (-1). Inserting at the beginning requires pos=1.

  • raw (bool) – If True, strings containing whitespace are not quoted.

Returns:

None

Raises:

NotFoundCategoryError – If category cannot be found in block.

modelarchive.modelcif.edit.make_res_per_chain_counter(asym_id_item)

Returns a stateful callback function counting residues per chain.

make_res_per_chain_counter() returns a function that can be used as callback in add_column().

The returned callback assigns consecutive residue numbers within each chain of a table, starting at 1. When the chain identifier changes between two rows while iterating over the table, the counter is reset to 1.
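
A stateful callback of this kind is typically written as a closure. The following is a minimal sketch of the idea and not the module's actual implementation. The name make_counter_sketch() is hypothetical, and plain dictionaries stand in for gemmi.cif.Table.Row objects.

```python
def make_counter_sketch(asym_id_item):
    """Return a callback numbering rows per chain, starting at 1."""
    last_chain = None  # chain seen in the previous row
    count = 0          # running number within the current chain

    def callback(row):
        nonlocal last_chain, count
        chain = row[asym_id_item]
        if chain != last_chain:
            # The chain changed between two rows: restart numbering.
            last_chain, count = chain, 0
        count += 1
        return count

    return callback


# Rows as in the example of this section: chains C, D, E and E again.
rows = [{"asym_id": chain} for chain in "CDEE"]
counter = make_counter_sketch("asym_id")
numbers = [counter(row) for row in rows]
```

Because the state lives inside the returned function, each table should get its own counter from a fresh call to the factory. Reusing one callback across tables would carry the last chain and count over.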

Example

>>> # Add item "ndb_seq_num" to category "_pdbx_nonpoly_scheme"
>>> # Reminder: "ndb_seq_num" -> column, "_pdbx_nonpoly_scheme" -> table
>>> from gemmi import cif
>>> from modelarchive.modelcif import edit
>>> cif_data = '''data_test
... loop_
... _pdbx_nonpoly_scheme.asym_id
... _pdbx_nonpoly_scheme.auth_seq_num
... _pdbx_nonpoly_scheme.entity_id
... _pdbx_nonpoly_scheme.mon_id
... _pdbx_nonpoly_scheme.pdb_seq_num
... C 1 3  ATP 1
... D 1 4  HEM 1
... E 1 5  HOH 1
... E 2 5  HOH 2
... '''
>>> block = cif.read_string(cif_data).sole_block()
>>> # Using make_res_per_chain_counter() in add_column() will add a
>>> # column to the loop_ and populate it with values:
>>> edit.add_column(
...     block,
...     "_pdbx_nonpoly_scheme",
...     "ndb_seq_num",
...     edit.make_res_per_chain_counter("asym_id"), # CALLBACK
...     pos=5,
... )
>>> print(block.as_string())
data_test
loop_
_pdbx_nonpoly_scheme.asym_id
_pdbx_nonpoly_scheme.auth_seq_num
_pdbx_nonpoly_scheme.entity_id
_pdbx_nonpoly_scheme.mon_id
_pdbx_nonpoly_scheme.ndb_seq_num
_pdbx_nonpoly_scheme.pdb_seq_num
C 1 3 ATP 1 1
D 1 4 HEM 1 1
E 1 5 HOH 1 1
E 2 5 HOH 2 2

>>> # "ndb_seq_num" is inserted as the fifth column. The ATP in chain C
>>> # ("asym_id") gets "ndb_seq_num" 1, and the HEM in chain D also gets
>>> # "ndb_seq_num" 1. The two HOH, which share chain E, get
>>> # "ndb_seq_num" 1 and 2. So counting starts at 1 for each chain
>>> # and increases by 1 for every further compound in that chain.
Parameters:

asym_id_item (str) – Name of the item holding the chain identifier.

Returns:

Callback function for use with add_column().

Return type:

Callable[[gemmi.cif.Table.Row], int]

Note

This function may be moved to a supporting module if edit gets too big.

modelarchive.modelcif.edit.move_category(block, cat, idx)

Move a category to a new position in a gemmi.cif.Block.

By design, ModelCIF files are not intended to be read or edited manually. Instead, dedicated applications should handle the format, providing functionality to view and modify the data. However, at ModelArchive we occasionally need to open ModelCIF files in an editor to inspect specific details. In such cases, it helps to have related categories grouped together, which reduces the need to jump back and forth between them. This calls for a function to reposition categories within a ModelCIF file.

move_category() takes category cat and moves it to position idx in the CIF block block. The parameter idx is somewhat special. It can be a plain integer index specifying the exact position to move cat to, which comes in handy for placing categories at the beginning (idx=0) or at the end (idx=-1) of block. However, an absolute index is often less useful in practice, as categories are typically organised relative to related categories. For this purpose, idx accepts a special syntax: [after|before]:<CATEGORY>. For example, to put category _ma_qa_metric in front of category _ma_qa_metric_local, use cat="_ma_qa_metric" with idx="before:_ma_qa_metric_local".
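
How the two forms of idx might be resolved can be sketched on a plain list of category names. Everything here is hypothetical and independent of Gemmi. The helpers resolve_target() and move_category_sketch() are illustrative names, and the built-in LookupError and IndexError stand in for NotFoundCategoryError and MoveIdxToFarError. The sketch also makes two assumptions: the target is resolved after cat has been taken out of the list, and a missing <CATEGORY> anchor is treated as an error.

```python
def resolve_target(categories, idx):
    """Translate idx into a list position within `categories`."""
    if isinstance(idx, str):
        # Relative form: "before:<CATEGORY>" or "after:<CATEGORY>".
        where, _, anchor = idx.partition(":")
        if where not in ("before", "after"):
            raise ValueError(f"unsupported position: {idx}")
        if anchor not in categories:
            raise LookupError(anchor)  # stand-in for NotFoundCategoryError
        pos = categories.index(anchor)
        return pos if where == "before" else pos + 1
    if idx == -1:
        return len(categories)  # -1 means "at the end"
    if not 0 <= idx <= len(categories):
        raise IndexError(idx)  # stand-in for MoveIdxToFarError
    return idx


def move_category_sketch(categories, cat, idx):
    """Move `cat` within the list of category names, in place."""
    if cat not in categories:
        raise LookupError(cat)  # stand-in for NotFoundCategoryError
    # Work on a copy so a failed move leaves the original list untouched.
    remaining = [name for name in categories if name != cat]
    remaining.insert(resolve_target(remaining, idx), cat)
    categories[:] = remaining


cats = ["_ma_qa_metric", "_ma_qa_metric_local"]
move_category_sketch(cats, "_ma_qa_metric_local", "before:_ma_qa_metric")
order_after_relative_move = list(cats)
move_category_sketch(cats, "_ma_qa_metric", 0)
order_after_absolute_move = list(cats)
```

The two calls mirror the example of this section: the relative move puts _ma_qa_metric_local first, and the absolute move to position 0 restores the original order.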

Example

>>> from gemmi import cif
>>> from modelarchive.modelcif import edit
>>> # get sample CIF data
>>> cif_data = '''data_test
... _ma_qa_metric.id 1
... _ma_qa_metric.description test_score
... loop_
... _ma_qa_metric_local.ordinal_id
... _ma_qa_metric_local.metric_value
... _ma_qa_metric_local.metric_id
... 1 1.0 1
... 2 1.5 1
... '''
>>> block = cif.read_string(cif_data).sole_block()
>>> # move _ma_qa_metric_local to BEFORE _ma_qa_metric
>>> edit.move_category(
...     block,
...     "_ma_qa_metric_local",
...     "before:_ma_qa_metric",
... )
>>> print(block.as_string())
data_test
loop_
_ma_qa_metric_local.ordinal_id
_ma_qa_metric_local.metric_value
_ma_qa_metric_local.metric_id
1 1.0 1
2 1.5 1

_ma_qa_metric.id 1
_ma_qa_metric.description test_score

>>> # move _ma_qa_metric to the front
>>> edit.move_category(block, "_ma_qa_metric", 0)
>>> print(block.as_string())
data_test
_ma_qa_metric.id 1
_ma_qa_metric.description test_score

loop_
_ma_qa_metric_local.ordinal_id
_ma_qa_metric_local.metric_value
_ma_qa_metric_local.metric_id
1 1.0 1
2 1.5 1
Parameters:
  • block (gemmi.cif.Block) – CIF block to operate on.

  • cat (str) – Name of the CIF category to be moved.

  • idx (int|str) – Position to move cat to. This can be an integer for exact positioning, or a string of the form [after|before]:<CATEGORY> for relative positioning. In relative positioning, <CATEGORY> specifies the name of the category before or after which cat will be placed.

Returns:

None

Raises:
  • NotFoundCategoryError – If cat cannot be found in block.

  • MoveIdxToFarError – If the target position is outside block. For example, if block contains 10 categories, trying to move a category to position 15 will raise this error.