Handling ModelCIF (modelarchive.modelcif)
The modelcif module helps the ModelArchive team prepare deposition data before it is stored in the database. For the most part this involves converting files from the legacy PDB format into ModelCIF, as well as refining submitted mmCIF/ModelCIF files.
Functionality can be broadly divided into accessing and editing ModelCIF files.
One note on performance: given the nature of the task addressed here, code clarity is preferred over raw efficiency. Scripts that translate user data into ModelCIF for a deposition are run once and offline, so it does not matter if execution takes one minute or five. Preparing the data usually takes far longer than running the code itself.
Keep this in mind when implementing ModelCIF support in your own tool: you may prefer to draw inspiration from this module rather than use it directly, or use the python-modelcif package straight away.
Accessing ModelCIF (modelarchive.modelcif.access)
Functionality to access data in a ModelCIF file.
- modelarchive.modelcif.access.get_table(block, category, items=None)
Get a gemmi.cif.Table from a gemmi.cif.Block for a category.
It is much more convenient to work with gemmi.cif.Table objects than with Gemmi’s loops and pairs directly. Imagine a ModelCIF file in which a certain category is represented as a loop, while another ModelCIF file stores the same category as a list of pairs. Both representations may be valid ModelCIF files and would otherwise require two separate handlers for essentially the same data.
By using gemmi.cif.Table as a wrapper, loops and pairs can be treated uniformly, allowing you to handle both cases through a single code base.
Gemmi provides two functions to retrieve tables, find_mmcif_category() and find(). The first needs only a category name; the second requires a category name and a list of columns to fetch. get_table() hides this difference and returns a table whether you provide a list of items or not. If a list of items is given, the resulting table will contain only those columns. If the category can’t be found in block, an empty list is returned, which feels more Pythonic than getting back an empty table of length 0. Retrieving an empty list also makes looping over a table easier.
Examples
>>> from gemmi import cif
>>> from modelarchive.modelcif import access
>>> # get sample CIF data
>>> cif_data = '''data_test
... _ma_qa_metric.id 1
... _ma_qa_metric.description test_score
... loop_
... _ma_qa_metric_local.ordinal_id
... _ma_qa_metric_local.metric_value
... _ma_qa_metric_local.metric_id
... 1 1.0 1
... 2 1.5 1
... '''
>>> block = cif.read_string(cif_data).sole_block()
>>> table = access.get_table(block, "_ma_qa_metric")
>>> len(table)
1
>>> table[-1]["description"]
'test_score'
>>> table = access.get_table(
...     block,
...     "_ma_qa_metric_local",
...     items=["metric_id", "metric_value"],
... )
>>> # table should have 2 columns and 2 rows
>>> table
<gemmi.cif.Table 2 x 2>
>>> # columns are sorted as requested, not as stored
>>> table.tags[0]
'_ma_qa_metric_local.metric_id'
>>> table.tags[1]
'_ma_qa_metric_local.metric_value'
- Parameters:
block (gemmi.cif.Block) – CIF data block holding the categories of the CIF document.
category (str) – Category to fetch from block, single category only, no joins. Gemmi requires category names to end with a dot (.), so this function adds it if missing.
items (list[str]) – List of items to fetch as columns. The order of columns (items) follows the provided list. If None, the whole category is fetched with all its items as columns, in the same order as they are found in the CIF document.
- Returns:
The requested table if category can be found, otherwise an empty list.
- Return type:
gemmi.cif.Table | list
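The idea of treating pairs and loops uniformly can be illustrated without Gemmi. The following is only a sketch: as_rows() and the two dictionary shapes are hypothetical stand-ins chosen for illustration, not part of modelarchive or Gemmi.

```python
def as_rows(category):
    """Return a list of row dicts for a category stored as pairs or as a loop.

    Pairs are modelled as {item: value}, loops as {item: [values, ...]}.
    Both shapes are illustrative stand-ins, not Gemmi data structures.
    """
    if not category:
        # mirror the "empty list if the category is missing" convention
        return []
    columns = {
        item: value if isinstance(value, list) else [value]
        for item, value in category.items()
    }
    n_rows = len(next(iter(columns.values())))
    return [
        {item: values[i] for item, values in columns.items()}
        for i in range(n_rows)
    ]


pairs = {"id": "1", "description": "test_score"}
loop = {"ordinal_id": ["1", "2"], "metric_value": ["1.0", "1.5"]}

for source in (pairs, loop, None):
    # a missing category yields an empty list, so the loop body is skipped
    for row in as_rows(source):
        print(row)
```

A single loop handles all three cases, which is the convenience the wrapper aims for.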
Editing ModelCIF (modelarchive.modelcif.edit)
Functionality to extend and modify ModelCIF files.
- exception modelarchive.modelcif.edit.MoveIdxToFarError(category, idx)
Bases: RuntimeError
Exception if repositioning exceeds the size of the document's category list.
Primarily used by move_category() on an attempt to move a category to a position that does not exist within the corresponding gemmi.cif.Block. For example, if the gemmi.cif.Block object contains 10 categories, trying to move a category to position 15 will fail and should raise this exception.
- exception modelarchive.modelcif.edit.NotFoundCategoryError(category=None, msg=None)
Bases: NotFoundError
Exception if a category can not be found.
This exception should be raised when a function expects a specific category to exist in the corresponding gemmi.cif.Block, but the category cannot be retrieved.
- exception modelarchive.modelcif.edit.NotFoundError(subject, value, msg)
Bases: RuntimeError
General exception for ‘things’ that can not be found.
If msg is omitted, generates a message “<SUBJECT> ‘<VALUE>’ does not exist”. If value is a list with more than one element, the message will be written in the plural. If subject is a list or tuple, its second element will be used as the plural of the subject.
This exception should not be raised directly; it exists to define other “NotFound” exceptions inheriting from it.
- Parameters:
subject (str|list|tuple) – The ‘thing’ that can not be found, used in the generated message. If a list or tuple, the second element is used as the plural.
value (str|list) – The name of what can not be found, used in the generated message. Provide a list of values to get a message in the plural.
msg (str) – Optional alternative error message.
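The message rules above can be sketched in a few lines. This is not the module's implementation: the class name NotFoundSketchError, the exact plural wording and the fallback of appending "s" to a plain string subject are assumptions made for illustration.

```python
class NotFoundSketchError(RuntimeError):
    """Hypothetical stand-in showing how the default message could be built."""

    def __init__(self, subject, value, msg=None):
        if msg is None:
            # subject is either "category" or ("category", "categories")
            if isinstance(subject, (list, tuple)):
                singular, plural = subject[0], subject[1]
            else:
                # assumed fallback, the real module may behave differently
                singular, plural = subject, subject + "s"
            values = value if isinstance(value, list) else [value]
            names = ", ".join(f"'{name}'" for name in values)
            if len(values) > 1:
                # plural wording is an assumption
                msg = f"{plural} {names} do not exist"
            else:
                msg = f"{singular} {names} does not exist"
        super().__init__(msg)


print(NotFoundSketchError("category", "_entity"))
print(NotFoundSketchError(("category", "categories"), ["_entity", "_entry"]))
```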
- exception modelarchive.modelcif.edit.NotFoundItemError(item=None, msg=None)
Bases: NotFoundError
Exception if an item can not be found.
This exception should be raised when a function expects a specific item to exist in the corresponding CIF category, but the item cannot be retrieved.
- modelarchive.modelcif.edit.add_category(block, category, item_data, index=None, mod_cat_itms=None, raw=False)
Introduce a new category to a gemmi.cif.Block and populate it.
Add category to block using data from item_data. item_data is a dictionary with the CIF item names as keys and the item values as values. For single values, named pairs are created; for lists with more than one value, a loop is created.
index can be used to place the category at a certain position. Use an integer for a specific place in the category list or a string of the form [after|before]:<CATEGORY> for relative positioning.
Examples
>>> from gemmi import cif
>>> from modelarchive.modelcif import edit
>>> # start with an empty CIF document
>>> cif_data = '''data_test
... '''
>>> block = cif.read_string(cif_data).sole_block()
>>> # let's add entities
>>> _ = edit.add_category(
...     block,
...     "_entity",
...     {
...         "id": [1, 2, 3],
...         "type": ["polymer", "non-polymer", "water"],
...     },
... )
>>> print(block.as_string())
data_test
loop_
_entity.id
_entity.type
1 polymer
2 non-polymer
3 water
>>> # let's add an "_entry" ID before the entities
>>> _ = edit.add_category(
...     block, "_entry", {"id": "1FOO"}, index="before:_entity"
... )
>>> print(block.as_string())
data_test
_entry.id 1FOO
loop_
_entity.id
_entity.type
1 polymer
2 non-polymer
3 water
- Parameters:
block (gemmi.cif.Block) – CIF data block holding the categories of the CIF document.
category (str) – Name of the new category to be created.
item_data (dict[str, list[Any]|Any]) – Attributes and values to be added to the new category. Dictionary with item names as keys. Values are either a list of values or a single value. If a single value is provided (or a list containing only one element), a named key-value pair is created instead of a loop.
index (int|str) – Placement of the new category within block. This can be an integer for exact positioning, or a string of the form [after|before]:<CATEGORY> for relative positioning. In relative positioning, <CATEGORY> specifies the name of the category before or after which category will be placed.
mod_cat_itms (dict[str, set[str]] | None) – A record of what has been modified: a dictionary mapping each category to the set of its changed items. Items that already have the value of the update are not recorded. This is meant for the revision history; most likely you can ignore it.
raw (bool, optional) – If True, do not force quoting of strings containing whitespace.
- Returns:
A record of what has been modified. To be used with a revision history; most likely you can ignore it.
- Return type:
dict[str, set[str]]
- Raises:
MoveIdxToFarError – If the target position is outside block. For example, if block contains 10 categories, trying to create a category at position 15 will raise this error.
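The pairs-versus-loop rule for item_data can be sketched in plain Python. plan_layout() is a hypothetical helper, and the ValueError for value lists of unequal length is an assumption; the documentation of add_category() does not state how that case is handled.

```python
from typing import Any


def plan_layout(item_data: dict[str, Any]) -> tuple[str, dict[str, list[Any]]]:
    """Normalise values to lists and decide between named pairs and a loop."""
    columns = {
        name: list(value) if isinstance(value, (list, tuple)) else [value]
        for name, value in item_data.items()
    }
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        # assumed behaviour, not confirmed for add_category()
        raise ValueError("all items need the same number of values")
    n_rows = lengths.pop() if lengths else 0
    # one value per item -> named pairs, several values -> loop
    layout = "loop" if n_rows > 1 else "pairs"
    return layout, columns


print(plan_layout({"id": [1, 2, 3], "type": ["polymer", "non-polymer", "water"]})[0])
print(plan_layout({"id": "1FOO"})[0])
```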
- modelarchive.modelcif.edit.add_column(block, category, item, callback, pos=-1, raw=False)
Extend a category with a new item and populate it using a callback.
Thinking of ModelCIF categories as tables, this function adds a new column (item) to a table that already exists in block. A callback function, which the caller must provide, is executed for each row to compute the value for the new column. This avoids having to keep a static list to fetch the values from. make_res_per_chain_counter() is an example of a stateful implementation of a working callback.
The callback has to be of the form function(row) and return the value to be set for the item in the given row.
Examples
>>> # Add "ndb_seq_num" to "_pdbx_nonpoly_scheme" including values
>>> # Reminder: "ndb_seq_num" -> column, "_pdbx_nonpoly_scheme" -> table
>>> from gemmi import cif
>>> from modelarchive.modelcif import edit
>>> cif_data = '''data_test
... loop_
... _pdbx_nonpoly_scheme.asym_id
... _pdbx_nonpoly_scheme.entity_id
... _pdbx_nonpoly_scheme.mon_id
... _pdbx_nonpoly_scheme.pdb_seq_num
... C 1 ATP 1
... D 2 HEM 1
... E 3 HOH 1
... E 3 HOH 2
... '''
>>> block = cif.read_string(cif_data).sole_block()
>>> edit.add_column(
...     block,
...     "_pdbx_nonpoly_scheme",
...     "ndb_seq_num",
...     edit.make_res_per_chain_counter("asym_id"),
...     pos=-1,
... )
>>> print(block.as_string())
data_test
loop_
_pdbx_nonpoly_scheme.asym_id
_pdbx_nonpoly_scheme.entity_id
_pdbx_nonpoly_scheme.mon_id
_pdbx_nonpoly_scheme.pdb_seq_num
_pdbx_nonpoly_scheme.ndb_seq_num
C 1 ATP 1 1
D 2 HEM 1 1
E 3 HOH 1 1
E 3 HOH 2 2
>>> # "ndb_seq_num" was appended as last column according to pos=-1
- Parameters:
block (gemmi.cif.Block) – Block holding the categories of the CIF document.
category (str) – The CIF category (table) to add the item to.
item (str) – The item (column) to be added.
callback (Callable[[gemmi.cif.Table.Row], int]) – Function executed for each row to compute the value of the new column.
pos (int) – Position to insert the column at. Default is at the end (-1). Inserting at the beginning requires pos=1.
raw (bool) – If True, do not force quoting of strings containing whitespace.
- Returns:
None
- Raises:
NotFoundCategoryError – If category can not be found in block.
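The callback protocol can be tried out on plain lists before touching a real block. The sketch below is an illustration only: add_column_sketch(), the "kind" item and the dict passed to the callback are hypothetical, and only pos=-1 and 1-based positive positions are handled.

```python
def add_column_sketch(tags, rows, item, callback, pos=-1):
    """Insert `item` into `tags` and a computed value into every row.

    `pos` is 1-based as described for add_column(); -1 appends the column.
    Each row is handed to the callback as a dict keyed by item name.
    """
    index = len(tags) if pos == -1 else pos - 1
    new_tags = tags[:index] + [item] + tags[index:]
    new_rows = []
    for row in rows:
        value = str(callback(dict(zip(tags, row))))
        new_rows.append(row[:index] + [value] + row[index:])
    return new_tags, new_rows


def classify(row):
    """Stateless example callback: derive a label from another column."""
    return "water" if row["mon_id"] == "HOH" else "ligand"


tags = ["asym_id", "mon_id"]
rows = [["C", "ATP"], ["E", "HOH"]]
print(add_column_sketch(tags, rows, "kind", classify, pos=1))
```

A stateless callback like classify() only looks at the current row; a stateful one keeps information between calls, as make_res_per_chain_counter() does.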
- modelarchive.modelcif.edit.add_rows(block, category, row_dict, ordinal_item='ordinal', mod_cat_itms=None, raw=False)
Add rows to a category in block using an item dictionary.
Thinking of ModelCIF categories as tables, this function adds new rows to a table (category) in block. If category does not yet exist, it will be created. If multiple rows are provided, the new category will be created as a loop, otherwise as pairs. When adding row(s) to an existing pairs category, the function will convert the category into a loop.
Input data is provided via row_dict. It must be a dict of lists (for a single row, values may be single elements instead of lists). Item names are used as keys in row_dict. Items that exist in category but are missing from row_dict will be set to "." in new rows. The order of items in row_dict can be arbitrary; this function will align them with the existing order in category.
ordinal_item names the item that holds a unique numerical ID for each row. If provided, the function will automatically increment it for new rows. In ModelCIF, this column is often called ordinal, though some categories use different names.
Examples
>>> from gemmi import cif
>>> from modelarchive.modelcif import edit
>>> # start with an empty CIF document
>>> cif_data = '''data_test
... '''
>>> block = cif.read_string(cif_data).sole_block()
>>> # Let's add an entity to create a category in block. ordinal_item
>>> # is set to None on purpose to show how it works later.
>>> _ = edit.add_rows(
...     block,
...     "_entity",
...     {"id": 1, "details": "Protein", "type": "polymer"},
...     ordinal_item=None,
... )
>>> # see how the _entity category is created as a couple of pairs
>>> print(block.as_string())
data_test
_entity.id 1
_entity.details Protein
_entity.type polymer
>>> # Add a second row (pairs will turn into a loop). This time, include
>>> # ordinal_item to let the function take care of incrementing IDs.
>>> _ = edit.add_rows(
...     block,
...     "_entity",
...     {"details": ["H2O"], "type": ["water"]},
...     ordinal_item="id",
... )
>>> # Now _entity is a loop and _entity.id was incremented automatically
>>> print(block.as_string())
data_test
loop_
_entity.id
_entity.details
_entity.type
1 Protein polymer
2 H2O water
>>> # As a last example, add multiple new rows at once but skip the
>>> # 'details' column.
>>> _ = edit.add_rows(
...     block,
...     "_entity",
...     {"type": ["polymer", "polymer"]},
...     ordinal_item="id",
... )
>>> # Now there are two more polymer entities in the loop but since
>>> # the 'details' information was missing, the function added '.' in
>>> # those fields.
>>> print(block.as_string())
data_test
loop_
_entity.id
_entity.details
_entity.type
1 Protein polymer
2 H2O water
3 . polymer
4 . polymer
- Parameters:
block (gemmi.cif.Block) – CIF data block holding the categories of the CIF document.
category (str) – Name of the category to which row(s) will be added.
row_dict (dict[str, list | Any]) – Row data to be added to category. Keys are item names of the category. Values must be lists when adding multiple rows. For a single row, values may be provided as scalars instead of lists. If an item is missing from row_dict but exists in the category, ‘.’ will be assigned for that item in the new row(s).
ordinal_item (str | None) – If the category includes an ordinal (in database terms, a primary key), this is its item name. If ordinal_item is provided, the latest ordinal will be read from the category and automatically incremented for new rows. Use None if the category does not have an ordinal or if the ordinal should be set explicitly. The ordinal does not need to be included in row_dict.
mod_cat_itms (dict[str, set[str]] | None) – A record of what has been modified: a dictionary mapping each category to the set of its changed items. Items that already have the value of the update are not recorded. This is meant for the revision history; most likely you can ignore it.
raw (bool, optional) – If True, do not force quoting of strings containing whitespace.
- Returns:
A record of what has been modified. To be used with a revision history; most likely you can ignore it.
- Return type:
dict[str, set[str]]
- Raises:
ValueError – If the item lists in row_dict are not of equal length.
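The row alignment described for add_rows() can be sketched on plain lists. add_rows_sketch() is hypothetical; it assumes the category already exists with the given tags, and it takes the highest existing ordinal as the "latest" one, which may differ from what the module does.

```python
def add_rows_sketch(tags, rows, row_dict, ordinal_item=None):
    """Append rows built from `row_dict`, aligned to the column order in `tags`."""
    # a single row may be given as scalars, normalise everything to lists
    columns = {
        name: value if isinstance(value, list) else [value]
        for name, value in row_dict.items()
    }
    lengths = {len(values) for values in columns.values()}
    if len(lengths) != 1:
        raise ValueError("item lists in row_dict are not of equal length")
    n_new = lengths.pop()

    next_id = 1
    if ordinal_item is not None and rows:
        # assumption: "latest ordinal" is read as the highest existing value
        idx = tags.index(ordinal_item)
        next_id = max(int(row[idx]) for row in rows) + 1

    for i in range(n_new):
        new_row = []
        for tag in tags:
            if tag == ordinal_item:
                new_row.append(str(next_id + i))
            elif tag in columns:
                new_row.append(str(columns[tag][i]))
            else:
                new_row.append(".")  # item missing from row_dict
        rows.append(new_row)
    return rows


tags = ["id", "details", "type"]
rows = [["1", "Protein", "polymer"]]
print(add_rows_sketch(tags, rows, {"type": ["polymer", "water"]}, ordinal_item="id"))
```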
- modelarchive.modelcif.edit.make_res_per_chain_counter(asym_id_item)
Returns a stateful callback function counting residues per chain.
make_res_per_chain_counter() returns a function that can be used as callback in add_column().
The returned callback assigns consecutive residue numbers within each chain of a table, starting at 1. When the chain identifier changes between two rows while iterating over the table, the counter is reset to 1.
Examples
>>> # Add item "ndb_seq_num" to category "_pdbx_nonpoly_scheme"
>>> # Reminder: "ndb_seq_num" -> column, "_pdbx_nonpoly_scheme" -> table
>>> from gemmi import cif
>>> from modelarchive.modelcif import edit
>>> cif_data = '''data_test
... loop_
... _pdbx_nonpoly_scheme.asym_id
... _pdbx_nonpoly_scheme.auth_seq_num
... _pdbx_nonpoly_scheme.entity_id
... _pdbx_nonpoly_scheme.mon_id
... _pdbx_nonpoly_scheme.pdb_seq_num
... C 1 3 ATP 1
... D 1 4 HEM 1
... E 1 5 HOH 1
... E 2 5 HOH 2
... '''
>>> block = cif.read_string(cif_data).sole_block()
>>> # Using make_res_per_chain_counter() in add_column() will add a
>>> # column to the loop_ and populate it with values:
>>> edit.add_column(
...     block,
...     "_pdbx_nonpoly_scheme",
...     "ndb_seq_num",
...     edit.make_res_per_chain_counter("asym_id"),  # CALLBACK
...     pos=5,
... )
>>> print(block.as_string())
data_test
loop_
_pdbx_nonpoly_scheme.asym_id
_pdbx_nonpoly_scheme.auth_seq_num
_pdbx_nonpoly_scheme.entity_id
_pdbx_nonpoly_scheme.mon_id
_pdbx_nonpoly_scheme.ndb_seq_num
_pdbx_nonpoly_scheme.pdb_seq_num
C 1 3 ATP 1 1
D 1 4 HEM 1 1
E 1 5 HOH 1 1
E 2 5 HOH 2 2
>>> # "ndb_seq_num" is inserted as fifth column. The ATP in chain C
>>> # ("asym_id") gets "ndb_seq_num" 1 and the HEM in chain D also gets
>>> # "ndb_seq_num" 1. But the HOH, both live in chain E together, get
>>> # "ndb_seq_num" 1 and 2. So for each chain, counting starts at 1
>>> # and per compound in a chain, the counter is increased by 1.
- Parameters:
asym_id_item (str) – Item name hosting the chain name.
- Returns:
Callback function usable as callback in add_column().
- Return type:
Callable[[gemmi.cif.Table.Row], int]
Note
This function may be moved to a supporting module if edit gets too big.
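A stateful callback of this kind is typically written as a closure. The sketch below shows the general pattern only; make_counter_sketch() is a hypothetical name and rows are modelled as plain dicts rather than gemmi.cif.Table.Row objects.

```python
def make_counter_sketch(asym_id_item):
    """Return a callback that numbers rows per chain, restarting at 1."""
    last_chain = None
    count = 0

    def callback(row):
        nonlocal last_chain, count
        chain = row[asym_id_item]
        if chain != last_chain:
            # chain identifier changed between two rows: reset the counter
            last_chain = chain
            count = 0
        count += 1
        return count

    return callback


counter = make_counter_sketch("asym_id")
rows = [{"asym_id": "C"}, {"asym_id": "D"}, {"asym_id": "E"}, {"asym_id": "E"}]
print([counter(row) for row in rows])
```

Because the state lives in the closure, each call to the factory yields an independent counter, so one callback should not be reused across tables.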
- modelarchive.modelcif.edit.move_category(block, cat, idx)
Move a category to a new position in a gemmi.cif.Block.
By design, ModelCIF files are not intended to be read or edited manually. Instead, dedicated applications should handle the format, providing functionality to view and modify the data. However, at ModelArchive we occasionally need to open ModelCIF files in an editor to inspect specific details. In such cases, it is helpful to have related categories grouped together, reducing the need to jump back and forth between different categories. This calls for a function to reposition categories within a ModelCIF file.
move_category() takes category cat and moves it to position idx in the CIF block block. The parameter idx is somewhat special: it can be a plain integer index, specifying the exact position to move cat to. That comes in handy for placing categories at the beginning (idx=0) or at the end (idx=-1) of block. However, specifying an absolute index is often less useful in practice, as categories are typically organised relative to related categories. For this purpose, idx provides a special syntax: [after|before]:<CATEGORY>. For example, if you want to put category _ma_qa_metric in front of category _ma_qa_metric_local, you can use idx="before:_ma_qa_metric_local" for cat="_ma_qa_metric".
Examples
>>> from gemmi import cif
>>> from modelarchive.modelcif import edit
>>> # get sample CIF data
>>> cif_data = '''data_test
... _ma_qa_metric.id 1
... _ma_qa_metric.description test_score
... loop_
... _ma_qa_metric_local.ordinal_id
... _ma_qa_metric_local.metric_value
... _ma_qa_metric_local.metric_id
... 1 1.0 1
... 2 1.5 1
... '''
>>> block = cif.read_string(cif_data).sole_block()
>>> # move _ma_qa_metric_local to BEFORE _ma_qa_metric
>>> edit.move_category(
...     block,
...     "_ma_qa_metric_local",
...     "before:_ma_qa_metric",
... )
>>> print(block.as_string())
data_test
loop_
_ma_qa_metric_local.ordinal_id
_ma_qa_metric_local.metric_value
_ma_qa_metric_local.metric_id
1 1.0 1
2 1.5 1
_ma_qa_metric.id 1
_ma_qa_metric.description test_score
>>> # move _ma_qa_metric to the front
>>> edit.move_category(block, "_ma_qa_metric", 0)
>>> print(block.as_string())
data_test
_ma_qa_metric.id 1
_ma_qa_metric.description test_score
loop_
_ma_qa_metric_local.ordinal_id
_ma_qa_metric_local.metric_value
_ma_qa_metric_local.metric_id
1 1.0 1
2 1.5 1
- Parameters:
block (gemmi.cif.Block) – CIF block to operate on.
cat (str) – Name of the CIF category to be moved.
idx (int|str) – Position to move cat to. This can be an integer for exact positioning, or a string of the form [after|before]:<CATEGORY> for relative positioning. In relative positioning, <CATEGORY> specifies the name of the category before or after which cat will be placed. If <CATEGORY> can not be found, cat will not be relocated.
- Returns:
None
- Raises:
NotFoundCategoryError – If cat can not be found in block.
MoveIdxToFarError – If the target position is outside block. For example, if block contains 10 categories, trying to move a category to position 15 will raise this error.
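How an idx value might be resolved can be sketched on a plain list of category names. move_sketch() is hypothetical; KeyError and IndexError stand in for NotFoundCategoryError and MoveIdxToFarError, and the treatment of negative indices other than -1 is an assumption.

```python
def move_sketch(categories, cat, idx):
    """Return a new category order with `cat` moved according to `idx`."""
    if cat not in categories:
        raise KeyError(cat)  # stands in for NotFoundCategoryError
    rest = [name for name in categories if name != cat]
    if isinstance(idx, str):
        where, _, anchor = idx.partition(":")
        if where not in ("before", "after"):
            raise ValueError(f"unsupported position '{idx}'")
        if anchor not in rest:
            # unknown anchor category: leave the order untouched
            return list(categories)
        target = rest.index(anchor) + (1 if where == "after" else 0)
    else:
        if idx >= len(categories) or idx < -len(categories):
            raise IndexError(idx)  # stands in for MoveIdxToFarError
        # -1 means "last", so negative values count from the end
        target = idx if idx >= 0 else len(rest) + idx + 1
    rest.insert(target, cat)
    return rest


order = ["_ma_qa_metric", "_ma_qa_metric_local"]
print(move_sketch(order, "_ma_qa_metric_local", "before:_ma_qa_metric"))
print(move_sketch(order, "_ma_qa_metric", -1))
```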
- modelarchive.modelcif.edit.sort(table_or_block, item, category=None, key=None)
Sort a gemmi.cif.Table or gemmi.cif.Block in-place by the given item.
This may be useful after editing a table, to sort it by a selected column (e.g. the ordinal). Numerical values are sorted numerically, all others lexicographically.
key can take a function to extract a comparison key from each row. This is helpful for cases like _citation.id, where special values (e.g. id=primary) might need to be placed first.
Works on an already loaded gemmi.cif.Table, or on a gemmi.cif.Block (requires category), which allows sorting many categories one after another with less code.
Examples
>>> from gemmi import cif
>>> from modelarchive.modelcif import access, edit
>>> # get sample CIF data
>>> CIF_DATA = '''data_test
... loop_
... _citation.id
... _citation.journal_full
... _citation.title
... _citation.year
... _citation.journal_volume
... 3 "The Lord of the Rings" "Return of the King" 1955 3
... 1 "The Lord of the Rings" "The Fellowship of the Ring" 1954 2
... 2 "The Lord of the Rings" "The Two Towers" 1954 1
... primary . "The Hobbit or There and Back Again" 1937 .
... '''
>>> block = cif.read_string(CIF_DATA).sole_block()
>>> table = access.get_table(block, "_citation")
>>> # first sort without a key function
>>> edit.sort(table, "id")
>>> # This sorts the LOTR books properly, but the 'primary' book is at
>>> # the bottom
>>> print(block.as_string())
data_test
loop_
_citation.id
_citation.journal_full
_citation.title
_citation.year
_citation.journal_volume
1 "The Lord of the Rings" "The Fellowship of the Ring" 1954 2
2 "The Lord of the Rings" "The Two Towers" 1954 1
3 "The Lord of the Rings" "Return of the King" 1955 3
primary . "The Hobbit or There and Back Again" 1937 .
>>> # sort again (this time by block), with a lambda that puts
>>> # 'primary' first
>>> edit.sort(
...     block,
...     "id",
...     category="_citation",
...     key=lambda row: (
...         (0, "") if row["id"] == "primary" else (1, row["id"])
...     ),
... )
>>> print(block.as_string())
data_test
loop_
_citation.id
_citation.journal_full
_citation.title
_citation.year
_citation.journal_volume
primary . "The Hobbit or There and Back Again" 1937 .
1 "The Lord of the Rings" "The Fellowship of the Ring" 1954 2
2 "The Lord of the Rings" "The Two Towers" 1954 1
3 "The Lord of the Rings" "Return of the King" 1955 3
- Parameters:
table_or_block (gemmi.cif.Table|gemmi.cif.Block) – Object to be sorted. For a gemmi.cif.Block, the corresponding table will be loaded using category.
item (str) – Name of the column (item) in the table to sort by.
category (str, optional) – Name of the category when sorting a gemmi.cif.Block.
key (callable, optional) – Function taking a row and returning a sortable value. Defaults to lexicographic row[item] with a fix for numerical sorting.
- Returns:
None
- Raises:
ValueError – If table_or_block is a gemmi.cif.Block object but no category was provided.
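One way to get "numbers numerically, everything else lexicographically" out of a single key function is shown below. This is a sketch of the general technique on plain strings, not the key that sort() uses internally; default_key() is a hypothetical name.

```python
def default_key(value: str):
    """Sort key: numeric strings first in numeric order, then all others."""
    try:
        return (0, float(value), "")
    except ValueError:
        return (1, 0.0, value)


ids = ["3", "1", "10", "primary", "2"]

# plain string sorting would put "10" before "2"
print(sorted(ids, key=default_key))

# pull a special value to the front by prefixing the key with a flag
print(sorted(ids, key=lambda v: (v != "primary", default_key(v))))
```

Wrapping values in tuples keeps numbers and strings from being compared with each other directly, which would raise a TypeError in Python 3.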
Fixing AlphaFold 3 ModelCIF files (modelarchive.modelcif.fix_af3)
ModelCIF files generated by AlphaFold 3 deviate from the official ModelCIF definition dictionary in specific cases. In particular, for homomeric assemblies, each molecular entity copy is written as a separate entity in the CIF document, instead of defining a single entity referenced multiple times. This module provides functionality to correct the deviations.
- exception modelarchive.modelcif.fix_af3.NotIdentifiedContextRecordError(category, item=None, context=None)
Bases: NotIdentifiedRecordError
Exception if a record for a specific context can not be identified.
- exception modelarchive.modelcif.fix_af3.NotIdentifiedDuplicatedRecordError(category, record_id)
Bases: NotIdentifiedRecordError
Exception if a duplicated record is found in a table.
- exception modelarchive.modelcif.fix_af3.NotIdentifiedRecordError(msg)
Bases: RuntimeError
General exception for records that can not be identified in a table.
This exception should not be raised directly; it exists to define other “NotIdentified” exceptions inheriting from it.
- Parameters:
msg (str) – Exception message.
- exception modelarchive.modelcif.fix_af3.NotIdentifiedSingleRecordError(category, item=None, value=None)
Bases: NotIdentifiedRecordError
Exception if a specific record can not be identified in a table.
- modelarchive.modelcif.fix_af3.fix_citation(block)
Normalise the AlphaFold 3 citation in a ModelCIF block.
Ensures that the AlphaFold 3 publication (PMID 38718835) is not marked as the “primary” citation and assigns a numeric citation ID instead. Fixes an incomplete AlphaFold 3 citation, replaces the author list with the full curated list of names, and updates its citation ID. Reorders citations so that the primary entry appears first and links the citation to the corresponding software record.
This adjustment is not required for valid ModelCIF files, but follows ModelArchive conventions where the primary citation must refer to the deposited model rather than the software used to generate it.
Examples
>>> from gemmi import cif
>>> from modelarchive.modelcif import access, fix_af3
>>> # get sample CIF data
>>> CIF_DATA = '''data_test
... _citation.id primary
... _citation.country UK
... _citation.journal_full Nature
... _citation.journal_id_ASTM NATUAS
... _citation.journal_id_CSD 0006
... _citation.journal_id_ISSN 0028-0836
... _citation.journal_volume 630
... _citation.page_first 493
... _citation.page_last 500
... _citation.pdbx_database_id_DOI 10.1038/s41586-024-07487-w
... _citation.pdbx_database_id_PubMed 38718835
... _citation.title 'Accurate structure prediction of biomolecular ...'
... _citation.year 2024
... #
... loop_
... _citation_author.citation_id
... _citation_author.name
... _citation_author.ordinal
... primary "Google DeepMind AlphaFold Team" 1
... primary "Isomorphic Labs Team" 2
... #
... loop_
... _software.classification
... _software.date
... _software.description
... _software.name
... _software.pdbx_ordinal
... _software.type
... _software.version
... other ? "Structure prediction" AlphaFold 1 package AlphaFold-beta
... '''
>>> block = cif.read_string(CIF_DATA).sole_block()
>>> fix_af3.fix_citation(block)
>>> # The usual block.as_string() output would be too much for a
>>> # docstring, just check some important values.
>>> table = access.get_table(block, "_citation")
>>> assert table[0]["id"] == "1"
>>> table = access.get_table(block, "_citation_author")
>>> assert table[0]["name"] != "Google DeepMind AlphaFold Team"
>>> table = access.get_table(block, "_software")
>>> assert table[0]["citation_id"] == "1"
- Parameters:
block (gemmi.cif.Block) – CIF block to operate on.
- Returns:
None
- Raises:
edit.NotFoundCategoryError – If the _software category can not be found.
NotIdentifiedSingleRecordError – If a required item is missing from the _citation category, or if item values are not as expected for the _citation category.
NotIdentifiedDuplicatedRecordError – If multiple entries for AlphaFold are found in the _software category. In that case, the “right” record can not be identified.
- modelarchive.modelcif.fix_af3.fix_model_name(block, mdl_rank)
Normalise _ma_model_list.model_name for a given rank.
AlphaFold 3 sets _ma_model_list.model_name to “Top ranked model” for all models, regardless of their rank. This function rewrites the value such that only mdl_rank == 1 is labelled “Top ranked model”. All other ranks are renamed to “#<mdl_rank> ranked model”.
Examples
>>> from gemmi import cif
>>> from modelarchive.modelcif import fix_af3
>>> # get sample CIF data
>>> cif_data = '''data_test
... _ma_model_list.data_id 1
... _ma_model_list.model_name "Top ranked model"
... _ma_model_list.model_type "Ab initio model"
... _ma_model_list.ordinal_id 1
... '''
>>> block = cif.read_string(cif_data).sole_block()
>>> fix_af3.fix_model_name(block, 2)
>>> print(block.as_string())
data_test
_ma_model_list.data_id 1
_ma_model_list.model_name "#2 ranked model"
_ma_model_list.model_type "Ab initio model"
_ma_model_list.ordinal_id 1
>>> fix_af3.fix_model_name(block, 1)
>>> print(block.as_string())
data_test
_ma_model_list.data_id 1
_ma_model_list.model_name "Top ranked model"
_ma_model_list.model_type "Ab initio model"
_ma_model_list.ordinal_id 1
- Parameters:
block (gemmi.cif.Block) – CIF block to operate on.
mdl_rank (int) – Rank of the AlphaFold 3 model. If mdl_rank == 1, the name is set to “Top ranked model”.
- Returns:
None
- Raises:
RuntimeError – If the _ma_model_list category contains more than one row.
edit.NotFoundCategoryError – If no software entry is found for AF3.
edit.NotFoundItemError – If _ma_model_list.model_name can not be found in block.
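The naming rule itself is small enough to state as a function. model_name_for_rank() is a hypothetical helper, and rejecting ranks below 1 is an assumption; the documentation does not say how fix_model_name() treats such values.

```python
def model_name_for_rank(mdl_rank: int) -> str:
    """Return the model name that corresponds to a model rank."""
    if mdl_rank < 1:
        # assumed guard, not documented for fix_model_name()
        raise ValueError("mdl_rank must be 1 or greater")
    if mdl_rank == 1:
        return "Top ranked model"
    return f"#{mdl_rank} ranked model"


print(model_name_for_rank(1))
print(model_name_for_rank(2))
```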
- modelarchive.modelcif.fix_af3.fix_protocol(block)
Fix the MA protocol to a single well-formed step.
Rewrites _ma_data, _ma_data_group, and _ma_protocol_step from scratch based on the existing _ma_target_entity, _ma_model_list and _ma_software_group categories. Any prior content in the three rewritten categories is silently overwritten.
Data layout after the call:
_ma_data: One record per target entity (content_type “target”) followed by one record per model (content_type “model coordinates”). IDs are assigned sequentially starting at 1.
_ma_data_group:
Group 1 - all target data IDs (input side).
Group 2 - all model data IDs (output side).
_ma_protocol_step: A single step referencing the AF3 software group, group 1 as input, and group 2 as output.
Examples
>>> from gemmi import cif
>>> from modelarchive.modelcif import access, fix_af3
>>> # get sample CIF data
>>> CIF_DATA = '''data_test
... #
... loop_
... _entity.id
... _entity.pdbx_description
... _entity.type
... 1 "bestest polymer in universe" polymer
... 2 "second best polythingi in universe" polymer
... #
... loop_
... _ma_target_entity.data_id
... _ma_target_entity.entity_id
... _ma_target_entity.origin
... 1 1 .
... 1 2 .
... #
... _ma_model_list.data_id 1
... _ma_model_list.model_group_id 1
... _ma_model_list.model_group_name "AlphaFold-beta-20231127 (...)"
... _ma_model_list.model_id 1
... _ma_model_list.model_name "Top ranked model"
... _ma_model_list.model_type "Ab initio model"
... _ma_model_list.ordinal_id 1
... #
... loop_
... _ma_software_group.group_id
... _ma_software_group.ordinal_id
... _ma_software_group.software_id
... 1 1 1
... #
... loop_
... _software.classification
... _software.date
... _software.description
... _software.name
... _software.pdbx_ordinal
... _software.type
... _software.version
... other ? "Structure prediction" AlphaFold 1 package AlphaFold-beta
... '''
>>> block = cif.read_string(CIF_DATA).sole_block()
>>> fix_af3.fix_protocol(block)
>>> # erase everything except _ma_protocol_step to keep the output short
>>> access.get_table(block, "_entity").erase()
>>> access.get_table(block, "_ma_data").erase()
>>> access.get_table(block, "_ma_data_group").erase()
>>> access.get_table(block, "_ma_model_list").erase()
>>> access.get_table(block, "_ma_software_group").erase()
>>> access.get_table(block, "_ma_target_entity").erase()
>>> access.get_table(block, "_software").erase()
>>> print(block.as_string())
data_test
loop_
_ma_protocol_step.ordinal_id
_ma_protocol_step.protocol_id
_ma_protocol_step.step_id
_ma_protocol_step.method_type
_ma_protocol_step.details
_ma_protocol_step.software_group_id
_ma_protocol_step.input_data_group_id
_ma_protocol_step.output_data_group_id
1 1 1 modeling 'Model generated with AlphaFold 3.' 1 1 2
- Parameters:
block (gemmi.cif.Block) – CIF block to operate on.
- Returns:
None
- Raises:
edit.NotFoundCategoryError – If any required source category is absent: _entity, _ma_target_entity, _ma_model_list, or _ma_software_group.
edit.NotFoundItemError – If _ma_target_entity.data_id, _ma_model_list.data_id or _ma_model_list.model_name are missing.
NotIdentifiedDuplicatedRecordError – If multiple _ma_software_group records exist and the AF3 entry cannot be unambiguously identified in _software.
NotIdentifiedContextRecordError – If multiple _ma_software_group records exist but no AF3 entry can be found in _software at all.
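The data layout described for fix_protocol() can be sketched with plain dictionaries. plan_protocol() and its arguments are hypothetical; the sketch only reproduces the documented ID assignment and grouping, not the CIF writing or the error handling.

```python
def plan_protocol(n_targets: int, n_models: int, software_group_id: int = 1):
    """Build _ma_data, _ma_data_group and _ma_protocol_step style records."""
    data = []
    # targets come first, models follow, IDs run sequentially from 1
    for _ in range(n_targets):
        data.append({"id": len(data) + 1, "content_type": "target"})
    for _ in range(n_models):
        data.append({"id": len(data) + 1, "content_type": "model coordinates"})

    groups = {
        1: [d["id"] for d in data if d["content_type"] == "target"],
        2: [d["id"] for d in data if d["content_type"] == "model coordinates"],
    }
    step = {
        "ordinal_id": 1,
        "protocol_id": 1,
        "step_id": 1,
        "method_type": "modeling",
        "software_group_id": software_group_id,
        "input_data_group_id": 1,   # group 1: targets
        "output_data_group_id": 2,  # group 2: models
    }
    return data, groups, step


data, groups, step = plan_protocol(n_targets=2, n_models=1)
print(groups)
print(step["input_data_group_id"], step["output_data_group_id"])
```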
- modelarchive.modelcif.fix_af3.fix_software_location(block)
Ensures the AlphaFold 3 _software entry has a correct location URL.
Determines whether the ModelCIF block originates from the AlphaFold 3 server or a local installation and sets the corresponding URL in _software.location. If the column does not yet exist, it is created; otherwise only the row for AlphaFold 3 is updated.
Examples
>>> from gemmi import cif
>>> from modelarchive.modelcif import access, fix_af3
>>> # get sample CIF data
>>> CIF_DATA = '''data_test
... _pdbx_data_usage.details "... alphafoldserver.com/output-terms."
... _pdbx_data_usage.id 1
... _pdbx_data_usage.type license
... _pdbx_data_usage.url ?
... #
... loop_
... _software.classification
... _software.date
... _software.description
... _software.name
... _software.pdbx_ordinal
... _software.type
... _software.version
... other ? "Structure prediction" AlphaFold 1 package AlphaFold-beta
... '''
>>> block = cif.read_string(CIF_DATA).sole_block()
>>> fix_af3.fix_software_location(block)
>>> # Just check that _software.location exists and has the right value
>>> table = access.get_table(block, "_software")
>>> assert "_software.location" in table.tags
>>> assert table[0]["location"] == "https://alphafoldserver.com/"
>>> # Change block to look like ModelCIF file from local installation
>>> table = access.get_table(block, "_pdbx_data_usage")
>>> table[0]["details"] = "...github.com/google-deepmind/alphafold3..."
>>> fix_af3.fix_software_location(block)
>>> # Check _software.location to point to GitHub, now
>>> table = access.get_table(block, "_software")
>>> assert table[0]["location"] == "https://github.com/google-deepmind/alphafold3"
- Parameters:
block (gemmi.cif.Block) – CIF block to operate on.
- Returns:
None
- Raises:
NotIdentifiedContextRecordError – If no AlphaFold 3 entry is found in the _software table.
NotIdentifiedContextRecordError – If the origin of the AlphaFold 3 license could not be identified in the _pdbx_data_usage table.
NotIdentifiedDuplicatedRecordError – If multiple entries for AlphaFold 3 are found in the _software table.
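Judging from the example, the origin appears to be recognised from the license text in _pdbx_data_usage.details. The sketch below assumes simple substring matching on the two markers visible in the example; guess_location() is hypothetical, and LookupError stands in for NotIdentifiedContextRecordError.

```python
SERVER_URL = "https://alphafoldserver.com/"
LOCAL_URL = "https://github.com/google-deepmind/alphafold3"


def guess_location(license_details: str) -> str:
    """Map a license text to the assumed software location URL."""
    # assumption: the markers below are enough to tell the two origins apart
    if "alphafoldserver.com" in license_details:
        return SERVER_URL
    if "github.com/google-deepmind/alphafold3" in license_details:
        return LOCAL_URL
    # stands in for NotIdentifiedContextRecordError
    raise LookupError("origin of the AlphaFold 3 license not identified")


print(guess_location("... alphafoldserver.com/output-terms."))
print(guess_location("...github.com/google-deepmind/alphafold3..."))
```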