regsys
Differences
This shows you the differences between two versions of the page.
regsys [2012/10/28 10:53] – [Stage B. General structure check & preparation] sanmark | regsys [2013/10/19 11:36] (current) – [Process outline] rkiss | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== | + | ====== |
- | The mcule structure | + | The mcule database is curated by **MAC (Mcule Advanced Curation)** that involves a rigorous molecule |
- | **Key features:** high level data curation, stereochemical standardization, | + | **Key features |
- | ===Registration challenges=== | + | Continue reading for more information about MAC, or check our presentations from the 244th National Meeting of the American Chemical Society: |
+ | |||
+ | [[http:// | ||
+ | |||
+ | [[http:// | ||
+ | |||
+ | |||
+ | ==== Quality is important ==== | ||
+ | |||
+ | The design of screening libraries and the development of predictive drug discovery models **all start with a high-quality database**. Chemical correctness is crucial because mis-drawn and imperfectly defined structures result in incorrect models, misleading predictions and inconsistent hits. Problematic structures should therefore be eliminated from a drug discovery pipeline at the earliest possible stage. | ||
+ | |||
+ | The mcule structure registration system is primarily designed to correctly handle chemical structures coming from different data sources, mainly from chemical suppliers, and load the structures into the mcule database. This is a non-trivial task which requires a careful structure check and preparation procedure. To reach a high curation level, the registration system should ensure database quality in terms of structure correctness, | ||
+ | |||
+ | **All molecules with an MCULE ID have been processed by MAC**. User-uploaded molecules are not processed by MAC by default. We plan to enable this option in the future. | ||
+ | |||
+ | ==== Registration challenges | ||
Line 42: | Line 57: | ||
===== Process outline ===== | ===== Process outline ===== | ||
- | The whole registration process can be divided into seven different stages. It begins with the revision of stereo configurations, | + | The whole registration process can be divided into seven different stages. It begins with the revision of stereo configurations, |
|Stage A |Enforcing [[stereonotations|standard stereo representation]]; | |Stage A |Enforcing [[stereonotations|standard stereo representation]]; | ||
Line 51: | Line 66: | ||
As a result, input entries as well as their components are registered at two levels: tautomer and protonation state independent [[structurelevels|compound level]] with tautomer detection and tautomer and protonation state dependent [[structurelevels|structure level]] without tautomer detection. | As a result, input entries as well as their components are registered at two levels: tautomer and protonation state independent [[structurelevels|compound level]] with tautomer detection and tautomer and protonation state dependent [[structurelevels|structure level]] without tautomer detection. | ||
- | |||
===== Registration process ===== | ===== Registration process ===== | ||
Line 113: | Line 127: | ||
In these steps common functional groups such as nitro and azide groups are transformed to their neutral form. This standardization is necessary to get all relevant results from a [[substructuresearch|substructure search]]. Besides standardization, | In these steps common functional groups such as nitro and azide groups are transformed to their neutral form. This standardization is necessary to get all relevant results from a [[substructuresearch|substructure search]]. Besides standardization, | ||
- | |||
- | {{ : | ||
{{ : | {{ : | ||
Line 139: | Line 151: | ||
In this stage we separate the components of the incoming structure. In common salts, counterions can be disconnected and separated from the main component automatically. Bonds to the main component are deleted and proper charges are placed on both components. | In this stage we separate the components of the incoming structure. In common salts, counterions can be disconnected and separated from the main component automatically. Bonds to the main component are deleted and proper charges are placed on both components. | ||
+ | {{ : | ||
==== Stage D & E. Component registration ==== | ==== Stage D & E. Component registration ==== | ||
//Summary: individual components’ structures are normalized, unique components are registered with new mcule IDs assigned at the tautomer and protonation state independent (D) and dependent (E) levels (steps in the D & E stages are very similar except for novelty check)// | //Summary: individual components’ structures are normalized, unique components are registered with new mcule IDs assigned at the tautomer and protonation state independent (D) and dependent (E) levels (steps in the D & E stages are very similar except for novelty check)// | ||
Line 147: | Line 160: | ||
In the mcule system there are [[stereonotations|four stereo configuration types]]: absolute, relative, racemic and unknown (the “unknown” type is used to denote uncertain configurations, | In the mcule system there are [[stereonotations|four stereo configuration types]]: absolute, relative, racemic and unknown (the “unknown” type is used to denote uncertain configurations, | ||
+ | |||
+ | {{ : | ||
Normalization is needed because certain configurations can be represented with multiple structures and/or [[stereonotations|stereo configuration types]]: replacing configurations around atoms and/or the configuration type can result in stereochemically equivalent structures. This can primarily happen when the configuration is only partially specified, containing atoms with both unknown/ | Normalization is needed because certain configurations can be represented with multiple structures and/or [[stereonotations|stereo configuration types]]: replacing configurations around atoms and/or the configuration type can result in stereochemically equivalent structures. This can primarily happen when the configuration is only partially specified, containing atoms with both unknown/ | ||
Line 158: | Line 173: | ||
In stage D we use a novelty check algorithm that is based on InChI strings but can detect an even broader set of potential tautomers than a simple InChI comparison. The system can fully prevent the registration of duplicates as long as they are prototropic tautomers. | In stage D we use a novelty check algorithm that is based on InChI strings but can detect an even broader set of potential tautomers than a simple InChI comparison. The system can fully prevent the registration of duplicates as long as they are prototropic tautomers. | ||
+ | {{ : | ||
==== Stage F & G. Multicomponent structure registration ==== | ==== Stage F & G. Multicomponent structure registration ==== | ||
//Summary: additional checks are performed, component types are assigned, and unique structures are registered with new mcule IDs assigned at the tautomer and protonation state independent (F) and dependent (G) levels (steps in the F & G stages are very similar)// | //Summary: additional checks are performed, component types are assigned, and unique structures are registered with new mcule IDs assigned at the tautomer and protonation state independent (F) and dependent (G) levels (steps in the F & G stages are very similar)// | ||
Line 174: | Line 190: | ||
In most cases the system is also capable of identifying the main components, which can serve as the input set for virtual screens. | In most cases the system is also capable of identifying the main components, which can serve as the input set for virtual screens. | ||
+ | |||
+ | You can see below the index page of compound [[https:// | ||
+ | (uncertainty is marked by a crossed double bond). Counterions are marked, and component multiplicities are assigned correctly by the system. |
+ | |||
+ | {{: |
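The page registers components at two levels. The compound level ignores tautomer and protonation state. The structure level depends on them. At each level, a novelty check reuses an existing ID instead of registering a duplicate.

The sketch below illustrates that idea only. It is not the mcule implementation. Several names are hypothetical:

- `Registry`, `compound_key`, `structure_key` and the `C`/`S` ID prefixes are invented for illustration.
- The caller supplies both key functions. They stand in for the InChI-based comparison the page describes.
- Components are assumed to be already disconnected, written as `.`-separated SMILES.
- `TOY_TAUTOMERS` is a toy lookup table, not a real tautomer detector.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Registry:
    """Hypothetical two-level registry (an illustration, not the mcule code).

    compound_key:  maps a component to a tautomer/protonation-independent key
                   (compound level).
    structure_key: maps a component to an exact, state-dependent key
                   (structure level).
    """
    compound_key: Callable[[str], str]
    structure_key: Callable[[str], str]
    compounds: Dict[str, str] = field(default_factory=dict)
    structures: Dict[str, str] = field(default_factory=dict)

    @staticmethod
    def _lookup_or_assign(table: Dict[str, str], key: str, prefix: str) -> str:
        # Novelty check: a known key returns its existing ID; only new keys get a new ID.
        if key not in table:
            table[key] = f"{prefix}{len(table) + 1}"
        return table[key]

    def register_component(self, component: str) -> Tuple[str, str]:
        # Register one component at both levels and return (compound ID, structure ID).
        compound_id = self._lookup_or_assign(self.compounds, self.compound_key(component), "C")
        structure_id = self._lookup_or_assign(self.structures, self.structure_key(component), "S")
        return compound_id, structure_id

    def register_entry(self, smiles: str) -> List[Tuple[str, str]]:
        # Assumes components are already disconnected, i.e. separated by "." in SMILES.
        return [self.register_component(part) for part in smiles.split(".") if part]


# Toy stand-in for a tautomer-insensitive key: maps the enol form of acetone to the keto form.
TOY_TAUTOMERS = {"CC(O)=C": "CC(=O)C"}

registry = Registry(
    compound_key=lambda s: TOY_TAUTOMERS.get(s, s),
    structure_key=lambda s: s,
)
print(registry.register_entry("CC(=O)C"))     # [('C1', 'S1')]
print(registry.register_entry("CC(O)=C"))     # [('C1', 'S2')]
print(registry.register_entry("CC(=O)C.Cl"))  # [('C1', 'S1'), ('C2', 'S3')]
```

Each level keeps its own dictionary, so the two ID spaces are independent. As a result, the keto and enol forms share one compound-level ID but receive two structure-level IDs.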
regsys.txt · Last modified: 2013/10/19 11:36 by rkiss