Differences

This shows you the differences between two versions of the page.

--- regsys [2012/10/28 10:52] – add figures sanmark
+++ regsys [2013/02/27 08:06] – rkiss
@@ Line 1: / Line 1: @@
-====== The mcule structure registration system ======
+====== Mcule Advanced Curation (MAC) ======
+The mcule database is curated by **MAC (Mcule Advanced Curation)**, which involves a rigorous molecule registration system based on more than 80 structural checks, standardization, preparation and correction steps. MAC ensures high-quality search results and avoids common errors arising from mis-drawn and incorrect structures, which can critically affect the quality of computational results and the efficiency of experimental work.
+==== Quality is important ====
+The design of screening libraries and the development of predictive drug discovery models **both start with a high-quality database**. Chemical correctness is crucial because mis-drawn and imperfectly defined structures lead to incorrect models, misleading predictions and inconsistent hits. Problematic structures should therefore be eliminated from a drug discovery pipeline at the earliest possible stage.
 The mcule structure registration system is primarily designed to handle chemical structures coming from different data sources, mainly chemical suppliers, and to load them into the mcule database. This is a non-trivial task that requires a careful structure check and preparation procedure. To reach a high curation level, the registration system should ensure database quality in terms of structure correctness, uniqueness and reliability, and maintain a high level of data standardization.
+All molecules with an MCULE ID have been processed by the mcule structure registration system. User-uploaded molecules are not processed by the registration system by default; we will enable this option in the future.
 **Key features:** high-level data curation, stereochemical standardization, robust novelty check and isomer detection, handling of salts & organometallics
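The structure check step can be pictured as a list of independent checks whose problems are collected together, so a single record can be reported with all of its defects at once. The sketch below is a minimal illustration under stated assumptions, not MAC's implementation: the `Structure` class, the `MAX_VALENCE` table and the two checks are hypothetical stand-ins for the much larger set of checks MAC performs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical toy representation; the real mcule data model is not described here.
@dataclass
class Structure:
    atoms: List[Tuple[str, int]]                                     # (element, formal charge)
    bonds: List[Tuple[int, int, int]] = field(default_factory=list)  # (atom i, atom j, bond order)

# Assumed maximum valences for neutral atoms (a small illustrative subset).
MAX_VALENCE: Dict[str, int] = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}

def check_not_empty(s: Structure) -> List[str]:
    return [] if s.atoms else ["empty structure"]

def check_valence(s: Structure) -> List[str]:
    # Sum explicit bond orders per atom and compare neutral atoms against MAX_VALENCE.
    used = [0] * len(s.atoms)
    for i, j, order in s.bonds:
        used[i] += order
        used[j] += order
    errors = []
    for idx, ((element, charge), total) in enumerate(zip(s.atoms, used)):
        limit = MAX_VALENCE.get(element)
        if limit is not None and charge == 0 and total > limit:
            errors.append(f"atom {idx} ({element}): valence {total} exceeds {limit}")
    return errors

# Each check returns a list of problems; an empty list means the check passed.
CHECKS: List[Callable[[Structure], List[str]]] = [check_not_empty, check_valence]

def run_checks(s: Structure) -> List[str]:
    """Run every check and collect all problems rather than stopping at the first."""
    problems: List[str] = []
    for check in CHECKS:
        problems.extend(check(s))
    return problems

# A carbon drawn with five single bonds to hydrogen is flagged; methane passes.
bad = Structure([("C", 0)] + [("H", 0)] * 5, [(0, k, 1) for k in range(1, 6)])
good = Structure([("C", 0)] + [("H", 0)] * 4, [(0, k, 1) for k in range(1, 5)])
print(run_checks(bad))   # ['atom 0 (C): valence 5 exceeds 4']
print(run_checks(good))  # []
```

Keeping each check as a separate function makes it easy to add, reorder or disable checks without touching the driver.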
@@ Line 115: / Line 124: @@
 {{ :regsys:reg_sys_5.png |}}
-{{ :regsys:reg_sys_6.png |}}
 ==Enforce standard salt & organometallic compound representation==
@@ Line 138: / Line 146: @@
 In this stage we separate the components of the incoming structure. In common salts, counterions can be disconnected and separated from the main component automatically: bonds to the main component are deleted and the appropriate charges are placed on both components.
+{{ :regsys:reg_sys_9.png |}}
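The disconnection step described above can be sketched on a toy molecular graph. This is a minimal sketch under stated assumptions, not the mcule code: only bonds between an alkali metal (Li, Na, K) and O, N or S are broken, each broken bond moves a +1 charge onto the metal and a -1 charge onto the heteroatom, bond orders are not modelled, and `disconnect_salts` is a hypothetical name. Metals bonded to several heteroatoms and other organometallic cases would need additional rules.

```python
from typing import Dict, List, Sequence, Tuple

# Assumptions for this sketch: alkali metals (which form +1 cations) singly bonded
# to one O, N or S atom; bond orders are not modelled.
METALS = {"Li", "Na", "K"}
ACCEPTORS = {"O", "N", "S"}

def disconnect_salts(
    atoms: Sequence[Tuple[str, int]], bonds: Sequence[Tuple[int, int]]
) -> Tuple[List[Tuple[str, int]], List[List[int]]]:
    """Delete metal-heteroatom bonds, move +1/-1 charges onto the two atoms, and
    return the updated atoms plus the connected components (lists of atom indices)."""
    charged = [list(a) for a in atoms]  # work on a copy; the input is left unchanged
    kept = []
    for i, j in bonds:
        ei, ej = charged[i][0], charged[j][0]
        if ei in METALS and ej in ACCEPTORS:
            metal, hetero = i, j
        elif ej in METALS and ei in ACCEPTORS:
            metal, hetero = j, i
        else:
            kept.append((i, j))
            continue
        charged[metal][1] += 1
        charged[hetero][1] -= 1

    # Union-find over the remaining bonds to group atoms into components.
    parent = list(range(len(charged)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in kept:
        parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for idx in range(len(charged)):
        groups.setdefault(find(idx), []).append(idx)
    return [(e, c) for e, c in charged], sorted(groups.values())

# Sodium acetate drawn with a covalent O-Na bond: C0-C1(=O2)-O3-Na4.
atoms = [("C", 0), ("C", 0), ("O", 0), ("O", 0), ("Na", 0)]
bonds = [(0, 1), (1, 2), (1, 3), (3, 4)]
new_atoms, components = disconnect_salts(atoms, bonds)
print(components)                  # [[0, 1, 2, 3], [4]]
print(new_atoms[3], new_atoms[4])  # ('O', -1) ('Na', 1)
```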
 ==== Stage D & E. Component registration ====
 //Summary: individual components’ structures are normalized, and unique components are registered with new mcule IDs assigned at the tautomer and protonation state independent (D) and dependent (E) levels (steps in the D & E stages are very similar except for the novelty check)//
@@ Line 146: / Line 155: @@
 In the mcule system there are [[stereonotations|four stereo configuration types]]: absolute, relative, racemic and unknown (the “unknown” type denotes uncertain configurations, where the compound provider could not confirm that the configuration type is really absolute). These types are assigned in the stereo clean-up stage, and the initially assigned types are inherited by the separated components. In these steps, both the assigned stereo configuration types and the stereo configurations themselves are processed further: for components with no stereocenters, the stereo configuration type is removed, while for components with stereocenters, the stereo configuration is normalized together with its stereo configuration type.
+{{ :regsys:reg_sys_10.png |}}
 Normalization is needed because certain configurations can be represented by multiple structures and/or [[stereonotations|stereo configuration types]]: changing the configuration around atoms and/or the configuration type can yield stereochemically equivalent structures. This happens primarily when the configuration is only partially specified, i.e. when it contains atoms with both unknown/undefined and well-defined configurations. As preparation for the novelty check, the same representative structure is selected from each set of structures with equivalent configurations.
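The inheritance and normalization rules above can be illustrated with one simple case of equivalent representations. This is a minimal sketch under stated assumptions, not mcule's rule set: stereocentres are encoded as CIP-style labels `'R'`, `'S'` or `'?'` (undefined), pseudoasymmetric centres are ignored, and the only equivalence handled is that a relative or racemic configuration equals its mirror image, so the lexicographically smaller of the two label tuples is kept. The function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Assumptions for this sketch: each stereocentre carries a CIP-style label
# 'R' or 'S', or '?' when undefined; pseudoasymmetric centres are ignored.
INVERT = {"R": "S", "S": "R", "?": "?"}
# Types for which a configuration and its mirror image denote the same thing.
MIRROR_EQUIVALENT = {"relative", "racemic"}

Stereo = Tuple[Optional[str], Tuple[str, ...]]

def normalize_stereo(stereo_type: Optional[str], labels: Tuple[str, ...]) -> Stereo:
    """Return one canonical (type, labels) representative for a component."""
    if not labels:
        # No stereocentres: the stereo configuration type is dropped.
        return None, labels
    if stereo_type in MIRROR_EQUIVALENT:
        # rel-(R,S) and rel-(S,R) describe the same configuration, so keep
        # the lexicographically smaller of the labels and their mirror image.
        mirrored = tuple(INVERT[label] for label in labels)
        labels = min(labels, mirrored)
    return stereo_type, labels

def normalize_components(
    parent_type: Optional[str], per_component: Sequence[Tuple[str, ...]]
) -> List[Stereo]:
    # Each separated component first inherits the parent's stereo type.
    return [normalize_stereo(parent_type, labels) for labels in per_component]

print(normalize_components("relative", [("S", "R", "?"), ()]))
# [('relative', ('R', 'S', '?')), (None, ())]
print(normalize_stereo("absolute", ("S", "R")))
# ('absolute', ('S', 'R'))
```

Absolute and unknown configurations are left untouched here, because their mirror images are different compounds.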
@@ Line 157: / Line 168: @@
 In stage D we use a novelty check algorithm that is based on InChI strings but can detect an even broader set of potential tautomers than a simple InChI comparison. The system can fully prevent the registration of duplicates that are prototropic tautomers of already registered structures.
+{{ :regsys:reg_sys_11.png |}}
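Registration at the two levels can be sketched as two registries that differ only in the key used for the novelty check. This is a minimal sketch, not mcule's algorithm: `skeleton_key` is a hypothetical, deliberately crude key that keeps only the formula, connectivity and stereo layers of a standard InChI, so forms that differ only in hydrogen placement or protonation share a key; it does not reproduce the broader tautomer detection described above, and isotopic layers are not handled. `Registry` and the `D-`/`E-` ID prefixes are also hypothetical.

```python
from itertools import count
from typing import Callable, Dict, Tuple

def skeleton_key(inchi: str) -> str:
    """Hypothetical tautomer- and protonation-insensitive key: keep the formula
    plus the connectivity (/c) and stereo (/b, /t, /m, /s) layers of a standard
    InChI and drop the hydrogen (/h), charge (/q) and protonation (/p) layers.
    Isotopic layers are not handled in this sketch."""
    layers = inchi.split("/")[1:]  # drop the "InChI=1S" prefix
    formula, rest = layers[0], layers[1:]
    kept = [layer for layer in rest if layer[:1] in {"c", "b", "t", "m", "s"}]
    return "/".join([formula, *kept])

class Registry:
    """Assigns a new ID the first time a key is seen and reuses it afterwards."""

    def __init__(self, key_fn: Callable[[str], str], prefix: str) -> None:
        self._key_fn = key_fn
        self._prefix = prefix
        self._ids: Dict[str, str] = {}
        self._counter = count(1)

    def register(self, inchi: str) -> Tuple[str, bool]:
        key = self._key_fn(inchi)
        is_new = key not in self._ids
        if is_new:
            self._ids[key] = f"{self._prefix}-{next(self._counter)}"
        return self._ids[key], is_new

# Two levels, loosely mirroring stages D (independent) and E (dependent).
level_d = Registry(skeleton_key, "D")
level_e = Registry(lambda inchi: inchi, "E")  # full InChI string as the key

acetic_acid = "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)"
acetate = "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)/p-1"
for inchi in (acetic_acid, acetate):
    print(level_d.register(inchi), level_e.register(inchi))
# ('D-1', True) ('E-1', True)
# ('D-1', False) ('E-2', True)
```

Acetic acid and acetate collapse to one ID at the independent level but keep separate IDs at the dependent level.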
 ==== Stage F & G. Multicomponent structure registration ====
 //Summary: additional checks are performed, component types are assigned, and unique structures are registered with new mcule IDs assigned at the tautomer and protonation state independent (F) and dependent (G) levels (steps in the F & G stages are very similar)//
@@ Line 173: / Line 185: @@
 In most cases the system is also capable of identifying the main components, which can serve as the input set for virtual screens.
+Below is the index page of compound [[https://mcule.com/MCULE-3198812899|MCULE-3198812899]], a maleic and/or fumaric acid salt (the uncertainty is marked by a crossed double bond). Counterions are marked, and component multiplicities are assigned correctly by the system.
+{{:regsys:reg_sys_12.png|}}
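Counting component multiplicities and picking a main component can be sketched once each component already has an ID. This is a minimal sketch under stated assumptions, not the mcule procedure: the set of counterion IDs is taken as an input, the "largest non-counterion component" rule for the main component is an assumed heuristic, and `describe_mixture` and the component IDs are hypothetical. The sorted tuple of (ID, multiplicity) pairs gives an order-independent key that a multicomponent novelty check could use.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set, Tuple

def describe_mixture(
    components: Iterable[Tuple[str, int]], counterions: Set[str]
) -> Tuple[Dict[str, int], Optional[str], Tuple[Tuple[str, int], ...]]:
    """components: one (component ID, heavy-atom count) entry per copy present.
    counterions: component IDs to treat as counterions (an assumed input).
    Returns multiplicities, the main component, and an order-independent key
    that could drive the novelty check of the whole structure."""
    components = list(components)
    multiplicity = Counter(cid for cid, _ in components)
    heavy_atoms = dict(components)
    candidates = [cid for cid in multiplicity if cid not in counterions]
    # Assumed heuristic: the largest non-counterion component is the main one.
    main = max(candidates, key=heavy_atoms.__getitem__) if candidates else None
    key = tuple(sorted(multiplicity.items()))
    return dict(multiplicity), main, key

# Hypothetical IDs: one organic base "A" (20 heavy atoms) with two chloride ions.
mult, main, key = describe_mixture([("Cl-", 1), ("A", 20), ("Cl-", 1)], {"Cl-"})
print(mult, main, key)  # {'Cl-': 2, 'A': 1} A (('A', 1), ('Cl-', 2))
```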