subsets
Differences
This shows you the differences between two versions of the page.
subsets [2016/12/22 20:01] – sanmark | subsets [2016/12/27 21:23] (current) – rkiss | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== | + | ====== |
- | We provide | + | In case you cannot search / screen the full Mcule database, you may consider using some smaller, representative subsets thereof prefiltered by physicochemical properties and diversity. Structurally diverse subsets |
===== Availability ===== | ===== Availability ===== | ||
The subsets can be | The subsets can be | ||
- | * freely downloaded in SMILES and SDF file formats | + | * freely downloaded in SMILES and SDF file formats |
- | * or can be selected as the input collection for [[screen|online screening]] | + | * or can be selected as the input collection for [[screen|online screening]] |
===== Methods ===== | ===== Methods ===== | ||
Line 13: | Line 13: | ||
==== Property based filtering ==== | ==== Property based filtering ==== | ||
- | For the drug like and fragment | + | For the drug-like and fragment subsets the [[http:// |
- | * number of components <= 1 | + | |
- | * MW >= 100 | + | * number of components <= 1 |
- | * number of N+O atoms >= 1 | + | * MW >= 100 |
- | * number of rings >= 1 | + | * number of N+O atoms >= 1 |
- | * number of halogens <= 7 | + | * number of rings >= 1 |
+ | * number of halogens <= 7 | ||
* number of inorganic atoms = 0 | * number of inorganic atoms = 0 | ||
- | |||
- | We used these rules to leave out more " | ||
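The basic property rules listed above can be expressed as a single predicate. The following is only a minimal sketch, assuming the descriptors have already been computed by some cheminformatics toolkit; the `Descriptors` class and its field names are hypothetical and not part of the Mcule tooling. Only the thresholds come from the list above.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed per-molecule descriptors (hypothetical container)."""
    components: int        # number of disconnected components
    mol_weight: float      # molecular weight (MW)
    n_plus_o_atoms: int    # number of N and O atoms combined
    rings: int             # number of rings
    halogens: int          # number of halogen atoms
    inorganic_atoms: int   # number of inorganic atoms


def passes_basic_filters(d: Descriptors) -> bool:
    """Return True if the molecule satisfies every rule in the list."""
    return (
        d.components <= 1
        and d.mol_weight >= 100
        and d.n_plus_o_atoms >= 1
        and d.rings >= 1
        and d.halogens <= 7
        and d.inorganic_atoms == 0
    )
```

A molecule is kept only if all six conditions hold, so a single failing rule (for example MW below 100) is enough to reject it.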
==== Diversity selection ==== | ==== Diversity selection ==== | ||
- | The Mcule database contains ~5.7M stock compounds and ~30.3M virtual compounds. | + | Diversity selection was set up to prefer |
- | + | ||
- | Structural similarity was measured by Tanimoto coefficient (TC) between FP2 linear fingerprints generated by OpenBabel. The combinations of the following algorithms were applied to extract the most dissimilar subsets: | + | |
- | * we used sphere exclusion to eliminate highly similar compounds to reduce the input size where needed | + | |
- | * then [[diversitysel|stepwise elimination]] was applied to obtain the most dissimilar compounds | + | |
- | + | ||
- | In sphere exclusion we used the stock compounds first as " | + | |
- | ===== Subsets ===== | + | The [[https:// |
- | To speed up the selection, we used sphere exclusion in case of the Ro5 subsets with TC=0.8 to pass at most 3M compounds for stepwise elimination. Then, the following subsets were saved: | + | Structural similarity was measured by Tanimoto coefficient (TC) between FP2 linear fingerprints generated by [[http:// |
+ | * sphere exclusion: quickly eliminates highly similar compounds, reducing the input collection to a manageable size for the subsequent [[diversitysel|stepwise elimination]] algorithm | ||
+ | * [[diversitysel|stepwise elimination]]: a more thorough algorithm that eliminates one molecule of the most similar molecule pairs | ||
- | ^Subset name ^Input ^Property filter ^Diversity ^Subset size ^ | + | In sphere exclusion we used the in-stock compounds |
- | |Mcule Purchasable (In Stock Ro5 Diverse 1M) |Stock compounds|rule-of-5, max 1 violation|top diverse 1M, max TC: 0.8|1, | + | |
- | |Mcule Purchasable (In Stock Ro5 Diverse 350K)|Stock | + | |
- | |Mcule Purchasable (In Stock Ro3)|Stock compounds|rule-of-3 (max 1 violation)|-|154, | + | |
- | |Mcule Purchasable (In Stock Ro3 Diverse 50K)|Stock | + | |
- | |Mcule Purchasable (In Stock & Virtual Ro3)|Stock compounds + virtual compounds|rule-of-3 (max 1 violation)|-|789, | + | |
- | |Mcule Purchasable (In Stock & Virtual Ro3 Diverse 70K)|Stock compounds + virtual compounds|rule-of-3 (max 1 violation)|Top diverse 70K, max TC: 0.8|70,000| | + | |
+ | For the rule-of-5 subsets, sphere exclusion was applied with a maximum TC of 0.8, so that at most 3M compounds were passed on to stepwise elimination.
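The two-stage selection described in this section can be illustrated with a small sketch. It is not the actual Mcule implementation: fingerprints are modelled here as plain sets of on-bit indices rather than real FP2 fingerprints, the function names are hypothetical, and two details are assumptions made for the example: a compound is excluded when its TC to an already kept compound is at least `max_tc`, and the later-listed member of the most similar pair is the one dropped. Putting in-stock compounds first in the input list gives them priority in sphere exclusion.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared on-bits divided by all on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def sphere_exclusion(fps, max_tc=0.8):
    """Greedy pass: keep a compound only if it is not too similar
    (TC >= max_tc, an assumed cutoff convention) to any kept compound.
    Returns the indices of the kept compounds in input order."""
    kept = []
    for i, fp in enumerate(fps):
        if all(tanimoto(fp, fps[j]) < max_tc for j in kept):
            kept.append(i)
    return kept


def stepwise_elimination(fps, indices, target_size):
    """Repeatedly find the most similar remaining pair and drop one of
    its members (here: the later-listed one) until target_size is left.
    Naive pairwise search, meant for illustration only."""
    remaining = list(indices)
    while len(remaining) > max(target_size, 1):
        _, drop = max(
            combinations(remaining, 2),
            key=lambda pair: tanimoto(fps[pair[0]], fps[pair[1]]),
        )
        remaining.remove(drop)
    return remaining
```

Sphere exclusion needs only one pass over the input, which is why it is suited to shrinking a very large collection before the more thorough, and far more expensive, stepwise elimination is run on what remains.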
subsets.txt · Last modified: 2016/12/27 21:23 by rkiss