subsets
subsets [2016/12/22 18:27] – created sanmark | subsets [2016/12/27 21:23] (current) – rkiss | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== | + | ====== |
- | We provide | + | If you cannot search or screen the full Mcule database, consider using smaller, representative subsets prefiltered by physicochemical properties and diversity. Structurally diverse subsets |
===== Availability ===== | ===== Availability ===== | ||
The subsets can be | The subsets can be | ||
- | * freely downloaded in SMILES and SDF file formats | + | * freely downloaded in SMILES and SDF file formats |
- | * or can be selected as the input collection for [[screen|online screening]] | + | * or can be selected as the input collection for [[screen|online screening]] |
- | ===== Diversity selection | + | ===== Methods |
- | The Mcule database contains ~5.7M stock compounds and ~30.3M virtual compounds. Diversity selection was carried out in a way that prefers stock compounds over virtual ones. The aim is to represent only those parts of the chemical space by virtual compounds | + | ==== Property based filtering ==== |
- | We've developed a method for large scale diversity selection. The selection is carried out diverse subsets | + | For the drug-like and fragment subsets the [[http:// |
+ | |||
+ | * number of components < = 1 | ||
+ | * MW > = 100 | ||
+ | * number of N+O atoms > = 1 | ||
+ | * number of rings > = 1 | ||
+ | * number of halogens < = 7 | ||
+ | * number of inorganic atoms = 0 | ||
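The criteria listed above can be expressed as a single predicate. The following is a minimal sketch, not the filter actually used for the subsets: it assumes the descriptors have already been computed by some cheminformatics toolkit, and the `Descriptors` class, its field names, and `passes_listed_criteria` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed per-compound descriptors (hypothetical field names)."""
    n_components: int       # number of disconnected components
    mol_weight: float       # molecular weight (MW)
    n_n_o_atoms: int        # number of N + O atoms
    n_rings: int            # number of rings
    n_halogens: int         # number of halogen atoms
    n_inorganic_atoms: int  # number of inorganic atoms


def passes_listed_criteria(d: Descriptors) -> bool:
    """Return True if the compound satisfies all six listed criteria."""
    return (
        d.n_components <= 1
        and d.mol_weight >= 100
        and d.n_n_o_atoms >= 1
        and d.n_rings >= 1
        and d.n_halogens <= 7
        and d.n_inorganic_atoms == 0
    )


# Illustrative values only, not taken from the Mcule database.
example = Descriptors(
    n_components=1, mol_weight=180.2, n_n_o_atoms=4,
    n_rings=1, n_halogens=0, n_inorganic_atoms=0,
)
print(passes_listed_criteria(example))
```

Keeping the descriptor calculation separate from the predicate means the same check can be reused regardless of which toolkit produced the values.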
+ | |||
+ | ==== Diversity | ||
+ | |||
+ | Diversity selection was set up to prefer in-stock compounds over virtual ones. As a result, the chemical space is represented by in-stock compounds where possible. | ||
+ | |||
+ | The [[https:// | ||
+ | |||
+ | Structural similarity was measured by the Tanimoto coefficient (TC) between FP2 linear fingerprints generated by [[http:// |
+ | * sphere exclusion: quickly eliminates highly similar compounds, reducing the input collection to a manageable size for the subsequent [[diversitysel|stepwise elimination]] algorithm |
+ | * [[diversitysel|stepwise elimination]]: | ||
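For reference, the Tanimoto coefficient between two binary fingerprints is the number of bits set in both, divided by the number of bits set in either. The sketch below assumes each fingerprint is represented as a set of on-bit indices; this representation and the function name `tanimoto` are illustrative choices, and no real FP2 fingerprints are generated here.

```python
from typing import AbstractSet


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


# Toy fingerprints: 2 shared bits, 6 bits set in at least one of them.
fp_a = {1, 2, 3, 4}
fp_b = {3, 4, 5, 6}
print(tanimoto(fp_a, fp_b))  # 2 / 6
```

A TC of 1 means the two fingerprints are identical, and a TC of 0 means they share no set bits.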
+ | |||
+ | In sphere exclusion we used the in-stock compounds first as " | ||
+ | |||
+ | For the rule-of-5 subsets, sphere exclusion was applied with a maximum TC of 0.8, and at most 3M compounds were passed on to stepwise elimination. |
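The sphere exclusion step can be illustrated with a brute-force sketch. It is not Mcule's large-scale implementation. It assumes that fingerprints are sets of on-bit indices, that in-stock compounds are visited before virtual ones, and that a candidate is excluded when its TC to any already selected compound exceeds the threshold. The tuple layout and the names `tanimoto` and `sphere_exclusion` are hypothetical.

```python
from typing import AbstractSet, List, Sequence, Tuple

# (compound id, fingerprint on-bits, in-stock flag); hypothetical layout
Compound = Tuple[str, AbstractSet[int], bool]


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def sphere_exclusion(compounds: Sequence[Compound], max_tc: float = 0.8) -> List[Compound]:
    """Greedy selection: keep a compound only if it is not too similar to any kept one."""
    # Visit in-stock compounds first. sorted() is stable, so the input order
    # is preserved within the in-stock and virtual groups.
    ordered = sorted(compounds, key=lambda c: not c[2])
    selected: List[Compound] = []
    for candidate in ordered:
        bits = candidate[1]
        # Assumption: exclude when TC to a selected compound exceeds max_tc.
        if all(tanimoto(bits, kept[1]) <= max_tc for kept in selected):
            selected.append(candidate)
    return selected


# Toy data: "v1" is a near-duplicate of the in-stock compound "s1".
library = [
    ("v1", {1, 2, 3, 4, 5}, False),
    ("s1", {1, 2, 3, 4, 5, 6}, True),
    ("v2", {10, 11, 12}, False),
]
print([cid for cid, _, _ in sphere_exclusion(library)])
```

Because every candidate is compared against all selected compounds, this sketch scales quadratically in the worst case and is only suitable for small examples.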
subsets.txt · Last modified: 2016/12/27 21:23 by rkiss