subsets
Current revision: 2016/12/27 21:23 by rkiss; previous revision: 2016/12/22 18:27 (page created) by sanmark.
If you cannot search or screen the full Mcule database, consider using smaller, representative subsets of it that have been prefiltered by physicochemical properties and diversity. Structurally diverse subsets
===== Availability =====
The subsets can be
  * freely downloaded in SMILES and SDF file formats, or
  * selected as the input collection for [[screen|online screening]].
===== Methods =====
==== Property-based filtering ====
For the drug-like and fragment subsets, the [[http://

  * number of components <= 1
  * MW >= 100
  * number of N+O atoms >= 1
  * number of rings >= 1
  * number of halogens <= 7
  * number of inorganic atoms = 0
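The criteria in the list above combine into a single pass/fail predicate. The sketch below is a minimal illustration, assuming the descriptors (component count, molecular weight, N+O count, ring count, halogen count, inorganic atom count) have already been computed by some cheminformatics toolkit. `MolProps` and `passes_property_filters` are hypothetical names, and the text does not define which elements count as "inorganic", so that field is taken as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MolProps:
    """Hypothetical record of precomputed descriptors for one compound."""
    n_components: int   # number of disconnected fragments (salts, mixtures)
    mw: float           # molecular weight
    n_n_plus_o: int     # count of N and O atoms
    n_rings: int        # ring count
    n_halogens: int     # count of halogen atoms
    n_inorganic: int    # ASSUMPTION: count supplied by the caller's own definition


def passes_property_filters(p: MolProps) -> bool:
    """Return True only if the compound meets every listed criterion."""
    return (
        p.n_components <= 1
        and p.mw >= 100
        and p.n_n_plus_o >= 1
        and p.n_rings >= 1
        and p.n_halogens <= 7
        and p.n_inorganic == 0
    )


# Usage: benzamide (C7H7NO) passes; its hydrochloride salt has two
# components and is rejected by the first criterion.
benzamide = MolProps(n_components=1, mw=121.14, n_n_plus_o=2,
                     n_rings=1, n_halogens=0, n_inorganic=0)
benzamide_hcl = MolProps(n_components=2, mw=157.60, n_n_plus_o=2,
                         n_rings=1, n_halogens=1, n_inorganic=0)
print(passes_property_filters(benzamide))      # True
print(passes_property_filters(benzamide_hcl))  # False
```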
==== Diversity selection ====

Diversity selection was set up to prefer in-stock compounds over virtual ones. As a result, the chemical space is represented by in-stock compounds wherever possible.

The [[https://

Structural similarity was measured by the Tanimoto coefficient (TC) between FP2 linear fingerprints generated by [[http://
  * sphere exclusion: quickly eliminates highly similar compounds, reducing the input collection to a manageable size for the subsequent [[diversitysel|stepwise elimination]] algorithm
  * [[diversitysel|stepwise elimination]]:
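The Tanimoto coefficient between two binary fingerprints A and B is |A AND B| / |A OR B|: the number of bits set in both divided by the number of bits set in either. A minimal sketch, assuming each FP2 fingerprint (Open Babel's path-based fingerprint) has already been generated and is stored as a Python integer used as a bitset; `tanimoto` is a hypothetical helper, not an Open Babel API.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integer bitsets."""
    union = bin(fp_a | fp_b).count("1")         # bits set in either fingerprint
    if union == 0:
        return 0.0  # ASSUMPTION: two empty fingerprints are treated as dissimilar
    shared = bin(fp_a & fp_b).count("1")        # bits set in both fingerprints
    return shared / union


# Usage: 2 shared bits out of 3 set in either; no shared bits gives 0.
print(round(tanimoto(0b1011, 0b0011), 3))  # 0.667
print(tanimoto(0b1100, 0b0011))            # 0.0
```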
In sphere exclusion, the in-stock compounds were used first as "

Sphere exclusion was applied to the rule-of-5 subsets with a maximum TC of 0.8, keeping at most 3M compounds for the subsequent stepwise elimination.
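Sphere exclusion with in-stock precedence can be sketched as a greedy pass: compounds are visited in-stock first, and a compound is kept only if its TC to every compound kept so far is at most the threshold (0.8 above), so each kept compound excludes its own similarity "sphere". This is a hedged sketch under stated assumptions, not the Mcule implementation: `Compound` and `sphere_exclusion` are hypothetical names, the 3M cap is modeled as stopping once that many compounds are kept, and the naive comparison against every kept compound is only practical for small inputs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Compound:
    """Hypothetical record: ID, fingerprint as an integer bitset, stock flag."""
    cid: str
    fp: int
    in_stock: bool


def tanimoto(a: int, b: int) -> float:
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def sphere_exclusion(compounds: List[Compound], max_tc: float = 0.8,
                     max_selected: int = 3_000_000) -> List[Compound]:
    """Greedy sphere exclusion; each kept compound excludes later ones with TC > max_tc."""
    # Stable sort: in-stock compounds are visited before virtual ones,
    # otherwise the input order is preserved.
    ordered = sorted(compounds, key=lambda c: not c.in_stock)
    selected: List[Compound] = []
    for cand in ordered:
        if len(selected) >= max_selected:  # ASSUMPTION: cap applied as an early stop
            break
        if all(tanimoto(cand.fp, kept.fp) <= max_tc for kept in selected):
            selected.append(cand)
    return selected


# Usage: V1 is a near-duplicate of S1 (TC = 8/9), so it is excluded even
# though it appears first in the input.
pool = [
    Compound("V1", 0x1FE, in_stock=False),
    Compound("S1", 0x1FF, in_stock=True),
    Compound("V2", 0xF0000, in_stock=False),
    Compound("S2", 0xF000, in_stock=True),
]
print([c.cid for c in sphere_exclusion(pool)])  # ['S1', 'S2', 'V2']
```

Visiting in-stock compounds first is what lets `S1` exclude its virtual neighbor `V1`; in plain input order, `V1` would have been kept and `S1` excluded instead.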
subsets.1482431235.txt.gz · Last modified: by sanmark