subsets
Differences
This shows you the differences between two versions of the page.
subsets [2016/12/22 18:27] – created sanmark | subsets [2016/12/27 21:19] – rkiss | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== | + | ====== |
- | We provide | + | If you cannot search or screen the full Mcule database, consider using a smaller, representative subset prefiltered by physicochemical properties and diversity. Structurally diverse subsets |
===== Availability ===== | ===== Availability ===== | ||
The subsets can be | The subsets can be | ||
- | * freely downloaded in SMILES and SDF file formats | + | * freely downloaded in SMILES and SDF file formats |
- | * or can be selected as the input collection for [[screen|online screening]] | + | * or can be selected as the input collection for [[screen|online screening]] |
- | ===== Diversity selection | + | ===== Methods |
- | The Mcule database contains ~5.7M stock compounds | + | ==== Property based filtering ==== |
+ | |||
+ | For the drug-like and fragment subsets the [[http:// | ||
+ | |||
+ | * number of components ≤ 1 | ||
+ | * MW ≥ 100 | ||
+ | * number of N+O atoms ≥ 1 | ||
+ | * number of rings ≥ 1 | ||
+ | * number of halogens ≤ 7 | ||
+ | * number of inorganic atoms = 0 | ||
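
The criteria listed above can be read as a simple predicate over precomputed per-compound counts. The following is a minimal sketch, not Mcule's actual implementation: the `Descriptors` class, its field names, and the example values are hypothetical, and it assumes the counts have already been calculated by a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed counts for one compound (hypothetical field names)."""
    components: int        # number of disconnected components
    mol_weight: float      # molecular weight (MW)
    n_plus_o: int          # number of N and O atoms combined
    rings: int             # number of rings
    halogens: int          # number of halogen atoms
    inorganic_atoms: int   # number of inorganic atoms


def passes_basic_filter(d: Descriptors) -> bool:
    """Return True if the compound satisfies all six listed criteria."""
    return (
        d.components <= 1
        and d.mol_weight >= 100
        and d.n_plus_o >= 1
        and d.rings >= 1
        and d.halogens <= 7
        and d.inorganic_atoms == 0
    )


# Illustrative inputs only: a single-component, aspirin-like compound
# and a two-component record (for example a salt form).
aspirin_like = Descriptors(1, 180.16, 4, 1, 0, 0)
two_component = Descriptors(2, 250.0, 2, 1, 1, 0)

accepted = [d for d in (aspirin_like, two_component) if passes_basic_filter(d)]
```

Keeping the thresholds in one function makes it easy to compare them against the list and to adjust a single bound without touching the rest of the pipeline.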
+ | |||
+ | ==== Diversity selection ==== | ||
+ | |||
+ | Diversity selection was set up to prefer | ||
+ | |||
+ | The [[https:// | ||
+ | |||
+ | Structural similarity was measured | ||
+ | * sphere exclusion: quickly eliminates highly similar compounds, reducing the input collection to a manageable size for the subsequent [[diversitysel|stepwise elimination]] algorithm | ||
+ | * [[diversitysel|stepwise elimination]]: | ||
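
To illustrate the second step, here is a rough sketch of one common greedy form of stepwise elimination. It is not necessarily the algorithm Mcule uses. Everything in it is an assumption made for illustration: fingerprints are modelled as sets of "on" bit indices, similarity is the Tanimoto coefficient (TC), and the function names, the `target_size` and `max_tc` parameters, and the rule for which member of a pair to drop are hypothetical.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def stepwise_elimination(fps: dict, target_size: int, max_tc: float) -> list:
    """Greedy sketch: repeatedly drop one member of the most similar pair.

    Stops once at most `target_size` compounds remain AND no remaining
    pair is more similar than `max_tc`. Recomputing all pairs each round
    is only feasible for tiny inputs; this is for illustration.
    """
    kept = dict(fps)
    while len(kept) > 1:
        # Find the currently most similar pair among the kept compounds.
        (i, j), tc = max(
            (((i, j), tanimoto(kept[i], kept[j])) for i, j in combinations(kept, 2)),
            key=lambda item: item[1],
        )
        if len(kept) <= target_size and tc <= max_tc:
            break
        del kept[j]  # assumption: drop the later-listed member of the pair
    return list(kept)


# Toy fingerprints (hypothetical bit indices).
toy = {
    "a": frozenset({1, 2, 3, 4}),
    "b": frozenset({1, 2, 3, 5}),
    "c": frozenset({7, 8, 9}),
    "d": frozenset({1, 2, 3, 4, 5}),
}
diverse_two = stepwise_elimination(toy, target_size=2, max_tc=0.8)
```

The quadratic pair search in every round is the reason a cheaper prefilter such as sphere exclusion is useful before this step.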
+ | |||
+ | In sphere exclusion we used the in-stock compounds first as " | ||
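
Sphere exclusion with a preferred ordering can be sketched as follows. This is a minimal illustration, not Mcule's code. It assumes that "in-stock compounds first" means in-stock compounds are considered before all others, so they are preferentially kept. It also assumes fingerprints are sets of on-bit indices compared by Tanimoto coefficient (TC). The function names and the tuple layout are hypothetical.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def sphere_exclusion(compounds: list, threshold: float) -> list:
    """Keep a compound only if its TC to every kept compound is below threshold.

    `compounds` is a list of (compound_id, in_stock, fingerprint) tuples.
    In-stock compounds are processed first (stable sort), so a virtual
    compound is dropped whenever a similar in-stock compound was kept.
    """
    ordered = sorted(compounds, key=lambda c: not c[1])  # in-stock first
    kept = []  # list of (compound_id, fingerprint)
    for cid, _in_stock, fp in ordered:
        if all(tanimoto(fp, kept_fp) < threshold for _, kept_fp in kept):
            kept.append((cid, fp))
    return [cid for cid, _ in kept]


# Toy input (hypothetical IDs and bit indices).
toy = [
    ("v1", False, frozenset({1, 2, 3, 4, 5, 6})),
    ("s1", True, frozenset({1, 2, 3, 4, 5})),
    ("s2", True, frozenset({7, 8, 9})),
    ("v2", False, frozenset({10, 11})),
]
selected = sphere_exclusion(toy, threshold=0.8)
```

Each candidate is compared only against the compounds already kept, which is why this pass is much cheaper than an all-pairs search and suits a first reduction of a very large collection.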
+ | |||
+ | ===== Subsets ===== | ||
+ | |||
+ | To speed up the selection, for the Ro5 subsets we first applied sphere exclusion with TC=0.8 so that at most 3M compounds were passed on to stepwise elimination. The following subsets were then saved: | ||
+ | |||
+ | ^Subset name ^Input ^Property filter ^Diversity ^Subset size ^ | ||
+ | |Mcule Purchasable (In Stock Ro5 Diverse 1M) |Stock compounds|rule-of-5, | ||
+ | |Mcule Purchasable (In Stock Ro5 Diverse 350K)|Stock compounds|rule-of-5 (max 1 violation)|Top diverse 350K, max TC: | ||
+ | |Mcule Purchasable (In Stock Ro3)|Stock compounds|rule-of-3 (max 1 violation)|-|154, | ||
+ | |Mcule Purchasable (In Stock Ro3 Diverse 50K)|Stock compounds|rule-of-3 (max 1 violation)|Top diverse 50K, max TC: 0.8|50, | ||
+ | |Mcule Purchasable (In Stock & Virtual Ro3)|Stock compounds + virtual compounds|rule-of-3 (max 1 violation)|-|789, | ||
+ | |Mcule Purchasable (In Stock & Virtual Ro3 Diverse 70K)|Stock compounds + virtual | ||
- | We've developed a method for large scale diversity selection. The selection is carried out diverse subsets can be extracted while we |
subsets.txt · Last modified: 2016/12/27 21:23 by rkiss