subsets
Differences
This shows you the differences between two versions of the page.
| | subsets [2016/12/27 20:58] – rkiss | | subsets [2016/12/27 21:23] (current) – rkiss |
|---|---|---|---|
| | Line 28: | | Line 28: |
| | The [[https:// | | The [[https:// |
| - | Structural similarity was measured by Tanimoto coefficient (TC) between FP2 linear fingerprints generated by OpenBabel. The combinations of the following algorithms were applied to extract the most dissimilar | + | Structural similarity was measured by Tanimoto coefficient (TC) between FP2 linear fingerprints generated by [[http:// | 
| - | * we used sphere exclusion to eliminate highly similar compounds to reduce the input size where needed | + | * sphere exclusion: to quickly | 
| - | * then [[diversitysel\|stepwise elimination]] | + | * [[diversitysel\|stepwise elimination]]: a more thorough algorithm that eliminates one molecule of the most similar molecule pairs |
| - | In sphere exclusion we used the stock compounds first as " | + | In sphere exclusion we used the in-stock compounds first as " | 
| - | | + | |
| - | ===== Subsets ===== | + | |
| - | | + | |
| - | To speed up the selection, we used sphere exclusion in case of the Ro5 subsets with TC=0.8 to pass at most 3M compounds for stepwise elimination. Then, the following subsets were saved: | + | |
| - | | + | |
| - | ^Subset name ^Input ^Property filter ^Diversity ^Subset size ^ | + | |
| - | \|Mcule Purchasable (In Stock Ro5 Diverse 1M) \|Stock compounds\|rule-of-5, | + | |
| - | \|Mcule Purchasable (In Stock Ro5 Diverse 350K)\|Stock compounds\|rule-of-5 (max 1 violation)\|Top diverse 350K, max TC: | + | |
| - | \|Mcule Purchasable (In Stock Ro3)\|Stock compounds\|rule-of-3 (max 1 violation)\|-\|154, | + | |
| - | \|Mcule Purchasable (In Stock Ro3 Diverse 50K)\|Stock compounds\|rule-of-3 (max 1 violation)\|Top diverse 50K, max TC: 0.8\|50, | + | |
| - | \|Mcule Purchasable (In Stock & Virtual Ro3)\|Stock compounds + virtual compounds\|rule-of-3 (max 1 violation)\|-\|789, | + | |
| - | \|Mcule Purchasable (In Stock & Virtual Ro3 Diverse 70K)\|Stock compounds + virtual compounds\|rule-of-3 (max 1 violation)\|Top diverse 70K, max TC: 0.8\|70,000\| | + | |
| | | + | Sphere exclusion diversity selection was applied in case of the rule-of-5 subsets with maximum TC=0.8 and a maximum of 3M compounds that were subjected to stepwise elimination. |
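The rule-of-5 and rule-of-3 property filters in the removed subset table allow at most one violation. The page does not say which descriptors, software or exact cutoffs were used. This Python sketch therefore assumes the commonly cited core limits: for the rule of 5, MW ≤ 500, logP ≤ 5, H-bond donors ≤ 5 and H-bond acceptors ≤ 10; for the rule of 3, MW ≤ 300, logP ≤ 3, donors ≤ 3 and acceptors ≤ 3. It also assumes the property values were already computed elsewhere. The keys `mw`, `logp`, `hbd`, `hba` and the function names are hypothetical. Some rule-of-3 definitions also limit rotatable bonds and polar surface area, and those limits are left out here.

```python
from typing import Mapping

# Assumed upper limits (commonly cited core criteria, not taken from this page).
RULE_OF_5: dict[str, float] = {"mw": 500.0, "logp": 5.0, "hbd": 5, "hba": 10}
RULE_OF_3: dict[str, float] = {"mw": 300.0, "logp": 3.0, "hbd": 3, "hba": 3}


def count_violations(props: Mapping[str, float], limits: Mapping[str, float]) -> int:
    """Count how many properties exceed their upper limit."""
    return sum(1 for key, limit in limits.items() if props[key] > limit)


def passes_filter(
    props: Mapping[str, float],
    limits: Mapping[str, float],
    max_violations: int = 1,
) -> bool:
    """True if the compound breaks at most `max_violations` of the rules."""
    return count_violations(props, limits) <= max_violations


# Hypothetical precomputed properties for one compound.
compound = {"mw": 520.0, "logp": 4.2, "hbd": 2, "hba": 7}
print(passes_filter(compound, RULE_OF_5))  # one violation (MW) -> True
print(passes_filter(compound, RULE_OF_3))  # three violations -> False
```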
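Sphere exclusion with the TC=0.8 cutoff mentioned in the diff can be sketched as a single greedy pass. This is a minimal illustration and not Mcule's implementation. Fingerprints are modelled as Python sets of on-bit positions instead of OpenBabel FP2 bit vectors, and no OpenBabel call is made. The Tanimoto coefficient is computed as c / (a + b - c), where a and b are the on-bit counts and c is the number of shared on-bits. A compound is excluded when its TC to an already selected compound exceeds `max_tc`; the page does not state whether the original cutoff was inclusive. The result depends on processing order. The page says in-stock compounds were used first (the rest of that sentence is cut off here), so the sketch simply walks the list in the order given. `sphere_exclusion` and `max_tc` are hypothetical names.

```python
from typing import Mapping, Sequence

# Stand-in for a fingerprint: the set of "on" bit positions (not real FP2 output).
Fingerprint = frozenset[int]


def tanimoto(fp_a: Fingerprint, fp_b: Fingerprint) -> float:
    """Tanimoto coefficient c / (a + b - c) on sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0  # assumed convention for two empty fingerprints
    common = len(fp_a & fp_b)
    return common / (len(fp_a) + len(fp_b) - common)


def sphere_exclusion(
    ids: Sequence[str],
    fps: Mapping[str, Fingerprint],
    max_tc: float = 0.8,
) -> list[str]:
    """Greedy sphere exclusion in input order: a compound is kept only if its
    TC to every compound kept so far is at most max_tc; otherwise it falls
    inside an existing sphere and is dropped."""
    kept: list[str] = []
    for cid in ids:
        fp = fps[cid]
        if all(tanimoto(fp, fps[k]) <= max_tc for k in kept):
            kept.append(cid)
    return kept


fps = {
    "A": frozenset({1, 2, 3, 4, 5}),
    "B": frozenset({1, 2, 3, 4, 5, 6}),  # TC(A, B) = 5/6, above 0.8
    "C": frozenset({7, 8, 9}),           # shares no bits with A or B
}
print(sphere_exclusion(["A", "B", "C"], fps))  # ['A', 'C']
```

Passing `["B", "A", "C"]` instead keeps `B` and drops `A`, which is why the input ordering (for example in-stock compounds first) matters. Each candidate is compared only with compounds already kept, so the pass never builds the full pairwise similarity matrix.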
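The stepwise elimination step can be sketched as follows. The diff describes it as a more thorough algorithm that eliminates one molecule of the most similar pair. The linked `diversitysel` page is not reproduced here, so two details are assumptions. The first is which member of the pair is removed: here it is the one with the larger summed TC to the other remaining compounds, with ties going to the second member. The second is the stopping rule: here it is a target subset size, echoing the "Top diverse 350K" wording of the removed table. Fingerprints are again sets of on-bit positions standing in for FP2, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Mapping, Sequence

Fingerprint = frozenset[int]  # set of "on" bit positions; a stand-in for FP2


def tanimoto(fp_a: Fingerprint, fp_b: Fingerprint) -> float:
    """Tanimoto coefficient c / (a + b - c) on sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0  # assumed convention for two empty fingerprints
    common = len(fp_a & fp_b)
    return common / (len(fp_a) + len(fp_b) - common)


def stepwise_elimination(
    ids: Sequence[str],
    fps: Mapping[str, Fingerprint],
    target_size: int,
) -> list[str]:
    """Repeatedly drop one member of the most similar remaining pair until
    target_size compounds are left. Assumes the ids are unique."""
    if target_size < 1:
        raise ValueError("target_size must be at least 1")
    remaining = list(ids)
    # All pairwise similarities, computed once: O(n^2) time and memory.
    sim = {
        frozenset(pair): tanimoto(fps[pair[0]], fps[pair[1]])
        for pair in combinations(remaining, 2)
    }

    def total_sim(x: str) -> float:
        # Summed TC of x to every other compound still in the set.
        return sum(sim[frozenset((x, y))] for y in remaining if y != x)

    while len(remaining) > target_size:
        a, b = max(combinations(remaining, 2), key=lambda p: sim[frozenset(p)])
        # Assumed rule: drop the pair member that is more similar overall.
        drop = a if total_sim(a) > total_sim(b) else b
        remaining.remove(drop)
    return remaining


fps = {
    "A": frozenset({1, 2, 3, 4, 5}),
    "B": frozenset({1, 2, 3, 4, 5, 6}),
    "C": frozenset({1, 2, 3, 7, 8}),
    "D": frozenset({20, 21, 22}),
}
print(stepwise_elimination(["A", "B", "C", "D"], fps, 3))  # ['B', 'C', 'D']
```

Every pair is scored up front, so time and memory grow with the square of the input size. This fits the page's use of sphere exclusion to cap the input to this step at 3M compounds.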
subsets.1482872309.txt.gz · Last modified:  by rkiss
                
                