subsets
Differences
This shows you the differences between two versions of the page.
subsets [2016/12/27 20:38] – rkiss | subsets [2016/12/27 21:19] – rkiss | ||
---|---|---|---|
Line 7: | Line 7: | ||
The subsets can be | The subsets can be | ||
* freely downloaded in SMILES and SDF file formats from our [[https:// | * freely downloaded in SMILES and SDF file formats from our [[https:// | ||
- | * or can be selected as the input collection for [[screen|online screening]] | + | * or can be selected as the input collection for [[screen|online screening]] |
===== Methods ===== | ===== Methods ===== | ||
Line 13: | Line 13: | ||
==== Property based filtering ==== | ==== Property based filtering ==== | ||
- | For the drug-like and fragment-like subsets the rule of 5 and rule of 3 rules were applied, allowing | + | For the drug-like and fragment subsets the [[http:// |
* number of components <= 1 | * number of components <= 1 | ||
* MW >= 100 | * MW >= 100 | ||
Line 20: | Line 21: | ||
* number of halogens <= 7 | * number of halogens <= 7 | ||
* number of inorganic atoms = 0 | * number of inorganic atoms = 0 | ||
- | |||
- | We used these rules to leave out more " | ||
==== Diversity selection ==== | ==== Diversity selection ==== | ||
- | The Mcule database contains ~5.7M stock compounds and ~30.3M virtual compounds. | + | Diversity selection was set up to prefer |
+ | |||
+ | The [[https:// | ||
- | Structural similarity was measured by Tanimoto coefficient (TC) between FP2 linear fingerprints generated by OpenBabel. The combinations of the following algorithms were applied to extract the most dissimilar | + | Structural similarity was measured by Tanimoto coefficient (TC) between FP2 linear fingerprints generated by [[http:// |
- | * we used sphere exclusion to eliminate highly similar compounds to reduce the input size where needed | + | * sphere exclusion: to quickly |
- | * then [[diversitysel|stepwise elimination]] | + | * [[diversitysel|stepwise elimination]]: a more thorough algorithm that eliminates one molecule from each of the most similar molecule pairs |
- | In sphere exclusion we used the stock compounds first as " | + | In sphere exclusion we used the in-stock compounds first as " |
===== Subsets ===== | ===== Subsets ===== |
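The sphere exclusion step described under "Diversity selection" can be illustrated with a minimal sketch. It is not the Mcule implementation: fingerprints are represented as plain sets of on-bit indices standing in for OpenBabel FP2 fingerprints, the function names `tanimoto` and `sphere_exclusion` are hypothetical, and the threshold of 0.7 is an arbitrary value chosen for the toy data. The only detail taken from the text is that in-stock compounds are considered first, which is modelled here by the order of the candidate list.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch: two empty fingerprints score 0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def sphere_exclusion(
    fingerprints: Dict[str, FrozenSet[int]],
    ordered_ids: List[str],
    threshold: float,
) -> List[str]:
    """Keep a compound only if its TC to every already kept compound is below threshold.

    Compounds are visited in the order given, so listing in-stock compounds
    first gives them priority over virtual ones.
    """
    selected: List[str] = []
    for mol_id in ordered_ids:
        fp = fingerprints[mol_id]
        if all(tanimoto(fp, fingerprints[kept]) < threshold for kept in selected):
            selected.append(mol_id)
    return selected


# Toy data (hypothetical identifiers and bit sets, not real FP2 output).
fingerprints = {
    "stock_a": frozenset({1, 2, 3, 4}),
    "stock_b": frozenset({1, 2, 3, 5}),       # TC to stock_a = 3/5
    "virtual_c": frozenset({1, 2, 3, 4, 6}),  # TC to stock_a = 4/5
    "virtual_d": frozenset({7, 8, 9}),        # shares no bits with the others
}
order = ["stock_a", "stock_b", "virtual_c", "virtual_d"]  # in-stock compounds first
picked = sphere_exclusion(fingerprints, order, threshold=0.7)
print(picked)
```

With this toy input, `virtual_c` is dropped because it lies within the exclusion sphere of `stock_a`, while the other three compounds are kept. A single pass like this is fast but order dependent, which is presumably why a more thorough stepwise elimination follows it.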
subsets.txt · Last modified: 2016/12/27 21:23 by rkiss