diversity_selection
Differences
This shows you the differences between two versions of the page.
diversity_selection [2012/06/28 10:25] – [Limitations] gazs | diversity_selection [2012/07/02 20:50] – [Algorithm] rkiss | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Diversity selection ====== | ====== Diversity selection ====== | ||
- | This filter selects the most diverse compounds from large compound collections | + | This filter selects the most diverse compounds from large compound collections |
===== When to use ===== | ===== When to use ===== | ||
- | If you have limited experimental or computational resources, diversity selection is an unbiased way to limit the number of compounds to handle. | + | If you have limited experimental or computational resources, diversity selection is an unbiased way to limit the number of compounds to handle. |
- | + | ||
- | Using this filter you can select a diverse, representative set of your virtual hits, or effectively reduce your screening/ | + | |
+ | Using this filter you can either reduce the size of large (virtual) screening libraries, or select a diverse, representative set of your virtual hits. | ||
===== How to use ===== | ===== How to use ===== | ||
- | It is recommended | + | It is recommended |
==== Basic options ==== | ==== Basic options ==== | ||
Line 20: | Line 18: | ||
* **Number of most diverse molecules**: | * **Number of most diverse molecules**: | ||
- | If you don’t | + | If you do not limit the selection, the full collection will be returned |
==== Advanced options ==== | ==== Advanced options ==== | ||
- | You can adjust the meaning | + | Under Advanced options, you can adjust the definition |
- | * **Molecular descriptor**: | + | You will be able to set different similarity metrics as the measure of similarity. Currently, only the Tanimoto coefficient (Jaccard index)((http:// |
- | ==== Default options ==== | + | We plan to introduce more descriptors and more similarity measure types in the future. |
- | The default | + | * **Molecular |
- | If you have no suggestions to use another setup, you can rely on our choices. After implementation and evaluation of new fingerprints and metrics, the default setup can be changed. This can be tracked at the end of this document, in the Changelog section. | + | ==== Default options ==== |
+ | The default descriptor used is the linear fingerprint implemented in OpenBabel ((Open Babel v2.3.90 http:// | ||
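As an illustration of the similarity measure named under Advanced options, the Tanimoto coefficient (Jaccard index) of two fingerprints can be sketched as below. This is a minimal sketch rather than the mcule implementation: it assumes each fingerprint is represented as the set of its on-bit positions, and the function name `tanimoto` and the handling of two empty fingerprints are illustrative choices.

```python
from typing import AbstractSet


def tanimoto(fp_a: AbstractSet[int], fp_b: AbstractSet[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints.

    Each fingerprint is given as the set of its on-bit positions
    (an assumed representation, chosen for readability).
    """
    if not fp_a and not fp_b:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    shared = len(fp_a & fp_b)
    # |A ∩ B| / |A ∪ B|, with the union size computed from the set sizes.
    return shared / (len(fp_a) + len(fp_b) - shared)


# Two fingerprints sharing 2 of 4 distinct on-bits.
example_similarity = tanimoto({1, 2, 3}, {2, 3, 4})
```

A value of 1 means the two fingerprints are identical and 0 means they share no bits, so a lower value corresponds to a more dissimilar pair.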
==== Algorithm ==== | ==== Algorithm ==== | ||
- | We use an optimized implementation of the stepwise elimination algorithm((R. J. Taylor, J. Chem. Inf. Comput. Sci., 1995, 35, 59 67.)), which can be described as follows: | + | We use an optimized implementation of the stepwise elimination algorithm((R. J. Taylor, J. Chem. Inf. Comput. Sci., 1995, 35, 59-67.)), which can be described as follows: |
- calculate the similarity matrix of the molecules in the input collection | - calculate the similarity matrix of the molecules in the input collection | ||
Line 47: | Line 46: | ||
During this process, the size of the collection is reduced and diversity increases. Each elimination step throws out a compound that has close analogues in the remaining set. As a result, we get a single compound, and a list of compounds with decreasing similarity values, which can be interpreted as the increasing diversity of the remaining set. | During this process, the size of the collection is reduced and diversity increases. Each elimination step throws out a compound that has close analogues in the remaining set. As a result, we get a single compound, and a list of compounds with decreasing similarity values, which can be interpreted as the increasing diversity of the remaining set. | ||
- | After the algorithm finishes, structures are sorted by similarity values and are placed in the result collection. The first molecules in the resulting collection are the most dissimilar (most diverse) ones, and the lenght | + | After the algorithm finishes, structures are sorted by similarity values and are placed in the result collection. The first molecules in the resulting collection are the most dissimilar (most diverse) ones, and the length |
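The elimination process described above can be sketched roughly as follows. This is a naive illustration, not the optimized implementation referred to in this section, and the intermediate steps are an assumed reading of stepwise elimination: at each step the most similar remaining pair is located and one of its members is removed. Which member is removed (here, the later one in input order), the tie-breaking, and the names `jaccard` and `stepwise_elimination` are all assumptions made for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Tanimoto/Jaccard similarity of two sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def stepwise_elimination(items, similarity):
    """Rank items from most to least diverse.

    Returns (item, similarity_at_elimination) pairs; the single
    surviving item comes first and carries None as its value.
    """
    n = len(items)
    if n == 0:
        return []

    # Step 1: full pairwise similarity matrix of the input collection.
    sim = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = similarity(items[i], items[j])

    remaining = list(range(n))
    eliminated = []
    while len(remaining) > 1:
        # Assumed step: find the most similar pair still in the set.
        i, j = max(combinations(remaining, 2), key=lambda p: sim[p[0]][p[1]])
        # Assumed rule: drop the later member of that pair.
        eliminated.append((items[j], sim[i][j]))
        remaining.remove(j)

    # Eliminated items carry non-increasing similarity values, so the
    # reversed list runs from most to least diverse.
    return [(items[remaining[0]], None)] + eliminated[::-1]


# Hypothetical toy fingerprints: D duplicates A, B is a close analogue.
fingerprints = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {10, 11}, "D": {1, 2, 3, 4}}
names = list(fingerprints)
ranked = stepwise_elimination(
    names, lambda x, y: jaccard(fingerprints[x], fingerprints[y])
)
top_two = [name for name, _ in ranked[:2]]
```

Taking the first N entries of the ranked list corresponds to requesting the N most diverse molecules. The sketch rescans all remaining pairs at every step, so it is only practical for small collections.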
===== Limitations ===== | ===== Limitations ===== | ||
- | The diversity selection is freely accessible for every mcule user with a monthly limit of 10000 input compounds. The average run time for 10000 compounds is X minutes. The usage of your diversity filter can be tracked on the user profile / limits. Our technologies allow effective processing of very large collections (~10M). If you want to exceed your limits, please contact us. | + | The diversity selection is freely accessible for every mcule user with a monthly limit of 10000 input compounds. The average run time for 10000 compounds is about 5 minutes. The usage of your diversity filter can be tracked on the user profile / limits. Our technologies allow effective processing of very large collections (~10M). If you want to exceed your limits, please contact us. |
===== Changelog ===== | ===== Changelog ===== |