diversity_selection
Differences
This shows you the differences between two versions of the page.
diversity_selection [2012/07/02 20:42] – [Advanced options] rkiss | diversity_selection [2012/07/02 20:58] – [Algorithm] rkiss | ||
---|---|---|---|
Line 28: | Line 28: | ||
We plan to introduce more descriptors and more similarity measure types in the future. | We plan to introduce more descriptors and more similarity measure types in the future. | ||
- | * **Molecular descriptor**: | + | * **Molecular descriptor**: |
==== Default options ==== | ==== Default options ==== | ||
- | The default descriptor used is the linear fingerprint implemented in Open Babel ((Open Babel v2.3.90 http:// | + | The default descriptor used is the linear fingerprint implemented in OpenBabel |
- | + | ||
- | If you have no suggestions to use another setup, you can rely on our choices. After implementation and evaluation of new fingerprints and metrics, the default setup can be changed. This can be tracked at the end of this document, in the Changelog section. | + | |
==== Algorithm ==== | ==== Algorithm ==== | ||
- | We use an optimized implementation of the stepwise elimination algorithm((R. J. Taylor, J. Chem. Inf. Comput. Sci., 1995, 35, 59 67.)), which can be described as follows: | + | We use an optimized implementation of the stepwise elimination algorithm((R. J. Taylor, J. Chem. Inf. Comput. Sci., 1995, 35, 59-67.)), which can be described as follows: |
- | + | ||
- | - calculate the similarity matrix of the molecules in the input collection | + | |
- | - process the matrix elements as follows: | + | |
- | - select the largest off-diagonal element in the similarity matrix | + | |
- | - eliminate one molecule of the most similar molecule pair randomly | + | |
- | - go to step I. if off-diagonal elements remained | + | |
- | - sort the list of eliminated molecules by similarity values associated to the elimination steps in increasing order | + | |
- | During this process, | + | - Calculate |
+ | - Process the matrix elements as follows: | ||
+ | - Select the largest off-diagonal element | ||
+ | - Eliminate one molecule of the most similar molecule pair randomly | ||
+ | - Go to step I. if any off-diagonal elements remain | ||
+ | - Sort the list of eliminated molecules by similarity values | ||
- | After the algorithm finishes, structures are sorted by similarity values and are placed in the result collection. The first molecules in the resulted | + | During this process, the size of the collection |
+ | After the algorithm finishes, the structures are sorted by similarity value and placed in the result collection. The first molecules in the resulting collection are the most dissimilar (most diverse) ones. The length of the result list is determined by two input parameters: the maximum number of compounds and the similarity threshold. | ||
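The stepwise elimination described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the optimized implementation the page refers to. The function name `stepwise_elimination` and its parameters `max_count`, `threshold` and `seed` are hypothetical, and the input is assumed to be a precomputed symmetric similarity matrix. Two further assumptions go beyond the text: the single molecule that is never eliminated is placed first in the result, and the similarity threshold is applied as an upper bound on the similarity value recorded at elimination.

```python
import random


def stepwise_elimination(sim, max_count=None, threshold=None, seed=None):
    """Rank molecule indices from most to least diverse.

    sim is a symmetric similarity matrix (list of lists); all other
    names are hypothetical and exist only for this sketch.
    """
    rng = random.Random(seed)
    alive = set(range(len(sim)))
    eliminated = []  # (similarity at elimination, molecule index)

    while len(alive) > 1:
        # Select the largest off-diagonal element among remaining molecules.
        best, i, j = max(
            (sim[a][b], a, b) for a in alive for b in alive if a < b
        )
        # Eliminate one molecule of the most similar pair at random.
        victim = rng.choice((i, j))
        alive.remove(victim)
        eliminated.append((best, victim))

    # Sort eliminated molecules by their elimination similarity, increasing.
    eliminated.sort(key=lambda pair: pair[0])

    # Assumption: the survivor (never eliminated) is ranked first.
    ranked = [(None, idx) for idx in alive] + eliminated

    # Assumption: the threshold is an upper bound on elimination similarity.
    if threshold is not None:
        ranked = [(s, idx) for s, idx in ranked if s is None or s <= threshold]
    if max_count is not None:
        ranked = ranked[:max_count]
    return [idx for _, idx in ranked]


if __name__ == "__main__":
    # Toy similarity matrix: molecules 0/1 and 2/3 form two similar pairs.
    toy = [
        [1.0, 0.9, 0.2, 0.1],
        [0.9, 1.0, 0.3, 0.2],
        [0.2, 0.3, 1.0, 0.8],
        [0.1, 0.2, 0.8, 1.0],
    ]
    print(stepwise_elimination(toy, max_count=2, seed=0))
```

With the toy matrix, the two highest-ranked molecules always come from different similar pairs, whichever member of each pair the random step removes. The full pair scan in every iteration is quadratic and is kept only for readability; an optimized implementation would avoid rescanning the whole matrix.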
===== Limitations ===== | ===== Limitations ===== | ||