diversity_selection
Differences
This shows you the differences between two versions of the page.
diversity_selection [2012/07/02 20:47] – [Default options] rkiss | diversity_selection [2012/07/03 17:42] – sanmark | ||
---|---|---|---|
Line 16: | Line 16: | ||
* **Similarity threshold**: | * **Similarity threshold**: | ||
- | * **Number | + | * **Max number |
If you do not limit the selection, the full collection will be returned ordered by diversity. This means that the top N molecules in the resulting collection will be the most diverse N molecules. The //maximum similarity// | If you do not limit the selection, the full collection will be returned ordered by diversity. This means that the top N molecules in the resulting collection will be the most diverse N molecules. The //maximum similarity// | ||
Line 33: | Line 33: | ||
The default descriptor used is the linear fingerprint implemented in OpenBabel ((Open Babel v2.3.90 http:// | The default descriptor used is the linear fingerprint implemented in OpenBabel ((Open Babel v2.3.90 http:// | ||
- | |||
- | If you have no preference, you can use the default settings. | ||
==== Algorithm ==== | ==== Algorithm ==== | ||
- | We use an optimized implementation of the stepwise elimination algorithm((R. J. Taylor, J. Chem. Inf. Comput. Sci., 1995, 35, 59 67.)), which can be described as follows: | + | We use an optimized implementation of the stepwise elimination algorithm((R. J. Taylor, J. Chem. Inf. Comput. Sci., 1995, 35, 59-67.)), which can be described as follows: |
- | - calculate | + | - Calculate |
- | - process | + | - Process |
- | - select | + | - Select |
- | - eliminate | + | - Eliminate |
- | - go to step I. if off-diagonal elements remained | + | - Go to step I. if any off-diagonal elements remain |
- | - sort the list of eliminated molecules by similarity values associated to the elimination steps in increasing order | + | - Sort the list of eliminated molecules by similarity values associated to the elimination steps in increasing order |
- | During this process, the size of the collection is reduced | + | During this process, the size of the collection is reduced |
- | + | ||
- | After the algorithm finishes, structures are sorted by similarity | + | |
+ | After the algorithm finishes, structures are sorted by similarity values and placed in the result collection. The first molecules in the resulting collection are the most dissimilar (most diverse) ones. The length of the result list is determined by the input parameters: maximum number of compounds and similarity threshold. | ||
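The elimination loop and the final ordering can be illustrated with a minimal Python sketch. This is not the optimized implementation described above, and the step details that are cut off in this view are filled in with assumptions: the input is a precomputed symmetric similarity matrix, the member of the most similar pair that is more similar to the rest of the remaining set is the one eliminated, the last surviving molecule is assigned a similarity of 0.0, and the threshold is applied as "less than or equal". The names `stepwise_elimination`, `select_diverse`, `max_count` and `threshold` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def _load(sim: Matrix, k: int, active: set) -> float:
    """Summed similarity of molecule k to all other remaining molecules."""
    return sum(sim[k][m] for m in active if m != k)


def stepwise_elimination(sim: Matrix) -> List[Tuple[int, float]]:
    """Return (molecule index, elimination similarity), most diverse first."""
    active = set(range(len(sim)))
    eliminated: List[Tuple[int, float]] = []
    while len(active) > 1:
        # Largest off-diagonal element among the remaining molecules.
        s, i, j = max((sim[i][j], i, j) for i in active for j in active if i < j)
        # Assumption: drop the pair member that is more similar to the rest.
        drop = i if _load(sim, i, active) >= _load(sim, j, active) else j
        active.remove(drop)
        eliminated.append((drop, s))
    if active:
        # Assumption: the last survivor gets similarity 0.0 so it sorts first.
        eliminated.append((active.pop(), 0.0))
    # Increasing similarity: molecules eliminated late come first.
    eliminated.sort(key=lambda item: item[1])
    return eliminated


def select_diverse(
    sim: Matrix,
    max_count: Optional[int] = None,
    threshold: Optional[float] = None,
) -> List[int]:
    """Apply the two hypothetical limits to the diversity-ordered list."""
    ranked = stepwise_elimination(sim)
    if threshold is not None:
        ranked = [(k, s) for k, s in ranked if s <= threshold]
    if max_count is not None:
        ranked = ranked[:max_count]
    return [k for k, _ in ranked]


if __name__ == "__main__":
    example = [
        [1.0, 0.9, 0.2, 0.1],
        [0.9, 1.0, 0.3, 0.4],
        [0.2, 0.3, 1.0, 0.5],
        [0.1, 0.4, 0.5, 1.0],
    ]
    print(select_diverse(example))
    print(select_diverse(example, max_count=2))
    print(select_diverse(example, threshold=0.5))
```

In this sketch the largest remaining pairwise similarity can only decrease as molecules are removed, so every molecule kept under a threshold has pairwise similarity at or below that threshold with the others that are kept. The sketch rescans the full matrix on every step, which is far slower than an optimized implementation would be on large collections.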
===== Limitations ===== | ===== Limitations ===== | ||
- | The diversity | + | Diversity |
+ | |||
+ | The average run time for 10,000 input molecules is about a minute. | ||
===== Changelog ===== | ===== Changelog ===== |