diversity_selection
Differences
This shows you the differences between two versions of the page.
diversity_selection [2012/07/02 20:42] – [Advanced options] rkiss | diversity_selection [2012/07/03 17:42] – sanmark | ||
---|---|---|---|
Line 16: | Line 16: | ||
* **Similarity threshold**: | * **Similarity threshold**: | ||
- | * **Number | + | * **Max number |
If you do not limit the selection, the full collection will be returned ordered by diversity. This means that the top N molecules in the resulting collection will be the most diverse N molecules. The //maximum similarity// | If you do not limit the selection, the full collection will be returned ordered by diversity. This means that the top N molecules in the resulting collection will be the most diverse N molecules. The //maximum similarity// | ||
Line 28: | Line 28: | ||
We plan to introduce more descriptors and more similarity measure types in the future. | We plan to introduce more descriptors and more similarity measure types in the future. | ||
- | * **Molecular descriptor**: | + | * **Molecular descriptor**: |
==== Default options ==== | ==== Default options ==== | ||
- | The default descriptor used is the linear fingerprint implemented in Open Babel ((Open Babel v2.3.90 http:// | + | The default descriptor used is the linear fingerprint implemented in OpenBabel |
+ | ==== Algorithm ==== | ||
- | If you have no suggestions to use another setup, you can rely on our choices. After implementation | + | We use an optimized |
- | ==== Algorithm ==== | + | - Calculate the similarity matrix of the molecules in the input collection |
+ | - Process the matrix elements as follows: | ||
+ | - Select the largest off-diagonal element in the similarity matrix | ||
+ | - Eliminate one molecule of the most similar molecule pair randomly | ||
+ | - Go to step I. if any off-diagonal elements remain | ||
+ | - Sort the list of eliminated molecules by the similarity values associated with the elimination steps, in increasing order | ||
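The elimination steps listed above can be sketched in a few lines of Python. This is only an illustrative, unoptimized sketch and not the implementation used by the filter: the function name `diversity_order`, the plain list-of-lists matrix, the `rng` parameter, and the decision to place the single surviving molecule first in the result are all assumptions made for this example.

```python
import random


def diversity_order(sim, rng=None):
    """Return molecule indices ordered from most to least diverse.

    sim is a symmetric square matrix (list of lists) of pairwise
    similarities. Hypothetical helper; naive O(n^3) search for clarity.
    """
    rng = rng or random.Random()
    alive = set(range(len(sim)))
    eliminated = []  # (similarity at elimination step, molecule index)

    while len(alive) > 1:
        # Select the largest off-diagonal element among remaining molecules.
        s, i, j = max((sim[i][j], i, j) for i in alive for j in alive if i < j)
        # Eliminate one molecule of the most similar pair at random.
        victim = rng.choice((i, j))
        alive.remove(victim)
        eliminated.append((s, victim))

    # Sort eliminated molecules by their elimination similarity, increasing.
    eliminated.sort(key=lambda item: item[0])
    # Assumption: the one molecule never eliminated is listed first.
    return sorted(alive) + [mol for _, mol in eliminated]


# Toy input: molecules 0/1 are near-duplicates, as are molecules 2/3.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
order = diversity_order(sim, random.Random(0))
top_two = order[:2]  # the two most diverse molecules under this sketch
```

With the result ordered this way, taking the first N entries (`order[:N]`) gives the N most diverse molecules, which matches the behaviour described for an unlimited selection.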
- | We use an optimized implementation | + | During this process, the size of the collection is reduced while the diversity of the collection is increased. Each elimination |
- | - calculate | + | After the algorithm finishes, structures are sorted by similarity |
- | - process the matrix elements as follows: | + | ===== Limitations ===== |
- | - select the largest off-diagonal element in the similarity matrix | + | |
- | - eliminate one molecule of the most similar molecule pair randomly | + | |
- | - go to step I. if any off-diagonal elements remain | + | |
- | - sort the list of eliminated molecules by similarity | + | |
- | During this process, the size of the collection | + | Diversity selection filter |
- | After the algorithm finishes, structures are sorted by similarity values and are placed in the result collection. The first molecules | + | The average run time for 10,000 input molecules |
- | + | ||
- | ===== Limitations ===== | + | |
- | The diversity selection is freely accessible for every mcule user with a monthly limit of 10000 input compounds. The average run time for 10000 compounds is about 5 minutes. The usage of your diversity filter can be tracked on the user profile / limits. Our technologies allow effective processing of very large collections (~10M). If you want to exceed your limits, please contact us. | ||
===== Changelog ===== | ===== Changelog ===== |