This filter selects the most diverse (dissimilar) molecules from collections by eliminating the closest analogs. Diversity selection reduces the size of the input collection and maximizes the coverage of the chemical space at the same time.
If you have limited experimental or computational resources, diversity selection is an unbiased way to limit the number of compounds to handle. Collecting compounds from different regions of the chemical space is an efficient strategy to maximize the diversity of the identified active scaffolds.
With this filter you can either reduce the size of large (virtual) screening libraries or select a diverse, representative set of your virtual hits.
Exotic (non-druglike) molecules typically show large structural diversity. Consequently, they might be over-represented in the output of the Diversity selection filter. It is therefore recommended to eliminate such molecules before Diversity selection, e.g. with the REOS filter.
No pair of molecules in the output collection can be more similar to each other than this value (Tanimoto coefficient).
The diversity selection algorithm will output this number of molecules or fewer (if the “Similarity threshold” is reached first).
Mathematical representation of molecules used in the similarity/dissimilarity calculation. The OpenBabel Linear Fingerprint (referred to as FP2) and the Indigo Similarity Fingerprint (referred to as sim) can be selected.
The Tanimoto coefficient, the most widely used similarity metric, is implemented and used as the default similarity measure.
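To make the similarity measure concrete, the following minimal sketch computes the Tanimoto coefficient for two binary fingerprints, assuming each fingerprint is represented as a set of "on" bit positions. The function name `tanimoto` and the example bit sets are hypothetical illustrations and are not taken from the filter's implementation.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / union


# Hypothetical fingerprints: 3 shared bits out of 6 distinct bits.
fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 8, 9, 12}
print(tanimoto(fp1, fp2))  # 0.5
```

A value of 1 means the two fingerprints are identical, and a value of 0 means they share no bits.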
Diversity selection available in the Free package is limited to 10,000 input molecules per month. To get access to unlimited Diversity selection, subscribe to our Library Design package.
Diversity selection utilizes an optimized implementation of the stepwise elimination algorithm (R. J. Taylor, J. Chem. Inf. Comput. Sci., 1995, 35, 59-67.), which can be described as follows:
During this process, the size of the collection is reduced while the diversity of the collection is increased. Each elimination step filters out one molecule that has close analogs in the remaining set. As a result, the similarity among the remaining molecules decreases (their diversity increases).
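The elimination process described above can be illustrated with a naive sketch. It is not the optimized implementation used by the filter, and it is only one plausible reading of stepwise elimination: it repeatedly finds the most similar pair and drops one member until no pair exceeds the similarity threshold and the collection is no larger than the requested size. The function names, the tie-breaking rule (drop the pair member that is more similar to the rest of the set), and the example fingerprints are assumptions made for illustration.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def stepwise_elimination(fps, similarity_threshold, max_count):
    """Naive stepwise elimination; fps maps a molecule name to its on-bit set."""
    kept = dict(fps)
    while len(kept) > 1:
        # Find the most similar pair among the remaining molecules.
        (i, j), top = max(
            (((i, j), tanimoto(kept[i], kept[j])) for i, j in combinations(kept, 2)),
            key=lambda item: item[1],
        )
        # Stop once both the similarity and the size constraints are satisfied.
        if top <= similarity_threshold and len(kept) <= max_count:
            break

        # Assumed rule: drop the pair member that is closer to everything else.
        def crowding(name):
            return sum(tanimoto(kept[name], kept[o]) for o in kept if o != name)

        del kept[i if crowding(i) >= crowding(j) else j]
    return list(kept)


# Hypothetical collection: A and B are close analogs, C and D are unrelated.
fps = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {10, 11, 12}, "D": {20, 21}}
selected = stepwise_elimination(fps, similarity_threshold=0.5, max_count=10)
print(selected)  # one of the analogs A/B is removed; C and D remain
```

This sketch recomputes all pairwise similarities at every step, so it is only practical for small collections. A production implementation would cache or index the similarities.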