diversitysel

Differences

This shows you the differences between two versions of the page.

diversitysel [2012/10/13 17:49] – [Results] sanmark
diversitysel [2016/12/27 21:16] (current) – [Algorithm] rkiss
Line 1: Line 1:
 ====== Diversity selection ====== ====== Diversity selection ======
  
 +{{:divsel_new2.png?800|}}
  
 This filter selects the most diverse (dissimilar) molecules from collections by eliminating the closest analogs. Diversity selection reduces the size of the input collection and maximizes the coverage of the chemical space at the same time. This filter selects the most diverse (dissimilar) molecules from collections by eliminating the closest analogs. Diversity selection reduces the size of the input collection and maximizes the coverage of the chemical space at the same time.
Line 14: Line 15:
  
  
-Exotic (non-druglike) molecules typically show large structural diversity. Consequently, they might be over-represented in the output of the Diversity selection filter. It is therefore recommended to eliminate such molecules prior to Diversity selection e.g. by REOS (link), Property (link) or SMARTS query (link) filters.+Exotic (non-druglike) molecules typically show large structural diversity. Consequently, they might be over-represented in the output of the Diversity selection filter. It is therefore recommended to eliminate such molecules prior to Diversity selection e.g. by [[REOS|the REOS filter]].
  
 ==== Basic options ==== ==== Basic options ====
  
  
-Similarity threshold +== Similarity threshold == 
-No molecule pairs in the output collection can be more similar to each other than this number (Tanimoto coefficient).+ 
 +No molecule pairs in the output collection can be more similar to each other than this number ([[http://en.wikipedia.org/wiki/Jaccard_index|Tanimoto coefficient]]). 
 + 
 +== Max number of most diverse molecules ==
  
-Max number of most diverse molecules 
 The diversity selection algorithm will output this number of molecules or less (if the “Similarity threshold” is reached first). The diversity selection algorithm will output this number of molecules or less (if the “Similarity threshold” is reached first).
  
Line 28: Line 31:
  
  
-Descriptor +== Descriptor ==
-Mathematical representation of molecules used in the similarity/dissimilarity calculation. The OpenBabel Linear Fingerprint (referred to FP2 here (http://openbabel.org/wiki/Tutorial:Fingerprints)), and the Indigo Similarity Fingerprint (referred to sim here (http://ggasoftware.com/opensource/indigo/api#fingerprints)) can be selected.+
  
-Tanimoto coefficient (http://en.wikipedia.org/wiki/Jaccard_index), the most widely used similarity metric, has been implemented and used as similarity measure by default.+Mathematical representation of molecules used in the similarity/dissimilarity calculation. The OpenBabel Linear Fingerprint (referred to as [[http://openbabel.org/wiki/Tutorial:Fingerprints|FP2]]) and the Indigo Similarity Fingerprint (referred to as [[http://ggasoftware.com/opensource/indigo/api#fingerprints|sim]]) can be selected. 
 + 
 +The [[http://en.wikipedia.org/wiki/Jaccard_index|Tanimoto coefficient]], the most widely used similarity metric, is implemented and used as the similarity measure by default.
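
To make the similarity measure concrete, here is a minimal sketch of the Tanimoto coefficient for two binary fingerprints. It assumes each fingerprint is given as a set of "on" bit positions; the function name `tanimoto` and the value returned for two empty fingerprints are illustrative choices, not taken from the filter's implementation.

```python
from __future__ import annotations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto (Jaccard) coefficient of two fingerprints.

    Each fingerprint is a set holding the indices of its "on" bits.
    """
    if not fp_a and not fp_b:
        # Assumption: two empty fingerprints are treated as identical.
        return 1.0
    shared = len(fp_a & fp_b)                 # bits set in both
    union = len(fp_a) + len(fp_b) - shared    # bits set in either
    return shared / union
```

A value of 1.0 means the two fingerprints are identical and 0.0 means they share no bits, so a lower "Similarity threshold" yields a more diverse output collection.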
  
 ===== Results ===== ===== Results =====
  
   * diverse subset of the input collection satisfying the filter criteria   * diverse subset of the input collection satisfying the filter criteria
-  * maximum similarity (found between any molecule pairs if the rank list is cut exactly after this molecule) will be displayed as a single column in Table (link) and List (link) views+  * maximum similarity (found between any molecule pairs if the rank list is cut exactly after this molecule) will be displayed as a single column in Table and List views
   * output molecules will be ordered by maximum similarity (most diverse molecules are ranked highest)   * output molecules will be ordered by maximum similarity (most diverse molecules are ranked highest)
  
 ===== Limits ===== ===== Limits =====
  
-Diversity selection available in the Free package (link) is limited to 10,000 input molecules per month. To get access to unlimited Diversity selection, subscribe to our Library Design (link) package.+Diversity selection available in the [[freepackage|Free package]] is limited to 10,000 input molecules per month. To get access to unlimited Diversity selection, [[subscriptionpackages|subscribe]] to our Library Design package.
  
 ===== Algorithm ===== ===== Algorithm =====
  
-Diversity selection utilizes an optimized implementation of the stepwise elimination algorithm(ref4), which can be described as follows:+Diversity selection utilizes an optimized implementation of the stepwise elimination algorithm (R. J. Taylor, J. Chem. Inf. Comput. Sci., 1995, 35, 59-67.), which can be described as follows:
  
-Calculate the similarity matrix of the molecules in the input collection +  * Calculate the similarity matrix of the molecules in the input collection 
-Process the matrix elements as follows: +  Process the matrix elements as follows: 
-Select the largest off-diagonal element in the similarity matrix +  Select the largest off-diagonal element in the similarity matrix 
-Eliminate one molecule of the most similar molecule pair randomly +  - Eliminate one molecule of the most similar molecule pair randomly 
-Go to step I. if off-diagonal elements remained +  Go to step 1. if off-diagonal elements remain 
-Sort the list of eliminated molecules by similarity values associated to the elimination steps in increasing order +  Sort the list of eliminated molecules by similarity values associated to the elimination steps in increasing order
-During this process, the size of the collection is reduced while the diversity of the collection is increased. Each elimination step filters out one molecule that has close analogues in the remaining set. As a result, the remaining molecules will have a decreased similarity (increased diversity).+
  
-The average run time for 10,000 input molecules is about minute.+During this process, the size of the collection is reduced while the diversity of the collection is increased. Each elimination step filters out one molecule that has close analogues in the remaining set. As a result, the remaining molecules will have decreased similarity (increased diversity).
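
The stepwise elimination described above can be sketched as follows. This is a naive, unoptimized illustration, not the filter's actual implementation: it assumes the similarity matrix is already available as a symmetric list of lists, and the function names, the `seed` parameter, and the 0.0 placeholder assigned to the last surviving molecule are all hypothetical.

```python
import random


def rank_by_stepwise_elimination(sim, seed=0):
    """Rank molecules from most to least diverse.

    sim is a symmetric matrix (list of lists) of pairwise similarities.
    Returns (molecule_index, max_similarity) tuples; max_similarity is the
    largest pairwise similarity left if the list is cut after that molecule.
    """
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    alive = list(range(len(sim)))
    eliminated = []
    while len(alive) > 1:
        # Largest off-diagonal element among the remaining molecules.
        i, j = max(
            ((a, b) for k, a in enumerate(alive) for b in alive[k + 1:]),
            key=lambda pair: sim[pair[0]][pair[1]],
        )
        victim = rng.choice((i, j))  # drop one member of the pair at random
        alive.remove(victim)
        eliminated.append((victim, sim[i][j]))
    # The last survivor has no partner left; 0.0 is a placeholder (assumption).
    ranked = [(alive[0], 0.0)] if alive else []
    # The maximum similarity never increases during elimination, so the
    # reversed elimination order is sorted by increasing similarity.
    ranked.extend(reversed(eliminated))
    return ranked


def select_diverse(sim, threshold, max_count, seed=0):
    """Cut the rank list at the similarity threshold and the size limit."""
    ranked = rank_by_stepwise_elimination(sim, seed)
    kept = [(idx, s) for idx, s in ranked if s <= threshold]
    return kept[:max_count]
```

Here `threshold` and `max_count` play the roles of the "Similarity threshold" and "Max number of most diverse molecules" options: the output is whichever prefix of the rank list is shorter. Searching for the maximum from scratch in every step makes this sketch cubic in the number of molecules, so it is only suitable for small illustrative inputs.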
  
-1) http://en.wikipedia.org/wiki/Jaccard_index 
-2) Open Babel v2.3.90 http://openbabel.sourceforge.net/ 
-4) R. J. Taylor, J. Chem. Inf. Comput. Sci., 1995, 35, 59-67. 
diversitysel.1350150542.txt.gz · Last modified: 2012/10/13 17:49 by sanmark