diversity_selection

Differences

This shows you the differences between two versions of the page.

diversity_selection [2012/07/02 20:41] – [Advanced options] rkiss
diversity_selection [2012/07/02 20:50] – [Algorithm] rkiss
Line 24: Line 24:
 Under Advanced options, you can adjust the definition of similarity/dissimilarity of molecules. You can select the descriptor used for calculating the similarity scores. Currently two fingerprints (OpenBabel Linear Fingerprint and Indigo Similarity Fingerprint) are available.
  
-You will be able to set different similarity metrics as the measure of similarity. Currently, only the Tanimoto coefficient (Jaccard index)((http://en.wikipedia.org/wiki/Jaccard_index)) is implementedas the measure of similarity.
+You will be able to set different similarity metrics as the measure of similarity. Currently, only the Tanimoto coefficient (Jaccard index)((http://en.wikipedia.org/wiki/Jaccard_index)) is implemented as the measure of similarity.
  
 We plan to introduce more descriptors and more similarity measure types in the future.
  
-  * **Molecular descriptor**: the molecular descriptor used to represent chemical structures during the calculation
+  * **Molecular descriptor**: the molecular descriptor applied for representing chemical structures during the calculation
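
As an illustration of the similarity measure named in this hunk, the following is a minimal sketch of the Tanimoto coefficient (Jaccard index) on binary fingerprints. It assumes each fingerprint is given as a set of on-bit positions; the function name `tanimoto` and the handling of two empty fingerprints are choices made for this example, not details taken from the page.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints.

    Each fingerprint is a set of on-bit positions (an assumption of this sketch).
    """
    union = len(fp_a | fp_b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    return len(fp_a & fp_b) / union


if __name__ == "__main__":
    # Two bits shared out of four distinct bits overall.
    print(tanimoto({1, 2, 3}, {2, 3, 4}))
```

The value ranges from 0 (no bits in common) to 1 (identical bit sets), so a dissimilarity can be taken as one minus this value.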
  
 ==== Default options ====
  
-The default descriptor used is the linear fingerprint implemented in Open Babel ((Open Babel v2.3.90 http://openbabel.sourceforge.net/)), which is similar to Daylight’s fingerprint and Chemaxon’s linear fingerprint, and the Tanimoto coefficient is calculated as the similarity of fingerprints.
-
-If you have no suggestions to use another setup, you can rely on our choices. After implementation and evaluation of new fingerprints and metrics, the default setup can be changed. This can be tracked at the end of this document, in the Changelog section.
+The default descriptor used is the linear fingerprint implemented in OpenBabel ((Open Babel v2.3.90 http://openbabel.sourceforge.net/)), which is similar to Daylight’s fingerprint and ChemAxon’s ((http://www.chemaxon.com/jchem/doc/user/fingerprint.html)) linear fingerprint, and the Tanimoto coefficient is calculated as the similarity of fingerprints.
 ==== Algorithm ====
  
-We use an optimized implementation of the stepwise elimination algorithm((R. J. Taylor, J. Chem. Inf. Comput. Sci., 1995, 35, 59 67.)), which can be described as follows:
+We use an optimized implementation of the stepwise elimination algorithm((R. J. Taylor, J. Chem. Inf. Comput. Sci., 1995, 35, 59-67.)), which can be described as follows:
  
-  - calculate the similarity matrix of the molecules in the input collection
-  - process the matrix elements as follows:
-    - select the largest off-diagonal element in the similarity matrix
-    - eliminate one molecule of the most similar molecule pair randomly
-    - go to step I. if off-diagonal elements remained
-  - sort the list of eliminated molecules by similarity values associated to the elimination steps in increasing order
+  - Calculate the similarity matrix of the molecules in the input collection
+  - Process the matrix elements as follows:
+    - Select the largest off-diagonal element in the similarity matrix
+    - Eliminate one molecule of the most similar molecule pair randomly
+    - Go to step I. if off-diagonal elements remained
+  - Sort the list of eliminated molecules by similarity values associated to the elimination steps in increasing order
  
 During this process, the size of the collection is reduced and diversity increases. Each elimination step throws out a compound that has close analogues in the remaining set. In result, we get a single compound, and a list of compounds with decreasing similarity values, which can be interpreted as the increasing diversity of the remaining set.
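
The stepwise elimination described in this hunk can be sketched as follows. This is a naive illustration, not the optimized implementation the page refers to: it assumes fingerprints are sets of on-bit positions, stores the off-diagonal similarities in a dictionary, and rescans them on every step. The names `stepwise_elimination`, `tanimoto`, and the `seed` parameter are hypothetical and exist only for this example.

```python
import random


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit positions."""
    union = len(fp_a | fp_b)
    return 1.0 if union == 0 else len(fp_a & fp_b) / union


def stepwise_elimination(fingerprints, seed=0):
    """Naive stepwise elimination over a dict of name -> fingerprint.

    Returns (survivor, eliminated), where eliminated is a list of
    (name, similarity) pairs sorted by increasing similarity.
    """
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    names = list(fingerprints)

    # Step 1: off-diagonal part of the similarity matrix (upper triangle).
    sim = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim[(a, b)] = tanimoto(fingerprints[a], fingerprints[b])

    # Step 2: repeatedly drop one member of the most similar remaining pair.
    eliminated = []
    while sim:
        pair = max(sim, key=sim.get)
        value = sim[pair]
        victim = rng.choice(pair)
        eliminated.append((victim, value))
        # Remove every matrix element that involves the eliminated molecule.
        sim = {p: s for p, s in sim.items() if victim not in p}

    # Step 3: order eliminated molecules by the similarity at their elimination.
    eliminated.sort(key=lambda item: item[1])

    gone = {name for name, _ in eliminated}
    remaining = [name for name in names if name not in gone]
    survivor = remaining[0] if remaining else None
    return survivor, eliminated


if __name__ == "__main__":
    fps = {"a": {1, 2, 3, 4}, "b": {1, 2, 3, 4}, "c": {1, 2, 9}, "d": {20, 21}}
    survivor, eliminated = stepwise_elimination(fps)
    print(survivor, eliminated)
```

Under these assumptions, a diverse subset of size k could be read off as the survivor plus the first k-1 entries of the sorted list, since those were eliminated at the lowest similarity values.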