WORKFLOW VALIDATION

Before executing a workflow on a large dataset, it is advisable to first test how well it performs. A virtual screening workflow, for example, can be tested with known reference ligands and “decoys” (presumably inactive compounds) by analyzing how well the workflow discriminates the known reference ligands from the decoys. The Workflow validation feature generates a rank-ordered list of the actives and decoys and displays the most important graphs, which help you judge whether the workflow is worth executing on a larger collection and is likely to identify new actives.

To validate a workflow, first execute it on a set of reference ligands. You can upload the set of actives as SMILES strings or as an SDF file.

Next, run your workflow on a set of “decoys”. These are compounds that are either proven NOT to act on your target (inactives) or presumed to be inactive. The chance that a randomly selected compound is inactive on a target is orders of magnitude larger than the chance that it is active, so randomly selected compounds can also be used as decoys. It is also advisable to select decoys with physicochemical properties similar to those of the reference ligands, to make sure your workflow does not discriminate merely on the basis of a few basic physicochemical properties.
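The property-matching idea above can be sketched in a few lines of Python. This is only an illustration, not part of the Workflow validation feature: the function name `select_matched_decoys`, the property keys `mw` and `logp`, and the tolerance values are all assumptions, and the property values are expected to have been computed beforehand with a tool of your choice.

```python
def select_matched_decoys(actives, candidates, tolerances, per_active=2):
    """Pick decoy candidates whose properties lie close to those of an active.

    actives, candidates: lists of dicts with an "id" key plus one numeric
    value per property (hypothetical keys such as "mw" and "logp").
    tolerances: dict mapping property name -> maximum allowed difference.
    per_active: maximum number of decoys to pick for each active.
    """
    chosen = []
    used = set()
    for active in actives:
        picked = 0
        for cand in candidates:
            if cand["id"] in used:
                continue  # each candidate is used at most once
            # Accept the candidate only if every property is within tolerance.
            if all(abs(cand[p] - active[p]) <= tol for p, tol in tolerances.items()):
                chosen.append(cand["id"])
                used.add(cand["id"])
                picked += 1
                if picked == per_active:
                    break
    return chosen


# Example with made-up property values.
actives = [{"id": "a1", "mw": 300.0, "logp": 2.0}]
candidates = [
    {"id": "d1", "mw": 310.0, "logp": 2.3},
    {"id": "d2", "mw": 450.0, "logp": 2.1},  # too heavy, rejected
    {"id": "d3", "mw": 295.0, "logp": 1.6},
    {"id": "d4", "mw": 305.0, "logp": 2.0},  # not reached, quota already full
]
matched = select_matched_decoys(actives, candidates, {"mw": 25.0, "logp": 0.5})
print(matched)
```

Candidates are taken in the order given, so shuffling the candidate list first would avoid always favouring the same compounds.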

After you have run your workflow on both the reference ligand and decoy collections, go to Hit identification / Workflow Validation and select these two collections together with the “Metric” (the parameter that will be used for ranking/scoring). The metric can be, e.g., a similarity or a docking score that was generated for your collection when you ran your workflow on it. You also have to specify how many reference ligands and decoys you had originally. This is required because some entries might not make it to the output collection, so the number of entries in the input and output collections might differ; for example, it is possible that a docking pose could not be generated for an input entry.

After the collections, the metric and the numbers of reference ligands and decoys have been specified, you can choose whether to download the result of the Workflow validation analysis as a CSV file or display it on the screen.

During workflow validation, all entries of the reference ligand and decoy collections are merged into a single database and ranked by the selected Metric. When the results are displayed on the screen, the number of retrieved actives and the enrichment factors are plotted as a function of the position in the ranked database. For reference, “ideal” and “random” discrimination are also shown.
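The ranking and the plotted quantities can be approximated with the following minimal Python sketch. It is not the implementation used by the Workflow validation feature. The function name `enrichment_table` and its parameters are hypothetical, and the enrichment factor is assumed to follow the common definition (fraction of actives among the top n entries divided by the fraction of actives in the whole input set). The originally submitted numbers of reference ligands and decoys are used as totals, so entries that never reached the output collection count as not retrieved.

```python
def enrichment_table(active_scores, decoy_scores,
                     n_actives_input, n_decoys_input,
                     higher_is_better=True):
    """Rank actives and decoys together and tabulate retrieval statistics.

    Returns one tuple per rank position:
    (rank, actives_found, ideal_actives, random_actives, enrichment_factor)
    """
    # Merge both collections into one labelled list.
    labelled = [(s, True) for s in active_scores] + \
               [(s, False) for s in decoy_scores]
    # Sort by the metric; set higher_is_better=False for metrics such as
    # docking energies, where lower values rank first.
    labelled.sort(key=lambda pair: pair[0], reverse=higher_is_better)

    total = n_actives_input + n_decoys_input
    base_rate = n_actives_input / total  # fraction of actives in the input

    rows = []
    found = 0
    for rank, (_, is_active) in enumerate(labelled, start=1):
        found += is_active
        ideal = min(rank, n_actives_input)  # perfect discrimination
        random_expected = rank * base_rate  # random discrimination
        ef = (found / rank) / base_rate     # assumed EF definition
        rows.append((rank, found, ideal, random_expected, ef))
    return rows


# Example: 4 actives and 6 decoys were submitted, but one active and one
# decoy produced no score (e.g. no docking pose could be generated).
rows = enrichment_table(
    active_scores=[0.9, 0.8, 0.3],
    decoy_scores=[0.7, 0.5, 0.4, 0.2, 0.1],
    n_actives_input=4,
    n_decoys_input=6,
)
for row in rows:
    print(row)
```

Because Python's sort is stable and the actives are listed first, tied scores place actives ahead of decoys in this sketch; a real analysis may want to handle ties explicitly.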