QSAR is an acronym for quantitative structure-activity relationship, a widely used ligand-based virtual screening approach that quantitatively correlates the structural features of a set of compounds with their respective biological activity (or simply “bioactivity”). QSAR modeling can be a cumbersome task because each individual component of the QSAR workflow is flexible and can be carried out in several ways.
QSAR workflow
A typical QSAR workflow comprises the following steps:
- Compile a dataset for QSAR modeling
- Calculate molecular descriptors that describe the structural features of the compounds in the dataset
- Select a subset of descriptors to use via rational selection or feature selection
- Perform data splitting (e.g. via the Kennard-Stone algorithm) to separate the dataset into internal and external sets (e.g. 80% and 20% of the original dataset, respectively)
- Construct the QSAR model using the internal set as training data
- Apply the above QSAR model against the external set
- Compare the statistical performance of the QSAR model on the internal and external sets
- Assess the robustness of the QSAR model (e.g. the possibility of chance correlation) via the difference between R^2 and Q^2, Y-scrambling (for regression models), applicability domain analysis, etc.
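The splitting, modeling, and validation steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the data are synthetic, there is a single made-up descriptor, and a one-variable least-squares line stands in for whatever learning method a real QSAR study would use. All function names (`kennard_stone`, `fit_line`, `q_squared_loo`, and so on) are hypothetical helpers written for this sketch, not part of any QSAR package.

```python
import math
import random


def kennard_stone(X, n_select):
    """Split row indices of X into (selected, remaining) by Kennard-Stone."""
    n = len(X)
    dist = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    # Seed the selection with the two most distant compounds.
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    first, second = max(pairs, key=lambda p: dist[p[0]][p[1]])
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select:
        # Add the compound whose nearest selected neighbour is farthest away.
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining


def fit_line(x, y):
    """Ordinary least squares for one descriptor; returns (slope, intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict(model, x):
    slope, intercept = model
    return [slope * xi + intercept for xi in x]


def r_squared(y_obs, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_y) ** 2 for o in y_obs)
    return 1 - ss_res / ss_tot


def q_squared_loo(x, y):
    """Leave-one-out cross-validated Q^2 on the internal set."""
    preds = []
    for i in range(len(x)):
        model = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(model, [x[i]])[0])
    return r_squared(y, preds)


# Synthetic dataset (assumption): 20 compounds, one descriptor, noisy linear activity.
rng = random.Random(0)
X = [(rng.uniform(0, 10),) for _ in range(20)]
y = [1.5 * row[0] + rng.gauss(0, 0.3) for row in X]

# 80/20 split into internal (training) and external sets.
train_idx, test_idx = kennard_stone(X, n_select=16)
x_tr, y_tr = [X[i][0] for i in train_idx], [y[i] for i in train_idx]
x_te, y_te = [X[i][0] for i in test_idx], [y[i] for i in test_idx]

model = fit_line(x_tr, y_tr)
r2_train = r_squared(y_tr, predict(model, x_tr))
q2_loo = q_squared_loo(x_tr, y_tr)
r2_ext = r_squared(y_te, predict(model, x_te))

# Y-scrambling: refit on permuted activities; R^2 should collapse.
scrambled_r2 = []
for _ in range(100):
    y_perm = y_tr[:]
    rng.shuffle(y_perm)
    scrambled = fit_line(x_tr, y_perm)
    scrambled_r2.append(r_squared(y_perm, predict(scrambled, x_tr)))
mean_scrambled = sum(scrambled_r2) / len(scrambled_r2)

print(f"R2 (internal) = {r2_train:.3f}, Q2 (LOO) = {q2_loo:.3f}")
print(f"R2 (external) = {r2_ext:.3f}, mean scrambled R2 = {mean_scrambled:.3f}")
```

A small gap between R^2 and Q^2, together with scrambled-activity R^2 values far below the real one, suggests the fit is not a chance correlation. With real data, the descriptor matrix would have many columns and the model would typically be a multivariate method; the splitting and validation logic stays the same.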
Links to QSAR papers
General overview
- Nantasenamat et al. A practical overview of quantitative structure-activity relationship. EXCLI J 2009;8:74-88.
- Advances in computational methods to predict the biological activity of compounds
- Cherkasov et al. QSAR modeling: where have you been? Where are you going to? J Med Chem 2014;57(12):4977-5010.
QSAR workflow
- Tropsha and Golbraikh. Predictive QSAR modeling workflow, model applicability domains, and virtual screening. Curr Pharm Des 2007;13:3494-3504.
- Tropsha. Best Practices for QSAR Model Development, Validation, and Exploitation. Mol Inf 2010;29:476-488.
Data splitting
- Golbraikh and Tropsha. Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection. J Comput Aid Mol Des 2002;16:357-369; Mol Divers 2000;5:231-243.
- Martin et al. Does rational selection of training and test sets improve the outcome of QSAR modeling?
Statistical measures
- Golbraikh and Tropsha. Beware of q2!