QSAR stands for quantitative structure-activity relationship, a widely used ligand-based virtual screening approach that quantitatively correlates the structural features of a set of compounds with their biological activity (or simply “bioactivity”). QSAR modeling can be cumbersome because each component of the QSAR workflow can be carried out in many different ways.

### QSAR workflow

A typical QSAR workflow comprises the following steps:

- Compile a dataset for QSAR modeling
- Calculate molecular descriptors that describe the structural features of the compounds in the dataset
- Select a subset of descriptors to use via rational selection or feature selection
- Perform data splitting (e.g. via the Kennard-Stone algorithm) to separate the dataset into internal and external sets (e.g. 80% and 20% of the original dataset, respectively)
- Construct the QSAR model using the internal set as training data
- Apply the QSAR model to the external set
- Compare the statistical performance of the QSAR model on the internal and external sets
- Assess the robustness of the QSAR model (e.g. to rule out chance correlation) via *R² - Q²*, Y-scrambling (for regression models), applicability domain analysis, etc.
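
The splitting, modeling, and validation steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch under several assumptions: the descriptors and bioactivities are synthetic random data, ordinary least-squares multiple linear regression stands in for whatever learning method is actually used, *Q²* is computed by leave-one-out cross-validation, and all function names (`kennard_stone`, `fit_mlr`, `q2_loo`, `y_scramble`, and so on) are hypothetical helpers written for this example rather than part of any QSAR library.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Pick n_select (>= 2) diverse samples; return (selected, remaining) indices."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Seed the selection with the two most distant samples.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select:
        # Distance from each remaining sample to its nearest selected sample;
        # the sample with the largest such distance is added next.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining)

def fit_mlr(X, y):
    """Least-squares fit of y = b0 + X @ b; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def q2_loo(X, y):
    """Leave-one-out cross-validated R^2 (Q^2) on the internal set."""
    idx = np.arange(len(y))
    preds = np.empty(len(y))
    for k in idx:
        mask = idx != k
        preds[k] = predict_mlr(fit_mlr(X[mask], y[mask]), X[k:k + 1])[0]
    return r2(y, preds)

def y_scramble(X, y, n_rounds, rng):
    """Q^2 values obtained after randomly permuting the activities."""
    return np.array([q2_loo(X, rng.permutation(y)) for _ in range(n_rounds)])

# Synthetic stand-in for a descriptor matrix (50 compounds, 3 descriptors)
# and a bioactivity vector; real data would come from the earlier steps.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=50)

# 80/20 split into internal (training) and external sets.
train_idx, test_idx = kennard_stone(X, n_select=40)
coef = fit_mlr(X[train_idx], y[train_idx])

r2_train = r2(y[train_idx], predict_mlr(coef, X[train_idx]))
q2_train = q2_loo(X[train_idx], y[train_idx])
r2_ext = r2(y[test_idx], predict_mlr(coef, X[test_idx]))
scrambled = y_scramble(X[train_idx], y[train_idx], n_rounds=20, rng=rng)

print(f"R2 (internal) = {r2_train:.3f}, Q2 (LOO) = {q2_train:.3f}")
print(f"R2 - Q2 = {r2_train - q2_train:.3f}, R2 (external) = {r2_ext:.3f}")
print(f"mean Q2 after Y-scrambling = {scrambled.mean():.3f}")
```

A small gap between *R²* and *Q²*, comparable performance on the external set, and scrambled-activity *Q²* values far below the real one are the kinds of signals the robustness checks look for. Applicability domain analysis is not covered by this sketch.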

### Links to QSAR papers

#### General overview

- *Nantasenamat et al.* A practical overview of quantitative structure-activity relationship. EXCLI J 2009;8:74-88.
- Advances in computational methods to predict the biological activity of compounds
- *Cherkasov et al.* QSAR modeling: where have you been? Where are you going to? J Med Chem 2014;57(12):4977-5010.

#### QSAR workflow

- *Tropsha and Golbraikh*. Predictive QSAR modeling workflow, model applicability domains, and virtual screening. Curr Pharm Des 2007;13:3494-3504.
- *Tropsha*. Best Practices for QSAR Model Development, Validation, and Exploitation. Mol Inf 2010;29:476-488.

#### Data splitting

- *Golbraikh and Tropsha*. Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection. (J Comput Aid Mol Des 2002;16:357-369.) (Mol Divers 2000;5:231-243.)
- *Martin et al.* Does rational selection of training and test sets improve the outcome of QSAR modeling?

#### Statistical measures

- *Golbraikh and Tropsha*. Beware of q2!