#### INTRODUCTION

Inhibition of the Rho-associated protein kinase (ROCK) signaling pathway induces various neuronal functions such as activation of neurite outgrowth, axonal regeneration, and pro-survival Akt. A recent outcome revealed the new site of ROCK as a new strategy to treat neurological disorders. One of the best-defined effectors of the small guanosine triphosphate (GTP) binding proteins of the Rho subfamily (RhoGTPases) is Rho-associated coiled-coil-containing protein kinase/ROCK/Rho kinase (ROK), associated with the protein kinase A/G and C family of serine-threonine kinases. Rho GTPases are members of the Ras superfamily of GTP hydrolase enzymes, used as molecular devices that can handle various signaling pathways by converting biochemically GDP-bound (inactive) to GTP-bound (active) state,^{1-3} and the whole process is controlled by two main classes of proteins: GTPase-activating proteins and guanine nucleotide-exchange factors, which increase intrinsic GTPase activity, and gear up the conversion of GDP to GTP, respectively, which activates numerous downstream effectors such as (ROK/ROCK).^{4} Activation of downstream Nogo receptor family members or chondroitin sulfate proteoglycan receptors by ROCK signaling stimulates axon growth inhibition; thus ROCK signaling pathway inhibition may be considered a promising strategy for the axon regeneration process.^{5} In the treatment of spinal cord injuries, analysis of 30 preclinical studies reported that inhibition of ROCK improves locomotor recovery by 15%.^{6} A phase 1 and 2 clinical trial (Japan Primary Registries Network identifier UMIN000000825) was designed to consider the safety and viability of the combination of fasudil and olfactory mucosa autograft in patients, but the results were not divulged, and no results have been found effectual for spinal cord injury. Thus, there is an urgent call for safe and affordable neuro-regenerative agents.

Optimization and development of a new drug involve a lot of effort and cost nearly $900 million to pharmaceutical or biotechnology companies. To overcome these unavoidable problems, applications of computer assisted drug design or *in silico* techniques have been exclusively used to develop new safe, cheap, and biologically active chemical entities and establish their role of specificity in determining the biological activity of ROK, and also gave the prospect of possible molecular interactions of a specific inhibitor that binds to the active site of the concerned ROK enzyme. Thus, the current study aimed to build a Quantitative structure–activity relationship (QSAR) model by implementing diverse chemometric techniques to achieve improved inhibitory action towards ROK enzyme as a neuro-regenerative agent.

#### *2D QSAR or Hansch–Fujita analysis*

QSAR analysis ascertains mathematical correlations among structural and/or physicochemical properties called descriptors and correlated to the experimentally measured (biological) activity. The most important recent developments in the field involve a substantial increase in the size of experimental datasets available for analysis and an increased application of QSAR models as virtual screening tools to discover biologically active molecules in chemical databases and/or virtual chemical libraries. To date no data (QSAR model) have been reported for the treatment of the central nervous system (CNS) spinal cord injuries, and thus we rely on our current model, which provided important information for finding improved new and safe oral bioavailable molecules.

#### MATERIALS AND METHODS

#### *Experimental data set (generation of 3D structure, charge calculation, and optimization)*

The chemical structures of compounds^{7} were sketched using ChemDraw (8.0) and were imported in the Tools for structure–activity relationship (TSAR) 3.3 sheet, i.e. TSAR (TSAR: assimilated analysis package). Once the structures are imported, the negative logarithm of the IC_{50 }is necessary as actual activities are often skewed and are measures of the free energy of binding and so it was introduced in another column of the TSAR sheet. Prior to the descriptors’ calculation, the structures were subjected to CORINA, which is often used to create 3D structures of typically drug-like molecules.^{8} The geometry and energy were optimized in order to obtain the minimum potential energy conformer, which is a measure of the stability of the conformer.^{9} Molecular energies are evaluated by summing (bond length, bond angle, torsion angle, Van der Waal’s, and coulombic) terms for all suitable sets of atoms.

#### *Preparation of the data set and data reduction*

The main reason behind descriptor calculation is to decode the information concerning the physicochemical properties of each and every molecular structure liable for specific biological activity of the molecule. Up to 500 descriptors were computed in the TSAR 3D sheet (inbuilt programming of calculating physicochemical properties) for a single molecule. In order to choose only relevant and significant sets of descriptors, data reduction was carried out to eliminate the prevalence of coincidental correlations as well as data redundancy. Firstly, descriptors with “0” values for each compound were deleted. The data were reduced on the basis of a correlation matrix developed between two descriptors. These correlation values indicate the height of co-linearity. Values somewhat close to 1 indicate the extent of better fitting of the regression model. If the intercorrelation values of the two variables were 0.5 or higher then those descriptors were retained, but if they were below 0.5 then they were eliminated from the data sheet. Finally, five descriptors, i.e., inertia moment 2 size (WM), vesicle-associated membrane protein (VAMP) polarization YY (WM), VAMP dipole Y component (WM), VAMP dipole Z component (WM), and Kier chiV6 path (WM), were found to be highly correlated with the actual activity.

#### *Training and test set*^{10}

Series of 41 analogues were segregated into training (29 compounds) and test (11 compounds) sets on the basis of diversity in the structures and the biological activity of the compounds. The compounds in the training and test sets were employed to develop and validate the predictability of the concerned model. Some of the compounds may behave as outliers and thus they have to be discarded during the process of model development.

#### *Model development and its validation*

#### *Linear regression analysis*

Linear regression analysis helps to establish the correlation between the independent and dependent variables (log1/IC_{50}) of the series. The construction of the regression model was done using training set compounds and the significance of every model was determined on the basis of statistical values^{11} such as r^{2} value (correlation between dependent and independent variables), cross validation r^{2}_{cv} of the training set (which should be greater than 0.8), f value (degree of statistical confidence, which should be high) and s-value (standard error of estimate, which ought to be minimum). The various statistical parameters are shown in Table 1. Compounds of the test set are for the prognostic ability of the purposed model.^{12} For a predictive model, the value of the correlation coefficient (r^{2}) of the test set should be greater than 0.6 and less than the r^{2} of the training set.

Partial least square analysis was used to check the robustness of the model generated by the multiple least square regression analysis. To validate the results obtained from the multiple linear regression (MLR) technique, the same data set was subjected to partial least squares (PLS) analysis. The correlation coefficient r^{2} and the r^{2}_{cv} value of the training and test sets were evaluated to ascertain the quality of the developed PLS models.^{13}

#### *Nonlinear regression analysis*^{14}

Artificial neural networks mimic the functioning of simulating the learning process by neural networks. They employ interconnections of artificial neurons with the help of computational studies and are capable of dealing with nonlinear and irregular data. Neural network analysis was carried out on the same descriptors of MLR analysis. The NNA involved three layers: input, hidden, and output layers. The input layer worked by receiving data, whereas the output layer generated the dependent variable. The hidden layer interconnected the two abovementioned layers. The final structural design of the generated NN model was (5-2-1).

#### *Validation of the model*

__Cross-validation (r2cv):__ This validation technique was employed to appraise the reliability of the developed statistical models. Number of compounds was shuffled from the test to the training set and vice versa for generation of the precise model. Leave-one-out methodology (when a molecule is removed from the training set and included in the test set and vice versa to predict the activity of a compound) was performed to get the final model. The cross-validation test (r^{2}_{cv}) value should be greater than 0.60.^{15}

__Activity prediction of test set compounds: __In general, r^{2 }of the test set greater than 0.6 represents good prognostic ability of the model.^{16}

#### *Outliers*

Outliers are the data points that are fitted far apart from the linear model and have some different mode of binding. An outlier of a QSAR model refers to a data point that falls outside the confidence interval of the regression line. The compounds with higher residual values that deviated from the regression line were deleted as outliers and they adversely affected the robustness of the prospective model.

#### RESULTS AND DISCUSSION

Data sets of various analogues were employed for the present studies. Their chemical structures and biological activity (Log/IC_{50}) are presented in Table 2. In order to assess the major molecular factors that greatly affect the ROK inhibition of derivatives belonging to anilines and benzylamines, three chemometric tools (MLR, PLS, and NN) were used to construct classical descriptor-based (QSAR) models. The compounds were divided into training and test sets and the training set comprised 29 compounds (excluding 1 outlier) and the test set 11.

#### *Statistical analysis*

#### *MLR and PLS (linear regression method)*

Initially, more than 200 descriptors were calculated for regression analysis. Due to the copious and redundant data, there was a very low value for r^{2}_{cv} (0.312), implying inadequate internal predictability. There was a strong necessity to build a reliable and informative set of descriptors having good correlation with the biological activity with no intercorrelation. After the deletion of the undesirable set of descriptors, the correlation matrix technique was employed and eventually five distinct physicochemical descriptors were obtained: inertia moment 2 size, VAMP polarization YY, VAMP dipole Y component, VAMP dipole Z component, and Kier chiV6 path. For QSAR analysis, the training set molecules were used to construct (linear and nonlinear) models so that a precise relationship could be established between molecules and biological activity and the molecules of test set served to check the robustness of the model. During the creation of the purposed model, one compound was shown as an outlier and not to fit either the training set or the test set; also the residual value was more than two orders of magnitude. Owing to this, outlier 5b with different prediction was omitted from the training set. The correlation matrixes of the descriptors are shown in Table 3. The selected model was assessed on the basis of different statistical values, such as regression coefficient (r), coefficient of determination (r^{2}), prognostic power of the model (r^{2}_{cv}), standard deviation (SD), sequential Fisher’s ratio (F), and test for statistical significance (t). The value of r^{2 }ought to be >0.6 and the value of r^{2}_{cv} >0.7. Statistical outputs of the MLR and PLS models are summarized in Table 4. The statistical worthiness of the developed model was evaluated in terms of square of the correlation coefficient, where r^{2} (MLR=0.913 and PLS=0.912) values explain 91% variance in activity, which indicates a measure of good fit by the regression equation. A small difference in (MLR) r^{2} (0.913) and r^{2}_{cv} (0.862) values implies high prognostic ability of the model. Similarly in the PLS analysis, there is a slight difference in r^{2} (0.912) and r^{2}_{cv} (0.869) values, which further ascertains the robustness of the model. The r^{2}_{cv} values for the MLR and PLS models (r^{2}_{cvMLR}=0.862 and r^{2}_{cvPLS}=0.869) were evaluated and both models had comparable r^{2}_{cv} values. The F-test indicates the significance level of the model. The F-value of the final MLR model, 48.41, clearly shows the statistical significance of the derived model. The ‘s’ value is the (standard) error of the regression model, and it should be low for better QSAR model generation; this is an approximation of how precisely the model will predict unknown ‘Y’ values. The value of ‘s’ for the best MLR model is 0.373 and for PLS 0.307. It signifies that regression with an ‘s’ value of 0.3 should predict Y values with a standard error of 0.3 units. The final model has the highest correlation coefficient (0.91), confirming the robustness of the model and also the model was cross validated and the r^{2}_{cv} value of 0.86 depicted the strength of the model. For further assurance of the significance of the descriptors, their t-test values, coefficient values, jackknife standard error (SE), and covariance SE values (Table 5) were evaluated. The values of all these parameters for all five descriptors confirmed the significance of an individual descriptor in determining the importance of structural design in exhibiting ROK inhibitory activity by a molecule. The generated model was used to understand the structural dependency of the biological activity demonstrated by the certain set of ROK inhibitors, and generated following equation 1:

*Original data*
Y = -0.0012895674*X1 + 0.092464715*X2 + 0.22125481*X3 +0.34796265*X4 + 1.5525457*X5 – 3.9303896 *Standardized data*
Y = -0.2757968*S1 + 0.45911208*S2 + 0.35567212*S3 + 0.4776836*S4 – 0.64006561*S5 – 1.344354 |

Here X1= inertia moment 2 size (whole molecule), X2= VAMP polarization YY (whole molecule), X3= VAMP dipole Y component (whole molecule), X4= VAMP dipole Z component (whole molecule), X5= Kier ChiV6 (path) index (whole molecule), and Y represents the biological activity.

The PLS method is an addition to the MLR approach that evaluates the results to obtain assurance of the consequences when they show minimal variation and have comparable results. Statistically, the PLS equation evaluated the robustness of the developed model on the basis of statistical parameters, i.e., r^{2}.

The regression equation obtained by the PLS method generated: *Equation 2;*

Y = -0.0013*X1 + 0.0914*X2 + 0.2306*X3 + 0.3482*X4 – 1.5262*X5 – 3.8318 |

#### *Prediction of test set compounds*

The predictive ability of the intended model was validated by a set of eleven compounds. The difference between (actual and predicted) activity values of MLR and PLS analysis should be minimal for assessment of the quality of the developed models.

#### *Artificial neural networks (artificial nonlinear regression analysis method)*

Artificial neural networks (ANN) analyses were carried out with the same descriptors that were used for linear regression analysis in an endeavor to improve the results obtained for ROK inhibition. In the present study, inputs for the neural network were the descriptors, while the output was the log1/IC_{50 }values. In artificial neural network analysis, TSAR software automatically computes the number of hidden neurons, as well as patterns of the training and test sets. The number of neurons in the hidden layer and the number of rows in the training set are balanced to achieve the optimum predictive power for the neural network. The NN with three hidden neurons and 30% of compounds excluded as the test set was the most successfully trained NN model for* in vitro* ROK inhibition data as compared to the other NN models. Each analysis was repeated several times so that test RMS fit and best RMS fit were nearer to each other. Values of actual and predicted activity of the training/test set of compounds and their corresponding graph and also dependence plots of the descriptors with the output value (log1/IC_{50}) clearly demonstrate that three descriptors, VAMP polarization YY, VAMP dipole Y component, and VAMP dipole Z component, are positively correlated to the output value (log1/IC_{50 }value), while two descriptors, inertia moment 2 size and Kier chiV6 path index, are negatively correlated to log1/IC_{50}, which is in agreement with the results of MLR analysis.

#### *Comparison between linear and nonlinear methods*

Different statistical approaches were employed to quantify the predictive ability of the generated models in terms of statistical fit (r^{2}). Both MLR and PLS (two linear methods) were studied and had comparable results. The r^{2} values for the training set for QSAR analysis were r^{2}_{MLR}=0.91, r^{2}_{PLS}=0.91, and r^{2}_{ANN}=0.93 and for the test set the values were r^{2}_{MLR}=0.75, r^{2}_{PLS}=0.75, and r^{2}_{ANN}=0.73. On the basis of the predictive power of the model it is very clear that conventional MLR and PLS as well as ANN analysis generated a highly robust and predictive model. The actual and predicted activity for the compounds of the training and test sets were obtained after MLR, PLS, and NNA and are shown in Table 6 and their corresponding graph are shown in Figures 1, 2, 3. A summary of the performance of the different prediction models is given in Table 7.

#### *Mechanistic interpretation of the descriptors*

The results of MLR, PLS, and NN reveal the importance of inertia moment 2 size (WM), VAMP polarization YY (WM), VAMP dipole Y component (WM), VAMP dipole Z component (WM), and Kier chiV6 path (WM) and in fact a strong correlation was observed between ROK inhibitory activity and the five descriptors (Table 8). The dependence plots of the descriptors with the output value (log1/IC_{50}) are shown in Figure 4 and the structure–activity relationship derived from QSAR analysis is shown in Figure 5.

#### *Kier ChiV6 path index*

Out of the five parameters, Kier ChiV6 path index (WM) was shown to be an essential descriptor in defining the biological activity of urea-based derivatives as evident from the correlation matrix and the t-value. Kier ChiV6 path index was initially defined by Randic and subsequently by Kier and Hall. It illustrates a number of series represented by “order” and “subgraph” type. By definition, a chi index is a calculation of a known type of subgraph such as path (P), cluster (C), path/cluster (PC), and ring (CH), weighted by a function of the delta values. The descriptor highlights different aspects of atom connectivity within a molecule. It also helps us to examine the substitution pattern in benzene rings and the amount of branching rings.^{17} In our study it was observed that the Kierchiv6 (path index) descriptor is negatively correlated with permeability; according to the study the sixth order valence connectivity index (Kierchiv6) encodes structural complexity, such as the size and heteroatom content of the rings. This complexity is observed in the least and most active compound. As the size of the compound decreases, biological activity increases.

#### *VAMP polarization YY*

VAMP polarization YY is a spatial descriptor that calculates the electronic properties of a compound and projects polarization towards YY planes. There is a direct relation between polarizability and the number of valence electrons on every atom.^{18} A positive correlation of VAMP polarization YY with the biological activity reveals a direct link between the chemical reactivity index and biological activity. The compounds 12a, 12b, 12c, 12e, and 14c, having high values of VAMP polarization YY, are the most active, whereas compound 8e, with low VAMP polarization YY, is the least active.

#### *VAMP dipole Y component*

VAMP dipole Y component is an electronic parameter and is due to the degree of charge separation in a molecule. It describes the substituent point of attachment with the bond sited along the Y-axis.^{19} A positive correlation between VAMP dipole Y and biological activity reveals a direct link between the chemical reactivity index and biological activity.

#### *Moment of inertia 2 (size)*

Inertia moment 2 size (WM) with reference to an axis is defined as the product of the mass times the distance from the axis squared. The higher positive correlation coefficient of inertia moment with permeability data and the high t-values (t-values define the statistical significance of a descriptor) suggest that the orientation behavior with respect to the size of whole molecule is of utmost importance in the binding interaction with the receptor site as well as in imparting greater permeability. In the present study the biological activity increases with a decrease in moment of inertia 2 (size) of the whole molecule. This phenomenon can be explained by taking the example of most active compound that has low value of moment of inertia 2 (size) of the whole molecule in comparison to the least active compound that has high value of moment of inertia 2 (size) of the whole molecule. Hence it can be concluded that by decreasing the moment of inertia 2 (size) ROK inhibitory activity can be increased.

#### *VAMP dipole X component *

Vamp is a semiempirical molecular orbital package used to determine electrostatic properties and perform optimizing of structure such us total energy, electronic energy, nuclear repulsion energy, accessible surface area, atomic charge, mean polarizability, heat of formation, highest occupied molecular orbital and lowest unoccupied molecular orbital eigenvalues, ionization potential, total dipole, polarizability, and dipole components. The positive coefficient of this expression in the proposed model elucidates that the higher the value, the better is the activity, and indicates that biological activity with respect to ROK inhibition is directly dependent upon the chemical permanence of the compounds in biochemical systems. Some exciting facts were revealed during the analysis of the derived descriptors and their correlation with the structural design of the molecules. In the comparison of the least active molecule with the most active compound of the selected series, we found that when the least active compound (8e) was substituted with dimethylamine the shape and volume of the molecule were altered, which eventually changed the optimal binding affinity of the molecule. However, in the most active compounds (12a, 12b, 12c, 14c, and 12e) dimethylamine was replaced with either simple alkyl or oxygen-containing alkyl substitutions, which gradually reduced the size and volume of the molecule, thus resulting in enhancement of the ROK inhibitory activity. Interestingly, both alkyl and oxygen-containing alkyl chains have analogous molecular mass and, thus, no further bulk was loaded onto the molecule. Attribution of alkyl or oxygen-containing alkyl substitutions positively affects the electrostatic nature of the substituent and thus additional atoms contributing to the energy, resulting in enhanced biological activity, and also showing clearly that hydroxyl or oxygen substitutions are highly electronegative, which augments the overall polarizability of the molecule. However, the negative correlation of Kier Chiv6 (path) index at R_{1} clearly reveals that reductions in the bulkiness and volume at certain positions in the whole molecule lead to an increased biological profile. Additionally, compounds 12a, 12b, 12c, 14c, and 12e have lower molecular mass (378.52, 350.46, 362.47, 336.43, and 366.46) than compound 8e (407.57), also confirming the authentication of the descriptor, which leads to compression in the shape of the molecule, allowing it to conveniently enter the binding site and align in such a way that it fits snugly with the walls of the active site.

#### *Absorption, distribution, metabolism, and excretion studies*

The ‘rule of five’, given by Lipinski, is known as the therapeutic relevance or property of drug-likeness. It is an empirical approach traditionally utilized for calculating drug-like properties in a molecule that clearly postulates that molecules with a molecular weight less than 500, log p<5, hydrogen bond donors less than 5, and hydrogen bond acceptors less than 10 exhibit an excellent pharmacokinetics profile in terms of absorption or permeation through the biological membrane.^{20} This rule explains the absorption, distribution, metabolism, and excretion of bioactive compounds in a superior organism. Lipinski’s rule of five was calculated for the particular series of compounds and no molecule was found to have violated the above stated set of rules (Table 9). This overtly indicates that all compounds showed adequate pharmacokinetic profiles.

#### *Study limitations*

The findings of the present QSAR analysis will be advantageous only for the modeling of potent ROK inhibitors as active neurological agents. For the future aspects we will try to screen multitargeted novel ROK inhibitors.

#### CONCLUSIONS

A 2D QSAR study was performed to establish a structural and physicochemical relationship required for the inhibition of a molecular target against a neurological disease, i.e., ROK. The statistically significant model highlighted the significance of electronic, topological, and steric descriptors. The authenticity of the projected model was checked by validation and cross-validation (r^{2}_{cv}) based on leave-one-out methodology. Overfitting of the models was checked by considering the difference between r^{2} and r^{2}_{cv}. In a planned study, an attempt was made to understand the dependence of biological activity on the structural design accountable for their specific ROK inhibition. The main problem regarding the CNS-related drug is its ability to cross the blood–brain barrier, and for crossing this barrier there should be optimal log P, volume, shape, molecular mass, and polarizability. The proposed model overtly points towards the introduction of optimal bulk or charge distribution along with the shape and size of the molecules to determine the binding efficacy of the molecule to the receptor domain, which eventually increases the inhibitory profiles of the selected molecules. VAMP polarization YY component (WM), VAMP dipole Y component (WM), and VAMP dipole Z component (WM) descriptors were positively correlated with activity while the other two, Kier ChiV6 path index (WM) and moment of inertia 2 (size) (WM), were negatively correlated descriptors, and projected molecular structure information with reference to a specific rotation axis or the rotational analogue to mass, and groups that decrease Kier ChiV6 and inertia of moment at substitutions will increase the predictability of the model. The above final model reveals the significance of selected descriptors and their correlation with biological activity and provided substantial insights to plan new chemical scaffolds with improved selectivity outlines. Design of ROK inhibitors incorporating the appropriate essential features or descriptors will increase the chances of getting new better molecules with enhanced inhibitory profiles.

ACKNOWLEDGEMENT

The authors are thankful to the Vice-chancellor of Banasthali University for providing the necessary research facilities.

*Conflict of Interest: No conflict of interest was declared by the authors.*