Hyperparameter tuning using GridSearchCV and RandomizedSearchCV in Python

In the previous post, we had a brief discussion about GridSearchCV and RandomizedSearchCV. In this post, we will demonstrate how to use the GridSearchCV and RandomizedSearchCV methods available in the scikit-learn library for hyperparameter tuning in Python. We will use the built-in diabetes dataset from scikit-learn in this demo. However, you can use any other dataset if you prefer.

Hyperparameter tuning using GridSearchCV

In the previous post, we already discussed how GridSearchCV works. It loops through every combination of the values fed into the parameter grid and computes the model score for each one using the given scoring method.
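To make the "every combination" idea concrete, here is a minimal sketch using scikit-learn's ParameterGrid helper, which performs the same grid expansion that GridSearchCV does internally. The two hyperparameter names and their values below are illustrative, not from the demo that follows.

```python
# ParameterGrid expands a parameter grid into every combination of values.
# With 3 alpha values and 2 max_iter values, we get 3 * 2 = 6 candidates.
from sklearn.model_selection import ParameterGrid

param_grid = {'alpha': [0.01, 0.001, 0.0001],
              'max_iter': [1000, 5000]}

candidates = list(ParameterGrid(param_grid))

print(len(candidates))  # 6
for params in candidates:
    print(params)
```

GridSearchCV fits and cross-validates the model once per candidate, so the total cost grows multiplicatively with the number of values per hyperparameter.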

Below are the steps that explain how it works:

  1. Feed a set of hyperparameter values into the parameter grid.
  2. Instantiate a GridSearchCV model.
  3. Fit the GridSearchCV model, which evaluates each hyperparameter value and records its performance score.
  4. Choose the best-performing set of values from the parameter grid.
  5. Use the best-performing values in the estimator.

Let’s walk through the above steps in Python using the scikit-learn library.

# Import built-in diabetes dataset from sklearn library
from sklearn.datasets import load_diabetes

# Import Lasso regularized regression model
from sklearn.linear_model import Lasso

# Import GridSearchCV from model_selection module
from sklearn.model_selection import GridSearchCV

# Import the numpy library
import numpy as np

# Load the diabetes dataset into a variable
data = load_diabetes()

# Let's create a list of alpha values which is a hyperparameter for Lasso regression
alphas = np.array([0.01, 
                    0.001, 
                    0.0001, 
                    0.0002, 
                    0.0003, 
                    0.0004, 
                    0.0005])

# Create a parameter grid dictionary with hyperparameter values
paramgrid = {'alpha': alphas}

# Instantiate a Lasso regularized model 
model = Lasso()

# Instantiate the GridSearchCV method
grid = GridSearchCV(estimator = model, param_grid = paramgrid)

# Fit the data into model using GridSearchCV
grid.fit(data.data, data.target)

# Get the best score of Lasso regression model and print it
print('\n')
print('*' * 100)
print("The best score of the model using Lasso regression and given alpha values is: {0}".format(grid.best_score_))
print('*' * 100)
print('\n')

# Get the best value of hyperparameter alpha for Lasso regression and print it
print('*' * 100)
print("The best hyperparameter value for alpha for Lasso regression is: {0}".format(grid.best_estimator_.alpha))
print('*' * 100)
print('\n')

Output


In this demo, we use GridSearchCV to find the best alpha value for Lasso regression. Lasso is a regularized linear regression model that can be used to solve regression problems. First, we created a list of alpha values and fed it into the parameter grid. Finally, we passed the grid to GridSearchCV, which tests and scores each candidate to find the best hyperparameter value from the list.
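Beyond best_score_ and best_estimator_, a fitted GridSearchCV also records the score of every candidate in its cv_results_ attribute. Here is a short sketch, on the same diabetes/Lasso setup as above, showing how to inspect the per-alpha mean cross-validation scores:

```python
# Inspect the full search results that GridSearchCV stores after fit()
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

data = load_diabetes()

grid = GridSearchCV(estimator=Lasso(),
                    param_grid={'alpha': np.array([0.01, 0.001, 0.0001])})
grid.fit(data.data, data.target)

# One mean test score per alpha value, in the order the grid was scanned
for alpha, score in zip(grid.cv_results_['param_alpha'],
                        grid.cv_results_['mean_test_score']):
    print(alpha, score)

print(grid.best_params_)
```

This is handy for checking how sensitive the model is to alpha, rather than looking only at the single winning value.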

Hyperparameter tuning using RandomizedSearchCV

All the above-mentioned steps apply to RandomizedSearchCV as well. The only difference is the way it scans the given parameter grid: rather than trying every grid value in sequence, it samples a fixed number of values (controlled by n_iter) at random from the grid.

# Import built-in diabetes dataset from sklearn library
from sklearn.datasets import load_diabetes

# Import Lasso regularized regression model
from sklearn.linear_model import Lasso

# Import RandomizedSearchCV from model_selection module
from sklearn.model_selection import RandomizedSearchCV

# Import the numpy library
import numpy as np

# Load the diabetes dataset into a variable
data = load_diabetes()

# Let's create a list of alpha values which is a hyperparameter for Lasso regression
alphas = np.array([0.01, 
                    0.001, 
                    0.0001, 
                    0.0002, 
                    0.0003, 
                    0.0004, 
                    0.0005,
                    0.0006,
                    0.0007,
                    0.0008,
                    0.0009])

# Create a parameter grid dictionary with hyperparameter values
paramgrid = {'alpha': alphas}

# Instantiate a Lasso regularized model 
model = Lasso()

# Instantiate the RandomizedSearchCV method
grid = RandomizedSearchCV(estimator = model, param_distributions = paramgrid, n_iter=10)

# Fit the data into model using RandomizedSearchCV
grid.fit(data.data, data.target)

# Get the best score of Lasso regression model and print it
print('\n')
print('*' * 100)
print("The best score of the model using Lasso regression and given alpha values is: {0}".format(grid.best_score_))
print('*' * 100)
print('\n')

# Get the best value of hyperparameter alpha for Lasso regression and print it
print('*' * 100)
print("The best hyperparameter value for alpha for Lasso regression is: {0}".format(grid.best_estimator_.alpha))
print('*' * 100)
print('\n')

Output


Conclusion

We should use GridSearchCV when the number of parameter combinations to try is small. If the number of combinations is large and the training time of the model on the given dataset is significantly high, we should prefer RandomizedSearchCV over GridSearchCV.
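The trade-off above can be made concrete by counting model fits. A rough sketch, assuming 5-fold cross-validation and an illustrative two-parameter grid:

```python
# Compare the number of model fits for a full grid search vs a
# randomized search capped at n_iter candidates (assuming 5-fold CV)
from sklearn.model_selection import ParameterGrid

param_grid = {'alpha': [0.01, 0.001, 0.0001, 0.0002],
              'max_iter': [1000, 2000, 3000, 4000, 5000]}
cv_folds = 5
n_iter = 10

grid_fits = len(list(ParameterGrid(param_grid))) * cv_folds    # 20 * 5
random_fits = n_iter * cv_folds                                # 10 * 5

print(grid_fits, random_fits)  # 100 50
```

The grid cost grows with every value added to the grid, while the randomized cost stays fixed at n_iter times the number of folds, which is why RandomizedSearchCV scales better for large grids and slow models.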

Thanks for reading. Please share your inputs in the comment section.
