Using Reinforcement Learning for Hyperparameter Tuning#

Introduction#

In modern machine learning tasks, choosing good hyperparameters is essential for a model to perform well. In this tutorial, we use a reinforcement learning approach to automatically tune the hyperparameters of a random forest classifier. We simulate a dataset and use a simple Q-learning algorithm to search for the best combination. This approach combines global exploration with local fine-tuning (called "exploitation") to avoid getting stuck in a local optimum.

Prerequisites#

  • Dataiku 13.3

  • Python 3.9

  • A code environment with the following packages:

    numpy
    scikit-learn
    

Importing the required packages#

We first import all the libraries needed for this tutorial.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

Preparing the dataset#

We simulate a dataset for a classification task (binary by default with make_classification) with 1000 samples and 20 features. The data is then split into training (80%) and validation (20%) sets.

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
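
As a quick sanity check, you can verify the split sizes; with 1000 samples and test_size=0.2, you should get 800 training rows and 200 validation rows:

print(X_train.shape, X_val.shape)  # (800, 20) (200, 20)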

Defining the hyperparameter space#

We define the candidate values for the number of trees (n_estimators) and the maximum tree depth (max_depth). These two arrays represent our search space.

n_estimators_options = np.arange(50, 201, 50)
max_depth_options = np.arange(1, 11)
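
To make the search space concrete, you can print the candidate values; with the ranges above, there are 4 × 10 = 40 possible combinations:

print(n_estimators_options)  # [ 50 100 150 200]
print(max_depth_options)     # [ 1  2  3  4  5  6  7  8  9 10]
print(len(n_estimators_options) * len(max_depth_options))  # 40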

Initializing the Q-table and reinforcement learning parameters#

We create a Q-table filled with zeros. Its dimensions correspond to the number of options for each hyperparameter, so each cell holds the Q-value of one hyperparameter combination. We also set the reinforcement learning parameters: epsilon, alpha, gamma, and the number of episodes.

q_table = np.zeros((len(n_estimators_options), len(max_depth_options)))

epsilon = 0.1   # probability of exploration (vs. exploitation)
alpha = 0.1     # learning rate
gamma = 0.9     # discount factor (importance of future rewards)
episodes = 50
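
For reference, these parameters combine in the standard Q-learning update that the loop below applies at each step, where s is the current hyperparameter combination, s' is the newly selected one, and the reward r is the validation accuracy. Because each state stores a single Q-value in this setup, the usual maximum over next actions reduces to Q(s'):

Q(s) \leftarrow Q(s) + \alpha \left[ r + \gamma \, Q(s') - Q(s) \right]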

Running reinforcement learning for hyperparameter tuning#

We run a loop over several episodes and inner steps. In each step, we select a new state by either exploring (choosing random hyperparameters) or exploiting (choosing the best combination seen so far). Then, we train a random forest classifier with the selected hyperparameters and compute its accuracy on the validation set. This accuracy serves as the reward for updating the Q-table.

Note

Using both episodes and inner steps is a way to restart the learning process from different initial conditions. Like the exploration/exploitation balance, it allows for global exploration and local fine-tuning and helps avoid getting stuck in local optima.

for episode in range(episodes):
    # Choose an initial state with random hyperparameters
    ne_idx = np.random.randint(len(n_estimators_options))
    md_idx = np.random.randint(len(max_depth_options))

    for step in range(10):  # Limit the number of steps per episode
        # Epsilon-greedy action selection
        if np.random.rand() < epsilon:
            # Explore: choose random hyperparameters
            ne_idx_new = np.random.randint(len(n_estimators_options))
            md_idx_new = np.random.randint(len(max_depth_options))
        else:
            # Exploit: choose the hyperparameters with the highest Q value so far
            ne_idx_new, md_idx_new = np.unravel_index(np.argmax(q_table), q_table.shape)

        # Train a model with the selected hyperparameters
        model = RandomForestClassifier(
            n_estimators=int(n_estimators_options[ne_idx_new]),
            max_depth=int(max_depth_options[md_idx_new])
        )
        model.fit(X_train, y_train)

        # Evaluate model on the validation set
        y_pred = model.predict(X_val)
        accuracy = accuracy_score(y_val, y_pred)

        # Update the Q value with the Q-learning rule; the reward is the accuracy,
        # and the Q value of the new state is the bootstrap term
        # (q_table[ne_idx_new, md_idx_new] is a single scalar, so no max is needed)
        q_table[ne_idx, md_idx] += alpha * (
            accuracy + gamma * q_table[ne_idx_new, md_idx_new] - q_table[ne_idx, md_idx]
        )

        # Move to the next state
        ne_idx, md_idx = ne_idx_new, md_idx_new
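
Before extracting the best combination, it can be helpful to inspect what the agent has learned. Each row of the Q-table corresponds to an n_estimators option and each column to a max_depth option:

print(np.round(q_table, 3))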

Retrieving the best hyperparameters#

After the learning loop, we find the best hyperparameter combination by taking the indices of the maximum value in our Q-table.

best_ne_idx, best_md_idx = np.unravel_index(np.argmax(q_table), q_table.shape)
best_n_estimators = n_estimators_options[best_ne_idx]
best_max_depth = max_depth_options[best_md_idx]

print("Best Number of Estimators:", best_n_estimators)
print("Best Max Depth:", best_max_depth)

Training the final model on the full dataset#

We train the random forest classifier on the full dataset using the best hyperparameters obtained from the reinforcement learning process.

best_model = RandomForestClassifier(
    n_estimators=int(best_n_estimators),
    max_depth=int(best_max_depth)
)
best_model.fit(X, y)
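
If you want to persist this model for later use, for example before importing it into a Dataiku Saved Model as mentioned below, one option is to serialize it with joblib (installed as a scikit-learn dependency). This is a minimal sketch, and the file name is just an example:

import joblib

# Save the trained model to disk
joblib.dump(best_model, "best_random_forest.joblib")

# Later, reload it with:
# best_model = joblib.load("best_random_forest.joblib")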

Wrapping up#

In this tutorial, we applied a reinforcement learning approach to tuning the hyperparameters of a random forest classifier. The Q-learning algorithm helped us explore the hyperparameter space and gradually fine-tune the selection, using the validation accuracy as the reward. This method can help avoid the common pitfalls of manual tuning and strike a balance between exploration and exploitation. As a next step, you can follow this tutorial to import your trained model into a Dataiku Saved Model.

Here is the complete code of this tutorial:

notebook.py
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

n_estimators_options = np.arange(50, 201, 50)
max_depth_options = np.arange(1, 11)

q_table = np.zeros((len(n_estimators_options), len(max_depth_options)))

epsilon = 0.1   # probability of exploration (vs. exploitation)
alpha = 0.1     # learning rate
gamma = 0.9     # discount factor (importance of future rewards)
episodes = 50

for episode in range(episodes):
    # Choose an initial state with random hyperparameters
    ne_idx = np.random.randint(len(n_estimators_options))
    md_idx = np.random.randint(len(max_depth_options))

    for step in range(10):  # Limit the number of steps per episode
        # Epsilon-greedy action selection
        if np.random.rand() < epsilon:
            # Explore: choose random hyperparameters
            ne_idx_new = np.random.randint(len(n_estimators_options))
            md_idx_new = np.random.randint(len(max_depth_options))
        else:
            # Exploit: choose the hyperparameters with the highest Q value so far
            ne_idx_new, md_idx_new = np.unravel_index(np.argmax(q_table), q_table.shape)

        # Train a model with the selected hyperparameters
        model = RandomForestClassifier(
            n_estimators=int(n_estimators_options[ne_idx_new]),
            max_depth=int(max_depth_options[md_idx_new])
        )
        model.fit(X_train, y_train)

        # Evaluate model on the validation set
        y_pred = model.predict(X_val)
        accuracy = accuracy_score(y_val, y_pred)

        # Update the Q value with the Q-learning rule; the reward is the accuracy,
        # and the Q value of the new state is the bootstrap term
        # (q_table[ne_idx_new, md_idx_new] is a single scalar, so no max is needed)
        q_table[ne_idx, md_idx] += alpha * (
            accuracy + gamma * q_table[ne_idx_new, md_idx_new] - q_table[ne_idx, md_idx]
        )

        # Move to the next state
        ne_idx, md_idx = ne_idx_new, md_idx_new

best_ne_idx, best_md_idx = np.unravel_index(np.argmax(q_table), q_table.shape)
best_n_estimators = n_estimators_options[best_ne_idx]
best_max_depth = max_depth_options[best_md_idx]

print("Best Number of Estimators:", best_n_estimators)
print("Best Max Depth:", best_max_depth)

best_model = RandomForestClassifier(
    n_estimators=int(best_n_estimators),
    max_depth=int(best_max_depth)
)
best_model.fit(X, y)