Unsupervised Transfer Learning#

Deep learning models are powerful tools for uncovering insights in data and driving value; however, the cost of data labeling and computation often creates a barrier to entry. Enter: transfer learning. At its core, transfer learning is the process of taking the “knowledge” (features, patterns, or weights) a model has acquired from one task and applying it to a different but related task. By leveraging a pre-trained foundation, users can significantly reduce the time and resources required to reach high performance.

For unsupervised transfer learning, instead of relying on labeled datasets to guide the adaptation, the model extracts underlying structures from a source domain to interpret unlabeled data in a new domain. This allows organizations to adapt a model to a new, target task, even when high-quality labels are scarce or non-existent.

This tutorial focuses on implementing unsupervised transfer learning techniques. The starter model for this exercise is a convolutional autoencoder (CAE) trained on grayscale handwritten zeroes from the MNIST dataset to identify any non-zero handwritten digits as anomalous. In this example, we will walk through the steps required to teach the starter model to identify colorful, non-zero handwritten digits from the MNIST-M dataset as anomalous. Because the model learns patterns from an unlabeled dataset by reconstructing its own inputs, this exercise is a form of self-supervised learning (SSL).

Prerequisites#

  • Dataiku >= 12.0

  • Python >= 3.11

  • A code environment with the following packages:

    transformers
    tokenizers
    datasets
    tensorflow
    torch
    pillow
    peft
    mlflow==2.22.1
    tf-keras
    
  • Expected initial state:
    • Basic familiarity with neural networks and autoencoders

    • A pre-trained image anomaly detection model

Unsupervised Transfer Learning for Image Anomaly Detection#

Importing the required packages#

First, import all of the necessary packages, as shown in the code below:

Code 1 – Import the needed packages#
# For interacting with the Dataiku API + loading data
import dataiku
from dataiku import pandasutils as pdu
import pandas as pd
import numpy as np
# For training a convolutional autoencoder
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Conv2DTranspose, Reshape, Flatten, Dense
from tensorflow.keras.models import Model, model_from_json
from tensorflow.keras import backend as K
# For converting an image into a numpy array
from PIL import Image
import io
import tempfile

Loading the saved model and target transfer dataset#

The model loaded for use in this tutorial is a convolutional autoencoder (CAE). CAEs are a type of unsupervised neural network often used for image anomaly detection. These networks compress images into a lower-dimensional representation via convolutional layers (the encoder), then reconstruct the image from that representation (the decoder). For image anomaly detection, the reconstruction error is calculated by comparing the input image to its reconstruction; if the error is above a defined threshold, the input image is considered anomalous.
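
This scoring logic is worth seeing in miniature before working with the full model. Below is a minimal sketch of reconstruction-error thresholding; the helper function and its arrays are hypothetical, for illustration only:

# Sketch of reconstruction-error scoring (hypothetical helper, not part of the tutorial flow)
import numpy as np

def is_anomalous(images, reconstructions, threshold):
    # Mean squared error per image, averaged over height, width, and channels
    mse = np.mean(np.square(images - reconstructions), axis=(1, 2, 3))
    # An image is flagged as anomalous when its error exceeds the threshold
    return mse > threshold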

The architecture and weights of the source model used in this tutorial were stored in a managed folder in the flow. The original model can be reconstructed from these two artifacts.

This tutorial can be executed within a Jupyter notebook or as a code recipe in Dataiku. If a code recipe is used, specify the folder containing the source model and the target dataset as inputs, and the folder that will store the adapted model as the output.

Note

If the starter model is not already in memory, make sure to load its saved weights into the reconstructed model, as shown below.

Code 2 – Loading the saved model#
# Load model stored in Dataiku
folder = dataiku.Folder('1bXPjNsj')

with folder.get_download_stream('/base_model_artifacts/model_architecture.json') as f:
    json_config = f.read().decode('utf-8')

autoencoder = model_from_json(json_config)

# load_weights needs a local file path, so buffer the weight stream to a temporary file
with folder.get_download_stream('/base_model_artifacts/cae_mnist_base_model.weights.h5') as f_stream:
    with tempfile.NamedTemporaryFile(suffix='.h5', delete=False) as tmp_file:
        tmp_file.write(f_stream.read())
        temp_file_path = tmp_file.name

autoencoder.load_weights(temp_file_path, by_name=True)

# Load MNIST-M data from HuggingFace
splits_mnist_m = {'train': 'data/train-00000-of-00001-571b6b1e2c195186.parquet',
                  'test': 'data/test-00000-of-00001-ba3ad971b105ff65.parquet'}
mnist_m_train = pd.read_parquet("hf://datasets/Mike0307/MNIST-M/" + splits_mnist_m["train"])
mnistm_test = pd.read_parquet("hf://datasets/Mike0307/MNIST-M/" + splits_mnist_m["test"])

Preparing target dataset for transfer learning#

The MNIST-M dataset consists of 60,000 colorful versions of the original MNIST digits. The first step is to convert the images into numpy representations with the same dimensions as the original MNIST images used to train the starter model. Next, we will filter for just the zero images to form our transfer target dataset, and prepare a mixed-digit evaluation dataset in which the majority class is zero and all other digits are anomalous.

Code 3 – Preparing the target dataset#
def bytes_to_numpy(img_bytes):
    """Converts a PNG string back to a normalized (28, 28, x) numpy array."""
    try:
        # Open image from bytes using PIL
        img = Image.open(io.BytesIO(img_bytes))

        # Convert to numpy array and normalize
        img_np_array = np.array(img, dtype=np.float32) / 255.0

        if (img_np_array.shape[0] == 28) and (img_np_array.shape[1] == 28):
            return img_np_array
        else:
            # Center-crop a 28 x 28 region (assumes a 32 x 32 source image)
            cropped_np_array = img_np_array[2:30, 2:30, :]
            return cropped_np_array
    except Exception as e:
        print(f"Error decoding base64 image: {e}")
        return None


# Prepare transfer target training dataset
mnistm_train_filtered = mnist_m_train[mnist_m_train['label'] == 0].copy()  # copy avoids SettingWithCopyWarning below
mnistm_train_filtered['image_np'] = mnistm_train_filtered['image'].apply(
    lambda x: bytes_to_numpy(x['bytes'])
)

img_np_list = [x for x in mnistm_train_filtered['image_np'] if x is not None]
x_train_mnistm = np.array(img_np_list)
y_train_mnistm = np.array(mnistm_train_filtered['label'])

# Prepare evaluation dataset
mnistm_test_filtered = pd.concat([
    mnistm_test[mnistm_test['label'] == 0],
    mnistm_test[mnistm_test['label'] != 0].sample(110, random_state=42)
])
mnistm_test_filtered['image_np'] = mnistm_test_filtered['image'].apply(
    lambda x: bytes_to_numpy(x['bytes'])
)
mnistm_test_img_list = [x for x in mnistm_test_filtered['image_np'] if x is not None]
x_test_mnistm = np.array(mnistm_test_img_list)
y_test_mnistm = np.array(mnistm_test_filtered['label'])

Updating the input layer to reflect color channels#

Since the starter model was trained on black and white images, the input layer currently expects image representations with the dimensions 28 x 28 x 1, where 28 x 28 is the pixel size of the image and 1 is the number of color channels. Grayscale images have only one channel, for intensity values, because every pixel is a shade of gray between black and white. Color images, on the other hand, require three channels, one for each of the primary colors red, green, and blue.
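
To see the mismatch concretely, we can compare the loaded model's expected input shape with the shape of the MNIST-M arrays prepared earlier. The shapes in the comments are what we expect, assuming the starter model was built as described above:

# Sanity check: the source model expects single-channel input,
# while the MNIST-M arrays carry three color channels
print(autoencoder.input_shape)   # expected: (None, 28, 28, 1)
print(x_train_mnistm.shape[1:])  # expected: (28, 28, 3)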

For our updated model, we will replace the original input layer with one that reflects the new input dimensions, and initialize a new convolutional layer to act as the input adapter.

Code 4 – Updating input layer#
# Update input layer
TRANSFER_SHAPE = (28, 28, 3)
transfer_input = Input(shape=TRANSFER_SHAPE, name='transfer_input_28x28_3ch')
# Input adapter layer
x = Conv2D(32, (3, 3), activation='relu', padding='same', name='transfer_conv1')(transfer_input)

Loading original encoder and decoder layers#

In transfer learning, most of a pre-trained model's layers are kept frozen and only a subset is adjusted to the new data. Here, we will re-use the original encoder and bottleneck layers from the starter model and mark them as not trainable.

Code 5 – Loading the encoder layers#
original_encoder = autoencoder.get_layer('encoder')
# Freeze intermediate layers
# Layer 1: pool1
x = original_encoder.get_layer('pool1')(x)
original_encoder.get_layer('pool1').trainable = False

# Layer 2: conv2
x = original_encoder.get_layer('conv2')(x)
original_encoder.get_layer('conv2').trainable = False

# Layer 3: pool2
x = original_encoder.get_layer('pool2')(x)
original_encoder.get_layer('pool2').trainable = False

# Bottleneck Transition Layers (Flatten and Dense)
flat = original_encoder.get_layer('flatten')(x)
original_encoder.get_layer('flatten').trainable = False

latent = original_encoder.get_layer('latent_bottleneck')(flat)
original_encoder.get_layer('latent_bottleneck').trainable = False

# Consolidated transfer encoder
transfer_encoder = Model(transfer_input, latent, name='transfer_encoder')
# New decoder input layer
transfer_latent_input = Input(shape=(32,), name='transfer_latent_input')
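
Before moving on, it can be useful to confirm that only the new input adapter will actually train. A quick check (layer names depend on how the source model was defined):

# Verify trainability: only the new input adapter layer should report True
for layer in transfer_encoder.layers:
    print(f"{layer.name}: trainable={layer.trainable}")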

We will similarly load the original decoder layers but expect the weights in these layers to be adjusted during the transfer learning process.

Code 6 – Loading the decoder layers#
original_decoder = autoencoder.get_layer('decoder')
# Keep original decoder initial layers
x = original_decoder.get_layer('decoder_dense_start')(transfer_latent_input)
x = original_decoder.get_layer('decoder_reshape')(x)
# Add original decoder layers; their weights will be adjusted during the transfer learning process
x = original_decoder.get_layer('deconv1')(x)
x = original_decoder.get_layer('up_sampling2d')(x)
x = original_decoder.get_layer('deconv2')(x)
x = original_decoder.get_layer('up_sampling2d_1')(x)
# New final output layer to reflect 3 channels, trainable
transfer_decoded = Conv2D(3, (3, 3), activation='sigmoid', padding='same', name='output_28x28_3ch_final')(x)
transfer_decoder = Model(transfer_latent_input, transfer_decoded, name='transfer_decoder')

Transfer learning#

Consolidate all the layers into an updated CAE model, and train the model on the target dataset.

Code 7 – Transfer learning#
# Consolidate transfer input, encoder, and decoder layers
transfer_autoencoder = Model(transfer_input, transfer_decoder(transfer_encoder(transfer_input)),
                             name='CAE_TRANSFER_MNISTM')
# Fit model to target MNIST-M data
transfer_autoencoder.compile(optimizer='adam', loss='mse')
transfer_autoencoder.fit(
    x_train_mnistm,
    x_train_mnistm,
    epochs=5,
    batch_size=32,
    shuffle=True,
    validation_split=0.1,
    verbose=0
)
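
If you are running this as a code recipe, you will likely want to persist the adapted model to the output managed folder mentioned earlier. Below is a minimal sketch; the folder ID and artifact paths are hypothetical placeholders:

# Persist the adapted model's architecture and weights (folder ID and paths are placeholders)
output_folder = dataiku.Folder('OUTPUT_FOLDER_ID')

output_folder.upload_data('/transfer_model_artifacts/model_architecture.json',
                          transfer_autoencoder.to_json().encode('utf-8'))

with tempfile.NamedTemporaryFile(suffix='.weights.h5', delete=False) as tmp_out:
    transfer_autoencoder.save_weights(tmp_out.name)
output_folder.upload_file('/transfer_model_artifacts/cae_mnistm_transfer.weights.h5', tmp_out.name)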

Performance assessment#

First, determine the threshold for anomalousness by calculating the 95th percentile of the mean squared error (MSE) between the target training images and their reconstructions.

Code 8 – Determining the threshold#
reconstructions_mnistm = transfer_autoencoder.predict(x_train_mnistm, verbose=0)
mse_mnistm = np.mean(np.power(x_train_mnistm - reconstructions_mnistm, 2), axis=(1, 2, 3))
anomaly_threshold_transfer_95th = np.percentile(mse_mnistm, 95)
# Threshold = 0.058

Next, make predictions for the normal evaluation data (zeroes) and the abnormal evaluation data (non-zeroes), and measure the share of images in each category that the model classifies correctly against the threshold.

Code 9 – Making predictions#
mnistm_normal_test_errors = np.mean(np.power(
    x_test_mnistm[y_test_mnistm == 0] - transfer_autoencoder.predict(x_test_mnistm[y_test_mnistm == 0], verbose=0), 2),
    axis=(1, 2, 3))

mnistm_anomaly_test_errors = np.mean(np.power(
    x_test_mnistm[y_test_mnistm != 0] - transfer_autoencoder.predict(x_test_mnistm[y_test_mnistm != 0], verbose=0), 2),
    axis=(1, 2, 3))

# Share of normal images correctly identified as normal (error at or below threshold)
mnistm_normal_test_errors[mnistm_normal_test_errors <= anomaly_threshold_transfer_95th].shape[0] / \
mnistm_normal_test_errors.shape[0]

# Share of anomalous images correctly flagged as anomalous (error above threshold)
mnistm_anomaly_test_errors[mnistm_anomaly_test_errors > anomaly_threshold_transfer_95th].shape[0] / \
mnistm_anomaly_test_errors.shape[0]

Results:

  • 84.5% of the colorful, abnormal evaluation images were correctly identified as abnormal

  • 93.5% of the colorful, normal evaluation images were correctly identified as normal

Performance comparison to source model on MNIST-M images#

The architecture of the source model makes a direct comparison impossible without adjustments to the model itself. However, by converting the evaluation dataset to grayscale, we can still approximate the impact that non-uniform background color and texture have on the model’s performance.

Code 10 – Performance comparison#
# Convert evaluation dataset images to grayscale
grayscale_weights = np.array([0.2989, 0.5870, 0.1140], dtype=np.float32)
grayscale_x_test_mnistm = np.dot(x_test_mnistm, grayscale_weights)
grayscale_x_test_mnistm = np.expand_dims(grayscale_x_test_mnistm, axis=-1)

# Anomaly threshold (95th percentile) computed for the original model on its own training data
anomaly_threshold_base_95th = 0.01845815759152173

# Measure original CAE accuracy
baseline_mnistm_normal_test_errors = np.mean(np.power(
    grayscale_x_test_mnistm[y_test_mnistm == 0] - autoencoder.predict(grayscale_x_test_mnistm[y_test_mnistm == 0],
                                                                      verbose=0), 2), axis=(1, 2, 3))

baseline_mnistm_anomaly_test_errors = np.mean(np.power(
    grayscale_x_test_mnistm[y_test_mnistm != 0] - autoencoder.predict(grayscale_x_test_mnistm[y_test_mnistm != 0],
                                                                      verbose=0), 2), axis=(1, 2, 3))

# Share of normal images correctly identified as normal (error at or below threshold)
baseline_mnistm_normal_test_errors[baseline_mnistm_normal_test_errors <= anomaly_threshold_base_95th].shape[0] / \
baseline_mnistm_normal_test_errors.shape[0]

# Share of anomalous images correctly flagged as anomalous (error above threshold)
baseline_mnistm_anomaly_test_errors[baseline_mnistm_anomaly_test_errors > anomaly_threshold_base_95th].shape[0] / \
baseline_mnistm_anomaly_test_errors.shape[0]

The original CAE over-predicted the anomaly rate by a factor of seven, even though abnormal images make up only ~10% of the evaluation set. The CAE fails because its convolutional layers were trained on images with a uniformly black background, whereas the grayscale MNIST-M images have non-uniform texture and noise in their backgrounds. This gap between pre- and post-transfer performance underscores the importance of transfer learning techniques in adapting existing models to new domains.

Wrapping Up#

In this tutorial, we learned how to perform unsupervised transfer learning to adapt an existing model to a new domain. We now have a better understanding of the benefits of unsupervised transfer learning, how to adjust the layers of a neural network to new input types, and which layers to tune. The image anomaly detection exercise showed that the original CAE was highly transferable: by training only some of the model’s parameters, we were able to adapt it to an entirely new domain without any labeled data.

To learn more about transfer learning, we recommend this literature review and this textbook chapter.

Here is the complete code of this tutorial:

Complete code
# For interacting with the Dataiku API + loading data
import dataiku
from dataiku import pandasutils as pdu
import pandas as pd
import numpy as np
# For training a convolutional autoencoder
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Conv2DTranspose, Reshape, Flatten, Dense
from tensorflow.keras.models import Model, model_from_json
from tensorflow.keras import backend as K
# For converting an image into a numpy array
from PIL import Image
import io
import tempfile

# Load model stored in Dataiku
folder = dataiku.Folder('1bXPjNsj')

with folder.get_download_stream('/base_model_artifacts/model_architecture.json') as f:
    json_config = f.read().decode('utf-8')

autoencoder = model_from_json(json_config)

# load_weights needs a local file path, so buffer the weight stream to a temporary file
with folder.get_download_stream('/base_model_artifacts/cae_mnist_base_model.weights.h5') as f_stream:
    with tempfile.NamedTemporaryFile(suffix='.h5', delete=False) as tmp_file:
        tmp_file.write(f_stream.read())
        temp_file_path = tmp_file.name

autoencoder.load_weights(temp_file_path, by_name=True)

# Load MNIST-M data from HuggingFace
splits_mnist_m = {'train': 'data/train-00000-of-00001-571b6b1e2c195186.parquet',
                  'test': 'data/test-00000-of-00001-ba3ad971b105ff65.parquet'}
mnist_m_train = pd.read_parquet("hf://datasets/Mike0307/MNIST-M/" + splits_mnist_m["train"])
mnistm_test = pd.read_parquet("hf://datasets/Mike0307/MNIST-M/" + splits_mnist_m["test"])


def bytes_to_numpy(img_bytes):
    """Converts a PNG string back to a normalized (28, 28, x) numpy array."""
    try:
        # Open image from bytes using PIL
        img = Image.open(io.BytesIO(img_bytes))

        # Convert to numpy array and normalize
        img_np_array = np.array(img, dtype=np.float32) / 255.0

        if (img_np_array.shape[0] == 28) and (img_np_array.shape[1] == 28):
            return img_np_array
        else:
            # Center-crop a 28 x 28 region (assumes a 32 x 32 source image)
            cropped_np_array = img_np_array[2:30, 2:30, :]
            return cropped_np_array
    except Exception as e:
        print(f"Error decoding base64 image: {e}")
        return None


# Prepare transfer target training dataset
mnistm_train_filtered = mnist_m_train[mnist_m_train['label'] == 0].copy()  # copy avoids SettingWithCopyWarning below
mnistm_train_filtered['image_np'] = mnistm_train_filtered['image'].apply(
    lambda x: bytes_to_numpy(x['bytes'])
)

img_np_list = [x for x in mnistm_train_filtered['image_np'] if x is not None]
x_train_mnistm = np.array(img_np_list)
y_train_mnistm = np.array(mnistm_train_filtered['label'])

# Prepare evaluation dataset
mnistm_test_filtered = pd.concat([
    mnistm_test[mnistm_test['label'] == 0],
    mnistm_test[mnistm_test['label'] != 0].sample(110, random_state=42)
])
mnistm_test_filtered['image_np'] = mnistm_test_filtered['image'].apply(
    lambda x: bytes_to_numpy(x['bytes'])
)
mnistm_test_img_list = [x for x in mnistm_test_filtered['image_np'] if x is not None]
x_test_mnistm = np.array(mnistm_test_img_list)
y_test_mnistm = np.array(mnistm_test_filtered['label'])

# Update input layer
TRANSFER_SHAPE = (28, 28, 3)
transfer_input = Input(shape=TRANSFER_SHAPE, name='transfer_input_28x28_3ch')
# Input adapter layer
x = Conv2D(32, (3, 3), activation='relu', padding='same', name='transfer_conv1')(transfer_input)

original_encoder = autoencoder.get_layer('encoder')
# Freeze intermediate layers
# Layer 1: pool1
x = original_encoder.get_layer('pool1')(x)
original_encoder.get_layer('pool1').trainable = False

# Layer 2: conv2
x = original_encoder.get_layer('conv2')(x)
original_encoder.get_layer('conv2').trainable = False

# Layer 3: pool2
x = original_encoder.get_layer('pool2')(x)
original_encoder.get_layer('pool2').trainable = False

# Bottleneck Transition Layers (Flatten and Dense)
flat = original_encoder.get_layer('flatten')(x)
original_encoder.get_layer('flatten').trainable = False

latent = original_encoder.get_layer('latent_bottleneck')(flat)
original_encoder.get_layer('latent_bottleneck').trainable = False

# Consolidated transfer encoder
transfer_encoder = Model(transfer_input, latent, name='transfer_encoder')
# New decoder input layer
transfer_latent_input = Input(shape=(32,), name='transfer_latent_input')

original_decoder = autoencoder.get_layer('decoder')
# Keep original decoder initial layers
x = original_decoder.get_layer('decoder_dense_start')(transfer_latent_input)
x = original_decoder.get_layer('decoder_reshape')(x)
# Add original decoder layers; their weights will be adjusted during the transfer learning process
x = original_decoder.get_layer('deconv1')(x)
x = original_decoder.get_layer('up_sampling2d')(x)
x = original_decoder.get_layer('deconv2')(x)
x = original_decoder.get_layer('up_sampling2d_1')(x)
# New final output layer to reflect 3 channels, trainable
transfer_decoded = Conv2D(3, (3, 3), activation='sigmoid', padding='same', name='output_28x28_3ch_final')(x)
transfer_decoder = Model(transfer_latent_input, transfer_decoded, name='transfer_decoder')

# Consolidate transfer input, encoder, and decoder layers
transfer_autoencoder = Model(transfer_input, transfer_decoder(transfer_encoder(transfer_input)),
                             name='CAE_TRANSFER_MNISTM')
# Fit model to target MNIST-M data
transfer_autoencoder.compile(optimizer='adam', loss='mse')
transfer_autoencoder.fit(
    x_train_mnistm,
    x_train_mnistm,
    epochs=5,
    batch_size=32,
    shuffle=True,
    validation_split=0.1,
    verbose=0
)

reconstructions_mnistm = transfer_autoencoder.predict(x_train_mnistm, verbose=0)
mse_mnistm = np.mean(np.power(x_train_mnistm - reconstructions_mnistm, 2), axis=(1, 2, 3))
anomaly_threshold_transfer_95th = np.percentile(mse_mnistm, 95)
# Threshold = 0.058

mnistm_normal_test_errors = np.mean(np.power(
    x_test_mnistm[y_test_mnistm == 0] - transfer_autoencoder.predict(x_test_mnistm[y_test_mnistm == 0], verbose=0), 2),
    axis=(1, 2, 3))

mnistm_anomaly_test_errors = np.mean(np.power(
    x_test_mnistm[y_test_mnistm != 0] - transfer_autoencoder.predict(x_test_mnistm[y_test_mnistm != 0], verbose=0), 2),
    axis=(1, 2, 3))

# Share of normal images correctly identified as normal (error at or below threshold)
mnistm_normal_test_errors[mnistm_normal_test_errors <= anomaly_threshold_transfer_95th].shape[0] / \
mnistm_normal_test_errors.shape[0]

# Share of anomalous images correctly flagged as anomalous (error above threshold)
mnistm_anomaly_test_errors[mnistm_anomaly_test_errors > anomaly_threshold_transfer_95th].shape[0] / \
mnistm_anomaly_test_errors.shape[0]

# Convert evaluation dataset images to grayscale
grayscale_weights = np.array([0.2989, 0.5870, 0.1140], dtype=np.float32)
grayscale_x_test_mnistm = np.dot(x_test_mnistm, grayscale_weights)
grayscale_x_test_mnistm = np.expand_dims(grayscale_x_test_mnistm, axis=-1)

# Anomaly threshold (95th percentile) computed for the original model on its own training data
anomaly_threshold_base_95th = 0.01845815759152173

# Measure original CAE accuracy
baseline_mnistm_normal_test_errors = np.mean(np.power(
    grayscale_x_test_mnistm[y_test_mnistm == 0] - autoencoder.predict(grayscale_x_test_mnistm[y_test_mnistm == 0],
                                                                      verbose=0), 2), axis=(1, 2, 3))

baseline_mnistm_anomaly_test_errors = np.mean(np.power(
    grayscale_x_test_mnistm[y_test_mnistm != 0] - autoencoder.predict(grayscale_x_test_mnistm[y_test_mnistm != 0],
                                                                      verbose=0), 2), axis=(1, 2, 3))

# Share of normal images correctly identified as normal (error at or below threshold)
baseline_mnistm_normal_test_errors[baseline_mnistm_normal_test_errors <= anomaly_threshold_base_95th].shape[0] / \
baseline_mnistm_normal_test_errors.shape[0]

# Share of anomalous images correctly flagged as anomalous (error above threshold)
baseline_mnistm_anomaly_test_errors[baseline_mnistm_anomaly_test_errors > anomaly_threshold_base_95th].shape[0] / \
baseline_mnistm_anomaly_test_errors.shape[0]