The most surprising thing about autoencoder-based anomaly detection is that the model isn’t actually trained to detect anomalies; it’s trained to reconstruct normal data.
Let’s see this in action. Imagine we have a stream of sensor readings from a healthy machine. We feed this data into an autoencoder. The autoencoder’s job is to learn a compressed representation (the "encoding") of this normal data and then reconstruct it back to its original form (the "decoding").
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
# Assume X_train is a numpy array of shape (num_samples, num_features)
# containing only normal data.
# Example placeholder (real data would have structure the model can learn):
num_samples = 1000
num_features = 10
X_train = np.random.rand(num_samples, num_features)
# Define the autoencoder model
class Autoencoder(keras.Model):
    def __init__(self, latent_dim):
        super().__init__()
        self.latent_dim = latent_dim
        self.encoder = tf.keras.Sequential([
            layers.Input(shape=(num_features,)),
            layers.Dense(64, activation='relu'),
            layers.Dense(32, activation='relu'),
            layers.Dense(latent_dim, activation='relu')
        ])
        self.decoder = tf.keras.Sequential([
            layers.Input(shape=(latent_dim,)),
            layers.Dense(32, activation='relu'),
            layers.Dense(64, activation='relu'),
            layers.Dense(num_features, activation='sigmoid')  # Sigmoid assumes inputs scaled to [0, 1]
        ])

    def call(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded
# Instantiate and compile the model
latent_dim = 4
autoencoder = Autoencoder(latent_dim)
autoencoder.compile(optimizer='adam', loss='mse') # Mean Squared Error is common for reconstruction
# Train the model
# We use X_train as both input and target because we want to reconstruct the input
history = autoencoder.fit(X_train, X_train,
                          epochs=50,
                          batch_size=32,
                          shuffle=True,
                          validation_split=0.2)  # Hold out a portion for validation
# Now, let's simulate predicting on new data
# Simulate some normal data and some anomalous data
X_normal_test = np.random.rand(10, num_features)
X_anomaly_test = X_train[0:5] + np.random.normal(0, 0.5, (5, num_features))  # Perturb normal samples to simulate anomalies
# Predict reconstructions
reconstructions_normal = autoencoder.predict(X_normal_test)
reconstructions_anomaly = autoencoder.predict(X_anomaly_test)
# Calculate reconstruction error (e.g., Mean Squared Error)
mse_normal = np.mean(np.power(X_normal_test - reconstructions_normal, 2), axis=1)
mse_anomaly = np.mean(np.power(X_anomaly_test - reconstructions_anomaly, 2), axis=1)
print("Reconstruction errors for normal data:", mse_normal)
print("Reconstruction errors for anomalous data:", mse_anomaly)
# We would then set a threshold on these errors.
# A common approach is to use the errors on the validation set to determine this threshold.
# For example, if 95% of validation errors are below 0.01, we might set our anomaly
# threshold to 0.01. Any new data point with an error above this is flagged as an anomaly.
The core problem this solves is identifying deviations from a learned "normal" pattern without needing pre-labeled anomaly data. Traditional supervised learning requires examples of what constitutes an anomaly, which is often scarce or impossible to define exhaustively. Autoencoders bypass this by learning the structure of the majority class (normal data) and flagging anything that doesn’t fit that learned structure.
Internally, the autoencoder consists of two main parts: the encoder and the decoder. The encoder maps the input data to a lower-dimensional latent space. This latent space captures the essential features of the data. The decoder then tries to reconstruct the original input from this compressed latent representation. During training, the goal is to minimize the difference between the original input and the reconstructed output. This difference is often measured using a loss function like Mean Squared Error (MSE).
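To make that data flow concrete, here is a minimal numpy sketch of the same encode/decode path. The weights are random placeholders (an untrained model), so the point is only the bottleneck shapes, not the reconstruction quality:

```python
import numpy as np

rng = np.random.default_rng(0)
num_features, latent_dim = 10, 4

x = rng.random((1, num_features))  # one input sample
W_enc = rng.standard_normal((num_features, latent_dim))
W_dec = rng.standard_normal((latent_dim, num_features))

z = np.maximum(x @ W_enc, 0.0)  # "encoder": ReLU projection into the latent space
x_hat = z @ W_dec               # "decoder": map the latent code back to 10 features

print(z.shape, x_hat.shape)  # (1, 4) (1, 10)
```

Everything the decoder produces must pass through the 4-dimensional code `z`, which is exactly the compression the training objective exploits.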
The "levers" you control are primarily in the model architecture and training parameters. The latent_dim dictates the degree of compression; a smaller latent_dim forces the model to learn more salient features but risks losing too much information, while a larger latent_dim might not compress enough to be useful for anomaly detection. The number of layers and units in the encoder and decoder, activation functions (e.g., relu, sigmoid), the optimizer (e.g., adam, sgd), and the choice of loss function (mse, binary_crossentropy if data is binary) are all crucial. The epochs and batch_size during training also significantly impact convergence and the model’s ability to generalize.
When you train an autoencoder, you’re essentially teaching it the boundaries of what "normal" looks like by forcing it to compress and decompress this normal data with minimal error. Anomalies, by definition, lie outside these learned boundaries. When an anomalous data point is fed through the trained autoencoder, the encoder will struggle to find a meaningful representation in the latent space, and the decoder will fail to reconstruct it accurately. This results in a high reconstruction error. The key is that the model is optimized for reconstruction accuracy on the training data distribution. Any significant deviation from this distribution leads to a poor reconstruction.
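This intuition can be demonstrated without a neural network at all. A linear autoencoder is closely related to PCA, so the following numpy sketch (synthetic data, with PCA standing in for a trained model) shows that points off the learned "normal" subspace reconstruct poorly:

```python
import numpy as np

rng = np.random.default_rng(42)

# "Normal" data lies on a 2-dimensional subspace of a 10-dimensional space.
basis = rng.standard_normal((2, 10))
X_normal = rng.standard_normal((500, 2)) @ basis

# Learn that subspace from normal data only; the top principal directions
# play the role of the trained encoder/decoder weights.
_, _, Vt = np.linalg.svd(X_normal, full_matrices=False)
components = Vt[:2]  # top-2 principal directions, shape (2, 10)

def reconstruction_error(X):
    X_hat = (X @ components.T) @ components  # encode, then decode
    return np.mean((X - X_hat) ** 2, axis=1)

x_ok = rng.standard_normal((1, 2)) @ basis  # consistent with the normal pattern
x_bad = rng.standard_normal((1, 10))        # off the subspace: an "anomaly"

err_ok = reconstruction_error(x_ok)[0]
err_bad = reconstruction_error(x_bad)[0]
print(err_ok, err_bad)  # err_ok is ~0; err_bad is much larger
```

The model never saw an anomaly during "training", yet the anomaly is separable purely by reconstruction error, which is the whole trick.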
The threshold for anomaly detection is not learned by the model directly. It’s a hyperparameter that you set after training, typically by examining the reconstruction errors on a validation set composed of normal data. You might choose a threshold such that 99% of your normal validation data falls below it.
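A minimal numpy sketch of that thresholding step, using synthetic stand-in values for the validation errors (the names and the 99th-percentile cutoff are illustrative choices, not the only option):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the MSE values computed on a normal-only validation set.
val_errors = rng.gamma(shape=2.0, scale=0.005, size=1000)

# Flag anything above the 99th percentile of normal validation error.
threshold = np.percentile(val_errors, 99)

new_errors = np.array([0.004, 0.008, 0.15])  # errors for three new points
is_anomaly = new_errors > threshold
print(threshold, is_anomaly)
```

In practice you would tune the percentile against the false-positive rate you can tolerate: a lower cutoff catches more anomalies but flags more normal points.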
A sigmoid activation is commonly used in the final decoder layer when the input data is normalized to [0, 1]: it squashes the outputs into that same range, so the reconstruction error compares like with like. If your data isn't strictly in [0, 1], use a linear output activation (and rescale the inputs) or adjust your loss function accordingly.
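For unbounded data, one common option (sketched here with numpy; a library scaler such as scikit-learn's StandardScaler does the same job) is to standardize each feature and pair a linear output layer with MSE loss:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(loc=50.0, scale=12.0, size=(200, 10))  # raw, unbounded sensor-like data

# Standardize each feature to zero mean and unit variance.
mean, std = X.mean(axis=0), X.std(axis=0)
X_scaled = (X - mean) / std

# With standardized inputs, the decoder's final layer would use a linear
# activation, and MSE remains a sensible reconstruction loss.
print(X_scaled.mean(), X_scaled.std())
```

Remember to apply the same mean and std (fitted on training data) to any new data before scoring it, or the reconstruction errors will not be comparable.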
The next step after successfully implementing autoencoder anomaly detection is often exploring more expressive architectures such as Variational Autoencoders (VAEs), or GAN-based approaches to anomaly detection.