Dark launching a feature means deploying its code to production behind a feature flag while keeping it invisible and inactive for all users.

Consider this a production canary for your code, not your users. The system is running the new code path alongside the old, but all requests are still hitting the old, stable code. This lets you test performance, stability, and even basic functionality in the real production environment without any risk of impacting your users. Think of it as a dress rehearsal in the actual theater, but with the audience still seeing the old play.

Here’s a typical dark launch setup in action. Let’s say we’re rolling out a new recommendation engine.

```yaml
# feature_flags.yaml
features:
  new_recommendation_engine:
    description: "New ML-based recommendation engine"
    rollout:
      type: "all" # Initially, all traffic goes to the old engine
    variants:
      - name: "control"
        percentage: 100 # 100% of traffic is directed to the control (old) path
        enabled: true
      - name: "treatment"
        percentage: 0 # 0% of traffic is directed to the treatment (new) path
        enabled: true # The new code is deployed, but not being hit
```

In our application code, we’d check the flag like this:

```python
# app.py
from feature_flags import FeatureFlagClient

flag_client = FeatureFlagClient()

def get_recommendations(user_id):
    if flag_client.is_enabled("new_recommendation_engine", user_id=user_id, variant="treatment"):
        # This code path is deployed but won't be executed in a dark launch
        return call_new_recommendation_service(user_id)
    else:
        # This is the current, stable recommendation service
        return call_old_recommendation_service(user_id)
```

During a dark launch, flag_client.is_enabled(...) for the "treatment" variant always returns False because the percentage is set to 0, yet the code behind call_new_recommendation_service is already deployed. This lets us watch the new service's baseline resource consumption (CPU, memory, network) and, by mirroring a copy of production requests to it, its error rates and latency, all without affecting a single user response. We can then compare these metrics against the baseline of the old code.
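One common way to exercise a dark-launched path is shadow traffic: serve the user from the stable path while asynchronously mirroring the request to the new one and discarding its result. A minimal sketch of that pattern, with stand-in service functions (not part of the example above):

```python
from concurrent.futures import ThreadPoolExecutor

_shadow_pool = ThreadPoolExecutor(max_workers=4)

# Stand-ins for the real service calls, for illustration only
def call_old_recommendation_service(user_id):
    return ["item-1", "item-2"]

def call_new_recommendation_service(user_id):
    return ["item-9", "item-8"]

def _shadow(user_id):
    try:
        call_new_recommendation_service(user_id)  # result is discarded
    except Exception:
        pass  # failures surface in metrics, never in the user's response

def get_recommendations_with_shadow(user_id):
    # The user is always served by the stable path
    result = call_old_recommendation_service(user_id)
    # Mirror the request to the new path in the background
    _shadow_pool.submit(_shadow, user_id)
    return result
```

The user-facing result never depends on the shadow call, so a crash or timeout in the new engine shows up only in its metrics.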

The core problem this solves is the "production paradox": you can’t truly test new code until it’s in production, but you can’t put new code in production until it’s tested. Dark launching breaks this cycle by allowing production testing with zero user-facing risk. It bridges the gap between your staging environment and a full user rollout.

Internally, the feature flag system evaluates the flag configuration for a given request or user. When the "treatment" variant has percentage: 0, the system simply doesn’t return True for that variant’s is_enabled check, effectively short-circuiting the code path for all users. The key here is that the code for the treatment path is already running on the production servers, loaded into memory, and ready to accept requests. We’re just not directing any traffic to it.
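A typical percentage evaluation hashes the flag name and user ID into a stable bucket and compares it to the configured percentage; at percentage 0 no bucket ever qualifies, so the check short-circuits for everyone. A simplified sketch of that logic (not the actual client from the examples above):

```python
import hashlib

def bucket(flag_name: str, user_id: str) -> int:
    """Map a user/flag pair to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100

def is_treatment_enabled(flag_name: str, user_id: str, percentage: int) -> bool:
    # At percentage 0, no bucket can be below the threshold:
    # the treatment path is unreachable for every user
    return bucket(flag_name, user_id) < percentage
```

Because the hash is deterministic, a given user always lands in the same bucket, so raising the percentage later admits a consistent cohort rather than flickering users in and out of the treatment.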

The levers you control are the rollout strategy and the variants. For a dark launch, you set rollout.type: "all" (or it defaults to this) and ensure your "treatment" variant has percentage: 0. The enabled: true on the treatment variant is crucial; it means the code is available to be served, even if no traffic is currently being sent.

Once you’re confident the dark launch is performing as expected (metrics look good, no unexpected resource spikes), you’d transition to progressive delivery. This is where you’d start shifting a small percentage of actual user traffic to the new feature. You might update the flag configuration like this:

```yaml
# feature_flags.yaml
features:
  new_recommendation_engine:
    description: "New ML-based recommendation engine"
    rollout:
      type: "percentage" # Now we're rolling out by percentage
    variants:
      - name: "control"
        percentage: 99 # 99% of traffic still goes to the old engine
        enabled: true
      - name: "treatment"
        percentage: 1 # 1% of traffic is now directed to the new engine
        enabled: true
```

And in your application:

```python
# app.py
from feature_flags import FeatureFlagClient

flag_client = FeatureFlagClient()

def get_recommendations(user_id):
    # The flag client now returns True for the treatment variant for 1% of users
    if flag_client.is_enabled("new_recommendation_engine", user_id=user_id, variant="treatment"):
        return call_new_recommendation_service(user_id)
    else:
        return call_old_recommendation_service(user_id)
```

This gradual shift allows you to monitor real user impact. If the 1% rollout shows any issues (increased error rates, negative user feedback, performance degradation), you can immediately roll back by setting the "treatment" variant percentage back to 0 and the "control" to 100, all without redeploying code.
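The rollback decision itself can be automated by comparing the treatment cohort's error rate against the control baseline. A hypothetical guardrail check (the function name and threshold are illustrative, not from any particular flag system):

```python
def should_roll_back(treatment_error_rate: float,
                     control_error_rate: float,
                     max_regression: float = 0.01) -> bool:
    """Return True if the treatment's error rate exceeds the
    control baseline by more than the allowed regression."""
    return treatment_error_rate - control_error_rate > max_regression
```

When this trips, the remediation is a config change (treatment back to 0, control back to 100), not a redeploy.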

The one thing most people don’t realize is that deployment and activation are two separate events. Even with a variant at percentage: 0, the treatment code is loaded into memory on every production server, its initialization has run, and it is consuming baseline resources; the feature flag system is just a router that never sends a user’s request down that path. It’s not about whether the code is deployed; it’s about whether a request is allowed to reach it under the flag’s current configuration. This distinction is why you can monitor resource usage and startup errors for a dark-launched feature: the code is there and running, just not serving end-user requests.
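In Python, for instance, a module's top-level initialization runs at import time, so a treatment module's setup cost is paid at deploy, long before any request is routed to it. A hypothetical treatment module illustrating this (the model contents are placeholders):

```python
# new_engine.py — hypothetical treatment module
import time

# Module-level setup runs when the app imports this module at deploy time,
# regardless of the flag's percentage
_start = time.monotonic()
MODEL = {"bias": 0.1, "weights": [0.3, 0.7]}  # stand-in for a real model load
LOAD_SECONDS = time.monotonic() - _start

def recommend(user_id):
    # Only ever invoked once the flag routes traffic here
    return [f"item-for-{user_id}"]
```

Memory for MODEL and any import-time errors are observable in production the moment the deploy lands, even at percentage: 0.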

The next logical step after a successful progressive delivery is to fully enable the new feature for all users and then, eventually, remove the old code.
