Choose where your MLflow data is stored

MLflow tracking servers store and manage your experiment data, runs, and models. Configure your tracking servers to control where your MLflow data is stored and how to access experiments across different environments.

Databricks-hosted tracking server

By default, Databricks provides a managed MLflow tracking server that:

  • Requires no additional setup or configuration
  • Stores experiment data in your workspace
  • Integrates seamlessly with Databricks notebooks and clusters

Set the active experiment

By default, all MLflow runs are logged to the workspace's tracking server under the active experiment. If no experiment is explicitly set, runs are logged to the notebook experiment.

Control where runs are logged in Databricks by setting the active experiment:

mlflow.set_experiment()

Sets the experiment for all subsequent runs in the current execution context.

import mlflow

mlflow.set_experiment("/Shared/my-experiment")

mlflow.start_run()

Sets the experiment for a single run.

with mlflow.start_run(experiment_id="12345"):
    mlflow.log_param("learning_rate", 0.01)

Environment variables

Set an experiment for all runs in the environment.

import os
os.environ["MLFLOW_EXPERIMENT_NAME"] = "/Shared/my-experiment"
# or
os.environ["MLFLOW_EXPERIMENT_ID"] = "12345"
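The three mechanisms above interact: an explicit `experiment_id` passed to `mlflow.start_run()` takes precedence over `mlflow.set_experiment()`, which in turn takes precedence over the environment variables. The following is a simplified sketch of that resolution order, not MLflow's actual implementation (function and return values are illustrative):

```python
import os


def resolve_experiment(run_experiment_id=None, active_experiment=None):
    """Illustrative resolution order for the active experiment.

    Simplified sketch of the precedence, highest first:
    1. experiment_id passed directly to mlflow.start_run()
    2. experiment set via mlflow.set_experiment()
    3. MLFLOW_EXPERIMENT_ID / MLFLOW_EXPERIMENT_NAME environment variables
    4. the notebook experiment (default)
    """
    if run_experiment_id is not None:
        return run_experiment_id
    if active_experiment is not None:
        return active_experiment
    env = os.environ.get("MLFLOW_EXPERIMENT_ID") or os.environ.get("MLFLOW_EXPERIMENT_NAME")
    if env:
        return env
    return "<notebook experiment>"


# The explicit run argument wins over everything else
print(resolve_experiment(run_experiment_id="12345", active_experiment="/Shared/my-experiment"))
```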

Set up tracking to a remote MLflow tracking server

You may need to connect to a remote MLflow tracking server: for example, when you develop locally but want to track against the Databricks-hosted server, or when you want to log to a tracking server in a different workspace.

Common scenarios for remote tracking:

| Scenario | Use case |
| --- | --- |
| Cross-workspace tracking | Centralized experiment tracking across multiple workspaces |
| Local development | Develop locally but track experiments in Databricks |
| Remote self-hosted | Custom MLflow infrastructure with specific compliance requirements |

Set up the tracking URI and experiment

To log experiments to a remote tracking server, configure both the tracking URI and experiment path:

import mlflow

# Set the tracking URI to the remote server
# (the databricks:// scheme takes a profile name from your Databricks CLI config)
mlflow.set_tracking_uri("databricks://remote-workspace-profile")

# Set the experiment path in the remote server
mlflow.set_experiment("/Shared/centralized-experiments/my-project")

# All subsequent runs will be logged to the remote server
with mlflow.start_run():
    mlflow.log_param("model_type", "random_forest")
    mlflow.log_metric("accuracy", 0.95)
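In the `databricks://<profile>` form, everything after the scheme names a profile in your Databricks CLI configuration file (`~/.databrickscfg`), which is where MLflow looks up the host and credentials. A quick sketch of how such a URI decomposes, using only the standard library (MLflow performs this parsing internally; the helper name here is illustrative):

```python
from urllib.parse import urlparse


def split_tracking_uri(uri):
    """Split a tracking URI into its scheme and the profile it names."""
    parsed = urlparse(uri)
    return parsed.scheme, parsed.netloc


scheme, profile = split_tracking_uri("databricks://my-remote-profile")
print(scheme, profile)  # databricks my-remote-profile
```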

Authentication methods

Remote tracking server connections require authentication. Choose between personal access tokens (PATs) and OAuth with a service principal.

PAT

Use PATs for simple token-based authentication.

Pros: Simple setup, good for development

Cons: User-specific, requires manual token management

import os
import mlflow

# Point at the remote workspace and supply the token
os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"

# Configure remote tracking (credentials are read from the environment)
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/remote-experiment")
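Hardcoding a token in source code is risky; in practice the token should come from your shell, a secret manager, or CI configuration. One common pattern is to fail fast when the expected variables are missing, before configuring MLflow. A minimal sketch (the helper name and placeholder values are illustrative):

```python
import os


def require_env(*names):
    """Return the values of the given environment variables,
    raising a clear error if any are missing."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return [os.environ[n] for n in names]


# Placeholder values for demonstration; in practice these come
# from your shell or a secret store, never from source code.
os.environ.setdefault("DATABRICKS_HOST", "https://your-workspace.cloud.databricks.com")
os.environ.setdefault("DATABRICKS_TOKEN", "your-personal-access-token")

host, token = require_env("DATABRICKS_HOST", "DATABRICKS_TOKEN")
```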

OAuth (service principal)

Use OAuth with service principal credentials for automated workflows.

Pros: Better for automation, centralized identity management

Cons: Requires service principal setup and OAuth configuration

Create a service principal. See Manage service principals.

import os
import mlflow

# Set the workspace URL and service principal credentials
os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
os.environ["DATABRICKS_CLIENT_ID"] = "your-service-principal-client-id"
os.environ["DATABRICKS_CLIENT_SECRET"] = "your-service-principal-secret"

# Configure remote tracking (credentials are read from the environment)
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/remote-experiment")
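When the same code runs both interactively (PAT) and in automation (service principal), you can choose the mechanism based on which credentials are present. The Databricks SDK performs a similar resolution automatically; the sketch below is only an illustration of that idea, with hypothetical names:

```python
import os


def detect_auth_method(env=None):
    """Pick an authentication method based on which credentials
    are configured. Illustrative sketch, not the SDK's actual logic."""
    env = env if env is not None else os.environ
    if env.get("DATABRICKS_CLIENT_ID") and env.get("DATABRICKS_CLIENT_SECRET"):
        return "oauth-service-principal"
    if env.get("DATABRICKS_TOKEN"):
        return "pat"
    return "none"


print(detect_auth_method({"DATABRICKS_TOKEN": "abc"}))  # pat
```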