You are viewing documentation for Immuta version 2022.5.

Databricks Project Workspaces Pre-Configuration Details

Audience: Project members

Content Summary: This page outlines prerequisites and provides an overview of the integration process for Databricks project workspaces.

See the Overview page for information on the utility of project workspaces and see the Configuration page for installation instructions.

Prerequisites

Project Workspace Workflow

  1. An Immuta User with the CREATE_PROJECT permission creates a new project with Databricks data sources.
  2. The Immuta Project Owner enables Project Equalization, which balances access so that every project member has the same access to the data.
  3. The Immuta Project Owner creates a Databricks project workspace, which automatically generates a subfolder in the root path specified by the Application Admin and a remote database associated with the project.
  4. The Immuta Project Members query equalized data within the context of the project, collaborate, and write data back to Immuta, all within Databricks.
  5. The Immuta Project Members register their newly written derived tables in Immuta as derived data sources. These derived data sources inherit the necessary Immuta policies to be securely shared outside of the project.

Root Directory Details

  • Immuta only supports a single root location, so all projects will write to a subdirectory under this single root location.

  • If an administrator changes the default directory, the Immuta user must have full access to that directory. Once any workspace is created, this directory can no longer be modified.

  • Administrators can place a configuration value in the cluster configuration (core-site.xml) to mark that cluster as unavailable for use as a workspace.
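As a rough illustration of the single-root layout described above, the sketch below joins one root location with a per-project subfolder name. The function name, the example root `dbfs:/immuta-workspaces`, and the subfolder name `fraud_analysis` are hypothetical. Immuta generates the actual subfolder when the workspace is created, so this only shows the shape of the resulting paths, not Immuta's implementation.

```python
def project_workspace_dir(root: str, project_subfolder: str) -> str:
    """Return the directory for one project under the single workspace root.

    Hypothetical helper: Immuta picks the real subfolder name itself.
    """
    # Accept only a single path segment so the result always stays
    # directly under the root location.
    if (
        not project_subfolder
        or "/" in project_subfolder
        or project_subfolder in (".", "..")
    ):
        raise ValueError(f"invalid subfolder name: {project_subfolder!r}")
    return root.rstrip("/") + "/" + project_subfolder


# Two projects share the same (assumed) root but write to separate subdirectories.
root = "dbfs:/immuta-workspaces"
print(project_workspace_dir(root, "fraud_analysis"))
print(project_workspace_dir(root, "churn_model"))
```

Because every workspace path is derived from the one root, changing the root after workspaces exist would leave their directories behind, which is consistent with the restriction that the directory cannot be modified once any workspace is created.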

Read and Write Data

  • When acting in the workspace project, users can read data using calls like spark.read.parquet("immuta:///some/path/to/a/workspace").

  • To write Delta Lake data to a workspace and then expose that Delta table as a data source in Immuta, you must specify a table (rather than a directory) in the workspace when creating the derived data source.
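The two points above can be sketched in PySpark as follows. This is a minimal sketch, assuming it runs on a workspace-enabled Databricks cluster while the user is acting in the workspace project and that `spark` is the session Databricks provides in a notebook. The helper names, the database name `my_project_db`, and the table name `derived_sales` are hypothetical placeholders; the real remote database is the one Immuta associates with the project.

```python
def workspace_uri(relative_path: str) -> str:
    """Build an immuta:/// URI for a path inside the project workspace."""
    return "immuta:///" + relative_path.lstrip("/")


def read_workspace_parquet(spark, relative_path: str):
    """Read Parquet files from the workspace through the immuta:/// scheme."""
    return spark.read.parquet(workspace_uri(relative_path))


def write_delta_table(df, database: str, table: str, mode: str = "errorifexists") -> str:
    """Save a DataFrame as a Delta table (not a bare directory).

    Writing to a named table gives the derived data source a table
    to reference when it is registered in Immuta.
    """
    full_name = f"{database}.{table}"
    df.write.format("delta").mode(mode).saveAsTable(full_name)
    return full_name


# Example use in a Databricks notebook (names are placeholders):
#   df = read_workspace_parquet(spark, "some/path/to/a/workspace")
#   write_delta_table(df, "my_project_db", "derived_sales")
```

The default `mode` of `errorifexists` matches Spark's own default and avoids silently replacing an existing table; pass `"overwrite"` or `"append"` explicitly when that is intended.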