You are viewing documentation for Immuta version 2022.5.

Connect Data to Your Cluster

Immuta clusters use the configured metastore owner's personal access token (PAT) to interact with the Unity Catalog metastore. Before you register a table as a data source in Immuta, the configured Unity Catalog metastore owner must be granted access to that table and to its catalog and schema so that the table is visible to Immuta. Grant this access using one of the two methods below:

Automatically Grant Access in Privilege Model 1.0

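To grant SELECT access to everything in a catalog, run the SQL statement below as the metastore owner or catalog owner. In Privilege Model 1.0, privileges granted on a catalog are inherited by the schemas and tables it contains, so a single statement covers the whole catalog: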

GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG mycatalog TO `myadministrator@mycompany.com`;

Manually Grant Access

If you are not using Privilege Model 1.0, manually grant access to specific tables by running the SQL statements below as the administrator or table owner:

GRANT USE CATALOG ON CATALOG mycatalog TO `myadministrator@mycompany.com`;
GRANT USAGE ON SCHEMA myschema TO `myadministrator@mycompany.com`;
GRANT SELECT ON TABLE myschema.mytable TO `myadministrator@mycompany.com`;
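
Whichever method you use, it can help to confirm that the grants are in place before registering the table. The following is a minimal sketch that uses the Databricks SQL SHOW GRANTS statement with the same placeholder names as the statements above. The three-level table name (mycatalog.myschema.mytable) is an assumption added here so that the check does not depend on the session's current catalog:

```sql
-- List the privileges granted to this principal on the catalog.
SHOW GRANTS `myadministrator@mycompany.com` ON CATALOG mycatalog;

-- List the privileges granted to this principal on the schema.
SHOW GRANTS `myadministrator@mycompany.com` ON SCHEMA mycatalog.myschema;

-- List the privileges granted to this principal on the table.
SHOW GRANTS `myadministrator@mycompany.com` ON TABLE mycatalog.myschema.mytable;
```

If an expected privilege is missing from the output, rerun the corresponding GRANT statement as a user who is allowed to grant it.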

Register Data Sources

To register a Databricks table as an Immuta data source, Immuta requires a running Databricks cluster that it can use to determine the schema and metadata of the table in Databricks. This cluster can be either of the following:

  • a non-Immuta cluster: Use a non-Immuta cluster if you have over 1,000 tables to register as Immuta data sources. This is the fastest and least error-prone method to add many data sources at a time.
  • an Immuta-enabled cluster: Use an Immuta-enabled cluster if you have a few tables to register as Immuta data sources.

Limited enforcement (available until protected by policy access model) is not supported

You must set IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_READS and IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_WRITES to false in your cluster policies. Do this either manually or by selecting Protected until made available by policy in the Databricks integration section of the App Settings page. See the Databricks Spark integration with Unity Catalog support limitations for details.

Once your cluster is running, complete the following steps:

  1. Register your data from your non-Immuta or Immuta-enabled cluster.
  2. If you used a non-Immuta cluster, convert the cluster to an Immuta cluster with Immuta cluster policies once data sources have been created.

Note: When the Unity Catalog integration is enabled, a schema must be specified when registering data sources backed by tables in the legacy hive_metastore.
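
To find the schema name to supply for a legacy table, you can query the metastore directly. The following is a minimal sketch in Databricks SQL. The schema name `default` is only an illustrative placeholder and is not taken from this page:

```sql
-- List the schemas available in the legacy Hive metastore catalog.
SHOW SCHEMAS IN hive_metastore;

-- List the tables in one of those schemas (`default` is a placeholder).
SHOW TABLES IN hive_metastore.default;
```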