Databricks SQL Integration Overview (Public Preview)
Audience: System Administrators, Data Governors, and Data Owners
Content Summary: This page provides an overview of the Databricks SQL integration in Immuta. For a tutorial detailing how to enable this integration, see the installation guide. Databricks SQL is currently in Public Preview. Please provide feedback on any issues you encounter, as well as insight regarding how you would like this feature to evolve in the future.
Overview
Immuta’s Databricks SQL integration provides users direct access to views in a protected database Immuta creates inside Databricks SQL when the integration is configured. This protected database includes
- several tables and views Immuta creates to enable policy enforcement (storage of user entitlements, UDFs, etc.).
- views that contain policy logic corresponding to the target data source exposed in Immuta by a Data Owner. This view is exposed to all users in Databricks SQL.
Architecture
When an administrator configures the Databricks SQL integration with Immuta, Immuta creates
an immuta
database and Databricks SQL creates a default
database in the SQL Endpoint. Data sources
registered in Immuta are added as tables to the default database, and a view is created in the
immuta
database for each of these tables.
The credentials provided to set up the integration must have the ability to
- create an integration database
- configure procedures and functions
- maintain state between Databricks and Immuta
De-Conflicting Tables
Databricks SQL has a two-level structure with databases and tables. To de-conflict these table names when Immuta
creates views in the Immuta-protected database, Immuta prepends each table name with its parent database
in Databricks SQL (which is configured in the Immuta UI). The following example illustrates a scenario
where multiple Databricks SQL databases are configured in Immuta
(whose protected database is named immuta_databricks_sql
in the SQL Endpoint):
Datasource A:
- parent Databricks SQL database: public
- table name: HR_data
Datasource B:
- parent Databricks SQL database: default
- table name: HR_data
Resulting Immuta views created:
- Data Source A: immuta_databricks_sql.public_HR_data
- Data Source B: immuta_databricks_sql.default_HR_data
Policy Enforcement
Immuta uses dynamic views to enforce row- and column-level security in Databricks SQL. These dynamic views allow Immuta to manage which users have access to a view’s rows, columns, or specific records by filtering or masking their values.
When a Data Owner exposes a Databricks SQL table as a data source in Immuta and applies a policy to it,
Immuta updates the policy definition in the protected immuta
database in Databricks SQL. Then, Immuta creates a dynamic
view based on the table in the default
database, the querying users' entitlements, and policies that apply
to that table. Finally, Databricks SQL users query the view through the protected immuta
database.
Data Flow
- A Databricks SQL Administrator creates a Databricks SQL endpoint.
- Databricks creates a
default
database. Note: Immuta doesn’t lock down access to the default database; an administrator must do that within Databricks SQL itself. - The Databricks Admin creates a table of 10 million people and queries the table.
- An Immuta Application Admin configures the Databricks SQL integration
- Immuta creates a protected database inside the Databricks SQL endpoint.
- A Data Owner
creates data sources
in Immuta from the
default
Databricks database. - A user adds or edits a policy, or adds a user to a group that changes a policy on a data source.
- Immuta updates the policy or user profile information in Databricks.
- Immuta creates dynamic views based on tables in the
default
database, users, groups, attributes, and policies. - Users query views in the protected database created by Immuta.