Bulk Create Snowflake Data Sources
Private preview: This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.
Requirements
- Snowflake Enterprise Edition
- A Snowflake X-Large or Large warehouse is strongly recommended
Create Snowflake data sources
Make a request to the Immuta V2 API create data source endpoint, as the Immuta UI does not support creating more than 1000 data sources. The following options must be specified in your request to ensure the maximum performance benefits of bulk data source creation. The `Skip Stats Job` tag is only required if you are using specific policies that require stats; otherwise, Snowflake data sources automatically skip the stats job.
```json
"options": {
  "disableSensitiveDataDiscovery": true,
  "tableTags": [
    "Skip Stats Job"
  ]
}
```
Specifying `disableSensitiveDataDiscovery` as `true` ensures that sensitive data discovery will not be applied when the new data sources are created in Immuta, regardless of how it is configured for the Immuta tenant. Disabling sensitive data discovery improves performance during data source creation.

Applying the `Skip Stats Job` tag using the `tableTags` value ensures that jobs that are not vital to data source creation are skipped, specifically the fingerprint and high cardinality check jobs.
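Putting these options together, a bulk create request might look like the sketch below. This is a minimal sketch rather than the exact documented call: the `/api/v2/data` path and the `example_payload.json` file name are assumptions for illustration, and the payload file should contain your Snowflake connection details alongside the `options` block above.

```bash
curl \
  --request POST \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer dea464c07bd07300095caa8" \
  --data @example_payload.json \
  https://your-immuta-url.com/api/v2/data
```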
When the Snowflake bulk data source creation feature is configured, the create data source endpoint operates asynchronously and responds immediately with a `bulkId` that can be used for monitoring progress.
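For example, the immediate response might have a shape like the following; this is an illustrative sketch, and the `bulkId` value is a made-up placeholder:

```json
{
  "bulkId": "ab4e7eb9-bb57-437e-b0b3-44b5b0ab0d5c"
}
```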
Monitor progress
To monitor the progress of the background jobs for the bulk data source creation, make the following request using the `bulkId` from the response of the previous step:
```bash
curl \
  --request GET \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer dea464c07bd07300095caa8" \
  "https://your-immuta-url.com/jobs?bulkId=<your-bulkId>"
```
The response will contain a list of job states and the number of jobs currently in each state. If errors were encountered during processing, a list of errors will be included in the response:
```json
{
  "total": "99893",
  "completed": "99892",
  "failed": "0",
  "pending": "1",
  "errors": null
}
```
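To wait for the bulk operation to finish, you can poll this endpoint until no jobs remain pending. The following is a minimal sketch, assuming `jq` is installed and that the hypothetical `IMMUTA_URL`, `IMMUTA_TOKEN`, and `BULK_ID` environment variables are set:

```bash
#!/usr/bin/env bash
# Poll the jobs endpoint until no jobs remain pending.
while true; do
  response=$(curl --silent \
    --header "Authorization: Bearer ${IMMUTA_TOKEN}" \
    "${IMMUTA_URL}/jobs?bulkId=${BULK_ID}")
  pending=$(echo "${response}" | jq --raw-output '.pending')
  echo "Jobs still pending: ${pending}"
  if [ "${pending}" = "0" ]; then
    break
  fi
  sleep 60  # bulk jobs can run for hours, so poll sparingly
done
```

Because the example response reports counts as strings, the sketch compares `pending` as a string rather than a number.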
With these recommended configurations, bulk creating 100,000 Snowflake data sources will take between six and seven hours for all associated jobs to complete.