Create a Regex Classifier
Audience: Data Governors
Content Summary: In addition to built-in classifiers, Sensitive Data Discovery can use custom classifiers to discover and apply tags to sensitive data. This page details how to create a custom regex classifier. For specific details and examples of other classifiers, see the Create a Custom Dictionary Classifier or Create a Custom Column Name Regex Classifier tutorials.
Use Case: Custom Regex Classifier
Scenario: You've listed Immuta's built-in classifiers for Sensitive Data Discovery, but you discover there is no classifier that can automatically detect and tag columns that contain account numbers in your database.
A regular expression (regex) custom classifier allows you to create your own detectors that enable Immuta's Sensitive Data Discovery to find matches based on a regex pattern. For example, if a table contains account numbers in the form of xxxxxxxxx-xxx-x, you could define a regex pattern in a custom classifier to identify and tag these columns. The tutorial below uses this scenario to illustrate creating this classifier.
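To get a feel for what such a pattern accepts, the sketch below checks a few sample values against one pattern that fits the xxxxxxxxx-xxx-x form, assuming each x is a digit. It uses grep -E as a stand-in matcher; Immuta's own regex engine may behave differently, and check_value is a hypothetical helper, not part of Immuta.

```shell
# Pattern for the scenario: nine digits, three digits, one digit, separated by hyphens.
# The ^ and $ anchors require the whole value to match, not just part of it.
pattern='^[0-9]{9}-[0-9]{3}-[0-9]{1}$'

# Hypothetical helper: prints "match" or "no match" for a single value.
check_value() {
  if printf '%s\n' "$1" | grep -Eq "$pattern"; then
    echo "match"
  else
    echo "no match"
  fi
}

check_value '123456789-012-3'     # match
check_value '12345678-012-3'      # no match (only eight leading digits)
check_value 'ID 123456789-012-3'  # no match (extra text before the number)
```

Trying the pattern on a handful of real and deliberately wrong values before creating the classifier is a cheap way to catch a pattern that is too loose or too strict.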
Attributes of the Custom Regex Classifier
Attributes of all custom classifiers are provided on the Sensitive Data Discovery API page. However, attributes specific to the custom regex classifier are outlined in the table below.
Attribute | Description | Required |
---|---|---|
name | string Unique, request-friendly classifier name. | Yes |
displayName | string Unique, human-readable classifier name. | Yes |
description | string The classifier description. | Yes |
type | string The type of classifier: regex. | Yes |
config | object Includes config.minConfidence, config.tags, and config.regex. *See descriptions for these below. | Yes |
minConfidence* | number When the detection confidence is at least this percentage (written as a decimal, such as 0.5), tags are applied. | Yes |
tags* | array[string] The names of the tags to apply to the data source. Note: All tags must start with the prefix Discovered. (including the trailing period). | Yes |
regex* | string A case-insensitive regular expression to match against column values. | Yes |
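Two of the constraints in the table are easy to check locally before sending a request. The sketch below is an illustration only: tags_are_valid and matches_ci are hypothetical helpers, and grep -Ei is used as an approximation of case-insensitive matching, which may not match Immuta's regex engine exactly.

```shell
# Hypothetical pre-flight check: succeeds only if every tag starts with "Discovered."
tags_are_valid() {
  for tag in "$@"; do
    case "$tag" in
      Discovered.*) ;;                              # required prefix is present
      *) echo "invalid tag: $tag" >&2; return 1 ;;  # report the first offending tag
    esac
  done
}

# Hypothetical helper: case-insensitive match of pattern $1 against value $2.
matches_ci() {
  printf '%s\n' "$2" | grep -Eiq "$1"
}

tags_are_valid Discovered.account-number && echo "tags ok"

# Illustrative pattern (not from Immuta): letter case in the value does not affect the match.
matches_ci '^acct-[0-9]{3}$' 'ACCT-123' && echo "matches regardless of case"
```

Because matching is described as case-insensitive, a pattern cannot rely on letter case to tell values apart.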
Create a Custom Regex Classifier
- Generate your API key on the API Keys tab on your profile page and save the API key somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API or use it to configure your instance with the Immuta CLI.
- Save the custom regex classifier payload in a .json file (the commands in the next step assume it is named example-payload.json).

      {
        "name": "ACCOUNT_NUMBER_CLASSIFIER",
        "displayName": "Account Number Classifier",
        "description": "This classifier identifies account numbers using a regex",
        "type": "regex",
        "config": {
          "regex": "^[0-9]{9}-[0-9]{3}-[0-9]{1}$",
          "minConfidence": 0.5,
          "tags": ["Discovered.account-number"]
        }
      }
- Create the classifier using one of these methods:

  Immuta CLI

      immuta api sdd/classifier -X POST --input ./example-payload.json

  HTTP API

      curl \
        --request POST \
        --header "Content-Type: application/json" \
        --header "Authorization: 12345678900000" \
        --data @example-payload.json \
        https://your-immuta-url.immuta.com/sdd/classifier
- If the request is successful, you will receive a response that contains details about the classifier.

      {
        "createdBy": {
          "id": 1,
          "name": "John",
          "email": "john@example.com"
        },
        "name": "ACCOUNT_NUMBER_CLASSIFIER",
        "displayName": "Account Number Classifier",
        "description": "This classifier identifies account numbers using a regex",
        "type": "regex",
        "config": {
          "tags": [
            "Discovered.account-number"
          ],
          "regex": "^[0-9]{9}-[0-9]{3}-[0-9]{1}$",
          "minConfidence": 0.5
        },
        "id": 1,
        "createdAt": "2021-10-14T18:48:56.289Z",
        "updatedAt": "2021-10-14T18:48:56.289Z"
      }
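The steps above can be combined into a small script. This is a minimal sketch, not an official Immuta tool: write_payload and create_classifier are hypothetical helper names, and IMMUTA_URL and IMMUTA_API_KEY are assumed environment variables holding your instance URL and the API key generated in the first step.

```shell
# Hypothetical helper: writes the example classifier payload to the path given as $1.
# The quoted 'EOF' delimiter stops the shell from expanding the $ anchor in the regex.
write_payload() {
  cat > "$1" <<'EOF'
{
  "name": "ACCOUNT_NUMBER_CLASSIFIER",
  "displayName": "Account Number Classifier",
  "description": "This classifier identifies account numbers using a regex",
  "type": "regex",
  "config": {
    "regex": "^[0-9]{9}-[0-9]{3}-[0-9]{1}$",
    "minConfidence": 0.5,
    "tags": ["Discovered.account-number"]
  }
}
EOF
}

# Hypothetical helper: posts the payload file given as $1 to the classifier endpoint.
# IMMUTA_URL and IMMUTA_API_KEY are assumed names; the call stops if either is unset.
create_classifier() {
  curl --fail --silent --show-error \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: ${IMMUTA_API_KEY:?set IMMUTA_API_KEY}" \
    --data @"$1" \
    "${IMMUTA_URL:?set IMMUTA_URL}/sdd/classifier"
}
```

A typical run would be write_payload ./example-payload.json followed by create_classifier ./example-payload.json. Reading the key from the environment keeps it out of the script file, and --fail makes curl return a non-zero exit code on an HTTP error so the failure is not silently ignored.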
What's Next
Continue to one of the following tutorials:
- Run Sensitive Data Discovery on Data Sources: Trigger SDD to run on specified data sources.
- Create a Template: Although only Data Governors can create classifiers, Data Owners can add classifiers to templates, which they then apply to their data sources to override minConfidence or tags for classifiers within the template.