
You are viewing documentation for Immuta version 2022.5.


Create a Column Name Regex Classifier

Audience: Data Governors

Content Summary: In addition to built-in classifiers, Sensitive Data Discovery can use custom classifiers to discover and apply tags to sensitive data. This page details how to create a custom column name regex classifier. For specific details and examples of other classifiers, see the Create a Custom Dictionary Classifier or Create a Custom Regex Classifier tutorials.

Use Case: Custom Column Name Regex Classifier

Scenario: You've listed Immuta's built-in classifiers for Sensitive Data Discovery, but you discover there is no classifier that can automatically detect and tag columns that contain account numbers in your database.

A custom column name regular expression (regex) classifier lets you create your own detector that Immuta's Sensitive Data Discovery uses to find columns whose names match a regex pattern. For example, if your database contains tables with social security numbers, you can define a regex pattern that matches the names of those columns rather than the values within them. The tutorial below uses this social security number example to illustrate creating the classifier.
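Conceptually, the classifier tests each column name against the pattern without regard to case. The Python sketch below illustrates only that matching step. It is not Immuta's implementation: match_columns and the sample column names are hypothetical, Python's re module stands in for whatever regex engine Immuta actually uses, and the sketch assumes the pattern may match anywhere in a name unless it is anchored with ^ or $.

```python
import re


def match_columns(pattern: str, column_names: list[str]) -> list[str]:
    """Return the column names that match `pattern`, ignoring case.

    Hypothetical helper that approximates a column name regex classifier.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    # search() looks for a match anywhere in the name; only the anchors
    # (^ and $) in the pattern tie it to the start or end of the name.
    return [name for name in column_names if regex.search(name)]


columns = ["SSN", "ssn_last4", "Social Security", "socialsecurity", "customer_id"]
print(match_columns(r"^ssn|social ?security$", columns))
# ['SSN', 'ssn_last4', 'Social Security', 'socialsecurity']
```

Note that ssn_last4 matches because ^ssn|social ?security$ is parsed as "^ssn" or "social ?security$"; the grouped pattern ^(ssn|social ?security)$ matches only the exact names.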

Attributes of the Custom Column Name Regex Classifier

Attributes of all custom classifiers are provided on the Sensitive Data Discovery API page. However, attributes specific to the custom column name regex classifier are outlined in the table below.

| Attribute | Type | Description | Required |
| --- | --- | --- | --- |
| name | string | Unique, request-friendly classifier name. | Yes |
| displayName | string | Unique, human-readable classifier name. | Yes |
| description | string | The classifier description. | Yes |
| type | string | The type of classifier: columnNameRegex. | Yes |
| config | object | Contains config.columnNameRegex and config.tags, described in the next two rows. | Yes |
| config.tags | array[string] | The names of the tags to apply to the data source. Each tag must start with the prefix "Discovered.". | Yes |
| config.columnNameRegex | string | A case-insensitive regular expression to match against column names. | Yes |
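Before sending a payload, it can help to check it against the attribute table. The Python sketch below is a hypothetical client-side validator (validate_classifier is not an Immuta function; Immuta performs its own validation when the classifier is created). It assumes the rules in the table, additionally treats an empty tags array as an error, and compiles the pattern with Python's re module, whose syntax may differ from the engine Immuta uses.

```python
import re

# Top-level attributes that the table marks as required.
REQUIRED_FIELDS = ("name", "displayName", "description", "type", "config")


def validate_classifier(payload: dict) -> list[str]:
    """Return the problems found in a column name regex classifier payload.

    Hypothetical client-side check derived from the attribute table.
    """
    problems = [f"missing required attribute: {field}"
                for field in REQUIRED_FIELDS if field not in payload]
    if "type" in payload and payload["type"] != "columnNameRegex":
        problems.append("type must be 'columnNameRegex'")

    config = payload.get("config")
    if not isinstance(config, dict):
        if "config" in payload:
            problems.append("config must be an object")
        return problems

    pattern = config.get("columnNameRegex")
    if not isinstance(pattern, str):
        problems.append("config.columnNameRegex must be a string")
    else:
        try:
            # Assumption: Python's re syntax is close to Immuta's regex engine.
            re.compile(pattern)
        except re.error as exc:
            problems.append(f"config.columnNameRegex is not a valid regex: {exc}")

    tags = config.get("tags")
    if not isinstance(tags, list) or not tags:
        problems.append("config.tags must be a non-empty array of strings")
    else:
        for tag in tags:
            # Every discovered tag must carry the "Discovered." prefix.
            if not isinstance(tag, str) or not tag.startswith("Discovered."):
                problems.append(f"tag must start with 'Discovered.': {tag!r}")
    return problems


payload = {
    "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_CLASSIFIER",
    "displayName": "Social Security Number Columns Classifier",
    "description": "This classifier identifies column names that match the defined regex pattern.",
    "type": "columnNameRegex",
    "config": {
        "columnNameRegex": "^ssn|social ?security$",
        "tags": ["Discovered.Social Security Numbers"],
    },
}
print(validate_classifier(payload))  # []
```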

Create a Custom Column Name Regex Classifier

  1. Generate your API key on the API Keys tab on your profile page and save the API key somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API or use it to configure your instance with the Immuta CLI.

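  2. Save the custom column name regex classifier payload in a .json file; the commands in step 3 read it from example-payload.json. Because alternation (|) has the lowest precedence in a regex, ^ssn|social ?security$ matches any column name that starts with ssn or ends with socialsecurity or social security (ignoring case). It therefore matches ssn, socialsecurity, and social security, but also names such as ssn_last4. To match only those three names exactly, group the alternatives: ^(ssn|social ?security)$.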

    {
      "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_CLASSIFIER",
      "displayName": "Social Security Number Columns Classifier",
      "description": "This classifier identifies column names that match the defined regex pattern.",
      "type": "columnNameRegex",
      "config": {
        "columnNameRegex": "^ssn|social ?security$",
        "tags": ["Discovered.Social Security Numbers"]
      }
    }
    
  3. Create the classifier using one of these methods:

    Immuta CLI

    immuta api sdd/classifier -X POST --input ./example-payload.json
    

    HTTP API

    curl \
        --request POST \
        --header "Content-Type: application/json" \
        --header "Authorization: 12345678900000" \
        --data @example-payload.json \
        https://your-immuta-url.immuta.com/sdd/classifier
    
  4. If the request is successful, you will receive a response that contains details about the classifier.

    {
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_CLASSIFIER",
      "displayName": "Social Security Number Columns Classifier",
      "description": "This classifier identifies column names that match the defined regex pattern.",
      "type": "columnNameRegex",
      "config": {
        "tags": [
          "Discovered.Social Security Numbers"
        ],
        "columnNameRegex": "^ssn|social ?security$"
      },
      "id": 2,
      "createdAt": "2021-10-14T18:48:56.289Z",
      "updatedAt": "2021-10-14T18:48:56.289Z"
    }
    
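If you prefer to script step 3 instead of using curl, the same request can be sent with Python's standard library. This is a minimal sketch that assumes the endpoint, the Content-Type header, and the raw API key in the Authorization header behave exactly as in the curl example; build_create_request and create_classifier are hypothetical helper names, not part of any Immuta SDK.

```python
import json
import urllib.request


def build_create_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build a POST /sdd/classifier request equivalent to the curl example."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/sdd/classifier",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The API key is sent as-is, mirroring the curl example.
            "Authorization": api_key,
        },
        method="POST",
    )


def create_classifier(base_url: str, api_key: str, payload_path: str) -> dict:
    """Send the payload file to Immuta and return the parsed JSON response."""
    with open(payload_path, encoding="utf-8") as f:
        payload = json.load(f)
    request = build_create_request(base_url, api_key, payload)
    # urlopen raises urllib.error.HTTPError if the server returns an error status.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, create_classifier("https://your-immuta-url.immuta.com", "12345678900000", "example-payload.json") returns the response as a dictionary, so fields such as id and config["tags"] can be read directly from it. Splitting out build_create_request keeps the request construction inspectable without making a network call.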

What's Next

Continue to one of the following tutorials:

  • Run Sensitive Data Discovery on Data Sources: Trigger SDD to run on specified data sources.
  • Create a Template: Although only Data Governors can create classifiers, Data Owners can add classifiers to templates, which they then apply to their data sources to override minConfidence or tags for classifiers within the template.