You are viewing documentation for Immuta version 2022.5.

Run Sensitive Data Discovery on Data Sources

Audience: Data Governors and Data Owners

Content Summary: This page outlines how to run Sensitive Data Discovery on your data sources using the Immuta CLI and HTTP API. See the Additional Tutorials section for details about triggering SDD in the Immuta UI.

Attributes Overview

Attributes of all custom classifiers and templates are provided on the Sensitive Data Discovery API page. However, attributes specific to this section are outlined below.

Attribute | Type | Description
sources | array of strings | The names of the data sources to run SDD on.
all | boolean | If true, SDD will run on all Immuta data sources. The default is false.
wait | integer | The number of seconds to wait for the SDD jobs to finish. A value of -1 waits until the jobs complete. The default is -1.
dryRun | boolean | When true, SDD does not update the tags on the data source(s); it only returns the tags that would have been applied or removed. See the Test SDD on a Data Source section for an example. The default is false.
template | string | If passed, Immuta runs SDD with this template instead of the template applied to the data source(s). Passing template when dryRun is false causes an error.
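
The attribute rules above can be checked on the client before a request is sent. The following is a minimal Python sketch, not part of Immuta's tooling: build_sdd_payload and its parameter names are hypothetical, and the requirement to either name data sources or set all is an assumption drawn from the examples on this page. Only the attribute names and the template/dryRun rule come from the table.

```python
import json


def build_sdd_payload(sources=None, run_all=False, wait=None, dry_run=False, template=None):
    """Build a request body for sdd/run from the attributes described above.

    Hypothetical helper: only the attribute names come from the documentation.
    """
    if template is not None and not dry_run:
        # The documentation states that this combination causes an error.
        raise ValueError("template can only be passed when dryRun is true")
    if not run_all and not sources:
        # Assumption: a request should name data sources or set "all".
        raise ValueError("provide at least one data source or set run_all=True")

    payload = {}
    if run_all:
        payload["all"] = True
    else:
        payload["sources"] = list(sources)
    if wait is not None:
        payload["wait"] = int(wait)
    if dry_run:
        payload["dryRun"] = True
    if template is not None:
        payload["template"] = template
    return payload


if __name__ == "__main__":
    example = build_sdd_payload(
        sources=["Medical Claims"], dry_run=True, template="PII_REVISION"
    )
    # Write the result to a .json file to use it with the CLI or curl.
    print(json.dumps(example, indent=2))
```

Raising the error locally is a design choice: it surfaces the template/dryRun conflict before any request reaches Immuta.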

Run SDD on Data Sources

  1. Specify the data sources you would like to run SDD on, and save the payload in a .json file.

    {
      "sources": [
        "Insurance Data"
      ]
    }
    

    Or choose to run SDD on all the data sources in Immuta, and save the payload in a .json file.

    {
      "all": true
    }
    
  2. Trigger SDD using one of these methods:

    Immuta CLI

    immuta api sdd/run -X POST --input ./example-payload.json
    

    HTTP API

    curl \
        --request POST \
        --header "Content-Type: application/json" \
        --header "Authorization: Bearer dea464c07bd07300095caa8" \
        --data @example-payload.json \
        https://your-immuta-url.immuta.com/sdd/run
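
    For scripts, the same HTTP request can be sketched in Python with only the standard library. This is an illustrative equivalent of the curl command above, not an Immuta client: build_sdd_request and run_sdd are hypothetical names, and the base URL and token are placeholders that you supply.

```python
import json
import urllib.request


def build_sdd_request(base_url, token, payload):
    """Return a POST request for <base_url>/sdd/run without sending it."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/sdd/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )


def run_sdd(base_url, token, payload, timeout=None):
    """Send the request and return the parsed JSON response body."""
    request = build_sdd_request(base_url, token, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call (placeholders; requires a reachable Immuta instance):
# result = run_sdd("https://your-immuta-url.immuta.com", "<your-token>",
#                  {"sources": ["Insurance Data"]})
```

    Building the request separately from sending it makes the URL, headers, and body easy to inspect before anything goes over the network.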
    

If SDD ran successfully, you will receive a response similar to this:

{
  "Insurance Data": {
    "id": "d2edc1d0-328c-11ec-9d5a-6793988ccf95",
    "state": "completed",
    "output": {
      "diff": {
        "addedTags": {
          "ssn": [
            "Discovered.PII"
          ],
          "email": [
            "Discovered.PII"
          ]
        },
        "removedTags": {
          "ssn": [
            "Discovered.Country.US"
          ]
        }
      },
      "sddTagResult": {
        "ssn": [
          "Discovered.Entity.Social Security Number",
          "Discovered.Identifier Direct",
          "Discovered.PHI",
          "Discovered.PII"
        ],
        "email": [
          "Discovered.Entity.Electronic Mail Address",
          "Discovered.Identifier Direct",
          "Discovered.PHI",
          "Discovered.PII"
        ]
      }
    }
  }
}
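
When SDD runs on many data sources, a short summary can be easier to scan than the full response. The sketch below assumes the response has the shape shown above (one entry per data source, with state, output.diff, and output.sddTagResult); summarize_sdd_response is a hypothetical helper, not part of the Immuta CLI or API.

```python
def _count_tags(tags_by_column):
    """Count tags across all columns; tolerate a missing or empty mapping."""
    return sum(len(tags) for tags in (tags_by_column or {}).values())


def summarize_sdd_response(response):
    """Return the job state and tag counts for each data source in the response."""
    summary = {}
    for source_name, job in response.items():
        output = job.get("output") or {}
        diff = output.get("diff") or {}
        summary[source_name] = {
            "state": job.get("state"),
            "added": _count_tags(diff.get("addedTags")),
            "removed": _count_tags(diff.get("removedTags")),
            "final": _count_tags(output.get("sddTagResult")),
        }
    return summary
```

Here added and removed count the tags in the diff, and final counts the tags in sddTagResult.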

Additional Tutorials

Test SDD on a Data Source

Users can preview how SDD will apply tags to their data sources by completing a dryRun, which lets them test both templates and tags:

  • test templates: If a template is specified in the payload when dryRun is true, SDD will use this template instead of the template applied to the data source. Note: SDD will return an error if a template is specified when dryRun is false.

  • test tags: Instead of applying tags, SDD just returns the tags that would be applied to the data source. This allows users to evaluate whether or not classifiers or templates are applying tags correctly without updating the data source.

After evaluating whether the tags would be applied appropriately, users can make any necessary changes to a template before triggering SDD again.
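
One way to make that evaluation repeatable is to compare the sddTagResult of a dry run against the tags you expect for each column. The sketch below is only an illustration: find_tag_mismatches is a hypothetical helper, and the expected tags are something you define yourself. Immuta does not return them.

```python
def find_tag_mismatches(sdd_tag_result, expected_tags):
    """Compare dry-run tags with expected tags, column by column.

    Both arguments map a column name to a list of tag names. The result
    contains only the columns whose tags differ.
    """
    mismatches = {}
    for column in sorted(set(sdd_tag_result) | set(expected_tags)):
        actual = set(sdd_tag_result.get(column, []))
        expected = set(expected_tags.get(column, []))
        if actual != expected:
            mismatches[column] = {
                "missing": sorted(expected - actual),
                "unexpected": sorted(actual - expected),
            }
    return mismatches
```

An empty result means the dry run produced exactly the expected tags. Otherwise, missing lists tags the template did not produce and unexpected lists tags it produced but you did not expect.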

To complete a dryRun:

  1. Specify the data sources you would like to run Sensitive Data Discovery on, set dryRun to true, and save the payload in a .json file. Note: You can also test a template against a data source in a dryRun, as in the example below. When dryRun is false, however, a template cannot be included in the payload; the template must be applied to the data source before running SDD.

    {
      "sources": [
        "Medical Claims"
      ],
      "dryRun": true,
      "template": "PII_REVISION"
    }
    
  2. Trigger SDD using one of these methods:

    Immuta CLI

    immuta api sdd/run -X POST --input ./example-payload.json
    

    HTTP API

    curl \
        --request POST \
        --header "Content-Type: application/json" \
        --header "Authorization: Bearer dea464c07bd07300095caa8" \
        --data @example-payload.json \
        https://your-immuta-url.immuta.com/sdd/run
    
  3. You will receive a response that shows the tags that would be added, the tags that would be removed, and the final SDD result:

    {
      "Medical Claims": {
        "id": "86fc4f70-380f-11ec-a432-81748c911385",
        "state": "completed",
        "output": {
          "diff": {
            "addedTags": {},
            "removedTags": {
              "dob": [
                "Discovered.Entity.Date",
                "Discovered.Entity.Date of Birth",
                "Discovered.Identifier Indirect",
                "Discovered.PHI",
                "Discovered.PII"
              ],
              "ssn": [
                "Discovered.Country.US",
                "Discovered.Entity.Social Security Number",
                "Discovered.Identifier Direct",
                "Discovered.PHI"
              ],
              "state": [
                "Discovered.Country.US",
                "Discovered.Entity.Location",
                "Discovered.Entity.State",
                "Discovered.Identifier Indirect"
              ],
              "gender": [
                "Discovered.Entity.Gender",
                "Discovered.Identifier Indirect",
                "Discovered.PHI",
                "Discovered.PII"
              ],
              "date_of_service": [
                "Discovered.Entity.Date",
                "Discovered.Identifier Indirect",
                "Discovered.PHI",
                "Discovered.PII"
              ]
            }
          },
          "sddTagResult": {
            "ssn": [
              "Discovered.PII"
            ]
          }
        }
      }
    }
    
  4. Once you are satisfied with how SDD applies tags, set dryRun to false (or omit it) and remove template from the payload.

    {
      "sources": [
        "Medical Claims"
      ],
      "dryRun": false
    }
    
  5. Trigger SDD again:

    Immuta CLI

    immuta api sdd/run -X POST --input ./example-payload.json
    

    HTTP API

    curl \
        --request POST \
        --header "Content-Type: application/json" \
        --header "Authorization: Bearer dea464c07bd07300095caa8" \
        --data @example-payload.json \
        https://your-immuta-url.immuta.com/sdd/run
    
  6. If the request was successful, you will receive a response similar to this one:

    {
      "Medical Claims": {
        "id": "2afcfe00-3813-11ec-b171-9331e3d3aa04",
        "state": "completed",
        "output": {
          "diff": {
            "addedTags": {},
            "removedTags": {
              "dob": [
                "Discovered.Entity.Date",
                "Discovered.Entity.Date of Birth",
                "Discovered.Identifier Indirect",
                "Discovered.PHI",
                "Discovered.PII"
              ],
              "ssn": [
                "Discovered.Country.US",
                "Discovered.Entity.Social Security Number",
                "Discovered.Identifier Direct",
                "Discovered.PHI"
              ],
              "state": [
                "Discovered.Country.US",
                "Discovered.Entity.Location",
                "Discovered.Entity.State",
                "Discovered.Identifier Indirect"
              ],
              "gender": [
                "Discovered.Entity.Gender",
                "Discovered.Identifier Indirect",
                "Discovered.PHI",
                "Discovered.PII"
              ],
              "date_of_service": [
                "Discovered.Entity.Date",
                "Discovered.Identifier Indirect",
                "Discovered.PHI",
                "Discovered.PII"
              ]
            }
          },
          "sddTagResult": {
            "ssn": [
              "Discovered.PII"
            ]
          }
        }
      }
    }
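
    The dry-run-then-apply workflow above can be sketched in Python. Both helpers are hypothetical and assume the payload and response shapes shown on this page. The comparison is only meaningful if the template tested in the dry run has since been applied to the data source, as described in step 1.

```python
def to_live_payload(dry_run_payload):
    """Copy a dry-run payload, dropping the keys that only apply to a dry run."""
    return {
        key: value
        for key, value in dry_run_payload.items()
        if key not in ("dryRun", "template")
    }


def _normalize(output):
    """Sort tag lists so that ordering differences are not reported as changes."""
    diff = output.get("diff", {})
    return {
        "added": {col: sorted(tags) for col, tags in diff.get("addedTags", {}).items()},
        "removed": {col: sorted(tags) for col, tags in diff.get("removedTags", {}).items()},
        "final": {col: sorted(tags) for col, tags in output.get("sddTagResult", {}).items()},
    }


def sources_with_different_output(dry_run_response, live_response):
    """Return the data sources whose live output differs from the dry-run output."""
    names = sorted(set(dry_run_response) | set(live_response))
    return [
        name
        for name in names
        if _normalize(dry_run_response.get(name, {}).get("output", {}))
        != _normalize(live_response.get(name, {}).get("output", {}))
    ]
```

    Job ids are ignored in the comparison because each run receives its own id, as the two responses above show.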
    

Trigger SDD in the Immuta UI

  1. Select a data source from your My Data Sources page.
  2. Click the Health Check dropdown menu.
  3. In the Sensitive Data Discovery (SDD) section, click Re-run.

