Classifier Reference
Audience: Data Owners and Governors
Content Summary: Immuta's Sensitive Data Discovery comes with built-in classifiers that detect sensitive data and apply tags to it. This page describes each built-in classifier and the tags it applies, listed in alphabetical order.
Classifier Descriptions and Tags
Classifier | Description |
---|---|
AGE | Detects numeric strings with values between 10 and 199, provided the column header contains text such as `age`, `year`, `years`, `yr`, or `yrs`. Tags include `Discovered.PII`, `Discovered.Identifier Indirect`, `Discovered.PHI`, `Discovered.Entity.Age`. |
ARGENTINA_DNI_NUMBER | Detects strings consistent with the Argentina National Identity (DNI) number. Requires an eight-digit number with optional periods between the second and third digits and between the fifth and sixth digits. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Argentina`, `Discovered.PHI`, `Discovered.Entity.DNI Number`. |
AUSTRALIA_MEDICARE_NUMBER | Detects numeric strings consistent with the Australian Medicare number. Requires a ten- or eleven-digit number whose first digit is between 2 and 6, inclusive. Optional spaces can appear between the fourth and fifth digits and between the ninth and tenth digits. An optional eleventh digit, separated by a `/`, can follow. A checksum is required. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Australia`, `Discovered.PHI`, `Discovered.Entity.Medicare Number`. |
AUSTRALIA_PASSPORT | Detects strings consistent with the Australian passport number. Requires an 8- or 9-character string that starts with either a single uppercase letter (N, E, D, F, A, C, U, or X) or a two-letter prefix (P followed by A, B, C, D, E, F, U, W, X, or Z), followed by seven digits. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Australia`, `Discovered.PHI`, `Discovered.Entity.Passport`. |
AUSTRALIA_TAX_FILE_NUMBER | Detects strings consistent with the Australian Tax File Number. Requires a nine-digit number with optional spaces after the third and sixth digits. A checksum is required. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Australia`, `Discovered.PHI`, `Discovered.Entity.Tax File Number`. |
BELGIUM_NATIONAL_ID_CARD_NUMBER | Detects numeric strings consistent with Belgium's National ID card number. Requires a twelve-digit number with hyphens (`-`) between the third and fourth digits and between the tenth and eleventh digits. A two-digit checksum is required. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Belgium`, `Discovered.PHI`, `Discovered.Entity.National ID Card Number`. |
BITCOIN_INVOICE_ADDRESS | Detects strings consistent with the following Bitcoin invoice address formats: P2PKH, P2SH, and Bech32. P2PKH and P2SH addresses must start with `1` or `3`, respectively, followed by 25 to 34 alphanumeric characters, excluding `l`, `I`, `O`, and `0`. Bech32 addresses must begin with `bc1` followed by 39 characters. All addresses must have a valid checksum to be identified. Tags include `Discovered.Entity.CRYPTO`, `Discovered.PCI`. |
BRAZIL_CPF_NUMBER | Detects numeric strings consistent with Brazil's CPF (Cadastro de Pessoas Físicas) number. Requires an eleven-digit numeric string with non-numeric separators after the third, sixth, and ninth digits. A two-digit checksum is required. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Brazil`, `Discovered.PHI`, `Discovered.Entity.CPF Number`. |
CANADA_BC_PHN | Detects numeric strings consistent with British Columbia's Personal Health Number (PHN). Requires a ten-digit numeric string with optional hyphens (`-`) or spaces after the fourth and seventh digits. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Canada`, `Discovered.PHI`, `Discovered.Entity.British Columbia Health Network Number`. |
CANADA_DRIVERS_LICENSE_NUMBER | Detects strings consistent with Canadian driver's license numbers issued by each province. A string must match at least one of seven patterns. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Canada`, `Discovered.PHI`, `Discovered.Entity.Drivers License Number`. |
CANADA_OHIP | Detects alphanumeric strings consistent with Ontario's Health Insurance Plan (OHIP) number. Requires a twelve-character alphanumeric code. Optional hyphens (`-`) or spaces can appear after the fourth, seventh, and tenth characters. The final two characters are a checksum. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Canada`, `Discovered.PHI`, `Discovered.Entity.Ontario Health Insurance Number`. |
CANADA_PASSPORT | Detects strings consistent with the Canadian passport number format. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Canada`, `Discovered.PHI`, `Discovered.Entity.Passport`. |
CANADA_QUEBEC_HIN | Detects alphanumeric strings consistent with Quebec's Health Insurance Number (HIN). Requires four letters followed by an optional space or hyphen (`-`), and then eight digits with an optional hyphen or space after the fourth digit. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Canada`, `Discovered.PHI`, `Discovered.Entity.Quebec Health Insurance Number`. |
CANADA_SOCIAL_INSURANCE_NUMBER | Detects numeric strings consistent with the Canadian Social Insurance Number format. Requires a nine-digit numeric string with optional hyphens or spaces after the third and sixth digits. The last digit is a checksum. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Canada`, `Discovered.PHI`, `Discovered.Entity.Social Insurance Number`. |
CREDIT_CARD_NUMBER | Detects strings consistent with a credit card number. Must include a valid checksum. Tags include `Discovered.PCI`, `Discovered.Entity.Credit Card Number`. |
DATE | Detects strings consistent with dates. Matches can include days of the week, dates, and datetimes. Tags include `Discovered.Entity.Date`. |
DATE_OF_BIRTH | Classifies date strings as dates of birth when the column header is `dob`, `birth`, or similar. Tags include `Discovered.PII`, `Discovered.Identifier Indirect`, `Discovered.PHI`, `Discovered.Entity.Date of Birth`. |
DENMARK_CPR_NUMBER | Detects numeric strings consistent with Denmark's Personal Identification Number (CPR-number or Person-number). Requires a ten-digit number in either the DDMMYY-SSSS or DDMMYYSSSS format. The first six digits are the individual's birth date in day, month, year format, and the final four digits form the sequence number. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Denmark`, `Discovered.PHI`, `Discovered.Entity.CPR Number`. |
DOMAIN_NAME | Detects domain names using a very broad pattern. Tags include `Discovered.Entity.Domain Name`. |
EMAIL_ADDRESS | Detects strings consistent with an email address. The username must be fewer than 255 characters, followed by `@`, a domain of fewer than 255 characters, and a top-level domain of 2 to 20 characters. Tags include `Discovered.PHI`, `Discovered.Entity.Electronic Mail Address`, `Discovered.Identifier Direct`. |
ETHNIC_GROUP | Detects strings consistent with the US Census race designations. Tags include `Discovered.PII`, `Discovered.Entity.Ethnic Group`. |
FDA_CODE | Detects strings consistent with the code of a drug or ingredient registered with the Food and Drug Administration (FDA). Must start with 4 to 6 digits, followed by a hyphen, 3 to 4 digits, another hyphen, and finally one or two digits. Tags include `Discovered.Country.US`, `Discovered.Entity.FDA Code`. |
FINLAND_NATIONAL_ID_NUMBER | Detects strings consistent with Finland's National ID number. Requires an eleven-character string in the DDMMYYCZZZQ format. The first six digits are the individual's birth date in day, month, year format. C is a century-of-birth indicator (`+` for 1800-1899, `-` for 1900-1999, and `A` for 2000-2099), ZZZ is an individual ID number, and Q is a required checksum character. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Finland`, `Discovered.PHI`, `Discovered.Entity.National ID Number`. |
FRANCE_CNI | Detects numeric strings consistent with the French National ID card number (carte nationale d'identité). Requires a twelve-digit numeric string. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.France`, `Discovered.PHI`, `Discovered.Entity.CNI`. |
FRANCE_NIR | Detects numeric strings consistent with France's National ID number (Numéro d'Inscription au Répertoire). Requires a fifteen-digit numeric string. An optional hyphen (`-`) or space can appear after the 13th digit. The 14th and 15th digits act as a checksum. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.France`, `Discovered.PHI`, `Discovered.Entity.NIR`. |
FRANCE_PASSPORT | Detects alphanumeric strings consistent with the French passport number. Requires two digits, followed by two uppercase letters, and ending with five digits. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.France`, `Discovered.PHI`, `Discovered.Entity.Passport`. |
GENDER | Detects strings consistent with gender. Tags include `Discovered.PII`, `Discovered.Identifier Indirect`, `Discovered.PHI`, `Discovered.Entity.Gender`. |
GERMANY_DRIVERS_LICENSE_NUMBER | Detects alphanumeric strings consistent with Germany's driver's license number. Requires an eleven-character string: a digit or letter, followed by two digits, six digits or letters, one digit, and one digit or letter. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Germany`, `Discovered.PHI`, `Discovered.Entity.Drivers License Number`. |
GERMANY_IDENTITY_CARD_NUMBER | Detects alphanumeric strings consistent with Germany's Identity Card number. Requires a single letter followed by eight digits. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Germany`, `Discovered.PHI`, `Discovered.Entity.Identity Card Number`. |
IBAN_CODE | Detects strings consistent with an International Bank Account Number (IBAN). Must contain a valid country code. Tags include `Discovered.Entity.IBAN Code`. |
ICD10_CODE | Detects strings consistent with codes from the International Statistical Classification of Diseases and Related Health Problems (ICD), drawn from the 2020 Clinical Modification (ICD-10-CM) lexicon. Tags include `Discovered.Entity.ICD10 Code`. |
IMEI_HARDWARE_ID | Detects strings consistent with an International Mobile Equipment Identity (IMEI) number. Must contain 15 digits with optional hyphens or spaces after the second, eighth, and fourteenth digits. Tags include `Discovered.Entity.IMEI`. |
IP_ADDRESS | Detects IPv4 and IPv6 addresses. Tags include `Discovered.Entity.IP Address`. |
LOCATION | Detects strings consistent with countries, states, addresses, or municipalities. By default, it focuses on locations in the United States. Tags include `Discovered.Entity.Location`. |
MAC_ADDRESS | Detects strings consistent with a Media Access Control (MAC) address. Must contain twelve hexadecimal digits, with each pair of digits separated by a colon. Tags include `Discovered.Entity.MAC Address`. |
MAC_ADDRESS_LOCAL | Detects strings consistent with a local Media Access Control (MAC) address. Tags include `Discovered.Entity.MAC Address Local`. |
PERSON_NAME | Detects strings that match a dictionary of people's names. US person names are drawn from the US Social Security database. Tags include `Discovered.PII`, `Discovered.PHI`, `Discovered.Entity.Person Name`, `Discovered.Identifier Indirect`. |
PHONE_NUMBER | Detects strings consistent with telephone numbers, primarily those following United States telephone number conventions. Tags include `Discovered.Entity.Telephone Number`. |
POSTAL_CODE | Detects strings consistent with a valid US ZIP code, with an optional +4 extension. Only valid five-digit ZIP codes are detected. Tags include `Discovered.Entity.Postal Code`. |
SPAIN_DRIVERS_LICENSE_NUMBER | Detects alphanumeric strings consistent with Spain's driver's license number. Requires eight digits followed by a single letter or digit. The final character acts as a checksum. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Spain`, `Discovered.PHI`, `Discovered.Entity.Drivers License Number`. |
SPAIN_NIE_NUMBER | Detects strings consistent with Spain's Foreigner Identification Number (NIE). Requires a nine-character string: an initial X, Y, or Z, followed by seven digits, then an optional hyphen or space and a single checksum character. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Spain`, `Discovered.PHI`, `Discovered.Entity.NIE Number`. |
SPAIN_NIF_NUMBER | Detects strings consistent with Spain's Tax Identification Number (NIF). Requires eight digits followed by an optional hyphen or space and a single checksum character. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Spain`, `Discovered.PHI`, `Discovered.Entity.NIF Number`. |
SPAIN_PASSPORT | Detects strings consistent with Spain's passport number. Requires an eight- or nine-character string consisting of two or three letters followed by six digits. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Spain`, `Discovered.PHI`, `Discovered.Entity.Passport`. |
STREET_ADDRESS | Detects strings consistent with street addresses, primarily those following United States street naming conventions. Tags include `Discovered.Entity.Location`. |
SWEDEN_NATIONAL_ID_NUMBER | Detects numeric strings consistent with Sweden's National ID number. Requires a ten- or twelve-digit string that starts with a date in either the YYMMDD or YYYYMMDD format. An optional `-` or `+` character separates the date from the final four digits, the last of which is a checksum. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Sweden`, `Discovered.PHI`, `Discovered.Entity.National ID Number`. |
SWEDEN_PASSPORT | Detects numeric strings consistent with Sweden's passport number. Requires an eight-digit number. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Sweden`, `Discovered.PHI`, `Discovered.Entity.Passport`. |
SWIFT_CODE | Detects alphanumeric strings consistent with the SWIFT code format, also called the Bank Identifier Code (BIC). Tags include `Discovered.Entity.Swift Code`. |
THAILAND_NATIONAL_ID_NUMBER | Detects strings consistent with Thailand's National ID number. Requires a 13-digit number with optional spaces or hyphens (`-`) after the first, fifth, tenth, and twelfth digits. The final digit is a checksum. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.Thailand`, `Discovered.PHI`, `Discovered.Entity.National ID Number`. |
TIME | Detects strings consistent with times. Matches can contain both date and time components. Tags include `Discovered.Entity.Date`. |
UK_DRIVERS_LICENSE_NUMBER | Detects alphanumeric strings consistent with the United Kingdom's driver's license number. Requires either a 16- or 18-character string. The first five characters represent the driver's surname, padded with `9`s, followed by a single digit for the decade of birth, two digits for the month of birth (incremented by 50 for female drivers), two digits for the day of birth, one digit for the year of birth, two letters, an arbitrary digit, and two digits. Two additional digits identifying the license issue can follow. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.UK`, `Discovered.PHI`, `Discovered.Entity.Drivers License Number`. |
UK_NATIONAL_INSURANCE_NUMBER | Detects alphanumeric strings consistent with the United Kingdom's National Insurance number. Requires a nine-character string. The first two characters must be letters, followed by an optional space, then six digits with optional spaces or hyphens (`-`) after every two digits, and a final letter. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.UK`, `Discovered.PHI`, `Discovered.Entity.National Insurance Number`. |
UK_PASSPORT | Detects numeric strings consistent with the United Kingdom's passport number. Requires a nine-digit numeric string. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.UK`, `Discovered.PHI`, `Discovered.Entity.Passport`. |
UK_TAXPAYER_REFERENCE | Detects ten-digit numeric strings consistent with UK Taxpayer Reference (UTR) numbers. The final digit is a checksum. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.UK`, `Discovered.PHI`, `Discovered.Entity.Taxpayer Reference`. |
URL | Detects strings consistent with a Uniform Resource Locator (URL). The string must begin with `http://`, `https://`, `ftp://`, `file:///`, or `mailto:`, followed by a string and ending with a top-level domain of no more than 128 characters. Tags include `Discovered.Entity.URL`. |
US_ADOPTION_TAXPAYER_IDENTIFICATION_NUMBER | Detects numeric strings consistent with a United States Adoption Taxpayer Identification Number (ATIN). Requires a string similar in format to a US Social Security Number, but with an Area Number starting with 9 and with 93 as an allowed Group Number. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.US`, `Discovered.PHI`, `Discovered.Entity.Adoption Taxpayer ID Number`. |
US_BANK_ROUTING_MICR | Detects numeric strings consistent with an American Bankers Association (ABA) routing number. Must be a nine-digit number whose first digit is 0, 1, 2, 3, 6, or 7. The final digit is a checksum. Tags include `Discovered.Country.US`, `Discovered.Entity.Bank Routing MICR`. |
US_DEA_NUMBER | Detects alphanumeric strings consistent with a Drug Enforcement Administration (DEA) number assigned to a health care provider. Must be nine characters long: the first two characters must be alphanumeric, and the last seven must be digits. The final digit is a checksum. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.US`, `Discovered.Entity.DEA Number`. |
US_DRIVERS_LICENSE_NUMBER | Detects strings consistent with some US driver's license number formats. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.US`, `Discovered.PHI`, `Discovered.Entity.Drivers License Number`. |
US_EMPLOYER_IDENTIFICATION_NUMBER | Detects numeric strings consistent with a United States Employer Identification Number (EIN). Strings must contain nine digits with a hyphen after the second digit. Tags include `Discovered.Country.US`, `Discovered.Entity.Employer ID Number`. |
US_HEALTHCARE_NPI | Detects numeric strings consistent with a US National Provider Identifier (NPI). Strings must contain either 10 or 15 digits, and the final digit must be a valid checksum. Tags include `Discovered.PII`, `Discovered.Country.US`, `Discovered.Entity.Healthcare NPI`, `Discovered.Identifier Undetermined`. |
US_INDIVIDUAL_TAXPAYER_IDENTIFICATION_NUMBER | Detects numeric strings consistent with a United States Individual Taxpayer Identification Number (ITIN). Requires a string similar in format to a US Social Security Number, but with an Area Number starting with 9 and a limited set of allowed Group Numbers. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.US`, `Discovered.PHI`, `Discovered.Entity.Individual Taxpayer ID Number`. |
US_PASSPORT | Detects numeric strings consistent with a United States passport number. Strings must contain nine digits, and the column name or label should indicate a passport. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.US`, `Discovered.PHI`, `Discovered.Entity.Passport`. |
US_PREPARER_TAXPAYER_IDENTIFICATION_NUMBER | Detects strings consistent with a Preparer Taxpayer Identification Number (PTIN). Strings must have nine characters: a `P` followed by eight digits. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.US`, `Discovered.Entity.Preparer Taxpayer ID Number`. |
US_SOCIAL_SECURITY_NUMBER | Detects strings consistent with a US Social Security Number. Strings must contain nine digits in three parts: the three left-most digits are the area number, the middle two digits are the group number, and the four right-most digits are the serial number. For a column to be tagged, no part can be all zeros, and the area number cannot be 666 or fall in the range 900-999. Tags include `Discovered.PII`, `Discovered.Identifier Direct`, `Discovered.Country.US`, `Discovered.PHI`, `Discovered.Entity.Social Security Number`. |
US_STATE | Detects strings consistent with either the full name or the two-letter abbreviation of a US state or territory. Tags include `Discovered.Country.US`, `Discovered.Entity.State`. |
US_TOLLFREE_PHONE_NUMBER | Detects strings consistent with a US toll-free telephone number. Allowed area codes are 800, 88 followed by any digit, and 899. Tags include `Discovered.Country.US`, `Discovered.Entity.Tollfree Telephone Number`. |
VEHICLE_IDENTIFICATION_NUMBER | Detects strings consistent with Vehicle Identification Numbers (VINs). Requires a valid checksum and a valid World Manufacturer Identifier (WMI). Tags include `Discovered.Country.US`, `Discovered.Entity.Vehicle Identifier or Serial Number`. |
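
Several classifiers above require a valid checksum. Credit card numbers and Canadian Social Insurance Numbers use the Luhn (mod 10) check, as do the ten-digit form (YYMMDDNNNC) of the Swedish national ID and the US NPI, the latter after prepending the prefix `80840`. The following Python sketch shows that check on digit-only input. It is an illustration, not Immuta's implementation; the function names are hypothetical, and separators are assumed to be stripped beforehand.

```python
def luhn_valid(number: str) -> bool:
    """Return True if an ASCII digit string passes the Luhn (mod 10) check."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # equivalent to adding the two digits of the product
        total += d
    return total % 10 == 0


def npi_valid(npi: str) -> bool:
    """Hypothetical NPI check: 10-digit values get the 80840 prefix first."""
    if len(npi) == 10:
        return luhn_valid("80840" + npi)
    if len(npi) == 15:
        # Assumption: the 15-digit form already includes the 80840 prefix.
        return luhn_valid(npi)
    return False


print(luhn_valid("4111111111111111"))  # True (common test card number)
print(luhn_valid("046454286"))         # True (common SIN test value)
print(npi_valid("1234567893"))         # True
```

The same `luhn_valid` helper serves every Luhn-based classifier; only the input normalization differs.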
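
The part rules in the US_SOCIAL_SECURITY_NUMBER row translate directly into code. This sketch checks a single value and accepts either no separator or the same `-` or space separator used twice; that separator handling is an assumption, since the row does not specify it. The column-level tagging decision is not modeled, and the function name is hypothetical.

```python
import re

# AAA GG SSSS, with no separator or the same "-" or " " separator twice
# (an assumption; the classifier table does not describe separators).
_SSN_RE = re.compile(r"([0-9]{3})([- ]?)([0-9]{2})\2([0-9]{4})")


def ssn_structure_valid(value: str) -> bool:
    """Apply the area/group/serial rules from the table to one value."""
    m = _SSN_RE.fullmatch(value)
    if m is None:
        return False
    area, _sep, group, serial = m.groups()
    # No part may be all zeros.
    if area == "000" or group == "00" or serial == "0000":
        return False
    # Area number 666 and area numbers 900-999 are excluded.
    if area == "666" or area.startswith("9"):
        return False
    return True


print(ssn_structure_valid("123-45-6789"))  # True
print(ssn_structure_valid("912-34-5678"))  # False
```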
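
The US_BANK_ROUTING_MICR row mentions a checksum without naming it. ABA routing numbers are conventionally validated by multiplying the nine digits by the weights 3, 7, 1 (repeated three times) and requiring the weighted sum to be a multiple of 10. A minimal sketch, assuming that standard scheme is the one meant; the function name is hypothetical.

```python
def aba_routing_valid(number: str) -> bool:
    """Check an ABA routing number with the 3-7-1 weighted checksum."""
    if len(number) != 9 or not (number.isascii() and number.isdigit()):
        return False
    # First-digit restriction taken from the US_BANK_ROUTING_MICR row.
    if number[0] not in "012367":
        return False
    weights = (3, 7, 1) * 3
    total = sum(int(d) * w for d, w in zip(number, weights))
    return total % 10 == 0


print(aba_routing_valid("021000021"))  # True
print(aba_routing_valid("021000022"))  # False
```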
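
The IBAN_CODE row only states that a valid country code is required. The standard IBAN integrity check (ISO 13616, using ISO 7064 MOD 97-10) moves the first four characters to the end, replaces letters with numbers (A = 10 through Z = 35), and requires the resulting integer modulo 97 to equal 1. This sketch shows that check; whether Immuta applies it is an assumption, the country-code lookup is omitted, and the function name is hypothetical.

```python
def iban_checksum_valid(iban: str) -> bool:
    """ISO 7064 MOD 97-10 check used by IBANs (no country-code lookup)."""
    s = iban.replace(" ", "").upper()
    if len(s) < 5 or not (s.isascii() and s.isalnum()):
        return False
    # Two-letter country code followed by two check digits.
    if not (s[:2].isalpha() and s[2:4].isdigit()):
        return False
    rearranged = s[4:] + s[:4]
    # int(ch, 36) maps "0"-"9" to 0-9 and "A"-"Z" to 10-35.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


print(iban_checksum_valid("GB82 WEST 1234 5698 7654 32"))  # True
```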
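
The AUSTRALIA_TAX_FILE_NUMBER and AUSTRALIA_MEDICARE_NUMBER rows both require a checksum. The commonly documented schemes are a weighted sum modulo 11 over all nine TFN digits (weights 1, 4, 3, 7, 5, 8, 6, 9, 10) and, for Medicare, a weighted sum modulo 10 over the first eight digits (weights 1, 3, 7, 9, 1, 3, 7, 9) that must equal the ninth digit. A sketch assuming those schemes; the function names are hypothetical.

```python
def au_tfn_valid(tfn: str) -> bool:
    """Weighted modulo-11 check over a nine-digit Tax File Number."""
    digits = tfn.replace(" ", "")
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        return False
    weights = (1, 4, 3, 7, 5, 8, 6, 9, 10)
    return sum(int(d) * w for d, w in zip(digits, weights)) % 11 == 0


def au_medicare_valid(number: str) -> bool:
    """Weighted modulo-10 check of digits 1-8 against digit 9.

    Digit 10 (issue number) and the optional eleventh digit are not checked.
    """
    s = number.replace(" ", "").replace("/", "")
    if len(s) not in (10, 11) or not (s.isascii() and s.isdigit()):
        return False
    if s[0] not in "23456":  # first digit must be 2-6
        return False
    weights = (1, 3, 7, 9, 1, 3, 7, 9)
    check = sum(int(d) * w for d, w in zip(s[:8], weights)) % 10
    return check == int(s[8])


print(au_tfn_valid("123 456 782"))         # True
print(au_medicare_valid("2123 45670 1/1"))  # True
```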
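
The FINLAND_NATIONAL_ID_NUMBER row describes the DDMMYYCZZZQ layout. The check character Q is conventionally derived by reading the nine digits DDMMYYZZZ as one integer, taking it modulo 31, and indexing into the string `0123456789ABCDEFHJKLMNPRSTUVWXY`. A sketch under that assumption, with a hypothetical function name; it accepts only the three century markers listed in the table and does not validate the date itself.

```python
_HETU_CHECK_CHARS = "0123456789ABCDEFHJKLMNPRSTUVWXY"  # 31 characters
_CENTURY_MARKS = {"+", "-", "A"}  # 1800s, 1900s, 2000s per the table


def finland_id_valid(hetu: str) -> bool:
    """Check the DDMMYYCZZZQ layout and its modulo-31 check character."""
    if len(hetu) != 11:
        return False
    birth, century, individual, check = hetu[:6], hetu[6], hetu[7:10], hetu[10]
    if century not in _CENTURY_MARKS:
        return False
    nine_digits = birth + individual
    if not (nine_digits.isascii() and nine_digits.isdigit()):
        return False
    # The DDMMYY date is not checked for validity in this sketch.
    return _HETU_CHECK_CHARS[int(nine_digits) % 31] == check


print(finland_id_valid("131052-308T"))  # True
```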
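
For SPAIN_NIF_NUMBER and SPAIN_NIE_NUMBER, the checksum character is conventionally a letter chosen by taking the number modulo 23 and indexing into `TRWAGMYFPDXBNJZSQVHLCKE`; an NIE first replaces its leading X, Y, or Z with 0, 1, or 2. A sketch assuming that scheme; the function names are hypothetical.

```python
_CHECK_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"  # indexed by number % 23
_NIE_PREFIX_DIGIT = {"X": "0", "Y": "1", "Z": "2"}


def _normalize(value: str) -> str:
    # Drop the optional hyphen or space before the check character.
    return value.replace("-", "").replace(" ", "").upper()


def spain_nif_valid(value: str) -> bool:
    s = _normalize(value)
    number = s[:8]
    if len(s) != 9 or not (number.isascii() and number.isdigit()):
        return False
    return _CHECK_LETTERS[int(number) % 23] == s[8]


def spain_nie_valid(value: str) -> bool:
    s = _normalize(value)
    if len(s) != 9 or s[0] not in _NIE_PREFIX_DIGIT:
        return False
    # Replace the X/Y/Z prefix with its digit, then reuse the NIF check.
    return spain_nif_valid(_NIE_PREFIX_DIGIT[s[0]] + s[1:])


print(spain_nif_valid("12345678-Z"))  # True
print(spain_nie_valid("X1234567L"))   # True
```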
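
The VEHICLE_IDENTIFICATION_NUMBER row requires a checksum. In the North American VIN scheme, each character is transliterated to a number and multiplied by a position weight; the sum modulo 11 gives the check digit in position 9, with `X` standing for 10. The sketch below covers the check digit only and skips the World Manufacturer Identifier lookup; the function name is hypothetical.

```python
_VIN_VALUES = {
    **{str(d): d for d in range(10)},
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
}
_VIN_WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def vin_check_digit_valid(vin: str) -> bool:
    """Verify the position-9 check digit of a 17-character VIN."""
    vin = vin.upper()
    # I, O, and Q never appear in a VIN, so they are absent from the table.
    if len(vin) != 17 or any(ch not in _VIN_VALUES for ch in vin):
        return False
    total = sum(_VIN_VALUES[ch] * w for ch, w in zip(vin, _VIN_WEIGHTS))
    remainder = total % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected


print(vin_check_digit_valid("1M8GDM9AXKP042788"))  # True
```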
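
For US_DEA_NUMBER, the commonly documented check adds the first, third, and fifth digits to twice the sum of the second, fourth, and sixth digits; the last digit of that total must equal the seventh digit. A minimal sketch assuming that rule, with a hypothetical function name:

```python
import re

# Two alphanumeric characters followed by seven digits, as the row describes.
_DEA_SHAPE = re.compile(r"[A-Z0-9]{2}[0-9]{7}")


def dea_number_valid(dea: str) -> bool:
    """Check the shape and check digit of a DEA registration number."""
    dea = dea.upper()
    if _DEA_SHAPE.fullmatch(dea) is None:
        return False
    d = [int(ch) for ch in dea[2:]]
    # (1st + 3rd + 5th) + 2 * (2nd + 4th + 6th); last digit must equal the 7th.
    total = (d[0] + d[2] + d[4]) + 2 * (d[1] + d[3] + d[5])
    return total % 10 == d[6]


print(dea_number_valid("AB1234563"))  # True
```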
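
The two check digits in BRAZIL_CPF_NUMBER are conventionally computed with descending weights modulo 11: the first over the first nine digits with weights 10 down to 2, the second over the first ten digits with weights 11 down to 2. A remainder below 2 yields 0; any other remainder r yields 11 - r. A sketch assuming that rule, with a hypothetical function name; it ignores separators rather than enforcing their positions.

```python
def brazil_cpf_valid(cpf: str) -> bool:
    """Verify both CPF check digits; non-digit separators are ignored."""
    digits = [int(ch) for ch in cpf if ch.isascii() and ch.isdigit()]
    if len(digits) != 11:
        return False
    for n in (9, 10):  # compute the 10th digit, then the 11th
        weights = range(n + 1, 1, -1)  # 10..2, then 11..2
        total = sum(d * w for d, w in zip(digits[:n], weights))
        remainder = total % 11
        expected = 0 if remainder < 2 else 11 - remainder
        if digits[n] != expected:
            return False
    return True


print(brazil_cpf_valid("111.444.777-35"))  # True
```

Validators often also reject strings of one repeated digit, such as `111.111.111-11`, which pass this arithmetic; the table does not say whether Immuta does.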
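
For the P2PKH and P2SH formats in BITCOIN_INVOICE_ADDRESS, the checksum is Base58Check: the decoded address is a version byte, a 20-byte hash, and four checksum bytes that must equal the first four bytes of a double SHA-256 of the preceding 21 bytes. This sketch validates only those two formats (Bech32 uses a different checksum and is not covered); the function name is hypothetical.

```python
import hashlib

_B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check_address_valid(address: str) -> bool:
    """Validate a P2PKH ("1...") or P2SH ("3...") address checksum."""
    if not address or address[0] not in "13":
        return False
    n = 0
    for ch in address:
        idx = _B58_ALPHABET.find(ch)
        if idx < 0:
            return False  # 0, O, I, l and non-alphanumerics are excluded
        n = n * 58 + idx
    # Each leading "1" encodes one leading zero byte.
    leading_zeros = len(address) - len(address.lstrip("1"))
    raw = b"\x00" * leading_zeros + n.to_bytes((n.bit_length() + 7) // 8, "big")
    if len(raw) != 25:  # 1 version byte + 20-byte hash + 4-byte checksum
        return False
    payload, checksum = raw[:-4], raw[-4:]
    digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
    return digest[:4] == checksum


print(base58check_address_valid("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"))  # True
```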
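
The AGE row combines a value rule with a column-header rule. A minimal sketch of that combination for a single value follows. Splitting the header into alphabetic tokens (so that `page` does not count as `age`) and treating 10 and 199 as inclusive bounds are assumptions; the row says only that the header contains text such as `age`. The function name is hypothetical.

```python
import re

# Header keywords from the AGE row; token-based matching is an assumption.
_AGE_HEADER_TOKENS = {"age", "year", "years", "yr", "yrs"}


def looks_like_age(header: str, value: str) -> bool:
    """Return True if the header names an age and the value is 10-199."""
    tokens = re.split(r"[^a-z]+", header.lower())
    if not _AGE_HEADER_TOKENS.intersection(tokens):
        return False
    value = value.strip()
    return value.isascii() and value.isdigit() and 10 <= int(value) <= 199


print(looks_like_age("patient_age", "42"))  # True
print(looks_like_age("page_count", "42"))   # False
```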