Understanding Predefined DLP Dictionaries

Zscaler provides the following Data Loss Prevention (DLP) dictionaries. Dictionaries marked with an asterisk (*) are not supported for Endpoint DLP. To learn more, see About Endpoint Data Loss Prevention (DLP). To learn more about configuring predefined DLP dictionaries, see Editing Predefined DLP Dictionaries.

  • This dictionary detects Aadhaar Unique Identification (UID) numbers from India.

    The popular format for an Aadhaar UID number is a 12-digit number, separated after the 4th and 8th digits by a delimiter. The delimiter can be a period, hyphen, or space.

    The following are examples of popular formats:

    • NNNNNNNNNNNN
    • NNNN-NNNN-NNNN

    This dictionary uses the Verhoeff checksum.
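As a rough illustration of what a Verhoeff check involves, the following Python sketch validates a digit string with the standard published Verhoeff tables. The function name `verhoeff_valid` and the delimiter stripping are assumptions made for this example; it is not Zscaler's implementation and it does not model the confidence levels described below.

```python
# Standard Verhoeff tables: D is the dihedral-group multiplication table,
# P is the position-dependent permutation table.
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]


def verhoeff_valid(number: str) -> bool:
    """Return True if the number (check digit last) passes the Verhoeff check."""
    # Assumption: periods, hyphens, and spaces are treated as delimiters.
    cleaned = number.replace(" ", "").replace("-", "").replace(".", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        return False
    c = 0
    # Process digits from right to left, starting with the check digit.
    for i, ch in enumerate(reversed(cleaned)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0
```

With this sketch, `verhoeff_valid("8384-2795-9970")` and `verhoeff_valid("216167293627")` both pass, while changing any single digit makes the check fail.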

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the Aadhaar UID number matches a valid range.

    The Aadhaar UID number can contain:

    • A period, hyphen, or space as delimiters. Multiple periods, hyphens, and spaces are allowed.
    • An alphabetical boundary.

    The number formats that can trigger the dictionary are:

    • Q2161 6729 3627O
    • 8384-2795-9970

    A number format like 2@075-8515-612d5 does not trigger the dictionary.

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The Aadhaar UID number is in a popular format.

    The Aadhaar UID number can contain a non-alphanumeric boundary. It cannot be bound by alphabetical characters.

    The Aadhaar UID number must use the same delimiters for the full number.

    The number formats that can trigger the dictionary are:

    • @216167293627@
    • 8384-2795-9970

    The number formats that do not trigger the dictionary are:

    • 2075-8515/6125
    • F838427959970B
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The Aadhaar UID number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, Aadhaar Card, UID, or UIDAI number.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • @216167293627@
    • 8384-2795-9970

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 2075-8515/6125
    • F838427959970B
  • This dictionary detects ABA routing transit numbers from the United States.

    The popular format for an ABA routing transit number is a 9-digit number that has no delimiters.

    An example of a popular format is NNNNNNNNN.

    This dictionary uses the Luhn checksum.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the ABA routing transit number matches a valid range.

    The ABA routing transit number can contain a period, hyphen, or space as delimiters. Multiple periods, hyphens, and spaces are allowed.

    The number formats that can trigger the dictionary are:

    • 021---302---567
    • 053 9021 97
    • QQ036001808//
    • 011...600...033

    The number formats that do not trigger the dictionary are:

    • 50dd86d97067
    • aa8543210bb74
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The ABA routing transit number is in a popular format.

    The ABA routing transit number cannot contain delimiters such as periods, hyphens, or spaces.

    The number formats that can trigger the dictionary are:

    • 053902197
    • 036001808
    • 011600033

    The number formats that do not trigger the dictionary are:

    • 021---302---567
    • 50dd86d97067
    • aa8543210bb74
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The ABA routing transit number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, aba routing number, aba number, american bank association routing number, bank routing number, or routing transit number.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 021302567
    • 053902197
    • 036001808
    • 011600033

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 50dd86d97067
    • aa8543210bb74
  • This dictionary is disabled by default. To access this feature, contact your Zscaler Account team.

    This dictionary detects content related to addresses from Japan.

    This dictionary does not use a checksum.

    The following are examples of acceptable formats for Japanese Addresses:

    • 〒064-0807 北海道札幌市中央区南七条西2ー4ー4 (full address with hyphen as delimiter)
    • 〒064-0807 北海道札幌市中央区南七条西2丁目4番4号 (full address with kanji)
    • 札幌市中央区南七条西2の4の4 (postal code is optional)
    • 札幌市中央区南七条西2丁目4-4 (postal code is optional)
    • 新宿区高田馬場1-7-2 (postal code and prefecture are optional)
    • 新宿区高田馬場1丁目7の2 (postal code and prefecture are optional)

    You can also specify an Action to configure how the dictionary evaluates matching addresses:

    • Count All: The dictionary counts all matches of the address, including identical addresses, toward the match count.
    • Count Unique: The dictionary counts each unique match of the address toward the match count only once, regardless of how many times the address appears.
  • This dictionary detects adult or mature content.

    The popular format for adult or mature content is sexually explicit keywords and profanity.

    This dictionary does not use a checksum.

    You can modify the Confidence Score Threshold. The threshold sets how high the bar is for the dictionary to identify a violation and trigger on it. To learn more, see Configuring the Confidence Score Threshold.

  • This dictionary detects passport numbers (AUPP) from Australia.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the AUPP number matches a valid range.

    A number format like N1234567 triggers the dictionary.

    The number formats that do not trigger the dictionary are:

    • N 1234567
    • N12345 67
    • Q1234567 (invalid first letter)
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The AUPP number is in a popular format.

    A number format like N1234567 triggers the dictionary.

    The number formats that do not trigger the dictionary are:

    • N 1234567
    • N12345 67
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The AUPP number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, passport, passport number, passportno, and passport num.

    A number format like N1234567 triggers the dictionary.

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • N 1234567
    • N12345 67
  • This dictionary detects uniform civil numbers (EGN) from Bulgaria.

    This dictionary uses the Modulo 11 checksum.
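For readers unfamiliar with the scheme, here is a minimal Python sketch of the publicly documented EGN check digit: the first nine digits are multiplied by fixed weights, the sum is reduced modulo 11, and a remainder of 10 maps to 0. The function name `egn_checksum_valid` is hypothetical, and whether Zscaler applies exactly these steps is an assumption.

```python
# Publicly documented EGN weights for the first nine digits.
EGN_WEIGHTS = (2, 4, 8, 5, 10, 9, 7, 3, 6)


def egn_checksum_valid(egn: str) -> bool:
    """Return True if a 10-digit EGN has a matching modulo 11 check digit."""
    if len(egn) != 10 or not (egn.isascii() and egn.isdigit()):
        return False
    # zip() stops after nine weights, so the check digit itself is excluded.
    total = sum(int(d) * w for d, w in zip(egn, EGN_WEIGHTS))
    check = total % 11 % 10  # a remainder of 10 is written as 0
    return check == int(egn[9])
```

Under this sketch, `7523169263` passes and `7523169264` fails, which agrees with the examples in the table that follows.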

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the EGN number matches a valid range.

    The number formats that can trigger the dictionary are:

    • 752 316 9263
    • 7 -52/.316 926...- 3

    A number format like 7523169264 (invalid checksum) does not trigger the dictionary.

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The EGN number is in a popular format.

    A number format like 7523169263 triggers the dictionary.

    The number formats that do not trigger the dictionary are:

    • 752 316 9263
    • 7 -52/.316 926...- 3
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The EGN number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, bucn, egn, vim, uniform civil number, and unified civil number.

    A number format like 7523169263 triggers the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases.

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 752 316 9263
    • 7 -52/.316 926...- 3
  • This dictionary detects citizen service numbers (BSN) from the Netherlands.

    The popular format for a citizen service number is a 9-digit number that can be separated by a hyphen after 3 digits.

    An example of a popular format is NNNNNNNNN.

    This dictionary uses the Mod 11 Check Digit checksum. This checksum is similar to the Luhn checksum.
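The check commonly documented for BSN numbers (often called the "11-proef") can be sketched in Python as follows: the nine digits are weighted 9 down to 2, the last digit is weighted -1, and the total must be divisible by 11. The function name `bsn_checksum_valid` is hypothetical, and it is an assumption that the dictionary's Mod 11 check follows this public scheme.

```python
# Weights of the publicly documented BSN check; note the final weight is -1.
BSN_WEIGHTS = (9, 8, 7, 6, 5, 4, 3, 2, -1)


def bsn_checksum_valid(bsn: str) -> bool:
    """Return True if a 9-digit BSN passes the weighted modulo 11 check."""
    if len(bsn) != 9 or not (bsn.isascii() and bsn.isdigit()):
        return False
    total = sum(int(d) * w for d, w in zip(bsn, BSN_WEIGHTS))
    # Reject the all-zero number, which would otherwise pass trivially.
    return total % 11 == 0 and int(bsn) != 0
```

The delimiter-free examples used in this section (`096195435`, `244452155`, and `041143401`) all pass this check.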

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the citizen service number matches a valid range.

    The number formats that can trigger the dictionary are:

    • 132...240774
    • 096195435
    • 25 8662700
    • 2444--52155
    • 041143401
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The citizen service number is in a popular format.

    The number formats that can trigger the dictionary are:

    • 096195435
    • @244452155@
    • 041143401

    The number formats that do not trigger the dictionary are:

    • 132...240774
    • 25 8662700
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The citizen service number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, Citizen service number, BSN, Burgersservicenumber, Sofinummer, Persoonsgebonden nummer, Persoonsnummer, or Personal Number.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 096195435
    • @244452155@
    • 041143401

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 132...240774
    • 25 8662700
  • This dictionary detects the 14-digit Brazilian National Registry of Legal Entities (CNPJ) number.

    This dictionary uses the Mod 11 checksum.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the CNPJ number matches a valid range.

    The number formats that can trigger the dictionary are:

    • 44 455 566 0001 88
    • 44.455/5660001...88
    • 44 -455/ .566 0001...-88

    A number format like 44455566000185 does not trigger the dictionary (bad checksum).

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The CNPJ number is in a popular format.

    A number format like 44.455.566/0001-88 triggers the dictionary.

    The number formats that do not trigger the dictionary are:

    • 44455566000188
    • 44 455 566 0001 88
    • 44 -455/ .566 0001...-88
    • 44.455.566/0001 88
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The CNPJ number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, CNPJ, Cadastro Nacional da Pessoa Jurídica, National Registry of Legal Entities, and CNPJ#.

    A number format like 44.455.566/0001-88 triggers the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases.

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 44455566000188
    • 44 455 566 0001 88
    • 44 -455/ .566 0001...-88
    • 44.455.566/0001 88
  • This dictionary detects corporate finance documents, such as earnings reports and Form 10-K.

    Zscaler supports only the following document types for corporate finance documents: RTF, PDF, MSG, DOC, DOCX, DOCM, DOTX, DOTM, XLS, XLSX, XLSM, XLTM, PPT, PPTX, PPSX, PPTM, POTM, POTX, and IWORK (pages, numbers, and keynote). The minimum size required for a document is 1 KB.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria
    Low

    This dictionary counts an instance as a violation if the Machine Learning (ML) match score is 50 or more.

    Medium

    This dictionary counts an instance as a violation if the ML match score is 70 or more.

    High

    This dictionary counts an instance as a violation if the ML match score is 90 or more.
  • This dictionary detects Corporate Numbers from Japan.

    You can modify the Confidence Score Threshold:

    • Low: The dictionary counts an instance as a violation if it matches a valid range.
    • Medium: The dictionary counts an instance as a violation if:
      • The requirements of Low Confidence are met.
      • The Corporate Number is in a popular format.
    • High: The dictionary counts an instance as a violation if:
      • The requirements of Medium Confidence are met.
      • The Corporate Number is accompanied by any of the dictionary’s default or custom high confidence phrases.
  • This dictionary detects court documents, such as attorney forms and witness subpoenas.

    Zscaler supports only the following document types for court documents: RTF, PDF, MSG, DOC, DOCX, DOCM, DOTX, DOTM, XLS, XLSX, XLSM, XLTM, PPT, PPTX, PPSX, PPTM, POTM, POTX, and IWORK (pages, numbers, and keynote). The minimum size required for a document is 1 KB.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria
    Low

    This dictionary counts an instance as a violation if the Machine Learning (ML) match score is 50 or more.

    Medium

    This dictionary counts an instance as a violation if the ML match score is 70 or more.

    High

    This dictionary counts an instance as a violation if the ML match score is 90 or more.
  • This dictionary allows you to select one or more credentials and secrets dictionaries (e.g., tokens, keys, and passwords) for the following:

    • Amazon MWS Auth Token
    • Git Token
    • GitHub Token
    • Google API Key
    • Google OAuth Access Token
    • Google OAuth ID
    • JWT Token
    • PayPal Braintree Access Token
    • Picatic API Key
    • Private Key
    • SendGrid API Key
    • Slack Access Token
    • Slack Webhook
    • Square Access Token
    • Square OAuth Secret
    • Stripe API Key

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria
    Medium

    This dictionary counts an instance as a violation if any of the selected sensitive credentials and secrets are matched.

    High

    This dictionary counts an instance as a violation if any of the selected sensitive credentials and secrets are matched with a custom high confidence phrase.

    If you set the Confidence Score Threshold to High, you must specify at least one Custom High Confidence Phrase.

  • This dictionary detects content related to credit card numbers.

    When you add the Credit Cards dictionary to a DLP engine, you must configure the match count. To learn more, see Configuring the Match Count.

    The popular format for a credit or debit card number is 12 to 19 numeric digits, separated after every 4 digits by a delimiter. The delimiter must be used consistently and spaced correctly, and it can be a slash, period, hyphen, or space.

    The following are examples of popular formats:

    • AMEX (15 digits)
      • NNNN<delimiter>NNNNNN<delimiter>NNNNN
    • Diner's Club (14 digits)
      • NNNN<delimiter>NNNNNN<delimiter>NNNN
    • China UnionPay (16 digits beginning with 62 or 60)
      • 62NNNNNNNNNNNNN
      • 62NNNNNNNNNNNNNNNNN
    • Maestro Debit Card (16–19 digits starting with 50, 56, 57, 58, 6013, 62, 63, or 67)
      • 50NN-NNNN-NNNN
      • 6013 NNNN NNNN
    • Other Credit Cards (16 digits)
      • NNNN<delimiter>NNNN<delimiter>NNNN<delimiter>NNNN
      • NNNN NNNN NNNN NNNN

    This dictionary uses the Luhn checksum.
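To make the Luhn check concrete, the following Python sketch strips periods, hyphens, and spaces and then applies the standard Luhn algorithm. The function name `luhn_valid` and the choice of which delimiters to strip are assumptions for illustration; the sketch does not model provider lengths, starting ranges, or the confidence levels described below.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Assumption: periods, hyphens, and spaces are treated as delimiters.
    cleaned = number.replace(" ", "").replace("-", "").replace(".", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, ch in enumerate(reversed(cleaned)):
        digit = int(ch)
        if i % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

For example, `luhn_valid("4716 7361 1384 2732")` returns `True` and `luhn_valid("4716 7361 1384 2731")` returns `False`, which agrees with the Low confidence examples in the table that follows.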

    You can also specify whether to include or exclude specific Bank Identification Numbers (BIN) in credit card dictionaries. You can configure up to 512 BIN values. To learn more, see Editing Predefined DLP Dictionaries.

    To use this dictionary to detect credit card numbers, the following guidelines apply:

    These guidelines might not apply to certain use cases. You can adjust your dictionary configurations according to your DLP needs.

    • To catch a small exfiltration of numbers in documents:
      • Set a Confidence Score Threshold of High for the dictionary.
      • When adding a dictionary to an engine, set a low value for the dictionary's match count.
    • To catch a large exfiltration of numbers in spreadsheets or other file types:
      • Set a Confidence Score Threshold of Medium for the dictionary.
      • When adding a dictionary to an engine, set a high value for the dictionary’s match count.
    • In general, configuring a dictionary with a Low Confidence Score Threshold and a low match count value results in too many false positives. Zscaler recommends setting a Confidence Score Threshold of High when the dictionary's match count value is low.
    • Rich Text Format (RTF) files contain formatting code that can mimic credit card and social security numbers, which affects when a DLP rule is triggered. Plain text files do not contain this formatting code, so the DLP rule triggers as expected. To ensure that the DLP policy triggers when confidential numbers are leaked in RTF files, do one of the following:
      • Set any value greater than 1 for the dictionary’s match count in the engine.
      • Set a Confidence Score Threshold value of High.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if:

    • The credit card number matches the length of at least one provider.
    • The credit card number can be validated by Luhn checksum.

    The credit card number can contain a period, hyphen, or space as delimiters. Multiple periods, hyphens, and spaces are allowed.

    The number formats that can trigger the dictionary are:

    • 491-6710478214413
    • 491-67...104782 14413
    • 47 1697--47625799...21
    • 36839-32 9773518
    • 53721 653983--86508
    • 4716 7361 13842732 (Valid length and can be validated by Luhn checksum)

    The number formats that do not trigger the dictionary are:

    • aa5269946762375011aa
    • 4716 7361 1384 2731 (Cannot be validated by the Luhn checksum)
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The credit card number is in a popular format.
    • The length and starting range of the credit card number match those of a credit card provider.

    The credit card number must use the same delimiters for the full number. The delimiters must also be equally spaced.

    The number formats that can trigger the dictionary are:

    • 4716-9747-6257-9921
    • 5372165398386508
    • 4716 7361 1384 2732

    The number formats that do not trigger the dictionary are:

    • 4916:7104-7821 4413
    • 49167-104-7821-4413
    • a3683:9329:773-518a
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The credit card number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, Amex, MasterCard, Visa, CVV Code, CCV Number, select card type, Discover, Diners Club, jcb, pay with checking account, pay check money order, credit card number, card holder name, or expiration date.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 4716-9747-6257-9921
    • 5269:9467:6237:5011
    • 5372165398386508
    • 4716 7361 1384 2732

    A number like a3683:9329:773-518a does not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases.

    If a violation is enough to trigger a DLP policy and you have configured a DLP email notification, your auditors receive an email. If you have also included the ${DLPTRIGGERS} macro, the email includes what content triggered the violation.

  • This dictionary uses phrase matching to detect content related to disease information. It does not use a checksum.

    You can specify an Action to configure how the dictionary evaluates matching disease names:

    • Count All: The dictionary counts all matches of the disease name, including identical disease names, toward the match count.
    • Count Unique: The dictionary counts each unique match of the disease name toward the match count only once, regardless of how many times the disease name appears.
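The difference between the two actions can be sketched in a few lines of Python. The function name `match_count`, the action labels `"count_all"` and `"count_unique"`, and the use of exact string comparison to decide what counts as identical are all assumptions for illustration, not product settings.

```python
def match_count(matches: list[str], action: str) -> int:
    """Return the match count for a list of matched phrases."""
    if action == "count_all":
        # Every occurrence counts, including repeats of the same phrase.
        return len(matches)
    if action == "count_unique":
        # Each distinct phrase counts once, however often it appears.
        return len(set(matches))
    raise ValueError(f"unknown action: {action}")


matches = ["influenza", "asthma", "influenza"]
print(match_count(matches, "count_all"))     # 3
print(match_count(matches, "count_unique"))  # 2
```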
  • This dictionary allows you to select driver's license dictionaries for one or more of the 50 U.S. states plus the District of Columbia.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria
    Medium

    This dictionary counts an instance as a violation if the selected driver's license is matched.

    High

    The requirements of Medium Confidence are met and the driver's license is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, Drivers License, Drivers License number, DL#, or 2-letter state codes (e.g., CA for California or WA for Washington).

    The Proximity Length field appears when the dictionary’s Confidence Score Threshold is High. The proximity length defines how close a high confidence phrase must be to an instance of the pattern (that the dictionary detects) to count as a match. The phrase can be located in any direction from the pattern within the document. Enter a value from 0–10,000 bytes. A proximity length of 0 disables this option (i.e., the phrase can be any distance from the pattern).
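One way to picture the proximity length is as a byte window around the detected pattern. The Python sketch below rests on several assumptions that the text above does not confirm: the window is measured from both edges of the pattern, the whole phrase must fall inside the window, and matching is case-insensitive. The function name `phrase_in_proximity` and its parameters are hypothetical.

```python
def phrase_in_proximity(document: bytes, pattern_start: int, pattern_end: int,
                        phrase: bytes, proximity: int) -> bool:
    """Return True if `phrase` occurs within `proximity` bytes of the pattern.

    pattern_start and pattern_end are byte offsets of the detected pattern
    (end exclusive). A proximity of 0 disables the distance limit.
    """
    haystack = document.lower()
    needle = phrase.lower()
    if proximity == 0:
        # Option disabled: the phrase can be any distance from the pattern.
        return needle in haystack
    # Look in both directions from the pattern, clamped to the document.
    window_start = max(0, pattern_start - proximity)
    window_end = min(len(haystack), pattern_end + proximity)
    return needle in haystack[window_start:window_end]
```

With a proximity of 20 bytes, a phrase directly before the license number is found, while a phrase 100 bytes further on is found only when the proximity is raised or set to 0.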

  • This dictionary uses phrase matching to detect content related to drug information. It does not use a checksum.

    You can specify an Action to configure how the dictionary evaluates matching drug names:

    • Count All: The dictionary counts all matches of the drug name, including identical drug names, toward the match count.
    • Count Unique: The dictionary counts each unique match of the drug name toward the match count only once, regardless of how many times the drug name appears.
  • This dictionary detects content related to financial statements. It identifies a financial document by using machine learning to cluster keywords that typically occur in financial documents, such as Accounts Payable, Common Stock, and Current Liabilities.

    This dictionary does not use a checksum.

    You can modify the Confidence Score Threshold. The threshold sets how high the bar is for the dictionary to identify a violation and trigger on it. To learn more, see Configuring the Confidence Score Threshold.

  • This dictionary detects content related to First Names from Japan.

    This dictionary does not use a checksum.

    The following are examples of acceptable formats for Japanese First Names:

    • 三郎
    • 清子
    • 美代子
    • 四郎

    You can also specify an Action to configure how the dictionary evaluates matching names:

    • Count All: The dictionary counts all matches of the name, including identical names, toward the match count.
    • Count Unique: The dictionary counts each unique match of the name toward the match count only once, regardless of how many times the name appears.
  • This dictionary detects fiscal codes from Italy.

    The popular format for an Italian fiscal code is a 16-character alphanumeric code that has no delimiters.

    An example of a popular format is FFF-NNN-YYMDD-RRRRC. It has the following structure:

    • FFF: Represents the first three letters of the last name.
    • NNN: Represents the first three letters of the first name.
    • YYMDD: Represents the date of birth.
      • YY: Represents the year.
      • M: Represents the month. It is represented by a letter that maps to a month.
      • DD: Represents the date.
    • RRRR: Represents the town of birth. It is represented by alphanumeric characters.
    • C: Represents the checksum value. It is represented by a letter.

    This dictionary uses the Mod 26 Check Digit checksum (maps to a letter A-Z).
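The publicly documented check letter for the Italian fiscal code can be sketched in Python as follows: characters in odd positions (1st, 3rd, and so on) are converted with a lookup table, characters in even positions take their plain value, and the sum modulo 26 maps to a letter A-Z. The function names are hypothetical, and it is an assumption that the dictionary's Mod 26 check follows this public scheme. The sketch strips delimiters without checking their positions, so it does not model the format rules described below.

```python
# Odd-position values for A..Z; the digits 0..9 reuse the first ten entries.
ODD_VALUES = (1, 0, 5, 7, 9, 13, 15, 17, 19, 21, 2, 4, 18,
              20, 11, 3, 6, 8, 12, 14, 16, 10, 22, 25, 24, 23)


def _char_index(ch: str) -> int:
    """Map '0'-'9' to 0-9 and 'A'-'Z' to 0-25."""
    return int(ch) if ch.isdigit() else ord(ch) - ord("A")


def fiscal_code_check_letter(first15: str) -> str:
    """Compute the check letter from the first 15 characters."""
    total = 0
    for pos, ch in enumerate(first15):
        idx = _char_index(ch)
        # pos 0 is the 1st character, which is an odd position.
        total += ODD_VALUES[idx] if pos % 2 == 0 else idx
    return chr(ord("A") + total % 26)


def fiscal_code_valid(code: str) -> bool:
    """Return True if the 16th character matches the computed check letter."""
    cleaned = "".join(ch for ch in code.upper() if ch not in " .-")
    if len(cleaned) != 16 or not (cleaned.isascii() and cleaned.isalnum()):
        return False
    return fiscal_code_check_letter(cleaned[:15]) == cleaned[15]
```

Both `RSSMRA70A41F205Z` and `KAYSAN92D03L246A`, the sample codes used in this section, pass this check.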

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The Fiscal Code (Italy) dictionary is a regex-based dictionary with strict formatting, even at low confidence. Special characters must be in the right positions. A mix of different special characters is allowed, as are consecutive special characters, as long as they are in the right positions.

    The dictionary counts an instance as a violation if the Italian fiscal code matches a valid range.

    The Italian fiscal code can contain only a period, hyphen, or space as delimiters. Multiple periods are allowed for a low confidence score.

    The number formats that can trigger the dictionary are:

    • RSS MRA 70A41 F205Z
    • KAYSAN92D03L246A
    • RSS MRA- ... 70A41F205Z
    • KAYSAN 92D03L246 ... A

    The number formats that do not trigger the dictionary are:

    • RSS MRA70A41F20 5Z
    • KAYSAN9 2D03L246A
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The Italian fiscal code is in a popular format.

    The number formats that can trigger the dictionary are:

    • RSS MRA 70A41 F205Z
    • KAYSAN92D03L246A

    The number formats that do not trigger the dictionary are:

    • RSS MRA- ... 70A41F205Z
    • KAYSAN 92D03L246 ... A
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The Italian fiscal code is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, codice fiscal, repubblica italiana, Italian fiscal code, or Italian tax code.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • RSS MRA 70A41 F205Z
    • KAYSAN92D03L246A

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • RSS MRA- ... 70A41F205Z
    • KAYSAN 92D03L246 ... A
  • This dictionary detects content related to Full Names from Japan.

    This dictionary does not use a checksum.

    For example, if the First Name is 富市, and the Last Name is 村山, the following are examples of acceptable formats for Japanese Full Names:

    • 村山富市 (no delimiters)
    • 村山 富市 (spaces as delimiter)
    • 村山 富市 (unicode space as delimiter)
    • 村山,富市 (ASCII comma as delimiter)

    You can also specify an Action to configure how the dictionary evaluates matching names:

    • Count All: The dictionary counts all matches of the name, including identical names, toward the match count.
    • Count Unique: The dictionary counts each unique match of the name toward the match count only once, regardless of how many times the name appears.
  • This dictionary detects content related to gambling. It detects documents that have keywords related to both online and physical gambling and betting.

    This dictionary does not use a checksum.

    You can modify the Confidence Score Threshold. The threshold sets how high the bar is for the dictionary to identify a violation and trigger on it. To learn more, see Configuring the Confidence Score Threshold.

  • This dictionary detects Resident Identity Card numbers from China.

    The popular format for a Resident Identity Card number is an 18-character number, but the last character can either be a digit or the character X (case insensitive).

    The following are examples of popular formats:

    • NNNNNNNNNNNNNNNNNX
    • NNNNNNNNNNNNNNNNNN

    This dictionary uses the Mod 11 Check Digit checksum. This checksum is similar to the Luhn checksum.
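The check character of an 18-character Resident Identity Card number is publicly documented as a weighted modulo 11 scheme in which a remainder maps to one of the characters `1 0 X 9 8 7 6 5 4 3 2`. The Python sketch below follows that public scheme; the function name `resident_id_valid` is hypothetical, and it is an assumption that the dictionary applies the same steps.

```python
# Publicly documented weights for the first 17 digits.
ID_WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
# Check character indexed by (weighted sum mod 11).
CHECK_CHARS = "10X98765432"


def resident_id_valid(number: str) -> bool:
    """Return True if the 18th character matches the computed check character."""
    number = number.upper()  # the trailing X is case insensitive
    if len(number) != 18:
        return False
    body, check = number[:17], number[17]
    if not (body.isascii() and body.isdigit()):
        return False
    total = sum(int(d) * w for d, w in zip(body, ID_WEIGHTS))
    return CHECK_CHARS[total % 11] == check
```

Both `36072719830115027X` and `342501199207064058` from the examples in this section pass this check.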

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the Resident Identity Card number matches a valid range.

    The number formats that can trigger the dictionary are:

    • 36--2331198--00322524X
    • 3607 2719830115027X
    • 3425011992...07064058

    A number format like 1404011998vv05055835 does not trigger the dictionary because it contains alphabetical characters.

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The Resident Identity Card number is in a popular format.

    The number formats that can trigger the dictionary are:

    • 36072719830115027X
    • 342501199207064058

    The number formats that do not trigger the dictionary are:

    • a36233119800322524X (Starts with a character)
    • a1404011998vv05055835 (Starts with a character)
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The Resident Identity Card number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, Chinese identity card number.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 36233119800322524X
    • 36072719830115027X
    • 342501199207064058
    • 140401199805055835
  • This dictionary detects Hong Kong identity card (HKID) numbers from Hong Kong.

    The popular format for an HKID number is 8 or 9 alphanumeric characters without delimiters. It starts with one or two letters, followed by 6 random digits, and ends with a checksum character that must be enclosed in parentheses. The checksum character can be either a digit or the letter "A" (case insensitive). For example, P553722(7), MR427885(6), and FK057839(A).

    This dictionary uses the Mod 11 Check Digit checksum. This checksum is similar to the Luhn checksum.
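
    The sketch below shows how the HKID check character can be verified with the publicly documented weighted Mod 11 rule. It accepts only the parenthesized popular format. The function name is hypothetical, and it is an assumption that the dictionary computes the checksum the same way.

```python
import re

HKID_RE = re.compile(r"([A-Z]{1,2})([0-9]{6})\(([0-9A])\)")


def hkid_checksum_ok(hkid: str) -> bool:
    """Return True if an HKID such as P553722(7) has a valid check character."""
    m = HKID_RE.fullmatch(hkid.upper())
    if not m:
        return False
    prefix, digits, check = m.groups()
    # A one-letter prefix is padded with a leading space, valued at 36.
    values = [36] if len(prefix) == 1 else []
    values += [ord(c) - ord("A") + 10 for c in prefix]  # A=10 ... Z=35
    values += [int(d) for d in digits]
    # Weights run from 9 down to 2 across the eight leading positions.
    total = sum(v * w for v, w in zip(values, range(9, 1, -1)))
    expected = (11 - total % 11) % 11
    return check == ("A" if expected == 10 else str(expected))
```

    With this rule, the three examples P553722(7), MR427885(6), and FK057839(A) all pass.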

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the HKID number matches a valid range.

    HKID is a regex-based dictionary, and is strict in formatting, even in low confidence. No special characters are allowed other than parentheses.

    If the parentheses are not balanced, it does not trigger the dictionary. There can be only a single pair of parentheses. Nested parentheses do not trigger the dictionary.

    The number formats that can trigger the dictionary are:

    • P5537227
    • P553722(7)
    • P553722(A)

    The number formats that do not trigger the dictionary are:

    • P553722-7-
    • P553722(7
    • P5537227)
    • P553722((7))
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The HKID number is in a popular format.

    The number format that can trigger the dictionary is P553722(7).

    The number format that does not trigger the dictionary is P5537227.

    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The HKID number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, Hong Kong Identity Card, HKIC, HKID, Identity Card, or Hong Kong Permanent Resident ID Card.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • P553722(7)
    • P553722(A)
  • This dictionary detects National Identity Card (MyKad) numbers from Malaysia.

    The popular format for a MyKad number is a 12-digit number in the form YYMMDDSS###G. The first group of numbers (YYMMDD) is the date of birth. The second group of numbers (SS) represents the holder's place of birth: a state (01-13), a federal territory (14-16), or a country of origin (60-85). The last group of numbers (###G) is a randomly generated serial number with no identifiable pattern. The last digit (G) is odd for a male and even for a female.
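
    To make the field layout concrete, the following sketch splits a 12-digit MyKad number into the groups described above. The function name and the returned keys are hypothetical, and the place-of-birth ranges are only the ones listed in this article.

```python
from datetime import datetime


def parse_mykad(number: str) -> dict:
    """Split an undelimited MyKad number into its documented fields."""
    if len(number) != 12 or not (number.isascii() and number.isdigit()):
        raise ValueError("expected exactly 12 digits")
    birth, place, serial = number[:6], int(number[6:8]), number[8:]
    datetime.strptime(birth, "%y%m%d")  # raises ValueError for an impossible date
    if 1 <= place <= 13:
        kind = "state"
    elif 14 <= place <= 16:
        kind = "federal territory"
    elif 60 <= place <= 85:
        kind = "country of origin"
    else:
        kind = "unknown"  # outside the ranges listed in this article
    sex = "male" if int(serial[-1]) % 2 else "female"
    return {"birth_yymmdd": birth, "place_code": place,
            "place_kind": kind, "serial": serial, "sex": sex}
```

    For example, 261231129312 splits into the birth date 261231, the place code 12 (a state), and the serial 9312, whose even last digit indicates a female holder.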

    This dictionary does not use a checksum.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the MyKad number matches a valid range.

    The number formats that can trigger the dictionary are:

    • 13...0125281813
    • 261231129312
    • 170 726145727
    • @240921167814@
    • 16022--5148988

    A number format like 1007192dd33724 does not trigger the dictionary.

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The MyKad number is in a popular format.

    The number formats that can trigger the dictionary are:

    • 261231129312
    • @240921167814@

    The number formats that do not trigger the dictionary are:

    • 13...0125281813
    • 170 726145727
    • 16022--5148988
    • A100719233724A
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The MyKad number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, mykad, Malaysian nric, or mypr.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 261231129312
    • @240921167814@
    • 100719233724

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 13...0125281813
    • 170 726145727
    • 16022--5148988
  • This dictionary detects National Identity Card numbers from Thailand.

    The popular format for a National Identity Card number is a 13-digit string in the format N-NNNN-NNNNN-NN-N. The Popular Format Checking process checks for the 4 delimiters.

    This dictionary uses the Luhn variation checksum.
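
    The sketch below applies the publicly documented check digit rule for Thai National Identity Card numbers: a weighted sum of the first 12 digits, reduced modulo 11. It is an assumption that this is the rule the "Luhn variation" refers to, and the function name is hypothetical.

```python
import re


def thai_id_checksum_ok(number: str) -> bool:
    """Return True if the 13th digit matches the weighted Mod 11 check."""
    compact = re.sub(r"[ .-]", "", number)  # drop period, hyphen, space
    if not re.fullmatch(r"[0-9]{13}", compact):
        return False
    # Weights run from 13 down to 2 across the first 12 digits.
    total = sum(int(d) * w for d, w in zip(compact[:12], range(13, 1, -1)))
    return (11 - total % 11) % 10 == int(compact[12])
```

    Both 1-1964-43062-03-2 and 8161427577191 pass this check.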

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the National Identity Card number matches a valid range.

    The National Identity Card number can contain only a period, hyphen, or space as delimiters.

    The number formats that can trigger the dictionary are:

    • @9082208241537@
    • 1-1964-43062-03-2
    • 5-40 42-291 43-14-9
    • 8161427577191
    • 3-5887-693...20-66-7

    A number format like 070ww7389561584 does not trigger the dictionary.

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The National Identity Card number is in a popular format.

    The number formats that can trigger the dictionary are:

    • @9082208241537@
    • 1-1964-43062-03-2
    • 8161427577191

    The number formats that do not trigger the dictionary are:

    • 070ww7389561584
    • 5-40 42-291 43-14-9
    • 3-5887-693...20-66-7
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The National Identity Card number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, thailand identity card number, thailand national, date issue, or date expiry.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • @9082208241537@
    • 1-1964-43062-03-2
    • 8161427577191

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 07ss07389561584
    • 5-4042-291cc43-14-9
  • This dictionary detects content related to illegal drugs. It detects the names of illegal drugs such as Cocaine, Heroin, Ketamine, and so on.

    This dictionary does not use a checksum.

    You can modify the Confidence Score Threshold. The confidence score sets how high the bar, or threshold, is for identifying a violation and triggering the dictionary. To learn more, see Configuring the Confidence Score Threshold.

  • This dictionary detects immigration documents, like passport renewal forms, I-485, I-856, I-907, etc.

    Zscaler supports only the following document types for immigration documents: RTF, PDF, MSG, DOC, DOCX, DOCM, DOTX, DOTM, XLS, XLSX, XLSM, XLTM, PPT, PPTX, PPSX, PPTM, POTM, POTX, and IWORK (pages, numbers, and keynote). The minimum size required for a document is 1 KB.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria
    Low | This dictionary counts an instance as a violation if the Machine Learning (ML) match score is 50 or more.
    Medium | This dictionary counts an instance as a violation if the ML match score is 70 or more.
    High | This dictionary counts an instance as a violation if the ML match score is 90 or more.
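
    The thresholds in this table amount to a simple comparison. The following sketch is illustrative only; the names ML_THRESHOLDS and ml_violation are hypothetical and not part of any Zscaler API.

```python
# Minimum ML match score per confidence score threshold, as listed in the table.
ML_THRESHOLDS = {"Low": 50, "Medium": 70, "High": 90}


def ml_violation(ml_score: float, confidence: str) -> bool:
    """Return True if the ML match score meets the selected threshold."""
    return ml_score >= ML_THRESHOLDS[confidence]
```

    A document with an ML match score of 75 therefore counts as a violation at Low and Medium, but not at High.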
  • This dictionary detects Individual Taxpayer Registry ID numbers (CPF) from Brazil. If you use an Exact Data Match (EDM) Index Template to detect a CPF number, the format ddd.ddd.ddd-dd does not work because the period (.) is an EDM delimiter. However, if you include this dictionary in a rule, the format ddd.ddd.ddd-dd is detected.

    The popular format for an Individual Taxpayer Registry ID number differs between individuals and legal persons. For individuals, it is an 11-digit number whose last 2 digits are calculated from the preceding 9 digits. For legal persons, it is a 14-digit string formatted as XX.XXX.XXX/XXXX-XX. The first 8 digits identify the company, the 4 digits after the slash identify the branch or subsidiary, and the last 2 digits are calculated from the preceding digits.

    This dictionary uses the Mod 11 Check Digit checksum. This checksum is similar to the Luhn checksum.
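
    For the 11-digit individual format, the two check digits can be recomputed with the publicly documented CPF Mod 11 rule, sketched below. The function names are hypothetical, and it is an assumption that the dictionary uses exactly this calculation.

```python
import re


def _cpf_check_digit(digits):
    # Weights run from len(digits) + 1 down to 2; a remainder of 10 maps to 0.
    weights = range(len(digits) + 1, 1, -1)
    total = sum(d * w for d, w in zip(digits, weights))
    return (total * 10) % 11 % 10


def cpf_checksum_ok(cpf: str) -> bool:
    """Return True if both CPF check digits match the first 9 digits."""
    compact = re.sub(r"[ .-]", "", cpf)  # drop period, hyphen, space
    if not re.fullmatch(r"[0-9]{11}", compact):
        return False
    digits = [int(c) for c in compact]
    first = _cpf_check_digit(digits[:9])
    second = _cpf_check_digit(digits[:9] + [first])
    return digits[9:] == [first, second]
```

    For example, 286.648.571-80 passes, while 286.648.571-81 fails on the second check digit.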

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the Individual Taxpayer Registry ID number matches a valid range.

    The Individual Taxpayer Registry ID number can contain:

    • A period, hyphen, or space as delimiters. Multiple periods, hyphens, and spaces are allowed.
    • An alphabetical boundary.

    The number formats that can trigger the dictionary are:

    • 088.258.987-38
    • 103.015.819-32

    The number formats that do not trigger the dictionary are:

    • 989.786.5232-27
    • 286.648.5a71-80
    • 345.543.66@5-02
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The Individual Taxpayer Registry ID number is in a popular format.

    The Individual Taxpayer Registry ID number can contain a non-alphanumeric boundary. It cannot be bound by alphabetical characters.

    The number formats that can trigger the dictionary are:

    • 286.648.571-80
    • 088.258.987-38
    • @345.543.665-02@
    • 103.015.819-32

    A number format like 989.786.523-2--7 does not trigger the dictionary.

    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The Individual Taxpayer Registry ID number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, Natural Persons Register or Registration Number.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 286.648.571-80
    • 088.258.987-38
    • @345.543.665-02@
    • 103.015.819-32

    A number format like 989.786.523-2--7 does not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases.

  • This dictionary detects insurance documents, like employee insurance, home insurance, commercial insurance, medical insurance, etc.

    Zscaler supports only the following document types for insurance documents: RTF, PDF, MSG, DOC, DOCX, DOCM, DOTX, DOTM, XLS, XLSX, XLSM, XLTM, PPT, PPTX, PPSX, PPTM, POTM, POTX, and IWORK (pages, numbers, and keynote). The minimum size required for a document is 1 KB.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria
    Low | This dictionary counts an instance as a violation if the Machine Learning (ML) match score is 50 or more.
    Medium | This dictionary counts an instance as a violation if the ML match score is 70 or more.
    High | This dictionary counts an instance as a violation if the ML match score is 90 or more.
  • This dictionary allows you to select IBAN dictionaries for one or more of the following countries:

    • Andorra (AD)
    • Austria (AT)
    • Bosnia (BA)
    • Belgium (BE)
    • Bulgaria (BG)
    • Switzerland (CH)
    • Cyprus (CY)
    • Czechia (CZ)
    • Germany (DE)
    • Denmark (DK)
    • Estonia (EE)
    • Spain (ES)
    • Finland (FI)
    • Faroe Islands (FO)
    • France (FR)
    • United Kingdom (GB)
    • Gibraltar (GI)
    • Greenland (GL)
    • Greece (GR)
    • Croatia (HR)
    • Hungary (HU)
    • Ireland (IE)
    • Israel (IL)
    • Iceland (IS)
    • Italy (IT)
    • Liechtenstein (LI)
    • Lithuania (LT)
    • Luxembourg (LU)
    • Latvia (LV)
    • Monaco (MC)
    • Montenegro (ME)
    • Malta (MT)
    • Netherlands (NL)
    • North Macedonia (MK)
    • Norway (NO)
    • Poland (PL)
    • Portugal (PT)
    • Romania (RO)
    • Serbia (RS)
    • Sweden (SE)
    • Slovenia (SI)
    • Slovakia (SK)
    • San Marino (SM)
    • Tunisia (TN)
    • Turkey (TR)

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria
    Medium | This dictionary counts an instance as a violation if any of the selected IBAN numbers are matched.
    High | This dictionary counts an instance as a violation if any of the selected IBAN numbers are matched and accompanied by any of the dictionary’s default or custom high confidence phrases.
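
    This article does not state how an IBAN is validated. As background only, the sketch below shows the standard Mod 97 check defined for IBANs in general; whether the dictionary applies it is an assumption, and the function name is hypothetical.

```python
def iban_checksum_ok(iban: str) -> bool:
    """Return True if the IBAN passes the standard Mod 97 check."""
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    # Move the country code and check digits to the end, then map
    # letters to numbers (A=10 ... Z=35) and test the remainder.
    rearranged = compact[4:] + compact[:4]
    as_digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(as_digits) % 97 == 1
```

    For example, the widely published sample IBAN GB82 WEST 1234 5698 7654 32 passes this check.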
  • This dictionary detects invoice documents, like Bill of Sale forms, purchase orders, etc.

    Zscaler supports only the following document types for invoice documents: RTF, PDF, MSG, DOC, DOCX, DOCM, DOTX, DOTM, XLS, XLSX, XLSM, XLTM, PPT, PPTX, PPSX, PPTM, POTM, POTX, and IWORK (pages, numbers, and keynote). The minimum size required for a document is 1 KB.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria
    Low | This dictionary counts an instance as a violation if the Machine Learning (ML) match score is 50 or more.
    Medium | This dictionary counts an instance as a violation if the ML match score is 70 or more.
    High | This dictionary counts an instance as a violation if the ML match score is 90 or more.
  • This dictionary detects content related to Last Names from Japan.

    This dictionary does not use a checksum.

    The following are examples of acceptable formats for Japanese Last Names:

    • 村山
    • 佐久間
    • 平山
    • 五十嵐
    • 佐藤
    • 鈴木

    You can also specify an Action to configure how the dictionary evaluates matching names:

    • Count All: The dictionary counts all matches of the name, including identical names, toward the match count.
    • Count Unique: The dictionary counts each unique match of the name toward the match count only once, regardless of how many times the name appears.
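
    The difference between the two actions can be sketched as follows. The function name match_count is hypothetical and only illustrates the counting behavior described above.

```python
def match_count(matches, action):
    """Return the match count for a list of matched names."""
    if action == "Count All":
        return len(matches)       # every occurrence counts
    if action == "Count Unique":
        return len(set(matches))  # each distinct name counts once
    raise ValueError(f"unknown action: {action}")


matches = ["佐藤", "鈴木", "佐藤"]
print(match_count(matches, "Count All"))     # 3
print(match_count(matches, "Count Unique"))  # 2
```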
  • This dictionary detects medical documents, like medical consent forms, HIPAA forms, medical record forms, etc.

    Zscaler supports only the following document types for medical documents: RTF, PDF, MSG, DOC, DOCX, DOCM, DOTX, DOTM, XLS, XLSX, XLSM, XLTM, PPT, PPTX, PPSX, PPTM, POTM, POTX, and IWORK (pages, numbers, and keynote). The minimum size required for a document is 1 KB.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria
    Low | This dictionary counts an instance as a violation if the Machine Learning (ML) match score is 50 or more.
    Medium | This dictionary counts an instance as a violation if the ML match score is 70 or more.
    High | This dictionary counts an instance as a violation if the ML match score is 90 or more.
  • This dictionary detects content related to medical information. It detects the drug and disease names based on the ICD 10 code.

    This dictionary does not use a checksum.

    You can modify the Confidence Score Threshold. The confidence score sets how high the bar, or threshold, is for identifying a violation and triggering the dictionary. To learn more, see Configuring the Confidence Score Threshold.

  • This dictionary detects Medicare numbers from Australia.

    The popular format for a Medicare number is an 11-digit number. In the Popular Format Checking process, the number must pass the Mod 10 checksum.

    This dictionary uses the Mod 10 checksum.
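
    The sketch below applies the publicly documented Medicare check: the first 8 digits are weighted and summed, and the remainder modulo 10 must equal the 9th digit. The function name is hypothetical, and it is an assumption that the dictionary uses exactly this rule.

```python
import re

MEDICARE_WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9)


def medicare_checksum_ok(number: str) -> bool:
    """Return True if the 9th digit matches the weighted Mod 10 check."""
    compact = re.sub(r"[ .-]", "", number)  # drop period, hyphen, space
    # First digit is 2-6; 10 digits, optionally followed by an 11th.
    if not re.fullmatch(r"[2-6][0-9]{9,10}", compact):
        return False
    total = sum(int(d) * w for d, w in zip(compact[:8], MEDICARE_WEIGHTS))
    return total % 10 == int(compact[8])
```

    The examples 6014812406, 5989916497, and 3938898401 all pass this check.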

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the Medicare number matches a valid range.

    The number formats that can trigger the dictionary are:

    • @6014812406@
    • 59...89916...497
    • 39--388984 01

    A number format like 24//2877813//2 does not trigger the dictionary.

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The Medicare number is in a popular format.

    The number formats that can trigger the dictionary are:

    • @6014812406@
    • 5989916497
    • 3938898401

    A number format like A2428778132B does not trigger the dictionary.

    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The Medicare number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, bank account details, medicare payments, mortgage account, bank payments, information branch, credit card loan, department human services, medicare, or medi care.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • @6014812406@
    • 5989916497
    • 3938898401

    A number format like A2428778132B does not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases.

  • This dictionary detects Mexico Unique Population Registration Code numbers.

    This dictionary uses the Mod 11 checksum.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the Mexico Unique Population Registration Code number matches a valid range.

    The number formats that can trigger the dictionary are:

    • HEGG.560427MVZRRL04
    • HEGG 560427MVZRRL04
    • HEGG-560427MVZRRL04
    • HEGG.- 560427MVZRRL04
    • HEGG560427 MVZRRL04
    • HEGG560427.MVZRRL04
    • HEGG560427-MVZRRL04
    • HEGG560427 -MVZRRL04
    • HEGG 560427-MVZRRL04

    The number formats that do not trigger the dictionary are:

    • HEGG560427MVZRRL03 (bad checksum)
    • HE GG560427MVZRRL04 (unpopular format; doesn't match regex)
    • HEGG560427MVZRRL 04 (unpopular format; doesn't match regex)
    • HE GG 560427 MVZRRL 04 (unpopular format; doesn't match regex)
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The Mexico Unique Population Registration Code number is in a popular format.

    The number formats that can trigger the dictionary are:

    • HEGG560427MVZRRL04
    • hEGG560427MVZRRL04
    • hegg560427mvzrrl04

    A number format like HEGG-560427MVZRRL04 does not trigger the dictionary.

    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The Mexico Unique Population Registration Code number is accompanied by any of the dictionary’s default high confidence phrases. For example, Clave Única de Registro de Población, CURP, clave única, ClaveÚnica#, and clavepersonalIdentidad#.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • HEGG560427MVZRRL04
    • hEGG560427MVZRRL04
    • hegg560427mvzrrl04

    A number format like HEGG-560427MVZRRL04 does not trigger the dictionary.

  • This dictionary detects My Numbers (also referred to as Individual Numbers) from Japan.

    You can modify the Confidence Score Threshold:

    • Low: The dictionary counts an instance as a violation if it matches a valid range.
    • Medium: The dictionary counts an instance as a violation if:
      • The requirements of Low Confidence are met.
      • The My Number is in a popular format.
    • High: The dictionary counts an instance as a violation if:
      • The requirements of Medium Confidence are met.
      • The My Number is accompanied by any of the dictionary’s default or custom high confidence phrases.
  • This dictionary detects content related to names from Canada.

    This dictionary does not use a checksum.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria
    Low | The dictionary counts an instance as a violation if the content contains either a first name or a last name that is in the dictionary.
    Medium | The dictionary counts an instance as a violation if the content contains both a first name and a last name that are in the dictionary.
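
    The Low and Medium criteria differ only in whether one name or both names must be present. The sketch below illustrates that logic with tiny hypothetical name sets; the real dictionary contents, and whether the names must be adjacent, are not specified here.

```python
# Hypothetical sample entries; the real dictionary is much larger.
FIRST_NAMES = {"olivia", "liam"}
LAST_NAMES = {"tremblay", "smith"}


def name_violation(words, confidence):
    """Apply the Low or Medium name criteria to a list of words."""
    tokens = {w.lower() for w in words}
    has_first = bool(tokens & FIRST_NAMES)
    has_last = bool(tokens & LAST_NAMES)
    if confidence == "Low":
        return has_first or has_last
    if confidence == "Medium":
        return has_first and has_last
    raise ValueError(f"unknown confidence: {confidence}")
```

    Under these assumptions, content containing only "Olivia" is a violation at Low but not at Medium.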

    You can also specify an Action to configure how the dictionary evaluates matching names:

    • Count All: The dictionary counts all matches of the name, including identical names, toward the match count.
    • Count Unique: The dictionary counts each unique match of the name toward the match count only once, regardless of how many times the name appears.
  • This dictionary detects content related to names from Spain.

    This dictionary does not use a checksum.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria
    Low | The dictionary counts an instance as a violation if the content contains either a first name or a last name that is in the dictionary.
    Medium | The dictionary counts an instance as a violation if the content contains both a first name and a last name that are in the dictionary.

    You can also specify an Action to configure how the dictionary evaluates matching names:

    • Count All: The dictionary counts all matches of the name, including identical names, toward the match count.
    • Count Unique: The dictionary counts each unique match of the name toward the match count only once, regardless of how many times the name appears.
  • This dictionary detects content related to names from the United States. It detects first and last names.

    This dictionary does not use a checksum.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria
    Low | The dictionary counts an instance as a violation if the content contains either a first name or a last name that is in the dictionary.
    Medium | The dictionary counts an instance as a violation if the content contains both a first name and a last name that are in the dictionary.

    You can also specify an Action to configure how the dictionary evaluates matching names:

    • Count All: The dictionary counts all matches of the name, including identical names, toward the match count.
    • Count Unique: The dictionary counts each unique match of the name toward the match count only once, regardless of how many times the name appears.
  • This dictionary detects 9- or 14-digit Polish National Economic Registry Numbers (REGON).

    This dictionary uses the Mod 11 checksum.
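
    The following sketch applies the publicly documented REGON weights for the 9-digit and 14-digit forms. The function names are hypothetical, the input is assumed to be undelimited, and it is an assumption that the dictionary uses exactly these weights.

```python
import re

REGON9_WEIGHTS = (8, 9, 2, 3, 4, 5, 6, 7)
REGON14_WEIGHTS = (2, 4, 8, 5, 0, 9, 7, 3, 6, 1, 2, 4, 8)


def _regon_check_digit(digits, weights):
    # Weighted sum modulo 11; a remainder of 10 maps to 0.
    return sum(int(d) * w for d, w in zip(digits, weights)) % 11 % 10


def regon_checksum_ok(number: str) -> bool:
    """Return True if an undelimited REGON passes its Mod 11 check(s)."""
    if not re.fullmatch(r"[0-9]{9}|[0-9]{14}", number):
        return False
    if _regon_check_digit(number[:8], REGON9_WEIGHTS) != int(number[8]):
        return False
    if len(number) == 14:
        return _regon_check_digit(number[:13], REGON14_WEIGHTS) == int(number[13])
    return True
```

    With these weights, 133456783 and 13345678312340 pass, and 133456788 fails with a bad checksum.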

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the REGON number matches a valid range.

    The number formats that can trigger the dictionary are:

    • 13 3456783
    • 13345678 3
    • 13 345678-3
    • 13 345678312340
    • 13 34567831234-0
    • 13 4567831234 /-0

    A number format like 133456788 does not trigger the dictionary (bad checksum).

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The REGON number is in a popular format.

    The number formats that can trigger the dictionary are:

    • 133456783
    • 13 345678 3
    • 13345678312340
    • 13/34567831234/0

    The number formats that do not trigger the dictionary are:

    • 13 3456783
    • 13345678 3
    • 13 345678-3
    • 13 345678312340
    • 13 34567831234-0
    • 13 4567831234 /-0
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The REGON number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, regon.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 133456783
    • 13 345678 3
    • 13345678312340
    • 13/34567831234/0

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 13 3456783
    • 13345678 3
    • 13 345678-3
    • 13 345678312340
    • 13 34567831234-0
    • 13 4567831234 /-0
  • This dictionary detects New Zealand National Health Index Numbers (NZNHIN).

    NZNHINs have two formats, both of which are undelimited with no special characters in between:

    • Old format: AAANNNC
    • New format: AAANNAC

    The old format uses the Modulo 11 checksum; the new format uses the Modulo 24 checksum.
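
    The sketch below checks both formats using the publicly documented NHI rules, in which the letters I and O are not used and the remaining letters are valued 1 to 24. The function names are hypothetical, and it is an assumption that the dictionary follows these rules exactly.

```python
import re

NHI_LETTERS = "ABCDEFGHJKLMNPQRSTUVWXYZ"  # I and O are excluded


def _nhi_value(ch):
    return int(ch) if ch.isdigit() else NHI_LETTERS.index(ch) + 1


def nhi_checksum_ok(nhi: str) -> bool:
    """Return True if an undelimited NHI number passes its checksum."""
    nhi = nhi.upper()
    old = re.fullmatch(r"[A-HJ-NP-Z]{3}[0-9]{4}", nhi)                # AAANNNC
    new = re.fullmatch(r"[A-HJ-NP-Z]{3}[0-9]{2}[A-HJ-NP-Z]{2}", nhi)  # AAANNAC
    if not (old or new):
        return False
    # Weights run from 7 down to 2 across the first six characters.
    total = sum(_nhi_value(c) * w for c, w in zip(nhi[:6], range(7, 1, -1)))
    if old:
        rem = total % 11
        return rem != 0 and (11 - rem) % 10 == int(nhi[6])
    # New format: the check value 24 - (total mod 24) maps back to a letter.
    return NHI_LETTERS[23 - total % 24] == nhi[6]
```

    Under these rules, WLD9413 passes as an old-format number and aDh48zj passes as a new-format number.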

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the NZNHIN number matches a valid range.

    A number format like WLD-9413 can trigger the dictionary.

    The number formats that do not trigger the dictionary are:

    • WL-D9413 (doesn't match regex)
    • WLD9-413 (doesn't match regex)
    • WLD94-13 (doesn't match regex)
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The NZNHIN number is in a popular format.

    The number formats that can trigger the dictionary are:

    • WLD9413
    • aDh48zj

    A number format like WLD-9413 does not trigger the dictionary.

    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The NZNHIN number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, National Health Index Number, National Health Index Num, NHI number, or NHI#.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • WLD9413
    • aDh48zj

    A number format like WLD-9413 does not trigger the dictionary.

  • This dictionary detects National Health Service (NHS) numbers from the United Kingdom.

    The popular format for an NHS number is a 10-digit number formatted as NNN<delimiter>NNN<delimiter>NNNN. The delimiters can be a period, hyphen, or space.

    This dictionary uses the Mod 11 Check Digit checksum. This checksum is similar to the Luhn checksum.
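
    The sketch below applies the publicly documented NHS number check: the first 9 digits are weighted from 10 down to 2, and the check digit is 11 minus the remainder modulo 11. The function name is hypothetical, and it is an assumption that the dictionary uses exactly this rule.

```python
import re


def nhs_checksum_ok(number: str) -> bool:
    """Return True if the 10th digit matches the Mod 11 check digit."""
    compact = re.sub(r"[ .-]", "", number)  # drop period, hyphen, space
    if not re.fullmatch(r"[0-9]{10}", compact):
        return False
    total = sum(int(d) * w for d, w in zip(compact[:9], range(10, 1, -1)))
    check = 11 - total % 11
    if check == 11:
        check = 0
    # A computed value of 10 can never match a single digit, so it fails.
    return check == int(compact[9])
```

    The examples 5878649888, 1303233851, and 5431043544 all pass this check.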

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the NHS number matches a valid range.

    The NHS number can contain:

    • A period, hyphen, or space as delimiters. Multiple periods, hyphens, and spaces are allowed.
    • A non-alphanumeric boundary.

    The number formats that can trigger the dictionary are:

    • 566 8018326
    • 5431043544
    • 808619--0226
    • 5878649888
    • 130323...3851
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The NHS number is in a popular format.

    The NHS number can contain a non-alphanumeric boundary. It cannot be bound by alphabetical characters.

    The NHS number must use the same delimiters for the full number.

    The number formats that can trigger the dictionary are:

    • @5878649888@
    • 1303233851

    The number formats that do not trigger the dictionary are:

    • 5@668018326
    • 543!1043!544
    • A8086190226A
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The NHS number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, NHS Number or National Health Services Number.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 5668018326
    • 5431043544
    • A8086190226A
    • @5878649888@
    • 1303233851
  • This dictionary detects National Identification Card numbers from Taiwan.

    The popular format for a National Identification Card number is a 10-character string that contains one letter followed by 9 digits. It can also contain a period, hyphen, or space as delimiters.

    This dictionary uses the Luhn checksum, but the leading letter is first converted to its numerical equivalent.
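
    The letter conversion can be sketched with the publicly documented rule for Taiwanese ID numbers, in which the letter becomes a two-digit code and a weighted sum must be divisible by 10. The function name is hypothetical, and it is an assumption that the dictionary uses exactly this weighting.

```python
import re

# Letters in the order of their public two-digit codes, 10 through 35.
LETTER_CODES = dict(zip("ABCDEFGHJKLMNPQRSTUVXYWZIO", range(10, 36)))
ID_WEIGHTS = (1, 9, 8, 7, 6, 5, 4, 3, 2, 1, 1)


def taiwan_id_checksum_ok(number: str) -> bool:
    """Return True if an undelimited ID passes the weighted Mod 10 check."""
    number = number.upper()
    if not re.fullmatch(r"[A-Z][0-9]{9}", number):
        return False
    code = LETTER_CODES[number[0]]
    digits = [code // 10, code % 10] + [int(d) for d in number[1:]]
    return sum(d * w for d, w in zip(digits, ID_WEIGHTS)) % 10 == 0
```

    For example, the letter I converts to 34, and I145639659 passes the check.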

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the National Identification Card number matches a valid range.

    The National Identification Card number can contain a period, hyphen, or space as delimiters. Multiple periods, hyphens, and spaces are allowed.

    The National Identification Card number cannot contain a delimiter between the leading letter and the first digit.

    The number formats that can trigger the dictionary are:

    • D146. 665-645
    • --I145639659
    • !!F172388317VV
    • Z258578175DD

    The number formats that do not trigger the dictionary are:

    • Z...222063149
    • !!X14234881733
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The National Identification Card number is in a popular format.

    The National Identification Card number can only have a non-alphanumeric character for the end delimiter.

    The number formats that can trigger the dictionary are:

    • I145639659
    • !!X142348817@@
    • Z258578175

    The number formats that do not trigger the dictionary are:

    • Z...222063149
    • F172388317VV
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The National Identification Card number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, taiwanese national identification card number.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • I145639659
    • Z222063149
    • !!X142348817@@
    • Z258578175

    A number format like F172388317VV does not trigger the dictionary if accompanied by any of the dictionary's default or custom high confidence phrases.

  • This dictionary detects National Institute of Statistics and Economic Studies (INSEE) numbers from France.

    The popular format for an INSEE number is a 15-digit number with the first 13 digits and last 2 digits separated by a space.

    This dictionary uses the Mod by 97 checksum. The last two digits are check digits.
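    As a minimal sketch of a Mod by 97 check, assuming the two check digits equal 97 minus the remainder of the first 13 digits divided by 97 (the publicly documented INSEE rule, which the sample numbers in this section satisfy). The function names are hypothetical, the Corsican department codes 2A and 2B are not handled, and this is not necessarily how the dictionary implements the check.

```python
import re


def insee_key(first13: str) -> int:
    """Return the expected two-digit key for the first 13 digits."""
    return 97 - (int(first13) % 97)


def is_valid_insee(candidate: str) -> bool:
    """Check a 15-digit INSEE number; spaces, periods, and hyphens are ignored."""
    digits = re.sub(r"[ .\-]", "", candidate)
    if not re.fullmatch(r"[0-9]{15}", digits):
        return False
    return insee_key(digits[:13]) == int(digits[13:])


# Usage with a sample from this section
print(is_valid_insee("2430966952225 59"))
```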

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the INSEE number matches a valid range.

    The INSEE number can contain:

    • A period, hyphen, or space as delimiters. Multiple periods, hyphens, and spaces are allowed.
    • An alphabetical boundary.

    The number formats that can trigger the dictionary are:

    • @113103115324014@
    • 127...1139450--100 58
    • #23--0036713141980#
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The INSEE number is in a popular format.

    The INSEE number can contain:

    • A non-alphanumeric boundary.
    • Only one space between N(13) and N(2).

    The INSEE number cannot contain a period or a hyphen as a delimiter.

    The number formats that can trigger the dictionary are:

    • 2430966952225 59
    • 1160568707959 73
    • %1271139450100 58$
    • #2300367131419 80#

    The number formats that do not trigger the dictionary are:

    • A138057773213587c
    • 113103115324014
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The INSEE number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, insee, national id, national identification, social security number, social security code, and social insurance number.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 2430966952225 59
    • 1160568707959 73
  • This dictionary detects National Identification (PESEL) numbers from Poland.

    The popular format for a PESEL number is an 11-digit number of the form YYMMDDZZZXQ, where YYMMDD is the date of birth (with the century encoded in the month field), ZZZX is the personal identification number in which X encodes the sex (even for females, odd for males), and Q is a check digit used to verify that the PESEL number is correct.

    This dictionary uses the Luhn checksum.
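    The field layout above can be sketched as follows. This page names the checksum as Luhn; the sketch instead uses the weighted check digit (weights 1, 3, 7, 9) and the month offsets from the publicly documented PESEL scheme, which the sample numbers in this section satisfy. Whether the dictionary computes the check the same way is an assumption, and all names are hypothetical.

```python
import re
from datetime import date

PESEL_WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)
# Offset added to the month field -> century of birth (public PESEL scheme).
CENTURY_BY_MONTH_OFFSET = {80: 1800, 0: 1900, 20: 2000, 40: 2100, 60: 2200}


def pesel_check_digit(first10: str) -> int:
    """Compute Q from the first 10 digits using the weighted sum."""
    total = sum(int(d) * w for d, w in zip(first10, PESEL_WEIGHTS))
    return (10 - total % 10) % 10


def parse_pesel(pesel: str):
    """Return (birth_date, sex), or None if the number is malformed."""
    if not re.fullmatch(r"[0-9]{11}", pesel):
        return None
    if pesel_check_digit(pesel[:10]) != int(pesel[10]):
        return None
    yy, mm, dd = int(pesel[0:2]), int(pesel[2:4]), int(pesel[4:6])
    offset = (mm // 20) * 20
    century = CENTURY_BY_MONTH_OFFSET.get(offset)
    if century is None:
        return None
    try:
        born = date(century + yy, mm - offset, dd)
    except ValueError:  # impossible month or day
        return None
    sex = "male" if int(pesel[9]) % 2 else "female"  # X: odd = male
    return born, sex


print(parse_pesel("97100507364"))
```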

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the PESEL number matches a valid range.

    The number formats that can trigger the dictionary are:

    • .14.1017...04491
    • -82-12270--5195-
    • 971 0050 7364

    The number formats that do not trigger the dictionary are:

    • 2302130d3028
    • (55111304237)
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The PESEL number is in a popular format.

    The number formats that can trigger the dictionary are:

    • 97100507364
    • @23021303028$

    The number formats that do not trigger the dictionary are:

    • .14.1017...04491
    • -82-12270--5195-
    • A55111304237G
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The PESEL number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, pesel liczba or peselliczba.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 14101704491
    • 82122705195
    • 97100507364
    • @23021303028$
    • 55111304237
  • This dictionary detects National Identity Card numbers (DNI) from Spain.

    The popular format for a National Identity Card number is a 9-character number with 8 digits and one letter for security.

    This dictionary uses the Mod by 23 checksum. This checksum is similar to the Luhn checksum.
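    A minimal sketch of a Mod by 23 check, assuming the standard public DNI rule: the 8-digit number modulo 23 indexes into a fixed 23-letter table. The sample numbers in this section satisfy it. The function names are hypothetical, and the dictionary's own implementation may differ.

```python
import re

# Public DNI control-letter table, indexed by (number mod 23).
DNI_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"


def dni_letter(number: int) -> str:
    """Return the control letter expected for an 8-digit DNI number."""
    return DNI_LETTERS[number % 23]


def is_valid_dni(candidate: str) -> bool:
    """Check 8 digits followed by one uppercase letter, with no delimiters."""
    m = re.fullmatch(r"([0-9]{8})([A-Z])", candidate)
    return m is not None and dni_letter(int(m.group(1))) == m.group(2)


print(is_valid_dni("20222624N"))
```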

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the National Identity Card number matches a valid range.

    The National Identity Card number can contain:

    • A period, hyphen, or space as delimiters. Multiple periods, hyphens, and spaces are allowed.
    • An alphanumeric character as a lead boundary.

    The number formats that can trigger the dictionary are:

    • 20---222624N
    • a22369319W

    The number formats that do not trigger the dictionary are:

    • 0 1311947@G
    • 33052840-T
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The National Identity Card number is in a popular format.

    The National Identity Card number can contain an alphabetical character as an ending boundary.

    The National Identity Card number cannot have a delimiter between the digits and the final letter.

    The number formats that can trigger the dictionary are:

    • 01311947G
    • 20222624N
    • 33052840T

    A number format like a22369319W does not trigger the dictionary.

    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The National Identity Card number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, national identification number, national identity number, insurance number, personal identification number, national identity, personal identity no, or unique identity number.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 20222624N
    • 22369319W
    • 33052840T

    A number format like d01311947G does not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases.

  • This dictionary detects National Insurance Numbers (NINO) from the United Kingdom.

    The popular format for a NINO is a 9-character number: two prefix letters, 6 digits, and one suffix letter. Neither of the two prefix letters can be D, F, I, Q, U, or V, and the second letter also cannot be O. The prefixes BG, GB, NK, KN, TN, NT, and ZZ are not allocated.

    This dictionary does not use a checksum.
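    Because there is no checksum, validation comes down to the structural rules above. A minimal sketch, assuming uppercase input and a suffix limited to A through D (the suffix range comes from public HMRC guidance, not from this page). The names are hypothetical.

```python
import re

DISALLOWED_PREFIX_LETTERS = set("DFIQUV")
UNALLOCATED_PREFIXES = {"BG", "GB", "NK", "KN", "TN", "NT", "ZZ"}
# Assumption: suffix is A-D; this page only says "one suffix letter".
NINO_RE = re.compile(r"([A-Z]{2})([0-9]{6})([A-D])")


def is_plausible_nino(candidate: str) -> bool:
    """Apply the prefix rules to an undelimited, uppercase candidate."""
    m = NINO_RE.fullmatch(candidate)
    if m is None:
        return False
    prefix = m.group(1)
    if prefix[0] in DISALLOWED_PREFIX_LETTERS or prefix[1] in DISALLOWED_PREFIX_LETTERS:
        return False
    if prefix[1] == "O":
        return False
    return prefix not in UNALLOCATED_PREFIXES


print(is_plausible_nino("AA123456C"))
```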

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the NINO matches a valid range.

    The number formats that can trigger the dictionary are:

    • AA-123456.C
    • @#AA876589C@
    • BB...123456...D

    A number format like RR65------3456 does not trigger the dictionary.

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The NINO is in a popular format.

    The number formats that can trigger the dictionary are:

    • AA123456C
    • AA876589C

    The number formats that do not trigger the dictionary are:

    • BB123456...D
    • AA-127456.G
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The NINO is accompanied by any of the dictionary’s default high confidence phrases. For example, national insurance number or national insurance.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • AA123465C
    • RK765432C

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • rh123456U
    • AA56432D
  • This dictionary detects National Provider Identifier (NPI) numbers from the United States.

    This dictionary uses the Luhn checksum.
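    A minimal sketch of a Luhn check for NPIs, assuming the public NPI standard in which the 10-digit NPI is prefixed with 80840 before the Luhn sum is computed. The prefix is not stated on this page, and the function names are hypothetical. The sample 1234567893 passes and 1234567894 fails, which matches the examples in the table below.

```python
def luhn_valid(digits: str) -> bool:
    """Standard Luhn: double every second digit from the right, sum, mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def is_valid_npi(npi: str) -> bool:
    """Check an undelimited 10-digit NPI with the 80840 prefix applied."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    return luhn_valid("80840" + npi)


print(is_valid_npi("1234567893"))
```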

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the NPI number matches a valid range.

    The number formats that can trigger the dictionary are:

    • 123 456 7893
    • 12.345-6789 3
    • 1 -23/ .456 78...- 9 3

    A number format like 1234567894 does not trigger the dictionary (bad checksum).

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The NPI number is in a popular format.

    A number format like 1234567893 can trigger the dictionary.

    The number formats that do not trigger the dictionary are:

    • 123 456 7893
    • 12.345-6789 3
    • 1 -23/ .456 78...- 9 3
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The NPI number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, national provider identifier number, national provider identifier, and npi.

    A number format like 1234567893 can trigger the dictionary.

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 123 456 7893
    • 12.345-6789 3
    • 1 -23/ .456 78...- 9 3
  • This dictionary detects content related to National Drug Code (NDC) package codes.

    This dictionary does not use a checksum.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if it matches a key in the hash table.

    The number formats that can trigger the dictionary are:

    • 0002080001
    • 0990-925739
    • 10014001-08
    • 99528---606--45

    The number formats that do not trigger the dictionary are:

    • 000-20800-01 (first hyphen at wrong position)
    • 0990-925-739 (second hyphen at wrong position)
    • 0002-08-00-01 (too many non-adjacent hyphens)
    • 99528---6-06--45 (too many non-adjacent hyphens)
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The NDC package code is in a popular format.

    The number formats that can trigger the dictionary are:

    • 0002-0800-01
    • 10014-001-08
    • 86157-0014-1

    The number formats that do not trigger the dictionary are:

    • 0002080001
    • 0990-925739
    • 10014001-08
    • 99528---606--45

    You can also specify an Action to configure how the dictionary evaluates matching package codes:

    • Count All: The dictionary counts all matches of the package code, including identical codes, toward the match count.
    • Count Unique: The dictionary counts each unique match of the package code toward the match count only once, regardless of how many times the code appears.
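    The Medium-confidence examples and the two Action options can be sketched as follows. The three hyphen layouts (4-4-2, 5-3-2, and 5-4-1) are inferred from the examples in this section, and the action names are hypothetical labels, not Zscaler configuration values. The hash table lookup described for Low confidence is not modeled.

```python
import re

# Layouts inferred from the Medium examples: 4-4-2, 5-3-2, 5-4-1.
PACKAGE_CODE_RE = re.compile(
    r"[0-9]{4}-[0-9]{4}-[0-9]{2}"
    r"|[0-9]{5}-[0-9]{3}-[0-9]{2}"
    r"|[0-9]{5}-[0-9]{4}-[0-9]"
)


def is_popular_package_format(code: str) -> bool:
    """True if the code uses one of the three single-hyphen layouts."""
    return PACKAGE_CODE_RE.fullmatch(code) is not None


def match_count(matches: list, action: str) -> int:
    """Count matches under a hypothetical 'count_all' or 'count_unique' action."""
    if action == "count_all":
        return len(matches)
    if action == "count_unique":
        return len(set(matches))
    raise ValueError(f"unknown action: {action}")


found = ["0002-0800-01", "0002-0800-01", "10014-001-08"]
print(match_count(found, "count_all"), match_count(found, "count_unique"))
```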
  • This dictionary detects content related to National Drug Code (NDC) product codes.

    This dictionary does not use a checksum.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if it matches a key in the hash table.

    The number formats that can trigger the dictionary are:

    • 00020800
    • 0990--9257

    The number formats that do not trigger the dictionary are:

    • 000-20800 (hyphen at wrong position)
    • 00020-800 (hyphen at wrong position)
    • 0990-925-7 (too many non-adjacent hyphens)
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The NDC product code is in a popular format.

    The number formats that can trigger the dictionary are:

    • 0002-0800
    • 10014-001
    • 86157-0014

    The number formats that do not trigger the dictionary are:

    • 00020800
    • 0990--9257

    You can also specify an Action to configure how the dictionary evaluates matching product codes:

    • Count All: The dictionary counts all matches of the product code, including identical codes, toward the match count.
    • Count Unique: The dictionary counts each unique match of the product code toward the match count only once, regardless of how many times the code appears.
  • This dictionary detects National Registration Identity Card Numbers (UIN and FIN) from Singapore.

    The popular format for a National Registration Identity Card number is a 9-character number: a first letter that is S, T, F, or G, followed by 7 digits and a final check character calculated from the first letter and the 7 digits.

    This dictionary uses the Mod 11 Check Digit checksum. This checksum is similar to the Luhn checksum.
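    A minimal sketch of a Mod 11 check for these numbers, assuming the widely published (but unofficial) scheme: weights 2, 7, 6, 5, 4, 3, 2 on the digits, an offset of 4 for the T and G prefixes, and a separate check-letter table for S/T and F/G. The sample numbers in this section satisfy it. All names are hypothetical, and the dictionary's implementation is not confirmed to match.

```python
import re

NRIC_WEIGHTS = (2, 7, 6, 5, 4, 3, 2)
# Check-letter tables indexed by (weighted sum mod 11); assumed from public sources.
CHECK_LETTERS = {
    "S": "JZIHGFEDCBA",
    "T": "JZIHGFEDCBA",
    "F": "XWUTRQPNMLK",
    "G": "XWUTRQPNMLK",
}
PREFIX_OFFSET = {"S": 0, "T": 4, "F": 0, "G": 4}


def nric_check_letter(prefix: str, digits: str) -> str:
    """Return the expected final letter for a prefix and 7 digits."""
    total = PREFIX_OFFSET[prefix]
    total += sum(int(d) * w for d, w in zip(digits, NRIC_WEIGHTS))
    return CHECK_LETTERS[prefix][total % 11]


def is_valid_nric(candidate: str) -> bool:
    """Check an undelimited, uppercase 9-character candidate."""
    m = re.fullmatch(r"([STFG])([0-9]{7})([A-Z])", candidate)
    return m is not None and nric_check_letter(m.group(1), m.group(2)) == m.group(3)


print(is_valid_nric("T1562546A"))
```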

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the National Registration Identity Card number matches a valid range.

    The National Registration Identity Card number can contain a period, hyphen, or space as delimiters. Multiple periods, hyphens, and spaces are allowed.

    The number formats that can trigger the dictionary are:

    • T156--2546A
    • G545...8130R
    • aS8879619Ed

    The number formats that do not trigger the dictionary are:

    • F2d397111U
    • T0v75 8616C
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The National Registration Identity Card number is in a popular format.

    The National Registration Identity Card number cannot contain any delimiters.

    The boundary check is not performed for the National Registration Identity Card number because the number itself starts and ends with a letter.

    The number formats that can trigger the dictionary are:

    • T1562546A
    • G5458130R
    • aS8879619Ed

    The number formats that do not trigger the dictionary are:

    • F2d397111U
    • T0v758616C
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The National Registration Identity Card number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, nric, national registration identity card, guin, fin, passport number, or birth certificate.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • T1562546A
    • G5458130R
    • F2397111U
    • T0758616C
    • aS8879619Ed
  • This dictionary allows you to select passport number dictionaries for one or more of the following countries.

    • China (CN)
    • Japan (JP)
    • South Korea (KR)
    • Malaysia (MY)
    • Philippines (PH)
    • Singapore (SG)
    • Taiwan (TW)
    • Turkey (TR)

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria
    Medium | This dictionary counts an instance as a violation if any of the selected passport numbers are matched.
    High | This dictionary counts an instance as a violation if any of the selected passport numbers are matched and accompanied by any of the dictionary’s default or custom high confidence phrases.
  • This dictionary allows you to select passport number dictionaries for one or more European Union countries.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria
    Medium | This dictionary counts an instance as a violation if the selected country's passport number is matched.
    High | This dictionary counts an instance as a violation if the requirements of Medium Confidence are met and the passport number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, Passport, Passport number, or passport#.
  • This dictionary detects Personal Identification Numbers (PIN) from Croatia.

    This dictionary uses the ISO/IEC 7064, Mod 11,10 checksum.
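    A minimal sketch of the ISO/IEC 7064, Mod 11,10 check digit for an 11-digit number, using the standard running-remainder formulation. The function names are hypothetical, and prefix handling (such as a leading HR) is left out. Several sample numbers in this section satisfy the check, and 69435151531 fails it, which matches the "bad checksum" note in the Low row.

```python
import re


def mod11_10_check_digit(first10: str) -> int:
    """Compute the ISO/IEC 7064 Mod 11,10 check digit for 10 digits."""
    a = 10
    for ch in first10:
        a = (a + int(ch)) % 10
        if a == 0:
            a = 10
        a = (a * 2) % 11
    return (11 - a) % 10


def is_valid_oib(candidate: str) -> bool:
    """Check an undelimited 11-digit number."""
    if not re.fullmatch(r"[0-9]{11}", candidate):
        return False
    return mod11_10_check_digit(candidate[:10]) == int(candidate[10])


print(is_valid_oib("94577403194"))
```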

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the PIN (Croatia) number matches a valid range.

    The number formats that can trigger the dictionary are:

    • hr 94577403194
    • hr.- 12345678911
    • HR..9942692209 6

    The number formats that do not trigger the dictionary are:

    • HR69435151531 (bad checksum)
    • H R69435151538 (unpopular format; doesn't match regex)
    • HR69435 151538 (unpopular format; doesn't match regex)
    • 8684 9310 961 (unpopular format; doesn't match regex)
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The PIN (Croatia) number is in a popular format.

    The number formats that can trigger the dictionary are:

    • HR69435151531
    • HR 69435151531
    • 24631579813

    The number formats that do not trigger the dictionary are:

    • 94 577 403194
    • hr.-12345678911
    • HR69435 151538
    • hr.- 12345678911
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The PIN (Croatia) number is accompanied by any of the dictionary’s default high confidence phrases. For example, OIB, Osobni identifikacijski broj, and Personal identification number.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • HR69435151531
    • HR 69435151531
    • 24631579813

    The number formats that do not trigger the dictionary are:

    • 94 577 403194
    • hr.-12345678911
    • HR69435 151538
    • hr.- 12345678911
  • This dictionary detects real estate documents, such as personal or commercial lease agreements and property purchase or sale agreements.

    Zscaler supports only the following document types for real estate documents: RTF, PDF, MSG, DOC, DOCX, DOCM, DOTX, DOTM, XLS, XLSX, XLSM, XLTM, PPT, PPTX, PPSX, PPTM, POTM, POTX, and IWORK (pages, numbers, and keynote). The minimum size required for a document is 1 KB.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria
    Low | This dictionary counts an instance as a violation if the Machine Learning (ML) match score is 50 or more.
    Medium | This dictionary counts an instance as a violation if the ML match score is 70 or more.
    High | This dictionary counts an instance as a violation if the ML match score is 90 or more.
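
    The thresholds in the table amount to a simple comparison. A minimal sketch, with hypothetical names and assuming the ML match score is a number from 0 to 100:

```python
# Minimum ML match score per confidence score threshold, from the table above.
ML_THRESHOLDS = {"low": 50, "medium": 70, "high": 90}


def is_violation(ml_score: float, threshold: str) -> bool:
    """True if the score meets the configured confidence score threshold."""
    return ml_score >= ML_THRESHOLDS[threshold]


print(is_violation(72, "medium"), is_violation(72, "high"))
```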
  • This dictionary detects Resident Registration Numbers (RRN) from South Korea.

    The popular format for an RRN is a 13-digit number with each digit providing specific information.

    This dictionary uses the Luhn variation checksum.
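    This page does not spell out the Luhn variation. As a sketch, the publicly documented RRN check uses weights 2 through 9 and then 2 through 5 on the first 12 digits, with the check digit equal to (11 minus the weighted sum modulo 11) modulo 10. The sample numbers in this section satisfy it. The names are hypothetical, and treating this as the dictionary's algorithm is an assumption.

```python
import re

RRN_WEIGHTS = (2, 3, 4, 5, 6, 7, 8, 9, 2, 3, 4, 5)


def rrn_check_digit(first12: str) -> int:
    """Compute the 13th digit from the first 12 digits."""
    total = sum(int(d) * w for d, w in zip(first12, RRN_WEIGHTS))
    return (11 - total % 11) % 10


def is_valid_rrn(candidate: str) -> bool:
    """Check NNNNNN-NNNNNNN (hyphen optional); the birth date is not validated."""
    m = re.fullmatch(r"([0-9]{6})-?([0-9]{7})", candidate)
    if m is None:
        return False
    digits = m.group(1) + m.group(2)
    return rrn_check_digit(digits[:12]) == int(digits[12])


print(is_valid_rrn("970403-2966211"))
```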

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the RRN matches a valid range.

    The RRN can contain:

    • A period, hyphen, or space as delimiters. Multiple periods are allowed.
    • An alphabetical boundary.

    The number formats that can trigger the dictionary are:

    • 9 70403-2966211
    • 780220-1296377
    • #840...719-2145299@
    • D921009-5664079D

    A number format like 720590-2208919 does not trigger the dictionary because it fails the popular format check.

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The RRN is in a popular format.

    The RRN can contain:

    • A non-alphanumeric boundary.
    • Only a hyphen as a delimiter.

    The number formats that can trigger the dictionary are:

    • 970403-2966211
    • !780220-1296377@

    The number formats that do not trigger the dictionary are:

    • 720590-2208919
    • #840...719-2145299@
    • D921009-5664079D
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The RRN is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, korean resident registration number.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 970403-2966211
    • !780220-1296377@
    • 890320-1104929
    • 840719-2145299
    • 921009-5664079
    • 940219-5027845
  • This dictionary detects resume documents.

    Zscaler supports only the following document types for resume documents: RTF, PDF, MSG, DOC, DOCX, DOCM, DOTX, DOTM, XLS, XLSX, XLSM, XLTM, PPT, PPTX, PPSX, PPTM, POTM, POTX, and IWORK (pages, numbers, and keynote). The minimum size required for a document is 1 KB.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria
    Low | This dictionary counts an instance as a violation if the Machine Learning (ML) match score is 50 or more.
    Medium | This dictionary counts an instance as a violation if the ML match score is 70 or more.
    High | This dictionary counts an instance as a violation if the ML match score is 90 or more.
  • This dictionary detects content related to Salesforce.com data. It detects keywords that are embedded in a Salesforce report, such as Copyright and Salesforce.com.

    This dictionary does not use a checksum.

    You can modify the Confidence Score Threshold. Confidence scores inform the dictionary how high it must raise the bar, or threshold, for identifying violations and triggering them. To learn more, see Configuring the Confidence Score Threshold.

  • The Self-Harm and Cyberbullying dictionary is intended for use by K-12 educators. If you’re an educator, contact your Zscaler Account team to enable this dictionary.

    This dictionary detects when students use public search engines to search for self-harm-related content, or when they post bullying-related content on social media.

    You can modify the Confidence Score Threshold. Confidence scores inform the dictionary how high it must raise the bar, or threshold, for identifying violations and triggering them. To learn more, see Configuring the Confidence Score Threshold.

  • This dictionary detects Social Insurance Numbers (SIN) from Canada.

    The popular format for an SIN is a 9-digit number, separated at the 3rd and 6th digits by a delimiter. The delimiter can be a period, hyphen, or space.

    The following are examples of popular formats:

    • NNNNNNNNN
    • NNN-NNN-NNN

    This dictionary uses the Luhn checksum.
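    A minimal sketch that combines the popular format with a Luhn check. It assumes that a popular-format SIN is either undelimited or uses the same delimiter after the 3rd and 6th digits, and that it is not directly adjacent to a letter or digit. The regular expression and function names are hypothetical and are not Zscaler's actual pattern.

```python
import re

# The backreference \2 forces both delimiters to be identical (or both absent).
SIN_RE = re.compile(
    r"(?<![A-Za-z0-9])([0-9]{3})([.\- ]?)([0-9]{3})\2([0-9]{3})(?![A-Za-z0-9])"
)


def luhn_valid(digits: str) -> bool:
    """Standard Luhn: double every second digit from the right, sum, mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_popular_sins(text: str) -> list:
    """Return normalized 9-digit SINs in popular format that pass Luhn."""
    found = []
    for m in SIN_RE.finditer(text):
        digits = "".join(m.group(1, 3, 4))
        if luhn_valid(digits):
            found.append(digits)
    return found


print(find_popular_sins("SIN: 650-727-266"))
```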

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the SIN matches a valid range.

    The SIN can contain only a period, hyphen, or space as delimiters. Multiple periods, hyphens, and spaces are allowed.

    The number formats that can trigger the dictionary are:

    • CC036964 310CC
    • @054120 373$
    • @620301 192%
    • AA021574 728

    A number format like 650------72...aa...7266 does not trigger the dictionary.

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The SIN is in a popular format.

    The SIN can contain delimiters only after the 3rd and 6th digits, and both delimiters must be the same character.

    The SIN can have a boundary, but the characters immediately before and after the SIN cannot be alphanumeric.

    The number formats that can trigger the dictionary are:

    • 054120373
    • 650-727-266

    The number formats that do not trigger the dictionary are:

    • @620 301 192
    • AA021574 728
    • 650-72.7266
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The SIN is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, social insurance number or national identification number.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 036 964 310
    • 054120373
    • @620 301 192
    • 650-72.7266
    • 369 893 144

    A number format like AA021574 728 does not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases.

  • This dictionary detects Social Security numbers (ATSSN) from Austria.

    This dictionary uses the Modulo 11 checksum.
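    A minimal sketch of a Modulo 11 check for a 10-digit ATSSN, assuming the publicly documented Austrian scheme: the 4th digit is the check digit, and the other digits are weighted 3, 7, 9, 5, 8, 4, 2, 1, 6. The weights are not stated on this page, and the names are hypothetical. The sample 1237010180 passes and 1238010180 fails, which matches the table below.

```python
import re

# Weight 0 marks the position of the check digit itself (4th digit).
ATSSN_WEIGHTS = (3, 7, 9, 0, 5, 8, 4, 2, 1, 6)


def atssn_check_digit(ssn10: str) -> int:
    """Return the weighted sum modulo 11 (a result of 10 is never issued)."""
    return sum(int(d) * w for d, w in zip(ssn10, ATSSN_WEIGHTS)) % 11


def is_valid_atssn(candidate: str) -> bool:
    """Check an undelimited 10-digit number."""
    if not re.fullmatch(r"[0-9]{10}", candidate):
        return False
    check = atssn_check_digit(candidate)
    return check != 10 and check == int(candidate[3])


print(is_valid_atssn("1237010180"))
```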

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the ATSSN number matches a valid range.

    The number formats that can trigger the dictionary are:

    • 123 701 0180
    • 1237--010180
    • 1 -23/.701 -. 0 ... -180

    A number format like 1238010180 does not trigger the dictionary (bad checksum).

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The ATSSN number is in a popular format.

    The number formats that can trigger the dictionary are:

    • 1237010180
    • 1237 010180

    The number formats that do not trigger the dictionary are:

    • 123 701 0180
    • 1237--010180
    • 1 -23/.701 -. 0 ... -180
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The ATSSN number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, sozialversicherungsnummer, Austria SSN, soziale sicherheit kein, sozialversicherungsnummer#, and sozialesicherheitkein#.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 1237010180
    • 1237 010180

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 123 701 0180
    • 1237--010180
    • 1 -23/.701 -. 0 ... -180
  • This dictionary detects social security numbers from Spain.

    You can modify the confidence score threshold:

    • Low: The dictionary counts an instance as a violation if it matches a valid range.
    • Medium: The dictionary counts an instance as a violation if:
      • The requirements of Low Confidence are met.
      • The social security number is in a popular format.
    • High: The dictionary counts an instance as a violation if:
      • The requirements of Medium Confidence are met.
      • The social security number is accompanied by any of the dictionary’s default or custom high confidence phrases.
  • This dictionary detects social security numbers (AHV) from Switzerland.

    The popular format for an AHV number is a 13-digit number. The AHV number has the form of 756.XXXX.XXXX.XY, where 756 is the ISO 3166-1 code for Switzerland, XXXX.XXXX.X is a random number, and Y is an EAN-13 check digit.

    This dictionary uses the Luhn variation checksum.
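    The EAN-13 check digit mentioned above can be sketched as follows: the first 12 digits are weighted 1 and 3 alternately from the left, and the check digit brings the total up to a multiple of 10. The function names are hypothetical, and only the 756.XXXX.XXXX.XY layout (with or without periods) is handled.

```python
import re


def ean13_check_digit(first12: str) -> int:
    """Compute the EAN-13 check digit for 12 digits."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_ahv(candidate: str) -> bool:
    """Check a 13-digit AHV number that starts with the country code 756."""
    digits = candidate.replace(".", "")
    if not re.fullmatch(r"756[0-9]{10}", digits):
        return False
    return ean13_check_digit(digits[:12]) == int(digits[12])


print(is_valid_ahv("756.3594.2833.18"))
```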

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the AHV number matches a valid range.

    The number formats that can trigger the dictionary are:

    • 756.52--30.6913.27
    • 756.88 85.8748.24
    • 756.3594.2833.18
    • @7.56.7811.8967.63@

    A number format like 756.2...487.40cc93.09 does not trigger the dictionary.

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The AHV number is in a popular format.

    The number formats that can trigger the dictionary are:

    • 756.3594.2833.18
    • 756.2487.4093.09
    • @7.56.7811.8967.63@

    The number formats that do not trigger the dictionary are:

    • 756.52--30.6913.27
    • 756.88 85.8748.24
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The AHV number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, identifiant national, insurance number, national identifier, national insurance number, AHV number, AHV-Nummer, Personenidentifikationsnummer, Schweizer Registrierungsnummer, Swiss registration number, or AVS.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 756.3594.2833.18
    • @7.56.7811.8967.63@

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 756.52--30.6913.27
    • 756.88 85.8748.24
    • A756.2487.4093.09A
  • This dictionary detects Social Security Numbers (SSN) from the United States.

    When you add the Social Security Numbers (US) dictionary to a DLP engine, you must configure the match count. To learn more, see Configuring the Match Count.

    The popular format for an SSN is a 9-digit number.

    The following are examples of popular formats:

    • NNN-NN-NNNN
    • NNNNNNNNN
    • NNN NN NNNN

    This dictionary does not use a checksum.
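    Without a checksum, detection relies on format and range checks. A minimal sketch, assuming the publicly documented SSA rules (area numbers 000, 666, and 900 through 999 are never issued, and neither is group 00 or serial 0000) and assuming that both delimiters must match. The exact ranges the dictionary uses are not stated on this page, and the names are hypothetical.

```python
import re

# NNN-NN-NNNN, NNN NN NNNN, or NNNNNNNNN; \2 keeps the delimiters consistent.
SSN_RE = re.compile(r"([0-9]{3})([- ]?)([0-9]{2})\2([0-9]{4})")


def is_plausible_ssn(candidate: str) -> bool:
    """Apply popular-format and basic never-issued range checks."""
    m = SSN_RE.fullmatch(candidate)
    if m is None:
        return False
    area, group, serial = m.group(1), m.group(3), m.group(4)
    if area in ("000", "666") or area[0] == "9":
        return False
    if group == "00" or serial == "0000":
        return False
    return True


print(is_plausible_ssn("269-86-5658"))
```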

    You can also specify whether to include or exclude specific SSNs. You can configure up to 512 SSNs to include or exclude. To learn more, see Editing Predefined DLP Dictionaries.

    To use this dictionary to detect SSNs, the following guidelines apply:

    These guidelines might not apply to certain use cases. You can adjust your dictionary configurations according to your DLP needs.

    • To catch a small exfiltration of numbers in documents:
      • Set a Confidence Score Threshold of High for the dictionary.
      • When adding a dictionary to an engine, set a low value for the dictionary's match count.
    • To catch a large exfiltration of numbers in spreadsheets or other file types:
      • Set a Confidence Score Threshold of Medium for the dictionary.
      • When adding a dictionary to an engine, set a high value for the dictionary’s match count.
    • In general, configuring a dictionary with a Low Confidence Score Threshold and a low match count value results in too many false positives. Zscaler recommends setting a Confidence Score Threshold of High when its match count value is low.
    • Rich Text Format (RTF) files contain formatting code that can mimic credit card and social security numbers, which affects when a DLP rule is triggered. Plain text files do not contain this formatting code, so the DLP rule triggers as expected. To ensure that the DLP policy triggers when confidential numbers are leaked in RTF files, do one of the following:
      • Set any value greater than 1 for the dictionary’s match count in the engine.
      • Set a Confidence Score Threshold value of High.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if:

    • The SSN is issued regardless of formatting. You can confirm that the group number is within the range issued for a given area at https://www.ssa.gov/employer/ssnweb.htm.
    • The SSN matches a valid range.

    The number formats that can trigger the dictionary are:

    • SA_332831997@@##
    • SA408456050@
    • 15...21----587---44
    • ---23527 1165
    • ...688781663...
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The SSN is in a popular format.

    The number formats that can trigger the dictionary are:

    • SA_332831997@@
    • SA@408456050@
    • ...688781663...
    • ??292108209//
    • 269865658

    The number formats that do not trigger the dictionary are:

    • 15...21----587---44
    • ---23527 1165
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The SSN is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, date of birth, social security number, or tax payer.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • SA_332831997@@
    • 408456050
    • 292108209//
    • 269865658

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 15...21----587---44
    • ---23527 1165

    If a violation is enough to trigger a DLP policy and you have configured a DLP email notification, your auditors receive an email. If you have also included the ${DLPTRIGGERS} macro, the email includes what content triggered the violation.

  • This dictionary detects content related to source code. It detects typical source code and programming phrases such as class implements, class extends, and private void.

    This dictionary does not trigger unless the data size of the detected content is at least 1 KB.

    This dictionary does not use a checksum.

    You can modify the Confidence Score Threshold. Confidence scores inform the dictionary how high it must raise the bar, or threshold, for identifying violations and triggering them. To learn more, see Configuring the Confidence Score Threshold.

  • This dictionary detects Standardized Bank Code numbers (CLABE) from Mexico.

    The popular format for a Standardized Bank Code number is an 18-character number. The Standardized Bank Code number has the following structure:

    • 3 digits: Bank Code
    • 3 digits: Branch Office Code
    • 11 digits: Account Number
    • 1 digit: Control Digit

    This dictionary uses the Luhn variation checksum.
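
    As a sketch of how the structure and control digit fit together, the following uses the publicly documented CLABE control-digit scheme (weights 3, 7, 1 repeated, each product reduced modulo 10). It is an assumption that this is the "Luhn variation" the dictionary applies; the function names are hypothetical.

```python
CLABE_WEIGHTS = (3, 7, 1)  # repeated across the first 17 digits


def clabe_control_digit(first17: str) -> int:
    """Compute the control digit from the bank, branch, and account digits."""
    total = sum((int(d) * CLABE_WEIGHTS[i % 3]) % 10 for i, d in enumerate(first17))
    return (10 - total % 10) % 10


def is_valid_clabe(clabe: str) -> bool:
    """Check length, digits, and control digit of an undelimited CLABE."""
    if len(clabe) != 18 or not (clabe.isascii() and clabe.isdigit()):
        return False
    return clabe_control_digit(clabe[:17]) == int(clabe[17])


def split_clabe(clabe: str) -> dict:
    """Split an 18-digit CLABE into the four parts listed above."""
    return {
        "bank": clabe[:3],
        "branch": clabe[3:6],
        "account": clabe[6:17],
        "control": clabe[17],
    }
```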

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the Standardized Bank Code number matches a valid range.

    The Standardized Bank Code number can contain:

    • A period, hyphen, or space as delimiters. Multiple periods are allowed.
    • An alphabetical boundary.

    The number formats that can trigger the dictionary are:

    • 014027000005555558
    • 00201007--777777-7771

    A number format like 032180aaa000118359719 does not trigger the dictionary.

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The Standardized Bank Code number is in a popular format.

    The Standardized Bank Code number must contain a non-alphanumeric boundary. The starting and ending delimiters cannot be alphabetical characters.

    The number formats that can trigger the dictionary are:

    • 014027000005555558
    • 002010077777777771

    A number format like A032180aaa000118359719S does not trigger the dictionary.

    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The Standardized Bank Code number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, CLABE, CLABE interbancaria, or standardized bank code.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 014027000005555558
    • 002010077777777771

    A number format like @032180aaa000118359719A does not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases.

  • This dictionary detects tax documents, such as 1040, 1099, 1098-T, and 3921 forms.

    Zscaler supports only the following document types for tax documents: RTF, PDF, MSG, DOC, DOCX, DOCM, DOTX, DOTM, XLS, XLSX, XLSM, XLTM, PPT, PPTX, PPSX, PPTM, POTM, POTX, and IWORK (pages, numbers, and keynote). The minimum size required for a document is 1 KB.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria
    Low | This dictionary counts an instance as a violation if the Machine Learning (ML) match score is 50 or more.
    Medium | This dictionary counts an instance as a violation if the ML match score is 70 or more.
    High | This dictionary counts an instance as a violation if the ML match score is 90 or more.
  • This dictionary detects Tax File Numbers (TFN) from Australia. The TFN is issued by the Australian Taxation Office (ATO) to each taxpaying entity such as an individual, company, superannuation fund, partnership, or trust.

    The popular format for a TFN is a unique 8-digit or 9-digit number.

    This dictionary uses the Mod 11 Check Digit checksum. This checksum is similar to the Luhn checksum.
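
    A minimal sketch of a Mod 11 check for TFNs, assuming the publicly documented weight tables for 8-digit and 9-digit numbers. The source does not confirm that the dictionary uses these exact weights, and `is_valid_tfn` is a hypothetical name. Periods, hyphens, and spaces are stripped before the check.

```python
# Assumed weights from the publicly documented TFN algorithm.
TFN_WEIGHTS = {
    8: (10, 7, 8, 4, 6, 3, 5, 1),
    9: (1, 4, 3, 7, 5, 8, 6, 9, 10),
}


def is_valid_tfn(raw: str) -> bool:
    """Return True if the weighted digit sum is divisible by 11."""
    digits = raw.translate(str.maketrans("", "", ".- "))  # drop period, hyphen, space
    if not (digits.isascii() and digits.isdigit()) or len(digits) not in TFN_WEIGHTS:
        return False
    total = sum(int(d) * w for d, w in zip(digits, TFN_WEIGHTS[len(digits)]))
    return total % 11 == 0
```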

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the TFN matches a valid range.

    The TFN can contain a period, hyphen, or space as delimiters.

    The number formats that can trigger the dictionary are:

    • 371 186 29
    • 459 - 599- .-230
    • 112.474-082
    • 565051603
    • 85655805'
    • '123456782

    A number format like 23456782 does not trigger the dictionary.

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The TFN is in a popular format.

    The number formats that can trigger the dictionary are:

    • 371 186 29d
    • 459599230
    • 565051603*
    • '123456782

    The number formats that do not trigger the dictionary are:

    • 112.474-082
    • D85655805D
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The TFN is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, australian business number, marginal tax rate, medicare levy, portfolio number, service veterans, withholding tax, individual tax return, or tax file number.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 371 186 29d
    • 459599230
    • 565051603*
    • '123456782

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 112.474-082
    • D85655805D
  • This dictionary detects Austrian Tax Identification Numbers (ATTIN), which are 9-digit unique identifiers issued by the Directorate-General Taxation and Customs Union (DG TAXUD) to all taxpaying individuals of a member state.

    This dictionary uses the Luhn checksum.
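
    For reference, a standard Luhn check can be sketched as follows. This is a generic illustration, not the dictionary's own code, and `luhn_is_valid` is a hypothetical name. It expects the undelimited digits.

```python
def luhn_is_valid(digits: str) -> bool:
    """Standard Luhn check: double every second digit from the right."""
    if not (digits.isascii() and digits.isdigit()):
        return False
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # every second digit, counting from the check digit
            d *= 2
            if d > 9:
                d -= 9      # same as summing the two digits of the product
        total += d
    return total % 10 == 0
```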

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the number matches a valid range.

    The number formats that can trigger the dictionary are:

    • 93-173-6581
    • 93/173/6581
    • 93--173/6581

    A number format like 931736582 does not trigger the dictionary because it uses an invalid checksum.

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The number is in a popular format.

    The number formats that can trigger the dictionary are:

    • 931736581
    • 93-173/6581

    The number formats that do not trigger the dictionary are:

    • 93-173-6581
    • 93/173/6581
    • 93--173/6581
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The ATTIN is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, österreich, Steuernummer, or TIN.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 931736581
    • 93-173/6581

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 93-173-6581
    • 93/173/6581
    • 93--173/6581
  • This dictionary detects Belgian Tax Identification Numbers (BTIN), which are 11-digit unique identifiers issued by the Directorate-General Taxation and Customs Union (DG TAXUD) to all taxpaying individuals of a member state.

    This dictionary uses the Mod 97 checksum, which is similar to the Luhn checksum.
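
    A sketch of a Mod 97 check, assuming the publicly documented rule for Belgian national numbers: the last two digits equal 97 minus the remainder of the first nine digits divided by 97, with a "2" prefixed to those nine digits for births from 2000 onward. Whether the dictionary applies exactly this rule is an assumption, and `is_valid_btin` is a hypothetical name.

```python
def is_valid_btin(number: str) -> bool:
    """Check the two trailing check digits of an undelimited 11-digit number."""
    if len(number) != 11 or not (number.isascii() and number.isdigit()):
        return False
    base, check = number[:9], int(number[9:])
    # Try the pre-2000 form first, then the form with a leading "2".
    return any(97 - int(prefix + base) % 97 == check for prefix in ("", "2"))
```

    Under this rule, both `00012511119` and `00012511148` pass, which matches the two check values shown in the examples for this dictionary.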

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the BTIN matches a valid range.

    The number formats that can trigger the dictionary are:

    • 00.0125-111.19
    • 00.01.2511.1.48
    • 0001251111-9

    The number formats that do not trigger the dictionary are:

    • 00.01.25-111.18 (invalid checksum)
    • 00.67.25-111.48 (invalid month)
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The BTIN is in a popular format.

    The number formats that can trigger the dictionary are:

    • 00.01.25-111.19
    • 00012511119

    The number formats that do not trigger the dictionary are:

    • 00.0125-111.19
    • 00.01.2511.1.48
    • 0001251111-9
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The BTIN is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, belasting aantal, numéro d'identification fiscale, or bnn#.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 00.01.25-111.19
    • 00012511119

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 00.0125-111.19
    • 00.01.2511.1.48
    • 0001251111-9
  • This dictionary detects Danish Tax Identification Numbers (DTIN), which are 10-digit unique identifiers issued by the Directorate-General Taxation and Customs Union (DG TAXUD) to all taxpaying individuals of a member state.

    This dictionary uses the Mod 11 checksum, which is similar to the Luhn checksum.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the DTIN number matches a valid range.

    The number formats that can trigger the dictionary are:

    • 010-111-1113
    • 01016011-11

    A number format like 010111-1116 does not trigger the dictionary.

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The DTIN number is in a popular format.

    The number formats that can trigger the dictionary are:

    • 010111-1113
    • 0101601111

    The number formats that do not trigger the dictionary are:

    • 010-111-1113
    • 01016011-11
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The DTIN is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, centrale personregister, civilt registreringssystem, cpr, cpr#, gesundheitskarte nummer, gesundheitsversicherungkarte nummer, TIN, TAX ID, skat id, skattenummer, skat identifikationsnummer, skat kode.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 010111-1113
    • 0101601111

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 010-111-1113
    • 01016011-11
  • This dictionary detects Finland Tax Identification Numbers (FITIN), which are 10-digit unique identifiers issued by the Directorate-General Taxation and Customs Union (DG TAXUD) to all taxpaying individuals of a member state.

    This dictionary uses the Modulo 31 checksum.
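
    A sketch of the Modulo 31 check, assuming the publicly documented scheme for Finnish personal identity codes: the six birth-date digits and the three-digit individual number are read as one 9-digit integer, and the remainder modulo 31 indexes a fixed character table. The names `CHECK_CHARS` and `fitin_check_char` are hypothetical.

```python
# 31 check characters; the letters G, I, O, Q, and Z are not used.
CHECK_CHARS = "0123456789ABCDEFHJKLMNPRSTUVWXY"


def fitin_check_char(birth_ddmmyy: str, individual: str) -> str:
    """Return the expected check character for the 6 + 3 digit parts."""
    return CHECK_CHARS[int(birth_ddmmyy + individual) % 31]
```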

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the FITIN number matches a valid range.

    The number formats that can trigger the dictionary are:

    • 13 1052-308T
    • 1310 52-308T
    • 13 10 52-308T

    The number formats that do not trigger the dictionary are:

    • 131052-308U (bad checksum)
    • 131352-3087 (invalid birth month)
    • 321052-3082 (invalid birth date)
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The FITIN number is in a popular format.

    The number formats that can trigger the dictionary are:

    • 131052-308T
    • 131052+308T
    • 131052a308T
    • 131052A308T

    The number formats that do not trigger the dictionary are:

    • 13 1052-308T
    • 1310 52-308T
    • 13 10 52-308T
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The FITIN is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, Tunnus Kod, tunnistenumero, tunnus numero, tunnusluku, tunnusnumero, and TIN.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 131052-308T
    • 131052+308T
    • 131052a308T
    • 131052A308T

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 13 1052-308T
    • 1310 52-308T
    • 13 10 52-308T
  • This dictionary detects French Tax Identification Numbers (FRTIN), which are 13-digit unique identifiers issued by the Directorate-General Taxation and Customs Union (DG TAXUD) to all taxpaying individuals of a member state.

    This dictionary uses the Mod 511 checksum, which is similar to the Luhn checksum.
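
    A sketch of the Mod 511 check, assuming the publicly documented rule that the last three digits equal the remainder of the first ten digits divided by 511. The function name `is_valid_frtin` is hypothetical, and the input is expected without delimiters.

```python
def is_valid_frtin(number: str) -> bool:
    """Check the three trailing check digits of an undelimited 13-digit number."""
    if len(number) != 13 or not (number.isascii() and number.isdigit()):
        return False
    return int(number[:10]) % 511 == int(number[10:])
```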

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the SPI number matches a valid range.

    The number formats that can trigger the dictionary are:

    • 3 023 217 600 053
    • 30--23-217-600-053
    • 30 23-217.600/053

    A number format like 3 023 217 600 054 does not trigger the dictionary.

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The SPI number is in a popular format.

    The number formats that can trigger the dictionary are:

    • 3023217600053
    • 30 23 217 600 053
    • 30-23-217-600-053
    • 30.23.217.600.053
    • 30/23/217/600/053

    The number formats that do not trigger the dictionary are:

    • 3 023 217 600 053
    • 30--23-217-600-053
    • 30 23-217.600/053
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The FRTIN is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, tax identification number and numéro SPI.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 30 23 217 600 053
    • 30-23-217-600-053
    • 30.23.217.600.053
    • 30/23/217/600/053

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 3 023 217 600 053
    • 30--23-217-600-053
    • 30 23-217.600/053
  • This dictionary detects German Tax Identification Numbers (GTIN), which are 11-digit unique identifiers issued by the Directorate-General Taxation and Customs Union (DG TAXUD) to all taxpaying individuals of a member state.

    This dictionary uses the Mod 11 checksum, which is similar to the Luhn checksum.
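
    A sketch of a check digit calculation for an 11-digit German TIN, assuming the publicly documented iterative Mod 11,10 procedure. The source only states "Mod 11", so treating it as this procedure is an assumption, and the function names are hypothetical.

```python
def german_tin_check_digit(first10: str) -> int:
    """Iterative Mod 11,10 calculation over the first ten digits."""
    product = 10
    for ch in first10:
        total = (int(ch) + product) % 10
        if total == 0:
            total = 10
        product = (total * 2) % 11
    check = 11 - product
    return 0 if check == 10 else check


def is_valid_german_tin(number: str) -> bool:
    """Compare the computed check digit with the eleventh digit."""
    if len(number) != 11 or not (number.isascii() and number.isdigit()):
        return False
    return german_tin_check_digit(number[:10]) == int(number[10])
```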

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the TIN matches a valid range.

    The number formats that can trigger the dictionary are:

    • 65929 970 489
    • 86-095742-719

    A number format like 65 929 970 480 does not trigger the dictionary.

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The TIN is in a popular format.

    The number formats that can trigger the dictionary are:

    • 65 929 970 489
    • 65929970489

    The number formats that do not trigger the dictionary are:

    • 65929 970 489
    • 86-095742-719
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The GTIN is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, identifikationsnummer, steueridentifikationsnummer, or steuernummer.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 65 929 970 489
    • 65929970489

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 65929 970 489
    • 86-095742-719
  • This dictionary detects Greek Tax Identification numbers (GRTIN).

    This dictionary uses a variation of the Modulo 11 checksum.
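
    A sketch of one Modulo 11 variation, assuming the publicly documented scheme for the 9-digit Greek AFM: the first eight digits are weighted by descending powers of two (256 down to 2), and the check digit is the weighted sum modulo 11, then modulo 10. Whether the dictionary uses exactly this variation is an assumption, and `is_valid_afm` is a hypothetical name.

```python
def is_valid_afm(number: str) -> bool:
    """Check the ninth digit of an undelimited 9-digit number."""
    if len(number) != 9 or not (number.isascii() and number.isdigit()):
        return False
    total = sum(int(d) * 2 ** (8 - i) for i, d in enumerate(number[:8]))
    return total % 11 % 10 == int(number[8])
```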

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the GRTIN number matches a valid range.

    The number formats that can trigger the dictionary are:

    • 19 86 02 355
    • 19 86-02355
    • 19 8602355

    A number format like 198602356 does not trigger the dictionary (bad checksum).

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The GRTIN number is in a popular format.

    A number format like 198602355 can trigger the dictionary.

    The number formats that do not trigger the dictionary are:

    • 19 86 02 355
    • 19 86-02355
    • 19 8602355
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The GRTIN number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, afm, afm#, tax identification number, and tin.

    A number format like 198602355 can trigger the dictionary.

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 19 86 02 355
    • 19 86-02355
    • 19 8602355
  • This dictionary detects Hungarian Tax Identification Numbers (HUTIN), which are 10-digit unique identifiers issued by the Directorate-General Taxation and Customs Union (DG TAXUD) to all taxpaying individuals of a member state.

    This dictionary uses the Mod 97 checksum, which is similar to the Luhn checksum.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the TIN matches a valid range.

    The number formats that can trigger the dictionary are:

    • 80 71 592 153
    • 807159215 3
    • 8 071592153

    A number format like 8071592154 does not trigger the dictionary.

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The TIN is in a popular format.

    A number format like 8071592153 triggers the dictionary.

    The number formats that do not trigger the dictionary are:

    • 80 71 592 153
    • 807159215 3
    • 8 071592153
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The HUTIN is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, adószám and adóazonosító szám.

    A number format like 8071592153 triggers the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases.

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 80 71 592 153
    • 807159215 3
    • 8 071592153
  • This dictionary detects Tax Identification numbers (NPWP) from Indonesia.

    The popular format for a Tax Identification number is a 15-digit number. The Tax Identification number has the following structure:

    • The first 9 digits (1-9) are the unique identity of the taxpayer.
    • The next 3 digits (10-12) are the code for the tax office where the taxpayer registered for the first time.
    • The last 3 digits (13-15) are the code for central or branch status.
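
    The structure above can be sketched as a simple split of the 15 digits. The function name `split_npwp` and the field names are hypothetical, and the number used in the comment is a made-up placeholder, not real data. The sketch assumes that periods, hyphens, and spaces are the only delimiters to strip.

```python
import re


def split_npwp(raw: str) -> dict:
    """Split a 15-digit number into the three parts described above."""
    digits = re.sub(r"[.\- ]", "", raw)  # assumed delimiters: period, hyphen, space
    if len(digits) != 15 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected 15 digits")
    return {
        "taxpayer_id": digits[:9],      # digits 1-9
        "tax_office": digits[9:12],     # digits 10-12
        "branch_status": digits[12:],   # digits 13-15
    }


# Placeholder input: split_npwp("012345678901000")
```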

    This dictionary uses the Luhn variation checksum.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria
    Low

    The dictionary counts an instance as a violation if the Tax Identification number matches a valid range.
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The Tax Identification number is in a popular format.
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The Tax Identification number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, npwp, indonesia tax number, or indonesian tax number.
  • This dictionary detects Irish Tax Identification Numbers (IETIN), which are 8- or 9-character unique identifiers issued by the Department of Social Protection. The IETIN is also used by the Revenue Commissioners to identify taxpayers.

    This dictionary uses the Modulo 23 checksum.
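
    A sketch of the Modulo 23 check, assuming the publicly documented scheme for Irish PPS numbers: the seven digits are weighted 8 down to 2, an optional second letter is weighted 9 (with W counting as 0 and A as 1), and the remainder modulo 23 selects the check letter. The names `PPS_LETTERS` and `pps_check_letter` are hypothetical.

```python
# Index equals the remainder: 0 -> W, 1 -> A, ..., 22 -> V.
PPS_LETTERS = "WABCDEFGHIJKLMNOPQRSTUV"


def pps_check_letter(digits7: str, second_letter: str = "") -> str:
    """Return the expected check letter for seven digits and an optional suffix letter."""
    total = sum(int(d) * w for d, w in zip(digits7, range(8, 1, -1)))
    if second_letter:
        total += PPS_LETTERS.index(second_letter) * 9
    return PPS_LETTERS[total % 23]
```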

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the IETIN matches a valid range.

    The number formats that can trigger the dictionary are:

    • 1234567 T
    • 1234567 TW

    The number formats that do not trigger the dictionary are:

    • 1234567Y (bad checksum)
    • 123456-7TW (doesn't match regex)
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The IETIN is in a popular format.

    A number format like 1234567T triggers the dictionary.

    The number formats that do not trigger the dictionary are:

    • 1234567 T
    • 1234567 TW
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The IETIN is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, Personal Public Service Number, PPS num, Tax Identification Number, TIN, pps number, ppsn, ppsno, uimhir phearsanta seirbhíse poiblí, pps uimh, and Uimhir aitheantais phearsanta.

    A number format like 1234567T triggers the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases.

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 1234567 T
    • 1234567 TW
  • This dictionary detects Luxembourg Tax Identification Numbers (LUTIN), which are 11- or 13-digit unique identifiers issued by the Directorate-General Taxation and Customs Union (DG TAXUD) to all taxpaying individuals of a member state.

    This dictionary uses the Luhn and Mod 11 checksums.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the TIN matches a valid range.

    The number formats that can trigger the dictionary are:

    • 189 312 010 5732
    • 1 893120105732
    • 189312010573 2
    • 81 738 480 120
    • 81 738 480.120
    • 81 -738/ .4 80...-120

    The number formats that do not trigger the dictionary are:

    • 1893120105732
    • 61432208511 (bad checksum)
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The TIN is in a popular format.

    The number formats that can trigger the dictionary are:

    • 1893120105732
    • 81.738.480.120
    • 81-738-480-120
    • 81738480120

    The number formats that do not trigger the dictionary are:

    • 189 312 010 5732
    • 1 893120105732
    • 189312010573 2
    • 81 738 480 120
    • 81 738 480.120
    • 81 -738/ .4 80...-120
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The LUTIN is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, sécurité sociale, zinn nummer, carte sécurité sociale, étain, numéro d'étain, étain non, Numéro d'identification fiscal luxembourgeois, tin, zinn, Luxembourg Tax Identifikatiounsnummer, and steier.

    The number formats that trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 1893120105732
    • 81.738.480.120
    • 81-738-480-120
    • 81738480120

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 189 312 010 5732
    • 1 893120105732
    • 189312010573 2
    • 81 738 480 120
    • 81 738 480.120
    • 81 -738/ .4 80...-120
  • This dictionary detects New Zealand Tax Identification Numbers (NZTIN), which are 8- or 9-digit unique identifiers issued by the Inland Revenue Department (IRD).

    This dictionary uses the Mod 11 algorithm for validation.
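
    A sketch of the Mod 11 validation, assuming the publicly documented IRD number algorithm: the base digits are left-padded to eight, weighted with a primary table, and re-weighted with a secondary table when the first pass yields a check value of 10. The range check that the published algorithm also includes is omitted here, and `is_valid_ird` is a hypothetical name.

```python
PRIMARY_WEIGHTS = (3, 2, 7, 6, 5, 4, 3, 2)
SECONDARY_WEIGHTS = (7, 4, 3, 2, 5, 2, 7, 6)


def is_valid_ird(number: str) -> bool:
    """Validate the trailing check digit of an undelimited 8- or 9-digit number."""
    if len(number) not in (8, 9) or not (number.isascii() and number.isdigit()):
        return False
    base, check = number[:-1].zfill(8), int(number[-1])
    for weights in (PRIMARY_WEIGHTS, SECONDARY_WEIGHTS):
        remainder = sum(int(d) * w for d, w in zip(base, weights)) % 11
        candidate = 0 if remainder == 0 else 11 - remainder
        if candidate != 10:
            return candidate == check
    return False  # both passes produced 10, so the number is invalid
```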

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the NZTIN matches a valid range.

    The number formats that can trigger the dictionary are:

    • 136-410132
    • 136-4-10-132
    • 49091 850

    A number format like 136-410-134 does not trigger the dictionary.

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The NZTIN is in a popular format.

    The number formats that can trigger the dictionary are:

    • 49 091 850
    • 136410132
    • 136-410-132

    The number formats that do not trigger the dictionary are:

    • 136-410132
    • 136-4-10-132
    • 49091 850
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The NZTIN is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, IRD Number and Inland Revenue Department.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 49 091 850
    • 136410132
    • 136-410-132

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 136-410132
    • 136-4-10-132
    • 49091 850
  • This dictionary detects Tax Identification numbers (PERUC) from Peru.

    This dictionary uses a variation of the Modulo 11 checksum.
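
    A sketch of one Modulo 11 variation, assuming the publicly documented scheme for the 11-digit Peruvian RUC: the first ten digits are weighted 5, 4, 3, 2, 7, 6, 5, 4, 3, 2, and the check digit is 11 minus the remainder modulo 11, with 10 mapped to 0 and 11 mapped to 1. Whether the dictionary uses exactly this variation is an assumption, and `is_valid_ruc` is a hypothetical name.

```python
RUC_WEIGHTS = (5, 4, 3, 2, 7, 6, 5, 4, 3, 2)


def is_valid_ruc(number: str) -> bool:
    """Check the eleventh digit of an undelimited 11-digit number."""
    if len(number) != 11 or not (number.isascii() and number.isdigit()):
        return False
    total = sum(int(d) * w for d, w in zip(number[:10], RUC_WEIGHTS))
    check = 11 - total % 11
    check = {10: 0, 11: 1}.get(check, check)  # fold the two out-of-range values
    return check == int(number[10])
```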

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the PERUC number matches a valid range.

    The number formats that can trigger the dictionary are:

    • 20 503644-968
    • 20--503644-968
    • 2 -05/ .036 449...- 68

    A number format like 20503644969 does not trigger the dictionary (bad checksum).

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The PERUC number is in a popular format.

    The number formats that can trigger the dictionary are:

    • 20503644968
    • 20 503644 968

    The number formats that do not trigger the dictionary are:

    • 20 503644-968
    • 20--503644-968
    • 2 -05/ .036 449...- 68
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The PERUC number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, Registro Unico Contribuyentes, tax identification number, peru tax identification number, peruvian tax identification number, peru tin, and peruvian tin.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 20503644968
    • 20 503644 968

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 20 503644-968
    • 20--503644-968
    • 2 -05/ .036 449...- 68
  • This dictionary detects Tax Identification numbers (PLTIN) from Poland, which are 10- or 11-digit unique identifiers issued by the Directorate-General Taxation and Customs Union (DG TAXUD) to all taxpaying individuals of a Member State.

    This dictionary uses the Modulo 11 checksum for the 10-digit number, and the Luhn checksum for the 11-digit number.
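    A minimal sketch of the two checks, assuming the standard weights for the 10-digit number and a plain Luhn check for the 11-digit number. The weights and all function names are illustrative assumptions, not the product's implementation.

```python
import re

# Standard weights for the 10-digit number (an assumption).
NIP_WEIGHTS = (6, 5, 7, 2, 3, 4, 5, 6, 7)

def mod11_ok(digits: str) -> bool:
    """Weighted sum of the first 9 digits, mod 11, must equal the 10th digit."""
    remainder = sum(int(d) * w for d, w in zip(digits, NIP_WEIGHTS)) % 11
    return remainder == int(digits[9])  # a remainder of 10 can never match

def luhn_ok(digits: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def pltin_is_valid(candidate: str) -> bool:
    digits = candidate.replace(" ", "")
    if re.fullmatch(r"[0-9]{10}", digits):
        return mod11_ok(digits)
    if re.fullmatch(r"[0-9]{11}", digits):
        return luhn_ok(digits)
    return False
```

    With these assumptions, 2234567895 and 02070803628 pass, and 2234567894 fails.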

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the PLTIN number matches a valid range.

    The number formats that can trigger the dictionary are:

    • 2234 567895
    • 0207080 3628

    A number format like 2234567894 does not trigger the dictionary (bad checksum).

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The PLTIN number is in a popular format.

    The number formats that can trigger the dictionary are:

    • 2234567895
    • 02070803628

    The number formats that do not trigger the dictionary are:

    • 2234 567895
    • 0207080 3628
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The PLTIN number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, nip, nip#, numer identyfikacji podatkowej, numeridentyfikacjipodatkowej, tax identification number, tax number, and tin.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 2234567895
    • 02070803628

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 2234 567895
    • 0207080 3628
  • This dictionary detects Portuguese Tax Identification Numbers (PTIN), which are 9-digit unique identifiers issued by the Directorate-General Taxation and Customs Union (DG TAXUD) to all taxpaying individuals of a member state.

    This dictionary uses the Mod 11 algorithm for validation.
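    The Mod 11 validation can be sketched as follows. The weights (9 down to 2) and the rule that a result of 10 or 11 maps to check digit 0 follow the commonly documented scheme for these numbers and are assumptions here; `nif_is_valid` is a hypothetical name.

```python
import re

def nif_is_valid(candidate: str) -> bool:
    """Return True if the 9 digits in `candidate` pass a Mod 11 check."""
    digits = re.sub(r"[ .]", "", candidate)  # tolerate spaces and periods
    if not re.fullmatch(r"[0-9]{9}", digits):
        return False
    # Weights 9, 8, ..., 2 apply to the first eight digits.
    total = sum(int(d) * w for d, w in zip(digits, range(9, 1, -1)))
    check = 11 - total % 11
    if check >= 10:
        check = 0
    return check == int(digits[8])
```

    Under these assumptions, 299999998 passes and 299999991 fails.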

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the PTIN matches a valid range.

    The number formats that can trigger the dictionary are:

    • 299 999998
    • 2544964.40
    • 2544.96350

    A number format like 299999991 does not trigger the dictionary.

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The PTIN is in a popular format.

    A number format like 299999998 triggers the dictionary.

    The number formats that do not trigger the dictionary are:

    • 299 999998
    • 2544964.40
    • 2544.96350
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The PTIN is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, cpf#, cpf, nif#, nif, numero fiscal, and tin.

    A number format like 299999998 triggers the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases.

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 299 999998
    • 2544964.40
    • 2544.96350
  • This dictionary detects Tax Identification numbers (CIF) from Spain.

    This dictionary uses the Luhn checksum.
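    A hedged sketch of a Luhn-style control check for a CIF: one letter, seven digits, and a control character. The letter table `CONTROL_LETTERS` and the decision to accept either a digit or a letter as the control character are assumptions based on the commonly documented scheme. The sketch does not restrict which leading letters are allowed.

```python
import re

# Commonly documented mapping from control value to control letter (assumption).
CONTROL_LETTERS = "JABCDEFGHI"

def cif_is_valid(candidate: str) -> bool:
    """Return True if `candidate` is letter + 7 digits + matching control character."""
    m = re.fullmatch(r"[A-Z]([0-9]{7})([0-9A-J])", candidate)
    if not m:
        return False
    body, control = m.group(1), m.group(2)
    total = sum(int(d) for d in body[1::2])   # 2nd, 4th, 6th digits as-is
    for d in body[0::2]:                      # 1st, 3rd, 5th, 7th digits doubled
        doubled = int(d) * 2
        total += doubled // 10 + doubled % 10
    expected = (10 - total % 10) % 10
    return control in (str(expected), CONTROL_LETTERS[expected])
```

    With this sketch, A58818501 passes, A58818502 fails on the checksum, and delimited forms such as A 58818501 fail because they do not match the pattern.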

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the CIF number matches a valid range.

    A number format like A58818501 can trigger the dictionary.

    The number formats that do not trigger the dictionary are:

    • A 58818501
    • A588 18501
    • A588185-01
    • A58818502 (bad checksum)
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The CIF number is in a popular format.

    A number format like A58818501 can trigger the dictionary.

    The number formats that do not trigger the dictionary are:

    • A 58818501
    • A588 18501
    • A588185-01
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The CIF number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, tax identification code, Código identificación fiscal, CIF, and CIF número.

    A number format like A58818501 can trigger the dictionary.

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • A 58818501
    • A588 18501
    • A588185-01
  • This dictionary detects Swedish Tax Identification Numbers (STIN), which are 10-digit unique identifiers issued by the Directorate-General Taxation and Customs Union (DG TAXUD) to all taxpaying individuals of a member state.

    This dictionary uses the Luhn checksum.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the STIN matches a valid range.

    The number formats that can trigger the dictionary are:

    • 640130.3331
    • 640823 3234
    • 640 8233234

    A number format like 640823-3235 does not trigger the dictionary.

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The STIN is in a popular format.

    The number formats that can trigger the dictionary are:

    • 640130-3331
    • 6408233234

    The number formats that do not trigger the dictionary are:

    • 640130.3331
    • 640823 3234
    • 640 8233234
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The STIN is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, personnummer, skatt identifikation, skattebetalarens, identifikationsnummer, and sverige tin.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 640130-3331
    • 6408233234

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 640130.3331
    • 640823 3234
    • 640 8233234
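    The Low, Medium, and High criteria above can be read as successive filters. The sketch below is only an illustration of that layering: it assumes the popular formats are NNNNNN-NNNN and NNNNNNNNNN, treats the phrase check as a boolean supplied by the caller, and uses the hypothetical names `luhn_ok` and `stin_confidence`.

```python
import re
from typing import Optional

def luhn_ok(digits: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# Assumed popular formats: 10 digits, optionally one hyphen after the 6th digit.
POPULAR = re.compile(r"[0-9]{6}-?[0-9]{4}")

def stin_confidence(candidate: str, phrase_nearby: bool = False) -> Optional[str]:
    """Return the highest confidence level the candidate satisfies, or None."""
    digits = re.sub(r"[ .\-]", "", candidate)
    if not re.fullmatch(r"[0-9]{10}", digits) or not luhn_ok(digits):
        return None                      # fails the Low criteria
    if not POPULAR.fullmatch(candidate):
        return "low"                     # valid number, unusual delimiters
    return "high" if phrase_nearby else "medium"
```

    For example, 640130.3331 is classified as low, 640130-3331 as medium, and 640823-3235 is rejected because the checksum fails.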
  • This dictionary detects Tax Identification numbers (USITIN) from the United States.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the USITIN number matches a valid range.

    The number formats that can trigger the dictionary are:

    • 927 70-5828
    • 927 70 5828
    • 927 70 58 28

    A number format like 827705828 does not trigger the dictionary (first digit must be a 9).

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The USITIN number is in a popular format.

    A number format like 927705828 can trigger the dictionary.

    The number formats that do not trigger the dictionary are:

    • 927 70-5828
    • 927 70 5828
    • 927 70 58 28
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The USITIN number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, ITIN, USITIN, tax identification number, and taxpayer identification number.

    A number format like 927705828 can trigger the dictionary.

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 927 70-5828
    • 927 70 5828
    • 927 70 58 28
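    This article states only that the first digit must be 9. As a sketch of what a range check could look like, the example below also applies the IRS-published ranges for the fourth and fifth digits. Whether the dictionary applies those ranges is an assumption, and `itin_in_valid_range` is a hypothetical name.

```python
import re

def itin_in_valid_range(candidate: str) -> bool:
    """Return True for 9 digits starting with 9 whose 4th-5th digits are in range."""
    digits = re.sub(r"[ \-]", "", candidate)  # tolerate spaces and hyphens
    if not re.fullmatch(r"9[0-9]{8}", digits):
        return False
    group = int(digits[3:5])
    # IRS-published group ranges (assumed to be relevant here).
    return (50 <= group <= 65 or 70 <= group <= 88
            or 90 <= group <= 92 or 94 <= group <= 99)
```

    Under these assumptions, 927705828 is in range and 827705828 is not.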
  • This dictionary detects technical documents, such as computer user manuals, white papers, and technical publications.

    Zscaler supports only the following document types for technical documents: RTF, PDF, MSG, DOC, DOCX, DOCM, DOTX, DOTM, XLS, XLSX, XLSM, XLTM, PPT, PPTX, PPSX, PPTM, POTM, POTX, and IWORK (pages, numbers, and keynote). The minimum size required for a document is 1 KB.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria
    Low | This dictionary counts an instance as a violation if the Machine Learning (ML) match score is 50 or more.
    Medium | This dictionary counts an instance as a violation if the ML match score is 70 or more.
    High | This dictionary counts an instance as a violation if the ML match score is 90 or more.
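    The thresholds in this table amount to a simple comparison. The sketch below restates them in code. The names `ML_THRESHOLDS` and `is_violation` are hypothetical and do not correspond to any product API.

```python
# Minimum ML match score per confidence level, as listed in the table above.
ML_THRESHOLDS = {"low": 50, "medium": 70, "high": 90}

def is_violation(ml_score: int, configured_level: str) -> bool:
    """Return True if the score meets the threshold for the configured level."""
    return ml_score >= ML_THRESHOLDS[configured_level]
```

    For example, a document with a match score of 75 counts as a violation at the Low and Medium thresholds, but not at High.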
  • This dictionary detects transportation and motor department documents, such as sale or transfer of a vehicle, license forms, and driving records.

    Zscaler supports only the following document types for transportation and motor department documents: RTF, PDF, MSG, DOC, DOCX, DOCM, DOTX, DOTM, XLS, XLSX, XLSM, XLTM, PPT, PPTX, PPSX, PPTM, POTM, POTX, and IWORK (pages, numbers, and keynote). The minimum size required for a document is 1 KB.

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria
    Low | This dictionary counts an instance as a violation if the Machine Learning (ML) match score is 50 or more.
    Medium | This dictionary counts an instance as a violation if the ML match score is 70 or more.
    High | This dictionary counts an instance as a violation if the ML match score is 90 or more.
  • This dictionary uses phrase matching to detect content related to treatment information. It does not use a checksum.

    You can specify an Action to configure how the dictionary evaluates matching treatment names:

    • Count All: The dictionary counts all matches of the treatment name, including identical treatment names, toward the match count.
    • Count Unique: The dictionary counts each unique match of the treatment name toward the match count only once, regardless of how many times the treatment name appears.
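    The difference between the two actions can be shown with a small sketch. It assumes matches are compared as exact strings. The function name `match_count`, the action labels, and the sample treatment names are hypothetical.

```python
from typing import List

def match_count(matches: List[str], action: str) -> int:
    """Return the match count for the given action."""
    if action == "count_all":
        return len(matches)        # every occurrence counts
    if action == "count_unique":
        return len(set(matches))   # each distinct name counts once
    raise ValueError(f"unknown action: {action}")

# Sample matches found in one piece of content (illustrative values only).
found = ["dialysis", "chemotherapy", "dialysis"]
```

    Here `match_count(found, "count_all")` gives 3, while `match_count(found, "count_unique")` gives 2 because "dialysis" appears twice.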
  • This dictionary detects the 13-digit Unique Master Citizen Number.

    This dictionary uses the Mod 11 checksum.
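    A minimal sketch of a Mod 11 check for the 13-digit number, assuming the commonly documented weights (7, 6, 5, 4, 3, 2 repeated twice). Treating a result above 9 as check digit 0 is also an assumption, since some implementations reject a result of 10. `jmbg_is_valid` is a hypothetical name.

```python
import re

# Commonly documented weights for the first 12 digits (assumption).
JMBG_WEIGHTS = (7, 6, 5, 4, 3, 2, 7, 6, 5, 4, 3, 2)

def jmbg_is_valid(candidate: str) -> bool:
    """Return True if the 13 digits found in `candidate` pass the Mod 11 check."""
    digits = re.sub(r"[^0-9]", "", candidate)  # drop any delimiters
    if len(digits) != 13:
        return False
    total = sum(int(d) * w for d, w in zip(digits, JMBG_WEIGHTS))
    check = 11 - total % 11
    if check > 9:
        check = 0
    return check == int(digits[12])
```

    Under these assumptions, 0407989404655 passes and 0407989404651 fails.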

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the Unique Master Citizen Number matches a valid range.

    The number formats that can trigger the dictionary are:

    • 0407 9894 04655
    • 04.0798-940 4655
    • 04 -07/ .989 404...-655

    A number format like 0407989404651 does not trigger the dictionary (bad checksum).

    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The Unique Master Citizen Number is in a popular format.

    A number format like 0407989404655 triggers the dictionary.

    The number formats that do not trigger the dictionary are:

    • 0407 9894 04655
    • 04.0798-940 4655
    • 04 -07/ .989 404...-655
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The Unique Master Citizen Number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, JMBG, ЈМБГ, ЕМБГ, Jedinstveni matični broj građana, and EMŠO.

    A number format like 0407989404655 triggers the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases.

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • 0407 9894 04655
    • 04.0798-940 4655
    • 04 -07/ .989 404...-655
  • This dictionary detects Austrian value-added tax (VAT) numbers, which are alphanumeric identifiers used to identify taxable persons (business) or non-taxable legal entities.

    This dictionary uses a variation of the Mod 10 checksum.
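    One commonly documented Mod 10 variation for these numbers doubles every second digit, sums the resulting digits, and adds a constant of 4 before taking the complement. The sketch below assumes that scheme and checks only the compact form. `at_vat_is_valid` is a hypothetical name.

```python
import re

def at_vat_is_valid(candidate: str) -> bool:
    """Return True if `candidate` is ATU + 8 digits with a matching check digit."""
    m = re.fullmatch(r"ATU([0-9]{8})", candidate)
    if not m:
        return False
    digits = [int(c) for c in m.group(1)]
    total = 0
    for i, d in enumerate(digits[:7]):
        if i % 2 == 1:                 # 2nd, 4th, 6th digits are doubled
            d *= 2
            d = d // 10 + d % 10       # sum the digits of the doubled value
        total += d
    check = (10 - (total + 4) % 10) % 10
    return check == digits[7]
```

    Under these assumptions, ATU10223006 passes and ATU10223007 fails.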

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the VAT number matches a valid range.

    The number formats that can trigger the dictionary are:

    • AT U10223006
    • AT-U10223006
    • AT.U10223006
    • AT -. U10223006

    The number formats that do not trigger the dictionary are:

    • ATU10223007 (invalid checksum)
    • A TU10223006
    • ATU 10223006
    • ATU10 223 00 6
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The VAT number is in a popular format.

    The number formats that can trigger the dictionary are:

    • ATU10223006
    • AT U10223006

    The number formats that do not trigger the dictionary are:

    • AT U10223006
    • AT-U10223006
    • AT.U10223006
    • AT -. U10223006
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The VAT number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, Umsatzsteuer-Identifikationsnummer, Ust-Identifikationsnummer, umsatzsteuer, vat number, vat num, and vat.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • ATU10223006
    • AT U10223006

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • AT U10223006
    • AT-U10223006
    • AT.U10223006
    • AT -. U10223006
  • This dictionary detects Belgian value-added tax (VAT) numbers, which are alphanumeric identifiers used to identify taxable persons (business) or non-taxable legal entities.

    This dictionary uses the Mod 97 checksum, which is similar to the Luhn checksum.
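    As a sketch of the Mod 97 check, the example below assumes the commonly documented rule that the last two digits equal 97 minus the first eight digits modulo 97. It checks only the compact form, and `be_vat_is_valid` is a hypothetical name.

```python
import re

def be_vat_is_valid(candidate: str) -> bool:
    """Return True if `candidate` is BE + 10 digits with a matching Mod 97 check."""
    m = re.fullmatch(r"BE([0-9]{8})([0-9]{2})", candidate)
    if not m:
        return False
    base, check = int(m.group(1)), int(m.group(2))
    return 97 - base % 97 == check
```

    Under this assumption, BE0776091951 passes and BE0776091952 fails.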

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the VAT number matches a valid range.

    The number formats that can trigger the dictionary are:

    • BE 0776091951
    • BE-0776091951
    • BE.0776091951
    • BE -. 0776091951

    The number formats that do not trigger the dictionary are:

    • BE0776091952 (invalid checksum)
    • B E0776091951
    • BE0 776091951
    • BE077 609 195 1
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The VAT number is in a popular format.

    The number formats that can trigger the dictionary are:

    • BE0776091951
    • BE 0776091951

    The number formats that do not trigger the dictionary are:

    • BE 0776091951
    • BE-0776091951
    • BE.0776091951
    • BE -. 0776091951
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The VAT number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, vat number and vat num.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • BE0776091951
    • BE 0776091951

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • BE 0776091951
    • BE-0776091951
    • BE.0776091951
    • BE -. 0776091951
  • This dictionary detects French value-added tax (VAT) numbers, which are alphanumeric identifiers used to identify taxable persons (business) or non-taxable legal entities.

    This dictionary uses the Mod 97 checksum, which is similar to the Luhn checksum.
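    For French numbers, the commonly documented Mod 97 rule derives a two-digit key from the nine digits that follow it. The sketch below assumes that rule and covers only numeric keys in the compact form. `fr_vat_is_valid` is a hypothetical name.

```python
import re

def fr_vat_is_valid(candidate: str) -> bool:
    """Return True if `candidate` is FR + 2-digit key + 9 digits with a matching key."""
    m = re.fullmatch(r"FR([0-9]{2})([0-9]{9})", candidate)
    if not m:
        return False
    key, number = int(m.group(1)), int(m.group(2))
    return key == (12 + 3 * (number % 97)) % 97
```

    Under this assumption, FR00300076965 passes and FR00300076964 fails.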

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the VAT number matches a valid range.

    The number formats that can trigger the dictionary are:

    • FR 00300076965
    • FR-00300076965
    • FR.00300076965
    • FR -. 00300076965

    The number formats that do not trigger the dictionary are:

    • FR00300076964 (invalid checksum)
    • F R00300076965
    • FR0 0300076965
    • FR003 000 769 65
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The VAT number is in a popular format.

    The number formats that can trigger the dictionary are:

    • FR00300076965
    • FR 00300076965

    The number formats that do not trigger the dictionary are:

    • FR 00300076965
    • FR-00300076965
    • FR.00300076965
    • FR -. 00300076965
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The VAT number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, Numéro TVA intracommunautaire, TVA intracommunautaire, TVA, and vat.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • FR00300076965
    • FR 00300076965

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • FR 00300076965
    • FR-00300076965
    • FR.00300076965
    • FR -. 00300076965
  • This dictionary detects German value-added tax (VAT) numbers, which are alphanumeric identifiers used to identify taxable persons (business) or non-taxable legal entities.

    This dictionary uses a combination of the Mod 10 and Mod 11 checksums.
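    A combined Mod 10 and Mod 11 check is commonly implemented as the ISO 7064 MOD 11,10 scheme. The sketch below assumes that scheme and checks only the compact form. `de_vat_is_valid` is a hypothetical name.

```python
import re

def de_vat_is_valid(candidate: str) -> bool:
    """Return True if `candidate` is DE + 9 digits passing a MOD 11,10 check."""
    m = re.fullmatch(r"DE([0-9]{9})", candidate)
    if not m:
        return False
    digits = [int(c) for c in m.group(1)]
    product = 10
    for d in digits[:8]:
        s = (d + product) % 10         # Mod 10 step
        if s == 0:
            s = 10
        product = (2 * s) % 11         # Mod 11 step
    check = 11 - product
    if check == 10:
        check = 0
    return check == digits[8]
```

    Under this assumption, DE111111125 passes.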

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the VAT number matches a valid range.

    The number formats that can trigger the dictionary are:

    • DE 111111125
    • DE-111111125
    • DE.111111125
    • DE -. 111111125

    The number formats that do not trigger the dictionary are:

    • D E111111125
    • DE1 11111125
    • DE 111 111 125
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The VAT number is in a popular format.

    The number formats that can trigger the dictionary are:

    • DE111111125
    • DE 111111125

    The number formats that do not trigger the dictionary are:

    • DE 111111125
    • DE-111111125
    • DE.111111125
    • DE -. 111111125
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The VAT number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, mwst, mehrwertsteuer identifikationsnummer, mehrwertsteuer nummer, vat num, and vat#.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • DE111111125
    • DE 111111125

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • DE 111111125
    • DE-111111125
    • DE.111111125
    • DE -. 111111125
  • This dictionary detects Irish value-added tax (VAT) numbers, which are alphanumeric identifiers used to identify taxable persons (business) or non-taxable legal entities.

    This dictionary uses the Modulo 23 checksum.
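    A sketch of a Modulo 23 check, assuming the commonly documented weights (8 down to 2) and check-letter table. Numbers whose second character is not a digit are assumed to be in the older layout and are rearranged before the check. The names `CHECK_LETTERS` and `ie_vat_is_valid` are hypothetical, and only the 8-character compact form is handled.

```python
import re

# Index 0 maps to W, 1 to A, ..., 22 to V (commonly documented table, assumed).
CHECK_LETTERS = "WABCDEFGHIJKLMNOPQRSTUV"

def ie_vat_is_valid(candidate: str) -> bool:
    """Return True if `candidate` is IE + 7 characters + matching check letter."""
    m = re.fullmatch(r"IE([0-9][0-9A-Z+*][0-9]{5})([A-W])", candidate)
    if not m:
        return False
    body, check = m.group(1), m.group(2)
    if not body[1].isdigit():
        # Older layout: drop the letter, move the first digit to the end.
        body = "0" + body[2:7] + body[0]
    total = sum(int(d) * w for d, w in zip(body, range(8, 1, -1)))
    return CHECK_LETTERS[total % 23] == check
```

    Under these assumptions, IE8Z49289F passes and IE8Z49289G fails.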

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the VAT number matches a valid range.

    The number formats that can trigger the dictionary are:

    • IE 8Z49289F
    • IE-8Z49289F
    • IE.8Z49289F
    • IE -. 8Z49289F

    The number formats that do not trigger the dictionary are:

    • IE8Z49289G (bad checksum)
    • I E8Z49289F (doesn't match regex)
    • IE8Z 49289F (doesn't match regex)
    • IE8-Z4-92-89F (doesn't match regex)
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The VAT number is in a popular format.

    The number formats that can trigger the dictionary are:

    • IE8Z49289F
    • IE 8Z49289F

    The number formats that do not trigger the dictionary are:

    • IE 8Z49289F
    • IE-8Z49289F
    • IE.8Z49289F
    • IE -. 8Z49289F
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The VAT number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, vat num and vat number.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • IE8Z49289F
    • IE 8Z49289F

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • IE 8Z49289F
    • IE-8Z49289F
    • IE.8Z49289F
    • IE -. 8Z49289F
  • This dictionary detects Luxembourg value-added tax (VAT) numbers, which are alphanumeric identifiers used to identify taxable persons (business) or non-taxable legal entities.

    This dictionary uses the Mod 89 checksum.
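    The Mod 89 check is commonly described as: the last two digits equal the first six digits modulo 89. The sketch below assumes that rule and checks only the compact form. `lu_vat_is_valid` is a hypothetical name.

```python
import re

def lu_vat_is_valid(candidate: str) -> bool:
    """Return True if `candidate` is LU + 8 digits with a matching Mod 89 check."""
    m = re.fullmatch(r"LU([0-9]{6})([0-9]{2})", candidate)
    if not m:
        return False
    return int(m.group(1)) % 89 == int(m.group(2))
```

    Under this assumption, LU13669580 passes and LU13669581 fails.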

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the VAT number matches a valid range.

    The number formats that can trigger the dictionary are:

    • LU 13669580
    • LU-13669580
    • LU.13669580
    • LU -. 13669580

    The number formats that do not trigger the dictionary are:

    • LU13669581 (invalid checksum)
    • L U13669580
    • LU1 3669580
    • LU136 695 80
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The VAT number is in a popular format.

    The number formats that can trigger the dictionary are:

    • LU13669580
    • LU 13669580

    The number formats that do not trigger the dictionary are:

    • LU 13669580
    • LU-13669580
    • LU.13669580
    • LU -. 13669580
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The VAT number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, tva, vat num, and vat.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • LU13669580
    • LU 13669580

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • LU 13669580
    • LU-13669580
    • LU.13669580
    • LU -. 13669580
  • This dictionary detects Netherlands value-added tax (VAT) numbers, which are alphanumeric identifiers used to identify taxable persons (business) or non-taxable legal entities.

    This dictionary uses the Mod 11 checksum, which is similar to the Luhn checksum.
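    A sketch of a Mod 11 check for the nine digits that precede the B suffix, assuming the commonly documented weights (9 down to 2). It checks only the compact form, and `nl_vat_is_valid` is a hypothetical name.

```python
import re

def nl_vat_is_valid(candidate: str) -> bool:
    """Return True if `candidate` is NL + 9 digits + B + 2 digits passing Mod 11."""
    m = re.fullmatch(r"NL([0-9]{9})B([0-9]{2})", candidate)
    if not m:
        return False
    digits = m.group(1)
    # Weights 9, 8, ..., 2 apply to the first eight digits.
    total = sum(int(d) * w for d, w in zip(digits, range(9, 1, -1)))
    return total % 11 == int(digits[8])  # a remainder of 10 can never match
```

    Under this assumption, NL123456782B01 passes and NL123456783B01 fails.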

    The following table lists the confidence score threshold criteria for this dictionary. You can modify the confidence score threshold.

    Confidence Score | Threshold Criteria | Examples of Data
    Low

    The dictionary counts an instance as a violation if the VAT number matches a valid range.

    The number formats that can trigger the dictionary are:

    • NL 123456782B01
    • NL-123456782B01
    • NL.123456782B01
    • NL -. 123456782B01

    The number formats that do not trigger the dictionary are:

    • NL123456783B01 (invalid checksum)
    • N L123456782B01
    • NL1 23456782B01
    • NL123 456 78 2B0 1
    Medium

    The dictionary counts an instance as a violation if:

    • The requirements of Low Confidence are met.
    • The VAT number is in a popular format.

    The number formats that can trigger the dictionary are:

    • NL123456782B01
    • NL 123456782B01

    The number formats that do not trigger the dictionary are:

    • NL 123456782B01
    • NL-123456782B01
    • NL.123456782B01
    • NL -. 123456782B01
    High

    The dictionary counts an instance as a violation if:

    • The requirements of Medium Confidence are met.
    • The VAT number is accompanied by any of the dictionary’s default or custom high confidence phrases. For example, Btw-nummer, Btw-num, vat num, and vat.

    The number formats that can trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • NL123456782B01
    • NL 123456782B01

    The number formats that do not trigger the dictionary if accompanied by any of the dictionary’s default or custom high confidence phrases are:

    • NL 123456782B01
    • NL-123456782B01
    • NL.123456782B01
    • NL -. 123456782B01
  • This dictionary detects content related to weapons. It detects weapons such as firearms and other military weapons.

    This dictionary does not use a checksum.

    You can modify the Confidence Score Threshold. The threshold sets how strict the dictionary is before it counts a match as a violation: the higher the threshold, the more evidence a match needs. To learn more, see Configuring the Confidence Score Threshold.
