How do I edit predefined DLP dictionaries?

Modifying a predefined DLP dictionary is one of the tasks you must complete when configuring DLP policy rules. See How do I configure a policy using Zscaler DLP engines? for the full list of tasks.

Depending on the dictionary, you can modify one or both of the settings described below.

The predefined dictionaries that Zscaler provides are described below, each with instructions for editing it.

You can also add a custom dictionary. See How do I add a custom DLP dictionary?

You can add any combination of predefined and custom dictionaries to a DLP engine, and it is the DLP engine that you reference when you create DLP policies. See About DLP Engines for more information.

Number of Violations Threshold: The dictionary triggers only if it finds more violations than the number you specify. For example, if you enter 7, the dictionary triggers only when it finds 8 or more violations.

  • If you enter a value less than five, the dictionary does not trigger unless it also finds relevant keywords. For example, if you enter a value less than five for the Credit Cards dictionary, the dictionary requires keywords such as "Visa" or "MasterCard" in order to trigger.
  • You may enter any value less than 10,000.
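
The Number of Violations Threshold rule above can be modeled in a few lines. This Python sketch is illustrative only and is not Zscaler code: the function name `dictionary_triggers` and its `keywords_found` flag are hypothetical, and it assumes keyword detection has already been done elsewhere.

```python
def dictionary_triggers(violation_count: int, threshold: int, keywords_found: bool) -> bool:
    """Hypothetical model of the documented Number of Violations Threshold rule."""
    if threshold >= 10_000:
        raise ValueError("threshold must be less than 10,000")
    # The dictionary triggers only when it finds MORE violations than the threshold.
    if violation_count <= threshold:
        return False
    # With a threshold below five, relevant keywords must also be present.
    if threshold < 5 and not keywords_found:
        return False
    return True


print(dictionary_triggers(7, 7, False))  # False: 7 is not more than 7
print(dictionary_triggers(8, 7, False))  # True
print(dictionary_triggers(4, 3, False))  # False: low threshold, no keywords
print(dictionary_triggers(4, 3, True))   # True
```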

Confidence Score Threshold: You can select a value of Low, Medium, or High.

  • For dictionaries where you can also specify the Number of Violations Threshold, the confidence score threshold you select helps determine what counts as a violation. A lower threshold makes the dictionary more aggressive in counting an instance as a violation. A higher threshold requires an instance to meet stricter format and verification requirements before the dictionary counts it as a violation.
  • For dictionaries where you can specify only the Confidence Score Threshold, a lower threshold makes the dictionary more aggressive: it identifies violations more readily and requires less matching content before it triggers. A higher threshold requires matching content to meet stricter format and verification requirements, and requires more instances of matching content before the dictionary triggers.

Adult Content

This dictionary detects adult content.

To edit this dictionary, follow the instructions below. 

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Adult Content dictionary and click the Edit icon.
  3. You can modify:
    • Confidence Score Threshold: Select a value of Low, Medium, or High. The confidence score threshold sets how much evidence the dictionary requires before it identifies violations and triggers. A lower threshold makes the dictionary more aggressive: it identifies violations more readily and requires less matching content before it triggers. A higher threshold requires matching content to meet stricter format and verification requirements, and requires more instances of matching content before the dictionary triggers.
  4. Click Save and activate the change.

Credit Cards

This dictionary detects leakage of credit card numbers.

To edit this dictionary, follow the instructions below.

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Credit Cards dictionary and click the Edit icon.
  3. You can modify:
    • Number of Violations Threshold: The dictionary triggers only if it finds more violations than the number specified here. For example, if you enter 7, the dictionary will only trigger upon finding 8 or more violations.
      • If you enter a value less than five, the dictionary does not trigger unless it also finds relevant keywords. For example, the dictionary then requires keywords such as "Visa" or "MasterCard" in order to trigger.
      • You may enter any value less than 10,000.
    • Confidence Score Threshold: Select a value of Low, Medium, or High.
      For the Credit Cards dictionary, the confidence scores have the following implications:
      • Low: Dictionary counts an instance as a violation if the credit card number (CCN) can be validated with a Luhn check.
      • Medium: Dictionary counts an instance as a violation if:
        • The requirements of Low Confidence are met.
        • The CCN is in a popular format.
        • The length and starting range (prefix) of the CCN match those used by credit card providers.
      • High: Dictionary counts an instance as a violation if:
        • The requirements of Medium Confidence are met.
        • The CCN is accompanied by keywords such as "American Express," "Amex," "master card," "visa," "CV code," "select card type," "discover," "diners club," "jcb," "pay with checking account," "pay by check or money order," "credit card number," "card holder name," and "expiration date."
  4. Click Save and activate the change.
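
The three Credit Cards confidence levels can be sketched as layered checks. This Python example is a minimal illustration under stated assumptions, not Zscaler's detection logic: the Luhn check is the standard algorithm, but the "popular formats" regex, the 13 to 19 digit length range, the three-brand prefix table in `issuer_match`, and the keyword subset are all hypothetical stand-ins for rules the article does not spell out.

```python
import re

# Hypothetical subset of the High-confidence keywords listed above.
KEYWORDS = ("american express", "amex", "master card", "visa",
            "credit card number", "card holder name", "expiration date")

# Assumed "popular formats": contiguous digits, 4-4-4-4 groups, or Amex-style
# 4-6-5 groups, with a consistent space or hyphen separator.
POPULAR_FORMAT = re.compile(
    r"(?:\d{13,19}|\d{4}([ -])\d{4}\1\d{4}\1\d{4}|\d{4}([ -])\d{6}\2\d{5})", re.ASCII
)


def luhn_valid(digits: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def issuer_match(digits: str) -> bool:
    """Illustrative prefix and length rules for three card brands (not exhaustive)."""
    n = len(digits)
    if digits.startswith("4"):
        return n in (13, 16, 19)  # Visa
    if digits[:2] in ("34", "37"):
        return n == 15  # American Express
    if 51 <= int(digits[:2]) <= 55 or 2221 <= int(digits[:4]) <= 2720:
        return n == 16  # Mastercard
    return False


def counts_as_violation(candidate: str, context: str, level: str) -> bool:
    """Apply the Low / Medium / High tiers to one candidate number."""
    digits = re.sub(r"[ -]", "", candidate)
    # Low: a card-length digit string (assumed 13 to 19 digits) passing the Luhn check.
    if not re.fullmatch(r"\d{13,19}", digits, re.ASCII) or not luhn_valid(digits):
        return False
    if level == "low":
        return True
    # Medium: Low, plus a popular format and a matching issuer prefix and length.
    if not (POPULAR_FORMAT.fullmatch(candidate) and issuer_match(digits)):
        return False
    if level == "medium":
        return True
    # High: Medium, plus at least one keyword in the surrounding text.
    return any(k in context.lower() for k in KEYWORDS)


print(counts_as_violation("4111 1111 1111 1111", "", "medium"))  # True
print(counts_as_violation("4111 1111 1111 1111", "", "high"))    # False: no keyword
```

Each level reuses the checks of the level below it, which mirrors how the article describes Medium as "Low plus" and High as "Medium plus".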

Financial Statements

This dictionary detects leakage of financial statements.

To edit this dictionary, follow the instructions below.

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Financial Statements dictionary and click the Edit icon.
  3. You can modify:
    • Confidence Score Threshold: Select a value of Low, Medium, or High. The confidence score threshold sets how much evidence the dictionary requires before it identifies violations and triggers. A lower threshold makes the dictionary more aggressive: it identifies violations more readily and requires less matching content before it triggers. A higher threshold requires matching content to meet stricter format and verification requirements, and requires more instances of matching content before the dictionary triggers.
  4. Click Save and activate the change.

Gambling

This dictionary detects content related to gambling.

To edit this dictionary, follow the instructions below.

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Gambling dictionary and click the Edit icon.
  3. You can modify:
    • Confidence Score Threshold: Select a value of Low, Medium, or High. The confidence score threshold sets how much evidence the dictionary requires before it identifies violations and triggers. A lower threshold makes the dictionary more aggressive: it identifies violations more readily and requires less matching content before it triggers. A higher threshold requires matching content to meet stricter format and verification requirements, and requires more instances of matching content before the dictionary triggers.
  4. Click Save and activate the change.

Illegal Drugs

This dictionary detects content related to illegal drugs.

To edit this dictionary, follow the instructions below.

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Illegal Drugs dictionary and click the Edit icon.
  3. You can modify:
    • Confidence Score Threshold: Select a value of Low, Medium, or High. The confidence score threshold sets how much evidence the dictionary requires before it identifies violations and triggers. A lower threshold makes the dictionary more aggressive: it identifies violations more readily and requires less matching content before it triggers. A higher threshold requires matching content to meet stricter format and verification requirements, and requires more instances of matching content before the dictionary triggers.
  4. Click Save and activate the change.

Medical Information

This dictionary detects leakage of medical information.

To edit this dictionary, follow the instructions below.

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Medical Information dictionary and click the Edit icon.
  3. You can modify:
    • Confidence Score Threshold: Select a value of Low, Medium, or High. The confidence score threshold sets how much evidence the dictionary requires before it identifies violations and triggers. A lower threshold makes the dictionary more aggressive: it identifies violations more readily and requires less matching content before it triggers. A higher threshold requires matching content to meet stricter format and verification requirements, and requires more instances of matching content before the dictionary triggers.
  4. Click Save and activate the change.

Names (US)

This dictionary detects leakage of US names.

To edit this dictionary, follow the instructions below. 

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Names (US) dictionary and click the Edit icon.
  3. You can modify:
    • Number of Violations Threshold: The dictionary triggers only if it finds more violations than the number specified here. For example, if you enter 7, the dictionary will only trigger upon finding 8 or more violations.
      • If you enter a value of less than five, the dictionary will not trigger unless it also finds keywords relevant to US names.
      • You may enter any value less than 10,000.
    • Confidence Score Threshold: Select a value of Low, Medium, or High.
      For the Names (US) dictionary, the confidence scores have the following implications:
      • Low: Dictionary counts an instance as a violation if the content contains either a first name or a last name that is in the dictionary.
      • Medium: Dictionary counts an instance as a violation if the content contains both a first name and a last name that are in the dictionary.
      • High: Dictionary counts an instance as a violation if the content contains either the word "name" or "address" as well as a full name that is in the dictionary, in one of the following formats:
        • Firstname Lastname
        • Lastname Firstname (This is the default.)
  4. Click Save and activate the change.
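
The Names (US) confidence levels can be illustrated with a toy matcher. This Python sketch is hypothetical: the two tiny name sets stand in for the real dictionary, whose contents the article does not document, and treating "full name" as two adjacent words in either order is an assumption about how the High-confidence formats are matched.

```python
import re

# Hypothetical sample lists; the real dictionary contents are not documented here.
FIRST_NAMES = {"mary", "james"}
LAST_NAMES = {"smith", "johnson"}


def tokens(text: str) -> list[str]:
    """Lowercase alphabetic words, ignoring punctuation such as commas and colons."""
    return re.findall(r"[a-z]+", text.lower())


def names_violation(text: str, level: str) -> bool:
    words = tokens(text)
    has_first = any(w in FIRST_NAMES for w in words)
    has_last = any(w in LAST_NAMES for w in words)
    if level == "low":
        return has_first or has_last  # either a first or a last name
    if level == "medium":
        return has_first and has_last  # both, anywhere in the content
    # High: the word "name" or "address" plus an adjacent full name in either order.
    has_keyword = "name" in words or "address" in words
    full_name = any(
        (a in FIRST_NAMES and b in LAST_NAMES) or (a in LAST_NAMES and b in FIRST_NAMES)
        for a, b in zip(words, words[1:])
    )
    return has_keyword and full_name


print(names_violation("Name: Smith Mary", "high"))  # True
print(names_violation("Mary Smith", "high"))        # False: no keyword
```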

National Insurance Numbers (UK)

This dictionary detects leakage of UK National Insurance Numbers.

To edit this dictionary, follow the instructions below.

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the National Insurance Numbers (UK) dictionary and click the Edit icon.
  3. You can modify:
    • Number of Violations Threshold: The dictionary triggers only if it finds more violations than the number specified here. For example, if you enter 7, the dictionary will only trigger upon finding 8 or more violations.
      • If you enter a value of less than five, the dictionary will not trigger unless it also finds keywords relevant to UK National Insurance Numbers.
      • You may enter any value less than 10,000.
  4. Click Save and activate the change.
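
For readers unfamiliar with what this dictionary looks for, the Python sketch below checks whether a string has the shape of a UK National Insurance Number: two prefix letters, six digits, and a suffix letter A to D. The excluded letters and prefixes follow HMRC's published format rules as commonly cited and should be verified against HMRC guidance; the article does not say which of these checks, if any, the dictionary itself applies, and `looks_like_nino` is a hypothetical name.

```python
import re

# First letter: not D, F, I, Q, U, or V. Second letter: additionally not O.
# Six digits, then a suffix letter A to D.
NINO = re.compile(r"[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z]\d{6}[A-D]", re.ASCII)

# Prefixes that are not allocated (assumed from HMRC format rules).
INVALID_PREFIXES = {"BG", "GB", "NK", "KN", "TN", "NT", "ZZ"}


def looks_like_nino(candidate: str) -> bool:
    """Format check only; it cannot tell whether a number was actually issued."""
    value = candidate.replace(" ", "").upper()
    return bool(NINO.fullmatch(value)) and value[:2] not in INVALID_PREFIXES


print(looks_like_nino("AB 12 34 56 C"))  # True
print(looks_like_nino("GB123456A"))      # False: unallocated prefix
```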

NRIC Numbers (Singapore)

This dictionary detects leakage of Singapore National Registration Identity Card (NRIC) Numbers, both UIN and FIN.

To edit this dictionary, follow the instructions below. 

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the NRIC Numbers (Singapore) dictionary and click the Edit icon.
  3. You can modify:
    • Number of Violations Threshold: The dictionary triggers only if it finds more violations than the number specified here. For example, if you enter 7, the dictionary will only trigger upon finding 8 or more violations.
      • If you enter a value of less than five, the dictionary will not trigger unless it also finds keywords relevant to NRIC Numbers.
      • You may enter any value less than 10,000.
  4. Click Save and activate the change.
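
NRIC-style identifiers carry a check letter, which is one way a detector can tell them apart from arbitrary text. The Python sketch below implements the widely documented checksum for the S, T, F, and G prefixes (weights 2, 7, 6, 5, 4, 3, 2, plus an offset of 4 for T and G). It is an illustration only: the article does not say whether the dictionary validates checksums, and the newer M-prefix FINs use a different scheme that this hypothetical `nric_checksum_valid` function does not cover.

```python
import re

WEIGHTS = (2, 7, 6, 5, 4, 3, 2)
# Check-letter tables indexed by (weighted sum mod 11).
CHECK_LETTERS = {
    "S": "JZIHGFEDCBA", "T": "JZIHGFEDCBA",  # UIN
    "F": "XWUTRQPNMLK", "G": "XWUTRQPNMLK",  # FIN (M prefix not handled)
}


def nric_checksum_valid(candidate: str) -> bool:
    value = candidate.strip().upper()
    if not re.fullmatch(r"[STFG]\d{7}[A-Z]", value, re.ASCII):
        return False
    total = sum(int(d) * w for d, w in zip(value[1:8], WEIGHTS))
    if value[0] in "TG":
        total += 4  # offset for numbers issued from 2000 onward
    return CHECK_LETTERS[value[0]][total % 11] == value[8]


print(nric_checksum_valid("S1234567D"))  # True
print(nric_checksum_valid("S1234567A"))  # False
```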

Salesforce.com Data

This dictionary detects content related to Salesforce.com data.

To edit this dictionary, follow the instructions below.

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Salesforce.com Data dictionary and click the Edit icon.
  3. You can modify:
    • Confidence Score Threshold: Select a value of Low, Medium, or High. The confidence score threshold sets how much evidence the dictionary requires before it identifies violations and triggers. A lower threshold makes the dictionary more aggressive: it identifies violations more readily and requires less matching content before it triggers. A higher threshold requires matching content to meet stricter format and verification requirements, and requires more instances of matching content before the dictionary triggers.
  4. Click Save and activate the change.

Canadian Social Insurance Numbers

This dictionary detects leakage of Canadian Social Insurance Numbers.

To edit this dictionary, follow the instructions below. 

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Canadian Social Insurance Numbers dictionary and click the Edit icon.
  3. You can modify:
    • Number of Violations Threshold: The dictionary triggers only if it finds more violations than the number specified here. For example, if you enter 7, the dictionary will only trigger upon finding 8 or more violations.
      • If you enter a value of less than five, the dictionary will not trigger unless it also finds keywords relevant to Canadian Social Insurance Numbers.
      • You may enter any value less than 10,000.
  4. Click Save and activate the change.
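
A Canadian Social Insurance Number is nine digits whose last digit is a Luhn check digit, so a simple validity test is possible. This Python sketch shows that check under the assumption that spaces or hyphens are the only separators; the article does not state whether the dictionary performs this validation, and `sin_luhn_valid` is a hypothetical name.

```python
def sin_luhn_valid(candidate: str) -> bool:
    """Luhn check for a nine-digit SIN such as '046 454 286'."""
    digits = candidate.replace(" ", "").replace("-", "")
    if len(digits) != 9 or not digits.isascii() or not digits.isdigit():
        return False
    total = 0
    for i, ch in enumerate(digits):
        d = int(ch)
        if i % 2 == 1:  # double the 2nd, 4th, 6th, and 8th digits
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


print(sin_luhn_valid("046 454 286"))  # True
print(sin_luhn_valid("046 454 287"))  # False
```

For a nine-digit number, doubling the even positions from the left is the same as the usual Luhn rule of doubling every second digit starting just left of the check digit.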

Social Security Numbers (US)

This dictionary detects leakage of U.S. Social Security Numbers.

To edit this dictionary, follow the instructions below. 

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Social Security Numbers (US) dictionary and click the Edit icon.
  3. You can modify:
    • Number of Violations Threshold: The dictionary triggers only if it finds more violations than the number specified here. For example, if you enter 7, the dictionary will only trigger upon finding 8 or more violations.
      • If you enter a value of less than five, the dictionary will not trigger unless it also finds keywords relevant to US Social Security Numbers.
      • You may enter any value less than 10,000.
    • Confidence Score Threshold: Select a value of Low, Medium, or High. For the Social Security Numbers (US) dictionary, the confidence scores have the following implications:
      • Low: Dictionary counts an instance as a violation if the number falls within a valid SSN range.
      • Medium: Dictionary counts an instance as a violation if:
        • The requirements of Low Confidence are met.
        • The SSN is in a popular format.
      • High: Dictionary counts an instance as a violation if:
        • The requirements of Medium Confidence are met.
        • The SSN is accompanied by keywords such as “date of birth,” “social security number,” “tax payer id,” “ssn,” and “password.”
  4. Click Save and activate the change.
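
The Social Security Numbers (US) tiers can be sketched as follows. This Python example is not Zscaler's logic. "Valid range" is assumed to mean the Social Security Administration's never-assigned rules: area 000, 666, or 900 to 999; group 00; or serial 0000. The two "popular formats" (hyphen- or space-separated 3-2-4 groups) are hypothetical, and the keywords come from the High-confidence list above.

```python
import re

# Hypothetical "popular formats": 123-45-6789 or 123 45 6789.
SSN_FORMATS = re.compile(r"(?:\d{3}-\d{2}-\d{4}|\d{3} \d{2} \d{4})", re.ASCII)
KEYWORDS = ("date of birth", "social security number", "tax payer id", "ssn", "password")


def in_valid_range(digits: str) -> bool:
    """Reject area 000, 666, 900-999, group 00, and serial 0000."""
    area, group, serial = int(digits[:3]), int(digits[3:5]), int(digits[5:])
    return 0 < area < 900 and area != 666 and group != 0 and serial != 0


def ssn_violation(candidate: str, context: str, level: str) -> bool:
    digits = re.sub(r"[ -]", "", candidate)
    # Low: nine digits in a valid range.
    if not re.fullmatch(r"\d{9}", digits, re.ASCII) or not in_valid_range(digits):
        return False
    if level == "low":
        return True
    # Medium: Low, plus a popular format.
    if not SSN_FORMATS.fullmatch(candidate):
        return False
    if level == "medium":
        return True
    # High: Medium, plus a keyword in the surrounding text (simple substring test).
    return any(k in context.lower() for k in KEYWORDS)


print(ssn_violation("123456789", "", "low"))     # True
print(ssn_violation("123456789", "", "medium"))  # False: not a popular format
```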

Source Code

This dictionary detects leakage of source code. It does not trigger unless the detected content is at least 1 KB in size.

To edit this dictionary, follow the instructions below.

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Source Code dictionary and click the Edit icon.
  3. You can modify:
    • Confidence Score Threshold: Select a value of Low, Medium, or High. The confidence score threshold sets how much evidence the dictionary requires before it identifies violations and triggers. A lower threshold makes the dictionary more aggressive: it identifies violations more readily and requires less matching content before it triggers. A higher threshold requires matching content to meet stricter format and verification requirements, and requires more instances of matching content before the dictionary triggers.
  4. Click Save and activate the change.

Weapons

This dictionary detects content related to weapons.

To edit this dictionary, follow the instructions below. 

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Weapons dictionary and click the Edit icon.
  3. You can modify:
    • Confidence Score Threshold: Select a value of Low, Medium, or High. The confidence score threshold sets how much evidence the dictionary requires before it identifies violations and triggers. A lower threshold makes the dictionary more aggressive: it identifies violations more readily and requires less matching content before it triggers. A higher threshold requires matching content to meet stricter format and verification requirements, and requires more instances of matching content before the dictionary triggers.
  4. Click Save and activate the change.