
Defining Patterns for Custom DLP Dictionaries

You can use alphanumeric patterns to configure custom dictionaries that match a wide variety of data types. For example, you can define patterns to detect data like phone numbers, driver's license numbers, or credit card numbers for specific issuers (a number of sample patterns are provided below). To learn more, see Adding Custom DLP Dictionaries.

General guidelines for patterns are as follows:

  • A dictionary can contain up to 8 patterns.
  • Each pattern can have a maximum of 256 bytes.
  • You can use tokens (?i and ?-i) for case-insensitive matches.
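
The size limits above can be checked before you paste patterns into a dictionary. The following is a minimal sketch, not a Zscaler tool: the function name `check_dictionary_limits` is hypothetical, and it assumes the 256-byte limit is measured on the UTF-8 encoding of the pattern, which this article does not state.

```python
MAX_PATTERNS = 8         # per-dictionary limit stated in the guidelines
MAX_PATTERN_BYTES = 256  # per-pattern limit stated in the guidelines


def check_dictionary_limits(patterns):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(patterns) > MAX_PATTERNS:
        problems.append(
            f"{len(patterns)} patterns exceeds the limit of {MAX_PATTERNS}"
        )
    for index, pattern in enumerate(patterns, start=1):
        # Assumption: the byte limit is measured on the UTF-8 encoding.
        size = len(pattern.encode("utf-8"))
        if size > MAX_PATTERN_BYTES:
            problems.append(
                f"pattern {index} is {size} bytes, over {MAX_PATTERN_BYTES}"
            )
    return problems


# Example: one short phone-number pattern passes both checks.
print(check_dictionary_limits([r"\d{3}-\d{3}-\d{4}"]))
```

An empty list means the set of patterns fits within both limits. The check says nothing about whether the pattern syntax itself is accepted.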

Syntax Requirements for Patterns

Zscaler supports the pattern syntax used by the Perl Compatible Regular Expressions (PCRE) library, with some exceptions (e.g., lookaround):

  • Metacharacters (POSIX ERE):

    | Metacharacter (POSIX ERE) | Definition | Supported by Zscaler? |
    | --- | --- | --- |
    | . | Matches any single character | Supported |
    | [] | Matches a single character from those given between the brackets; also matches character ranges (e.g., [a-z]) | Supported |
    | [^ ] | Matches a single character not from those given between ^ and ] | Supported |
    | ^ | Matches the starting position within the string | Not Supported |
    | $ | Matches the ending position within the string | Not Supported |
    | () | Defines a capture group | Capture groups are not supported. Parentheses can be used for bounding subexpressions. |
    | \n | References a capture group (n is a digit) | Not Supported |
    | * | Repeats the preceding element 0 or more times | Supported |
    | {m,n} | Matches the preceding element at least m times and not more than n times. Requires that m be less than or equal to n. | Zscaler recommends specifying a range as small as possible with m and n. Additionally, you should use the {m,n} construct to quantify the most specific preceding characters (e.g., [a-z]{3,5} instead of .{3,100}). If you provide a range that is too wide, the pattern might not be accepted. |
    | ? | Repeats the preceding element 0 or 1 times | Supported |
    | + | Repeats the preceding element 1 or more times | Supported |
    | | (vertical bar) | Matches either the expression before or the expression after the operator (e.g., "abc|def" matches "abc" or "def") | Supported |
  • Character classes:

    | Character class | Definition | Supported by Zscaler? |
    | --- | --- | --- |
    | \d | Matches any single character that is a digit (i.e., [0-9]) | Supported |
    | \D | Matches any single character that is a non-digit | Supported |
    | \s | Matches any single horizontal whitespace character across a single line (i.e., a space " " or tab) | Supported |
    | \S | Matches any single non-whitespace character | Supported |
    | \w | Matches any single word character (i.e., ASCII characters [a-zA-Z0-9_]) | Supported |
    | \W | Matches any single non-word character | Supported |
    | \b | Matches any single word boundary | Supported |
  • A repeat of an expression containing a repeat (*, +, ?, or {m,n}) is supported. The following table shows examples of nested repeats alongside equivalent syntax without them:

    | Syntax with Nested Repeats | Syntax without Nested Repeats |
    | --- | --- |
    | \d-[A-Z]{2}?X - Matches "5-X" or "5-GAX" | \d-([A-Z][A-Z])?X |
    | (\d{3}-){1,2}\d{4} - Matches "555-555-5555" or "555-5555" | \d{3}-(\d{3}-)?\d{4} or \d{3}-(\d\d\d-)?\d{4} |
  • Your patterns are no longer required to begin with a simple sequence of alphabetic and numeric characters of a known length called the "base token". A base token ends with an expression that is non-alphanumeric (e.g., ";"), or with the end of the pattern.

    The following are examples of patterns with and without base tokens:

    Example - US phone numbers:

    • Without a base token: (1-)?\d{3}-\d{3}-\d{4} - The leading (1-)? is an optional expression.
    • Without a base token: (CA)-\d{6} - The leading (CA) is a grouping.
    • With a base token:
      • 1-\d{3}-\d{3}-\d{4} - The leading 1 is a single-number token.
      • \d{3}-\d{3}-\d{4} - The leading \d{3} is a three-number token.

    Example - IPv4 Address in dotted-quad notation:

    • Without a base token: [0-9.]{7,15} - The leading [0-9.] is a mix of numeric characters and punctuation.
    • With a base token: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} - The leading \d{1,3} is 1-3 numeric characters.
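
The syntax tables above list a few constructs as not supported: the ^ and $ anchors, capture-group references, and lookaround. The sketch below scans a pattern for those constructs before you submit it. It is an illustration only: the function name `find_unsupported` is hypothetical, the scanner is a simplification written for this example, and it does not reproduce the validation that Zscaler actually performs.

```python
def find_unsupported(pattern):
    """Return a sorted list of unsupported constructs found in pattern."""
    found = set()
    in_class = False  # True while scanning inside [...]
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Escaped character: \1-\9 outside a class is a backreference.
            if pattern[i + 1] in "123456789" and not in_class:
                found.add("backreference")
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
            # Skip a leading ^ (negation) and a literal ] that opens the class.
            if pattern[i + 1:i + 2] == "^":
                i += 1
            if pattern[i + 1:i + 2] == "]":
                i += 1
        elif ch == "^":
            found.add("^ anchor")
        elif ch == "$":
            found.add("$ anchor")
        elif pattern.startswith(("(?=", "(?!", "(?<=", "(?<!"), i):
            found.add("lookaround")
        i += 1
    return sorted(found)


print(find_unsupported(r"\d{3}-\d{3}-\d{4}"))  # nothing flagged
print(find_unsupported(r"^\d+$"))              # both anchors flagged
```

The scanner treats ^ inside a character class as negation rather than an anchor, and it ignores escaped characters such as \$, so literal dollar signs are not flagged.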

Adding Patterns

To add patterns:

  1. Go to Policies > Data Protection > Common Resources > Dictionaries & Engines.
  2. Click Add DLP Dictionary or click the Edit icon for an existing dictionary.

     The Add/Edit DLP Dictionary window appears.

  3. In the Add/Edit DLP Dictionary window:
    1. Enter the Pattern you want the dictionary to match. To learn more, see the guidelines and syntax requirements for patterns.
    2. Choose the Action the dictionary takes upon detecting valid matches:
      • Count All: The dictionary counts all matches of the pattern, including identical patterns, toward the match count. For example, you create a dictionary with a pattern for US phone numbers and choose Count All. When the dictionary scans content containing three instances of the same exact US phone number, it counts all three instances as three matches.
      • Count Unique: The dictionary counts each unique match of the pattern toward the match count only once, regardless of how many times it appears in the content. For example, you create a dictionary with a pattern for US phone numbers and choose Count Unique. When the dictionary scans content containing three instances of the same exact US phone number, it counts all three instances as one match.
  4. To add another pattern, click Add Pattern.

     Figure: Adding a Pattern in a custom DLP Dictionary

  5. Click Save and activate the change.
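
The difference between Count All and Count Unique can be illustrated with a short sketch. It uses Python's `re` module and the US phone number pattern from the examples in this article; the function `count_matches` is hypothetical and only mimics the counting behavior described above, not the Zscaler service itself.

```python
import re

# Illustrative US phone number pattern taken from the examples in this article.
PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")


def count_matches(text, unique):
    """Count matches of PHONE in text, either all of them or distinct ones."""
    matches = PHONE.findall(text)  # no groups, so this returns whole matches
    return len(set(matches)) if unique else len(matches)


sample = "Call 555-555-5555, or 555-555-5555; fax 555-555-5555."
print(count_matches(sample, unique=False))  # Count All: prints 3
print(count_matches(sample, unique=True))   # Count Unique: prints 1
```

The same number appears three times in the sample, so counting every match gives three while counting distinct matches gives one, which mirrors the two examples in the Action descriptions.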