About IPsec VPNs

IPsec (Internet Protocol Security) is a suite of protocols that provides network-layer security for a VPN (virtual private network). A VPN is a virtual network that provides a secure communication path between two peers in a public network. The peers can be two hosts, a remote host and a network gateway, or the gateways of two networks, such as the gateway of your corporate network and a ZEN (Zscaler Enforcement Node) in the service.

IPsec provides the following types of protection:

  • Confidentiality: Ensures that data cannot be read by unauthorized parties.
  • Integrity: Verifies that data was not modified during transit.
  • Authentication: Verifies the identity of the peers.

As shown in the following table, IPsec provides a number of options for applying each type of protection. The peers in the IPsec VPN use a negotiation process called IKE (Internet Key Exchange) to define the security mechanisms they will use to protect their communications. IKE has two phases. In the first phase, the peers define the security parameters they will use to communicate in the second phase. This collection of security parameters is called a security association (SA). In the second phase, the peers define the SA that they will use to protect the actual data exchange.

Table: IPsec VPN tunnel options for IKE Phase 1 and IKE Phase 2, covering encryption (AES, null), authentication (PSK, Digital Certificates), and the key exchange method.

Following are the types of protection that IPsec provides and their corresponding algorithms.

Ensures Confidentiality

IPsec uses algorithms such as AES (Advanced Encryption Standard) to encrypt IP packets. These algorithms use symmetric key cryptography to provide encryption.

In this type of cryptography, the peers use the same key to encrypt and decrypt packets. When peer A sends a packet to peer B, it first encrypts the data by dividing it into blocks, and then uses the key and data blocks to perform multiple rounds of cryptographic operations. When peer B receives the packet, it uses the same key and performs the same operations in reverse order to decrypt the data.
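To make the shared-key idea concrete, here is a minimal sketch of a toy block cipher. It is not AES and it is not secure. It is a small Feistel construction, chosen as an assumption for illustration because it needs only the Python standard library. It divides the data into blocks, runs several keyed rounds over each block, and decrypts by running the same rounds in reverse order. All names (`encrypt`, `decrypt`, `ROUNDS`, and so on) are hypothetical.

```python
import hashlib
import hmac

BLOCK_SIZE = 16  # bytes per block (128 bits)
ROUNDS = 4       # number of keyed rounds; a toy value


def _round_value(key: bytes, round_no: int, half: bytes) -> bytes:
    """Derive a keyed, round-specific mask from one half of a block."""
    digest = hmac.new(key, bytes([round_no]) + half, hashlib.sha256).digest()
    return digest[: len(half)]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _process_block(key: bytes, block: bytes, round_order) -> bytes:
    """Run the Feistel rounds over one block in the given round order."""
    left, right = block[: BLOCK_SIZE // 2], block[BLOCK_SIZE // 2 :]
    for round_no in round_order:
        left, right = right, _xor(left, _round_value(key, round_no, right))
    return right + left  # final swap lets decryption reuse this routine


def encrypt(key: bytes, data: bytes) -> bytes:
    pad = BLOCK_SIZE - len(data) % BLOCK_SIZE
    data += bytes([pad]) * pad  # pad so the data divides into whole blocks
    blocks = (data[i : i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE))
    return b"".join(_process_block(key, b, range(ROUNDS)) for b in blocks)


def decrypt(key: bytes, data: bytes) -> bytes:
    blocks = (data[i : i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE))
    plain = b"".join(
        _process_block(key, b, reversed(range(ROUNDS))) for b in blocks
    )
    return plain[: -plain[-1]]  # strip the padding added by encrypt


shared_key = b"key known to peer A and peer B"
ciphertext = encrypt(shared_key, b"payload from peer A")  # peer A encrypts
recovered = decrypt(shared_key, ciphertext)               # peer B decrypts
```

The only difference between the two directions is the order in which the rounds run, which mirrors the "same operations in reverse order" description above.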

AES uses a 128-bit block size and supports key lengths of 128, 192, and 256 bits.

Verifies Packet Integrity

IPsec provides authentication and integrity protection through an HMAC (hash-based message authentication code) algorithm that is built on a hash function such as MD5 (Message Digest Algorithm 5) or SHA (Secure Hash Algorithm). This type of algorithm generates a hash (also referred to as a message digest) from the message and a key known to both peers. When peer A sends a message to peer B, it generates the hash and adds it to the packet it sends to peer B. When peer B receives the packet, it uses the shared key to generate its own hash from the received message. If the two hashes match, peer B has verified the authenticity and integrity of the packet.
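The keyed-hash check described above can be sketched with Python's standard `hmac` and `hashlib` modules. This is an illustration of the idea only, not how an IPsec stack is implemented; the function names, key, and messages are made up for the example.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Peer A: compute the keyed hash that travels with the message."""
    return hmac.new(key, message, hashlib.sha1).digest()


def verify_tag(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Peer B: recompute the hash and compare it with the received one."""
    expected = hmac.new(key, message, hashlib.sha1).digest()
    return hmac.compare_digest(expected, received_tag)


shared_key = b"key known to both peers"
message = b"data from peer A"
tag = make_tag(shared_key, message)

print(verify_tag(shared_key, message, tag))          # True: message unchanged
print(verify_tag(shared_key, b"altered data", tag))  # False: modified in transit
```

`hmac.compare_digest` is used instead of `==` so that the comparison takes the same time whether or not the hashes match.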

SHA-1 and SHA-2 are generally considered more secure than MD5, in part because they generate a larger hash. MD5 generates a 128-bit hash and SHA-1 generates a 160-bit hash. SHA-2 is a family of algorithms whose names refer to the size of the hashes they produce, including SHA2-224, SHA2-256, SHA2-384, and SHA2-512.
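The hash sizes listed above can be checked with Python's standard `hashlib` module. This small sketch assumes a Python build in which all of these algorithms are enabled; some hardened builds disable MD5.

```python
import hashlib

# Map each algorithm name to the size of its digest in bits.
digest_bits = {
    name: hashlib.new(name).digest_size * 8
    for name in ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")
}

for name, bits in digest_bits.items():
    print(f"{name}: {bits}-bit hash")
```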

IPsec peers can use the following methods to authenticate each other:

  • PSK (pre-shared keys): This type of authentication uses a key that the peers agree on beforehand. The key, also known as a secret, is a text string similar to a password. Peer A uses the pre-shared key and additional data to generate a hash value. Peer B uses the same key and additional data to generate a hash value. Peer B authenticates peer A when the two hash values match.
  • Digital Certificates: Each peer has a digital certificate that contains a public key. In this type of authentication, peer A generates a hash value and encrypts the hash with its private key. The encrypted hash is its digital signature. Peer A then sends the certificate with its digital signature to peer B. Peer B generates another hash and uses the public key to decrypt the digital signature. Peer B compares the decrypted digest with the digest it generated to verify that the source of the message is peer A. RSA is typically used as the digital signature algorithm.
  • External Authentication: This adds another layer of protection by authenticating the actual users. An external server, such as a Kerberos server or an Active Directory (AD) server, authenticates users by their user ID and password. It is used in addition to one of the other authentication methods.
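The digital signature steps in the Digital Certificates item can be sketched with textbook RSA arithmetic. This is a toy under stated assumptions: the key is built from two small, well-known Mersenne primes, there is no padding scheme and no certificate handling, and the function names are hypothetical. It only shows the sign-with-private-key, verify-with-public-key flow.

```python
import hashlib

# Toy key pair. These primes are far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce it so that it fits under the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Peer A: transform the hash with the private key."""
    return pow(digest_as_int(message), D, N)


def verify_signature(message: bytes, signature: int) -> bool:
    """Peer B: recover the hash with the public key and compare digests."""
    return pow(signature, E, N) == digest_as_int(message)


message = b"identity data from peer A"
signature = sign(message)
print(verify_signature(message, signature))
print(verify_signature(b"data from someone else", signature))
```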

IPsec has two main protocols: Authentication Header (AH) and Encapsulating Security Payload (ESP). The IPsec peers determine which protocol they will use to encode the data packets in Phase 2 of the IKE negotiations. The selected protocol then uses the algorithms and authentication method defined in the IPsec SA to encode the data packets.

AH provides authentication and integrity protection through a keyed hash algorithm, as described in Verifies Packet Integrity. ESP encrypts IP packets as described in Ensures Confidentiality. The earlier version of ESP did not provide authentication and integrity protection, so most IPsec implementations used both AH and ESP. Because the current version of ESP can also use a keyed hash algorithm to verify the authenticity and integrity of packets, most IPsec implementations now use ESP and do not necessarily use AH.

ESP can operate in either of two modes: transport mode or tunnel mode. See image.

A diagram showing how packets are encoded in ESP Transport Mode and ESP Tunnel Mode.

As shown in the diagram, ESP adds a header, a trailer, and if authentication is used, an authentication section at the end. The ESP header contains an SPI (Security Parameter Index) value, which is a unique identifier, and a sequence number. The ESP trailer contains fields such as additional bytes for padding and the padding length.
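The ESP header and trailer fields named above can be sketched with Python's `struct` module. The field sizes and layout here follow RFC 4303 rather than this article (32-bit SPI, 32-bit sequence number, and a trailer that ends with a one-byte pad length and a one-byte next-header value), and the function names are hypothetical.

```python
import struct
from typing import Tuple

ESP_HEADER = struct.Struct("!II")  # SPI and sequence number, network byte order


def build_esp_header(spi: int, sequence: int) -> bytes:
    return ESP_HEADER.pack(spi, sequence)


def parse_esp_header(packet: bytes) -> Tuple[int, int]:
    """Return (spi, sequence) from the first 8 bytes of an ESP packet."""
    return ESP_HEADER.unpack_from(packet)


def build_esp_trailer(payload_len: int, next_header: int, align: int = 4) -> bytes:
    """Padding bytes, then the pad length, then the next-header value."""
    pad_len = -(payload_len + 2) % align     # bytes needed to reach alignment
    padding = bytes(range(1, pad_len + 1))   # 1, 2, 3, ... as in RFC 4303
    return padding + bytes([pad_len, next_header])


header = build_esp_header(spi=0x1001, sequence=7)
trailer = build_esp_trailer(payload_len=5, next_header=6)  # 6 = TCP
print(parse_esp_header(header), trailer.hex())
```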

In transport mode, ESP encrypts the data payload and ESP trailer. It uses the original IP header with the original source and destination IP addresses. In implementations that involve communications from or to a gateway, the source and/or destination IP addresses need to be changed to the gateway IP addresses. Since transport mode does not alter the IP header, this mode is used specifically for host-to-host communications.

In tunnel mode, ESP encapsulates the entire packet, including the original IP header. It adds a new IP header that lists the IPsec peers as the source and destination of the packet. ESP tunnel mode is used in VPNs that include at least one gateway, because the gateway address can be specified as the source and/or destination in the new IP header.
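The difference between the two modes comes down to which IP header ends up on the outside of the packet. The following sketch models that with plain data classes. It performs no encryption, and the class names, function names, and addresses are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IpPacket:
    src: str
    dst: str
    payload: object  # application data, or an Esp wrapper


@dataclass(frozen=True)
class Esp:
    protected: object  # the part that ESP would encrypt


def transport_mode(packet: IpPacket) -> IpPacket:
    """Keep the original IP header and protect only the payload."""
    return IpPacket(packet.src, packet.dst, Esp(packet.payload))


def tunnel_mode(packet: IpPacket, local_peer: str, remote_peer: str) -> IpPacket:
    """Protect the whole original packet and add a new outer IP header."""
    return IpPacket(local_peer, remote_peer, Esp(packet))


original = IpPacket("10.1.1.5", "10.2.2.9", b"data")
host_to_host = transport_mode(original)
via_gateways = tunnel_mode(original, "198.51.100.1", "203.0.113.1")

print(host_to_host.src, host_to_host.dst)  # original host addresses
print(via_gateways.src, via_gateways.dst)  # gateway (IPsec peer) addresses
```

In the tunnel-mode result, the original addresses are visible only inside the protected part, which is why gateways can carry traffic on behalf of hosts behind them.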

IKE (Internet Key Exchange) provides a secure way to establish the IPsec services that the peers use to protect their communications. IKE has two phases:

  • In the first phase, the peers negotiate the parameters for a secure communication channel through which they negotiate the parameters for the second phase. This first set of parameters is referred to as the IKE SA. This SA is bi-directional, so only one SA is established for both directions of traffic.
  • In the second phase, the peers negotiate the parameters for the actual exchange of IP packets. The second set of parameters is referred to as the IPsec SA. The IPsec SA is uni-directional, so one SA is established for each direction of traffic.

During the IKE negotiations, the peers agree on the Diffie-Hellman group number that they use to generate the shared key. Diffie-Hellman is a method for peers to generate a shared key in a secure manner, without having to exchange shared secrets in the first place. Diffie-Hellman specifies group numbers that correspond to a key length and an encryption generator type. For more information on Diffie-Hellman, refer to RFC 2631, Diffie-Hellman Key Agreement Method.
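The Diffie-Hellman exchange can be illustrated with modular exponentiation. The sketch below uses toy parameters (a Mersenne prime modulus and a small base) that are assumptions for the example and do not correspond to any real Diffie-Hellman group number. The function names are hypothetical.

```python
import secrets

# Toy parameters. Real groups use much larger, standardized values.
PRIME = 2**127 - 1
GENERATOR = 3


def make_key_pair():
    """Pick a random private value and derive the public value from it."""
    private = secrets.randbelow(PRIME - 3) + 2   # in the range 2 .. PRIME - 2
    public = pow(GENERATOR, private, PRIME)
    return private, public


def shared_secret(own_private: int, peer_public: int) -> int:
    return pow(peer_public, own_private, PRIME)


a_private, a_public = make_key_pair()   # peer A
b_private, b_public = make_key_pair()   # peer B

# Only the public values cross the network. Each peer combines the other's
# public value with its own private value and arrives at the same secret.
secret_at_a = shared_secret(a_private, b_public)
secret_at_b = shared_secret(b_private, a_public)
print(secret_at_a == secret_at_b)
```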

IKE Phase 1 can operate in either main mode or aggressive mode. In main mode, there are three pairs of message exchanges, and in aggressive mode, there are three messages.

A network diagram of main mode in IKE Phase 1.

In aggressive mode, the peers exchange the following three messages:

  • In the first message, peer A sends the security parameters, its Diffie-Hellman key, a pseudo-random number, and its IKE identity to peer B.
  • In the second message, peer B confirms the security parameters and sends its Diffie-Hellman key, a pseudo-random number, its IKE identity, and its authentication parameters.
  • In the third message, peer A sends its authentication parameters.

Aggressive mode is useful when the IP address of the remote device is not known beforehand.

For Phase 1, Zscaler supports AES-128 for the encryption algorithm and SHA-1 or MD5 for the authentication algorithm. Zscaler recommends using AES-128 with SHA-1.

About NAT-T

NAT-T provides a method for passing IPsec traffic between two peers when one peer is behind a NAT device. When NAT-T is enabled, the peers detect whether there is a NAT device between them and verify that they both support NAT-T during the IKE Phase 1 negotiations.

After both peers determine that they support NAT-T and that it is required, they encapsulate the IPsec packets in UDP. See image. The new IP header retains the data from the original IPsec packet, except that the protocol changes to UDP. In the UDP header, the peers use port 4500, the port that RFC 3948 defines for UDP-encapsulated IPsec traffic, rather than port 500, which carries the initial IKE exchange. Therefore, the NAT device processes the encapsulated packet as a UDP packet. The receiving IPsec peer then removes the UDP header and processes the packet as an IPsec packet. For additional information on NAT-T, refer to RFC 3947, Negotiation of NAT-Traversal in the IKE.

A diagram showing how IPsec packets are encapsulated in NAT-T.
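The UDP encapsulation step can be sketched by placing a UDP header in front of an ESP packet. The port numbers are passed in as parameters. The value 4500 in the usage lines is an assumption taken from RFC 3948, not from a specific product configuration, and the function names are hypothetical. The outer IP header is left out to keep the sketch short.

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")  # source port, destination port, length, checksum


def udp_encapsulate(esp_packet: bytes, src_port: int, dst_port: int) -> bytes:
    """Prefix an ESP packet with a UDP header so NAT devices see plain UDP."""
    length = UDP_HEADER.size + len(esp_packet)
    # A checksum of 0 means "no checksum" for UDP over IPv4.
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + esp_packet


def udp_decapsulate(datagram: bytes) -> bytes:
    """Remove the UDP header and return the ESP packet it carried."""
    _src, _dst, length, _checksum = UDP_HEADER.unpack_from(datagram)
    return datagram[UDP_HEADER.size : length]


NAT_T_PORT = 4500  # assumption: the port RFC 3948 uses for UDP-encapsulated ESP
esp_packet = b"\x00\x00\x10\x01\x00\x00\x00\x07" + b"encrypted payload"
datagram = udp_encapsulate(esp_packet, NAT_T_PORT, NAT_T_PORT)
print(udp_decapsulate(datagram) == esp_packet)
```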

Dead Peer Detection

Dead peer detection is a method used to detect whether an IKE peer is offline. When this method is used, the peers do not periodically exchange keepalive messages. Instead, a peer requests proof that the other peer is online only when it needs to send traffic. Dead peer detection decreases the number of messages needed to determine whether a peer is alive. Each peer defines its own dead peer detection interval, which is implementation specific. For more information, refer to RFC 3706, A Traffic-Based Method of Detecting Dead Internet Key Exchange (IKE) Peers.
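The traffic-based decision described above can be sketched as a small class that tracks when the peer was last heard from. This is a simplified model under stated assumptions: the class and method names are hypothetical, timestamps are passed in explicitly so the logic is easy to follow, and the actual probe exchange is not shown.

```python
class DeadPeerDetector:
    """Decide when a liveness probe is needed, based on idle time."""

    def __init__(self, interval: float, now: float = 0.0):
        self.interval = interval   # implementation-specific idle threshold
        self.last_heard = now      # last time the peer proved it was alive

    def on_traffic_received(self, now: float) -> None:
        # Any valid inbound traffic is itself proof that the peer is alive.
        self.last_heard = now

    def needs_probe_before_send(self, now: float) -> bool:
        # Called only when there is traffic to send. A probe is needed only
        # if the peer has been silent for longer than the interval.
        return now - self.last_heard > self.interval


detector = DeadPeerDetector(interval=10.0, now=0.0)
print(detector.needs_probe_before_send(now=5.0))    # peer heard from recently
print(detector.needs_probe_before_send(now=11.0))   # peer silent too long
detector.on_traffic_received(now=11.0)
print(detector.needs_probe_before_send(now=15.0))   # fresh proof of liveness
```

Because no timer fires while the tunnel is idle, an unused tunnel generates no detection traffic at all.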

A network diagram of aggressive mode in IKE Phase 1.

In main mode, the peers exchange the following three pairs of messages:

  • In the first pair of messages, the peers negotiate the following:
    • The encryption algorithm
    • The keyed hash algorithm
    • The authentication method
    • The Diffie-Hellman group that the peers use to generate a shared key
    • The SA lifetime, which is the time period for which an SA is valid. Peers must establish a new SA when the current one expires.
  • In the second pair of messages, the peers exchange the Diffie-Hellman keys.
  • In the third pair of messages, the two peers authenticate each other.
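The negotiation in the first pair of messages can be pictured as one peer offering sets of parameters and the other choosing a set it also supports. The sketch below is a simplification under stated assumptions: the `Proposal` fields mirror the negotiated items listed above, the selection rule (first offered proposal that matches exactly) is an assumption, and the example values are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Proposal:
    encryption: str
    keyed_hash: str
    auth_method: str
    dh_group: int
    lifetime_seconds: int


def select_proposal(
    offered: Iterable[Proposal], supported: Set[Proposal]
) -> Optional[Proposal]:
    """Return the first offered proposal that the responder also supports."""
    for proposal in offered:
        if proposal in supported:
            return proposal
    return None  # no common proposal, so the negotiation fails


offer = [
    Proposal("AES-256", "SHA-1", "PSK", 14, 86400),
    Proposal("AES-128", "SHA-1", "PSK", 2, 86400),
]
responder_supports = {Proposal("AES-128", "SHA-1", "PSK", 2, 86400)}

chosen = select_proposal(offer, responder_supports)
print(chosen)
```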

Because main mode uses the IP address as part of the exchange for identification, it cannot be used in a configuration where the IP address of the peer may change.

IKE Phase 2 establishes an SA for each direction of traffic. It operates in only one mode, Quick mode, which uses three messages. The negotiations in Phase 2 are protected by the IKE SA.

The Phase 2 negotiations are similar to those in Phase 1: the peers negotiate security parameters that include the encryption algorithm, the keyed hash algorithm, and the authentication method. Additionally, in this phase, the peers negotiate the IPsec protocol to be applied to the IP packets. They determine whether to use AH, ESP, or both. As stated earlier, most VPNs today use ESP.

After the IPsec SA is established, the peers then exchange the IP packets using the security parameters defined in the IPsec SA.

For Phase 2, Zscaler supports null or AES for the encryption algorithm and MD5 for the authentication algorithm. If you'd like to use AES, you must purchase a separate subscription. Zscaler recommends using null encryption with MD5.