This post is a detailed study of how Azure WAF is optimized with Intel Hyperscan. It is jointly authored by Microsoft (Peng Cheng, Ze Gan, Eric Schwabe, Bobby Martin, Osama Mazahir, Teresa Yao) and Intel (Xiang Wang, Heqing Zhu, DiGiglio John).
Azure Web Application Firewall (WAF) protects web applications from denial of service and from targeted application attacks such as SQL injection, cross-site scripting, and file inclusion. Because WAF often inspects each request against thousands of attack signatures, a high-performing WAF engine that minimizes latency impact is critical.
To meet this challenge, Intel Hyperscan was selected as a state-of-the-art pattern matching engine that delivers uncompromised security. Intel is pleased to work with Azure to deliver more secure and faster cloud and edge services.
Testing by the Azure team showed that Intel Hyperscan reduced WAF latency by about 14.9x in high-processing scenarios. These latency improvements come with only a slight increase in memory consumption. In the worst case, Hyperscan consumed up to 40 MB, which is not a significant increase for a multi-tenant WAF running on a server platform.
Azure Web Application Firewall
Security is a top priority when moving web applications onto the cloud. Web Application Firewall (WAF) provides centralized protection of web applications from common exploits and vulnerabilities. Web applications are increasingly targeted by malicious attacks that exploit well-known vulnerabilities, such as SQL injection and cross-site scripting, which are among the most common attacks.
Azure WAF can be deployed globally with the Azure Front Door (AFD) service, which integrates application acceleration, content caching, and global load balancing. AFD was originally built to address the reliability, scalability, performance, and security requirements of Microsoft’s largest online services, such as Bing and Office. Today, AFD powers enterprise-grade services for Microsoft as well as for Azure customers, processing millions of requests per second. AFD is deployed at hundreds of Azure network edge locations, leverages the Azure global WAN, and optimizes end-to-end performance from users to services. Low latency is therefore a crucial performance indicator for Azure cloud services globally.
Azure WAF can also be deployed into a customer’s virtual network with Application Gateway, where it provides dedicated protection for both public-facing and internal applications against known web application attacks. Figure 1 provides a high-level overview of Azure WAF.
Figure 1. Azure WAF overview
A request reaches Azure WAF either at the edge location closest to the end user via global anycast, or at an Azure region for regional WAF, usually over HTTPS. The request is decrypted and evaluated against the WAF rules. After WAF inspection, the request is either re-encrypted and forwarded to the backend services, or rejected with a customizable “blocked” response, depending on the configured WAF action type.
WAF solution providers face increasing challenges. The list of critical vulnerabilities keeps growing, attack surfaces spread from web and mobile applications to IoT devices, and the attack landscape often includes multi-vector attacks combining DDoS, targeted web application attacks and malicious bots. It is critical that WAF can inspect an incoming request against hundreds to thousands of known signatures with minimal processing time.
The pattern matching engine plays a central role in delivering uncompromised security, and it must be highly optimized so that it does not cause latency spikes. Hyperscan was selected for its high performance and security advantages.
Hyperscan is a high-performance regular expression matching library from Intel, released as open source software under the BSD 3-clause license and available at https://www.hyperscan.io/. It supports most libpcre-compatible syntax as a baseline, and it delivers an extensive set of features and orders of magnitude better performance than libpcre. <Ref 1>
Figure 2. Intel Hyperscan Overview
Hyperscan includes distinctive features such as multi-pattern matching and streaming mode, which searches across a stream of data blocks and is useful for cross-packet inspection in networking. Comprehensive optimizations for Intel processor instruction sets enable Hyperscan to achieve high matching performance that scales from Intel Atom processors to Intel Xeon processors. Hyperscan is a strong match for rule-based, compute-intensive networking and security applications such as WAF.
Figure 3. Hyperscan Workflow
Figure 3 describes the Hyperscan workflow, which can be divided into two phases: compile time and run-time.
Compile time: At compile time, Hyperscan takes regular expressions as input and generates a corresponding compiled pattern database. As shown in Figure 3, the Hyperscan compiler takes a collection of inputs, including the ruleset, matching mode, and pattern flags, and performs complex pattern analyses and optimizations on them. In addition, the generated database can be serialized to a file, so users can compile patterns once per hardware configuration instead of at every service startup.
Run-time: At run-time, users call the Hyperscan scan function to match input data blocks, using the database precompiled at compile time and a pre-allocated scratch space that holds intermediate matching state. All matches are delivered to the user application through a user-provided callback function, which enables customized handling of matches.
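As a rough illustration of this compile-once, scan-many workflow, the sketch below uses plain C++ rather than the Hyperscan API. `Database`, `Scratch`, `compile_rules`, `scan_block`, and `block_on_first` are hypothetical names, and the naive per-rule literal search merely stands in for Hyperscan's compiled matcher. The callback has the same parameter list as Hyperscan's `match_event_handler`, where a non-zero return value stops the scan.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Hyperscan's compiled database and scratch space.
struct Rule {
    unsigned id;          // user-assigned rule ID reported on a match
    std::string literal;  // illustration only: plain literals instead of regexes
};

struct Database {         // treated as read-only once "compiled"
    std::vector<Rule> rules;
};

struct Scratch {          // mutable per-scan bookkeeping
    unsigned matches = 0;
};

// Same parameter list as Hyperscan's match_event_handler:
// return non-zero to halt the scan, zero to keep going.
using MatchHandler = int (*)(unsigned id, unsigned long long from,
                             unsigned long long to, unsigned flags, void* ctx);

// Compile time: run once, not per request.
Database compile_rules(std::vector<Rule> rules) {
    return Database{std::move(rules)};
}

// Run-time: scan one block and report every match through the callback.
// Returns true if the callback terminated the scan early.
bool scan_block(const Database& db, const std::string& data, Scratch& scratch,
                MatchHandler on_match, void* ctx) {
    for (const Rule& r : db.rules) {
        if (r.literal.empty()) continue;  // skip degenerate rules
        for (std::size_t pos = data.find(r.literal); pos != std::string::npos;
             pos = data.find(r.literal, pos + 1)) {
            ++scratch.matches;
            if (on_match(r.id, pos, pos + r.literal.size(), 0, ctx) != 0)
                return true;
        }
    }
    return false;
}

// Example handler: remember the first rule that fired and stop scanning,
// since a single hit is enough to block a request.
int block_on_first(unsigned id, unsigned long long, unsigned long long,
                   unsigned, void* ctx) {
    *static_cast<unsigned*>(ctx) = id;
    return 1;
}
```

With the rules `{1, "union select"}` and `{2, "<script"}`, scanning `q=1 union select pw from users` with `block_on_first` returns `true` after the first hit and sets the context value to 1. Keeping the database immutable and all per-scan state in the scratch object is what allows one database to be shared by many concurrent scans, each with its own scratch.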
Features: Hyperscan supports most of the libpcre syntax and adds a set of pattern flags that define the valid ranges of match lengths and offsets, logical combinations of rules, and other capabilities. It is operating-system-agnostic and supports both virtualized and non-virtualized environments. Hyperscan supports cross-compilation between Intel processors, with specific optimizations for the target CPU architecture.
Supporting multi-pattern matching: Many existing regex libraries have limited support for multi-pattern matching, so they scan the same input data once for every rule. Instead of matching rules one after another, Hyperscan matches anywhere from one to tens of thousands of rules in parallel. After the user assigns a unique ID to each rule, Hyperscan compiles all the rules into a single database and returns all matched rule IDs during a single scan of the data.
Supporting multiple operation modes: Hyperscan operates in three modes: block mode, streaming mode, and vectored mode. In block mode, Hyperscan scans a single contiguous block of data and returns matches. Streaming mode is designed for cross-packet inspection, where the data to be scanned is spread across multiple packets. In streaming mode, Hyperscan saves the match state at the end of the current packet and uses it as the initial match state when the next packet arrives. This provides a simple way to scan a stream of data without buffering and rescanning packets, and without limiting scanning to a fixed window of historical data. Vectored mode scans, in sequence, a set of data blocks that are all available at once but are not contiguous in memory.
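Hyperscan's compiler is far more elaborate, but the core idea of compiling many rules into one structure and reporting every matched rule ID in a single pass can be illustrated with the classic Aho-Corasick automaton for literal strings. This is a swapped-in textbook technique, not Hyperscan's algorithm, and `MultiMatcher` and its methods are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Aho-Corasick automaton: all literals share one trie, and a single
// left-to-right pass reports (rule ID, end offset) pairs for all of them.
class MultiMatcher {
public:
    using Rules = std::vector<std::pair<unsigned, std::string>>;

    explicit MultiMatcher(const Rules& rules) {
        nodes_.emplace_back();  // node 0 is the root
        for (const auto& rule : rules) {
            if (rule.second.empty()) continue;  // empty literals are skipped
            int cur = 0;
            for (unsigned char c : rule.second) {
                if (nodes_[cur].next[c] < 0) {
                    nodes_[cur].next[c] = static_cast<int>(nodes_.size());
                    nodes_.emplace_back();
                }
                cur = nodes_[cur].next[c];
            }
            nodes_[cur].out.push_back(rule.first);
        }
        build_links();
    }

    // One pass over the data, independent of how many rules were compiled.
    std::vector<std::pair<unsigned, std::size_t>> scan(const std::string& data) const {
        std::vector<std::pair<unsigned, std::size_t>> hits;
        int state = 0;
        for (std::size_t i = 0; i < data.size(); ++i) {
            state = nodes_[state].next[static_cast<unsigned char>(data[i])];
            for (unsigned id : nodes_[state].out) hits.emplace_back(id, i + 1);
        }
        return hits;
    }

private:
    struct Node {
        std::array<int, 256> next;  // byte transitions (-1 = not yet set)
        int fail = 0;               // longest proper suffix that is a trie node
        std::vector<unsigned> out;  // rule IDs that end in this state
        Node() { next.fill(-1); }
    };
    std::vector<Node> nodes_;

    // BFS computes failure links and fills in every missing transition,
    // turning the trie into a complete DFA.
    void build_links() {
        std::queue<int> q;
        for (int c = 0; c < 256; ++c) {
            int child = nodes_[0].next[c];
            if (child < 0) {
                nodes_[0].next[c] = 0;
            } else {
                nodes_[child].fail = 0;
                q.push(child);
            }
        }
        while (!q.empty()) {
            int u = q.front();
            q.pop();
            // A state also reports every rule that ends at its failure state.
            const std::vector<unsigned>& inherited = nodes_[nodes_[u].fail].out;
            nodes_[u].out.insert(nodes_[u].out.end(), inherited.begin(), inherited.end());
            for (int c = 0; c < 256; ++c) {
                int child = nodes_[u].next[c];
                if (child < 0) {
                    nodes_[u].next[c] = nodes_[nodes_[u].fail].next[c];
                } else {
                    nodes_[child].fail = nodes_[nodes_[u].fail].next[c];
                    q.push(child);
                }
            }
        }
    }
};
```

For the rules `he` (10), `she` (20), `his` (30), and `hers` (40), one pass over `ushers` reports rules 20 and 10 ending at offset 4 and rule 40 ending at offset 6. The input is read once regardless of how many literals are loaded; Hyperscan provides the same property for full regular expressions with very different internals.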
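The streaming idea, keeping the matcher's state between packets instead of buffering them, can be sketched with a single-literal KMP automaton. `StreamMatcher` is a hypothetical name and KMP is a stand-in technique; Hyperscan's own streaming interface (`hs_open_stream`, `hs_scan_stream`, `hs_close_stream`) handles whole regex sets. What the sketch shows is that a literal split across packets is still found while only two integers of state survive between calls.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Streaming matcher for one non-empty literal. Per-stream state is just
// `matched_` (length of the pattern prefix matched so far) and `offset_`.
class StreamMatcher {
public:
    explicit StreamMatcher(std::string pattern)
        : pat_(std::move(pattern)), fail_(pat_.size(), 0) {
        // KMP failure function: fail_[i] is the length of the longest proper
        // prefix of pat_[0..i] that is also a suffix of it.
        for (std::size_t i = 1, k = 0; i < pat_.size(); ++i) {
            while (k > 0 && pat_[i] != pat_[k]) k = fail_[k - 1];
            if (pat_[i] == pat_[k]) ++k;
            fail_[i] = k;
        }
    }

    // Scan the next packet; returns stream-absolute end offsets of matches.
    std::vector<std::size_t> scan_packet(const std::string& packet) {
        std::vector<std::size_t> ends;
        for (char c : packet) {
            ++offset_;
            while (matched_ > 0 && c != pat_[matched_]) matched_ = fail_[matched_ - 1];
            if (c == pat_[matched_]) ++matched_;
            if (matched_ == pat_.size()) {
                ends.push_back(offset_);
                matched_ = fail_[matched_ - 1];  // allow overlapping matches
            }
        }
        return ends;
    }

private:
    std::string pat_;
    std::vector<std::size_t> fail_;
    std::size_t matched_ = 0;  // carried across packets
    std::size_t offset_ = 0;   // bytes seen so far in this stream
};
```

Feeding the packets `q=1 uni`, `on sel`, and `ect 1` to a matcher for `select` reports one match ending at stream offset 16, although no single packet contains the whole word and nothing is buffered between calls.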
High and scalable performance: Hyperscan uses an automata-based approach (NFA/DFA) without backtracking. A backtracking engine may have to explore every possible path through the pattern, so its matching time can grow exponentially with the input length; this is the root cause of ReDoS (regular expression denial of service). Because Hyperscan does not backtrack, its scanning time cannot grow exponentially, and it is immune to ReDoS attacks. Hyperscan requires the Intel SSSE3 instruction set as a minimum and leverages SIMD instructions to accelerate matching. Multi-pattern matching lets performance scale as the number of rules increases. Furthermore, multiple CPU cores or threads can share the same read-only database, which enables scalable performance. As shown in Figure 4, Hyperscan delivers high throughput that scales with the number of cores and threads on an Intel(R) Xeon(R) Platinum 8160 CPU. This test uses Hyperscan's official benchmarking tool, hsbench, together with publicly available rulesets (Snort PCRE, Snort Literals, and Teakettle) and corpora, which can be downloaded from https://01.org/downloads/sample-data-hyperscan-hsbench-performance-measurement.
Figure 4. Hyperscan throughput and core scalability
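The exponential cost of backtracking can be made concrete by counting steps. The sketch below hand-codes a backtracking matcher for the classic ReDoS pattern `^(a+)+$` and runs it on inputs made of n `a` characters followed by `!`, which can never match. For comparison, `linear_match` accepts the same language (`^a+$`) with a small DFA that reads each byte at most once. Both functions are hypothetical illustrations, not code from PCRE or Hyperscan.

```cpp
#include <cstddef>
#include <string>

// Backtracking matcher for ^(a+)+$ from position i (at least one iteration
// of the group is required). Each attempted length for the inner a+
// counts as one step.
bool backtrack_group(const std::string& s, std::size_t i, unsigned long long& steps) {
    std::size_t run = 0;
    while (i + run < s.size() && s[i + run] == 'a') ++run;
    for (std::size_t k = run; k >= 1; --k) {  // a+ is greedy: longest first
        ++steps;
        std::size_t j = i + k;
        if (j == s.size()) return true;                 // $ matched
        if (backtrack_group(s, j, steps)) return true;  // try another iteration
    }
    return false;  // every split failed: backtrack to the caller
}

bool backtrack_match(const std::string& s, unsigned long long& steps) {
    steps = 0;
    return backtrack_group(s, 0, steps);
}

// DFA for the equivalent language ^a+$: one step per byte, no backtracking.
bool linear_match(const std::string& s, unsigned long long& steps) {
    steps = 0;
    bool seen_a = false;
    for (char c : s) {
        ++steps;
        if (c != 'a') return false;  // dead state: no match is possible
        seen_a = true;
    }
    return seen_a;
}
```

With 20 `a` characters followed by `!`, the backtracking version needs 2^20 - 1 = 1,048,575 steps to fail, while the DFA fails after 21 steps; every extra `a` doubles the former and adds one to the latter.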
Azure WAF Evaluation
To compare the performance of PCRE, RE2, and Hyperscan in the same context, the Azure team built a measurement tool to profile pcre_exec of PCRE, re2_exec of RE2, and hs_scan of Hyperscan in block mode. Every pattern and its corresponding subjects were run through each of these functions.
All tested regex patterns were taken from the Azure WAF rules, and all of them are supported by Hyperscan. The corpus of subjects consisted of two parts, representing high-processing (heavy) tasks and regular tasks.
Two metrics were used to evaluate each regex engine: the average processing time, measured with std::chrono::high_resolution_clock at a best-case measurement accuracy of three hundred nanoseconds on the testbed, and the peak memory usage when matching each regex pattern. Each metric was averaged over 100 repeated experiments. However, some groups of RE2 data were excluded because RE2's behavior is not comparable to PCRE's.
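The timing part of this methodology, a mean over 100 repetitions measured with `std::chrono::high_resolution_clock`, can be sketched as a small harness. `mean_latency_ns` is a hypothetical helper, not the Azure team's tool; it times each call separately and averages the results, so every individual sample is still limited by the clock's resolution on the machine.

```cpp
#include <chrono>
#include <cstddef>

// Run `fn` `reps` times, timing each call with high_resolution_clock,
// and return the mean duration in nanoseconds.
template <typename Fn>
double mean_latency_ns(Fn&& fn, std::size_t reps = 100) {
    using Clock = std::chrono::high_resolution_clock;
    if (reps == 0) return 0.0;
    double total_ns = 0.0;
    for (std::size_t r = 0; r < reps; ++r) {
        auto start = Clock::now();
        fn();  // e.g. one engine call on one subject
        auto stop = Clock::now();
        total_ns += std::chrono::duration<double, std::nano>(stop - start).count();
    }
    return total_ns / static_cast<double>(reps);
}
```

Wrapping one engine call in a lambda and passing it to `mean_latency_ns` yields that call's mean latency over 100 runs. One design note: in libstdc++, `high_resolution_clock` is an alias of `system_clock`, which can be adjusted while a benchmark runs, so `std::chrono::steady_clock` is usually the safer choice for new timing code.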
The following diagrams show scatter plots of the evaluated engines on heavy tasks, illustrating their stability and peak memory usage when they encounter potential ReDoS attacks. The subjects for the heavy tasks were generated by a regex reverse tool, which produces complex subjects for specified patterns. For example, the reverse tool can generate the subject “aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!” for the pattern “^(([a-z])+.)+[A-Z]([a-z])+$”, a potential ReDoS example provided by OWASP.
Note also that Hyperscan compilation can take seconds, so expressions should be compiled ahead of time rather than at match time.
The following diagrams show scatter plots of the evaluated engines on regular tasks, indicating how each engine behaves under typical traffic. The subjects for the regular tasks were taken from actual Bing traffic.
Security and low latency are both important for cloud edge computing, and Microsoft selected Intel Hyperscan for its high performance and security advantages. Hyperscan is not susceptible to ReDoS attacks, which makes it an excellent choice for performance-sensitive scenarios such as WAF. Its higher memory requirement is part of adopting a highly optimized regex engine, and the performance benefit outweighs the minor cost of additional memory.
Using Intel Hyperscan reduced WAF latency by about 14.9x in high-processing, heavy-volume attack scenarios, with only a slight increase in memory consumption. In the worst case, Hyperscan consumes up to 40 MB, which is not significant for a multi-tenant WAF running on a server platform.
Intel Hyperscan is integrated into Azure WAF, making the Azure service faster and more secure. Intel Hyperscan is a proven open source technology adopted at scale, and it recently added Windows support. Hyperscan is one of many technologies Intel offers to optimize network workloads. Intel is highly committed to optimizing network platforms with both hardware and software technologies, focusing on high performance, ease of use, and secure solutions on Intel Xeon Scalable processor-based server platforms.
Ref 1: Hyperscan: A Fast Multi-pattern Regex Matcher for Modern CPUs. NSDI ’19. https://www.usenix.org/system/files/nsdi19-wang-xiang.pdf