Welcome to Hyperscan!

Welcome to the Hyperscan blog! We’re happy to share our work on Hyperscan with the world: as of October 19, 2015, Hyperscan 4.0 is an open source product released under the 3-clause BSD license. Hyperscan is a software-based regular expression matching library that supports large-scale, high-performance, streaming regular expression matching on Intel Architecture.

Hyperscan has seen success in the network IPS/IDS (Intrusion Prevention/Detection System), application identification, and Next Generation Firewall spaces, but it can be used in any application that needs Hyperscan’s strengths:

  1. Multiple pattern matching: Hyperscan can match anywhere from one to tens of thousands of regular expressions or fixed strings in a single scan. Performance depends on the size of the pattern set and the complexity of the patterns.
  2. Streaming pattern matching: Hyperscan can be used in streaming mode, where patterns can be matched across any number of ‘stream writes’ with any amount of data, while holding only a fixed amount of stream state data. The amount of stream state data is dependent on the patterns but is fixed at pattern compile time.
  3. High-performance: generally Hyperscan is faster than alternative systems.
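
To make the streaming idea in item 2 concrete, here is a minimal sketch in Python. It is not Hyperscan's API or implementation; the class name `StreamMatcher` and its `write` method are hypothetical, and it handles only a single fixed string. It shows a matcher whose per-stream state is sized at "compile" time (here, two integers) and does not grow no matter how the data is split across writes.

```python
class StreamMatcher:
    """Streaming search for one fixed byte string (KMP automaton).

    Illustrative only: hypothetical names, not Hyperscan's API.
    """

    def __init__(self, literal: bytes):
        if not literal:
            raise ValueError("literal must be non-empty")
        self.literal = literal
        # "Compile" step: build the KMP failure table once, up front.
        self.fail = [0] * len(literal)
        k = 0
        for i in range(1, len(literal)):
            while k and literal[i] != literal[k]:
                k = self.fail[k - 1]
            if literal[i] == literal[k]:
                k += 1
            self.fail[i] = k
        # Stream state: two integers, however much data is written.
        self.state = 0   # length of the literal prefix currently matched
        self.offset = 0  # total bytes consumed so far

    def write(self, chunk: bytes) -> list[int]:
        """Consume one chunk; return stream offsets at which a match ends."""
        ends = []
        for byte in chunk:
            # Fall back through the failure table on a mismatch.
            while self.state and byte != self.literal[self.state]:
                self.state = self.fail[self.state - 1]
            if byte == self.literal[self.state]:
                self.state += 1
            self.offset += 1
            if self.state == len(self.literal):
                ends.append(self.offset)
                # Keep the longest proper border so overlapping matches are found.
                self.state = self.fail[self.state - 1]
        return ends


matcher = StreamMatcher(b"hyperscan")
print(matcher.write(b"try hyper"))      # []
print(matcher.write(b"scan, hypersc"))  # [13]
print(matcher.write(b"an"))             # [24]
```

Both matches straddle a write boundary, yet nothing from the earlier chunks is retained beyond `state` and `offset`.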

These strengths come with some gotchas, and some applications won’t find Hyperscan suitable right away (or perhaps at all):

  1. If you only want to use your pattern once on a small amount of data, the cost of compiling it with Hyperscan might be too high to justify using Hyperscan.
  2. If you need features (e.g. sub-expression capture) or pattern constructs (e.g. back-references or arbitrary lookaround assertions) that Hyperscan does not support, you may not be able to use it (although we do have a pre-filter mode that can optimize some cases).

In the near future, we will discuss our implementation strategies in detail on this blog. For now, regular expression aficionados should consider Hyperscan an “automata-based” regular expression matcher similar to Thompson’s NFA-based matcher, the TRE project and Google’s RE2 (along with many others), as opposed to a “backtracking” matcher such as libpcre. Our implementation creates multiple communicating automata engines, including literal match, NFA, DFA and custom engines.
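
For readers new to the distinction, the following Python sketch shows the core of the automata-based approach in the style of Thompson's NFA simulation. It is a toy, not Hyperscan's engine: the NFA for `(a|b)*abb` is built by hand, matching is anchored at the start of the input, and the names `TRANSITIONS` and `match_ends` are made up for this example. The point is that the matcher tracks a set of active states and reads each input symbol exactly once, so it never backtracks.

```python
# Hand-built NFA for the pattern (a|b)*abb, anchored at the start of input.
# TRANSITIONS[state][symbol] is the set of possible next states.
TRANSITIONS = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}
START, ACCEPT = 0, 3


def match_ends(data: str) -> list[int]:
    """Return every offset at which a match of (a|b)*abb ends."""
    active = {START}  # all states the NFA could currently be in
    ends = []
    for offset, symbol in enumerate(data, start=1):
        # Advance every active state in lockstep on this one symbol.
        active = {
            nxt
            for state in active
            for nxt in TRANSITIONS[state].get(symbol, ())
        }
        if ACCEPT in active:
            ends.append(offset)
    return ends


print(match_ends("abbabb"))  # [3, 6]
print(match_ends("abab"))    # []
```

The active set can never hold more than the four states of the NFA, so the work per input symbol is bounded by the pattern rather than by the input.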

We use libpcre as a source of syntax and semantics, although our semantics as an “all matches” automata-based matcher mandate some differences from libpcre.
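
One way to picture the difference is sketched below in Python. Two assumptions are made here: Python's `re` module stands in for a libpcre-style matcher, and “all matches” is read as reporting every end offset at which the pattern can match. The helper names `all_match_ends` and `leftmost_spans` are hypothetical, and the brute-force search is for illustration only; it says nothing about how Hyperscan computes its results.

```python
import re


def all_match_ends(pattern: str, data: str) -> list[int]:
    """Every end offset e such that some non-empty data[s:e] matches.

    Brute force, for illustration only.
    """
    compiled = re.compile(pattern)
    ends = []
    for end in range(1, len(data) + 1):
        if any(compiled.fullmatch(data, start, end) for start in range(end)):
            ends.append(end)
    return ends


def leftmost_spans(pattern: str, data: str) -> list[tuple[int, int]]:
    """Non-overlapping leftmost matches, as a backtracking matcher reports them."""
    return [m.span() for m in re.finditer(pattern, data)]


print(leftmost_spans("foo.*bar", "fooxyzbarbar"))  # [(0, 12)]
print(all_match_ends("foo.*bar", "fooxyzbarbar"))  # [9, 12]
```

The backtracking-style search returns one greedy match, while the all-matches reading reports a match ending after each `bar`.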

We’re releasing the full source code for the Hyperscan library with Hyperscan 4.0. We hope that some of our internal testing and QA tools can be released with subsequent releases, so watch this space. We will maintain our traditional release cadence (around one release per quarter) with tested and supported releases, and we will make finer-grained updates to the community source code repository as work is completed here at Intel.

We’re looking forward to sharing our knowledge about implementing regular expressions and tuning for Intel Architecture, and to working with the community both to improve Hyperscan and to help integrate it into other projects.

See you on the mailing list!

Geoff Langdale, Principal Engineer, Intel Corporation