[][src]Struct regex_automata::Regex

pub struct Regex<D> { /* fields omitted */ }

A regular expression that uses deterministic finite automata for fast searching.

A regular expression is comprised of two DFAs, a "forward" DFA and a "reverse" DFA. The forward DFA is responsible for detecting the end of a match while the reverse DFA is responsible for detecting the start of a match. Thus, in order to find the bounds of any given match, a forward search must first be run followed by a reverse search. A match found by the forward DFA guarantees that the reverse DFA will also find a match.

The type of the DFA used by a Regex corresponds to the D type parameter, which must satisfy the DFA trait. Typically, D is either a DenseDFA or a SparseDFA, where dense DFAs use more memory but search faster, while sparse DFAs use less memory but search more slowly.

When using this crate without the standard library, the Regex type has no default type parameter.

Sparse DFAs

Since a Regex is generic over the DFA trait, it can be used with any kind of DFA. While this crate constructs dense DFAs by default, it is easy enough to build corresponding sparse DFAs, and then build a regex from them:

use regex_automata::Regex;

// First, build a regex that uses dense DFAs.
let dense_re = Regex::new("foo[0-9]+")?;

// Second, build sparse DFAs from the forward and reverse dense DFAs.
let fwd = dense_re.forward().to_sparse()?;
let rev = dense_re.reverse().to_sparse()?;

// Third, build a new regex from the constituent sparse DFAs.
let sparse_re = Regex::from_dfas(fwd, rev);

// A regex that uses sparse DFAs can be used just like with dense DFAs.
assert_eq!(true, sparse_re.is_match(b"foo123"));

Implementations

impl<D: DFA> Regex<D>[src]

pub fn is_match(&self, input: &[u8]) -> bool[src]

Returns true if and only if the given bytes match.

This routine may short circuit if it knows that scanning future input will never lead to a different result. In particular, if the underlying DFA enters a match state or a dead state, then this routine will return true or false, respectively, without inspecting any future input.

Example

use regex_automata::Regex;

let re = Regex::new("foo[0-9]+bar")?;
assert_eq!(true, re.is_match(b"foo12345bar"));
assert_eq!(false, re.is_match(b"foobar"));

pub fn shortest_match(&self, input: &[u8]) -> Option<usize>[src]

Returns the first position at which a match is found.

This routine stops scanning input in precisely the same circumstances as is_match. The key difference is that this routine returns the position at which it stopped scanning input if and only if a match was found. If no match is found, then None is returned.

Example

use regex_automata::Regex;

let re = Regex::new("foo[0-9]+")?;
assert_eq!(Some(4), re.shortest_match(b"foo12345"));

// Normally, the end of the leftmost first match here would be 3,
// but the shortest match semantics detect a match earlier.
let re = Regex::new("abc|a")?;
assert_eq!(Some(1), re.shortest_match(b"abc"));

pub fn find(&self, input: &[u8]) -> Option<(usize, usize)>[src]

Returns the start and end offset of the leftmost first match. If no match exists, then None is returned.

The "leftmost first" match corresponds to the match with the smallest starting offset, but where the end offset is determined by preferring earlier branches in the original regular expression. For example, Sam|Samwise will match Sam in Samwise, but Samwise|Sam will match Samwise in Samwise.

Generally speaking, the "leftmost first" match is how most backtracking regular expressions tend to work. This is in contrast to POSIX-style regular expressions that yield "leftmost longest" matches. Namely, both Sam|Samwise and Samwise|Sam match Samwise when using leftmost longest semantics.

Example

use regex_automata::Regex;

let re = Regex::new("foo[0-9]+")?;
assert_eq!(Some((3, 11)), re.find(b"zzzfoo12345zzz"));

// Even though a match is found after reading the first byte (`a`),
// the leftmost first match semantics demand that we find the earliest
// match that prefers earlier parts of the pattern over latter parts.
let re = Regex::new("abc|a")?;
assert_eq!(Some((0, 3)), re.find(b"abc"));

pub fn is_match_at(&self, input: &[u8], start: usize) -> bool[src]

Returns the same as is_match, but starts the search at the given offset.

The significance of the starting point is that it takes the surrounding context into consideration. For example, if the DFA is anchored, then a match can only occur when start == 0.

pub fn shortest_match_at(&self, input: &[u8], start: usize) -> Option<usize>[src]

Returns the same as shortest_match, but starts the search at the given offset.

The significance of the starting point is that it takes the surrounding context into consideration. For example, if the DFA is anchored, then a match can only occur when start == 0.

pub fn find_at(&self, input: &[u8], start: usize) -> Option<(usize, usize)>[src]

Returns the same as find, but starts the search at the given offset.

The significance of the starting point is that it takes the surrounding context into consideration. For example, if the DFA is anchored, then a match can only occur when start == 0.

pub fn find_iter<'r, 't>(&'r self, input: &'t [u8]) -> Matches<'r, 't, D>[src]

Returns an iterator over all non-overlapping leftmost first matches in the given bytes. If no match exists, then the iterator yields no elements.

Note that if the regex can match the empty string, then it is possible for the iterator to yield a zero-width match at a location that is not a valid UTF-8 boundary (for example, between the code units of a UTF-8 encoded codepoint). This can happen regardless of whether allow_invalid_utf8 was enabled or not.

Example

use regex_automata::Regex;

let re = Regex::new("foo[0-9]+")?;
let text = b"foo1 foo12 foo123";
let matches: Vec<(usize, usize)> = re.find_iter(text).collect();
assert_eq!(matches, vec![(0, 4), (5, 10), (11, 17)]);

pub fn from_dfas(forward: D, reverse: D) -> Regex<D>[src]

Build a new regex from its constituent forward and reverse DFAs.

This is useful when deserializing a regex from some arbitrary memory region. This is also useful for building regexes from other types of DFAs.

Example

This example is a bit a contrived. The usual use of these methods would involve serializing initial_re somewhere and then deserializing it later to build a regex.

use regex_automata::Regex;

let initial_re = Regex::new("foo[0-9]+")?;
assert_eq!(true, initial_re.is_match(b"foo123"));

let (fwd, rev) = (initial_re.forward(), initial_re.reverse());
let re = Regex::from_dfas(fwd, rev);
assert_eq!(true, re.is_match(b"foo123"));

This example shows how you might build smaller DFAs, and then use those smaller DFAs to build a new regex.

use regex_automata::Regex;

let initial_re = Regex::new("foo[0-9]+")?;
assert_eq!(true, initial_re.is_match(b"foo123"));

let fwd = initial_re.forward().to_u16()?;
let rev = initial_re.reverse().to_u16()?;
let re = Regex::from_dfas(fwd, rev);
assert_eq!(true, re.is_match(b"foo123"));

This example shows how to build a Regex that uses sparse DFAs instead of dense DFAs:

use regex_automata::Regex;

let initial_re = Regex::new("foo[0-9]+")?;
assert_eq!(true, initial_re.is_match(b"foo123"));

let fwd = initial_re.forward().to_sparse()?;
let rev = initial_re.reverse().to_sparse()?;
let re = Regex::from_dfas(fwd, rev);
assert_eq!(true, re.is_match(b"foo123"));

pub fn forward(&self) -> &D[src]

Return the underlying DFA responsible for forward matching.

pub fn reverse(&self) -> &D[src]

Return the underlying DFA responsible for reverse matching.

Trait Implementations

impl<D: Clone> Clone for Regex<D>[src]

impl<D: Debug> Debug for Regex<D>[src]

Auto Trait Implementations

impl<D> RefUnwindSafe for Regex<D> where
    D: RefUnwindSafe

impl<D> Send for Regex<D> where
    D: Send

impl<D> Sync for Regex<D> where
    D: Sync

impl<D> Unpin for Regex<D> where
    D: Unpin

impl<D> UnwindSafe for Regex<D> where
    D: UnwindSafe

Blanket Implementations

impl<T> Any for T where
    T: 'static + ?Sized
[src]

impl<T> Borrow<T> for T where
    T: ?Sized
[src]

impl<T> BorrowMut<T> for T where
    T: ?Sized
[src]

impl<T> From<T> for T[src]

impl<T, U> Into<U> for T where
    U: From<T>, 
[src]

impl<T> ToOwned for T where
    T: Clone
[src]

type Owned = T

The resulting type after obtaining ownership.

impl<T, U> TryFrom<U> for T where
    U: Into<T>, 
[src]

type Error = Infallible

The type returned in the event of a conversion error.

impl<T, U> TryInto<U> for T where
    U: TryFrom<T>, 
[src]

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.