We routinely have to recognize patterns within text or byte streams. While LL(k) and LALR are common types of parsers, the nom crate brings parser combinators to the embedded Rust world.
Parser Combinators
As their name implies, parser combinators are parsers that can be combined to parse ever more complex input. As with LL(k) parsing, small atoms are built up into larger, more capable parsers.
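To make the idea concrete, here is a minimal sketch in plain Rust that does not use nom at all. The names `ParseResult`, `literal` and `pair` are invented for illustration and are not nom's API; the only assumption is that a parser is a function from input bytes to an optional (remainder, value) pair.

```rust
/// On success a parser returns the unparsed remainder and the value it
/// recognized; `None` means the input did not match.
type ParseResult<'a, T> = Option<(&'a [u8], T)>;

/// Hypothetical atom: build a parser that matches one fixed byte sequence.
fn literal<'a>(expected: &'static [u8]) -> impl Fn(&'a [u8]) -> ParseResult<'a, &'a [u8]> {
    move |input: &'a [u8]| {
        if input.starts_with(expected) {
            let (matched, rest) = input.split_at(expected.len());
            Some((rest, matched))
        } else {
            None
        }
    }
}

/// Hypothetical combinator: run `first`, then run `second` on whatever
/// `first` left over, and return both values.
fn pair<'a, A, B>(
    first: impl Fn(&'a [u8]) -> ParseResult<'a, A>,
    second: impl Fn(&'a [u8]) -> ParseResult<'a, B>,
) -> impl Fn(&'a [u8]) -> ParseResult<'a, (A, B)> {
    move |input: &'a [u8]| {
        let (rest, a) = first(input)?;
        let (rest, b) = second(rest)?;
        Some((rest, (a, b)))
    }
}
```

With these two pieces, `pair(literal(b"OK"), literal(b"\r\n"))` is itself a parser, and it can be passed to `pair` again to build something larger.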
Why we parse
So far we've implemented two drivers for WiFi-offloading boards. Both of these boards used some variant of Hayes AT commands in order to communicate with your primary device.
For instance, when joining a WiFi access point, the eS-WiFi board might reply in one of two ways:
If successful:
[JOIN ] myssid,192.168.2.18,0,0
OK
>
If unsuccessful:
[JOIN ] myssid
[JOIN ] Failed
ERROR
>
For the device driver to be able to turn that into an Ok or an Err, we have to pluck apart the bits.
The parser
To start, we notice that every response concludes with a > on a new line.
That parser is pretty easy to write with nom:
named!(
    pub prompt,
    tag!("> ")
);
This creates a function named prompt() that can take a slice of [u8] and attempt to match a > followed by a space.
We notice that in the successful case, as with so many other commands, we also need to match OK on a regular basis.
Let's create a rule for that:
named!(
    pub ok,
    tag!("OK\r\n")
);
Additionally, any error response will include the word ERROR, so let's make a rule for that too:
named!(
    pub error,
    tag!("ERROR\r\n")
);
Also pretty straight-forward.
In the successful case, there's more useful information, such as the assigned IP address, that we might want to parse out of the response.
Ignoring the > prompt for now, we can write a pretty quick parser:
named!(
    pub(crate) join<JoinResponse>,
    do_parse!(
        tag!("[JOIN ] ") >>
        ssid: take_until!(",") >>
        char!(',') >>
        ip: take_until!(",") >>
        char!(',') >>
        tag!("0,0") >>
        tag!("\r\n") >>
        ok >>
        (
            JoinResponse::Ok
        )
    )
);
Notice that we don't have to match every bit specifically.
We look for the [JOIN ] blob, and then match everything until the first comma as the SSID.
We consume that comma, and everything until the next comma is the assigned IP address.
Notice, too, that we reuse the ok parser we already created to match the OK\r\n.
We've chosen not to match the prompt just yet.
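The join rule above captures the SSID and IP address but does not return them. As a rough illustration of what each step consumes, here is a sketch in plain Rust without nom. The helper names `split_once_byte` and `join_fields` are made up for this example, and it assumes the response is laid out exactly as the rule expects.

```rust
/// Hypothetical helper: split `input` at the first `delim`, returning the
/// bytes before it and the bytes after it (the delimiter itself is dropped).
fn split_once_byte(input: &[u8], delim: u8) -> Option<(&[u8], &[u8])> {
    let pos = input.iter().position(|&b| b == delim)?;
    Some((&input[..pos], &input[pos + 1..]))
}

/// Sketch of the same steps as the `join` rule, but returning the captured
/// (ssid, ip) fields instead of discarding them.
fn join_fields(input: &[u8]) -> Option<(&[u8], &[u8])> {
    // tag!("[JOIN ] ")
    let rest = input.strip_prefix(b"[JOIN ] ")?;
    // ssid: take_until!(",") followed by char!(',')
    let (ssid, rest) = split_once_byte(rest, b',')?;
    // ip: take_until!(",") followed by char!(',')
    let (ip, rest) = split_once_byte(rest, b',')?;
    // tag!("0,0") followed by tag!("\r\n")
    let rest = rest.strip_prefix(b"0,0\r\n")?;
    // the `ok` rule
    let _remainder = rest.strip_prefix(b"OK\r\n")?;
    Some((ssid, ip))
}
```

Unlike the nom version, this sketch cannot tell "needs more input" apart from "does not match"; both come back as `None`.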
The other response, the error case, we can write a very loose parser to match:
named!(
    pub(crate) join_error<JoinResponse>,
    do_parse!(
        take_until!( "ERROR" ) >>
        error >>
        (
            JoinResponse::JoinError
        )
    )
);
Basically, we decide to ignore most everything, and match using our error rule above.
Of course, that join_error parse rule is pretty ambiguous, but we'll solve that shortly.
Combining parsers
With nom you create your rules, and then you pick one and attempt to parse a slice of bytes.
It either parses successfully, returning a result along with any remaining slice of unparsed bytes, or it returns an error, indicating that all bytes remain unparsed.
While we've combined the ok and error parsers into the join and join_error parsers, we need some way to glue those two together as another higher-order parser.
Thankfully, nom gives us the alt!(..) macro to do just that.
As long as your rules return the same type of result (both are JoinResponse in our case), they can be alt!'d together.
A first attempt might look like:
Warning, will not work as you might hope
named!(
    pub(crate) join_response<JoinResponse>,
    do_parse!(
        tag!("\r\n") >>
        response:
            alt!(
                join
                | join_error
            ) >>
        prompt >>
        (
            response
        )
    )
);
First, before we address why it doesn't work, let's deconstruct it a bit.
Every response starts with a \r\n sequence, which we did not put in either rule, so we match it in our aggregate rule regardless.
Then we pick between join and join_error and assign the result to the response variable.
Finally, we match the always-trailing prompt and return the matched response.
Why doesn't this work?
Because nom supports streaming parsing: it accepts that a rule may not have parsed completely this time, but might match once you add a few more bytes.
Given the definition of join, we match a given prefix up to a comma.
In the error case, the comma is not present.
You might think that would force nom to attempt the second option, join_error, which requires no comma, but nom is an optimist.
Parsing of join did not fail; it just hadn't succeeded yet.
Given the way we use the parser, though, we know that in our case no additional bytes are coming.
What you've got is all there is, so let's let nom know that by wrapping the rules with complete!(...).
This signals that a rule that has not yet succeeded should be considered a failure, so nom continues evaluating the other alternatives.
named!(
    pub(crate) join_response<JoinResponse>,
    do_parse!(
        tag!("\r\n") >>
        response:
            alt!(
                complete!(join)
                | complete!(join_error)
            ) >>
        prompt >>
        (
            response
        )
    )
);
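To see why the wrapper matters, here is a toy model of the behaviour described above, written in plain Rust without nom. Everything in it (`Outcome`, `until_comma`, `contains_error`, `alt`, `complete`) is a simplified stand-in invented for illustration; it only assumes that alternation moves on after a hard error and stops on "incomplete".

```rust
/// Three possible outcomes, loosely modelled on a streaming parser.
#[derive(Debug, PartialEq)]
enum Outcome<'a> {
    Done(&'a [u8], &'a [u8]), // (remainder, matched bytes)
    Incomplete,               // might still match if more bytes arrive
    Error,                    // can never match this input
}

/// Streaming-style "take until comma": without a comma we cannot know
/// whether more input would help, so we report Incomplete.
fn until_comma(input: &[u8]) -> Outcome<'_> {
    match input.iter().position(|&b| b == b',') {
        Some(pos) => Outcome::Done(&input[pos..], &input[..pos]),
        None => Outcome::Incomplete,
    }
}

/// Matches everything up to and including "ERROR"; otherwise a hard Error.
fn contains_error(input: &[u8]) -> Outcome<'_> {
    match input.windows(5).position(|w| w == b"ERROR") {
        Some(pos) => Outcome::Done(&input[pos + 5..], &input[..pos + 5]),
        None => Outcome::Error,
    }
}

/// Alternation: only a hard Error falls through to the second parser.
fn alt<'a>(
    first: impl Fn(&'a [u8]) -> Outcome<'a>,
    second: impl Fn(&'a [u8]) -> Outcome<'a>,
    input: &'a [u8],
) -> Outcome<'a> {
    match first(input) {
        Outcome::Error => second(input),
        other => other,
    }
}

/// Treat "not yet" as "no": turn Incomplete into Error.
fn complete<'a>(
    parser: impl Fn(&'a [u8]) -> Outcome<'a>,
) -> impl Fn(&'a [u8]) -> Outcome<'a> {
    move |input: &'a [u8]| match parser(input) {
        Outcome::Incomplete => Outcome::Error,
        other => other,
    }
}
```

Fed the failed-join bytes, `alt(until_comma, contains_error, input)` stops at `Incomplete` and never tries the second rule, while `alt(complete(until_comma), complete(contains_error), input)` falls through and matches.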
Now, all we have to do is use the parser, and we'll get back the result we're hoping for:
let parse_result = parser::join_response(&bytes);
match parse_result {
    Ok((_remainder, response)) => {
        match response {
            JoinResponse::Ok => {
                // yay!
            }
            JoinResponse::JoinError => {
                // bad password or ssid or such
            }
        }
    }
    Err(_) => {
        // something went woefully wrong during parsing
    }
}
In this case, we know the slice we parsed was complete, so we can ignore the _remainder since it should, in theory, be empty.
More complex
We've also published an uber-tiny drogue-nom-utils crate to provide helper and utility parsers that we have used more than once.
Some of these adapters will transmit a response like the following to indicate a certain amount of data is available for a certain connection:
+IPD,<link_id>,<length>
The length portion is simply the characters that make up the length, such as 1024: a 1 followed by a 0, a 2 and a 4.
So we need to parse that as an actual numeric type, not an array of four number-looking ASCII characters.
Converting ASCII to numbers is usually easy, unless you're no_std, so we have our own atoi_usize to do that for usize-sized numbers.
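The body of atoi_usize isn't shown here, so the following is only a sketch of how such a conversion could look. The name `ascii_to_usize` and its `Option` return type are assumptions for illustration, not the crate's actual signature. It relies only on operations available in `core`, so the same code would also compile in a no_std crate.

```rust
/// Hypothetical stand-in for an atoi-style helper.
/// Returns None for empty input, any non-digit byte, or overflow.
fn ascii_to_usize(digits: &[u8]) -> Option<usize> {
    if digits.is_empty() {
        return None;
    }
    let mut value: usize = 0;
    for &byte in digits {
        if !byte.is_ascii_digit() {
            return None;
        }
        // b'0'..=b'9' map to 0..=9
        let digit = usize::from(byte - b'0');
        // Shift the accumulated value one decimal place and add the digit,
        // bailing out instead of wrapping on overflow.
        value = value.checked_mul(10)?.checked_add(digit)?;
    }
    Some(value)
}
```

Using checked arithmetic means an absurdly long digit string yields `None` rather than a silently wrapped number.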
To make a parser combinator that can parse a sequence of ASCII digits and return a single usize, we write a function in the style of nom:
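pub fn parse_usize(input: &[u8]) -> IResult<&[u8], usize> {
    // Consume one or more ASCII digits; `input` is rebound to the remainder.
    let (input, digits) = digit1(input)?;
    // Convert the digit bytes into a number with our no_std helper.
    let num = atoi_usize(digits).unwrap();
    IResult::Ok((input, num))
}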
Which can then be used like:
named!(
    pub data_available<Response>,
    do_parse!(
        opt!( crlf ) >>
        tag!( "+IPD,") >>
        link_id: parse_usize >>
        char!(',') >>
        len: parse_usize >>
        crlf >>
        (
            Response::DataAvailable { link_id, len }
        )
    )
);
One nice thing about nom is that you can feed data forward within a do_parse!, using the result of parse_usize in another macro such as take!(len).
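As a sketch of that feed-forward idea in plain Rust (not nom), the parsed length below decides how many bytes the next step takes. The "<len>:<payload>" framing and the names `leading_usize` and `length_prefixed` are made up purely for illustration.

```rust
/// Hypothetical: parse leading ASCII digits into a usize and return the rest.
fn leading_usize(input: &[u8]) -> Option<(&[u8], usize)> {
    let end = input
        .iter()
        .position(|b| !b.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return None; // no digits at all
    }
    let mut value: usize = 0;
    for &b in &input[..end] {
        value = value.checked_mul(10)?.checked_add(usize::from(b - b'0'))?;
    }
    Some((&input[end..], value))
}

/// Sketch: parse "<len>:<payload>", where the number parsed first controls
/// how many payload bytes are taken. Returns (remainder, payload).
fn length_prefixed(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let (rest, len) = leading_usize(input)?;
    let rest = rest.strip_prefix(b":")?;
    if rest.len() < len {
        return None; // fewer bytes available than the prefix promised
    }
    let (payload, rest) = rest.split_at(len);
    Some((rest, payload))
}
```

The second step cannot be written without the result of the first, which is exactly the kind of dependency do_parse! lets you express.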
Combinators in the tin
While we write our own domain-specific combinators, nom ships with a vast array of useful ones beyond the tag!, alt! and complete! that we've seen.
Each of these macros can also be replaced with function calls if you have some fancy logic you want to involve in your parsing.
A short (and incomplete) list of the combinators we've found useful:
alt!: tries a list of parsers and returns the result of the first successful one
char!: matches one character: `char!(char) => &[u8] -> IResult<&[u8], char>`
complete!: replaces an Incomplete returned by the child parser with an Error
do_parse!: applies sub-parsers in sequence; it can store intermediary results and make them available to later parsers
opt!: makes the underlying parser optional
tag!: declares a byte array as a suite to recognize
take!: generates a parser consuming the specified number of bytes
Supports no_std
Nom works fantastically well, even in a no_std environment.
You can't use the regexp combinators without std, but even if you did use regexps, then you'd have two problems.