Skip to content

Parser combinators FTW

by Frank Sauer on August 31st, 2014

In my previous post I introduced the AlertRule case class as one of the domain classes of our performance monitoring application. An AlertRule captures under which conditions to fire alerts when metrics violate a given threshold. In this post I will describe how instances of AlertRule get created by parsing an external DSL with a parser built with Scala’s parser combinators.

Here is a slightly modified version of the AlertRule case class I defined in the previous post:

case class AlertRule(name:String, matcher: Regex, 
                     trigger: (Number,Number)=>Boolean, 
                     threshold: Number, msg: String, 
                     duration: Option[FiniteDuration])

To make the parser a bit more interesting, I have added an optional duration property. The goal is to not immediately fire an alert when a matching metric triggers the trigger function, but instead continue to monitor this metric and if it keeps triggering the trigger function for the amount of time specified in duration, then we’ll fire the alert. If at any time during the specified duration the metric stops triggering the trigger, we forget the pending alert and start over. In my next post I will describe an Akka actor that enforces these rules by using Akka schedulers to handle this timing aspect.

Let’s use an example rule to see what a DSL for alert rules like this might look like:

queueSize: 
when /messaging\.queues\..*\.spooled/ > 5000
for 30 seconds
alert "Queue $2 too large. $value messages spooled. Stalled consumer?"

Assuming we collect metrics with a ‘messaging.queues.queuename.spooled’ path name, this rule says that whenever we see a spooled metric on any queue with a value larger than 5000 for at least 30 seconds (without dropping under the 5000 threshold during that time) then fire an alert with the given message. The alert message will undergo some string interpolation to replace $0..$9 with the corresponding part of the metrics path and $value with the actual value of the metric causing the alert. How to do this is left as an exercise for the reader.

At this point I would offer the DSL grammar using some kind of grammar notation like EBNF, but parser implementations using Scala’s parser combinators look so much like such a grammar that I’ll skip this step. Just like in a formal grammar definition language, the idea of parser combinators is that larger constructs can be defined in terms of smaller ones using sequences, choices, optional terms and repetition of terms. The parser combinators correspond exactly to these grammatical constructions, as we will see below.

Scala comes with a few predefined parser traits that can be extended by your own parsers in order to obtain parsers for common language constructs. The JavaTokenParsers trait contains parsers for identifiers, numbers and string literals ready to be used by our DSL parser. Let’s jump right into the deep end of the pool and look at the entire DSL parser. It will turn out to be surprisingly small…

Following the code I’ll explain some of this in more detail, but it helps to know the notation of a few parser combinators up front:

Combinator Description
X ~ Y sequential combinator. Parser will be successful if first X is successful and then Y is successful as well
X ~> Y same as above but drop the result of X. Keep only the Y parser after X parses successfully
X | Y choice combinator. Parser will succeed if either X succeeds or Y does
X >> Y into combinator. It passes the result of parsing X into the function that defines the parser for Y
X.? optional combinator. parsing succeeds if X is absent or parses successfully
X.* expect 0 or more repetitions of X
X ^^ {function} If X parses successfully transform the result using the function
X ^^^ {value} change a successful result into the given value
object AlertRuleParser extends JavaTokenParsers {
  
  override protected val whiteSpace = """(\s|#.*)+""".r

  def duration: Parser[FiniteDuration] =
    "for" ~> wholeNumber ~ timeUnit ^^ {
      case value ~ units => new FiniteDuration(value.toInt, units)
  }

  def timeUnit: Parser[TimeUnit] = (
       "s(econd(s)?)?".r ^^^ {TimeUnit.SECONDS}
    |  "m(inute(s)?)?".r ^^^ {TimeUnit.MINUTES}
    |  "h(our(s)?)?".r ^^^ {TimeUnit.HOURS}
  )

  def op: Parser[(Number,Number)=>Boolean] = (
       "<" ^^^ {(x:Number,y:Number) => x.doubleValue < y.doubleValue}
    | "<=" ^^^ {(x:Number,y:Number) => x.doubleValue <= y.doubleValue}
    | "==" ^^^ {(x:Number,y:Number) => x.doubleValue == y.doubleValue}
    | ">=" ^^^ {(x:Number,y:Number) => x.doubleValue >= y.doubleValue}
    |  ">" ^^^ {(x:Number,y:Number) => x.doubleValue > y.doubleValue}
  )

  def regex: Parser[Regex] = "/" ~> "[^/]*".r <~ "/" ^^ {_.r}

  def ruleHeader: Parser[String] = ident <~ ":"

  def ruleBody(name:String): Parser[AlertRule] =
    "when" ~> regex ~ op ~ floatingPointNumber ~
    duration.? ~
    ("alert" ~> stringLiteral) ^^ {
      case expr ~ trigger ~ threshold ~ duration ~ alert => 
           AlertRule(name,expr,trigger,threshold.toDouble, duration, alert)
  }

  def alert: Parser[AlertRule] = ruleHeader >> ruleBody

  def alerts:Parser[List[AlertRule]] =alert.*

  def apply(rules: String):ParseResult[List[AlertRule]] = parseAll(alerts, rules)

}

And there we have it, the entire DSL parser in 42 lines of code. Let’s go through this rule by rule.

  • The definition of whitespace is overridden on line 3 to allow the use of # as a one-line comment in our rule DSL
  • A valid duration clause (lines 5-14) is the keyword “for” followed by a whole number and a time unit which is either s, second, or seconds, or the equivalent terms for minutes and hours. Notice how the “for” keyword is dropped by using the ~> combinator. By doing so we won’t have to repeat it in the partial function of our transformation following the ^^.Also notice (lines 11-13) how each time unit alternate uses the ^^^ combinator to return an actual instance of TimeUnit. This allows us to very easily create the appropriate FiniteDuration object in the final transformation function which is a partial function matching the number and units to use in the construction of a FiniteDuration. It is common to see the use of partial functions in the ^^ transformers and they use the same parser combinators as the grammar itself, which is a bit confusing at first. It is important to understand that inside the ^^ transformations, we combine the results of the parsers (as opposed to the parsers themselves) and bind them to the given names. Here, value will be the actual result of the wholeNumber parser (30 in our example rule) and units will be the actual TimeUnit object (TimeUnit.SECOND in our example) parsed by the timeUnit parser.
  • On lines 16-22 we see how the comparison operators translate to the actual predicates on two numbers used by the rule to decide if an alert is required or not by comparing the metric value against the threshold using the specified comparator. Parser results can be functions! How cool is that?
  • A regex (line 24) is simply everything between two subsequent ‘/’. The resulting String gets transformed to a regular expression by the transformation function following ^^.
  • To keep the alert rule (line 36) small I split it into a ruleHeader (line 26) and a ruleBody (lines 28-34). (The parentheses can get a bit tricky if you try to use ~> multiple times to eliminate keyword boilerplate.) The ruleHeader simply parses into the name for the alert rule and the ruleBody transforms into the actual AlertRule object. However since the AlertRule constructor requires the name of the rule, how do we get that from the ruleHeader into the ruleBody? This is where the into (>>) combinator comes in. By writing ruleHeader >> ruleBody and defining the ruleBody function with a String parameter (the result of the ruleHeader parser is a String) we get to use the result of the ruleHeader in our ruleBody when we construct the final AlertRule object. Here we see the use of a partial function again matching the exact terms of the grammar minus the parts we left off by using ~>. Note that the AlertRule case class is defined with an Option[FiniteDuration] which is exactly the result of applying the optional combinator (?) to the duration parser.
  • Finally, a parser that can parse a sequence of alert rules is simply defined as alert.* (line 38)
  • The main entry point into our alert rule parser is the apply method (line 40). It invokes the parseAll function inherited indirectly via JavaTokenParsers to parse the given text with the alerts parser. The result is a ParseResult, which in case of success is of type Success[List[AlertRule]]. If parsing fails the result will be an Error or Failure containing an error message. Both can be captured using NoSuccess

Let’s illustrate what happens when we deliberately introduce an error into our rules (days is not a valid time unit):

  def main(args: Array[String]) {

     val text =
       """
         |# rule to catch large queues
         |largeQueue:
         |when /messaging\.queues\..*\.spooled/ > 25000
         |for 30 days
         |alert "Queue $2 too deep. $value messages spooled. Stalled consumer?"
       """.stripMargin

    RuleParser(text) match {
      case Success(alerts,_) => println(s"Got alerts: $alerts")
      case NoSuccess(msg,remaining) => println(s"We have a problem: $msg at: ${remaining.pos}")
    }
  }

This will print We have a problem: string matching regex `h(our(s)?)?' expected but `d' found at: 5.8 The position is line.char. This error message is a bit strange in that it class to expect hours, whereas seconds or minutes would also be acceptable. Getting good quality error messages out of parsers is beyond the scope of this post (translate: I haven’t figured this out yet :-)

If we change 30 days to 10 minutes and run the parser again, we see Got alerts: List(AlertRule(largeQueue,messaging\.queues\..*\.spooled,,25000.0,Some(10 minutes),"Queue $2 too deep. $value messages spooled. Stalled consumer?"))

Thanks to parser combinators it is now a breeze to use external DSLs in our applications without requiring any extra tools (like ANTLR) other than our trusted compiler and I will be making good use of this in the future!

From → programming, scala

Comments are closed.