I love Scala!

by Frank Sauer on July 12th, 2014

I am currently writing a system to collect application performance metrics and visualize them in various dashboards. The UI part is taken care of by Grafana and I won’t dwell on that in this post. The data collectors are written in Scala using Akka. The systems being measured are hardware messaging appliances and various applications written in Java on a proprietary high-performance platform for distributed applications, which provides features such as guaranteed messaging and high availability through transactional event sourcing.

Metrics being collected include queue sizes in the messaging appliances, inbound and outbound message rates of the applications, JVM stats such as heap size, GC stats and CPU stats, and numerous other metrics related to the proprietary middleware. All these metrics are received via proprietary Java libraries and processed with actors.

We wanted to evaluate several time-series databases, and there is an amazing number of choices here. The most common choice is Graphite, but others include OpenTSDB, InfluxDB, ElasticSearch (with Kibana as the UI), etc. Because we wanted to evaluate all of these options, I designed the system around the Akka event stream. The data collectors collect data using the Java APIs and publish it to the Akka event stream as case classes of an internal model for time-series data. For each of the candidate databases we have an actor that subscribes to these messages and transforms them to whatever format the database desires before shipping them off for storage.

This is where it got interesting. Due to the uncertainty around data stores and the wide variety of storage models for metrics, I needed a flexible internal model. Graphite wants very long dot-separated metric names and each series stores only one numeric value. InfluxDB is much more flexible: it can store multiple columns, and these can be of various data types. It looks much more like a traditional relational database in that respect. I wanted my internal model to support both concepts.

The model

I won’t describe the entire model here. The part I want to focus on is the case class describing the source of the metrics being collected. This is the part of the model that in Graphite’s case results in the dot-separated metric name, but in InfluxDB’s case could result in a number of named columns being stored. Therefore, it is basically a kind followed by an ordered list of name-value pairs. The kind is just a fixed string, for example “system” or “application” or other very high-level classifications of various kinds of metrics (this would be the table name when all keys are stored as columns). The next parts are all name-value pairs for things like hostname, server name, thread name, queue name, etc.

Here is what it looks like:

case class Source(kind : String, keys: (String,String)*)(tags: (String,String)*)

The difference between keys and tags is that keys will be used to form Graphite-style metric names, whereas tags are simply additional string values that might be of interest (like the PID for a process). Tags could be stored by those databases that support them (like OpenTSDB and InfluxDB), but they do not really need to be part of the series name.
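As a rough sketch of how one Source could feed both kinds of store (this is not the collector's actual code), the snippet below derives a Graphite-style name from the keys and a flat column map from the keys and tags together. The SourceFormats object and its graphiteName and columns methods are hypothetical names, the `val` in front of tags is an assumption added so the tags can be read from outside the class, and the Map is only a stand-in for whatever a real database client would expect.

```scala
// Hypothetical variant of Source: `val tags` exposes the second parameter list,
// which the case class shown in this post does not do.
case class Source(kind: String, keys: (String, String)*)(val tags: (String, String)*)

object SourceFormats {
  // Graphite-style: the kind followed by the key values, joined with dots.
  def graphiteName(s: Source): String =
    (s.kind +: s.keys.map(_._2)).mkString(".")

  // Column-style: every key and every tag becomes a named column.
  def columns(s: Source): Map[String, String] =
    (s.keys ++ s.tags).toMap
}
```

For a Source built as Source("application", "host" -> "barHost", "server" -> "fooServer")("pid" -> "4242"), graphiteName yields application.barHost.fooServer, while columns also carries the pid tag.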

This all seemed to work nicely, but I quickly grew tired of typing long expressions like this all over the place:

val source = Source("appliance","host"->"applianceHost", "vpn"->"myVpnName", "type"->"queue")()

Out of the box, the Source case class offers no way to add new elements to the keys. That would be handy: we could create a Source for a server once, then pass it around to other parts of the code, which could add new key elements for whatever metric they are creating. In addition, it turns out that the server names being emitted by the applications follow a naming convention – “serverName@hostName”, which I could parse into separate keys for host and server…

Hmm, let’s see how we can modify the Source class to help us out… While we’re at it, let’s also add some code that formats a source into a Graphite-style series name:

case class Source(kind : String, keys: (String,String)*)(tags: (String,String)*) {

   // Returns a new Source with the extra key appended to the existing keys
   def + (tuple: Tuple2[String,String]):Source = 
         Source(kind, (keys.toArray ++ Array(tuple)):_*)(tags:_*)

   // The kind followed by the value of every key, in order
   def asList = List(kind) ++ keys.unzip._2
   def seriesName(sep: String) = asList.mkString(sep)
}

The definition for + is a bit tricky, but only because the Source constructor takes a variable number of arguments for the keys. After we concatenate the extra tuple to the existing keys, we need to pass it into the Source constructor as a vararg. This is done by the funny :_* syntax.
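To see the :_* expansion on its own, here is a minimal sketch outside of the Source class. The VarargsDemo object and its join method are hypothetical names used only for illustration.

```scala
object VarargsDemo {
  // A method that takes a variable number of String arguments.
  def join(parts: String*): String = parts.mkString("/")

  // Passing the arguments one by one.
  val direct: String = join("a", "b", "c")

  // Passing an existing sequence: `: _*` tells the compiler to expand
  // the sequence into the vararg parameter instead of treating it as
  // a single argument (which would not type-check here).
  val existing: Seq[String] = Seq("a", "b")
  val expanded: String = join((existing :+ "c"): _*)
}
```

Both direct and expanded end up as a/b/c. The + method on Source does the same thing with the concatenated keys.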

The asList function collects the kind followed by all the values of the keys. seriesName simply turns this list into a string with each element separated from the next by the given separator string.

Now we can do really cool stuff like this:

scala> val s = Source("system", "fooServer@barHost")
s: Source = Source(system,WrappedArray((host,barHost), (server,fooServer)))
scala> val s2 = s + ("a"->"b")
s2: Source = Source(system,WrappedArray((host,barHost), (server,fooServer), (a,b)))
scala> s2.seriesName(".")
res0: String = system.barHost.fooServer.b

Wait a minute, what happened there on the first line? That’s not a valid syntax to construct a Source! Or is it? And how did it magically separate the host and server parts? The magic here is a new apply function in the companion object for Source that parses the given server name and creates the Source using the parsed host and server pieces:

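object Source {
   // Captures "server@host": group 1 is the server, group 2 is the host
   val ServerPattern = "([^@]*)@(.*)".r

   def apply(kind: String, serverName: String):Source = serverName match {
      case ServerPattern(server, host) => Source(kind, "host"->host, "server"->server)()
      // No "@" in the name: treat the whole string as the server name
      case _ => Source(kind, "server"->serverName)()
   }
}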

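I think the pattern matching on the regular expression is amazing: the groups of the regular expression are bound straight to the variables server and host in the case. How cool is that? One caveat: the regular expression is an ordinary string that is only interpreted at runtime, so the compiler cannot check it against the pattern. The number of variables in the case has to match the number of groups in the regular expression. If it does not, the code still compiles, but that case never matches and the match falls through to the next one (here, the catch-all that treats the whole string as the server name).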
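To make the behaviour of the regex extractor concrete, here is a minimal sketch using scala.util.matching.Regex from the standard library. The RegexDemo object and the describe and wrongArity methods are made-up names for illustration; only the pattern itself is taken from the code above.

```scala
import scala.util.matching.Regex

object RegexDemo {
  val ServerPattern: Regex = "([^@]*)@(.*)".r

  // Two variables for two groups: matches names of the form server@host.
  def describe(name: String): String = name match {
    case ServerPattern(server, host) => s"server=$server host=$host"
    case _                           => s"server=$name"
  }

  // One variable against a two-group regex: this compiles, but the
  // first case can never match, so every input reaches the fallback.
  def wrongArity(name: String): String = name match {
    case ServerPattern(only) => s"matched $only"
    case _                   => "no match"
  }
}
```

Here describe("fooServer@barHost") gives server=fooServer host=barHost, describe("standalone") gives server=standalone, and wrongArity("fooServer@barHost") gives no match, because the extractor produces two groups and the pattern asks for exactly one.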

I love Scala!
