Moje zdjęcie
Software Craftsman's Blog by Marcin Pieciukiewicz
Java and Scala development

Wednesday, July 17, 2013

Neat and simple way to measure code performance in Scala

When we write a source code we often have to be aware of the performace of our solution. The easiest way to achieve this is by measuring duration of the execution by taking the system time before and after our block code, ie.

Where is the problem?

val startTime = System.nanoTime
timeConsumingOperations()
val endTime = System.nanoTime

println("Operations took " + (endTime - startTime) + " nano seconds")
This is most basic approach that can be used in the simplest cases. But what if we want to measure times of multiple different operations? This simple solution would become a little complex and harder to maintain, ie.
val startTime = System.nanoTime
operation1()
val middleTime1 = System.nanoTime
operation2()
val middleTime2 = System.nanoTime
operation3()
val endTime = System.nanoTime

println("Operation1 took " + (middleTime1 - startTime) + " ns")
println("Operation2 took " + (middleTime2 - middleTime1) + " ns")
println("Operation3 took " + (endTime - middleTime2) + " ns")
That code is becoming hard to read, maintain and, in my opinion the worst, it almost hides our business code.
Now let's compilcate it even more by assuming that we need to run those operations multiple times (ie. in a for loop) and we need to measure their entire time of execution. We could try to do something like that:
var operation1Duration = 0L
var operation2Duration = 0L
var operation3Duration = 0L

for(i <- 1 to 10) {
  val startTime = System.nanoTime
  operation1()
  val middleTime1 = System.nanoTime
  operation2()
  val middleTime2 = System.nanoTime
  operation3()
  val endTime = System.nanoTime

  operation1Duration += middleTime1 - startTime
  operation2Duration += middleTime2 - middleTime1
  operation3Duration += endTime - middleTime2
}
println("Operation1 took " + operation1Duration + " ns")
println("Operation2 took " + operation2Duration + " ns")
println("Operation3 took " + operation3Duration + " ns")
And that is an ugly piece of code.

Solution Proposal

I propose a simple solution to that problem, a simple stopwatch. With this stopwatch we can refactor our third case into this:
clearStopwatchResults()

for(i <- 1 to 10) {
  stopwatch("First operation") {
    operation1()
  }
  stopwatch("Second operation") {
    operation2()
  }
  stopwatch("Third operation") {
    operation3()
  }
}

println(stopwatchResults)
And the result of this measurement would be something like this:
Third operation -> 48 ms 355 us 124 ns 
First operation -> 136 ms 219 us 210 ns 
Second operation -> 644 ms 69 us 657 ns
Basic idea for that approach is to create a Stopwatch that can measure execution time of a block of code and store it under given identifier (ie. "First operation"). It will accumulate a time of execution for given identifier, and in the end it will write out the measurement results in a readable format.

Solution Implementation

So here is the implementation of my Stopwatch:
import collection.mutable
import scala.StringBuilder

class Stopwatch {

  private val measurements = mutable.HashMap[Any, Long]()

  def clearResults() {
    measurements.clear()
  }

  def apply(identifier:Any)(codeBlock: => Unit) {
    val start = System.nanoTime
    codeBlock
    val end = System.nanoTime
    val oldValue = measurements.getOrElse(identifier, 0L)
    measurements += identifier -> (oldValue + end - start)    
  }

  def results():String = {
    ResultsFormatter.format(measurements)
  }
}
And there is also a companion object to provide default Stopwatch and convenient methods to use it.
object Stopwatch {
  private val defaultStopwatch = new Stopwatch

  def stopwatch(identifier:Any)(codeBlock: => Unit) {
    defaultStopwatch(identifier)(codeBlock)
  }

  def stopwatchResults() = defaultStopwatch.results()

  def clearStopwatchResults() {
    defaultStopwatch.clearResults()
  }
}
To complete the solution we also need a class to format our measurements results into well formatted String:
object ResultsFormatter {
  def format(measurements: mutable.HashMap[Any, Long]):String = {
    val sb = new StringBuilder
    measurements.foreach(appendResult(sb))
    sb.toString()
  }

  private def appendResult(sb: StringBuilder)(result: (Any, Long)) {
    val identifier = result._1
    val value = result._2
    sb.append(identifier).append(" -> ")

    appendIfNotZero(sb, extractSeconds(value), "s")
    appendIfNotZero(sb, extractMills(value), "ms")
    appendIfNotZero(sb, extractMicro(value), "us")
    appendIfNotZero(sb, extractNano(value), "ns")

    sb.append('\n')
  }

  private def extractNano(value: Long): Long = value % 1000
  private def extractMicro(value: Long) = (value / 1000) % 1000
  private def extractMills(value: Long) = (value / 1000000) % 1000
  private def extractSeconds(value: Long) = value / 1000000000

  private def appendIfNotZero(sb:StringBuilder, value:Long, suffix:String) {
    if (value > 0) {
      sb.append(value).append(" ").append(suffix).append(" ")
    }
  }
}
I have to mention that this is not thread safe implementation and one instance of Stopwatch should be use within single thread.

Implementation description


Stopwatch class
The main point of the implementation is a Map called measurements. It is a mutable HashMap, for higher performance and it is used to map identifier into execution duration.

The Stopwatch class provides three public methods:
- apply(identifier:Any)(codeBlock: => Unit) - This method takes an identifier and a code block that will be measured. It simply calls the codeBlock, calculates how long it took to execute it and in the end it accumulates the result in the measurements map.
I think the method declaration requires a little more explanation. First it uses a feature called currying. For our purpose we can assume that it allows us to define function parameters in multiple parentheses pairs (of course currying is much more and it can be used for other purposes, but this is topic for other aricle) instead of one. Also codeBlock: => Unit part defines a function that takes no parameters, or just a code block surrounded by curly brackets { ... }.
Thanks to that we could call apply method in this way (full syntax):
apply("some computation")({complexComputation()})
or we can omit latter parentheses and write simply:
apply("some computation"){complexComputation()}
We wouldn't be able to do this, in such a nice way, if we would use only one parameters block, ie. apply(identifier:Any, codeBlock: => Unit)
- clearResults() - This method simply clears the measurements map to allow new measurements to be performed.
- result() - Returns a well formatted String representation of measurements result.

As you can see, the core imlpementation of this solution is very clear and simple. Also this is very easy to add new functionalities to this class.


Stopwatch companion object
This companion object is used to increase the ease of using Stopwatch class. It provides default instance of Stopwatch and provides a convenient methods to use it. To use those methods we have to import this object into current namespace by:
import Stopwatch._
This import statement will include all methods from Stopwatch object into current namespace, so we can call it directly, without passing Stopwatch name.


ResultsFormatter class
This class is simpy used to create a well formatted String from Map containing measurement results.

Conclusion

I think that this approach to code performance measurement is well suited for many use cases, when you don't want to use external libraries for microbenchmarking or use of external profilers. Scala is very flexible language that allows us to create simple and very elegant solution for that kind of problems.
Of course my Stopwatch is very simple, and I can imagine many ways to improve it. In example it can be rebuilt to be thread safe, or internal map can be replaced by simple array to increase its performance. But I also think that this is a good point to start for most of the projects.

Next article: Metaprogramming in Scala with higher order functions
Previous article: Scala - Parentheses and Curly Brackets in Anonymous Functions

34 comments:

  1. Thank you for sharing such a nice post!

    Looking for Software Training in Bangalore , learn from Softgen Infotech Software Courses on online training and classroom training. Join today!

    ReplyDelete
  2. This comment has been removed by the author.

    ReplyDelete
  3. This comment has been removed by the author.

    ReplyDelete
  4. Good Post! it was so good to read and useful to improve my knowledge as an updated one, keep blogging. After seeing your article I want to say that also a well-written article with some very good information which is very useful for the readers....thanks for sharing it and do share more posts like this.
    AWS Training

    ReplyDelete
  5. Good Post! , it was so good to read and useful to improve my knowledge as an updated one, keep blogging. After seeing your article I want to say that also a well-written article with some very good information which is very useful for the AWS Cloud Practitioner Online Training

    ReplyDelete
  6. Therefore my own preferred teaching technique is to provide employees with a MP3 player with a structured learning programme pre-loaded on it. Salesforce training in Hyderabad

    ReplyDelete
  7. Wow! Such an amazing and helpful post this is. I really really love it. I hope that you continue to do your work like this in the future also.

    Online Training for Big Data
    Big Data Hadoop Online Training

    ReplyDelete
  8. It’s great to come across a blog every once in a while that isn’t the same out of date rehashed material. Fantastic read. Get for more information

    Java Training in Chennai

    Java Course in Chennai

    ReplyDelete
  9. Thanks for the post. It was very interesting and meaningful. I really appreciate it! Keep updating stuff like this.
    Best Training Institute for AWS in Pune

    ReplyDelete
  10. Data analyst generally works on creation of reports based on company’s data driven KPI’s(generally involves descriptive analytics), whereas Data scientists understand business and domain along with the technicalities to understand what will happen in future(more on descriptive + predictive analytics both)
    Etlhive is a data science institute in pune. actuelly we wanted to promote our website on your site will you please contact me discus further details

    website: - www.etlhive.com
    contact: - +91 8055020011

    ReplyDelete
  11. December brings with it a special mood permeating the air around us. Yes! The world gears up to celebrate Christmas and the New Year that .

    New Year Quotes for Husband

    ReplyDelete