Software Craftsman's Blog by Marcin Pieciukiewicz: July 2013

Tuesday, July 30, 2013

Faking System Clock with metaprogramming pattern in Scala

Tomasz Nurkiewicz has written an article Fake system clock pattern in Scala with implicit parameters on his blog. The problem he addresses is that sometimes you require to test your business code against a given date, ie. your system should behave differently on the weekends.

Obviously this will be impossible (or very hard) to handle if a programmers would directly use new Date(), System.curentTimeMills() or some similar construct provided by JVM. So Tomasz proposed an application wide Clock trait that is kind of proxy to Joda Time:

import org.joda.time.{DateTime, Instant}
 
trait Clock {
  def now(): Instant
  def dateNow(): DateTime
}

With the default implementation:

import org.joda.time.{Instant, DateTime}
 
object SystemClock extends Clock {
  def now() = Instant.now()
  def dateNow() = DateTime.now()
}

I like this approach very much, because it provides a clean way to grab the current date, and it decouples us from specific Time library (Joda Time in this case). It is quite obvious how programmer should grab the current system time:

val currentTime = SystemClock.now

Now let's back to initial problem, how to pass a specific time if we want to test our code? I propose to use the method I've described in Metaprogramming in Scala with higher order functions article.

First Step - Create companion object for clock
Let's create a companion object Clock that will provide, to a programmer, a kind of proxy to default clock implementation, so the programmer won't have to use SystemClock directly:

object Clock {
  private val clockInstance: Clock = SystemClock
  def now() = clockInstance.now()
  def dateNow() = clockInstance.dateNow()
}

And now the usage of our Clock has been simplified to this:

val currentTime = Clock.now

Second step - Allow to change clock instance
To solve our problem of testing against specific time we would like to allow programmer to change the clock instance that is used in Clock companion object:

object Clock {
  private var clockInstance: Clock = SystemClock
  def now() = clockInstance.now()
  def dateNow() = clockInstance.dateNow()
 
  def changeClock(newClock:Clock) {
    clockInstance = newClock
  }
}

Now the programmer can switch the instance of Clock in the following way:

//alternative Clock implementation
class FakeClock(fixed: DateTime) extends Clock {
  def now() = fixed.toInstant
  def dateNow() = fixed
}

//change of clock that is used
Clock.changeClock(new FakeClock(new DateTime(2013, 7, 15, 0, 0, DateTimeZone.UTC)))
//code to be tested
...
val currentTime = Clock.now
...
//revert the clock change
Clock.changeClock(SystemClock)

As you can see we allowed a programmer to force the Clock object to return specific date instead of current system date. To use it in tests a programmer only have to call Clock.changeClock function before tested code, and a tested code will "see" that current time is different.

Final step - change Clock instance by declaration
In the final step we'll change our Clock.changeClock function to higher order function so it's usage will be much nicer and it will allow to change clock instance temporarily only.

object Clock {
  private var clockInstance: Clock = SystemClock
  def now() = clockInstance.now()
  def dateNow() = clockInstance.dateNow()
 
  def changeClock(newClock:Clock)(codeBlock: => Unit) {
  val oldClock = Clock.clockInstance
    clockInstance = newClock
    codeBlock
    clockInstance = oldClock
  }
}

So now to change a clock instance a programmer could write something like this:

import Clock.changeClock

// changing clock instance only for the  given block of code
changeClock(new FakeClock(new DateTime(2012, 2, 2, 0, 0, DateTimeZone.UTC))) {
  ...
  val currentTime = Clock.now
  ...
}

Usage example
This is a little more complicated usage example, but it's a nice way to demonstrate how easy the change of clock is now:

object ClockTest {

  import Clock.changeClock

  def testedMethod() {
    println(Clock.dateNow())
  }

  def main(args: Array[String]) {
    testedMethod()

    changeClock(new FakeClock(new DateTime(2013, 7, 15, 0, 0, DateTimeZone.UTC))) {

      testedMethod()

        changeClock(new FakeClock(new DateTime(2012, 2, 2, 0, 0, DateTimeZone.UTC))) {
          testedMethod()
        }

      testedMethod()
    }

    testedMethod()
  }

}

And the output of this simple program would be this (let's assume that now is 30th of July 2013):

2013-07-30T19:02:38.501+02:00
2013-07-15T00:00:00.000Z
2012-02-02T00:00:00.000Z
2013-07-15T00:00:00.000Z
2013-07-30T19:02:38.594+02:00

As you can see this worked very well even for nested blocks of changeClock.

The complete implementation is available on gist.github.com.

Previous article: Metaprogramming in Scala with higher order functions

Sunday, July 21, 2013

Metaprogramming in Scala with higher order functions

Scala is very flexible programming language. In this article I would like to propose some usage pattern for higher order functions. This pattern could help us to create a convenient way to handle multiple situations such as:
- monitoring code performance
- exception logging
- executing commands in separate Thread
- transaction handling
- try with resource
- environment changing

Below I'm giving an usage examples and sample implementations for every of this cases.

Higher order function as meta-data
Basically higher order function is a function that takes another function as a parameter. In the following example calculate is a higher order function that takes function sum as parameter :

scala> def sum(a:Int, b:Int) = a + b
scala> def calculate(a:Int, b:Int, operation: (Int, Int) => Int) = operation(a, b)
scala> calculate(5, 3, sum)
res1: Int = 8

We will leverage the fact that we can pass a function that does not take any parameters, and does return nothing. That means that we'll simply pass a code block as a parameter to our function. This way we can do this:

//implementation
def doItTwice(codeBlock: => Unit) {
  codeBlock
  codeBlock
}
//usage with full syntax
doItTwice({
  println("Hello!")
})

and to do this even nicer we can skip parentheses in doItTwice call, so it will look this way:

//proper usage with abbreviated syntax
doItTwice {
  println("Hello!")
}

the result is simple to predict:

Hello!
Hello!

Doesn't it feel like we have extended language we are using? Now it's almost like adding simple metadata to our application.
This feature can be leveraged to improve our programs, but before that we have to use another Scala feature called currying. It allows us to define multiple parameters lists for a function. Thanks to that, we will be able to pass another parameters to our higher function. Will take a closer look at this by introducing new loop implementation.

Simple loop
Let's assume we want to define a simple way to loop our program for given number of times. And we want to do this as simple as possible. In example we would like to use it in this way:

//sample usage
repeat(5) {
  println("Cool loop")
}

Now we should be able to do this easily. This is the solution:

//implementation
def repeat(times: Int)(codeBlock: => Unit) {
  var i = 0
  while (i < times) {
    codeBlock
    i += 1
  }
}

Thanks to the currying we could have passed times parameter and still be able to use abbreviated syntax when passing a code block. This is very simple usage of this pattern, but it uses every feature of Scala we need. Let's go to more serious problem.

Execution time logging
Next case. Let's assume that we have some complex block of code, and we have to monitor it's execution time. So the nice way to do this would be like that:

//sample usage
logExecutionTime("My complex operations") {
  complexOperation1
  complexOperation2
  complexOperation3
}

with the result looking like this:

Execution time of "My complex operations" was 125 mills

To achieve this we can simply define logExecutionTime function this way:

//implementation
def logExecutionTime(name: String)(codeBlock: => Unit) {
  val start = System.currentTimeMillis()
  codeBlock
  val end = System.currentTimeMillis()
  println("Execution time of \""+ name +"\" was "+ (end - start)+" mills")
}

Btw. I have proposed a little more complex solution to measure time execution and described it in Neat and simple way to measure code performance in Scala article.

Exception logging
Now let's think about exceptions. There are many cases when you need just output the information about exception that occurred, but your application or maybe container of the application will handle that exception somewhere else. In the following example logExceptions will catch exception thrown from inner code block, write it out to a console, and than rethrow it, so the program can handle it:

//sample usage
logExceptions {
  operationThatMightCauseException
}

And this is simple implementation of it.

//sample usage
def logExceptions(codeBlock: => Unit) {
  try {
    codeBlock
  } catch {
    case e:Exception => {
      e.printStackTrace()
      throw e
    }
  }
}

I hope that now you've got feeling that this approach can be treated as metaprogramming, in this case logException can be treated as an annotation to a given code block. And, in my humble opinion, this is very nice. So let's go to further possible usages of this pattern.

Execute commands in separate Thread
So maybe we would like to execute some commands separately to our main program thread, ie. to send a message to external server. It have to be executed in separate thread, so it wouldn't block our application because of network delays. So that might be the way we would like to program it:

//possible usage
fireAndForget {
  sendMessageViaHttp
}

In the simplest solution we'll just create a new thread and run a given block of code in that thread:

//implementation
def fireAndForget(codeBlock: => Unit) {
  new Thread(new Runnable {
    def run() {
      codeBlock
    }
  }).start()
}

Transaction handling
Now let's think about a common pattern in enterprise application - a transaction handling in database communication. In simplest situation you will to start a transaction, then execute your database commands and in the end you will commit the transaction or rollback it if something went wrong. We would like to help a programmer to do this multiple of times, so it would be nice if he could simply declare a transaction around the database commands. In the example I've assumed that our database commection is represented by simple trait with three methods to control a transaction:

trait Connection {
  def beginTransaction()
  def commit()
  def rollback()
}

And now the programmer should be able to program like that:

//possible usage
transaction(postgresConnection) {
  insertRecords
  updateOtherRecords
}

The transaction is responsible for creating a transaction, executing commands and commiting or rollbacking transaction (on the exception occurrence). So it's implementation could look like that:

//implementation
def doInTransaction(connection: Connection)(codeBlock: => Unit) {
  connection.beginTransaction()
  try {
    codeBlock
    connection.commit()
  } catch {
    case e:Exception => {
      connection.rollback()
      throw e
    }
  }
}

Try with resource or simple resource handling, ie. file reading
Java 7 introduced try with resources statement that facilitated usage of external resources that have to be handled with care. In example they have to be properly closed to be reused. Before that resources handling was error prone because programmers often forgot to close the resource. So let's look how we could simplify it in Scala with the example of file reading. One of the easiest way to read file could be like that:

//possible usage
readFile("file.txt") {reader =>
  println("First line " + reader.readLine())
  println("Second line " + reader.readLine())
  println("Third line " + reader.readLine())
}

And the fileRead implementation should properly open a file, and at the end close it properly. That could be implemented in this way:

//implementation
def readFile(filename:String)(codeBlock: BufferedReader => Unit) {
  val reader = new BufferedReader(new FileReader((filename)))
  try {
    codeBlock(reader)
  } finally {
     if(reader != null) {
       reader.close()
     }
  }
}

Environment change
For the last example let's assume that we have a parts of application that depend on some specific environment configuration. What if we have to change it temporarily to do something specific, for example for test purposes.
So let's assume that our environment is accessible for us by the singleton object that holds String to String map:

object Environment {
  var environment = Map[String, String]()
}

And for our example let's assume that our environment contains an entry that holds ip address for some name server: "nameServer" -> "192.168.10.210". If we would like to alternate it temporarily to redirect request to localhost we would like to do something like that:

//possible usage
println(Environment.environment("nameServer"))

changeEnvironment("nameServer" -> "127.0.0.1") {
  println(Environment.environment("nameServer"))
}

println(Environment.environment("nameServer"))

and that should give us that output:

192.168.10.210
127.0.0.1
192.168.10.210

To achive this, one possible implementation of changeEnvironment could look like this:

//implementation
  def changeEnvironment(change: (String, String))(codeBlock: => Unit) {
    val oldEnvironment = Environment.environment
    Environment.environment = Environment.environment + change
    codeBlock
    Environment.environment = oldEnvironment
  }

The last example was inspired by Tomasz Nurkiewicz in his blog article Fake system clock pattern in Scala with implicit parameters. He proposed a solution for switching a Clock object, that is used within application, for the purpose of test that might require testing against given date. In the comment to that article I've proposed a different solution, than given by Tomasz, based on environment changing described above.

Conclusion
I hope I've given you an idea that Scala can be used for metaprogramming with very simple constructs. When we'll use it properly, it can easily separate a business logic of an application from technical details of the implementation and this will greatly improve the way the source code can be read and managed.

Previous article: Neat and simple way to measure code performance in Scala

Wednesday, July 17, 2013

Neat and simple way to measure code performance in Scala

When we write a source code we often have to be aware of the performace of our solution. The easiest way to achieve this is by measuring duration of the execution by taking the system time before and after our block code, ie.

Where is the problem?

val startTime = System.nanoTime
timeConsumingOperations()
val endTime = System.nanoTime

println("Operations took " + (endTime - startTime) + " nano seconds")

This is most basic approach that can be used in the simplest cases. But what if we want to measure times of multiple different operations? This simple solution would become a little complex and harder to maintain, ie.

val startTime = System.nanoTime
operation1()
val middleTime1 = System.nanoTime
operation2()
val middleTime2 = System.nanoTime
operation3()
val endTime = System.nanoTime

println("Operation1 took " + (middleTime1 - startTime) + " ns")
println("Operation2 took " + (middleTime2 - middleTime1) + " ns")
println("Operation3 took " + (endTime - middleTime2) + " ns")

That code is becoming hard to read, maintain and, in my opinion the worst, it almost hides our business code.
Now let's compilcate it even more by assuming that we need to run those operations multiple times (ie. in a for loop) and we need to measure their entire time of execution. We could try to do something like that:

var operation1Duration = 0L
var operation2Duration = 0L
var operation3Duration = 0L

for(i <- 1 to 10) {
  val startTime = System.nanoTime
  operation1()
  val middleTime1 = System.nanoTime
  operation2()
  val middleTime2 = System.nanoTime
  operation3()
  val endTime = System.nanoTime

  operation1Duration += middleTime1 - startTime
  operation2Duration += middleTime2 - middleTime1
  operation3Duration += endTime - middleTime2
}
println("Operation1 took " + operation1Duration + " ns")
println("Operation2 took " + operation2Duration + " ns")
println("Operation3 took " + operation3Duration + " ns")

And that is an ugly piece of code.

Solution Proposal

I propose a simple solution to that problem, a simple stopwatch. With this stopwatch we can refactor our third case into this:

clearStopwatchResults()

for(i <- 1 to 10) {
  stopwatch("First operation") {
    operation1()
  }
  stopwatch("Second operation") {
    operation2()
  }
  stopwatch("Third operation") {
    operation3()
  }
}

println(stopwatchResults)

And the result of this measurement would be something like this:

Third operation -> 48 ms 355 us 124 ns 
First operation -> 136 ms 219 us 210 ns 
Second operation -> 644 ms 69 us 657 ns

Basic idea for that approach is to create a Stopwatch that can measure execution time of a block of code and store it under given identifier (ie. "First operation"). It will accumulate a time of execution for given identifier, and in the end it will write out the measurement results in a readable format.

Solution Implementation

So here is the implementation of my Stopwatch:

import collection.mutable
import scala.StringBuilder

class Stopwatch {

  private val measurements = mutable.HashMap[Any, Long]()

  def clearResults() {
    measurements.clear()
  }

  def apply(identifier:Any)(codeBlock: => Unit) {
    val start = System.nanoTime
    codeBlock
    val end = System.nanoTime
    val oldValue = measurements.getOrElse(identifier, 0L)
    measurements += identifier -> (oldValue + end - start)    
  }

  def results():String = {
    ResultsFormatter.format(measurements)
  }
}

And there is also a companion object to provide default Stopwatch and convenient methods to use it.

object Stopwatch {
  private val defaultStopwatch = new Stopwatch

  def stopwatch(identifier:Any)(codeBlock: => Unit) {
    defaultStopwatch(identifier)(codeBlock)
  }

  def stopwatchResults() = defaultStopwatch.results()

  def clearStopwatchResults() {
    defaultStopwatch.clearResults()
  }
}

To complete the solution we also need a class to format our measurements results into well formatted String:

object ResultsFormatter {
  def format(measurements: mutable.HashMap[Any, Long]):String = {
    val sb = new StringBuilder
    measurements.foreach(appendResult(sb))
    sb.toString()
  }

  private def appendResult(sb: StringBuilder)(result: (Any, Long)) {
    val identifier = result._1
    val value = result._2
    sb.append(identifier).append(" -> ")

    appendIfNotZero(sb, extractSeconds(value), "s")
    appendIfNotZero(sb, extractMills(value), "ms")
    appendIfNotZero(sb, extractMicro(value), "us")
    appendIfNotZero(sb, extractNano(value), "ns")

    sb.append('\n')
  }

  private def extractNano(value: Long): Long = value % 1000
  private def extractMicro(value: Long) = (value / 1000) % 1000
  private def extractMills(value: Long) = (value / 1000000) % 1000
  private def extractSeconds(value: Long) = value / 1000000000

  private def appendIfNotZero(sb:StringBuilder, value:Long, suffix:String) {
    if (value > 0) {
      sb.append(value).append(" ").append(suffix).append(" ")
    }
  }
}

I have to mention that this is not thread safe implementation and one instance of Stopwatch should be use within single thread.

Implementation description

Stopwatch class
The main point of the implementation is a Map called measurements. It is a mutable HashMap, for higher performance and it is used to map identifier into execution duration.

The Stopwatch class provides three public methods:
- apply(identifier:Any)(codeBlock: => Unit) - This method takes an identifier and a code block that will be measured. It simply calls the codeBlock, calculates how long it took to execute it and in the end it accumulates the result in the measurements map.
I think the method declaration requires a little more explanation. First it uses a feature called currying. For our purpose we can assume that it allows us to define function parameters in multiple parentheses pairs (of course currying is much more and it can be used for other purposes, but this is topic for other aricle) instead of one. Also codeBlock: => Unit part defines a function that takes no parameters, or just a code block surrounded by curly brackets { ... }.
Thanks to that we could call apply method in this way (full syntax):

apply("some computation")({complexComputation()})

or we can omit latter parentheses and write simply:

apply("some computation"){complexComputation()}

We wouldn't be able to do this, in such a nice way, if we would use only one parameters block, ie. apply(identifier:Any, codeBlock: => Unit)
- clearResults() - This method simply clears the measurements map to allow new measurements to be performed.
- result() - Returns a well formatted String representation of measurements result.

As you can see, the core imlpementation of this solution is very clear and simple. Also this is very easy to add new functionalities to this class.

Stopwatch companion object
This companion object is used to increase the ease of using Stopwatch class. It provides default instance of Stopwatch and provides a convenient methods to use it. To use those methods we have to import this object into current namespace by:

import Stopwatch._

This import statement will include all methods from Stopwatch object into current namespace, so we can call it directly, without passing Stopwatch name.

ResultsFormatter class
This class is simpy used to create a well formatted String from Map containing measurement results.

Conclusion

I think that this approach to code performance measurement is well suited for many use cases, when you don't want to use external libraries for microbenchmarking or use of external profilers. Scala is very flexible language that allows us to create simple and very elegant solution for that kind of problems.
Of course my Stopwatch is very simple, and I can imagine many ways to improve it. In example it can be rebuilt to be thread safe, or internal map can be replaced by simple array to increase its performance. But I also think that this is a good point to start for most of the projects.

Next article: Metaprogramming in Scala with higher order functions
Previous article: Scala - Parentheses and Curly Brackets in Anonymous Functions

Monday, July 15, 2013

Scala - Parentheses and Curly Brackets in Anonymous Functions

Scala gives a lot of flexibility in defining anonymous functions, and that might lead to a confussion. Jacek Laskowski asked an interesting question on stackoverflow.com regarding of calling a map method on collection. To summarize it (given a list lst):

Why lst map {c:Char => 1} and lst map ((c:Char) => 1) work fine, but lst map (c:Char => 1) gives us compilation error:

error: identifier expected but integer literal found.
lst map (c:Char => 1)

To answer this question we should look into Scala Language Specification, into part 6.23 Anonymous Functions.There is a description how anonymous function can be defined:

Expr ::= (Bindings | [‘implicit’] id | ‘_’) ‘=>’ Expr
ResultExpr ::= (Bindings | ([‘implicit’] id | ‘_’) ‘:’ CompoundType) ‘=>’ Block
Bindings ::= ‘(’ Binding {‘,’ Binding} ‘)’
Binding ::= (id | ‘_’) [‘:’ Type]

As you can see Bindings requires to be placed inside two surrounding parentheses. So if anonymous function defines the type of the parameter, as in our example, it has to be defined in this way:

(c:Char) => 1

And the call to map should look like that:

lst map((c:Char) => 1)

Also in the same part of reference documentation we can find:

If an anonymous function (x: T) => e with a single typed parameter appears as the result expression of a block, it can be abbreviated to x: T => e.

So if anonymous function is defied as last expression in a code block, and has exacly one parameter, you can use abbreviated syntax, without parenthesis around c:Char, to define anonymous function. So, in our example, we can write c:Char => 1, but only when we place it inside a code block {c:Char => 1}. And we can call map function this way:

lst map({c:Char => 1})

or with abbreviated syntax without parenthesis:

lst map {c:Char => 1}

And that explains main question why lst map {c:Char => 1} is legal, and lst map (c:Char => 1) is not.

This is summarized in changelog at the end of the documentation (Changes in Version 2.1.7 (19-Jul-2006)):

Closure Syntax

The syntax of closures has been slightly restricted (§6.23). The form

x: T => E

is valid only when enclosed in braces, i.e. { x: T => E }. The following is illegal, because it might be read as the value x typed with the type T => E:

val f = x: T => E

Legal alternatives are:

val f = { x: T => E }
val f = (x: T) => E

Another way to specify anonymous functions:

If we look closer at specification we can see that it allows us to use another way to define anonymous function:

Expr ::= (Bindings | [‘implicit’] id | ‘_’) ‘=>’ Expr

We can see that we can define your anonymous function without parameter binding in those two ways (if we don't need to specify argument type):

lst map (c => 1)
// or
lst map (_ => 1)

I hope that this article clarified how we can declare anonymous functions and it shouldn't cause a confusion any more.

Next article: Neat and simple way to measure code performance in Scala
Previous article: Named and Default parameters inconsistency in Scala

Monday, July 8, 2013

Named and Default parameters inconsistency in Scala

While I was reading Scala in Depth I've run into an interesting problem related to named method parameters and class inheritance. What will happen, when in child class we'll override method and we will change parameters names in this method? How will it change the way that this method should be called? And what about default parameters, will those behave simillary to named parameters? It turns out that they will not behave the same. For this reason inheritance of default parameters may result in difficult to detect errors in your application behavior.

I will start by summarizing ways that Scala gives us to pass parameters to method calls. How we can use named and default parameters which were introduced in Scala 2.8.

Positional parameters (likewise in Java).
Usually parameters are passed to function by passing multiple parameters in the order they have been declared in a function declaration. ie.:

scala> def process(x1:Int, x2:Int) = x1 * 2 + x2

scala> process(2, 5)
res0: Int = 9

Named parameters.
If we need to change the order of parameters in function call, or we want to improve readability of function call, or we want to pass only some of the parameters (if function allows that), we can use named parameters. ie.:

scala> def process(x1:Int, x2:Int) = x1 * 2 + x2

// passing parameters without change in ordering
scala> process(x1 = 2, x2 = 5) 
res1: Int = 9 

// passing parameters with change in ordering
scala> process(x2 = 5, x1 = 2) 
res2: Int = 9

It is also possible to call a function by combining positional parameters with named parameters. But in this case it is required that named parameters (if we change ordering of passed parameters) should be passed after positional parameters. ie.:

// some parameters with name
scala> process(2, x2 = 5) 
res3: Int = 9 

// first parameter named, BUT ordering was not changed
scala> process(x1 = 2, 5) 
res4: Int = 9 

// first parameter named, AND ordering was changed
scala> process(x2 = 5, 2) 
<console>:9: error: positional after named argument.
              process(x2 = 5, 2)
                              ^

Default parameters.
If we want allow a user of our function, not to pass all of the parameters, we can define, in a function declaration, a value that will be assigned to a parameter, if that parameter won't be passed in a function call. ie.:

scala> def sum(x1:Int, x2:Int, x3:Int = 0) = x1 + x2 + x3 
 
scala> sum(1, 2, 3)
res5: Int = 6 
 
scala> sum(1, 2)
res6: Int = 3

It is worth to notice, that default parameters don't have to be declared at the end of the parameters list. In that case the only way to call a function without passing theirs value is to use named parameters:

scala> def sum(x1:Int, x2:Int = 0, x3:Int) = x1 + x2 + x3 
 
// only to parameters have been passed
scala> sum(1, 3) 
<console>:9: error: not enough arguments for method sum: (x1: Int, x2: Int, x3: Int)Int.
Unspecified value parameter x3.
              sum(1, 3) 
 
// defined parameters were passed by name
scala> sum(x1 = 1, x3 = 3) 
res7: Int = 4

Named parameters and class inheritance.
Let's check how named parameters behave in the case of class inheritance and method overwriting. Let us look onto those example classes:

class Calculator {
  def process(x1:Int, x2:Int) = x1 * 2 + x2
}

class BrokenCalculator extends Calculator {
  override def process(a:Int, b:Int) = super.process(a, b) + 1
}

It's important to notice that parameter names of method process has been channged from x1 and x2 to a and b.
In the next step let's create instances of those classes, but in the case of BrokenCalculator instances we will assign them to references of different types:

val calculator = new Calculator
val brokenCalculator = new BrokenCalculator
val brokenCalculatorAsCalculator:Calculator = new BrokenCalculator

Does assigning of BrokenCalculator to reference of type Calculator have an influance on method behavior? It turns out that it depends on the way how the parameters are passed:

scala> calculator.process(2, 5)
res8: Int = 9
scala> brokenCalculator.process(2, 5)
res9: Int = 10
scala> brokenCalculatorAsCalculator.process(2, 5)
res10: Int = 10

In the case when we passed parameters as positional parameters all of the calls depends on the class of the instatiated object, therefore brokenCalculator and brokenCalculatorAsCalculator have behaved exacly the same.
What if we want to use named parameters in method call?

scala> calculator.process(x2 = 5, x1 = 2)
res11: Int = 9 
 
// parameter passed with the names from Calculator class
scala> brokenCalculator.sum(x2 = 5, x1 = 2)
<console>:11: error: value sum is not a member of BrokenCalculator
              brokenCalculator.sum(x2 = 5, x1 = 2)
                                       ^ 
 
scala> brokenCalculator.process(b = 5, a = 2)
res12: Int = 10 
 
// parameter passed with the names from BrokenCalculator class
scala> brokenCalculatorAsCalculator.sum(b = 5, a = 2)
<console>:11: error: value sum is not a member of Calculator
              brokenCalculatorAsCalculator.sum(b = 5, a = 2)
                                           ^ 
 
scala> brokenCalculatorAsCalculator.process(x2 = 5, x1 = 2)
res13: Int = 10

In this case, as you can see, it is important what reference type we are using. In our example objects calculator and brokenCalculatorAsCalculator required parameter names x1 and x2. When we've tried to pass the names a and b we've got compilation error. Analogous in the case of brokenCalculator object we had to use names a and b, and when we've tried to use x1 and x2 we've also got compilation error.

I think that for proper use of named parameters we have to assume that method's contract (the way how method can be called) is determined by the type of reference we use, and this allows compiler to verify the correctness of a method call.

Default parameters and class inheritance.
Let's now change our classes, by defining a default parameter values. In the case of BrokenCalculator we'll change default value of x2 parameter:

class Calculator {
  def process(x1:Int, x2:Int = 0) = x1 * 2 + x2
}

class BrokenCalculator extends Calculator {
  override def process(x1:Int, x2:Int = 1) = super.process(x1, x2)
}

Now let's check how method behavior depends on object type and reference type that it is assigned to:

scala> calculator.process(2)
res14: Int = 4 // default value 0
scala> brokenCalculator.process(2)
res15: Int = 5 // default value 1
scala> brokenCalculatorAsCalculator.process(2)
res16: Int = 5 // default value 1

Those results are quite surprising (!), it seems that if a child class changes the default value of a parameter than default values of parameters are indendent of the reference type. During the call of brokenCalculatorAsCalculator parameter x2 had been assigned default value of 1, although the contract from Calculator class defined a default value of 0. I think that this might cause problems, especially because IDE (in my case IntelliJ IDEA) shows to a developer that default value of x2 in the call of brokenCalculatorAsCalculator is 0!

One more case left. What if BrokenCalculator class won't declare any default value for x2 parameter?

class BrokenCalculator extends Calculator {
  override def process(x1:Int, x2:Int) = super.process(x1, x2)
}

Can we now call the process method and pass it only 1 parameter?

scala> calculator.process(2)
res17: Int = 4
scala> brokenCalculator.process(2)
res18: Int = 4
scala> brokenCalculatorAsCalculator.process(2)
res19: Int = 4

Yes we can! In this case BrokenCalculator has inherited default value of x2 parameter, so if child class doesn't define default parameter value, this value is inherited from parent class.

By this example you can see huge inconsistency in Scala design. What if some programmer will use the Calculator reference? By looking at method defined in Calculator class he cannot know the default parameter value! He has to look into concrete implementation of object he will have. But this is not always possible.

So you have to remember that child class always inherits default values of parameters and it can always change them. And when you call the method you need to know the default values defined in concrete class, not only in reference type you are using.

Next article: Scala - Parentheses and Curly Brackets in Anonymous Functions
Previous article: List sorting in Scala

Monday, July 1, 2013

List sorting in Scala

While I was watching Jacek Laskowski (polish blog) working on taskassin (also in polish) and Scala programming I decided to summarize different approches to List sorting in Scala.
In Jacek's problem we have to sort a list of tasks that are defined this way: (I simplified the class for the purpose of this article):

case class Task(label: String, due: DateTime)

and we have defined list of objects:

val tasks = List(Task("t1", DateTime.now),
                 Task("t2", DateTime.now + 2.hours), 
                 Task("t3", DateTime.now - 2.hours))

Clarification: DateTime and .hours are from nscala-time library that encapsulates JodaTime so it can be easily used in Scala.

On a first glance we can see, that List class have three sorting methods:
.sorted - to sort objects with defined Ordered trait
.sortBy - to sort objects by selected field
.sortWith - to sort objects by given ordering function
Which one we will choose should depend on the business context of the application. Let's look at some situations, in which we might need to sort tasks list, that might occur:

Case 1. If sorting of tasks will always be done based on the value of due field, we can use object oriented programming and extend Task class with Ordered[Task] trait:

case class Task(label: String, due: DateTime) extends Ordered[Task] {
  def compare(other: Task): Int = this.due.compareTo(other.due)
}

That way, the sorting of tasks list could be done this way:

val sortedTasks = tasks.sorted

It's hard to imagine something easier from the programmer's, that uses the Task class, point of view.

Case 2.1. The different approach have to be made if we'll assume that tasks can be sorted in multiple ways, ie. by due time, or by a label. In this case we can pass a parameter to sorted method, that is an instantiation of Ordering[Task] with definition of the sorting order:

 val sortedTasks = tasks.sorted(new Ordering[Task] {
      def compare(a: Task, b: Task): Int = a.due.compareTo(b.due)
    })

also, you can use an easier way to create Ordering object:

val sortedTasks = tasks.sorted(Ordering.fromLessThan[Task](_.due < _.due))

Case 2.2. Foregoing approach might seem not very elegant, but you can use it to create convenient way to sort tasks. To do so we can define ways of sorting in a companion object using Ordering[Task]:

case class Task(label: String, due: DateTime)

object Task {
  //full syntax of defining anonymous function
  val orderingByDue = Ordering.fromLessThan((a:Task, b:Task) => a.due < b.due)
  //short syntax of defining anonymous function
  val orderingByLabel = Ordering.fromLessThan[Task](_.label < _.label) 
}

In this case you can sort tasks this way:

val tasksSortedByDue = tasks.sorted(Task.orderingByDue)
val tasksSortedByLabel = tasks.sorted(Task.orderingByLabel)

Case 3. If, for some reason, we don't want to force programmer to use our way of sortnig, the sortBy method might be used. It takes, as a parameter, a field of the object that will be used to determine object sorting (by passing a getter function for that field):

val sortedTasks = tasks.sortBy((t:Task) => t.due)

Or you can use shorter and more readable syntax:

val sortedTasks = tasks.sortBy(_.due)

Additionally you can define more than one field that will be used for sorting (ie. sorting persons first by family name and, if those are the same, by the first name). You do that by passing a tuple of fields you are interested in:

val sortedTasks = tasks.sortBy((x) => (x.label, x.due))

In this case tasks will be sorted by name and, if some of them have the same name, by the due time.

Case 4.1. If solution from Case 3. is not enought for you (ie. sorting is more complicated), you can use sortBy method, that requires a function that compares two objects:

val sortedTasks = tasks.sortWith((a:Task, b:Task) => a.due < b.due)
val sortedTasks = tasks.sortWith(_.due < _.due)

Case 4.2. It is important to notice, that in this case (alike in case 2.2.) it is possible to define those functions as an elements of a companion object:

object Task {
  val compareByDue = (a:Task, b:Task) => a.due < b.due
}

And it can be used this way:

val sortedTasks = tasks.sortWith(Task.compareByDue)

Sorting algorithms. While if was looking closely to the sorting in Scala, I've started to wonder about used sorting algoriths.

Sorting of sequences (including List) by usage of sorted, sortBy or sortWith methods is made by the sorting algorithm provided by JDK, in particular by java.util.Arrays.sort(...) function. Scala rewrites the sequence to Array and it is passed to the sort function, than it is sorted in place. Than based on the sorted array a new sequence is created (based on a given sequence type, ie. List).

Sorting method provided by JDK, used to sort array of objects, uses a modified mergesort algorithm, that guarantees sorting stability, and complexity O(n) = n log(n). Additionally, if input array is partially sorted, a performance of that algorithm is much greater.

It is important to notice that java.util.Arrays.sort(...) function, in case of primitives' array, uses an optimized quicksort algorithm (as opposed to objects' array, because in case of primitives, it doesn't have to be stable), but to use it in Scala you have to call it manually.

Summary. My personal favor approaches to list sorting is presented in Case 1. and Case 2.2. bacause these allow us to define a business context, so the programmer won't have to discover it over and over (of course that will have more sense in more compilcated cases, than presented in taskassin). Additionally these approaches allow us to define properties of Task class, no it's behavior.
I know that solutions presented by me don't exhaust the topic of sorting in Scala, but I think that those are the most usefull ones.

Next article: Named and Default parameters inconsistency in Scala