Friday, July 20, 2012

Why I like Scala

In my past few blog posts (actually, almost all of my blog posts), I've been trying to present ways of implementing stuff built into Scala, using old-school Java syntax. I should probably explain some reasons why I think it makes more sense to just make the jump to Scala.

Type inference

Whenever you declare an object variable in Java, you tend to need to specify the type twice: once when declaring the compile-time type for the variable (the thing that goes to the left of the variable name) and once when calling the constructor (the thing that goes after new). Java 1.7 added some inference of generic type parameters (the "diamond" operator), so you can now write Map<String, Integer> myMap = new HashMap<>();, but that's still a fair bit of code noise compared to Scala's var myMap = new HashMap[String, Int](). You still need to specify types on method/constructor parameters, of course, and you can explicitly specify types for variables when you want (e.g. when you're making a mistake and the type inferencer is being too helpful).

Some people unfamiliar with Scala see the type inference and assume that Scala is a dynamically typed language, since the conciseness of variable declarations looks more like Ruby, Python, or JavaScript. Scala is statically typed, though. The compiler just looks at the type of the expression on the right side of the = and gives the variable on the left side that type (and it will generate a compiler error right there if you've explicitly specified a type on the left side that is not satisfied by the type of the expression on the right). In my opinion, having the short, unobtrusive code typically associated with dynamic languages, while keeping the compile-time type safety (and IDE autocompletion) of a static language, is awesome.
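Here is a small sketch of what the compiler infers. The object name and values are hypothetical, chosen only for illustration. Every type is fixed at compile time, even though none is written on the left side except where we ask for one.

```scala
import scala.collection.mutable

object TypeInferenceDemo {
  // Each type below is inferred from the right-hand side and checked at compile time.
  val greeting = "Hello"                        // String
  val nameLength = greeting.length              // Int
  var counts = mutable.HashMap[String, Int]()   // mutable.HashMap[String, Int]
  counts("hello") = nameLength                  // key/value types are already known

  // An explicit annotation is still allowed; the right-hand side must conform to it.
  val numbers: Seq[Int] = List(1, 2, 3)         // declared Seq[Int], built as a List[Int]

  // Uncommenting this line gives a compile-time type mismatch, not a runtime error:
  // val broken: String = nameLength
}
```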

Each line in this toy example that involves a variable declaration is shorter in Scala than in Java (even ignoring the semicolon inference). Plus, as described in the next section, to make the comparison fairer, the first and fourth lines in the Java example should have final prepended.

Immutability

While I've shown in a few posts that it's possible to implement immutable data structures in Java (and you can use the Functional Java library), it's not the same as having core library support. The Java Collections API has mutability built in from the ground up. Sure, you can use Collections.unmodifiable{Set,List,Map,Collection} to produce unmodifiable views of collections, but then you're left with runtime exceptions if you do try modifying their contents, and no way to create a new collection based on a previous one (besides an O(n) copy, versus the O(1) prepend/tail of an immutable list or the O(log n) insert/delete of an immutable trie map/set). In my humble opinion, to push the unmodifiability constraint to compile time, it would have been ideal to split the collections interfaces: add read-only interfaces (e.g. ReadableCollection, ReadableList, ReadableIterator, etc.), and make the existing interfaces extend them, adding the modification methods. That's not how they did it, though.

Furthermore, while in Java it's a good idea to minimize mutability (it's even Item 15 in Effective Java), I personally spend a decent amount of time and screen real estate writing final everywhere. (It's largely become habit for me, so I'm not wasting brain cycles when typing final, but it's confusing for colleagues who aren't used to seeing the final keyword.)

In Scala, immutability is the default for collections, and it takes the same number of characters to create an immutable variable as it does a mutable one: it's var for mutable and val for immutable. That's it.
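A short sketch of what this looks like in practice, using only the standard library (the object and value names are hypothetical). It also shows the O(1) prepend mentioned above: the new list reuses the old one as its tail instead of copying it.

```scala
object ImmutabilityDemo {
  // val: the reference can never be reassigned
  val xs = List(2, 3)
  // xs = List(4)          // would not compile: reassignment to val
  // xs.add(4)             // would not compile: immutable List has no mutator methods

  // Prepending is O(1) and shares structure: ys's tail is the very same object as xs
  val ys = 1 :: xs

  // "Updating" an immutable Map returns a new Map; the original is untouched
  val m1 = Map("a" -> 1)
  val m2 = m1.updated("b", 2)

  // var: same number of characters, but reassignment is allowed
  var counter = 0
  counter += 1
}
```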

Uniform access principle

Most Java (or C++) developers are taught early on to hide data in private fields, and provide public accessors and mutators. This has the advantage that you can someday add validation on the mutators, log use of the accessors, delegate to some proxy object, etc. Unfortunately, 90+% of the time, you're just providing get/set access to a field. Yes, for that 10%, the flexibility is crucial, but it leads to a ridiculous amount of code noise (made less painful for the initial developer by the fact that the "fix" has been to add "Generate getter/setter methods" functionality to IDEs -- unfortunately, that code noise is still present for anyone maintaining the codebase).

Languages like Ruby allow you to first write public fields, and later drop in custom accessors or mutators, without changing the calling code. The earliest language that I know of to provide this functionality was Delphi, with its notion of properties (which were later included in the initial version of C#, since Anders Hejlsberg designed both). In general, this ends up being a form of the uniform access principle.

Scala provides support for the uniform access principle with a simple, built-in compiler trick. Effectively, whenever you create a field that is visible outside the current object, the Scala compiler outputs bytecode for the field and accessor/mutator methods (excluding the mutator method if the field is a val).

Consider the following Scala code:
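This is a reconstruction, not the original listing. It is consistent with the javap output that follows: a public, mutable name field initialized to "Michael". That output places the class in a package named UAP. The package clause is left out here so the snippet stands alone.

```scala
// A plain public var: the compiler generates a private field plus
// name() and name_$eq(String) methods behind the scenes.
class UAPExample1 {
  var name = "Michael"
}
```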

Compiling that and running it through javap -c -private UAPExample1, we get the following output:

public class UAP.UAPExample1 extends java.lang.Object implements scala.ScalaObject{
private java.lang.String name;

public java.lang.String name();
  Code:
   0: aload_0
   1: getfield #11; //Field name:Ljava/lang/String;
   4: areturn

public void name_$eq(java.lang.String);
  Code:
   0: aload_0
   1: aload_1
   2: putfield #11; //Field name:Ljava/lang/String;
   5: return

public UAP.UAPExample1();
  Code:
   0: aload_0
   1: invokespecial #19; //Method java/lang/Object."<init>":()V
   4: aload_0
   5: ldc #21; //String Michael
   7: putfield #11; //Field name:Ljava/lang/String;
   10: return

}

Our public, mutable name field was pushed into a private field called name, and methods name() and name_$eq were created to provide an accessor and mutator. We can produce (more or less) the same bytecode, following the traditional Java bean pattern, as follows:
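A sketch of the bean-style version, assuming the hypothetical class name UAPExample2 (the _name field name comes from the text that follows). Note the use of private[this], so that the compiler does not generate an accessor pair for _name itself.

```scala
class UAPExample2 {
  // Object-private backing field: no compiler-generated accessor/mutator
  private[this] var _name = "Michael"

  // Accessor: compiles to a method name()
  def name: String = _name

  // Mutator: compiles to name_$eq(String), so callers can still write obj.name = "..."
  def name_=(newName: String): Unit = { _name = newName }
}
```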

The only difference from the previous javap output is that the private field is called _name instead of name. Of course, this code is also doing the exact same thing as UAPExample1. Let's consider a more interesting case, where the accessor/mutator are doing something more complicated, like providing access to a value stored on disk:
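A sketch of a disk-backed variant follows. The class name UAPExample3, the file name uap-name.txt in the current working directory, and falling back to "Michael" when the file doesn't exist are all assumptions made for illustration. Callers still see just name and name =. The imports are scoped to the methods that use them.

```scala
import java.nio.file.{Path, Paths}

class UAPExample3 {
  // Object-private: no accessor/mutator pair is generated for this field
  private[this] val path: Path = Paths.get("uap-name.txt")

  def name: String = {
    // Imports visible only inside this method
    import java.nio.file.Files
    import java.nio.charset.StandardCharsets.UTF_8
    if (Files.exists(path)) new String(Files.readAllBytes(path), UTF_8)
    else "Michael"
  }

  def name_=(newName: String): Unit = {
    import java.nio.file.Files
    import java.nio.charset.StandardCharsets.UTF_8
    // Overwrites (or creates) the file with the new value
    Files.write(path, newName.getBytes(UTF_8))
  }
}
```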

This example is functionally equivalent to the first two, but keeps the value on disk. The following code exercises all three. The output from the first two examples is always "Michael" followed by "Blah". The third example will output "Michael" followed by "Blah" on the first run, but will output "Blah" followed by "Blah" on subsequent runs in the same working directory (since the previously set "Blah" value is still accessible):
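A sketch of such a driver is shown below. So that the block compiles on its own, it includes condensed versions of the three classes described above: a public var, a bean-style accessor/mutator pair, and a disk-backed pair. The file name and the object name RunUAPExamples are assumptions. The calling code is identical for all three classes, which is the point of the UAP.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Paths}

class UAPExample1 { var name = "Michael" }

class UAPExample2 {
  private[this] var _name = "Michael"
  def name: String = _name
  def name_=(newName: String): Unit = { _name = newName }
}

class UAPExample3 {
  private[this] val path = Paths.get("uap-name.txt") // assumed file name
  def name: String =
    if (Files.exists(path)) new String(Files.readAllBytes(path), UTF_8) else "Michael"
  def name_=(newName: String): Unit = { Files.write(path, newName.getBytes(UTF_8)) }
}

object RunUAPExamples {
  // Read, assign "Blah", read again; returns (before, after) for each example.
  def exercise(): Seq[(String, String)] = {
    val e1 = new UAPExample1
    val before1 = e1.name
    e1.name = "Blah"

    val e2 = new UAPExample2
    val before2 = e2.name
    e2.name = "Blah"

    val e3 = new UAPExample3
    val before3 = e3.name
    e3.name = "Blah"

    Seq((before1, e1.name), (before2, e2.name), (before3, e3.name))
  }

  def main(args: Array[String]): Unit =
    exercise().foreach { case (before, after) => println(before); println(after) }
}
```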

Traits

Traits sit somewhere between Java's interfaces and abstract classes. In particular, traits can define methods, like abstract classes, but cannot take constructor parameters. As with interfaces, a class can extend multiple traits. Since you can define behaviour in traits, this gives you a form of mixin inheritance.

Since the JVM itself does not support multiple inheritance, this mixin behaviour is accomplished by linearizing the class hierarchy. That is, you can imagine that a linear class hierarchy has been created where each class has one parent (with one fewer trait mixed in). The implementation at the bytecode level doesn't work exactly that way, from what I can see in the output from javap: rather, a single class is created that delegates to static methods for the implemented trait methods (passing this to the static method).
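A minimal sketch of how linearization decides what super means, with hypothetical trait and class names. Each trait prepends its own tag and then calls super. The order in which the traits are mixed in determines which implementation runs first: the rightmost trait wins.

```scala
trait Tagged {
  def tags: List[String] = Nil
}

trait TagA extends Tagged {
  override def tags: List[String] = "A" :: super.tags
}

trait TagB extends Tagged {
  override def tags: List[String] = "B" :: super.tags
}

// Linearization: AB, TagB, TagA, Tagged -> TagB runs first, its super is TagA
class AB extends TagA with TagB

// Linearization: BA, TagA, TagB, Tagged -> TagA runs first, its super is TagB
class BA extends TagB with TagA
```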

Let's consider an example using traits. I'm going to create a base CSSStyle trait, which will have subtraits for various CSS attributes. Then, we'll have two classes -- one representing an HTML element with inline style, and the other representing a CSS style rule (as you would find in a stylesheet file or within a style element). We'll add CSS attributes to each of these classes by mixing them into new anonymous classes.
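The original listing isn't shown here. The following is a sketch that reproduces the output listed below. Only CSSStyle and RunCSSStyle come from the text. The names Bold, Italic, HelveticaFamily, HTMLElement and CSSStyleRule are hypothetical. The sketch also swaps in a ListMap (an insertion-ordered Map), which keeps the attributes in the order the traits are mixed in.

```scala
import scala.collection.immutable.ListMap

// Base trait: no attributes yet.
trait CSSStyle {
  def attributes: ListMap[String, String] = ListMap.empty
}

// Each subtrait adds one attribute on top of whatever precedes it in the linearization.
trait Bold extends CSSStyle {
  override def attributes: ListMap[String, String] =
    super.attributes.updated("font-weight", "bold")
}
trait Italic extends CSSStyle {
  override def attributes: ListMap[String, String] =
    super.attributes.updated("font-style", "italic")
}
trait HelveticaFamily extends CSSStyle {
  override def attributes: ListMap[String, String] =
    super.attributes.updated("font-family", "Helvetica, Arial, sans-serif")
}

// An HTML element carrying its style inline.
class HTMLElement(tag: String) extends CSSStyle {
  override def toString: String = {
    val style = attributes.map { case (name, value) => s"$name: $value" }.mkString("; ")
    s"""<$tag style="$style"></$tag>"""
  }
}

// A stylesheet rule such as ".important { ... }".
class CSSStyleRule(selector: String) extends CSSStyle {
  override def toString: String = {
    val body = attributes.map { case (name, value) => s"    $name: $value" }.mkString(";\n")
    s"$selector {\n$body\n}"
  }
}

object RunCSSStyle {
  def main(args: Array[String]): Unit = {
    // Attributes are mixed into new anonymous classes at construction time.
    val span = new HTMLElement("span") with Bold with Italic with HelveticaFamily
    val rule = new CSSStyleRule(".important") with Bold with Italic with HelveticaFamily
    println(span)
    println(rule)
  }
}
```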

The output from running RunCSSStyle.main is:

<span style="font-weight: bold; font-style: italic; font-family: Helvetica, Arial, sans-serif"></span>
.important {
    font-weight: bold;
    font-style: italic;
    font-family: Helvetica, Arial, sans-serif
}

More refined scoping

In my example on the uniform access principle, I made use of private[this], and also stuck some imports within a method declaration. Both of these are examples of Scala's tighter compile-time scoping.

Using private[this] specifies that the method or field can only be accessed by the current object -- other instances of the same class cannot access it. (If I had simply specified private, the compiler would have generated an accessor/mutator pair for use by other instances of the class, to be consistent with the UAP.)
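A small sketch of the difference, using a hypothetical Account class. Another instance may read a private field, but not a private[this] one.

```scala
class Account(initial: Int) {
  // Class-private: other Account instances may access it
  private var balance = initial
  // Object-private: only this instance may access it
  private[this] var auditCount = 0

  def hasMoreThan(other: Account): Boolean = {
    auditCount += 1
    // other.auditCount would not compile here: private[this] is per-object
    balance > other.balance // allowed: private is per-class
  }

  def audits: Int = auditCount
}
```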

Scoping imports within a method (or even a nested block) allows you to clearly specify where you're using something. Compare that to the typical Java source file, where the IDE collapses your mountain of imports so you're not overwhelmed.
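A minimal sketch of a method-scoped import (the object and method names are hypothetical). ListBuffer is in scope only inside the method that uses it.

```scala
object ScopedImports {
  def evens(upTo: Int): List[Int] = {
    // Visible only inside evens; the rest of the file never sees ListBuffer
    import scala.collection.mutable.ListBuffer
    val buf = ListBuffer.empty[Int]
    for (i <- 0 to upTo if i % 2 == 0) buf += i
    buf.toList
  }
  // Referring to ListBuffer unqualified out here would not compile.
}
```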

Similarly, there are times in Java where you want to invoke the same chunk of code multiple times within a method (but not in a loop), or you just want to break a method apart into logical chunks that solve independent problems, specific to the current method. So, you create a private method scoped to the class, even though the new method is only used within that one method. Scala, on the other hand, supports nested methods. Once it's compiled, the nested method is pushed up to the class scope (since the JVM doesn't support nested methods), but your code is organized in a cleaner way, and the compiler will prevent you from using the method outside of its scope.
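A sketch of a nested helper that is called twice within one method (the names are hypothetical). capitalize exists only inside formatName, so no other method in the class can call it.

```scala
object NestedMethods {
  def formatName(first: String, last: String): String = {
    // Nested method: visible only inside formatName
    def capitalize(s: String): String =
      if (s.isEmpty) s else s.substring(0, 1).toUpperCase + s.substring(1).toLowerCase

    capitalize(last) + ", " + capitalize(first)
  }
}
```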

Incidentally, Scala also supports import aliasing, for when you're going to be using conflicting imports within the same scope. I believe this was mostly done to avoid developer anger when trying to bridge between (e.g.) a java.util.List and a scala.collection.immutable.List. Having to fully qualify one or the other would bulk up your code. Instead, you can write:

import java.util.{List => JavaList}

Then, every time you say JavaList, the compiler knows that you mean java.util.List. Lacking this in Java bugs me regularly, since my job involves working with Lucene documents (org.apache.lucene.document.Document) and occasionally with XML DOM documents (org.w3c.dom.Document). On those rare occasions where I would like to use both in one file, only one can have the name Document; the other must be fully qualified. It would be so much nicer if I could import them as LuceneDocument and DOMDocument, regardless of what the third parties decided to call them.
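The same trick works for any pair of clashing names. The sketch below uses java.util.Date and java.sql.Date as stand-ins for the two Document classes, since both ship with the JDK. It also aliases java.util.List next to Scala's own List. The object and method names are hypothetical.

```scala
import java.util.{ArrayList, Date => UtilDate, List => JavaList}
import java.sql.{Date => SqlDate}

object ImportAliases {
  // Both Date classes used side by side, without fully qualifying either
  def toSqlDate(d: UtilDate): SqlDate = new SqlDate(d.getTime)

  // Scala's List in, java.util.List out
  def toJavaList(xs: List[String]): JavaList[String] = {
    val result = new ArrayList[String]()
    xs.foreach(x => result.add(x))
    result
  }
}
```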

Off-hand, I'm not sure why locally-scoped imports and import aliases haven't made it into Java yet. Looking at the output of javap, it looks like the bytecode doesn't actually contain imports. Instead, the bytecode references fully-qualified classes (that is, the imports are resolved statically). As a result, adding support for locally-scoped imports and import aliases would not break bytecode compatibility, and would give Java developers another (albeit minor) reason to like the language.

In my opinion, Scala's scoping rules are the logical next step after the scoping transition from C to C++. C (before C99) required that you declare local variables at the beginning of a block, which in practice usually meant the top of the function, so every for (i = 0; i < ...; i++) within the function used the same i. C++ allows you to declare variables in the scope in which they are used, while ultimately compiling to pretty much the same machine code.

Functional Programming

Obviously, one of the main benefits of Scala is functional programming. I made use of the map method on the Map type in the trait example to transform the key/value entries into a collection of strings, without needing to write a for loop. Note that I mapped a static method directly (technically, methods on a Scala object aren't really static -- they're methods on a singleton), without needing to create an anonymous Function instance, as I've done in my previous posts in Java. (Behind the scenes, the Scala compiler did create an anonymous Function1[Tuple2[String,String], String] for me, but I don't have to care about that.)
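A small sketch of that idea, with a hypothetical AttributeFormat object standing in for the helper from the trait example. A method on a singleton object is mapped over a Map's entries. In the first case the method is converted into an explicit Function1[Tuple2[String, String], String] value. In the second it is passed straight to map on a List, and the compiler does the conversion.

```scala
object AttributeFormat {
  // A method on a singleton object: the closest Scala analogue of a Java static method
  def format(entry: (String, String)): String = entry._1 + ": " + entry._2
}

object FunctionValues {
  val attrs = Map("font-weight" -> "bold", "font-style" -> "italic")

  // With an explicit function type, the method is converted into a Function1 value
  val formatFn: Function1[Tuple2[String, String], String] = AttributeFormat.format

  // Mapping over the Map's (key, value) entries yields a collection of strings
  val formatted: Iterable[String] = attrs.map(formatFn)

  // Passing the method directly where a function is expected also works
  val formattedList: List[String] = attrs.toList.map(AttributeFormat.format)
}
```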

I won't bother writing a bunch of examples of functional programming in Scala, since they would largely come down to showing how the stuff I implemented (verbosely) in Java in my previous posts already exists in Scala, and has much nicer syntax. For example, I believe all of the immutable data structures I've implemented in Java exist in some form in Scala (and have a much better, more unified API). Assuming I stick with writing Scala in future posts, there will be more functional programming examples.

Conclusions

These are a few of the features I like in Scala that I miss in my day-to-day work programming in Java. I have barely scratched the surface of what's available in Scala (having ignored pattern matching, case classes, the REPL, implicits, parallel collections, and more). I chose these features to begin with, since they're not that foreign to Java developers. Indeed, just about everything here could be done in Java, but with a lot more code.

Learning Scala has reinforced my love of the JVM as a platform and my disdain for Java as a language (and the Java standard library -- I have the utmost respect for Joshua Bloch as the author of Effective Java and Java Puzzlers, but the Java collections API regularly angers me with things I believe they could have done better while retaining backwards compatibility). That said, I am also hoping that the .Net port of Scala matures, as a lot of code tends to target the compiler and the standard library, and should be portable across underlying runtimes.

If you're interested in learning more about Scala, I suggest reading Scala for Java Refugees (which is probably a little outdated now -- two major versions of Scala have been released since then) or the free chapters of Scala for the Impatient. There is a "Java to Scala" page on the scala-lang website with more resources for moving to Scala.

In September, there will be a free course given by Martin Odersky (the creator of Scala, and the guy who introduced generics to Java). I'm signed up and would love more classmates.

In terms of books, my favourite is Programming in Scala (I bought it, downloaded the .mobi version, and converted it to ePub using Calibre). A less-good, but still enjoyable book is Dean Wampler's Programming Scala (also a little outdated, but still relevant). As a followup book (once you're pretty comfortable with Scala), I recommend Josh Suereth's Scala in Depth. The preprint PDF I read had a lot of grammar/spelling/typographical mistakes, but the content was solid and understandable (and hopefully the mistakes will get fixed in a subsequent printing).

Finally, if you're a little warped in the head (like me) and you develop a taste for the hardcore functional programming features that Scala enables (and have been implemented in the Scalaz library), I suggest you learn some Haskell (via Real World Haskell or Learn You a Haskell for Great Good -- both great books, which I encourage you to buy, but which you can also read online). While Scala is an object-oriented/functional hybrid language (by design), where you can always fall back to mutable data structures and side-effects, Haskell feels (to me at least, with my predominantly imperative, object-oriented background) more like playing the piano with one hand tied behind your back. That said, I feel that programming in Haskell makes me a better Scala programmer (much as switching between C++ and Java tends to make me better at both than if I stuck with one or the other).