Earthly Powers

Friday, June 15, 2012

Modular services with OpenJDK Jigsaw

Services are a simple but effective way to decouple interface and implementation.

Services in the classpath universe

The class java.util.ServiceLoader was introduced in Java SE 6 and formalized a pattern that many developers were already implementing prior to SE 6 (especially for JSR implementations).

ServiceLoader provides a simple way to bind a Java interface, a service interface, to an instance of a Java class, a service provider class, that implements the interface. Such classes are declared in files located in the META-INF/services directory, where the file name is the fully qualified name of the service interface. A META-INF/services file contains one or more lines, each of which declares the fully qualified class name of a service provider class implementing the service interface.

Using ServiceLoader one can lazily iterate over all service instances. All META-INF/services files visible to the class loader that is used to locate them (see ClassLoader.getResources) are parsed, the service provider classes declared in those files are loaded, and then those classes are instantiated.
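The mechanics described above can be sketched in a few lines of Java. This is only an illustration: the Greeter interface, the PoliteGreeter provider and the temporary directory standing in for a jar are all hypothetical names invented for the sketch, and the provider-configuration file is written at run time purely so the example is self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

class ClasspathServiceSketch {

    /** Hypothetical service interface. */
    public interface Greeter {
        String greet(String name);
    }

    /** Hypothetical service provider class: public, with a public no-arg constructor. */
    public static class PoliteGreeter implements Greeter {
        public PoliteGreeter() {}
        @Override public String greet(String name) { return "Good day, " + name; }
    }

    /** Writes a META-INF/services file into a temp directory and iterates over the providers it declares. */
    static List<String> greetAll(String name) {
        try {
            // The temp directory plays the role of a jar on the classpath (not cleaned up here).
            Path root = Files.createTempDirectory("services-sketch");
            Path services = Files.createDirectories(root.resolve("META-INF").resolve("services"));
            // File name: name of the service interface. Contents: one provider class name per line.
            Files.write(services.resolve(Greeter.class.getName()),
                    PoliteGreeter.class.getName().getBytes(StandardCharsets.UTF_8));
            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] {root.toUri().toURL()},
                    ClasspathServiceSketch.class.getClassLoader())) {
                List<String> greetings = new ArrayList<>();
                // Providers are located, loaded and instantiated lazily during iteration.
                for (Greeter greeter : ServiceLoader.load(Greeter.class, loader)) {
                    greetings.add(greeter.greet(name));
                }
                return greetings;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling ClasspathServiceSketch.greetAll("Duke") should yield a single greeting from the one declared provider. In a real project the META-INF/services file would of course be packaged in the jar rather than generated.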

It's a neat little class that helps decouple interface and implementation.

That is not to say there are no problems with it.

The Java compiler knows nothing about META-INF/services files, so, unless an IDE groks those files and helps the developer, runtime errors can sneak in. For example, the service interface or a service provider class name may not exist (perhaps there was a spelling mistake, or a class was renamed or moved to another package), or a service provider class may not implement the service interface.
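To make those failure modes concrete, here is a minimal sketch of the kind of check a test or build step could perform on a declared provider name. The class and method names are hypothetical, and JDK classes stand in for a real service interface and its providers.

```java
class ProviderCheckSketch {

    /**
     * Checks one entry of a META-INF/services file against its service interface.
     * Returns "ok", "not found" or "not a subtype".
     */
    static String check(Class<?> service, String providerName) {
        Class<?> provider;
        try {
            // Load without initializing; only the type relationship matters here.
            provider = Class.forName(providerName, false,
                    ProviderCheckSketch.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            return "not found";      // misspelled, renamed or moved class
        }
        if (!service.isAssignableFrom(provider)) {
            return "not a subtype";  // class exists but does not implement the service
        }
        return "ok";
    }
}
```

For instance, checking "java.util.ArrayList" against java.util.List passes, whereas "java.lang.String" is reported as not being a subtype and a misspelled name as not found.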

A service provider class may have dependencies on third party libraries. The developer has to ensure the classpath is set up correctly, otherwise be prepared for errors such as NoClassDefFoundError.

Bundling multiple jar files into one uber jar may result in missing service provider classes if two or more jars contain the same META-INF/services files.

For the latter two issues Maven is your friend. It can ensure the classpath is set up correctly, and the Maven Shade plugin is smart enough to combine META-INF/services files.
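Conceptually, combining same-named META-INF/services files amounts to taking the union of their provider lines. The following is a hedged sketch of that idea only; it is not a description of how any particular build plugin does it, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ServiceFileMergeSketch {

    /**
     * Merges the lines of several same-named provider-configuration files,
     * dropping comments (anything after '#'), blank lines and duplicates.
     */
    static List<String> merge(List<List<String>> files) {
        Set<String> providers = new LinkedHashSet<>(); // keeps first-seen order
        for (List<String> lines : files) {
            for (String line : lines) {
                int hash = line.indexOf('#');
                String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
                if (!name.isEmpty()) {
                    providers.add(name);
                }
            }
        }
        return new ArrayList<>(providers);
    }
}
```

If one jar simply overwrote the other's file, one of the providers would silently disappear, which is exactly the uber jar problem mentioned above.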

Services in the modular universe

The Java multiverse is expanding with OpenJDK Jigsaw to embrace the modular universe.

Developers can choose to stay in the classpath universe or take a quantum leap into the modular universe, where the classpath no longer exists, and modularity becomes a first class citizen of the Java language, compilation and runtime.

It is important to stress that the JDK has always maintained backwards compatibility between releases, nothing is taken away from the developer, thus the classpath universe will not collapse into a singularity, no impending entropy death is to be anticipated.

In the modular universe less can be more, since the JDK itself is being modularized using the module system. Smaller JDK installations are possible (no need for CORBA?, fine with me!).

Services too become a first class citizen of the Java language, compilation and runtime. No more META-INF/services files!

Recently I have been delving into the design and implementation of services in Jigsaw. While there is an implementation in place, it is recognized as a temporary solution to get something mostly working with the JDK usage of ServiceLoader.

The Jigsaw team have come up with an alternative, cleaner approach that works well in the modular universe. For more information see this email and corresponding presentation. However, in this blog I don't want to get into the details of that email and instead want to describe the basics of how to use services in the modular universe (the presentation is helpful in that respect and I will be reusing terms defined in that presentation).

I have pushed a simple example to GitHub. If you have a recent build of Jigsaw available then it is possible to compile and execute this example. (See here for a recent developer preview or if you are on the Mac you can also select more recent builds created and uploaded by Henri Gomez.)

This example consists of four modules all within the same src directory:

A service interface module, mstringer, exporting a service interface for transforming strings;
Two service provider modules, mhasher and mrotter, that declare service provider classes that produce an MD5 checksum and a ROT13 transformation of a string respectively; and
A service consumer module, mapp, that creates service instances (using ServiceLoader) and transforms strings.

This example can be loaded in NetBeans. Expect to see some warnings and red squiggly lines since NetBeans currently understands nothing about Java modularity! However, the project is still editable and the targets in the ant build8.xml file can be executed to build and run the project. Each module corresponds to a separate source package folder:

The above image presents a good example of modular source layout. Each module, in a directory name corresponding to the module name, contains a Java source file module-info.java, in the default package location, that is the module declaration, then there are Java source files in packages. Think of module source layout as a level of indirection of the source layout in the classpath universe. The javac compiler has been modified to recognize the modular source layout and is capable of compiling multiple modules under one source directory.

The module declaration for the service interface module mstringer is:

module mstringer@1.0 {
    exports stringer;
}

This module declaration declares that all publicly accessible Java classes in the stringer package of module mstringer are visible and accessible to any code in a dependent module. There is only one Java class which is the service interface:

package stringer;

public interface StringTransformer {
    String description();
    String transform(String s);
}

Note that there is nothing specific to this module that says it has anything to do with services. Any non-final Java class can become a service interface.

The module declaration for the service provider module mrotter is:

module mrotter@1.0 {
    requires mstringer;
    provides service stringer.StringTransformer
        with rotter.RotterStringTransformer;
}

This module declaration declares a dependency on the mstringer module since its service provider class will implement the service interface, stringer.StringTransformer.

Now we get into some service specifics. This module provides a service provider class rotter.RotterStringTransformer that implements the service interface stringer.StringTransformer. In effect the "provides service <S> with <I>" clause is a very simple binding language: bind this interface to that implementation.

Note that the package rotter is never exported. This means that code in any other module cannot see and access the class RotterStringTransformer. In the modular universe service provider classes can remain private to the service provider module.

Also note that, although not present in this example, the module could require other modules that are needed for the implementation of the service interface.

The module declaration for the service provider module mhasher is very similar.
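For a feel of what the two service provider classes might contain, here is a minimal sketch of a ROT13 and an MD5 transformer. It is not the code from the example project: everything is folded into one hypothetical ProviderSketch class with a local copy of the service interface so that it compiles on its own, and the UTF-8 encoding and hexadecimal output format are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ProviderSketch {

    /** Local copy of the service interface, mirroring stringer.StringTransformer. */
    interface StringTransformer {
        String description();
        String transform(String s);
    }

    /** What a provider class in mrotter might look like. */
    static final class Rot13 implements StringTransformer {
        @Override public String description() { return "ROT13"; }
        @Override public String transform(String s) {
            StringBuilder out = new StringBuilder(s.length());
            for (char c : s.toCharArray()) {
                if (c >= 'a' && c <= 'z') {
                    out.append((char) ('a' + (c - 'a' + 13) % 26));
                } else if (c >= 'A' && c <= 'Z') {
                    out.append((char) ('A' + (c - 'A' + 13) % 26));
                } else {
                    out.append(c); // non-letters pass through unchanged
                }
            }
            return out.toString();
        }
    }

    /** What a provider class in mhasher might look like. */
    static final class Md5 implements StringTransformer {
        @Override public String description() { return "MD5"; }
        @Override public String transform(String s) {
            try {
                byte[] digest = MessageDigest.getInstance("MD5")
                        .digest(s.getBytes(StandardCharsets.UTF_8));
                StringBuilder hex = new StringBuilder();
                for (byte b : digest) {
                    hex.append(String.format("%02x", b)); // two hex digits per byte
                }
                return hex.toString();
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException("MD5 not available", e);
            }
        }
    }
}
```

Neither class needs to be exported or otherwise known to the consumer; only the service interface is shared.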

Finally the module declaration for the service consumer module mapp is:

module mapp@1.0 {
    requires mstringer;
    requires service stringer.StringTransformer;
    class app.Main;
}

This module declaration declares a dependency on the mstringer module since it will use service instances that implement stringer.StringTransformer.

The module declares it requires the service interface with "requires service <S>". This informs the module system to include any service provider modules that provide for the service interface, and most importantly any dependencies of those modules, in the dependency resolution and linking phases, which occur both at compile time and when modules are installed into a library.

The module also declares an entry point, a class with a static main(String[] args) method that is invoked when the module is executed.

Notice that service consumer module mapp knows nothing about the service provider modules mrotter and mhasher. They are decoupled. New services can be installed and old ones removed without necessarily affecting mapp.

See the execution code here. The class ServiceLoader is being used even though there are no longer any META-INF/services files. The approach that is being proposed in the email linked to previously ensures that the use of ServiceLoader in the modular universe is much easier to grok (hint: it does not matter what class loader is used as long as it is a module class loader).
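The consuming side might look roughly like the following. This is a sketch under assumptions rather than the code from the example project: ConsumerSketch, transformAll, run and the upperCase stand-in provider are hypothetical names, and the interface is copied locally so the sketch compiles outside any module set-up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

class ConsumerSketch {

    /** Local copy of the service interface, mirroring stringer.StringTransformer. */
    public interface StringTransformer {
        String description();
        String transform(String s);
    }

    /** Applies every available transformer; knows nothing about concrete providers. */
    static List<String> transformAll(Iterable<StringTransformer> services, String input) {
        List<String> results = new ArrayList<>();
        for (StringTransformer t : services) {
            results.add(t.description() + ": " + t.transform(input));
        }
        return results;
    }

    /** What an entry point might do: let ServiceLoader supply the service instances. */
    static List<String> run(String input) {
        return transformAll(ServiceLoader.load(StringTransformer.class), input);
    }

    /** Hypothetical stand-in provider, only so the sketch can be exercised directly. */
    static StringTransformer upperCase() {
        return new StringTransformer() {
            @Override public String description() { return "upper"; }
            @Override public String transform(String s) { return s.toUpperCase(); }
        };
    }
}
```

Run on a plain classpath with no providers declared, run(...) simply returns an empty list; which providers turn up depends entirely on what is installed, not on anything in the consumer.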

Since javac indirectly knows about services (by way of the module system) errors can be produced at compile time if, for example, a service provider class does not implement the corresponding service interface.

Hopefully this blog entry and example will help developers get started playing around with modular services and Jigsaw. I have deliberately avoided going into the details of compilation, installation and execution. Take a closer look at the ant build script to understand how the Java command line tools are being used.

Dependency injection in the modular universe

The modular service clauses and the use of ServiceLoader are really a very simple form of binding and explicit dependency injection.

What if a module could declare richer bindings to be consumed by dependent modules? Could Guice-like modules and bindings be supported such that the module system could create appropriate injectors for consuming modules? If a module defines an entry point, perhaps that could be instantiated by the dependency injection system, thereby allowing for annotation-based injection out-of-the-box?

Lots of questions!

Wednesday, April 11, 2012

Building Jigsaw on Mac OS X natively

No sooner had I got Jigsaw builds working on the Mac, using VirtualBox, than Michael McMahon updated the code base to build natively, and then before you can blink Henri Gomez pushed out some Jigsaw DMGs to install. Great! Now it is even easier to play with Jigsaw on the Mac.

However, if like me you want to hack and build the source, here are some details.

The build dependencies are:

OS X Lion 10.7.3
Xcode 4.2.3
OpenJDK 7u4 (or using Henri's distribution)

That's simple, right? Well, I went through a couple of iterations to get this to work.

If there is a previous version of Xcode installed I recommend removing it:

sudo /Developer/Library/uninstall-devtools --mode=all

The latest version of Xcode is a little saner and installs in one location. Then, switch Xcode to the latest installation:

sudo xcode-select -switch /Applications/Xcode.app/Contents/Developer

Some of the JDK build scripts will use "xcode-select -print-path" to determine the path to the command line tools.

The installation of Xcode does not install the command line tools, such as gcc, g++ and make. Such tools can be installed by running Xcode, selecting the Preferences dialog, selecting the Downloads tab, and clicking to install the Command Line Tools.

Finally, if OpenJDK 7u4 was installed before Xcode 4.2.3 was installed, it is necessary to reinstall the former (as stated here).

If you have the Oracle OpenJDK 7 distribution installed then set the environment variable ALT_BOOTDIR to:

export ALT_BOOTDIR=/Library/Java/JavaVirtualMachines/1.7.0.jdk/Contents/Home

Building natively took about 22 minutes, approximately one third faster than on Ubuntu within VirtualBox. The gain was mostly for building hotspot (implying VirtualBox is not so efficient at managing multiple CPUs).

While on the subject of building there is work going on to improve the build system, plus the build structure will at some point be modularized as part of the JDK modularization effort, thus making building separate parts easier. All in all this means building is gonna get easier and faster.

Thursday, April 5, 2012

Building Jigsaw on Mac OS X using VirtualBox

Jigsaw is the OpenJDK project for modularity in the JDK.

Currently the easiest way I have found to get started with Jigsaw on a Mac is to check out and build the source on a virtual machine running Linux.

I boot-strapped from reading Julien's very useful blog entry on building Jigsaw. My experience was a little smoother.

I am using Mac OS X 10.7.3 with VirtualBox 4.1.10.

Create a virtual machine

First, the virtual machine needs to be configured and installed with an operating system.

Create a new virtual machine with at least 1GB of memory and 16GB of disk space (space gets tight if the default 8GB is selected).

Download the Ubuntu 11.10 32 bit ISO and hook up the ISO to the CD drive of the virtual machine (see the Storage section of the settings dialog).

Start up the virtual machine, install Ubuntu, and update the OS to the latest packages.

The above steps should take about 40 to 50 minutes to complete, given a reasonable network connection.

Prepare the virtual machine

Next, the virtual machine needs to be prepared to check out the Jigsaw source and build it.

Install OpenJDK 7 and its build dependencies:

sudo apt-get build-dep openjdk-7

sudo apt-get install openjdk-7-jdk

Jigsaw will be built using OpenJDK 7, commonly referenced in this context as the bootstrap JDK.

Install mercurial:

sudo apt-get install mercurial

The Jigsaw repository (like that for other OpenJDK projects) uses the Mercurial distributed version control system.

Check out and build

Now the source can be checked out and built. Check out the source forest:

hg clone http://hg.openjdk.java.net/jigsaw/jigsaw

and then execute:

cd jigsaw

bash get_sources.sh

to get all source trees in the forest, i.e. this is a multi-project repository (I don't yet know if the forest extension can be utilized).

Set the following environment variables:

export LANG=C

export ALT_BOOTDIR=/usr/lib/jvm/java-7-openjdk-i386

export ALLOW_DOWNLOADS=true

The ALT_BOOTDIR variable declares the location of the bootstrap JDK that will be used to build Jigsaw. The ALLOW_DOWNLOADS variable ensures that source not within the repository, namely that for JAXP and JAX-WS, will be downloaded.

Then make Jigsaw:

make sanity

make all

On my machine it took about 50 minutes to build. The installation of Jigsaw is located in the directory build/linux-i586. Execution can be verified:

./build/linux-i586/bin/java -version

and the output should be something like:

openjdk version "1.8.0-internal"

OpenJDK Runtime Environment (build 1.8.0-internal-sandoz_2012_04_04_13_06-b00)

OpenJDK Client VM (build 23.0-b11, mixed mode)

I can get this down to about 30 minutes by reconfiguring the virtual machine to have 4 CPUs (the same as the host machine) and setting the following environment variables:

export HOTSPOT_BUILD_JOBS=4

export NO_DOCS=true

The next blog entry will explain how to compile, install and execute modules.

Wednesday, April 4, 2012

Java Boomerang

Friday 30th of March was my last day at CloudBees.

Monday 2nd of April was my first day back at Oracle. I have joined the Java Platform team (more on that later in another blog entry).

CloudBees is a great company full of great people and is on track to be a success disrupting the middleware market with the RUN@cloud, DEV@cloud, the integrated set of services, and Jenkins Enterprise. It's fantastic to see Jenkins and the community go from strength to strength. I wish the team the very best of luck. CloudBees is two years old today, Happy Birthday! Keep up the internal meme generation service!

However, personally, the time is not right for me to work at home in a startup.

So, it is with mixed feelings that I say I am sad about leaving CloudBees but also very excited about joining Oracle and the Java Platform team.

Wednesday, November 2, 2011

Lazy Mandelbrots in Clojure

When learning a new language it is often beneficial to select a small fun task that one has previously applied to other languages.

Rendering the Mandelbrot set is one such small task that I find to be fun; I am a sucker for results that are visually appealing! Such rendering, while simple to implement, is computationally expensive. Rendering the set as fast as possible can be a useful exercise to understand the performance capabilities of the language.

My experience so far with Clojure has been a good one. The runtime has a bijouxesque quality to it, the Clojure 1.3 runtime jar is about 3.4 MB. I thought the syntax would be an issue but one quickly sees through the brackets and braces when using a good editor. The learning curve for the Clojure syntax is not much different to that of the JSON syntax.

All code for rendering the Mandelbrot set can be found here on github.

First, a function needs to be defined that determines if a point, c say, in the complex plane is a member of the Mandelbrot set. Basically a point c is a member of the set if iterations of z_{n+1} = z_n² + c remain bounded, i.e. no matter how large n gets z_n never goes beyond a certain value. In addition, if c is not a member then a value can be assigned that reflects the rate at which the sequence of z tends towards infinity; this value is commonly used to assign colours to points outside of the set.

Using Clojure 1.3 the following function is the fastest I could implement:

If the function returns zero then c is a member of the set, otherwise a value is returned that represents how fast the sequence tends towards infinity.
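As an illustration in Java rather than Clojure, the escape-time test described above might be sketched as follows. The details are assumptions: the escape radius of 2 (so a squared magnitude above 4), the starting value z_0 = 0, and the choice to return the iteration count at which the point escapes as the "how fast" value.

```java
class EscapeTimeSketch {

    /**
     * Iterates z_{n+1} = z_n^2 + c from z_0 = 0, with real and imaginary parts
     * kept in primitive doubles.
     * Returns 0 if |z| stays within 2 for 'limit' iterations (treated as a member),
     * otherwise the iteration at which |z| first exceeds 2.
     */
    static int escape(double cRe, double cIm, int limit) {
        double zRe = 0.0;
        double zIm = 0.0;
        for (int n = 1; n <= limit; n++) {
            double nextRe = zRe * zRe - zIm * zIm + cRe; // Re(z^2 + c)
            double nextIm = 2.0 * zRe * zIm + cIm;       // Im(z^2 + c)
            zRe = nextRe;
            zIm = nextIm;
            if (zRe * zRe + zIm * zIm > 4.0) {
                return n; // escaped: a smaller n means faster divergence
            }
        }
        return 0; // still bounded after 'limit' iterations
    }
}
```

With this sketch c = 0 and c = -1 never escape, while c = 1 escapes on the third iteration (z goes 1, 2, 5).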

Notice the primitive type hints (see here for more details). This is not ideal; I would prefer to utilize complex number types from Clojure contrib rather than having to split out the real and imaginary parts, for example:

which is similar to the implementation here. I especially like the use of the iterate function to create the lazy sequence of z values. However, this approach proved to be slow. I don't recall the exact numbers but there was about a 15x difference between using the more appealing, but slower, function on Clojure 1.2 and using the less appealing, but faster, function on Clojure 1.3. Note that Clojure 1.3 does have some significant performance improvements for arithmetic operations; however, using the more appealing function on Clojure 1.3 still proved to be too slow.

Regardless of the two approaches, I like Clojure's approach to tail recursion using loop and recur.

Next, let's define a lazy sequence of values of the mandelbrot function that range over an area of the complex plane:

Since this is a lazy sequence no calculations will occur until values are extracted from that sequence. This function applies a linear transformation from co-ordinates in some space, say that for a graphical image, to that in the complex space.

The sequence returned will be of the form ([x y limit] [x y limit] ...).
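A rough Java analogue of such a lazy sequence is a Stream, which likewise computes nothing until values are pulled from it. The sketch below is an assumption-laden illustration, not a translation of the Clojure function: the parameter names, the row-major ordering and the inlined escape function are all mine.

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

class LazyPointsSketch {

    /** Escape-time test: 0 if bounded for 'limit' iterations, else the escaping iteration. */
    static int escape(double cRe, double cIm, int limit) {
        double zRe = 0.0;
        double zIm = 0.0;
        for (int n = 1; n <= limit; n++) {
            double nextRe = zRe * zRe - zIm * zIm + cRe;
            double nextIm = 2.0 * zRe * zIm + cIm;
            zRe = nextRe;
            zIm = nextIm;
            if (zRe * zRe + zIm * zIm > 4.0) {
                return n;
            }
        }
        return 0;
    }

    /**
     * Lazily yields {x, y, value} for every point of a w-by-h grid, mapping grid
     * co-ordinates linearly onto the region [loRe, hiRe) x [loIm, hiIm).
     */
    static Stream<int[]> points(int w, int h,
                                double loRe, double loIm,
                                double hiRe, double hiIm, int limit) {
        double dRe = (hiRe - loRe) / w; // step per grid column
        double dIm = (hiIm - loIm) / h; // step per grid row
        return IntStream.range(0, w * h).mapToObj(i -> {
            int x = i % w;
            int y = i / w;
            return new int[] {x, y, escape(loRe + x * dRe, loIm + y * dIm, limit)};
        });
    }
}
```

Building the stream is cheap; the escape calculations only happen when a terminal operation such as findFirst or forEach consumes elements.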

Then, let's define a function that returns a lazy sequence of sequences given a bounding area in complex space, a and b, and the number of values, size, that should be calculated in either the real or imaginary axis:

The bound represented by a and b is normalized into a lower and upper bound. Given the size and the lower and upper bounds, a distance d is calculated, along with the width, w, and height, h (the number of values to be calculated on the real and imaginary axes respectively; the total number of values is thus the product of the width and height). If w=size then h<=size, otherwise if h=size then w<=size.
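One way to arrive at such a w and h, sketched in Java under my own assumptions (the longer side of the region gets size values, d is the resulting step, and the shorter side is rounded to the nearest whole number of steps):

```java
class GridSizeSketch {

    /**
     * Normalizes corners a and b into lower and upper bounds, then returns {w, h}
     * such that the longer side of the region spans 'size' values.
     */
    static int[] dimensions(int size, double aRe, double aIm, double bRe, double bIm) {
        double loRe = Math.min(aRe, bRe);
        double hiRe = Math.max(aRe, bRe);
        double loIm = Math.min(aIm, bIm);
        double hiIm = Math.max(aIm, bIm);
        // Distance between neighbouring values, chosen from the longer side.
        double d = Math.max(hiRe - loRe, hiIm - loIm) / size;
        int w = (int) Math.round((hiRe - loRe) / d);
        int h = (int) Math.round((hiIm - loIm) / d);
        return new int[] {w, h};
    }
}
```

For the region from [-2 -1] to [0.5 1] and a size of 40 this gives d = 0.0625, w = 40 and h = 32, which is at least consistent with the 32 rows of the text rendering shown further on.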

The sequence returned will be of the form:

[w h
([x y w h ([x y limit] [x y limit] ...)]
[x y w h ([x y limit] [x y limit] ...)] ...)]

Where n defines the number of sub-sequences. In this particular case the function is splitting up the bounded area into n regions each represented by a lazy sequence, and for each sub-sequence the sub-bounds are declared. The reasons for this will become apparent later on. Again note that when this function is called no calculations will be performed until values are extracted from the sequences.

As a slight aside notice the ranges function:

I started writing this function then realized partition did exactly what I wanted, which is, given a right-open interval of [0, l), return the sequence of non-intersecting right-open sub-intervals of size s, optionally ending in a right-open interval whose size is less than s and whose upper endpoint is l, for example:

mandel.mandelbrot=> (ranges 10 3)
((0 3) (3 6) (6 9) (9 10))
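For comparison, the same splitting written out by hand in Java (a sketch with names of my own choosing, not a translation of the Clojure function):

```java
import java.util.ArrayList;
import java.util.List;

class RangesSketch {

    /**
     * Splits the right-open interval [0, l) into consecutive right-open
     * sub-intervals {lo, hi} of size s; the last one may be shorter.
     */
    static List<int[]> ranges(int l, int s) {
        List<int[]> out = new ArrayList<>();
        for (int lo = 0; lo < l; lo += s) {
            out.add(new int[] {lo, Math.min(lo + s, l)}); // clamp the final interval to l
        }
        return out;
    }
}
```

Calling ranges(10, 3) produces the same four intervals as the REPL session above.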

The lesson being, "there is probably a function for that".

Given the mandelbrot-seqs function we can start rendering Mandelbrots. Here is a function to render as text:

Which can produce output like the following:

mandel.mandelbrot=> (text-mandel 40 [-2 -1] [0.5 1] 10)
(888887777666666666666655554431 5555666
888877766666666666665555554421 24555566
8888776666666666666555555443 03445556
8887766666666666665555544331 2444556
8887766666666666655555433321 1234455
8877666666666665555544 0 1 23205
88766666666666555544430 4
88766666666665544444320 3
8866666666655444444330 03
876666666553233433321 02
866666555443 12112211
866655554432 00
8655555444320
855555444321 1
8555543321 3
8443233220 13
8 23
8443233220 13
8555543321 3
855555444321 1
8655555444320
866655554432 00
866666555443 12112211
876666666553233433321 02
8866666666655444444330 03
88766666666665544444320 3
88766666666666555544430 4
8877666666666665555544 0 1 23205
8887766666666666655555433321 1234455
8887766666666666665555544331 2444556
8888776666666666666555555443 03445556
888877766666666666665555554421 24555566)
nil

In this case we don't need to split into multiple sub-sequences and we are just interested in the width, w, or number of characters on the line so the sequence of strings can be partitioned by that value.

Alternatively we can render as an image, which gets more interesting. An image will contain many more pixels than characters in textual output, so will be more expensive to create. The rendering algorithm is easier to parallelize. This is the reason why multiple sub-sequences are supported.

If I have a machine with 4 CPUs I can map the problem to four lazy sequences, let each core process a sequence to produce an image, and reduce those images to one larger image.

Here is the code to produce a BufferedImage:

For the case where the problem is split into n parts the pmap function is used to map the function image-mandel-doseq in parallel to the lazy sequences. This produces a sequence of images which are then reduced into one large image by a further sequential map operation.
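The same map-in-parallel, combine-sequentially shape can be sketched in Java with a parallel stream standing in for pmap. This is only an illustration under assumptions: plain int arrays of escape values stand in for the images, the region is fixed to the one used in the examples ([-2 -1] to [0.5 1]), and all names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelBandsSketch {

    /** Escape-time test: 0 if bounded for 'limit' iterations, else the escaping iteration. */
    static int escape(double cRe, double cIm, int limit) {
        double zRe = 0.0;
        double zIm = 0.0;
        for (int n = 1; n <= limit; n++) {
            double nextRe = zRe * zRe - zIm * zIm + cRe;
            double nextIm = 2.0 * zRe * zIm + cIm;
            zRe = nextRe;
            zIm = nextIm;
            if (zRe * zRe + zIm * zIm > 4.0) {
                return n;
            }
        }
        return 0;
    }

    /** Computes rows [y0, y1) of a w-by-h grid as one flat band of values. */
    static int[] band(int y0, int y1, int w, int h, int limit) {
        int[] out = new int[(y1 - y0) * w];
        for (int y = y0; y < y1; y++) {
            for (int x = 0; x < w; x++) {
                out[(y - y0) * w + x] = escape(-2.0 + x * 2.5 / w, -1.0 + y * 2.0 / h, limit);
            }
        }
        return out;
    }

    /** Splits the grid into n horizontal bands, computes them in parallel, then stitches them together. */
    static int[] render(int n, int w, int h, int limit) {
        int rows = (h + n - 1) / n; // rows per band, rounded up
        List<int[]> bands = IntStream.range(0, n).parallel()
                .mapToObj(i -> band(Math.min(i * rows, h), Math.min((i + 1) * rows, h), w, h, limit))
                .collect(Collectors.toList()); // encounter order is preserved
        int[] image = new int[w * h];
        int offset = 0;
        for (int[] b : bands) { // sequential combine step
            System.arraycopy(b, 0, image, offset, b.length);
            offset += b.length;
        }
        return image;
    }
}
```

Whatever the number of bands, the stitched result should be identical to the single-band result; only the wall-clock time changes.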

The image-mandel-doseq function is where the work is performed to set pixels in an image:

The doseq repeatedly invokes WritableRaster.setPixel. Notice the type hint ^BufferedImage and the function calls to int and ints. To ensure that Clojure knows which setPixel method to call it is necessary to convert values, otherwise a costly call by reflection will result.

When interacting with Java libraries I recommend setting the *warn-on-reflection* flag to true, then the compiler will warn when reflection is used to make calls:

(set! *warn-on-reflection* true)

So how does the parallel execution work out on my MacBook Pro with an Intel Core i7?

user=> (defn t [n f]
         (for [_ (range 0 n)]
           (let [start (. System (nanoTime))
                 ret (f)]
             (/ (double (- (. System (nanoTime)) start)) 1000000.0))))
#'user/t
user=> (for [i (range 1 8)]
         (/ (apply + (t 10 #(mandel.mandelbrot/image-mandel i 1024 [-2 -1] [0.5 1] 256))) 10))
(679.6881000000001 467.689 451.8355 366.8658 361.23620000000005 400.1187 535.1956)

The system profiler says my machine has 2 cores, but sysctl -a says there are 4 CPUs. The results are not quite what I was expecting: nearly a 2x speedup when the number of parallel tasks is four or five. Hmm... I don't quite understand this result. Need to dig deeper to work out what is going on.

I would love a graphically aware repl that could render the image that is returned from the image-mandel function, kind of like a poor man's Mathematica repl :-) An alternative is to develop a web service and render the image in a browser. This is really simple with compojure and ring. The image can be viewed by executing lein ring server.

The Renderable protocol of compojure was extended to support rendering of images:

Ideally I would like to directly stream out the image rather than buffer to a byte array, but this requires some tweaks to ring, which may make it after the 1.0 release, see the discussion here.

Finally the pièce de résistance is deployment of the application to CloudBees by executing lein cloudbees deploy. Browse to here (if it takes a while to load that is because the application has hibernated and it takes some time to wake up). Developers forking the project can easily deploy to CloudBees if they have an account by tweaking the project.clj. For more detailed instructions on how to use lein with CloudBees see my previous blog entry.

Friday, October 28, 2011

Using lein to deploy Clojure ring applications to CloudBees

If you have a Clojure ring application and are using lein, then it is really easy to deploy that application to CloudBees using the lein CloudBees plugin, which is available on clojars.

Sign up for CloudBees to create an account.

Download the CloudBees SDK and install. (See later for the case where the CloudBees SDK is not installed.)

Verify that you can execute bees app:list. The first time a bees command is executed it will prompt for your user name and password so that your API key and API secret can be downloaded and cached in the ~/.bees/bees.config properties file. This key and secret will be used to authenticate when using the CloudBees SDK or the lein CloudBees plugin.

Modify the project.clj file to identify the application:

:cloudbees-app-id "<account>/<appname>"

Where account is the name of your account and appname is the name of your application.

Modify the project.clj file to include the following development dependency:

[lein-cloudbees "1.0.1"]

for example:

(defproject mandel "1.0.0-SNAPSHOT"
  :description "A Mandelbrot web app"
  :cloudbees-app-id "sandoz/mandel"
  :dependencies [[org.clojure/clojure "1.3.0"]
                 [org.clojure/clojure-contrib "1.2.0"]
                 [compojure "0.6.4"]]
  :dev-dependencies [[lein-ring "0.4.6"]
                     [lein-cloudbees "1.0.1"]]
  :ring {:handler mandel.core/app})

Verify that lein cloudbees works, you should see something like the following:

$ lein cloudbees
Manage a ring-based application on Cloudbees.
Subtasks available:
list-apps   List the current applications deployed to CloudBees.
deploy      Deploy the ring application to CloudBees.
tail        Tail the runtime log of the deployed application.
restart     Restart the deployed application.
stop        Stop the deployed application.
start       Start the deployed application.
Arguments: ([list-apps deploy tail restart stop start])

Deploy the application:

$ lein cloudbees deploy
Created /Users/sandoz/Projects/clojure/mandel/.project.zip
Deploying app to CloudBees, please wait....
http://mandel.sandoz.cloudbees.net
Applcation deployed.

The deployment creates an uber war and then deploys that war to CloudBees. The deployment process is smart, only changes will be sent and furthermore any jars in WEB-INF/lib will be checked, securely, against a global cache, before sending. So even if the uber war is rather big the actual stuff sent across the wire may be much less than expected, even on the first deployment.

Et voilà! Your application is deployed and the state can be verified:

$ lein cloudbees list-apps
sandoz/mandel - active

If you don't have the CloudBees SDK installed then you can reference the API key and secret key in the project.clj, for example:

:cloudbees-api-key ~(.trim (slurp "/Users/sandoz/cloudbees/sandoz.apikey"))
:cloudbees-api-secret ~(.trim (slurp "/Users/sandoz/cloudbees/sandoz.secret"))

Such declarations will take precedence over any API key and secret key declared in the ~/.bees/bees.config properties file.

It's a bad idea to reference the key and secret directly in the project.clj and instead it is better to refer to that information in a file. Slurp them in from a file to a string, then trim to remove any white space or line-feeds (the plugin needs to be modified to trim those values).

Monday, February 14, 2011

Buzzzzzz

On the 1st of March I will be joining my illustrious new colleagues at CloudBees. A heady mixture of ex-Sun, ex-JBoss and ex-Red Hat employees. So some of those new colleagues are actually old colleagues too, namely Harpreet, Kohsuke and Vivek.

CloudBees has a very compelling vision to push the full development-to-production cycle (build, test, deploy) into the cloud. They have been on a roll cranking out stuff at a rapid rate.

The CloudBees platform consists of two main pillars: DEV@cloud; and RUN@cloud. General availability was announced recently.

DEV@cloud enables developers to develop and build software in the cloud using the best Continuous Integration software available, namely Jenkins, as a service.

RUN@cloud enables developers to deploy software to the cloud.

There is an enormous amount of value to be gained by integrating these two pillars together into one integrated platform.

My focus will be on Jenkins, Nectar and DEV@cloud. Exciting times ahead.

I have been watching, well lurking, as the s/Hudson/Jenkins events unrolled. For the moment let's just say that we are living in "interesting times" and given the outcome of the events I feel far more comfortable with Jenkins than I would be if involved with the alternative.

Between now and the 1st of March I will be taking some time off, but the temptation to do a little hacking will be hard to resist! On verra...