JVM Language Study

This web page contains the source code and experimental data from the research carried out for "JVM-Hosted Languages: They Talk the Talk, but do they Walk the Walk?", a paper submitted to, and accepted by, the PPPJ'13 conference. A copy of the conference presentation slides can be found here.

My JVM language study profiled a number of benchmarks written in five JVM languages and compiled using Java 1.6.0_30, Clojure 1.3.0, JRuby 1.6.7, Jython 2.5.3 and Scala 2.9.2. I used the Java HotSpot(TM) 64-Bit Server VM (build 20.5-b03) for my experiments. If you have any questions, just contact me by email: w.li.2@research.gla.ac.uk

A similar study on Clojure, JRuby and Jython behaviour, titled "Characteristics of Dynamic JVM Languages", was conducted by Sarimbekov et al. Our own study used an earlier version of their toolchain and their paper examines different metrics to our own.

Experimental data

The following sections provide details of how to repeat the profiling experiments performed for my study. However, if you are simply interested in the data from the experiments, then JP2_data.zip and ET_data.zip contain the parsed data from the JP2 and Elephant Tracks profilers respectively.

Setting up JP2

Download JP2 from http://code.google.com/p/jp2/ and extract to a suitable folder.
Download the modified source files from JP2-modifications.tar.gz.
Extract the modified source files and place into the JP2 directory, replacing the existing files.
Run ExtendThread.sh if using Linux or ExtendThread.bat if using Windows.
Use Apache Ant to build JP2 by entering 'ant' at the command prompt.

This will build the modified JP2 version used in our study. JP2 automatically downloads a copy of the DaCapo benchmarks. The Scala Benchmark Suite can be found at http://www.scalabench.org/.

Profiling using JP2

Note that all benchmarks were executed in the JP2 directory (I could not get it to work anywhere else).

Computer Languages Benchmark Game, DaCapo and Scala Benchmark Suite

The CLBG benchmarks were obtained from the Computer Languages Benchmark Game site between March and June 2012. The code was obtained by copying and pasting the source code into separate files from the listings provided by the site. Some modifications to the Python 3.0 benchmarks were required to ensure compatibility with Jython.

Download and extract the CLBG benchmark source files, CLBG-benchmarks.tar.gz.
Download the JP2 scripts, CLBG-JP2_scripts.tar.gz and extract them.
Ensure that Clojure, JRuby, Jython and Scala are installed. The scripts require the environment variables 'CLOJURE_HOME', 'JRUBY_HOME', etc. to be set to their installation directories.
Each benchmark must be run within the JP2 directory. Copy the benchmarks and scripts for each JVM language into the JP2 directory and run the batch scripts. Move the traces belonging to one JVM language elsewhere before starting on another programming language. Retain the compiled class files for the CLBG benchmarks for profiling with Elephant Tracks later.
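
Before running the batch scripts it can help to confirm that the environment variables are in place. The following is a minimal sketch, not part of the study's scripts. 'CLOJURE_HOME' and 'JRUBY_HOME' are named above; 'JYTHON_HOME' and 'SCALA_HOME' are assumed to follow the same pattern, and the function name is hypothetical.

```python
import os

# CLOJURE_HOME and JRUBY_HOME are named in the instructions; JYTHON_HOME and
# SCALA_HOME are assumptions that follow the same naming pattern.
REQUIRED_VARS = ["CLOJURE_HOME", "JRUBY_HOME", "JYTHON_HOME", "SCALA_HOME"]


def missing_language_homes(env=None):
    """Return the variables that are unset or do not point at a directory."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS
            if not os.path.isdir(env.get(name, ""))]


if __name__ == "__main__":
    # Report every variable that still needs attention.
    for name in missing_language_homes():
        print(f"{name} is not set to an existing directory")
```

If the script prints nothing, all four variables point at existing directories.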

Parsing the JP2 Traces

Download AnalyseBC.jar.
Organise the JP2 traces according to JVM language, i.e. place Java JP2 traces in a directory called 'java', place Scala JP2 traces in a directory called 'scala', etc.
Run AnalyseBC using 'java -jar AnalyseBC.jar 4' in the directory containing the JVM language trace directories.

AnalyseBC has several options:
-nofilter : MATLAB and WEKA N-grams include Java bytecode.
-planalysis : Examine N-grams for programming languages rather than individual benchmarks.
-matlab : produce a MATLAB-compatible CSV file.
n : maximum size of N-gram. This number must be at the end.
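
Because the N-gram size must come last, it is easy to get the argument order wrong when scripting several runs. The sketch below assembles the command line from the switches listed above. The helper name is hypothetical, and the relative order of the switches among themselves is an assumption; the only documented constraint is that n is the final argument.

```python
def analysebc_command(max_ngram, nofilter=False, planalysis=False,
                      matlab=False, jar="AnalyseBC.jar"):
    """Build the argument list for one AnalyseBC run."""
    if max_ngram < 1:
        raise ValueError("maximum N-gram size must be at least 1")
    cmd = ["java", "-jar", jar]
    # Optional switches; their order relative to each other is assumed
    # not to matter.
    if nofilter:
        cmd.append("-nofilter")
    if planalysis:
        cmd.append("-planalysis")
    if matlab:
        cmd.append("-matlab")
    # The maximum N-gram size must be the last argument.
    cmd.append(str(max_ngram))
    return cmd


if __name__ == "__main__":
    print(" ".join(analysebc_command(4, matlab=True)))
```

The returned list can be passed directly to subprocess.run from the directory containing the JVM language trace directories.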

AnalyseBC will produce a text file detailing the N-grams used, the top ten N-grams for each size, the top ten methods according to frequency and bytecodes covered, and the size and frequency of methods executed.
If '-planalysis' is selected, a text file is produced detailing the unique sequences, the sequences not found in the Java traces, the coverage of those sequences and the top ten N-grams for each size. The '-nofilter' switch determines whether Java bytecode within the non-Java language traces is filtered out.
If '-matlab' is selected, a CSV file is produced, listing the N-grams used by each benchmark. The value for each N-gram is normalised according to the total number of N-grams executed.
'report.csv' contains statistics, like bytecodes and methods executed, for each benchmark.
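
To illustrate what a normalised N-gram value means, here is a small sketch that counts the N-grams of one size in a bytecode sequence and divides each count by the total. It is not AnalyseBC's implementation: the function name and the example sequence are made up, and normalising within a single N-gram size is an assumption.

```python
from collections import Counter


def normalised_ngrams(bytecodes, n):
    """Map each N-gram of size n to its share of all N-grams of that size."""
    grams = Counter(tuple(bytecodes[i:i + n])
                    for i in range(len(bytecodes) - n + 1))
    total = sum(grams.values())
    if total == 0:
        # The sequence is shorter than n, so there is nothing to count.
        return {}
    return {gram: count / total for gram, count in grams.items()}


if __name__ == "__main__":
    # Hypothetical executed bytecode sequence.
    trace = ["aload_0", "getfield", "aload_0", "getfield", "ireturn"]
    for gram, share in normalised_ngrams(trace, 2).items():
        print(gram, share)
```

In this example the sequence contains four 2-grams, two of which are (aload_0, getfield), so that N-gram has a normalised value of 0.5.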

Setting up Elephant Tracks

Download Elephant Tracks from here. Make sure you also download the asm jar that is compatible with Elephant Tracks.
Extract Elephant Tracks and build it according to the instructions on the web site.
Place the asm-3.1.jar file into the directory where Elephant Tracks is installed.
Add the location of Elephant Tracks to the 'LD_LIBRARY_PATH' environment variable.
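
If you launch the profiler from a script rather than an interactive shell, the library path can be set for the child process only. The sketch below shows one way to do this; the helper name and the /opt/elephant-tracks path are hypothetical, and prepending (rather than appending) the directory is an assumption.

```python
import os


def with_library_path(et_dir, env=None):
    """Return a copy of env with et_dir added to LD_LIBRARY_PATH."""
    env = dict(os.environ if env is None else env)
    existing = env.get("LD_LIBRARY_PATH", "")
    parts = [p for p in existing.split(os.pathsep) if p]
    # Avoid adding the same directory twice.
    if et_dir not in parts:
        parts.insert(0, et_dir)
    env["LD_LIBRARY_PATH"] = os.pathsep.join(parts)
    return env


if __name__ == "__main__":
    # Hypothetical installation directory for Elephant Tracks.
    child_env = with_library_path("/opt/elephant-tracks")
    print(child_env["LD_LIBRARY_PATH"])
```

The returned dictionary can be passed as the env argument of subprocess.run when starting the JVM.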

If you get the error:

ETCallBackHandler.h:22:7: note: candidate expects 1 argument, 2 provided

Nathan Ricci suggests modifying ETCallBackHandler.cpp line 25 to:

ETCallBackHandler::ETCallBackHandler (jvmtiEnv *jvmti, JavaVM* vm,
TraceOutputter* traceOutputter) :
CallBackHandler::CallBackHandler(jvmti, traceOutputter)

and deleting line 26.

Profiling Using Elephant Tracks

There were problems when profiling some benchmarks with Elephant Tracks. The solution was to profile them by giving the process a high priority using 'sudo nice -19' before each batch command. The Elephant Tracks traces for all of the benchmarks will exceed 500 GB in space.

Download CLBG-ET_Scripts.tar.gz.
Extract into a directory and copy the class files, compiled earlier for JP2, into each JVM language directory. The DaCapo and Scala Benchmark jars should be copied into the appropriate directories (or modify the scripts to access the location they are stored in).
Run the batch ET script within each JVM language directory. Profiling with ET will take some time.
Run the batch AnalyseGC script within each JVM language directory. This may also take some time, depending on the size of the Elephant Tracks traces.

AnalyseGCTrace produces a number of files:

classcounts : lists the number of instances of each class used.
lifetime : lists object lifetimes and the number of objects with those lifetimes.
livetrace : a trace of the heap size after each allocation or death.
size : lists object sizes and the number of objects of each size.

There are a number of scripts that will parse the CSV files produced by AnalyseGCTrace and produce useful statistics about the amount of boxing used, method stack depths and object sizes.
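
To make the 'livetrace' and 'lifetime' outputs more concrete, the sketch below derives both from a list of allocation and death events. It is only an illustration of the idea, not AnalyseGCTrace itself: the event tuples are a made-up format rather than the Elephant Tracks trace format, and lifetimes are measured here in number of events purely for simplicity.

```python
from collections import Counter


def summarise(events):
    """Return (livetrace, lifetimes) for a list of alloc/death events.

    Events use a hypothetical format: ("alloc", obj_id, size) or
    ("death", obj_id).
    """
    live = {}             # obj_id -> (size, index of the allocation event)
    heap = 0              # current live heap size in bytes
    livetrace = []        # heap size after each allocation or death
    lifetimes = Counter() # lifetime (in events) -> number of objects
    for index, event in enumerate(events):
        kind, obj_id = event[0], event[1]
        if kind == "alloc":
            size = event[2]
            live[obj_id] = (size, index)
            heap += size
        elif kind == "death":
            size, born = live.pop(obj_id)
            heap -= size
            lifetimes[index - born] += 1
        else:
            continue      # ignore any other record type
        livetrace.append(heap)
    return livetrace, dict(lifetimes)


if __name__ == "__main__":
    events = [("alloc", 1, 16), ("alloc", 2, 24), ("death", 1),
              ("alloc", 3, 8), ("death", 3), ("death", 2)]
    print(summarise(events))
```

For the six example events, the heap grows to 40 bytes after the second allocation and returns to 0 once the last object dies.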

Profiling Individual Applications

Clojure Leiningen and Noir Blog

Ensure that you have Leiningen installed according to the instructions here.
Download the Noir Blog example from https://github.com/ibdknox/Noir-blog and extract it.
Download the noir-blog_scripts.tar.gz and extract to the Noir-blog-master directory.
Modify line 3 of the file Noir-blog-master/src/noir_blog/server.clj from:

[noir-blog.models :as models]))
to:
[noir-blog.models :as models]) (:gen-class))

In the JP2 directory, rename the src directory to src.bak.
Copy the contents of the Noir-blog-master to the JP2 directory. You may want to make a clean copy of the JP2 directory before you do so, allowing you to remove these files later.
Profile Leiningen compiling the Noir Blog site using the lein-uberjarJP2.sh and lein-uberjarET.sh scripts.
Profile the Noir Blog site as two blog messages are added to the site using the profile-noirJP2.sh and profile-noirET.sh scripts. Note that the first sleep interval in profile-noirET.sh will need to be increased if the Noir Blog server takes longer than 600 seconds to start.

Clojure Incanter

Download Incanter from https://github.com/liebke/incanter.
Download the incanter_scripts.tar.gz and extract to the JP2 directory.
Copy the contents of modules/incanter-core to the JP2 directory.
Profile Incanter running its core unit tests using lein-incanterJP2.sh and lein-incanterET.sh.

JRuby Warbler and Ruby on Rails Blog Application

Although we managed to profile the three JRuby applications with JP2, none of the applications completed execution using ET.

Start by installing the Warbler and Ruby on Rails gems for JRuby:

jruby -S gem install warbler
jruby -S gem install rails

The next step is to install PostgreSQL, create a database for the blog and set up a user account for it. I used the guide provided here. Make sure the user account has the LOGIN, REPLICATION and CREATEDB roles. Be sure to note down the database name, the user name and the user password for the next step.
Download the Ruby blog application files. The blog application was created using the tutorial found here. Update the config/database.yml file with the database name, user name and password from the previous step.
Start the database. Depending on where you have installed PostgreSQL, use the command:

/usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data >logfile 2>&1 &

Use the following commands to create the database tables required by the blog:

jruby -S rake db:create
jruby -S rake db:migrate

Copy the jrubyJP2 script to the $JRUBY_HOME/bin directory.
Copy the contents of the blog directory to the JP2 directory, merging folders if necessary. You may want to make a clean copy of the JP2 directory before you do so, allowing you to remove these files later.
Within the JP2 directory, run the profile-warble.sh script. This should produce a JP2 trace from compiling and packaging the blog application.
Using Archive Manager or a similar utility, open the blog.war file and copy the postgresql-9.2-1002.jdbc4.jar into the /WEB-INF/lib directory.
Run the profile-jrailsJP2.sh script within the JP2 directory. Once the script is finished, you should have a JP2 trace for the blog application.

JRuby Lingo

Start by installing the Lingo gem for JRuby using:

jruby --1.9 -S gem install lingo

Find a suitable text file to be indexed. We used "Alice's Adventures in Wonderland", from Project Gutenberg. Place the text file in the JP2 directory.
Use the following command within the JP2 directory to produce a JP2 trace for Lingo:

jrubyJP2 --1.9 -S lingo -l en alice.txt

If you encounter problems, they may be caused by the format of the text file. Try copying the contents to a new text file.