Archive for February, 2008

HAMMER

Friday, February 15th, 2008

Another bit on distributed file systems and the like: HAMMER is a highly available clustering file system being developed as part of DragonFlyBSD. More about HAMMER on KernelTrap.

Hyperic SIGAR

Monday, February 11th, 2008

Hyperic SIGAR (System Information Gatherer and Reporter) is a cross-platform, cross-language library and command-line tool for accessing operating system and hardware information in C, Java, Perl and C#. SIGAR is licensed under the GPL version 2. Not quite sure what this implies for Java projects, but anyway.

Just spotted this library as part of the upcoming GridGain 2.0 release.

Distributed File Systems

Saturday, February 9th, 2008

I’ve been thinking about the best way to configure a bunch of computers for doing large-scale machine learning experiments. One problem that always pops up is how to get some piece of the data to the node that needs to process it (a mapping in the Map Reduce framework).

You can cook up various schemes to distribute the data, but in the end I don’t think anything is going to beat the simplicity of a shared file system. However, when your cluster starts getting big and your data starts getting large, you start running into problems with traditional shared file systems like NFS (contention mostly). This leads one to consider a truly distributed file system.

It should come as no surprise that Google has the Google File System. I think many of the amazing things the people at Google are able to do can be attributed to the fact that they have their map-reduce and distributed file system infrastructure properly sorted out.

For the rest of us, there’s Hadoop, which is nice, but still not quite as easy to use as I’d like it. Ideally, I want to install the latest version of my Linux distribution or run a setup program on Windows and it should just work. No mess, no fuss. On Windows I want to see my distributed file system as a drive letter (or as a directory on Linux): this makes it easy to make legacy applications (C++ programs, MATLAB scripts, etc.) operate on your data. Along these lines, Hadoop has something called Pipes which could be used in some cases, but ideally I want the fact that I’m operating on distributed data to be completely transparent to my applications.
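To make the "transparent" part concrete, here is a minimal sketch in Java, assuming the distributed file system is already mounted as an ordinary directory or drive letter. The class name SharedDirWordCount and the word-count task are made up for illustration, and nothing in it is specific to Hadoop, OpenAFS or any other file system: the program only uses the standard file API, which is exactly the point.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example class: counts words in every file of one directory.
final class SharedDirWordCount {

    // "Map" step: count the words in a single file (one piece of the data).
    static Map<String, Long> countWords(Path file) {
        Map<String, Long> counts = new TreeMap<>();
        try {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        counts.merge(word, 1L, Long::sum);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    // "Reduce" step: merge the per-file counts into a single table.
    static Map<String, Long> countWordsInDir(Path dir) throws IOException {
        Map<String, Long> total = new TreeMap<>();
        try (Stream<Path> entries = Files.list(dir)) {
            List<Path> files = entries.filter(Files::isRegularFile)
                                      .collect(Collectors.toList());
            for (Path file : files) {
                countWords(file).forEach((word, n) -> total.merge(word, n, Long::sum));
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: SharedDirWordCount <directory>");
            return;
        }
        // args[0] is assumed to be the mount point of the shared file system.
        System.out.println(countWordsInDir(Paths.get(args[0])));
    }
}
```

The same code runs unchanged against a local directory, an NFS share or anything else the operating system exposes as a path, which is why a drive letter or mount point is so convenient for legacy programs.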

Here OpenAFS is showing some promise. It seems some guys are working on an IFS (Installable File System) driver for OpenAFS (see OpenAFS for Windows Requested Features and Road Map). IFS looks like the right way to integrate a new file system with the Windows platform. Last I checked, Hadoop didn’t support all the functions of a general purpose file system, but maybe it could still be integrated with IFS to give it a really nice interface for Windows users. I don’t know what OpenAFS does on Linux, but I’m assuming it works nicely there already. I should investigate…

I mention Hadoop and OpenAFS, since they seem to be the only candidates in the list of distributed file systems on Wikipedia that appear to be free, properly maintained and generally useful.

Once you have your data sorted out, you still need to distribute your computation across the nodes in your cluster. I’ll discuss that in another post.

By the way, the Hadoop folks recently created a subproject called Mahout, which focuses on building distributed implementations of various machine learning algorithms, following the ideas published in Map-Reduce for Machine Learning on Multicore.

PPTP with DD-WRT blows chunks

Friday, February 8th, 2008

I recently purchased another Linksys WRT54GL and flashed it with the latest release of DD-WRT. Basic configuration was a breeze, but when I tried to configure the router to make a connection to a PPTP server, the wheels came off. Badly. The various forums and bug trackers are rife with reports of these problems, but the developers seem to be oblivious. Great.

The dysfunction inherent in this project is described here: DD-WRT - An affront to the good will of the F/OSS community.

If anyone knows of a firmware for Linksys routers that can do PPTP properly, let me know. Tomato doesn’t seem to cut it (no PPTP support). As far as I could tell from the documentation, OpenWrt doesn’t support the wireless part of the Linksys when used with a 2.6 kernel. I still need to look at HyperWRT and friends.

Am I the only person who wishes that Mikrotik would port RouterOS to the Linksys and other routers? I’m sure they’d sell tens of thousands of copies. Might cut into their hardware sales though…

Update: Looked at thibor’s hyperwrt. Seems to support PPTP for the WAN interface, but this doesn’t do what I need (DD-WRT could do what I need, but it doesn’t work). Also looked at the Mikrotik Routerboards again. They really need to package up one of those things into a little box like the Linksys, include a level 4 RouterOS license and sell the whole package for somewhere between $80 and $120. I’d buy it.

GCC OpenMP and Python

Friday, February 8th, 2008

I had high hopes for the OpenMP support introduced in GCC 4.2, and then came this:

Reported by Nathan Bell on the NumPy mailing list.

Jar Jar Links and One-JAR

Tuesday, February 5th, 2008

Java links of the day: Jar Jar Links (jarjar) and One-JAR. I’ve used jarjar before, but I’ve run into some bugs when trying to bundle certain libraries. The JRuby project have also had this issue. I guess jarjar’s maintenance went south after Google bought Tonic Systems…

Java Native Access (JNA)

Friday, February 1st, 2008

I’ve been meaning to write something about Java Native Access (JNA), but I’ve been too busy actually using it! According to the JNA site, “JNA provides Java programs easy access to native shared libraries.” Python folks have had the same functionality in ctypes for a while now.

I’ve been using JNA for about 9 months for code related to my master’s thesis. I’ve built Java code on top of native libraries for BLAS (mostly Intel MKL for now), HDF, PRIMME and MATLAB.

JNA’s future is looking bright. It provides an easy-to-use alternative to JNI. The JNA maintainer, Timothy Wall, is extremely active on the mailing list. Even the JRuby folks are catching on.

Java compiler bugs

Friday, February 1st, 2008

During the work on the Java code related to my master’s thesis, I’ve run into two bugs in Sun’s Java compiler (using Java 6).

The first bug was Bug ID: 6570761 Possible generics regression - inconvertible types. I’ve since changed my design so that this bug no longer affects me, but it was annoying nonetheless.

The second bug has been reported by others, but there doesn’t seem to exist a report for it in Sun’s bug database. I submitted a bug report to them in November of 2007, but the report seems to have been ignored since then. I figured I’d reproduce the report here in case anybody else runs into this problem.

The offending code looks like this:

interface IA {
    IA op();
}
interface IB {
    IB op();
}
public interface IC extends IA, IB {
    IC op();
}

The error message is:

IC.java:7: types IB and IA are incompatible; both define op(), but with unrelated return types

This same issue has previously been raised for the Eclipse compiler:

It seems §9.4.1 of the Java Language Specification applies to this problem. It says:

“An interface inherits from its direct superinterfaces all methods of the superinterfaces that are not overridden by a declaration in the interface. It is possible for an interface to inherit several methods with override-equivalent signatures (§8.4.2). Such a situation does not in itself cause a compile-time error. The interface is considered to inherit all the methods. However, one of the inherited methods must be return type substitutable for any other inherited method; otherwise, a compile-time error occurs.”

While Sun drags its feet on this issue (maybe it’ll get fixed for Java 8, a few decades from now), you can use Eclipse. If you need to build with Ant, you can get it to use the Eclipse compiler by setting the build.compiler property to org.eclipse.jdt.core.JDTCompilerAdapter and including ecj.jar in your Ant classpath.
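To make the expected behaviour concrete, here is a minimal sketch, assuming a compiler that accepts the interfaces from the report (the Eclipse compiler, or a javac release that includes a fix). The class CovariantReturnDemo is made up for illustration, and the interfaces are declared without the public modifier only so that everything fits in one source file.

```java
interface IA {
    IA op();
}

interface IB {
    IB op();
}

// IC redeclares op() with a return type that is a subtype of both IA and IB,
// so the one declaration overrides IA.op() and IB.op() at the same time.
interface IC extends IA, IB {
    @Override
    IC op();
}

// Hypothetical implementation, only here so the interfaces can be exercised.
final class CovariantReturnDemo implements IC {
    @Override
    public IC op() {
        return this;
    }

    public static void main(String[] args) {
        IC c = new CovariantReturnDemo();
        IA a = c; // the same object viewed through each superinterface
        IB b = c;
        // All three calls dispatch to the single op() implementation above.
        System.out.println(a.op() == c && b.op() == c && c.op() == c);
    }
}
```

Callers that hold an IC get an IC back from op() without a cast, while code written against IA or IB keeps working, which is the whole reason for declaring the interfaces this way.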

It seems some French guys were also grappling with this problem: Héritage multiple des interfaces et surcharge de méthodes.

Update: Finally found the right bug for the covariant return problem with the help of Jonathan Gibbons from Sun. It is Bug ID: 6294779 Problem with interface inheritance and covariant return types. The bug was created in 2005. Don’t know why I couldn’t find it with my previous searches, but maybe other people will have better luck now.

libffi

Friday, February 1st, 2008

It seems there is some renewed interest in libffi, the library used by ctypes and JNA to call functions in native libraries from Python and Java, respectively. Until recently, the efforts around libffi have been very fragmented, with various patches being available only in ctypes or only in JNA (or only elsewhere).

On the ctypes side, there are some patches to build libffi with Visual Studio, which are useful for Windows junkies like myself. There is also a patch for Win64 support, which really needs to get into JNA (Java on Windows Server 2003 x64 rocks!). Timothy Wall of JNA fame has also produced some patches. A lot of this work has featured on the gcc-patches mailing list.

Anthony Green is also doing some work on libffi, but this seems to be happening separately from the work of the gcc-patches folks. Hopefully all these disparate efforts can be unified so that we can all benefit from a single libffi that works on many platforms (including Win64, please!).