Java internals, or when true != true

Most programmers have heard jokes about inserting a Greek question mark (;, U+037E) into Java code in place of a semicolon to cause "inexplicable" compilation errors.

But that trick is too easy to discover. What about something that manifests itself at runtime but, when inspected (either by printing to stdout or through a debugger), shows nothing amiss?

Using the internal sun.misc.Unsafe (and targeting HotSpot VMs), we can create a boolean that compares equal to neither true nor false but, when inspected, always manifests itself as true.

Let's take a look.

import sun.misc.Unsafe;
import java.lang.reflect.Field;

public class Tainted {
    public static boolean toTaint = true;

    public static void main(String[] argv) throws Exception {
        // Grab the singleton Unsafe instance through reflection.
        Field _unsafe = Class.forName("sun.misc.Unsafe").getDeclaredField("theUnsafe");
        _unsafe.setAccessible(true);

        Unsafe unsafe = (Unsafe) _unsafe.get(null);

        // Locate the storage of the static field, then overwrite its single byte with 2.
        Field target = Tainted.class.getDeclaredField("toTaint");
        unsafe.putByte(unsafe.staticFieldBase(target), unsafe.staticFieldOffset(target), (byte) 2);

        test(toTaint, false);
        test(toTaint, true);
        test(toTaint, toTaint);
    }

    public static void test(boolean a, boolean b) {
        System.out.printf("%s == %s: %s\n", a, b, a == b);
    }
}

The output of the above code is shown below.

true == false: false
true == true: false
true == true: true

So, what's going on?

The Unsafe class lets us play around with the raw data backing Java objects. Since this is inherently unsafe, we have to jump through a few hoops: specifically, we must use reflection to grab the Unsafe instance (a security manager can block this, for security-conscious applications). The alternative is to put our classes on the bootclasspath and call Unsafe.getUnsafe() directly, but that's less elegant.

Once we have our Unsafe instance, we ask it for the base object and the offset of our static toTaint field, which together identify where the field lives in memory. Then we use putByte to overwrite the one byte stored there with the value 2. (A boolean field occupies a single byte, so writing a full 4-byte int with putInt would also clobber whatever sits next to it.)
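To see the raw storage for ourselves, here is a minimal sketch that reads the byte behind a static boolean before and after overwriting it. The class RawPeek, its flag field and the peek() helper are hypothetical names invented for this illustration, and the sketch assumes a HotSpot JVM on which sun.misc.Unsafe is still reachable through reflection.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class RawPeek {
    // Hypothetical field, used only for this illustration.
    static boolean flag = true;

    // Returns { byte stored for a real true, byte read back after writing 2 }.
    static byte[] peek() {
        try {
            // Same reflective trick as in the article to obtain the Unsafe instance.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            Unsafe unsafe = (Unsafe) f.get(null);

            // Base object and offset together locate the static field's storage.
            Field target = RawPeek.class.getDeclaredField("flag");
            Object base = unsafe.staticFieldBase(target);
            long offset = unsafe.staticFieldOffset(target);

            byte before = unsafe.getByte(base, offset);
            unsafe.putByte(base, offset, (byte) 2);
            byte after = unsafe.getByte(base, offset);

            // Restore the original byte so the field is not left tainted.
            unsafe.putByte(base, offset, before);
            return new byte[] { before, after };
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Unsafe is not accessible on this JVM", e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = peek();
        System.out.println("before: " + raw[0] + ", after: " + raw[1]);
    }
}
```

Unlike printing the boolean, getByte reports the stored value without going through any branch, so the 2 is visible as-is.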

But what does this mean?

If we look into the internals of the JVM, we find jboolean (the native representation of a boolean value) declared in jni.h as an unsigned char.

...
typedef unsigned char   jboolean;
typedef unsigned short  jchar;
typedef short           jshort;
typedef float           jfloat;
typedef double          jdouble;
...

This makes sense: there's no data type for storing just one bit of data, and an unsigned char is guaranteed to be at least 8 bits. That is, a boolean can actually store any number in the range 0 to 255, and we're setting it to the integer value 2.

Internally, when the JVM compares two booleans for equality, it doesn't check only one specific bit of each value (that'd be a silly waste of time); at the bytecode level booleans are handled as ints, and == is an ordinary integer comparison of the whole values. A real true has only the least significant bit set (i.e., it is equal to the integer 1), and a real false is stored as 0. So our tainted boolean (set to 2) compares equal to neither a real true nor a real false.

However, this boolean is functionally equivalent otherwise: conditional branching operations look to see only if the value is nonzero, so an if (toTaint) block of code would still execute as expected.
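The two behaviours can be modelled in plain Java by treating the raw values as ints. This is only a sketch of the semantics described above, not JVM code: TaintModel, rawEquals and branchTaken are hypothetical names introduced for illustration.

```java
public class TaintModel {
    // Models `a == b` on booleans: the full values are compared as integers.
    static boolean rawEquals(int a, int b) {
        return a == b;
    }

    // Models `if (a)`: any nonzero value takes the true branch.
    static boolean branchTaken(int a) {
        return a != 0;
    }

    public static void main(String[] args) {
        int realFalse = 0;
        int realTrue = 1;
        int tainted = 2;

        System.out.println(rawEquals(tainted, realFalse)); // false
        System.out.println(rawEquals(tainted, realTrue));  // false
        System.out.println(rawEquals(tainted, tainted));   // true
        System.out.println(branchTaken(tainted));          // true
    }
}
```

The first three lines mirror the three test calls in the original program, and the last one shows why an if on the tainted value still runs its body.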

With that in mind, we can take a look at the code of the Boolean class to explain the final bit of the puzzle:

...
public String toString() {
    return value ? "true" : "false";
}
...

When we print our boolean, toString must be called on an object, so the boolean is first autoboxed to a Boolean, and then the above code is called. Autoboxing goes through Boolean.valueOf, which itself picks Boolean.TRUE or Boolean.FALSE with a branch, and as we've discussed already, branch operations treat any nonzero value as true. So our boolean will always be represented by the string true.
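For an ordinary boolean, the boxing path can be sketched as follows. BoxingPath, render and isCanonical are hypothetical helpers made up for this illustration; they only mimic what happens when a boolean is passed where an Object is expected, and they use no Unsafe tricks.

```java
public class BoxingPath {
    // Box the primitive, then ask the resulting object for its string form.
    static String render(boolean flag) {
        Object boxed = flag; // javac compiles this to Boolean.valueOf(flag)
        return boxed.toString();
    }

    // Boxing always yields one of the two shared Boolean instances.
    static boolean isCanonical(boolean flag) {
        Object boxed = flag;
        return boxed == Boolean.TRUE || boxed == Boolean.FALSE;
    }

    public static void main(String[] args) {
        System.out.println(render(true));      // true
        System.out.println(render(false));     // false
        System.out.println(isCanonical(true)); // true
    }
}
```

Because the boxed object is always Boolean.TRUE or Boolean.FALSE, nothing downstream of the boxing step can see what raw byte the primitive originally held.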

And that wraps things up! The Unsafe class has many practical uses in legitimate applications, but sometimes trying out illegitimate things is the best way to learn something new, which hopefully this post has helped with!