Easily optimizing Python: Extending with Java

July 19 2010

0 comments 0

Our RESTx project – a platform for the rapid and easy creation of RESTful web services and resources – is largely written in Python. Python is a dynamic, duck-typed programming language, which puts very little obstacles between your idea and working code.cartoon_duck At least that’s the feeling I had when I started to work with Python several years ago: Never before was I able to be so productive, so quickly with so few lines of code. I was hooked. No wonder some people describe it as ‘executable pseudo code’: Just write down what you want and for the most part, it will actually work.

Now there’s a price to pay for this increased developer productivity, where the interpreter somehow figures out at run time what it is that you actually want to do and how to let your code deal with all sorts of different types: Python is interpreted, it’s dynamic and this means that for the most part it’s slower than compiled languages.

In this article here, I will talk about an interesting optimization technique for Python, which will come as a surprise to many Python developers.

Much of the underlying data structures and algorithms used by the interpreter are implemented with highly optimized C code. So as long as you perform built-in operations such as finding a member in a list, dealing with dictionaries (maps), etc., you are moving along at a pretty good pace. However, once you need to create many objects, write your own loops or perform a bit of good old fashioned number crunching, Python isn’t really the fastest kid on the block.

So, what to do? The traditional answer to this has been to write an extension in C. That’s a well-defined API, but it is pretty difficult to deal with, to say the least. Some tools have been developed to ease the pain, but in the end a big problem remains. Actually, two problems:

  1. C is not platform independent. You can’t just take C code from one platform, for example a 32 bit Windows system, and hope to run it without issues on a 64 bit Windows system. Sure, it might work some times, but often it will not. Python is by and large platform independent, so once you start using C , you are losing this potential advantage.
  2. C is not Python. That’s probably the biggest issue. The two languages work on completely different levels and their concept of data types to do even the easiest tasks is vastly different. Many of the data structures that are entirely natural in Python don’t exist in C. Many of the operations you naturally do in C (for example, shifting bits in integers) aren’t a good idea if you deal with Python types. Of course that’s not surprising: C is a low level language and Python is not. They are made for very different purposes that they are both very good in. But still, going from one language to the other is a complete break in the way you have to think about your algorithm and your approach to solving the problem. Dealing with Python’s optimized, high-level data structures doesn’t come naturally in C.

However, there exists a very interesting alternative to this. jythonIt’s called Jython and is essentially a Python interpreter, which is not implemented in C, but in . Thus, it runs in an ordinary JVM as a application. You can start it on the command line to get a normal Python “>>>” prompt or to run a normal Python program. But it comes in the form of a JAR, so you can also easily include it in your applications as well.

There has been quite a bit of talk about Jython, but many Python developers look at it as “Well, a Python interpreter written in Java… interesting, but so what?” Unbeknown to many, though, Jython opens up entirely new dimensions when it comes to easy optimization of your Python code: Don’t write your extension in C, but instead, write it in Java!

Java? Yes, indeed. Why would you want to use Java to implement performance critical parts of your Python program?

  1. When it comes to loop performance, number crunching, object creation and many other CPU intensive applications, Java is significantly faster than pure Python. This is not just based on some micro benchmark, but on actual examples I had to deal with (computation, traversing data structures, parsing of complex input data, etc.). Consider also that the JVM had decades of optimizations now. The old adage of Java being slow for the most part just doesn’t apply anymore. Maybe some of the overly heavy, convoluted software and frameworks written in Java are slow, and the interpreter start up is slower, but once it starts going the language itself actually performs remarkably well.
  2. Java is much closer to Python than C. Of course, Java is not Python as we know, but it definitely comes much closer in its expressiveness and standard data structures than C. Therefore, writing an extension in Java feels much less like a break from Python and working with Python data structures (and vice versa!) is surprisingly easy.
  3. Jython offers excellent integration between Python and Java. In fact, it’s possible to construct mixed-language objects, where some methods are implemented in Java and others in Python, which is very, very cool. I will show some examples in a moment.
  4. Jython doesn’t suffer from the GIL! Hurray!

So, how easy is it to write an extension in Java? Very easy, as you will see.

First, it’s important to realize that you can simply import and use any Java library in your Python program. For example, we can type this at the Jython command prompt:

The printout in the end looks almost like what you would get from a Python dictionary, doesn’t it? Jython tries to make this sort of thing easy for you and helps you to map Python’s lists to Java’s Collections and Python’s dicts to Java’s Maps. It doesn’t really convert the type: For example, you won’t be able to call keys() on the HashMap object, since a Java HashMap doesn’t offer that method. But Jython has enough smarts to assist you with the most common tasks here. For example, it allows us to add elements the same way we normally do with dictionaries in Python, using the [] operator. Java can’t re-define operators, but Python can. Thus, we can now deal in Python code more conveniently with a HashMap than in Java.

But where things really start to get interesting is this: You can make Python classes, which inherit from a Java class. Consider this example of an abstract Java class:

Now compile this class. Then you will be able to write the following snipped of Python:

Now this was easy! TestClass is an abstract class, which leaves the cubed() method unimplemented. We now implemented this method here in MyTestClass, a Python class, which extended the Java class. As a result, we now have an object where some methods are implemented in Java and others in Python and where both can be seamlessly called and used.

Please note that TestClass does not have to be an abstract class. We could have just added the cubed() method anyway when we implemented MyTestClass. However, there’s an advantage to declaring the cubed() method in Java already: You see, Java is statically typed. It likes to know exactly what it deals with. Python objects are funny in that you can add methods to them even at runtime. This would leave Java very much confused if you then pass one of those objects back to it. So, what we did here is we defined a static interface, which already contained a signature for the function we still had to implement in Python. In effect, we created a static view of the entire object (Java and Python side) for Java’s benefit. If we then pass a MyTestClass object back to Java, it can easily deal with it since it knows about the signature of the cubed() method already.

That way, Python code can use Java objects, Java code can use Python objects, and objects can be made up of a mixture of Python and Java.

We used this technique quite a bit in RESTx, where even the API for developers of  components – custom data access or integration functionality – is provided in Java and in Python flavor, so you can chose the languages to work with.

“Sure”, you might think, “it’s easy to pass integers between languages. But what about more complex structures, such as dictionaries?” Well, that is surprisingly simple as well. Assume you want to pass a Python dictionary to Java. You can write some Java code that looks like this:

Assume this is a method in a Java class called  TestClass2.

In Python you can then write:

We passed an ordinary Python dictionary to the  Java code, which added an element to the Python dictionary. The type of the dictionary – as far as Java is concerned – is org.python.core.PyDictionary, which comes as part of the Jython JAR. However, that class implements Java’s Map interface and thus can be dealt with in a simple, generic manner by Java.

Note that the added key "foo" is unicode, since all Java strings are treated by Jython as unicode.

Integrating Java and Python code is incredibly easy with Jython. The performance of Java will often be sufficient, even for CPU intensive tasks. This makes optimizing Python by means of Java extensions easy, natural and an option worth considering for your next Python project.

We'd love to hear your opinion on this post