The alarms go off. You get an alert on your cellphone. You look at your systems dashboard and notice that your Mule server, which was happily chewing messages until just a minute ago, is struggling. What can you do?
It is rare for Mule applications to fail, but when something unexpected happens it is good to know how to react in the crucial moments after a problem has been detected. Consider this post a fire drill: you don’t want your office to catch fire, but it’s always good to know what to do just in case.
In this post we’ll look at four pieces of information you can gather in only a few seconds that will allow you to, once the service is restored, do a post-mortem investigation of the cause of the problem. Incidentally, if you contact MuleSoft Support, all the information you will have gathered will be very useful to the engineers for identifying the problem and will enable them to give you an answer much faster.
The information below is important: do not wait until you have a problem to read this guide. Try the instructions now, and make sure that you will be able to carry them out when (and if) a problem actually happens. When your server is down you don’t have time to think or to look for a specific tool. Make sure that you are familiar with the procedures. NOW.
First, find your Mule process
For the first two pieces of information, we will need to know the process ID (PID) of the Java Virtual Machine that is running Mule. Under Linux and other Unix-like systems, you can find it by listing all processes with ps and keeping only the lines that contain the java command. As an example, under Linux you would use:
ps aux | grep java
A typical Mule JVM will look very similar to this:
username 1581 0.0 4.8 4152960 406468 s000 S+ 12:02PM 1:20.99 /usr/bin/java -Dmule.home=...
If you have more than one Java application running on your server, look for the one that has the parameter -Dmule.home; this is the JVM that runs Mule. The PID for the JVM is the first number after the username Java runs under (the second column). In this case, the PID is 1581.
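If you would rather not pick the PID out by eye, the lookup can be scripted. What follows is a minimal sketch, not something shipped with Mule: the function name mule_pids is made up for illustration, and it assumes the ps column layout shown above, where the PID is the second field.

```shell
# Hypothetical helper: read `ps aux` output on stdin and print the PID
# (second column) of every process started with -Dmule.home.
# The dot is escaped in the pattern, so awk's own command line, which
# contains a backslash at that position, does not match itself.
mule_pids() {
  awk '/-Dmule\.home/ { print $2 }'
}

# Usage:
#   ps aux | mule_pids
```

If the function prints more than one PID, more than one Mule instance is running on the machine and you will need to decide which one is misbehaving.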
On Windows, you can use your JDK’s jps command. It lists all the JVMs running on your system (including jps itself), one per line, with the process ID on the left and the name of the main class on the right. Look for the line identified as MuleContainerBootstrap: if it reads 2348 MuleContainerBootstrap, the Process ID is 2348.
Doing a thread dump
A thread dump is a textual representation of the state of all the different execution threads within your application. It lists the method (down to the file name and line of code) that each of the threads is executing, and the chain of calls from that point down to the first method executed in the thread.
It also shows which locks or semaphores (if any) each thread holds or is waiting for. A thread dump is the best tool to learn about the state of an application at a given point in time and to diagnose deadlocks. Without thread dumps, situations where programs simply stop responding would be next to impossible to debug post-mortem.
To obtain a thread dump you need to send a signal (a form of message) to the JVM process. Obtaining a thread dump does NOT stop or kill your JVM. After producing the thread dump the JVM will continue running as if nothing had happened.
On Linux and other Unix-like systems, send signal 3 (QUIT) to the Java process ID you got in the previous step:
kill -3 {java_PID}
Again, this will NOT kill your JVM. The thread dump will be appended to Mule’s mule_ee.log in the MULE_HOME/logs directory (you will not see it on screen).
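In the heat of the moment it is easy to mistype a PID, so a small wrapper around the kill command can help. This is only a sketch under stated assumptions: the function name request_thread_dump is hypothetical, and it simply checks that the argument is a number and that the process exists before sending signal 3.

```shell
# Hypothetical helper: send signal 3 (QUIT) to a JVM after sanity checks.
# Returns 2 for a malformed PID and 1 if the process cannot be signalled.
request_thread_dump() {
  td_pid=$1
  case $td_pid in
    ''|*[!0-9]*)
      echo "usage: request_thread_dump PID" >&2
      return 2
      ;;
  esac
  # kill -0 delivers no signal; it only checks that the process exists
  # and that the current user is allowed to signal it.
  if ! kill -0 "$td_pid" 2>/dev/null; then
    echo "cannot signal process $td_pid" >&2
    return 1
  fi
  # The JVM writes the thread dump to its log and keeps running.
  kill -3 "$td_pid"
}

# Usage:
#   request_thread_dump 1581
```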
On Windows systems, where the kill command is not available, you will need to use your JDK’s jstack command. This command writes the thread dump to the console, so it is recommended that you redirect it to a text file in the following manner:
c:\path_to_JDK\bin\jstack {java_PID} > stacktrace.txt
Doing a heap dump
A heap dump is nothing more (and nothing less) than a snapshot of the whole object memory space of your JVM, written into a file. Every single String and Integer, every single Array, List and Map, every single object present in your JVM’s memory will be written to disk.
Be aware that creating a heap dump takes time, and the JVM will be PAUSED while it is written. Your Mule will not be killed, but it will be unable to respond to requests. Also be aware that a heap dump will be nearly as big as the JVM’s heap memory area, so a JVM that has been assigned 1 GB for heap will most likely create a 1 GB heap dump file.
To create a heap dump, use the jmap command from your JDK:
jmap -dump:format=b,file=heap.jmap {java_PID}
The heap will be copied to a file named heap.jmap.
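If you expect to take more than one heap dump, a name that includes the PID and a timestamp makes the files easy to tell apart later. The helper below is a sketch only: heap_dump_name is a made-up function, and the naming scheme is an assumption, not a Mule or JDK convention.

```shell
# Hypothetical helper: build a dump file name such as
# heap-1581-20240131-120200.jmap from a PID and the current time.
heap_dump_name() {
  printf 'heap-%s-%s.jmap\n' "$1" "$(date +%Y%m%d-%H%M%S)"
}

# Usage (remember that the JVM is paused while the dump is written):
#   jmap -dump:format=b,file="$(heap_dump_name 1581)" 1581
```

Before running jmap, make sure that the target directory has at least as much free space as the heap size assigned to the JVM.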
Saving the logs
Now you’ll need to go to your MULE_HOME/logs directory and zip everything that’s in there.
Mule logs all exceptions to the various global and per-application log files, so if anything goes wrong, it will be there together with a timestamp and information on what Mule was doing at the time of the event.
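On Linux and other Unix-like systems, a compressed tar archive does the same job as a zip file. The following sketch assumes the tar command is available; the function name archive_mule_logs and the output file name in the usage line are hypothetical.

```shell
# Hypothetical helper: pack a logs directory into a .tar.gz archive.
# $1 = path to the logs directory, $2 = archive file to create.
archive_mule_logs() {
  al_dir=$1
  al_out=$2
  if [ ! -d "$al_dir" ]; then
    echo "not a directory: $al_dir" >&2
    return 1
  fi
  # -C switches to the parent directory first, so the archive stores
  # paths relative to it (logs/...) rather than absolute paths.
  tar -czf "$al_out" -C "$(dirname "$al_dir")" "$(basename "$al_dir")"
}

# Usage:
#   archive_mule_logs "$MULE_HOME/logs" /tmp/mule-logs.tar.gz
```

Write the archive somewhere outside the logs directory so that it does not end up inside a later archive.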
Looking at the general status of the system
If the problem is more than a Mule-specific issue, for example, if your whole server feels sluggish or if you are at times unable to connect from outside, it is a good idea to gather as much information on the general status of the system as possible. Under Linux and other Unix-like systems, tools like vmstat and top provide a good overview of what the system is doing at a given instant.
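To keep that overview for the post-mortem, the output of these tools can be saved to files. This is a rough sketch: the capture function is hypothetical, and the options in the usage lines (vmstat 1 5 for five one-second samples, top -b -n 1 for a single batch-mode snapshot) assume the Linux versions of these tools and may differ on other systems.

```shell
# Hypothetical helper: run a command and save its output to a file.
# $1 = output file, remaining arguments = the command to run.
# If the command is not installed, a note is written instead, so the
# set of files still records what was attempted.
capture() {
  cap_out=$1
  shift
  if command -v "$1" >/dev/null 2>&1; then
    "$@" > "$cap_out" 2>&1
  else
    echo "skipped: $1 not found" > "$cap_out"
  fi
}

# Usage:
#   capture vmstat.txt vmstat 1 5
#   capture top.txt top -b -n 1
```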
Under Windows, use the Performance Console (under Administrative Tools) to look for bottlenecks in CPU or memory.
Now what?
You should now have more than enough information to start diagnosing the problem you experienced. If you want to restart your server or your Mule to get it back to normal, now is a good time.
If you are a MuleSoft customer and need assistance, our Support Team will be happy to help you figure out the cause of your problem. Make sure to provide all the information you gathered, and the response will be quicker and more accurate.