Thursday, October 11, 2012

Using Hudson/Jenkins to diagnose that intermittent failure


I have been working on one of those intermittent bugs that just won't reproduce on my machine, but will reproduce intermittently on other machines while they are running automated tests. I filled the code with trace statements, and I now suspect the problem is in code that I don't control, which doesn't appear to have much in the way of diagnostics in the area I am working on.

So I did the obvious thing, which is to run the tests in a loop on my machine overnight. Twelve hours and 8 test iterations later there were no test failures, and I am no further forward.

Since the tests are failing in the Hudson farm, it makes sense to try to connect the debugger up to those jobs; but I don't want to hang around to attach the remote debugger to each one. Thankfully there is a workaround: have the IDE listen for incoming debugger connections, so that I can set suitable breakpoints once and let each test run connect back to the debugger by itself.

First of all you need to configure your IDE to accept incoming debugger connections. Here are some notes on configuring JDeveloper for a listening debugger. In NetBeans you need to use the Debug->Attach menu item, select "SocketListen" as the connector, and configure it as per JDeveloper. In Eclipse you need to configure the debug type as "Socket Listen".

The second step is modifying your build system so that there is a target you can call that will start the test cases in debug mode. This is an example of the parameters for one of our CI jobs that passes in the right information. Note of course that the blacked-out text is the name of the machine you are trying to connect back to. (The Java tests are started with the parameter -agentlib:jdwp=transport=dt_socket,address=xxxx.oracle.com:5000,server=n.) Make sure that you don't have any firewalls running on that machine that will block the incoming connections.
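As a rough illustration of that second step, here is a minimal Java sketch that assembles the same -agentlib:jdwp parameter from a host name and port, and checks whether the listening port can be reached at all, which is a quick way to spot a firewall in the way. The class name JdwpArgs, the method names, and the default host and port in main are all made up for this example; they are not part of our build system or of any Hudson/Jenkins API.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration only; not taken from any real build system.
class JdwpArgs {

    // Builds the JVM option that makes the test JVM connect out (server=n)
    // to a debugger that is already listening on host:port.
    static String connectBackArg(String host, int port) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return "-agentlib:jdwp=transport=dt_socket,address=" + host + ":" + port + ",server=n";
    }

    // Returns true if a TCP connection to host:port can be opened within
    // timeoutMillis. A false result suggests that nothing is listening yet,
    // or that a firewall is blocking the connection.
    static boolean canReach(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed defaults; pass the IDE machine name and port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 5000;
        System.out.println(connectBackArg(host, port));
        System.out.println("reachable: " + canReach(host, port, 2000));
    }
}
```

Running the reachability check from one of the build nodes, with the IDE already listening, should tell you whether a job started with that parameter has any chance of connecting back.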




You will probably want to run multiple jobs at the same time if you have the nodes available, so consider checking the concurrent build box in the job configuration. It is always a good idea to bring cakes / cookies into the office if you are going to tie up all the preflight nodes for the day.




And then all that remains is to run a bunch of jobs and wait for your breakpoint to be hit. It might take a little while, but it is going to be quicker than running these jobs in series on your own machine. And if your farm is heterogeneous, so much the better for reproducing intermittent failures.



You can then sit back and wait for your code to fail... may I suggest some sessions from JavaOne while you wait?

2 comments:

Torsten Kleiber said...

It seems that the link to configure the JDeveloper for incoming debug sessions is broken?

Gerard Davison said...

Sorry about that, fixed now. Looks like a bug in the blogspot editor.