Monday, November 17, 2014

But that's impossible, or finding out that the JIT has broken your code.

Every now and then you look at some code and think that it cannot possibly be wrong. Once you have ruled out a simple programmer screw-up or enemy action in the code (make sure you have read Java Puzzlers or similar) or a concurrency issue (read Java Concurrency in Practice or go on Dr Heinz Kabutz's excellent course), you should sit back, take a few days, and then start thinking about whether the JDK is indeed out to get you. I hadn't seen one of these in the wild in my 18-odd years as a Java programmer, so it took me by surprise.

If you are running a large-scale Swing application on JDK 8, you might eventually see the following exception, lots and lots of times. (Unless you forgot the lesson learned in my previous blog post in your logging code, in which case you might see lots of ArrayIndexOutOfBoundsExceptions instead.)

Caused by: java.lang.NullPointerException 
    at javax.swing.text.GlyphView.getBreakSpot(GlyphView.java:799) 
    at javax.swing.text.GlyphView.getBreakWeight(GlyphView.java:724) 
    at javax.swing.text.FlowView$LogicalView.getPreferredSpan(FlowView.java:733) 
    at javax.swing.text.FlowView.calculateMinorAxisRequirements(FlowView.java:233) 
    at javax.swing.text.ParagraphView.calculateMinorAxisRequirements(ParagraphView.java:717) 
    at javax.swing.text.BoxView.checkRequests(BoxView.java:935) 
    at javax.swing.text.BoxView.getMinimumSpan(BoxView.java:568) 
    at javax.swing.text.BoxView.calculateMinorAxisRequirements(BoxView.java:903) 
    at javax.swing.text.BoxView.checkRequests(BoxView.java:935) 
    at javax.swing.text.BoxView.setSpanOnAxis(BoxView.java:343) 
    at javax.swing.text.BoxView.layout(BoxView.java:708) 
    at javax.swing.text.BoxView.setSize(BoxView.java:397) 
    ...
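On the logging caveat above: my assumption is that the trap is logging code that reads e.getStackTrace()[0]. With HotSpot's default -XX:+OmitStackTraceInFastThrow, an implicit exception such as this NullPointerException that is thrown often enough from compiled code may be replaced by a preallocated instance with an empty stack trace, so indexing the first frame throws an ArrayIndexOutOfBoundsException of its own. A minimal defensive sketch; the class and method names are hypothetical:

```java
public class SafeThrowableLogging {

    // Describes where a throwable originated without assuming it has any frames.
    static String origin(Throwable t) {
        StackTraceElement[] trace = t.getStackTrace();
        if (trace.length == 0) {
            // The fast-throw optimization can hand out exceptions with no frames at all.
            return t.getClass().getName()
                    + " (no stack trace; try -XX:-OmitStackTraceInFastThrow)";
        }
        return t.getClass().getName() + " at " + trace[0];
    }

    public static void main(String[] args) {
        NullPointerException npe = new NullPointerException();
        System.out.println(origin(npe));

        // Simulate what an omitted stack trace looks like to logging code.
        npe.setStackTrace(new StackTraceElement[0]);
        System.out.println(origin(npe));
    }
}
```

Running with -XX:-OmitStackTraceInFastThrow (as some commenters below ended up doing) turns the optimization off, so the real NullPointerException keeps its stack trace.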

This error is particularly insidious because it takes around ten minutes to show itself, and sometimes it doesn't show up at all. If you look at the source for this class, the line in question (the one starting "startFrom = breaker.preceding") only dereferences two local variables, breaker and s, both of which have already been used earlier in the method.

            Segment s = getText(pstart, pend); 
            s.first(); 
            BreakIterator breaker = getBreaker(); 
            breaker.setText(s); 

            // Backward search should start from end+1 unless there's NO end+1 
            int startFrom = end + (pend > end ? 1 : 0); 
            for (;;) { 
                startFrom = breaker.preceding(s.offset + (startFrom - pstart))   
                          + (pstart - s.offset); 
                if (startFrom > start) { 
                    // The break spot is within the view 
                    bs[ix++] = startFrom; 
                } else { 
                    break; 
                } 
            } 
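To make that reasoning concrete, here is a simplified, standalone version of the same backward search. It is a sketch, not the JDK code: it uses a plain String instead of a javax.swing.text.Segment, so the s.offset/pstart translation disappears, and the class and method names are made up for illustration. As in the original, both the break iterator and the text are dereferenced before the loop starts, so under ordinary Java semantics the preceding() call cannot be the source of a NullPointerException.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

public class BackwardBreakSearch {

    // Walks backward from 'end' and collects every line-break opportunity
    // that lies strictly after 'start' (a simplified GlyphView-style loop).
    static List<Integer> breakSpots(String text, int start, int end) {
        BreakIterator breaker = BreakIterator.getLineInstance();
        breaker.setText(text); // both locals are dereferenced here, before the loop

        List<Integer> spots = new ArrayList<>();
        // Backward search starts from end+1 unless there is no end+1.
        int startFrom = end + (text.length() > end ? 1 : 0);
        for (;;) {
            startFrom = breaker.preceding(startFrom);
            if (startFrom > start) {
                spots.add(startFrom); // the break spot is within the range
            } else {
                break; // reached 'start' or BreakIterator.DONE (-1)
            }
        }
        return spots;
    }

    public static void main(String[] args) {
        System.out.println(breakSpots("hello world foo", 0, 15));
    }
}
```

For example, breakSpots("hello world foo", 0, 15) returns [12, 6], the offsets just after each space, found in reverse order.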

The most direct way to rule out a JIT error is to disable compilation for just this one method. Here is an example; you can find more options in the documentation for the java command-line tool.

javaThing -XX:CompileCommand=exclude,javax/swing/text/GlyphView,getBreakSpot

When this parameter is added, the problem goes away. Since we had already ruled out enemy action in the code or a concurrency issue, we can be more confident that this is a JIT issue. As part of logging the bug, I printed diagnostics for this single method and found that the problem didn't occur until the method had been JIT-compiled for the fifth time.

javaThing -XX:CompileCommand=print,javax/swing/text/GlyphView,getBreakSpot

Here is some diagnostic output seen with the above command:

Compiled method (c2)  914078 33142       4       javax.swing.text.GlyphView::getBreakSpot (247 bytes)
 total in heap  [0x00002aaab0749e10,0x00002aaab0750fe0] = 29136
 relocation     [0x00002aaab0749f38,0x00002aaab074a1e8] = 688
 constants      [0x00002aaab074a200,0x00002aaab074a2a0] = 160
 main code      [0x00002aaab074a2a0,0x00002aaab074cde0] = 11072
 stub code      [0x00002aaab074cde0,0x00002aaab074ce40] = 96
 oops           [0x00002aaab074ce40,0x00002aaab074ce58] = 24
 metadata       [0x00002aaab074ce58,0x00002aaab074d058] = 512
 scopes data    [0x00002aaab074d058,0x00002aaab074ea20] = 6600
 scopes pcs     [0x00002aaab074ea20,0x00002aaab0750c50] = 8752
 dependencies   [0x00002aaab0750c50,0x00002aaab0750c80] = 48
 handler table  [0x00002aaab0750c80,0x00002aaab0750e90] = 528
 nul chk table  [0x00002aaab0750e90,0x00002aaab0750fe0] = 336
OopMapSet contains 113 OopMaps
#0 
OopMap{[8]=Oop [32]=Oop [40]=Oop off=892}
#1 
OopMap{[32]=Oop [40]=Oop off=960}
#2 
OopMap{[32]=Oop [40]=Oop off=980}
#3 
OopMap{[32]=Oop [40]=Oop [48]=Oop off=1048}
#4 
OopMap{[32]=Oop [40]=Oop [48]=Oop off=1084}
#5 
OopMap{[0]=Oop [24]=Oop [48]=Oop [56]=Oop [80]=Oop off=2500}
#6 
OopMap{rbx=Oop rdi=Oop [32]=Oop [40]=Oop [112]=Oop off=2533}
#7 
OopMap{rbx=Oop rdi=Oop r14=Oop [32]=Oop [112]=Oop off=3081}
#8 
OopMap{rbx=Oop rdi=Oop r14=Oop [32]=Oop [40]=Oop [112]=Oop off=3190}
#9 
OopMap{[8]=Oop [32]=Oop [40]=Oop off=4408}
#10 
OopMap{[32]=Oop [40]=Oop [48]=Oop off=4640}
#11 
OopMap{rbp=Oop [16]=Oop [24]=Oop [40]=Oop [64]=Oop off=5232}
#12 
OopMap{rbp=Oop [0]=NarrowOop [32]=Oop off=5364}
#13 
OopMap{[32]=Oop [40]=Oop [48]=Oop off=5408}
#14 
OopMap{rbp=Oop [32]=Oop [40]=Oop [48]=Oop off=5436}
#15 
OopMap{rbp=Oop [32]=Oop [40]=Oop [48]=Oop off=5468}
#16 
OopMap{rbp=Oop [32]=Oop [40]=Oop [48]=Oop off=5524}
#17 
OopMap{rbp=Oop [32]=Oop [40]=Oop [48]=Oop [88]=Oop off=5552}
#18 
OopMap{[32]=Oop [40]=Oop [48]=Oop [64]=Oop [72]=Derived_oop_[64] [112]=Oop off=5608}
#19 
OopMap{[8]=Oop [32]=Oop off=5680}
#20 
OopMap{rbp=Oop off=5720}
#21 
OopMap{rbp=Oop off=5752}
#22 
OopMap{rbp=Oop [24]=NarrowOop [28]=NarrowOop [32]=Oop [40]=Oop [48]=Oop [56]=Oop [64]=Oop [88]=Oop off=5812}
#23 
OopMap{rbp=Oop [16]=Oop [24]=Oop [40]=Oop [64]=Oop [88]=Oop off=5960}
#24 
OopMap{[0]=Oop [24]=Oop [48]=Oop [56]=Oop [72]=Oop [88]=NarrowOop off=6056}
#25 
OopMap{[40]=Oop off=6088}
#26 
OopMap{[0]=Oop off=6120}
#27 
OopMap{[8]=Oop [24]=Oop [56]=Oop [72]=Oop [112]=Oop off=6216}
#28 
OopMap{[0]=Oop [32]=NarrowOop [40]=Oop off=6284}
#29 
OopMap{rbp=Oop [16]=Oop [40]=Oop [64]=Oop [112]=Oop off=6384}
#30 
OopMap{[0]=Oop off=6412}
#31 
OopMap{[0]=Oop [16]=Oop [32]=NarrowOop [40]=Oop [48]=Oop off=6488}
#32 
OopMap{rbp=Oop [16]=Oop [40]=Oop [48]=Oop off=6560}
#33 
OopMap{[32]=Oop [40]=Oop [48]=Oop [64]=Oop [112]=Oop off=6608}
#34 
OopMap{[8]=Oop [28]=NarrowOop [32]=Oop [40]=Oop [48]=Oop off=6768}
#35 
OopMap{rbp=NarrowOop [0]=Oop [16]=Oop [32]=Oop [40]=NarrowOop off=6860}
#36 
OopMap{[0]=Oop [16]=Oop [32]=NarrowOop [40]=Oop [48]=Oop off=6988}
#37 
OopMap{rbp=Oop [32]=Oop off=7024}
#38 
OopMap{rbp=NarrowOop [0]=Oop [24]=Oop [32]=Oop off=7260}
#39 
OopMap{rbp=NarrowOop [0]=Oop [24]=Oop [32]=Oop off=7344}
#40 
OopMap{rbp=Oop [16]=Oop [24]=Oop [40]=Oop [60]=NarrowOop [64]=Oop off=7452}
#41 
OopMap{rbp=NarrowOop [32]=Oop off=7476}
#42 
OopMap{rbp=NarrowOop [0]=Oop off=7524}
#43 
OopMap{[32]=Oop [40]=Oop [48]=Oop off=7588}
#44 
OopMap{[32]=Oop [40]=Oop [48]=Oop off=7616}
#45 
OopMap{[32]=Oop [40]=Oop [48]=Oop off=7632}
#46 
OopMap{rbp=NarrowOop [32]=Oop off=7676}
#47 
OopMap{rbp=NarrowOop [0]=Oop off=7724}
#48 
OopMap{[0]=Oop [16]=Oop [28]=NarrowOop [40]=Oop [48]=Oop [56]=NarrowOop [64]=Oop off=7868}
#49 
OopMap{[8]=Oop [28]=NarrowOop [32]=Oop [40]=Oop [48]=Oop [56]=Oop off=7916}
#50 
OopMap{rbp=Oop [16]=Oop [24]=NarrowOop off=8016}
#51 
OopMap{rbp=Oop [16]=Oop [28]=NarrowOop off=8080}
#52 
OopMap{rbp=NarrowOop [0]=Oop [24]=Oop [32]=Oop off=8152}
#53 
OopMap{rbp=Oop [8]=NarrowOop off=8212}
#54 
OopMap{rbp=NarrowOop [32]=Oop off=8236}
#55 
OopMap{rbp=Oop [16]=NarrowOop off=8272}
#56 
OopMap{rbp=NarrowOop [0]=Oop off=8320}
#57 
OopMap{rbp=Oop [12]=NarrowOop off=8360}
#58 
OopMap{rbp=NarrowOop [32]=Oop off=8400}
#59 
OopMap{rbp=Oop [12]=NarrowOop off=8460}
#60 
OopMap{rbp=NarrowOop [0]=Oop off=8508}
#61 
OopMap{rbp=Oop [24]=NarrowOop [40]=Oop off=8572}
#62 
OopMap{rbp=Oop off=8600}
#63 
OopMap{rbp=Oop [8]=Oop [28]=NarrowOop off=8640}
#64 
OopMap{rbp=Oop [8]=Oop [20]=NarrowOop [112]=Oop off=8704}
#65 
OopMap{rbp=Oop [16]=Oop [24]=Oop [48]=Oop off=8788}
#66 
OopMap{rbp=Oop [16]=Oop [24]=Oop [40]=Oop [64]=Oop off=8912}
#67 
OopMap{rbp=Oop [16]=Oop [24]=Oop [40]=Oop [64]=Oop off=9036}
#68 
OopMap{rbp=Oop [16]=Oop [24]=Oop [40]=Oop [64]=Oop off=9160}
#69 
OopMap{rbp=Oop [16]=Oop [24]=Oop [40]=Oop [64]=Oop off=9284}
#70 
OopMap{rbp=Oop [16]=Oop [24]=Oop [40]=Oop [64]=Oop off=9408}
#71 
OopMap{rbp=Oop [16]=Oop [24]=Oop [40]=Oop [64]=Oop off=9532}
#72 
OopMap{off=9556}
#73 
OopMap{off=9580}
#74 
OopMap{off=9604}
#75 
OopMap{[112]=Oop off=9628}
#76 
OopMap{rbp=Oop [8]=Oop [24]=Oop [32]=NarrowOop off=9696}
#77 
OopMap{rbp=Oop [8]=Oop [24]=NarrowOop off=9760}
#78 
OopMap{off=9784}
#79 
OopMap{off=9812}
#80 
OopMap{off=9836}
#81 
OopMap{off=9860}
#82 
OopMap{off=9884}
#83 
OopMap{off=9908}
#84 
OopMap{off=9932}
#85 
OopMap{off=9956}
#86 
OopMap{off=9980}
#87 
OopMap{off=10004}
#88 
OopMap{off=10028}
#89 
OopMap{rbp=Oop [16]=Oop [28]=NarrowOop off=10092}
#90 
OopMap{rbp=Oop [16]=Oop [24]=Oop [48]=Oop off=10176}
#91 
OopMap{off=10200}
#92 
OopMap{off=10224}
#93 
OopMap{off=10248}
#94 
OopMap{off=10272}
#95 
OopMap{off=10296}
#96 
OopMap{off=10320}
#97 
OopMap{off=10344}
#98 
OopMap{off=10368}
#99 
OopMap{off=10392}
#100 
OopMap{off=10416}
#101 
OopMap{off=10440}
#102 
OopMap{off=10464}
#103 
OopMap{off=10488}
#104 
OopMap{off=10512}
#105 
OopMap{off=10536}
#106 
OopMap{off=10560}
#107 
OopMap{off=10584}
#108 
OopMap{off=10608}
#109 
OopMap{off=10632}
#110 
OopMap{off=10656}
#111 
OopMap{off=10680}
#112 
OopMap{off=11028}
java.lang.NullPointerException
 at javax.swing.text.GlyphView.getBreakSpot(GlyphView.java:799)
 at javax.swing.text.GlyphView.getBreakWeight(GlyphView.java:724)
 at javax.swing.text.html.InlineView.getBreakWeight(InlineView.java:150)
 at javax.swing.text.FlowView$LogicalView.getPreferredSpan(FlowView.java:733)
 at javax.swing.text.FlowView.calculateMinorAxisRequirements(FlowView.java:233)
 at javax.swing.text.ParagraphView.calculateMinorAxisRequirements(ParagraphView.java:717)
 at javax.swing.text.html.ParagraphView.calculateMinorAxisRequirements(ParagraphView.java:157)
 at javax.swing.text.BoxView.checkRequests(BoxView.java:935)
 at javax.swing.text.BoxView.getMinimumSpan(BoxView.java:568)
 at javax.swing.text.html.ParagraphView.getMinimumSpan(ParagraphView.java:270)
 at javax.swing.text.BoxView.calculateMinorAxisRequirements(BoxView.java:903)

I am still working with the JDK team on a fix for this one, but I feel I have found a useful set of tools for gathering evidence that the JIT compiler is causing my bad day. More importantly, I have a workaround, so I can run my tests until this is resolved.
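If you ship the exclude workaround to users, it may be worth checking at startup that the option actually reached the JVM, since a missed launcher script quietly brings the bug back. A minimal sketch, assuming the option is passed on the command line in the form shown above; the class and method names are hypothetical, and it inspects only what RuntimeMXBean.getInputArguments() reports, so it will not notice the same exclusion supplied through a compile-command file.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JitWorkaroundCheck {

    // Hypothetical helper: true if any JVM argument excludes
    // GlyphView.getBreakSpot from JIT compilation.
    static boolean hasGlyphViewExclude(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-XX:CompileCommand=exclude")
                    && arg.contains("GlyphView")
                    && arg.contains("getBreakSpot")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (!hasGlyphViewExclude(jvmArgs)) {
            System.err.println(
                    "Warning: GlyphView.getBreakSpot is not excluded from JIT compilation");
        }
    }
}
```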

19 comments:

Carsten said...

An NPE is not too bad :-)

Run this (extracted from Glazedlists) on a recent 64-bit JRE and you will get a crash :-(

This makes it impossible to create a 64-bit Swing app using Glazedlists.

public class CharArrayCrash {

    static char[] pattern0 = { 0 };
    static char[] pattern1 = { 1 };

    static void test(char[] array) {
        if (pattern1 == null)
            return;

        int i = 0;
        int pos = 0;
        char c = array[pos];

        while (i >= 0 && (c == pattern0[i] || c == pattern1[i])) {
            i--;
            pos--;
            if (pos != -1) {
                c = array[pos];
            }
            // i--;
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000000; i++) {
            test(new char[1]);
        }
    }

}

Gerard Davison said...

Hi,

Do you have a tracking bug with Oracle? If not, I can log something and follow it up.

Gerard

Carsten said...

Hi,

I got the following

https://bugs.openjdk.java.net/browse/JDK-8054478

https://java.net/jira/browse/GLAZEDLISTS-564

https://github.com/glazedlists/glazedlists/pull/3

It took nearly a year to "prove" it was a JIT thing (that is, to get the issue into bugs.openjdk.java.net), and until recently a fix was only scheduled for Java 9.

Anonymous said...

Hi

I ran into the same problem, and then the JTextPane in my program goes entirely non-deterministic (bizarre line wraps, cursor), and it seems to affect other Swing components as well.

I've been trying to find the source of this bug for weeks (I'm using a custom NavigationFilter, which I thought was causing the mistake), but I can't reproduce the error.

I've been going crazy, as the software is an academic research tool (so crashes are really critical), and I have spent nearly 2 weeks trying to find the error.

I looked up the stack trace and found your page (which is the only one on the web that mentions it).

Do you have any more information on this bug, or on whether it is likely to be fixed in future versions of Java?

Please help!

Unknown said...

Gerard,

Is there a tracking bug with Oracle for this JIT-breaking-your-code issue? Can you please point me to it?

Glever said...

I think I'm hitting the same problem too. I had almost given up hope, as it was occurring in my first Swing app in about 8 years, and I thought it was due to my misunderstanding of the Swing threading model.
The error occurs in a subclass of DefaultTableCellRenderer, in the call to setText(value.toString()) (the first statement of getTableCellRendererComponent()).

I started calling this "Schrödinger's bug", as everything I did to try to catch the root cause (breakpoints, adding logging, ...) made the bug disappear. Therefore I thought it was a race condition.

-XX:-OmitStackTraceInFastThrow finally gave me a stacktrace pointing to a NPE in GlyphView

Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException
at javax.swing.text.GlyphView.getBreakSpot(GlyphView.java:799)
at javax.swing.text.GlyphView.getBreakWeight(GlyphView.java:724)
at javax.swing.text.html.InlineView.getBreakWeight(InlineView.java:150)
at javax.swing.text.FlowView$LogicalView.getPreferredSpan(FlowView.java:733)
... snip ....
at javax.swing.JLabel.setText(JLabel.java:330)
at be.glever.superfluity.view.renderers.MultilineTableCellRenderer.getTableCellRendererComponent(MultilineTableCellRenderer.java:14)
...snip....
at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)



Do you have any more info on this error? Is it a confirmed bug in the JIT?

-- Glen

Gerard Davison said...

For those who have asked, and sorry for the delay: the Oracle bug is 19787445 and the JDK bug is JDK-8060036.

It is a confirmed JIT issue, and the JVM team is working on it, but it is a hard one to reproduce.

aryes said...

I tried to look up the bug 8060036 at:
http://bugs.java.com/view_bug.do?bug_id=8060036

But it says that this is not a valid bug number.

Can you please update us on its status?
Thanks.

Gerard Davison said...

Ayres,

I'm afraid not all bugs are publicly available.

Gerard

Gerard Davison said...

All,

We are looking for a more succinct test case to aid in fixing this bug. At the moment we have to run around 3-4 hours of tests to be sure of hitting it.

If anybody has a smaller case, please drop me a note at gerard dot davison at oracle dot com.

Thanks,

Gerard

Gerard Davison said...

Ayres,

You can now access the bug at:

https://bugs.openjdk.java.net/browse/JDK-8060036

Gerard

Anonymous said...

I had a look at the bug report. It says that it's on Linux.

I also had this bug occurring on Windows machines.

Gerard Davison said...

GM,

Thanks, the bug is now marked as generic.

Gerard

miquelcvcv said...

This bug is so annoying. I have had this bug for far too long and wasted too much time trying to fix it. Impossible. Maaaaaan, I went crazy. I hate this Java bug!!

Gerard Davison said...

Update,

A fix for JDK 9 has been committed:

http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/53707cf9a443

And one for JDK 8 should be committed soon if no other problems arise. Thanks for the patience and test cases provided.

Gerard

WesHinsley said...

Hi Gerard,

A quick note to say thanks for posting this. I encountered this totally weird and unpredictable bug today on Windows in Java 8u51; it was reproducible very quickly for about 2 minutes with a set of steps, then suddenly became irreproducible for an hour or two...

If I understand this page (https://bugs.openjdk.java.net/browse/JDK-8060036) rightly, 8u60 will have this fixed, which hopefully won't be too long?

Thanks again
Wes

Gerard Davison said...

Wes,

Thanks for your feedback; comments like this make sure I will write up bugs like this again!

Gerard

Markus Schlegel said...

Hi Gerard
We have encountered this bug too and were searching for the cause for weeks (since we always saw the AIOOB Exception only...). After discovering the -XX:-OmitStackTraceInFastThrow option, it quickly directed us to your blog.
Finding it directly in the JDK's bug system is IMPOSSIBLE, since unfortunately it does not include enough user comments about it (the AIOOB Exception does not appear there).
So many many thanks to you for blogging about this!
Markus