HotSpot Problem Diagnosis

HotSpot is the JVM implementation in OpenJDK and in the official Sun/Oracle JDK, which makes it the JVM most of us run, so being able to track down HotSpot-related problems matters. Diagnosis starts from the observed symptom; a rough classification of symptoms follows.

1. HotSpot exits abnormally

This is the trickiest case: an abnormal JVM exit is usually caused by a bug in the JVM itself, so finding or fixing it generally requires reading and understanding the HotSpot source code. There are also abnormal JVM exits that are not caused by a JVM bug, for example:

2. The classes loaded by the JVM are not what you expect

An application often has several versions of the same jar on its classpath, and it can be hard to tell which version the JVM actually loaded. In that situation classdump can print every jar loaded by the URLClassLoaders, in order. Our applications also frequently use bytecode enhancement (ASM, Javassist and so on); to analyse the bytecode that actually sits in memory, classdump can likewise write the loaded classes out to files on disk.

$ java -jar classdumpall.jar <PID> -filter org.apache.zookeeper.server.persistence.File*
[INFO] Add system classloader jar url: /usr/java/jdk1.6.0_25/lib/tools.jar
[INFO] Add system classloader jar url: /usr/java/jdk1.6.0_25/lib/sa-jdi.jar
Attaching to process ID 23537, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 20.0-b11
[INFO] Output directory: .
[INFO] Dump class: org.apache.zookeeper.server.persistence.FileTxnSnapLog$PlayBackListener @ sun.misc.Launcher$AppClassLoader
//                 ^-- class name                                                            ^-- defining ClassLoader
[INFO] jar: /home/admin/storm/zookeeper-3.3.5/zookeeper-3.3.5.jar
//          ^-- jar the class was loaded from
[INFO] Dump class: org.apache.zookeeper.server.persistence.FileTxnLog$PositionInputStream @ sun.misc.Launcher$AppClassLoader
[INFO] jar: /home/admin/storm/zookeeper-3.3.5/zookeeper-3.3.5.jar
[INFO] Dump class: org.apache.zookeeper.server.persistence.FileTxnSnapLog @ sun.misc.Launcher$AppClassLoader
[INFO] jar: /home/admin/storm/zookeeper-3.3.5/zookeeper-3.3.5.jar
[INFO] Dump class: org.apache.zookeeper.server.persistence.FileTxnLog @ sun.misc.Launcher$AppClassLoader
[INFO] jar: /home/admin/storm/zookeeper-3.3.5/zookeeper-3.3.5.jar
[INFO] Dump class: org.apache.zookeeper.server.persistence.FileHeader @ sun.misc.Launcher$AppClassLoader
[INFO] jar: /home/admin/storm/zookeeper-3.3.5/zookeeper-3.3.5.jar
[INFO] Dump class: org.apache.zookeeper.server.persistence.FileSnap @ sun.misc.Launcher$AppClassLoader
[INFO] jar: /home/admin/storm/zookeeper-3.3.5/zookeeper-3.3.5.jar
[INFO] Dump class: org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator @ sun.misc.Launcher$AppClassLoader
[INFO] jar: /home/admin/storm/zookeeper-3.3.5/zookeeper-3.3.5.jar

The dumped class files are laid out on disk as follows:

.
`-- org
    `-- apache
        `-- zookeeper
            `-- server
                `-- persistence
                    |-- FileHeader.class
                    |-- FileSnap.class
                    |-- FileTxnLog$FileTxnIterator.class
                    |-- FileTxnLog$PositionInputStream.class
                    |-- FileTxnLog.class
                    |-- FileTxnSnapLog$PlayBackListener.class
                    `-- FileTxnSnapLog.class

You can now disassemble the bytecode with javap:

$ javap -p -v FileHeader
Compiled from "FileHeader.java"
public class org.apache.zookeeper.server.persistence.FileHeader extends java.lang.Object implements org.apache.jute.Record
  SourceFile: "FileHeader.java"
  minor version: 0
  major version: 49
  Constant pool:
  const #1 = Method       #43.#112;  //  java/lang/Object."<init>":()V
  ...
  ...

  public org.apache.zookeeper.server.persistence.FileHeader(int, int, long);
    Code:
     Stack=3, Locals=5, Args_size=4
     0:   aload_0
     1:   invokespecial   #1; //Method java/lang/Object."<init>":()V
     4:   aload_0
     5:   iload_1
     6:   putfield        #2; //Field magic:I
     9:   aload_0
     10:  iload_2
     11:  putfield        #3; //Field version:I
     14:  aload_0
     15:  lload_3
     16:  putfield        #4; //Field dbid:J
     19:  return
    LineNumberTable:
     line 33: 0
     line 34: 4
     line 35: 9
     line 36: 14
     line 37: 19
    LocalVariableTable:
     Start  Length  Slot  Name     Signature
     0      20      0     this     Lorg/apache/zookeeper/server/persistence/FileHeader;
     0      20      1     magic    I
     0      20      2     version  I
     0      20      3     dbid     J

Alternatively, open the dumped .class files in JD (Java Decompiler) to read them as reconstructed source code.

3. The application stops processing business logic

First check whether the server load is so high that normal requests cannot be handled, or whether memory usage is excessive; see [Application load is too high] and [The application uses too much memory] below. If both load and memory usage are normal, take a thread dump with jstack and analyse why the program is not making progress. Typical causes include:
Deadlock. For example:

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class LockL {
    public static void main(String[] args) {
        final Lock lock1 = new ReentrantLock();
        final Lock lock2 = new ReentrantLock();
        Thread t1 = new Thread() {
            public void run() {
                lock1.lock();
                try {
                    lock2.lock();
                    try {
                        // code
                    } finally {
                        lock2.unlock();
                    }
                } finally {
                    lock1.unlock();
                }
            }
        };
        Thread t2 = new Thread() {
            public void run() {
                lock2.lock();
                try {
                    lock1.lock();
                    try {
                        // code
                    } finally {
                        lock1.unlock();
                    }
                } finally {
                    lock2.unlock();
                }
            }
        };
        t1.start();
        t2.start();
    }
}

jstack reports the deadlock directly:

Found one Java-level deadlock:
=============================
"Thread-2":
  waiting for ownable synchronizer 7f31a0208, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "Thread-1"
"Thread-1":
  waiting for ownable synchronizer 7f31a0238, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "Thread-2"

Java stack information for the threads listed above:
===================================================
"Thread-2":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <7f31a0208> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
        at me.hatter.tool.a.tests.classes.LockL$2.run(LockL.java:41)
"Thread-1":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <7f31a0238> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
        at me.hatter.tool.a.tests.classes.LockL$1.run(LockL.java:21)

Found 1 deadlock.
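jstack is not the only way to spot this: the same information is available from inside the process through the standard ThreadMXBean API, which is convenient as a watchdog in long-running services. A minimal sketch, assuming a simple polling loop is acceptable (the class name DeadlockWatcher and the 5-second interval are illustrative, not part of any tool mentioned above):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockWatcher {
    public static void main(String[] args) throws Exception {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        while (true) {
            // Covers both synchronized monitors and j.u.c. ownable synchronizers
            long[] ids = mx.findDeadlockedThreads();
            if (ids != null) {
                for (ThreadInfo info : mx.getThreadInfo(ids, Integer.MAX_VALUE)) {
                    if (info != null) {
                        // Prints thread name, the lock it waits for and (part of) its stack
                        System.out.println(info);
                    }
                }
            }
            Thread.sleep(5000L);
        }
    }
}

In a real application the loop would run on a dedicated daemon thread and report to your logging or alerting system rather than stdout.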
A task that blocks waiting on another task submitted to the same, already exhausted, thread pool. For example:

public static void main(String[] args) {
    final ExecutorService es = Executors.newFixedThreadPool(1);
    es.submit(new Runnable() {
        public void run() {
            System.out.println("AAAAAA");
            Future<?> f = es.submit(new Runnable() {
                public void run() {
                    System.out.println("BBBBBB");
                }
            });
            try {
                f.get();
            } catch (Exception e) {
                e.printStackTrace();
            }
            System.out.println("CCCCCC");
        }
    });
    System.out.println("DDDDDD");
}

The thread pool here is shared, and because its size is 1 the outer task occupies the only worker thread while it waits on f.get(); the inner task can therefore never run, and normal business logic grinds to a halt.

Missing timeouts. When using URLConnection you must set both void setConnectTimeout(int timeout) and void setReadTimeout(int timeout); otherwise the client may sit waiting for data on a connection that stays open but never delivers anything. HttpClient has two analogous settings that must be configured as well. This has caused real production incidents, so treat it as a rule (see the sketch a little further below).

4. Application load is too high

In this situation use jtop: it sorts thread stacks by CPU usage and points you at the offending Java code. Keep in mind how load is calculated: the load average reflects not only CPU-bound threads but also tasks stuck waiting on I/O, etc.
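Picking up the timeout point from the previous section, here is a minimal sketch of a URLConnection call with both timeouts set. The URL and the 3-second / 10-second values are illustrative only:

import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class TimeoutExample {
    public static void main(String[] args) throws Exception {
        URLConnection conn = new URL("http://example.com/data").openConnection();
        conn.setConnectTimeout(3000);  // fail if the TCP connection cannot be established within 3s
        conn.setReadTimeout(10000);    // fail if the server stops sending data for 10s
        InputStream in = conn.getInputStream();
        try {
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // consume the response; a stalled server now raises SocketTimeoutException
                // instead of blocking this thread forever
            }
        } finally {
            in.close();
        }
    }
}

With only the connect timeout set, a server that accepts the connection and then goes silent still blocks the calling thread indefinitely, which is exactly the hang described above.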
Finding the threads with the highest CPU usage via jtop:

$ java -jar jtop.jar -thread 3 -stack 8 <PID>

DefaultQuartzScheduler_Worker-8 TID=86 STATE=RUNNABLE CPU_TIME=2110 (99.90%) USER_TIME=2110 (99.90%) Allocted: 0
    java.util.WeakHashMap.put(WeakHashMap.java:405)
    org.aspectj.weaver.Dump.registerNode(Dump.java:253)
    org.aspectj.weaver.World.<init>(World.java:150)
    org.aspectj.weaver.reflect.ReflectionWorld.<init>(ReflectionWorld.java:50)
    org.aspectj.weaver.tools.PointcutParser.setClassLoader(PointcutParser.java:221)
    org.aspectj.weaver.tools.PointcutParser.<init>(PointcutParser.java:207)
    org.aspectj.weaver.tools.PointcutParser.getPointcutParserSupportingSpecifiedPrimitivesAndUsingContextClassloaderForResolution(PointcutParser.java:128)
    org.springframework.aop.aspectj.AspectJExpressionPointcut.<init>(AspectJExpressionPointcut.java:100)
    ...
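Per-thread CPU figures like the ones jtop prints can also be collected in-process via the standard ThreadMXBean API, for instance from a lightweight monitoring thread. A rough sketch, assuming a 5-second sampling interval (the class name and output format are made up, not jtop's actual implementation):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCpuSampler {
    public static void main(String[] args) throws Exception {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (mx.isThreadCpuTimeSupported() && !mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        while (true) {
            for (long id : mx.getAllThreadIds()) {
                long cpuNanos = mx.getThreadCpuTime(id);    // -1 if the thread has died
                long userNanos = mx.getThreadUserTime(id);
                if (cpuNanos > 0) {
                    System.out.printf("TID=%d CPU_TIME=%dms USER_TIME=%dms%n",
                            id, cpuNanos / 1000000L, userNanos / 1000000L);
                }
            }
            Thread.sleep(5000L);
        }
    }
}

Sorting two consecutive samples by the delta in CPU time gives the "hottest" threads; their stacks can then be fetched with mx.getThreadInfo.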
If no Java thread accounts for the high CPU usage but the JVM process as a whole does, the CPU is being consumed by a JVM native thread. In that case use pstack, or gdb -ex "set pagination 0" -ex "thread apply all bt" --batch -p <PID>, to investigate. First find the native thread id in the jstack output:

$ jstack <PID> | grep 'VM Thread'
"VM Thread" prio=10 tid=0x000000004288a000 nid=0x4b4e runnable
//                                          ^-- native thread id 0x4b4e = 19278

Then print the native stack with gdb (or pstack):

$ gdb -ex "set pagination 0" -ex "thread apply all bt" --batch -p <PID> | grep <LWPID> -A 10
Thread 49 (Thread 0x40781940 (LWP 19278)):
//                                 ^-- light weight process id = 19278
#0  0x0000003bc8a0ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002ad0cd0370e6 in os::PlatformEvent::park () from /usr/java/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#2  0x00002ad0cd008abb in Monitor::IWait () from /usr/java/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#3  0x00002ad0cd00914e in Monitor::wait () from /usr/java/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#4  0x00002ad0cd193df3 in VMThread::loop () from /usr/java/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#5  0x00002ad0cd1939ee in VMThread::run () from /usr/java/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#6  0x00002ad0cd03796f in java_start () from /usr/java/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#7  0x0000003bc8a06367 in start_thread () from /lib64/libpthread.so.0
#8  0x0000003bc7ed30ad in clone () from /lib64/libc.so.6

5. The application uses too much memory and throws an OutOfMemoryError

The first step is to determine which memory area ran out. JVM memory is mainly divided into the heap (young and old generations), the permanent generation (Perm) and off-heap memory such as thread stacks and NIO direct buffers; the following subsections look at the heap, Perm and direct buffers in turn.
Add the following JVM flags to write detailed GC information to a file:

-Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

Adding the flag below makes the JVM dump the heap to a file when an OutOfMemoryError occurs, producing the same data you would otherwise get from the jmap command:

-XX:+HeapDumpOnOutOfMemoryError

The utilisation of each generation can be checked with jstat -gcutil, for example:
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00 100.00   6.47   0.21  81.13      1    0.015     0    0.000    0.015

The columns are the utilisation (in %) of survivor spaces 0 and 1 (S0, S1), eden (E), the old generation (O) and the permanent generation (P), followed by the young and full GC counts and accumulated times in seconds (YGC/YGCT, FGC/FGCT) and the total GC time (GCT).
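The same per-generation figures can be read from inside the JVM through the standard MemoryPoolMXBean API, which is useful when the application should log or alert on them itself. A minimal sketch; note that pool names depend on the collector in use (e.g. "PS Old Gen" vs "CMS Old Gen"), and getMax() may return -1 for pools without a configured limit:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class PoolUsagePrinter {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();
            // e.g. "PS Eden Space", "PS Survivor Space", "PS Old Gen", "PS Perm Gen", "Code Cache"
            System.out.printf("%-20s used=%,d max=%,d (%.2f%%)%n",
                    pool.getName(), u.getUsed(), u.getMax(),
                    u.getMax() > 0 ? 100.0 * u.getUsed() / u.getMax() : 0.0);
        }
    }
}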
Heap: too many objects are being created or retained. This is usually a problem in the application code, although it can also simply mean that -Xmx is sized too small. jtop helps here as well: with the --sortmem option it sorts by memory allocation and points at the responsible Java code. jmap -histo lists the number of instances of each class and how much memory they occupy, and histodiff shows which objects were created during a given interval. The command jmap -histo:live <PID> forces the JVM to run a GC first, so running it several times shows whether the objects in memory are actually being reclaimed. If none of this is conclusive, export a heap dump with jmap -dump:format=b,file=<file name> <PID> and analyse it with MAT (Memory Analyzer Tool).
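A heap dump does not have to be taken externally with jmap: on HotSpot the application can trigger the same dump itself through the HotSpotDiagnosticMXBean (part of the com.sun.management API in the Sun/Oracle JDK), for example just before a suspicious code path gives up. A minimal sketch; the file name is arbitrary, and live = true restricts the dump to reachable objects, like jmap's live option:

import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    public static void dump(String file) throws Exception {
        HotSpotDiagnosticMXBean hotspot = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // live = true: dump only objects that are still reachable;
        // note the call fails if the target file already exists
        hotspot.dumpHeap(file, true);
    }

    public static void main(String[] args) throws Exception {
        dump("manual-heap.hprof");  // analyse the resulting file with MAT, just like a jmap dump
    }
}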
Perm: as a rule of thumb the permanent generation should be at least twice the total size of the application's libraries; with 80 MB of jars under lib, Perm needs to be larger than 160 MB. Production applications are typically started with -XX:PermSize=256M, but the right value depends on the application and should be confirmed by analysing its actual Perm usage.
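What fills Perm is class metadata, one copy per defining class loader, which is why the sizing rule above scales with the amount of library code and why frameworks that generate classes or create class loaders at runtime need extra headroom. The sketch below is purely illustrative (the class and loader names are made up): on JDK 6/7 it eventually fails with java.lang.OutOfMemoryError: PermGen space by redefining the same class bytes under ever-new class loaders.

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class PermGenFiller {

    // A loader whose only job is to expose defineClass()
    static class LeakyLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    public static void main(String[] args) throws Exception {
        // Read this class's own bytecode once from the classpath
        String name = PermGenFiller.class.getName();
        InputStream in = PermGenFiller.class.getResourceAsStream(
                "/" + name.replace('.', '/') + ".class");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int n; (n = in.read(buf)) != -1; ) {
            out.write(buf, 0, n);
        }
        byte[] bytes = out.toByteArray();

        // Every new loader stores another copy of the class metadata in Perm;
        // keeping references prevents the loaders (and their classes) from being unloaded
        List<Class<?>> pinned = new ArrayList<Class<?>>();
        while (true) {
            pinned.add(new LeakyLoader().define(name, bytes));
        }
    }
}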
Direct buffers: use directbufferana to inspect the JVM's current Direct Buffer usage. For example, the command java -jar directbufferanaall.jar <PID> prints:

NIO direct memory: (in bytes)
    reserved size = 0.014094 MB (14779 bytes)
    max size      = 123.937500 MB (129957888 bytes)
    malloc'd size = 0.014001 MB (14681 bytes)
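The "max size" reported above is the direct-memory budget, configurable with -XX:MaxDirectMemorySize (if unset it typically defaults to roughly the maximum heap size). Direct memory is allocated with ByteBuffer.allocateDirect and is only released when the owning buffer objects are garbage collected, so holding on to direct buffers exhausts this budget even while the heap looks healthy. A small illustrative sketch of that failure mode, assuming it is run with something like -XX:MaxDirectMemorySize=64m (the 1 MB chunk size is arbitrary):

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class DirectMemoryFiller {
    public static void main(String[] args) {
        List<ByteBuffer> pinned = new ArrayList<ByteBuffer>();
        try {
            while (true) {
                // Direct buffers live outside the Java heap; keeping references
                // prevents their native memory from ever being released
                pinned.add(ByteBuffer.allocateDirect(1024 * 1024));
            }
        } catch (OutOfMemoryError e) {
            // Typically "java.lang.OutOfMemoryError: Direct buffer memory"
            System.out.println("Allocated " + pinned.size() + " MB of direct memory before: " + e);
        }
    }
}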