快速找到Tomcat中最耗CPU的线程

1、找出TOMCAT的JVM的进程ID

[work@112542000 ~]$ jps                                                                                                                                         
290 Bootstrap                                                                                                                                                   
61213 Jps                                                                                                                                                       

2、查看该进程中,最耗费CPU的线程

[work@112542000 ~]$ ps -mp 290 -o THREAD,tid,time | sort -k2 -r | head -n 20                                                                                    
USER     %CPU PRI SCNT WCHAN  USER SYSTEM   TID     TIME                                                                                                        
work     15.3   -    - -         -      -     - 03:11:58                                                                                                        
work      1.2  19    - -         -      -   326 00:15:45                                                                                                        
work      0.2  19    - -         -      -   872 00:03:44                                                                                                        
work      0.1  19    - -         -      -   992 00:01:44                                                                                                        
work      0.1  19    - -         -      -   972 00:01:16                                                                                                        
work      0.1  19    - -         -      -   870 00:01:19                                                                                                        
work      0.1  19    - -         -      -   869 00:01:34                                                                                                        
work      0.0  19    - -         -      -  9993 00:00:00                                                                                                        
work      0.0  19    - -         -      -   997 00:00:06                                                                                                        
work      0.0  19    - -         -      -  9969 00:00:00                                                                                                        
work      0.0  19    - -         -      -  9968 00:00:00                                                                                                        
work      0.0  19    - -         -      -   996 00:00:34                                                                                                        
work      0.0  19    - -         -      -  9960 00:00:00                                                                                                        
work      0.0  19    - -         -      -   995 00:00:00                                                                                                        
work      0.0  19    - -         -      -  9944 00:00:00                                                                                                        
work      0.0  19    - -         -      -   994 00:00:31                                                                                                        
work      0.0  19    - -         -      -  9936 00:00:00                                                                                                        
work      0.0  19    - -         -      -  9934 00:00:26                                                                                                        
work      0.0  19    - -         -      -  9933 00:00:21                                                                                                        

3、以TID==326为例,查看该线程的堆栈

首先,将十进制的326转换成十六进制,可以在线转换:

http://tool.oschina.net/hexconvert

结果等于146

使用jstack查询该线程堆栈:

[work@112542000 ~]$ jstack 290 | grep "0x160" -A 10                                                                                                             
"Timer-2955" daemon prio=10 tid=0x00007f39d3ecd000 nid=0x1602 in Object.wait() [0x00007f38fb273000]                                                             
   java.lang.Thread.State: TIMED_WAITING (on object monitor)                                                                                                    
        at java.lang.Object.wait(Native Method)                                                                                                                 
        at java.util.TimerThread.mainLoop(Timer.java:552)                                                                                                       
        - locked <0x00000007c2a0fc40> (a java.util.TaskQueue)                                                                                                   
        at java.util.TimerThread.run(Timer.java:505)                                                                                                            
                                                                                                                                                                
"Timer-2954" daemon prio=10 tid=0x00007f392481e000 nid=0x15e4 in Object.wait() [0x00007f38fbe7f000]                                                             
   java.lang.Thread.State: TIMED_WAITING (on object monitor)                                                                                                    
        at java.lang.Object.wait(Native Method)                                                                                                                 
        at java.util.TimerThread.mainLoop(Timer.java:552)                                                                                                       
--                                                                                                                                                              
"Timer-2132" daemon prio=10 tid=0x00007f39d367d000 nid=0x1603 in Object.wait() [0x00007f3933f46000]                                                             
   java.lang.Thread.State: TIMED_WAITING (on object monitor)                                                                                                    
        at java.lang.Object.wait(Native Method)                                                                                                                 
        at java.util.TimerThread.mainLoop(Timer.java:552)                                                                                                       
        - locked <0x00000007b44ef3c8> (a java.util.TaskQueue)                                                                                                   
        at java.util.TimerThread.run(Timer.java:505)                                                                                                            
                                                                                                                                                                
"Timer-2131" daemon prio=10 tid=0x00007f39d3b61800 nid=0x15e6 in Object.wait() [0x00007f3934d54000]                                                             
   java.lang.Thread.State: TIMED_WAITING (on object monitor)                                                                                                    
        at java.lang.Object.wait(Native Method)                                                                                                                 
        at java.util.TimerThread.mainLoop(Timer.java:552)                                                                                                       
--                                                                                                                                                              
"Timer-450" daemon prio=10 tid=0x00007f39d0728800 nid=0x160 in Object.wait() [0x00007f39b1312000]                                                               
   java.lang.Thread.State: TIMED_WAITING (on object monitor)                                                                                                    
        at java.lang.Object.wait(Native Method)                                                                                                                 
        at java.util.TimerThread.mainLoop(Timer.java:552)                                                                                                       
        - locked <0x00000007bddf8788> (a java.util.TaskQueue)                                                                                                   
        at java.util.TimerThread.run(Timer.java:505)                                                                                                            
                                                                                                                                                                
"Timer-449" daemon prio=10 tid=0x00007f39ec01f000 nid=0xffc7 in Object.wait() [0x00007f39af5f5000]                                                              
   java.lang.Thread.State: TIMED_WAITING (on object monitor)                                                                                                    
        at java.lang.Object.wait(Native Method)                                                                                                                 
        at java.util.TimerThread.mainLoop(Timer.java:552)                                                                                                       

 

参考资料:

http://scau-fly.iteye.com/blog/1884606

http://www.blogjava.net/hankchen/archive/2012/05/09/377735.html

 

本文地址:http://crazyant.net/2145.html

通过JVM堆栈分析出现大量线程的原因

最近收到线上Tomcat线程数目超出的报警,于是想要分析下问题的原因:

首先进入线上,使用ps -aux命令,查看jvm进程,可以得到运行tomcat的jdk的地址:

/home/work/app/.jdk/bin/java

于是就知道了jdk的jstack、jps等命令的目录,然后找到jvm进程

/home/work/app/.jdk/bin/jps
29145 Jps
208 Bootstrap

得到了jvm的tomcat进程是208;

把堆栈导出,下载到本地:

jstack -l 208 > log.txt

下载后,发现线程堆栈中,有大量的这样的日志:

"pool-103-thread-1" prio=10 tid=0x00007f038001e000 nid=0x759d waiting on condition [0x00007f022e5e4000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000912fab28> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
	at java.lang.Thread.run(Thread.java:662)

"pool-102-thread-1" prio=10 tid=0x00007f0380011000 nid=0x71ed waiting on condition [0x00007f022e6e5000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000912fa170> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
	at java.lang.Thread.run(Thread.java:662)

可以看到,线程处于WAITING状态,阻塞在试图从任务队列中取任务(LinkedBlockingQueue.take),这个任务队列指的是ThreadPoolExecutor的线程池启动的线程任务队列;

也就是说,这些线程都是空闲状态,在等着任务的到来呢!

补充下LinkedBlockingQueue的知识:

并发库中的BlockingQueue是一个比较好玩的类,顾名思义,就是阻塞队列。该类主要提供了两个方法put()和take(),前者将一个对象放到队列尾部,如果队列已经满了,就等待直到有空闲节点;后者从head取一个对象,如果没有对象,就等待直到有可取的对象。 

定位到问题就简单了,查找代码,发现有个位置启动了线程池提交了任务,但是任务执行完返回后,线程池没有关闭导致的;

问题总结:

1、使用ExecutorService提交的线程任务,也要记得关闭;

2、启动新线程的时候,最好给线程起个名字,这样线程堆栈的问题排查更加容易;

 

文章地址:http://crazyant.net/1858.html,转载请注明来源

Java堆溢出OutOfMemoryError之代码实例和原因分析

本文演示了编写代码使得出现”java.lang.OutOfMemoryError: Java heap space”异常,分析GC日志得出OOM的原因,同时对堆转储文件进行分析,以查看把Heap塞满的罪魁祸首;

实例代码

这段代码来自《深入理解Java虚拟机-JVM高级特性与最佳实践》一书:

package jvmtest;

import java.util.ArrayList;
import java.util.List;

/**
 * VM Args:-Xms20m -Xmx20m -XX:+HeapDumpOnOutOfMemoryError
 * 
 * @author zzm
 */
public class HeapOOM {
    
    static class OOMObject {
    }
    
    public static void main(String[] args) {
        List<OOMObject> list = new ArrayList<OOMObject>();
        
        while (true) {
            list.add(new OOMObject());
        }
    }
}

在eclipse运行该代码时,需要设置堆size的最小值和最大值,同时使用-XX:+PrintGCDetails参数开启GC日志打印,使用-XX:+HeapDumpOnOutOfMemoryError参数当OOM时转储堆数据

-verbose:gc -Xms20M -Xmx20M -Xmn10M -XX:+PrintGCDetails -XX:SurvivorRatio=8 -XX:+HeapDumpOnOutOfMemoryError

在eclipse中设置vm arguments参数为了测试OOM

然后点运行,会出现以下的GC日志、异常、堆信息、转储信息

[GC[DefNew: 7640K->1024K(9216K), 0.0164901 secs] 7640K->4593K(19456K), 0.0166075 secs] [Times: user=0.02 sys=0.00, real=0.02 secs] 
[GC[DefNew: 9216K->1024K(9216K), 0.0190825 secs] 12785K->10510K(19456K), 0.0191593 secs] [Times: user=0.02 sys=0.00, real=0.02 secs] 
[GC[DefNew: 9216K->9216K(9216K), 0.0000369 secs][Tenured: 9486K->5704K(10240K), 0.0520898 secs] 18702K->14129K(19456K), [Perm : 148K->148K(12288K)], 0.0522306 secs] [Times: user=0.06 sys=0.00, real=0.05 secs] 
[Full GC[Tenured: 5704K->5704K(10240K), 0.0435436 secs] 14616K->14616K(19456K), [Perm : 148K->148K(12288K)], 0.0436766 secs] [Times: user=0.03 sys=0.00, real=0.04 secs] 
[Full GC[Tenured: 5704K->5695K(10240K), 0.0499650 secs] 14616K->14606K(19456K), [Perm : 148K->147K(12288K)], 0.0500832 secs] [Times: user=0.06 sys=0.00, real=0.05 secs] 
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid12428.hprof ...
Heap dump file created [29715028 bytes in 0.335 secs]
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:2245)
	at java.util.Arrays.copyOf(Arrays.java:2219)
	at java.util.ArrayList.grow(ArrayList.java:242)
	at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:216)
	at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:208)
	at java.util.ArrayList.add(ArrayList.java:440)
	at jvmtest.HeapOOM.main(HeapOOM.java:20)
Heap
 def new generation   total 9216K, used 8921K [0x33640000, 0x34040000, 0x34040000)
  eden space 8192K, 100% used [0x33640000, 0x33e40000, 0x33e40000)
  from space 1024K,  71% used [0x33e40000, 0x33ef6600, 0x33f40000)
  to   space 1024K,   0% used [0x33f40000, 0x33f40000, 0x34040000)
 tenured generation   total 10240K, used 5695K [0x34040000, 0x34a40000, 0x34a40000)
   the space 10240K,  55% used [0x34040000, 0x345cfc28, 0x345cfe00, 0x34a40000)
 compacting perm gen  total 12288K, used 150K [0x34a40000, 0x35640000, 0x38a40000)
   the space 12288K,   1% used [0x34a40000, 0x34a658c8, 0x34a65a00, 0x35640000)
    ro space 10240K,  44% used [0x38a40000, 0x38eb73f0, 0x38eb7400, 0x39440000)
    rw space 12288K,  52% used [0x39440000, 0x39a8dd28, 0x39a8de00, 0x3a040000)

GC日志分析

从图中可以看出,发生了3次GC和2次FULL GC,当两次FULL GC完成后,仍然发现没有空间,于是抛出了OOM异常;

首先贴一下GC和FULL GC的日志格式:

young gc log format

full gc log format

还有jvm heap的内存结构

jvm-memory-model

在正常运行状况下:

  1. 新的对象都在eden上创建
  2. 如果eden的大小达到了阈值,就会触发GC,运行“标记-复制”算法,将Eden中有效的对象复制到Survivor中,并清除eden中无效的对象
  3. 如果eden的大小再次达到阈值,就会触发GC,将对象从eden对象复制到survivor,如果survivor中的对象达到了年龄限制,就会复制到old区;

如果young区向old区复制对象的时候,发现old区的空间无法满足,就会触发FULL GC,如果FULL GC之后,仍然无法承载Young区要晋升的对象大小,那么就会抛出OOM;

从Heap开头的日志可以看到eden区已经占用了100%,from survivor区也占用了71%无法承载来自eden的对象,eden+from survivor区对象之和为8192K+1024K*71%=8919.04K,但是old区只剩下10240*(1-55%)=的空间4608K的空间;新的对象无法在100%占用率的eden区创建,eden+survivor的对象又不能复制到old区,所以OOM;

注意:本文的代码很特殊,每个HeapOOM对象都没有被释放,正常情况下,eden很多无用对象是会被清除掉的,晋升到OLD的大小也不等于eden+survivor的大小之和;

堆转储文件分析

由于加上了-XX:+HeapDumpOnOutOfMemoryError参数,所以OOM时,自动生成了一个java_pid12464.hprof的堆转储文件,给eclipse安装Memory Analyzer插件后,直接可以用该插件打开;

打开后如下图所示

oom-memory-analyzer

可以看到main函数线程占用了13.9MB几乎所有的Heap空间,在饼图上右键 >  list objects > with outgoing references,可以查看该部分包含了那些对象:

oom-object-list

打开后发现,一个Obejct数组中包含了1215488个元素,每个元素都是HeapOOM对象,每个元素(引用类型)为8byte,这就是让堆OOM的罪魁祸首;

结论

  • 使用JVM参数-XX:+PrintGCDetails可以打印GC日志
  • 使用JVM参数-XX:+HeapDumpOnOutOfMemoryError可以在OOM时打印堆转储文件
  • 使用eclipse插件Memory Analyzer可以分析对转储文件
  • OOM的原因,是Old区在FULL GC之后的剩余空间,仍然无法承载Young区要晋升的对象大小

本文地址:http://crazyant.net/1810.html,转载请注明