
NetEase Hangzhou QA Team

Pragmatic, focused, sharing: QA with attitude

 
 
 
 
 


 
 

Performance testing: cause and fix for port exhaustion under high-concurrency long-lived connections

By Linsa.Liu   2011-09-22 11:54:26 |  Category: Performance testing


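1. During performance testing of the cloud storage linkserver, once concurrency reached 30,000 connections, grinder began logging a large number of exceptions.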



Caused by: java.net.NoRouteToHostException: Cannot assign requested address
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
        at java.net.Socket.connect(Socket.java:529)
        at java.net.Socket.connect(Socket.java:478)
        at java.net.Socket.<init>(Socket.java:375)
        at java.net.Socket.<init>(Socket.java:218)
        at HTTPClient.HTTPConnection.getSocket(HTTPConnection.java:3386)
        at HTTPClient.HTTPConnection.sendRequest(HTTPConnection.java:3082)
        at HTTPClient.HTTPConnection.handleRequest(HTTPConnection.java:2876)
        at HTTPClient.HTTPConnection.setupRequest(HTTPConnection.java:2668)
        at HTTPClient.HTTPConnection.Post(HTTPConnection.java:1170)

The stack trace shows that the client could not assign a local address to the new connection.

2. I looked up some related material:



Another important ramification of the ephemeral port range is that it limits the maximum
number of connections from one machine to a specific service on a remote machine!
The TCP/IP protocol uses the connection's 4-tuple to distinguish between connections,
so if the ephemeral port range is only 4000 ports wide, that means that there can
only be 4000 unique connections from a client machine to a remote service at one time.
So maybe you run out of available ports. To get the number of available ports, see
$ cat /proc/sys/net/ipv4/ip_local_port_range
32768   61000
The output is from my Ubuntu system, where I'd have 28,232 ports for client connections.
Hence, your test would fail as soon as you have 280+ clients.

Reference:
http://stackoverflow.com/questions/1572215/how-to-avoid-a-noroutetohostexception

In short, the system's local ports were exhausted, so new connections could not be assigned a port.
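The 4-tuple point in the quoted answer can be made concrete with a toy model. This is only a sketch: the class name `EphemeralPortModel`, the four-port range 40000 to 40003, and the second service on port 8083 are hypothetical, chosen so that exhaustion happens quickly. It shows that the cap applies per remote endpoint: once every local port is paired with 172.19.0.89:8082 the next connect fails, while a different remote port can still be reached.

```python
import errno


class EphemeralPortModel:
    """Toy model: a connection is identified by its 4-tuple, so the local port
    range caps connections per (remote_ip, remote_port), not per machine."""

    def __init__(self, local_ip, low, high):
        self.local_ip = local_ip
        self.low, self.high = low, high  # inclusive bounds
        self.in_use = set()

    def connect(self, remote_ip, remote_port):
        # Take the first local port whose 4-tuple is still free.
        for port in range(self.low, self.high + 1):
            four_tuple = (self.local_ip, port, remote_ip, remote_port)
            if four_tuple not in self.in_use:
                self.in_use.add(four_tuple)
                return four_tuple
        raise OSError(errno.EADDRNOTAVAIL, "Cannot assign requested address")


# Hypothetical 4-port range, for illustration only.
model = EphemeralPortModel("172.19.2.195", 40000, 40003)
for _ in range(4):
    model.connect("172.19.0.89", 8082)

try:
    model.connect("172.19.0.89", 8082)
    exhausted = False
except OSError:
    exhausted = True

# A different remote port (hypothetical second service) still gets a 4-tuple.
other = model.connect("172.19.0.89", 8083)
```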

3. I checked port usage on my test machine:



tcp6       0      0 172.19.2.195:45129      172.19.0.89:8082        ESTABLISHED
tcp6       0      0 172.19.2.195:45128      172.19.0.89:8082        ESTABLISHED
tcp6       0      0 172.19.2.195:45131      172.19.0.89:8082        TIME_WAIT
tcp6       0      0 172.19.2.195:45130      172.19.0.89:8082        ESTABLISHED
tcp6       0      0 172.19.2.195:45141      172.19.0.89:8082        ESTABLISHED
tcp6       0      0 172.19.2.195:45140      172.19.0.89:8082        ESTABLISHED
tcp6       0      0 172.19.2.195:45143      172.19.0.89:8082        ESTABLISHED
tcp6       0      0 172.19.2.195:45142      172.19.0.89:8082        ESTABLISHED
tcp6       0      0 172.19.2.195:45137      172.19.0.89:8082        ESTABLISHED
tcp6       0      0 172.19.2.195:45136      172.19.0.89:8082        TIME_WAIT
tcp6       0      0 172.19.2.195:45139      172.19.0.89:8082        ESTABLISHED

There are many TCP6 connections in the ESTABLISHED state, plus a number in TIME_WAIT.

Roughly 56,000 ports are in use:
qatest@db-62:~$ netstat -an|wc -l
56179
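The line count above includes netstat's header rows and sockets of every kind and state, so it only approximates port usage. A minimal sketch, assuming `netstat -an` output in the column layout shown earlier, tallies TCP states for a single remote endpoint. The function name `count_states` is an illustrative choice, and the four sample rows are copied from the listing above.

```python
from collections import Counter

SAMPLE = """\
tcp6       0      0 172.19.2.195:45129      172.19.0.89:8082        ESTABLISHED
tcp6       0      0 172.19.2.195:45131      172.19.0.89:8082        TIME_WAIT
tcp6       0      0 172.19.2.195:45130      172.19.0.89:8082        ESTABLISHED
tcp6       0      0 172.19.2.195:45136      172.19.0.89:8082        TIME_WAIT
"""


def count_states(netstat_output, remote):
    """Count TCP states for connections whose foreign address equals `remote`."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows have six columns: proto, recv-q, send-q, local, foreign, state.
        if len(fields) == 6 and fields[0].startswith("tcp") and fields[4] == remote:
            counts[fields[5]] += 1
    return counts


states = count_states(SAMPLE, "172.19.0.89:8082")
print(dict(states))
```

On a live machine the same function could be fed the captured output of `netstat -an`, which separates ports held by live connections from ports merely waiting in TIME_WAIT.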

The local port range on the test machine is:
qatest@db-62:~$ cat /proc/sys/net/ipv4/ip_local_port_range
8192 65535

The range 8192 to 65535 holds 57,344 ports, so the ports were almost entirely used up.
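The comparison can be checked with a few lines of arithmetic. This is a sketch that assumes both bounds of `ip_local_port_range` are inclusive and treats the netstat line count as a rough proxy for ports in use; `port_range_size` is a hypothetical helper name. With the figures above it comes out at about 98% of the range.

```python
def port_range_size(range_text):
    """Parse the contents of ip_local_port_range; both bounds are inclusive."""
    low, high = (int(x) for x in range_text.split())
    return high - low + 1


available = port_range_size("8192 65535")
netstat_lines = 56179  # line count reported by netstat -an | wc -l
utilization = netstat_lines / available
print(available, f"{utilization:.1%}")
```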

4. Resolution

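  1. Under TCP/IP, a port is not released the moment a connection closes. The side that closes first keeps the connection in the TIME_WAIT state, 60 s on Linux, and only after that can the port be used by a new connection. (That duration is fixed in the kernel; /proc/sys/net/ipv4/tcp_fin_timeout, which is sometimes credited with it, governs the FIN_WAIT_2 state instead.)
    The performance test ran 30,000 concurrent connections, and grinder opened a new connection as soon as each one closed. The ports of the closed connections were still in TIME_WAIT and unavailable, so once every port number was taken, new connections failed because no port could be assigned.
  2. Change the TCP/IP configuration: enable the tcp_tw_reuse parameter so that ports in TIME_WAIT can be reused by new connections.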
    /proc/sys/net/ipv4/tcp_tw_reuse
    (boolean, default: 0)

    Note: The tcp_tw_reuse setting is particularly useful in environments where numerous short connections are open and left in TIME_WAIT state, such as web servers. Reusing the sockets can be very effective in reducing server load.

  3. Also enable   /proc/sys/net/ipv4/tcp_tw_recycle
    (boolean, default: 0)
     

    TCP_TW_RECYCLE
    It enables fast recycling of TIME_WAIT sockets. The default value is 0 (disabled). The sysctl documentation incorrectly states the default as enabled. It can be changed to 1 (enabled) in many cases. Known to cause some issues with hoststated (load balancing and fail over) if enabled, should be used with caution.

    Reference:
    http://www.speedguide.net/articles/linux-tweaking-121
  4. After setting these parameters and rerunning the test, the exception no longer occurred.
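Setting the two flags can be sketched as follows. The post does not say how the values were written, so this is an assumption: each flag is written to its file under /proc/sys/net/ipv4, which requires root on a real system. The helper name `set_ipv4_flags` is hypothetical. Missing files are skipped because not every kernel exposes both flags (tcp_tw_recycle was removed in Linux 4.12). The demonstration writes to a temporary directory, not the real /proc.

```python
import tempfile
from pathlib import Path


def set_ipv4_flags(flags, root="/proc/sys/net/ipv4"):
    """Write each flag value to its file under `root`; return the names written.

    Files that do not exist are skipped. Writing under the real /proc/sys
    requires root privileges.
    """
    written = []
    for name, value in flags.items():
        path = Path(root) / name
        if not path.exists():
            continue
        path.write_text(f"{value}\n")
        written.append(name)
    return written


# Demonstration against a throwaway directory that mimics the two files.
with tempfile.TemporaryDirectory() as demo_root:
    for name in ("tcp_tw_reuse", "tcp_tw_recycle"):
        (Path(demo_root) / name).write_text("0\n")
    written = set_ipv4_flags({"tcp_tw_reuse": 1, "tcp_tw_recycle": 1}, root=demo_root)
    values = {n: (Path(demo_root) / n).read_text().strip() for n in written}
```

Values written this way do not survive a reboot.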
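5. In performance tests of servers with long-lived connections, changing these two parameters solves the problem. With concurrent short-lived connections it is not enough: when each connection lasts about 10 ms, the ports still run out. That case requires changing the TIME_WAIT duration and needs further investigation.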
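A back-of-the-envelope calculation shows why short connections exhaust the range so quickly. The sketch assumes that a closed connection holds its local port for the full 60 s and that no TIME_WAIT port is reused; the figure of 100 concurrent workers is hypothetical, and both function names are illustrative.

```python
def max_sustained_rate(port_count, time_wait_seconds=60):
    """Upper bound on new connections per second to one remote endpoint when
    every closed connection holds its local port for `time_wait_seconds`."""
    return port_count / time_wait_seconds


def seconds_until_exhaustion(port_count, concurrency, connection_seconds):
    """Time to use every port once if `concurrency` workers each open a new
    connection every `connection_seconds` and no port is released meanwhile."""
    rate = concurrency / connection_seconds
    return port_count / rate


ports = 65535 - 8192 + 1  # range from the test machine, bounds inclusive
limit = max_sustained_rate(ports)
# Hypothetical load: 100 workers, each connection lasting 10 ms.
burn = seconds_until_exhaustion(ports, concurrency=100, connection_seconds=0.010)
print(f"{limit:.0f} connections/s sustainable, range used up in {burn:.1f} s")
```

Under these assumptions the range sustains fewer than 1,000 new connections per second, and the hypothetical load uses every port in under 6 seconds, well before the first port leaves TIME_WAIT.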

 My understanding of low-level TCP/IP configuration is only partial, so corrections and pointers are very welcome ~~O(∩_∩)O~
