Puppet: System Administration Automated

Support

Ticket #1095 (closed defect: fixed)

Opened 9 months ago

Last modified 7 months ago

Puppetmaster leaving half-open connections

Reported by: fs Assigned to: community
Priority: normal Milestone: 0.24.5
Component: library Version: 0.24.4
Severity: normal Keywords:
Cc: Triage Stage: Needs more info
Attached Patches: None Complexity: Unknown

Description

After a period of time ranging from a few hours to several days, puppetmaster begins leaving half open TCP connections in a CLOSE_WAIT state. It usually seems to happen to connections from clients, though at least once I've seen it hit the database connection (MySQL). Here's an example:

[root@lorien ~]# lsof -i | grep 8140
puppetd   13420     root    7u  IPv4 48150014       TCP lorien.wpi.edu:52225->lorien.wpi.edu:8140 (ESTABLISHED)
puppetmas 13744   puppet   10u  IPv4 47981997       TCP *:8140 (LISTEN)
puppetmas 13744   puppet  205u  IPv4 48146861       TCP lorien.wpi.edu:8140->DELENN.WPI.EDU:63688 (CLOSE_WAIT)
puppetmas 13744   puppet  206u  IPv4 48145681       TCP lorien.wpi.edu:8140->IVANOVA.WPI.EDU:54630 (CLOSE_WAIT)
puppetmas 13744   puppet  208u  IPv4 48146636       TCP lorien.wpi.edu:8140->DELENN.WPI.EDU:63687 (CLOSE_WAIT)
puppetmas 13744   puppet  210u  IPv4 48146848       TCP lorien.wpi.edu:8140->IVANOVA.WPI.EDU:58605 (CLOSE_WAIT)

Once puppetmaster starts leaking sockets like this, it seems unable to answer any new requests. In this example, you can see that the puppet client on the local machine (lorien) has opened a connection to puppetmaster, but puppetmaster has not responded. None of the log files on either master or client show that any progress has been made.

Sending a HUP to the server generates "Restarting" and "Shutting down" messages in syslog, but it never restarts. lsof shows that there are puppetmaster processes hanging around keeping the original set of half open sockets open, but nothing is listening for new connections anymore:

[root@lorien ~]# lsof -i | grep 8140
puppetmas 13744   puppet  205u  IPv4 48146861       TCP lorien.wpi.edu:8140->DELENN.WPI.EDU:63688 (CLOSE_WAIT)
puppetmas 13744   puppet  206u  IPv4 48145681       TCP lorien.wpi.edu:8140->IVANOVA.WPI.EDU:54630 (CLOSE_WAIT)
puppetmas 13744   puppet  208u  IPv4 48146636       TCP lorien.wpi.edu:8140->DELENN.WPI.EDU:63687 (CLOSE_WAIT)
puppetmas 13744   puppet  210u  IPv4 48146848       TCP lorien.wpi.edu:8140->IVANOVA.WPI.EDU:58605 (CLOSE_WAIT)

A full restart of puppetmaster appears to be the only way to get things flowing again.

This is on 0.24.1 plus the patch from ticket 959. Let me know what other debugging info you'd like me to gather up.

Change History

02/26/08 14:14:22 changed by DavidS

Please supply at least your Distro and Ruby version. It might also help to run the puppetmasterd with --trace and watch whether some hidden error occurs.

02/29/08 08:25:10 changed by jamtur01

  • stage changed from Unreviewed to Needs more info.

04/08/08 17:55:46 changed by luke

  • status changed from new to closed.
  • resolution set to fixed.

I expect that this is fixed now that we're no longer using keep-alive.

Please reopen if it's still a problem in 0.24.4.

05/07/08 20:40:46 changed by fs

  • version changed from 0.24.1 to 0.24.4.

It looks like I'm seeing the same problem creeping back in on 0.24.4 running on Fedora 8. I tried switching from webbrick to mongrel behind an apache load balancer, and still had the same symptoms occur.

[fs@lorien webrick]$ rpm -q puppet puppet-0.24.4-1.fc8 [fs@lorien webrick]$ rpm -q ruby ruby-1.8.6.114-1.fc8

I have puppetmaster running with debug and trace on right now; I'll post the captured output once the problem reoccurs.

05/07/08 22:17:40 changed by luke

  • milestone set to 0.24.5.

05/09/08 14:16:39 changed by fs

I was able to reproduce the problem with trace and debug, but there is no additional information or any kind of errors in the output. I can still post the entire transcript if you like, though.

Any suggestions on how else to go about gathering troubleshooting information?

05/09/08 20:16:07 changed by fs

More evidence that the OS is accepting the connection, but puppetmaster is never noticing:

[root@lorien ~]# lsof -i | grep 8140
puppetmas 14849 puppet    9u  IPv4 121846444       TCP *:8140 (LISTEN)
puppetmas 14849 puppet  202u  IPv4 122681142       TCP lorien.wpi.edu:8140->NOC1.WPI.EDU:57493 (CLOSE_WAIT)
puppetmas 14849 puppet  206u  IPv4 122681360       TCP lorien.wpi.edu:8140->GKAR.WPI.EDU:61305 (CLOSE_WAIT)
puppetmas 14849 puppet  208u  IPv4 122681924       TCP lorien.wpi.edu:8140->NOC1.WPI.EDU:57494 (CLOSE_WAIT)
puppetmas 14849 puppet  209u  IPv4 122681930       TCP lorien.wpi.edu:8140->GKAR.WPI.EDU:61306 (CLOSE_WAIT)
puppetd   14961   root    7u  IPv4 123868883       TCP lorien.wpi.edu:60217->lorien.wpi.edu:8140 (ESTABLISHED)
[root@lorien ~]# netstat --inet | grep 8140
tcp        0      0 lorien.wpi.edu:8140         lorien.wpi.edu:60217        SYN_RECV    
tcp        0      0 lorien.wpi.edu:8140         LENNIER.WPI.EDU:50599       SYN_RECV    
tcp        0      0 lorien.wpi.edu:8140         centos.wpi.edu:52918        SYN_RECV    
tcp        0      0 lorien.wpi.edu:8140         ERWIN.WPI.EDU:50847         SYN_RECV    
tcp      117      0 lorien.wpi.edu:8140         LONDO.WPI.EDU:54496         ESTABLISHED 
tcp       38      0 lorien.wpi.edu:8140         GKAR.WPI.EDU:61305          CLOSE_WAIT  
tcp      118      0 lorien.wpi.edu:8140         LENNIER.WPI.EDU:63643       CLOSE_WAIT  
tcp       38      0 lorien.wpi.edu:8140         NOC1.WPI.EDU:57494          CLOSE_WAIT  
tcp      117      0 lorien.wpi.edu:8140         NOC1.WPI.EDU:60061          ESTABLISHED 
tcp        0    117 lorien.wpi.edu:60217        lorien.wpi.edu:8140         ESTABLISHED 
tcp       38      0 lorien.wpi.edu:8140         GKAR.WPI.EDU:61306          CLOSE_WAIT  
tcp       38      0 lorien.wpi.edu:8140         NOC1.WPI.EDU:57493          CLOSE_WAIT  
tcp      117      0 lorien.wpi.edu:8140         GKAR.WPI.EDU:61311          ESTABLISHED 
tcp      117      0 lorien.wpi.edu:8140         TRAPEZE.WPI.EDU:56065       ESTABLISHED 
tcp      118      0 lorien.wpi.edu:8140         LENNIER.WPI.EDU:65165       CLOSE_WAIT