Puppet: System Administration Automated

Support

Ticket #791 (reopened defect)

Opened 1 year ago

Last modified 5 months ago

Users and groups created mid-transaction are not found

Reported by: porridge Assigned to: luke
Priority: normal Milestone:
Component: user Version: 0.24.1
Severity: normal Keywords: uid cache exec
Cc: marthag@mit.edu, jbq@caraldi.com Triage Stage: Accepted
Attached Patches: None Complexity: Medium

Description

I recently noticed the following:

Aug 23 18:09:35 extdns03 puppetd[987]: [ID 702911 daemon.error] (//extdns03/common_foglight/foglight_agents_setup/Exec[/opt/foglight/script/setup.sh]) Failed to call refresh on Exec[/opt/foglight/script/setup.sh]: Invalid user: foglight

While I'm reasonably sure that the user did exist at that point. So after long and painful debugging I did the following:

extdns01# RUBYLIB=. irb
irb(main):001:0> require 'puppet'
=> true
irb(main):002:0> Puppet::Util.uid('boom')
=> nil

[ Now, in a different window, ran "adduser boom" ]

irb(main):003:0> Puppet::Util.uid('boom')
=> nil
irb(main):004:0>

I guess puppet should flush the uid/gid cache on any user/group operation, or possibly even every time.

Change History

08/29/07 20:25:56 changed by wyvern

I had this for gid caching too. It was biting me when I had this situation: I wanted to create say /etc/bind/foo owned by bind/bind but wanted the package to create the user if possible. The package would be installed first, but then within the same run puppet would try and create the file and complain that it couldn't find the bind group.

The workaround I did which seems to fix it is to put a Etc.endgrent() call in the posix.rb/gid() call. Presumably an equivalent Etc.endpwent() in the uid() call would be in order?

However when I had some debug in get_posix_fields it showed gid() iterating over all the groups which seems suboptimal compared to just looking for that exact group.

09/04/07 23:23:29 changed by michael

  • owner changed from luke to michael.
  • status changed from new to assigned.
  • stage changed from Unreviewed to Accepted.
  • milestone set to misspiggy.

09/04/07 23:23:57 changed by michael

  • owner changed from michael to luke.
  • status changed from assigned to new.

09/04/07 23:31:43 changed by luke

  • status changed from new to assigned.

The reason that we're iterating over all of the entries, instead of just looking for a specific entry, is that some platforms always return the current user when you call 'getXbyY'. It looks like a system library bug, since it happens in perl, also.

I think the platform we encountered it on was Debian, but I do not remember. You should be able to track the details down by following the change history of util.rb.

09/05/07 23:35:12 changed by wyvern

many thanks for the pointer - looks like change: 6fe01cedeb2fe00df62bf63d7f7375d18d0663ca (new to git I'm afraid)

I wish my ruby/puppet internals skills were better

11/09/07 00:16:07 changed by DavidS

Hmm, adding Etc.endpwent() did not help. Also it looks like lib/puppet/util/posix.rb already evolved further. This bug is still reproducible in git.

11/23/07 23:04:35 changed by luke

  • status changed from assigned to closed.
  • resolution set to fixed.

Fixed in [dbe70a10dd5d3d29f6529b1a0e6b6545308c6a7d].

In my testing, the endgrent/endpwent did fix the problems. It was definitely a Ruby caching problem, not a Puppet cache.

12/07/07 19:06:10 changed by wyvern

FWIW I think I've found a trickier one to resolve :( Either that or as DavidS says, it's not a 100% fix.

Part of the install I'm doing sets up LDAP for uid/gid - if this is done during a puppet run then it seems to think that the user doesn't exist (looking it up in the cache) and then it fails to create the user.

...
notice: //basenode_vars/basenode_node_vars/mer_basenode_node_vars/somebox.example.com
/basenode/basenode_node/ldap::ldap/ldap::core/Service[nscd]:
 Triggering 'refresh' from 13 dependencies
...
err: //basenode_vars/basenode_node_vars/mer_basenode_node_vars/somebox.example.com
/basenode/basenode_node/hids::hids/User[osirisd]/ensure: change
 from absent to present failed: Could not create user osirisd: Execution of '/us
r/sbin/useradd osirisd' returned 2304: useradd: user osirisd exists

02/19/08 20:01:05 changed by marthag

  • stage changed from Accepted to Unreviewed.
  • version changed from 0.23.2 to 0.24.1.
  • milestone deleted.

This is broken again in 0.24.1. This manifest:

  file { "/etc/bind/named.conf.options":
    mode    => "644",
    source  => "puppet:///dns/named.conf.options",
    owner   => "root",
    group   => "bind",
    require => Package["bind9"],
  }

  package { "bind9": ensure => present; }

  service { "bind9":
    ensure    => running,
    enable    => true,
    pattern   => "named",
    subscribe => File["/etc/bind/named.conf.options"],
  }

produces the following:

notice: Starting catalog run
debug: Prefetching aptitude resources for package
debug: Executing '/usr/bin/dpkg-query -W --showformat '${Status} ${Package} ${Version}\n''
debug: Puppet::Type::Package::ProviderAptitude: Executing '/usr/bin/dpkg-query -W --showformat '${Status} ${Package} ${Version}\n''
debug: //dns::named/Service[bind9]/subscribe: subscribes to File[/etc/bind/named.conf.options]
debug: //dns::named/File[/etc/bind/named.conf.options]/require: requires Package[bind9]
debug: Puppet::Type::Package::ProviderAptitude: Executing '/usr/bin/dpkg-query -W --showformat ${Status} ${Package} ${Version}\n bind9'
debug: //dns::named/Package[bind9]: Changing ensure
debug: //dns::named/Package[bind9]: 1 change(s)
debug: Puppet::Type::Package::ProviderAptitude: Executing '/usr/bin/aptitude -y -o DPkg::Options::=--force-confold install bind9'
notice: //dns::named/Package[bind9]/ensure: ensure changed 'purged' to 'present'debug: //dns::named/File[/etc/bind/named.conf.options]/checksum: Initializing checksum hash
debug: //dns::named/File[/etc/bind/named.conf.options]: Creating checksum {md5}d6b678ac90fd6ab163d74dfe5d68c2c9
debug: Calling fileserver.describe
err: //dns::named/File[/etc/bind/named.conf.options]: Failed to retrieve current state of resource: Could not find group bind
notice: //dns::named/Service[bind9]: Dependency file[/etc/bind/named.conf.options] has 1 failures
warning: //dns::named/Service[bind9]: Skipping because of failed dependencies
debug: Finishing transaction -613724178 with 1 changes

02/19/08 20:52:02 changed by luke

  • status changed from closed to reopened.
  • resolution deleted.
  • summary changed from puppetd's uid caching breaks user parameter of "exec" to Users and groups created mid-transaction are not found.

02/19/08 20:52:45 changed by marthag

  • cc set to marthag@mit.edu.

02/27/08 10:14:00 changed by jbq

Hi there, I'm facing the same problem:

Here is the first run with puppetd --test:

Feb 27 09:54:25 nc80 puppetd[5944]: Starting catalog run
Feb 27 09:54:25 nc80 puppetd[5944]: (//Node[nc80]/Exec[disk-space 90]/returns) executed successfully
Feb 27 09:54:25 nc80 puppetd[5944]: (//Node[nc80]/testuser/User[testuser]/ensure) created
Feb 27 09:54:26 nc80 puppetd[32484]: Lock file /var/lib/puppet/state/puppetdlock exists; skipping catalog run
Feb 27 09:54:28 nc80 puppetd[5944]: (//Node[nc80]/testuser/Ssh_keys[testuser]/File[/var/testuser/.ssh]/ensure) change from absent to directory failed: Could not set directory on ensure: Could not find group testuser at /home/svnup/_svn/conf/puppet/manifests/site.pp:73
Feb 27 09:54:28 nc80 puppetd[5944]: (//Node[nc80]/testuser/Ssh_keys[testuser]/File[/var/testuser/.ssh/known_hosts]/ensure) created
Feb 27 09:54:28 nc80 puppetd[5944]: (//Node[nc80]/testuser/Ssh_keys[testuser]/File[/var/testuser/.ssh/id_dsa.pub]/ensure) created
Feb 27 09:54:28 nc80 puppetd[5944]: (//Node[nc80]/testuser/Ssh_keys[testuser]/File[/var/testuser/.ssh/id_dsa]) Dependency file[/var/testuser/.ssh] has 1 failures
Feb 27 09:54:28 nc80 puppetd[5944]: (//Node[nc80]/testuser/Ssh_keys[testuser]/File[/var/testuser/.ssh/id_dsa]) Skipping because of failed dependencies
Feb 27 09:54:31 nc80 puppetd[5944]: (//Node[nc80]/svnup/Exec[svn up]/returns) executed successfully
Feb 27 09:54:31 nc80 puppetd[5944]: Finished catalog run in 6.43 seconds

The second run of puppetd --test works however:

Feb 27 09:55:09 nc80 puppetd[6121]: Ignoring cache
Feb 27 09:55:10 nc80 puppetd[6121]: Caching catalog at /var/lib/puppet/state/localconfig.yaml
Feb 27 09:55:10 nc80 puppetd[6121]: Starting catalog run
Feb 27 09:55:12 nc80 puppetd[6121]: (//Node[nc80]/testuser/Ssh_keys[testuser]/File[/var/testuser/.ssh]/group) group changed 'root' to 'testuser'
Feb 27 09:55:13 nc80 puppetd[6121]: (//Node[nc80]/Exec[disk-space 90]/returns) executed successfully
Feb 27 09:55:15 nc80 puppetd[6121]: (//Node[nc80]/svnup/Exec[svn up]/returns) executed successfully
Feb 27 09:55:15 nc80 puppetd[6121]: (//Node[nc80]/testuser/Ssh_keys[testuser]/File[/var/testuser/.ssh/id_dsa]/ensure) created
Feb 27 09:55:15 nc80 puppetd[6121]: Finished catalog run in 5.38 seconds

Then the first run that is triggered by the long-running puppetd daemon fails:

Feb 27 09:59:27 nc80 puppetd[32484]: Starting catalog run
Feb 27 09:59:28 nc80 puppetd[32484]: (//Node[nc80]/Exec[disk-space 90]/returns) executed successfully
Feb 27 09:59:30 nc80 puppetd[32484]: (//Node[nc80]/svnup/Exec[svn up]/returns) executed successfully
Feb 27 09:59:32 nc80 puppetd[32484]: (//Node[nc80]/testuser/Ssh_keys[testuser]/File[/var/testuser/.ssh]) Failed to retrieve current state of resource: Could not find group testuser
Feb 27 09:59:32 nc80 puppetd[32484]: (//Node[nc80]/testuser/Ssh_keys[testuser]/File[/var/testuser/.ssh/id_dsa]) Dependency file[/var/testuser/.ssh] has 1 failures
Feb 27 09:59:32 nc80 puppetd[32484]: (//Node[nc80]/testuser/Ssh_keys[testuser]/File[/var/testuser/.ssh/id_dsa]) Skipping because of failed dependencies
Feb 27 09:59:32 nc80 puppetd[32484]: Finished catalog run in 5.12 seconds

While the second run of the long-running puppetd daemon works OK:

Feb 27 10:04:33 nc80 puppetd[32484]: Starting catalog run
Feb 27 10:04:33 nc80 puppetd[32484]: (//Node[nc80]/Exec[disk-space 90]/returns) executed successfully
Feb 27 10:04:37 nc80 puppetd[32484]: (//Node[nc80]/svnup/Exec[svn up]/returns) executed successfully
Feb 27 10:04:38 nc80 puppetd[32484]: (//Node[nc80]/testuser/Ssh_keys[testuser]/File[/var/testuser/.ssh]/checksum) checksum changed '{time}Wed Feb 27 09:54:28 +0100 2008' to '{time}Wed Feb 27 09:55:15 +0100 2008'
Feb 27 10:04:39 nc80 puppetd[32484]: Finished catalog run in 5.69 seconds

Note that I didn't had to restart the long-running puppetd daemon. The next run recognizes the new group OK. Looks like puppet does not properly refresh the groups.

This is on Ubuntu 7.10.

02/27/08 23:27:56 changed by jbq

  • cc changed from marthag@mit.edu to marthag@mit.edu, jbq@caraldi.com.

02/29/08 08:32:37 changed by jamtur01

  • complexity changed from Unknown to Medium.
  • stage changed from Unreviewed to Accepted.

04/24/08 07:04:57 changed by luke

  • component changed from library to user.