Fixing Nested groups with winbind part 1: Why the hell is it broken?

I've previously mentioned nested groups not working on our Active Directory/winbind/FreeBSD setup. For some reason, groups in groups were not properly unfolded by winbind. I suspected this was because the "domain functional level" was "Windows 2000 mixed", which is more compatible with older DCs. By now, we've moved to a new DC and raised the level to "Windows 2003 native". Unfortunately, this didn't help one tiny bit. Still broken.

For example, the user mkooijma is present in various groups on our AD. Most of these groups are also members of the Actievelingen group, so mkooijma should be too. Yet running id mkooijma gives:

uid=10008(mkooijma) gid=10000(Domain Users) groups=10000(Domain Users),
11001(Beheer), 11013(KasCo), 11004(BoCie), 11019(NoiZiA), 11029(Webredactie)


So, no Actievelingen group there.

Again motivated to start digging into source code to find out why this is, I ended up finding a nice big FIXME in the code. That part of the code, which was used by the id utility on FreeBSD (and presumably also by the file system), does not support nested groups.

Seeing a fine opportunity to do some coding, I've hacked up a couple of lines of code to support nested groups there. And I'll be damned, but it worked. Running id mkooijma now gives:

uid=10008(mkooijma) gid=10000(Domain Users) groups=10000(Domain Users),
11001(Beheer), 11013(KasCo), 10001(Actievelingen), 11004(BoCie),
11019(NoiZiA), 10002(Webmasters.nested), 11029(Webredactie)


Nice job, so I submitted my patch to the samba-technical mailing list yesterday. I also started hanging out in #samba-technical, where I got into a discussion with Wilco (whom I happen to know IRL) this morning about my patch. We discussed how I could make it not break on groups that are (indirectly) members of themselves, or on users that would end up in the same group multiple times. A few minutes into this discussion, vl walked in on IRC. Since he has apparently worked on winbind group mappings before, he pointed out a few more fundamental flaws in my approach.
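
To make the two failure cases from that discussion concrete (a group that indirectly contains itself, and a group reachable along more than one path), here is a minimal sketch of expanding nested groups with a "seen" marker. The struct group_graph model, the MAX_GROUPS limit and the function names are all made up for illustration; this is not the patch and not how winbind stores group data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_GROUPS 8

/* Hypothetical in-memory model: parent[g][p] is true when group g is a
 * direct member of group p. */
struct group_graph {
    size_t ngroups;
    bool parent[MAX_GROUPS][MAX_GROUPS];
};

/* Mark group g and every group reachable from it through "member of"
 * links. The seen[] array stops the recursion on cycles and prevents a
 * group from being visited twice. */
static void expand(const struct group_graph *gg, size_t g,
                   bool seen[MAX_GROUPS])
{
    assert(g < gg->ngroups);
    if (seen[g])
        return;                 /* cycle or second path: already handled */
    seen[g] = true;
    for (size_t p = 0; p < gg->ngroups; p++)
        if (gg->parent[g][p])
            expand(gg, p, seen);
}

/* Expand a user's direct groups into the full nested set.
 * Writes the distinct group indices to out[] in ascending order and
 * returns how many were written. */
size_t nested_groups(const struct group_graph *gg,
                     const size_t *direct, size_t ndirect,
                     size_t out[MAX_GROUPS])
{
    bool seen[MAX_GROUPS] = { false };
    size_t n = 0;

    assert(gg->ngroups <= MAX_GROUPS);
    for (size_t i = 0; i < ndirect; i++)
        expand(gg, direct[i], seen);
    for (size_t g = 0; g < gg->ngroups; g++)
        if (seen[g])
            out[n++] = g;
    return n;
}
```

With two direct groups that are both members of a third, and that third group in a loop with a fourth, nested_groups reports each of the four groups exactly once and terminates.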

It turns out that fixing the getgrent NSS interface is not really the way to go, for two reasons.

• Samba's getgrent interface is not really working that well. It does not support everything it should, such as nested groups. Yet the code is stable now, and trying to support nested groups without accounting for all possible scenarios will make it unstable. Or, more to the point:

17:01 < blathijs> okay, so the current code doesn't work but doesn't break
17:01 < blathijs> but my patch makes it break :-)

• Using getgrent for finding the groups a user is in is terribly inefficient. In short, getgrent returns a list of members for a group. So, to get the groups a user is in using getgrent, the code has to iterate over all existing groups and check whether the user is in each one. Doable for a couple of groups, but it will not scale to hundreds or thousands of groups and users.

Also, since Active Directory and LDAP store group membership as "memberOf" attributes for the users, creating a list of members for a group probably involves iterating over all users to find members (not sure about that though).
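
The scan described in the second point looks roughly like this. It is a minimal sketch using the standard setgrent/getgrent/endgrent calls; the helper names group_lists_user and groups_by_scanning are made up, and this is not the actual FreeBSD libc code.

```c
#define _XOPEN_SOURCE 700   /* ask for the XSI group database functions */
#include <assert.h>
#include <grp.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/* True when user appears in the member list of this group entry. */
bool group_lists_user(const struct group *gr, const char *user)
{
    assert(gr != NULL && user != NULL);
    if (gr->gr_mem == NULL)
        return false;
    for (char **m = gr->gr_mem; *m != NULL; m++)
        if (strcmp(*m, user) == 0)
            return true;
    return false;
}

/* Walk the entire group database and store up to max gids of groups
 * that list user as a member. Returns the number of matching groups,
 * which may be larger than max. Every call touches every group, which
 * is the scaling problem described above. */
int groups_by_scanning(const char *user, gid_t *gids, int max)
{
    struct group *gr;
    int found = 0;

    setgrent();
    while ((gr = getgrent()) != NULL) {
        if (group_lists_user(gr, user)) {
            if (found < max)
                gids[found] = gr->gr_gid;
            found++;
        }
    }
    endgrent();
    return found;
}
```

Note that the cost is per lookup: the loop runs over all groups (and all their member lists) for each user that is resolved.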

It turns out there is a getgrouplist function in FreeBSD and there is also a getgrouplist function in winbind. Yet, when calling getgrouplist, this call is not forwarded to winbind through nsswitch, but implemented by iterating with getgrent as described above (getgrent itself is forwarded through nsswitch). So, FreeBSD should just call getgrouplist through nsswitch and be done with it?

Also, these features are supported on Linux, so it should be possible. Wilco pointed out that NetBSD does support this and has nicely documented how. Basically, the problem is that the getgrouplist function uses one parameter as both input and output, which makes it unsuitable for forwarding through nss. NetBSD fixes this by making getgrouplist a wrapper around a (new) getgroupmembership function, which has a slightly different interface that is compatible with nss.
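
The wrapper idea can be sketched as follows. The parameter list of sketch_getgroupmembership follows my understanding of the NetBSD interface (buffer size as plain input, count as separate output), so treat it as an assumption; both function names and the sample gids are hypothetical, and no real nss dispatch happens here.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical nss-style backend: maxgrp is input only and *ngroups is
 * output only. The extra gids are made-up sample data standing in for
 * a real lookup. Returns 0 if everything fit, -1 if truncated. */
static int sketch_getgroupmembership(const char *name, gid_t basegid,
                                     gid_t *groups, int maxgrp,
                                     int *ngroups)
{
    static const gid_t extra[] = { 11001, 11013, 10001 };
    int total = 0;

    (void)name;                      /* a real backend would look this up */
    if (total < maxgrp)
        groups[total] = basegid;     /* primary group goes first */
    total++;
    for (size_t i = 0; i < sizeof(extra) / sizeof(extra[0]); i++) {
        if (total < maxgrp)
            groups[total] = extra[i];
        total++;
    }
    *ngroups = total;                /* full count, even when truncated */
    return total <= maxgrp ? 0 : -1;
}

/* getgrouplist-style front end: *ngroups is the buffer size on entry
 * and the group count on return. The wrapper splits that in/out
 * parameter before handing off to the backend. */
int sketch_getgrouplist(const char *name, gid_t basegid,
                        gid_t *groups, int *ngroups)
{
    assert(name != NULL && groups != NULL && ngroups != NULL);
    int maxgrp = *ngroups;           /* input half of the parameter */
    return sketch_getgroupmembership(name, basegid, groups,
                                     maxgrp, ngroups);
}
```

The caller-facing function keeps its familiar shape, while the function that would be dispatched through nss only sees parameters that are purely input or purely output.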

Yet, it seems that this getgroupmembership nss function is not implemented by winbind. This probably means NetBSD was changed to support this with other nss backends, but nobody got around to adding the function to winbind yet. But we know that Linux does support this using nss and winbind, so how is that implemented then?

It turns out that Linux has implemented something similar: there, getgrouplist is a wrapper around an nss-compatible function, initgroups_dyn. Its interface is practically compatible with getgroupmembership; the only difference is that getgroupmembership needs a preallocated buffer, while initgroups_dyn can allocate and resize the buffer itself when needed (though this seems like a better idea, the function must still be wrapped in getgrouplist, so it might not really matter anyway...).
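
To illustrate the "callee resizes the buffer" half of that difference, here is a minimal sketch of appending gids to a growable array. The start/size/groupsp/limit parameters mirror how I understand the initgroups_dyn convention, which is an assumption; sketch_append_gid itself is a hypothetical helper and not glibc or winbind code.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Append one gid to a growable array owned by the callee side.
 * *start is the number of entries in use, *size the allocated
 * capacity, and limit an upper bound on the capacity (0 or less means
 * no bound). Returns 0 on success, -1 if the limit is reached or
 * allocation fails. */
int sketch_append_gid(gid_t gid, long *start, long *size,
                      gid_t **groupsp, long limit)
{
    assert(start != NULL && size != NULL && groupsp != NULL);
    if (*start == *size) {
        long newsize = *size > 0 ? *size * 2 : 4;

        if (limit > 0 && newsize > limit)
            newsize = limit;
        if (newsize <= *size)
            return -1;               /* cannot grow past the limit */

        gid_t *grown = realloc(*groupsp, (size_t)newsize * sizeof(gid_t));
        if (grown == NULL)
            return -1;               /* old buffer is still valid */
        *groupsp = grown;
        *size = newsize;
    }
    (*groupsp)[(*start)++] = gid;
    return 0;
}
```

A caller starts with start = 0, size = 0 and a NULL buffer, appends as many gids as it finds, and frees the buffer afterwards. With a preallocated buffer the caller instead has to guess a size up front and retry when the result is truncated.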

Anyway, I now have a clear view of what has to be done. I will either:

• Port getgrouplist nss code from NetBSD to FreeBSD and implement getgroupmembership in winbind (for which I can probably borrow code from nss_ldap, which I think supports both Linux and NetBSD).
• Port getgrouplist nss code from Linux. This requires no winbind changes, since initgroups_dyn is already implemented in winbind.

Just now, I've found I'm not the only one that wants these changes. So I'll do some more exploring about the necessary changes soon, but now it's time for dinner.

Edit: I've looked at both Linux and NetBSD libc, and it seems NetBSD is the way to go. Both FreeBSD and NetBSD have taken their libc implementations from something called 4.4BSD-Lite2, meaning that the two are already quite similar, in contrast to Linux, whose libc is structured quite differently.