Graham Sutherland / Polynomial's Post

In Reply To: this post

I do have a Grafana alert set up for "the CPU has been slammed solid for more than an hour", but it turns out the logic for it was broken so the alert never got sent.
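For illustration only, here is a minimal Python sketch of the kind of "sustained for more than an hour" check that such an alert is meant to express. It is not the actual Grafana rule, and the post doesn't say what was wrong with it; the function name, the 90% threshold, and the sample format are all assumptions.

```python
from typing import Iterable, Tuple


def sustained_above(
    samples: Iterable[Tuple[float, float]],
    threshold: float = 90.0,        # assumed CPU % that counts as "slammed"
    min_duration_s: float = 3600.0  # "more than an hour"
) -> bool:
    """Return True if CPU stayed at or above threshold for min_duration_s.

    samples are (unix_timestamp_seconds, cpu_percent) pairs sorted by time.
    """
    run_start = None  # timestamp where the current high-CPU run began
    for ts, cpu in samples:
        if cpu >= threshold:
            if run_start is None:
                run_start = ts
            if ts - run_start >= min_duration_s:
                return True
        else:
            # Any sample below the threshold resets the run.
            run_start = None
    return False
```

One reason to write it this way: a single dip below the threshold resets the timer, so short bursts of load never trigger it, only a genuinely pinned CPU.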

going through my metrics, I can see that the server rack's energy use was up by roughly 2 kWh/day for the past two days (about 4 kWh extra in total), so this bug probably cost me about £1 in electricity.
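The arithmetic behind that estimate, as a quick Python sketch. The 2 kWh/day and two-day figures come from the post; the £0.25/kWh unit price is an assumption, picked because it is what the £1 total implies.

```python
extra_kwh_per_day = 2.0    # from the metrics in the post
days = 2                   # "for the past two days"
price_gbp_per_kwh = 0.25   # ASSUMPTION: implied by the ~£1 total, not a quoted tariff

extra_kwh = extra_kwh_per_day * days       # total extra energy used
cost_gbp = extra_kwh * price_gbp_per_kwh   # what that energy cost

# 2 kWh spread over 24 hours, expressed as a constant extra draw in watts
avg_extra_watts = extra_kwh_per_day * 1000 / 24

print(f"extra energy: {extra_kwh:.1f} kWh")
print(f"estimated cost: £{cost_gbp:.2f}")
print(f"average extra draw: {avg_extra_watts:.0f} W")
```

So the runaway scans added something like 80 W of continuous draw averaged over the day.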

Comments

Graham Sutherland / Polynomial

2 months ago

In response to this post

from what I can tell, the middleware bug is something to do with the contents of /dev changing during the execution of a cleanup script that runs periodically, which would explain why it's a rare edge-case.

looking through the logs, it might've been an HBA hiccup, because it did complain about something on /dev/da1, but it's hard to line up the timing because I don't know exactly when the script started.

I just found the actual answer to this. /etc/periodic/security/ contains two periodic scripts: 100.chksetuid and 110.neggrpperm.

by default (/etc/defaults/periodic.conf) both are enabled and configured to run daily. they use `find` to scan your system for setuid/setgid files and for files with negative group permissions (where the group has less access than everyone else).

the problem is that this gets run *per jail*, so if the jails mount large datasets, every jail walks them with `find` and it eats a ton of CPU time for several hours at a time.
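To give a feel for why this is expensive, here is a rough Python sketch of the same idea: walk a tree and check every file's mode bits. It is not what 100.chksetuid actually runs (that's a `find` invocation with more conditions, plus a diff against the previous run); `is_setid` and `scan_setid` are hypothetical names made up for this example.

```python
import os
import stat


def is_setid(mode: int) -> bool:
    """True for a regular file with the setuid or setgid bit set."""
    return stat.S_ISREG(mode) and bool(mode & (stat.S_ISUID | stat.S_ISGID))


def scan_setid(root: str):
    """Walk root; return (paths with setuid/setgid set, files examined)."""
    hits = []
    examined = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so symlinks are not followed
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # file vanished or is unreadable; skip it
            examined += 1
            if is_setid(mode):
                hits.append(path)
    return hits, examined
```

The work is proportional to the number of files examined, not the number of hits, so a dataset with millions of files costs the same whether or not anything in it is setuid. Run that once per jail against the same mounted datasets and the hours add up.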

by Graham Sutherland / Polynomial ; 7 hours ago
