Published by Graham Sutherland / Polynomial



had some mild panic last night when I thought my NAS had been popped by a cryptominer. it was acting strange, so I SSH'd in and found a bunch of Python processes slamming the CPU, all running as root with no associated jail. they were running code passed on the command line rather than from a file, and the imports were threading-related. killing them just made them come back.

in the end it turned out to be a middleware bug in TrueNAS: the code was stuck in a loop doing nothing useful.



Comments


Graham Sutherland / Polynomial

In response to this post

I do have a Grafana alert set up for "the CPU has been slammed solid for more than an hour", but it turns out the logic for it was broken, so the alert never got sent.
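The actual Grafana rule isn't shown here, so this is only a hypothetical sketch of what "slammed solid for more than an hour" should evaluate to: a run of consecutive CPU samples at or above a threshold that spans at least an hour, where any dip below the threshold resets the run. The function name, the 90% threshold and the (timestamp, utilisation) sample format are all assumptions for illustration.

```python
from typing import Optional, Sequence, Tuple


def cpu_pegged_for(samples: Sequence[Tuple[float, float]],
                   threshold: float = 0.9,
                   min_duration_s: float = 3600.0) -> bool:
    """Hypothetical check: True if utilisation stayed >= threshold
    continuously for at least min_duration_s seconds.

    samples: (unix_timestamp, utilisation in 0..1) pairs, sorted by time.
    """
    run_start: Optional[float] = None
    for ts, util in samples:
        if util >= threshold:
            if run_start is None:
                run_start = ts  # start of a new "pegged" run
            if ts - run_start >= min_duration_s:
                return True  # run has lasted long enough to alert
        else:
            run_start = None  # any dip below the threshold resets the run
    return False
```

A one-minute sample series pinned at 100% from t=0 to t=3600 triggers it; the same series with a single low sample in the middle does not. Note that this treats gaps in the data as continuous, which a real alert rule may want to handle differently.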

going through my metrics, I can see that the server rack's energy usage was up by roughly 2 kWh/day for the past two days, so this bug probably cost me about £1 in electricity.
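A back-of-envelope check of those numbers, taking the ~2 kWh/day and ~£1 figures at face value. The average extra draw and the per-kWh price below are derived from those two figures, not quoted from a bill or tariff.

```python
def excess_energy_kwh(kwh_per_day: float, days: float) -> float:
    """Total extra energy used over the period."""
    return kwh_per_day * days


def average_extra_watts(kwh_per_day: float) -> float:
    """Convert a daily energy excess into an average continuous power draw.
    1 kWh = 1000 Wh, spread over 24 hours."""
    return kwh_per_day * 1000.0 / 24.0


def implied_tariff_gbp_per_kwh(cost_gbp: float, energy_kwh: float) -> float:
    """Unit price implied by a total cost and the energy it paid for."""
    return cost_gbp / energy_kwh


energy = excess_energy_kwh(2.0, 2)               # 2 kWh/day for 2 days
watts = average_extra_watts(2.0)                 # ~83 W extra, around the clock
tariff = implied_tariff_gbp_per_kwh(1.0, energy) # ~£1 spread over that energy
print(f"{energy:.1f} kWh extra, ~{watts:.0f} W average, ~£{tariff:.2f}/kWh implied")
# prints: 4.0 kWh extra, ~83 W average, ~£0.25/kWh implied
```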


from what I can tell, the middleware bug has something to do with the contents of /dev changing while a periodic cleanup script is running, which would explain why it's a rare edge case.
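The TrueNAS cleanup code itself isn't shown here, so the following is purely a hypothetical illustration of the general class of bug: a script lists a directory like /dev, then inspects each entry, and a device node can vanish in between (for example if a disk briefly drops off the HBA). The helper names are made up; the sketch takes one snapshot of the listing, stats each entry exactly once, and skips entries that have disappeared rather than retrying.

```python
import os
from typing import Dict, Iterable


def stat_entries(dev_dir: str, names: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper: map each name to its st_mode, skipping any
    entry that disappeared after the directory was listed."""
    modes: Dict[str, int] = {}
    for name in names:
        try:
            st = os.lstat(os.path.join(dev_dir, name))
        except FileNotFoundError:
            # The node vanished between listdir() and lstat(): skip it
            # instead of retrying, so a changing directory can't trap us in a loop.
            continue
        modes[name] = st.st_mode
    return modes


def scan_devices(dev_dir: str = "/dev") -> Dict[str, int]:
    """Take one snapshot of the listing and stat each entry exactly once."""
    return stat_entries(dev_dir, os.listdir(dev_dir))
```

Because the loop only walks a fixed snapshot, it always terminates, even if entries come and go while it runs; the trade-off is that the result may be slightly stale by the time it is returned.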

looking through the logs, it might've been an HBA hiccup, because it did complain about something on /dev/da1, but it's hard to line up the timing because I don't know exactly when the script started.


