All entries for Thursday 16 August 2007

August 16, 2007

Monitoring environmental status on Sun servers with ipmitool + Nagios

So, fresh from a successful integration of fmadm and Nagios, I thought I’d have a go at doing the same with some of the environmental information that ipmitool provides: specifically, temperature, voltage and fan status throughout the machine.

On Solaris 10u3 this turns out to be pretty easy: ipmitool can be called from inside a Nagios plugin and its output parsed. In theory I could send every sensor value to Nagios and draw pretty, complicated graphs. In practice, though, I’m not that bothered by the actual values; I just want to know whether any sensor is reporting an error. If one is, I can go to the iLOM for more detail.
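Stripped to its essentials, "just tell me whether anything is reporting an error" is a worst-status-wins rule layered on the Nagios plugin convention: the exit status carries the state (0 OK, 1 WARNING, 2 CRITICAL; 3 is UNKNOWN) and the first line of output is the summary Nagios displays. A minimal sketch of that rule; the helper names worst_status and plugin_line are hypothetical, not part of any library.

```ruby
# Nagios plugin exit codes: Nagios reads the service state from the exit status.
OK_CODE       = 0
WARNING_CODE  = 1
CRITICAL_CODE = 2

LABELS = { OK_CODE => 'OK', WARNING_CODE => 'WARNING', CRITICAL_CODE => 'CRITICAL' }.freeze

# Hypothetical helper: the overall state is the worst per-sensor state,
# which with these codes is simply the numeric maximum (OK if no sensors).
def worst_status(codes)
  codes.max || OK_CODE
end

# Hypothetical helper: the one-line summary Nagios shows next to the state.
def plugin_line(per_sensor)
  details = per_sensor.map { |name, code| "#{name} #{LABELS[code]}" }.join(', ')
  "#{LABELS[worst_status(per_sensor.values)]}: #{details}"
end

# In a real check: puts plugin_line(states); exit worst_status(states.values)
```

Because the codes happen to be ordered by severity, "worst" is just max, which is exactly what the script's higher_of helper computes pairwise.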

#!/opt/csw/bin/ruby
# Nagios/NRPE check: report the worst ipmitool sensor status on a Sun server.

# Nagios plugin exit codes
OK       = 0
WARNING  = 1
CRITICAL = 2

STATUS_NAMES = { OK => :ok, WARNING => :warn, CRITICAL => :critical }

IPMITOOL = '/usr/sfw/bin/ipmitool -I bmc -U root sdr type'

# Run "ipmitool sdr type <sensor>" and return the worst status it reports.
def parse_status(sensor)
  status = OK
  `#{IPMITOOL} #{sensor}`.each_line do |line|
    # Lines look like "name | id | status | entity | reading"; check only
    # the status column so names such as "Fan Redundancy" can't match /nc/.
    field = line.split('|')[2].to_s.strip
    case field
    when /nr/, /cr/ then status = CRITICAL               # non-recoverable, critical
    when /nc/       then status = [status, WARNING].max  # non-critical
    end
    # other values (ok, ns, ...) are deemed OK
  end
  status
end

result = OK
details = []
%w[temperature fan voltage].each do |sensor|
  st = parse_status(sensor)
  details << "#{sensor} #{STATUS_NAMES[st]}"
  result = [result, st].max  # the built-in answer to "higher of a and b"
end
puts "#{STATUS_NAMES[result]}: #{details.join(' ')}"
exit result
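Matching /nc/ or /cr/ against a whole line can also fire on sensor names: "Fan Redundancy", for instance, contains "nc". The parsing can be exercised without a BMC by feeding it canned text. This sketch assumes the "name | id | status | entity | reading" layout of ipmitool's sdr type output; the helper classify_sdr is hypothetical, and the sample lines are invented for illustration, not captured from a real server.

```ruby
OK       = 0
WARNING  = 1
CRITICAL = 2

# Hypothetical helper: worst Nagios state found in "ipmitool sdr type" text.
# Assumes each line is "name | id | status | entity | reading".
def classify_sdr(text)
  status = OK
  text.each_line do |line|
    field = line.split('|')[2].to_s.strip
    case field
    when /nr/, /cr/ then return CRITICAL  # (l/u)nr, (l/u)cr: nothing is worse
    when /nc/       then status = WARNING # (l/u)nc: non-critical
    end
    # "ok", "ns" and anything else leave the state unchanged
  end
  status
end

# Invented sample output, for illustration only.
SAMPLE = <<~SDR
  Fan Redundancy   | 0Ah | ok  | 29.1 | Fully Redundant
  CPU0 Temp        | 30h | ok  |  3.1 | 45 degrees C
  FT0.F0.SPEED     | 40h | ns  | 29.0 | No Reading
SDR
```

Here classify_sdr(SAMPLE) is OK, whereas a whole-line /nc/ test would have flagged the redundancy sensor as a warning.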

I plug this into an NRPE check, and that’s that. Easy.
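One gap worth closing once the check runs under NRPE: if ipmitool itself fails (wrong path, driver not loaded, BMC not answering), the backtick call returns little or nothing and the check ends up reporting OK. Nagios reserves exit code 3 for UNKNOWN, which is the honest answer in that case. A sketch using Ruby's standard Open3 library; the helper name run_sdr is hypothetical.

```ruby
require 'open3'

UNKNOWN = 3  # Nagios plugin exit code for "could not determine state"

# Hypothetical helper: run an ipmitool command given as an argv array.
# Returns [output, nil] on success, or [nil, "UNKNOWN: ..."] if the
# command could not be started or exited non-zero.
def run_sdr(argv)
  output, status = Open3.capture2e(*argv)
  return [output, nil] if status.success?
  [nil, "UNKNOWN: #{argv.first} failed (#{status})"]
rescue SystemCallError => e
  # e.g. Errno::ENOENT when the ipmitool binary is not at that path
  [nil, "UNKNOWN: could not run #{argv.first}: #{e.message}"]
end

# Usage inside the check:
#   out, err = run_sdr(%w[/usr/sfw/bin/ipmitool -I bmc -U root sdr type fan])
#   if err then puts err; exit UNKNOWN end
```

Passing an argv array rather than one string also keeps the shell out of the picture.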

One caveat: on Solaris 10u3, ipmitool runs in a second or so, with no noticeable effect on the machine. On u2, however, the same query takes about a minute, sucking up about 10% CPU on a V40. Not so good. I’m not sure yet whether the problem lies with ipmitool (unlikely, since both releases seem to ship the same binary), the server itself, or the bmc driver. Once I’ve got a V40 running s10u2 with a working iLOM network connection, I’ll try ipmitool’s lan interface (-I lan) instead and see if that’s better.
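To put numbers on the u2/u3 difference, and later on bmc versus lan, the same query can be timed through each interface. A sketch under stated assumptions: -I, -H, -U and -f (a file holding the remote password) are standard ipmitool options, but the helper names ipmitool_argv and time_query are hypothetical, and the host name and password-file path in the usage comment are placeholders.

```ruby
require 'open3'

IPMITOOL_BIN = '/usr/sfw/bin/ipmitool'

# Hypothetical helper: argv for "sdr type <sensor>" over a given interface.
# "bmc" talks to the local BMC driver; "lan" goes over the network to the
# service processor, so it needs a host and credentials.
def ipmitool_argv(interface, sensor, host: nil, user: 'root', password_file: nil)
  argv = [IPMITOOL_BIN, '-I', interface]
  argv += ['-H', host] if host
  argv += ['-U', user]
  argv += ['-f', password_file] if password_file
  argv + ['sdr', 'type', sensor]
end

# Hypothetical helper: wall-clock seconds a command takes, plus a success flag.
def time_query(argv)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  _output, status = Open3.capture2e(*argv)
  [Process.clock_gettime(Process::CLOCK_MONOTONIC) - start, status.success?]
end

# Usage ("sp-hostname" and "/etc/ipmi.pw" are placeholders):
#   [ipmitool_argv('bmc', 'fan'),
#    ipmitool_argv('lan', 'fan', host: 'sp-hostname', password_file: '/etc/ipmi.pw')].each do |argv|
#     secs, ok = time_query(argv)
#     printf("%-4s %6.1fs %s\n", argv[2], secs, ok ? 'ok' : 'failed')
#   end
```

A monotonic clock is used so that clock adjustments during a minute-long run don't distort the measurement.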

