All 21 entries tagged Solaris


January 07, 2009

Haproxy / Apache / Solaris follow-up

Follow-up to Solaris SMF and HAProxy won't play nicely from Secret Plans and Clever Tricks

A couple of years ago, I blogged about haproxy, and in particular the incompatibility of its soft-restart mechanism with Solaris’ SMF. A few people have contacted me since then to ask where we got with it, so here’s the answer:

In the end, we decided simply to not use the graceful-restart feature of haproxy. If we ever restart via SMF, we stop the service, then start it again – so there would be a period of unavailability. This works fine.

We very very rarely need to restart haproxy though, so it’s not an issue for us. We use apache and mod_proxy_http to talk to a single haproxy instance, which then balances requests between several http back-ends (java for us, but it would work exactly the same with any HTTP service). On the odd occasions when we need to reconfigure haproxy (usually only once or twice a year), what we do is start another instance of haproxy on a different port, reconfigure apache to connect to the new haproxy instance (if you use a rewrite map for the proxy, then this doesn’t even need an ‘apachectl graceful’), then we can safely restart the first haproxy. Once apache is connecting to the (reconfigured) original haproxy server again, we can remove the second instance.
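
For anyone wanting to copy the rewrite-map trick, here is a minimal Ruby sketch of the switch-over step. It is not our actual tooling: the map file path, the ‘backend’ key and the ports are hypothetical, and it assumes mod_rewrite is set up along the lines of the commented config (a txt: RewriteMap feeding a proxied RewriteRule). mod_rewrite re-reads a txt: map when the file’s modification time changes, which is why no graceful restart should be needed.

```ruby
# Hypothetical Apache side (mod_rewrite + mod_proxy), for context only:
#   RewriteMap  haproxy txt:/etc/apache2/haproxy.map
#   RewriteRule ^/(.*)$ http://${haproxy:backend}/$1 [P]
# The map file holds a single line such as "backend localhost:8080".

MAP_KEY = 'backend' # hypothetical key name used in the RewriteRule above

# Point Apache at a different haproxy instance by rewriting the map file.
# Write to a temporary file first and rename it into place, so Apache never
# sees a half-written map (rename is atomic within one filesystem).
def switch_haproxy(map_path, target)
  tmp = "#{map_path}.tmp.#{Process.pid}"
  File.write(tmp, "#{MAP_KEY} #{target}\n")
  File.rename(tmp, map_path)
end

# Return the host:port the map currently points at, or nil if the key is missing.
def current_haproxy(map_path)
  File.foreach(map_path) do |line|
    key, value = line.split
    return value if key == MAP_KEY
  end
  nil
end
```

Usage would be something like switch_haproxy('/etc/apache2/haproxy.map', 'localhost:8081') (hypothetical path and port) while the first haproxy is restarted, then switching back once it is up again.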

haproxy has been very very stable for us; I don’t think it’s ever crashed (yet!) in the two years since I wrote that blog entry, and on our main server it’s handling between 500,000 and 1,000,000 http requests per day (we have it on a bunch of more minor servers too). Performance is fine; it adds perhaps a couple of milliseconds of latency per request but for our apps that’s not significant. To be honest, we have more problems from apache than from haproxy!
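
For a sense of scale, the daily totals work out to quite modest average rates. A quick back-of-the-envelope sketch (averages only; real traffic peaks during the working day, so peak rates will be higher):

```ruby
SECONDS_PER_DAY = 24 * 60 * 60 # 86,400

# Average request rate implied by a daily total. Real traffic is bursty,
# so the peak rate will be higher than this average.
def average_rate(requests_per_day)
  requests_per_day.fdiv(SECONDS_PER_DAY)
end

[500_000, 1_000_000].each do |per_day|
  puts format('%<day>d requests/day ~ %<sec>.1f requests/s on average',
              day: per_day, sec: average_rate(per_day))
end
```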

So, if you’re looking for a pure load-balancing solution, and you don’t plan on reconfiguring very often (or you can do the same ‘run multiple haproxies and switch between them’ trick), then I think haproxy would be a strong contender. It’s simpler, lighter-weight, and probably faster/more scalable than nginx/lighttpd/apache (though you probably won’t notice the “faster” bit for most web apps).

However, if you want to do other ‘web-server-ish’ things (serve static content, rewrite urls, do SSL…) then you’ll have to have some kind of web server in front, so it would be worth looking at the load-balancers that are built in to either nginx or apache – it may be simpler to just support one server rather than having a chain.

If I had a time machine, I would probably be looking at either nginx, apache + mod_proxy_balancer, or lighttpd. But Apache + haproxy is more than good enough for us, and it’s very much a case of ‘better the devil you know’ – so I probably won’t be migrating any time soon!

July 29, 2008

Opensolaris adventure, part 5; CIFS

Follow-up to Opensolaris adventure, part 4; a quick diversion into iLOM from Secret Plans and Clever Tricks

A user asks

I am only a poor lowly windows user, but I want some of your storage. Can haz CIFS please?

CIFS/SMB is the protocol behind Windows file shares. And, as it happens, setting it up in opensolaris is nice and easy.

I used this as a guide, but NOTE THE TYPO where it says to add the packages SUNWsmbkr & SUNWsmbs; it should say SUNWsmbskr & SUNWsmbs (note the extra ‘s’). As you can see below, if you fail to do this, it’s fixable, but painful.

So, here we go…

# pkg install SUNWsmbs SUNWsmbkr
 ... complains about non-existence of SUNWsmbkr
# pkg install SUNWsmbs
# svccfg import /var/svc/manifest/network/smb/server.xml
# zfs create -o casesensitivity=mixed pool0/cifs0
# zfs set sharesmb=on pool0/cifs0
# smbadm join -w solcifs
... error about "SMB cannot start service"...

Head-scratching follows. Eventually I work out the solution:

# pkg install SUNWsmbskr
... next two steps probably not required if you do it right the first time
# rem_drv smbsrv 
# add_drv -m '* 0640 root sys' smbsrv
# svcadm clear smb/server
# smbadm join -w solcifs
# zfs set sharesmb=name=disk1 pool0/cifs0

victory! windows clients can now connect to ‘disk1’ on my server :-)

The only additional requirement is that, in order to generate the smbpasswd file, each user who wants to connect needs to run ‘passwd’ to regenerate their password. I’m going to see if we can switch to an LDAP backend to get around that.

July 15, 2008

Opensolaris adventure, part 4; a quick diversion into iLOM

Follow-up to Opensolaris adventure, part 3: pkg from Secret Plans and Clever Tricks

Whilst I’ve got a machine that I can happily reboot without pissing anyone off, I thought I’d see if I could do anything to improve the default iLOM setup.

By default, the iLOM GUI console redirection is enabled, but not the serial console. This is ok, but it has some issues. You can’t use the GUI console if there’s any kind of firewall between you and the server, since it initiates connections to the server on random high ports. So ssh-level access to the console would be nice.

Going into the ilom and doing ‘start /SP/console’ produces no output. A bit more googling suggested that doing eeprom console=ttya might help, so I tried that. Now when rebooting I get output, and once the box is booted I can get a console session via ssh, but I also seem to have acquired a bogus, and totally unbootable, new default entry in the grub menu, titled ‘solaris bootenv RC’. I note also that the grub menu isn’t accessible via the serial connection – one of the few things that you really would want to be able to access. So that’s not a roaring success, then…

So, I tried eeprom console=text to put it back, which disabled the text-based console access, but still left the bogus grub entry. So, I edited /rpool/boot/grub/menu.lst and changed the ‘default’ option from 3 to 2 (2 being the new snv_93 entry; GRUB numbers its entries starting from 0). I also commented out the ‘splashimage’ entry. Reboot again, and hey presto! I can now select which image to boot from an ssh connection :-)
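
If you’d rather script the menu.lst change than do it by hand in vi, something like the following Ruby would make the same two edits. It’s a hypothetical helper, not a Solaris tool: it assumes a GRUB legacy menu.lst with a numeric ‘default’ line and a ‘splashimage’ line, and it only transforms text, so back up menu.lst before writing the result over it.

```ruby
# Hypothetical helper: set the GRUB legacy "default" entry (0-based) and
# comment out the splashimage line, leaving every other line untouched.
def fix_menu_lst(text, default_index)
  text.each_line.map do |line|
    case line
    when /\Adefault\s+\d+\s*\z/ then "default #{default_index}\n"
    when /\Asplashimage\s/ then "# #{line}"
    else line
    end
  end.join
end

# Boot entry titles in file order; a title's index is its GRUB entry number.
def boot_titles(text)
  text.each_line.grep(/\Atitle\s/).map { |l| l.sub(/\Atitle\s+/, '').chomp }
end
```

To find the right number, boot_titles(File.read('/rpool/boot/grub/menu.lst')) lists the entries in order; the index in that list is the value for ‘default’.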

I suspect that I’ve broken the ability to do serial access to the machine. But TBH that’s probably not too much of a pain. I should probably check that it’s still possible to put a ‘real’ keyboard/monitor/mouse into the back of the box in extremis, should that be needed. Update: serial access works just fine :-)

Opensolaris adventure, part 3: pkg

Follow-up to Opensolaris adventure part 2: network and mirroring from Secret Plans and Clever Tricks

For my next trick, I’ll be trying out pkg, the opensolaris packaging system.

I have moderately high standards for this; I use ubuntu on the desktop, so apt is my benchmark for how a unix package management system should operate. Ian Murdock (ex-Debian) is behind OpenSolaris’s pkg, so hopefully the experience should be comparable.

Step 1:

# pkg refresh
# pkg install SUNWipkg

... time passes...


So, a basic smoke test passes. I can install packages :-). Now, let’s have a look in the repo and see if we can find anything useful:

# pkg search -r apache

... nothing ...

Hmm… that’s not so good. Where’s apache? A bit of googling suggests that the package is called ‘SUNWapch22’:

# pkg search -r apch
# pkg search -r SUNWapch22
# pkg search -r httpd
basename   file      usr/apache2/bin/httpd     pkg:/SUNWapch2@2.2.3-0.75

Ah-ha! It seems that pkg search doesn’t search package names and descriptions (as aptitude search does), but rather the files each package delivers (the hit above is on a file’s basename). I think that’s a little bit weird; it wouldn’t have occurred to me to specify the name of a file rather than some part of the description, but maybe it makes more sense. For a lot of packages (lynx, say), the name of the binary is the term you’d most likely search on, so it should work well in that circumstance.
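
Since the -r search output is a simple four-column table (INDEX, ACTION, VALUE, PACKAGE, as in the examples in this post), it is easy to turn into a list of installable package names. A small, hypothetical Ruby helper, assuming that layout:

```ruby
# Hypothetical helper: pull the package FMRIs out of 'pkg search -r' output,
# assuming whitespace-separated INDEX, ACTION, VALUE and PACKAGE columns.
def packages_from_search(output)
  output.each_line
        .map(&:split)
        .select { |cols| cols.length >= 4 && cols[3].start_with?('pkg:/') }
        .map { |cols| cols[3] }
        .uniq
end

# Strip the "pkg:/" prefix and the "@version" suffix, leaving a name
# suitable for 'pkg install'.
def package_name(fmri)
  fmri.sub(%r{\Apkg:/}, '').sub(/@.*\z/, '')
end
```

The header line is skipped because its PACKAGE column doesn’t start with ‘pkg:/’; mapping package_name over the result gives the names to pass to pkg install.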

# pkg install SUNWapch2

So now I have apache; import the manifest and enable the service and away we go!

OK, now to try something a bit more exciting; a distribution upgrade to the latest (snv_93) kernel and associated goodies.

# pkg image-update

It downloads for a couple of hours (Sun repos seem to be slower than the ubuntu ones), and finally announces that it’s made me a new boot environment.

zfs list shows me a new set of file systems under rpool/ROOT/opensolaris-1, and beadm list tells me I now have 2 boot environments; with the new one set to become active on a reboot. Off we go then!

I reboot, wait a bit, and try and ssh back into the box. Hmm; connection refused. I fire up the ilom console and take a look. The system appears to be continually rebooting – when the grub screen comes up, the new BE is present, but as soon as grub tries to boot it, the system goes back to the BIOS startup screen and reboots again :-(

So, next time round I select the old boot environment from the grub menu, and I’m back into my 2008.05 environment. So, the fact that the image-update appears to have failed is a bit sucky, but, unlike a failed kernel update in ubuntu (which is rare, but generally leaves you with a broken X and a whole bunch of packages needing to be manually rolled back), I’m seamlessly back to where I started. ZFS FTW!

Anyway, a bit more googling reveals that image-update isn’t quite working yet: you need to mount the new boot environment and run its update_grub by hand.

# mkdir /tmp/foo
# mount -F zfs rpool/ROOT/opensolaris-1 /tmp/foo
# /tmp/foo/boot/solaris/bin/update_grub -R /tmp/foo

Reboot again, and this time when the grub menu comes up, I have a third option, for snv_93. It’s pre-selected, so I let it boot; and it works fine :-) It’s a bit confusing that I still have the ‘opensolaris-1’ item in the grub menu, but it’s not causing any harm so I’m going to leave it there for the moment.

So, now I have a shiny new kernel with all the latest toys. What else do I want?

Well, blastwave and the SFW repositories would be nice:

# pkg set-authority -O {sunfreeware repository URL} sunfreeware
# pkg set-authority -O {blastwave repository URL} blastwave
# pkg refresh
# pkg search -r lynx
INDEX      ACTION    VALUE                     PACKAGE
basename   file      opt/csw/bin/lynx          pkg:/IPSlynx@0.5.11-2.6
basename   file      opt/sfw/bin/lynx          pkg:/IPSFWlynx@0.5.11-5.7

Sorted. Next, some zones, I think…

July 14, 2008

Opensolaris adventure part 2: network and mirroring

Follow-up to Opensolaris adventure, part 1 from Secret Plans and Clever Tricks

OK; this bit wasn’t quite as smooth…

First off, networking. I used this blog post for guidance here. It doesn’t seem like NWAM is really aimed at static-IP server configuration, so the ‘traditional’ approach looks most appropriate.

pfexec bash # cheat
svcadm disable nwam
svcadm enable network/physical:default

echo {hostname} > /etc/hostname.nge0
vi /etc/resolv.conf # add in dns stuff
vi /etc/nsswitch.conf # change 'hosts' entry to 'files dns'
vi /etc/hosts # add IP address for my hostname
vi /etc/defaultrouter #add default router
vi /etc/netmasks #add netmasks

Err… OK… what now? I tried ifconfig nge0 down; ifconfig nge0 up, but nge0 was stuck resolutely at . I probably should have tried svcadm restart network/physical:default, but that thought didn’t occur to me until too late. I needed to reboot anyway, so I figured that would probably fix it.

Wrong! On a reboot, only the lo0 interface was present. I was about to try ‘ifconfig nge0 plumb’, when it occurred to me to check SMF. Sure enough, svcs showed me that the network milestone was offline because neither the nwam nor the default physical network service was running. A quick look back through my command history showed that when I meant to run svcadm enable network/physical:default I had in fact run disable instead (lazy use of bash command history!). Enabling the service brought nge0 magically into existence, and I had a network.
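
In hindsight, the quickest way to spot this kind of mistake is to ask SMF directly which services aren’t online. ‘svcs -x’ will explain why services aren’t running; the hypothetical Ruby helper below just filters the output of ‘svcs -H -o state,fmri’ (no header, one state and FMRI per line), which is handy in a script or monitoring check:

```ruby
# Hypothetical helper: summarise services that are not online, given the
# output of 'svcs -H -o state,fmri' (one "STATE FMRI" pair per line).
def not_online(svcs_output)
  svcs_output.each_line
             .map(&:split)
             .reject { |cols| cols.empty? || cols[0] == 'online' }
             .map { |state, fmri| "#{fmri} is #{state}" }
end

# Example, run on the box itself:
#   puts not_online(`svcs -H -o state,fmri network/physical milestone/network`)
```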

Phew. Now I can work over ssh rather than over the ilom console…

So, next job; mirror the root partition.

Again using Denis Clarke’s blog entry as a guide, I started off by using format to list the disks for me.

# format
Searching for disks...done

       0. c5t0d0 <DEFAULT cyl 17830 alt 2 hd 255 sec 63>
       1. c5t1d0 <Sun-STK RAID INT-V1.0-136.61GB>
       ... continue for 14 more disks...

I used fdisk /dev/rdsk/c5t1d0p0 to delete the existing partition, and create a new one of type ‘solaris’. I then used this piece of trickery to copy the disk layout from the root disk onto the new mirror:

# prtvtoc /dev/rdsk/c5t0d0s0 | fmthard -s - /dev/rdsk/c5t1d0s0

c5t0d0s0 was the current root disk, so c5t1d0 seemed like the logical disk to use as a mirror.

I have to confess I have very little idea what that just did. Solaris disk partitioning is something I’ve never really been involved with. As we’ll see in a moment, this may turn out to have unexpected consequences…

# zpool attach rpool c5t0d0s0 c5t1d0s0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c5t1d0s0 overlaps with /dev/dsk/c5t1d0s2

hmmm… does that matter? I didn’t make a c5t1d0s2; I guess it got copied over from the vtoc on c5t0. Might I end up doing nasty things to my ZFS mirror? gah. Confusing. Let’s just try it with ‘-f’ and see what happens…

...What happens is that it appears to just work. Whether it will subsequently come back to bite me remains to be seen, but after a couple of minutes the mirror is in sync, and all looks good. A reboot comes back fine, with swap working as expected (still only on the one disk) and the ZFS mirror intact.
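
On the s2 question: by Solaris convention, slice 2 (tag 5, ‘backup’) spans the whole disk, so every other slice on a disk labelled this way overlaps it, and copying the VTOC with prtvtoc/fmthard copies that slice too. That is probably why zpool complained, and why ‘-f’ did no harm here. The hypothetical Ruby helper below parses prtvtoc output and lists overlapping slice pairs, assuming the usual column order (partition, tag, flags, first sector, sector count, last sector, mount directory):

```ruby
# One slice from a prtvtoc listing (sector numbers are inclusive).
VtocSlice = Struct.new(:number, :tag, :start_sector, :end_sector)

# Parse prtvtoc output; lines starting with '*' are comments.
def parse_vtoc(text)
  text.each_line
      .map(&:strip)
      .reject { |l| l.empty? || l.start_with?('*') }
      .map { |l|
        cols = l.split
        VtocSlice.new(cols[0].to_i, cols[1].to_i, cols[3].to_i, cols[5].to_i)
      }
end

# Two slices overlap if each one starts before the other ends.
def overlapping_pairs(slices)
  slices.combination(2)
        .select { |a, b| a.start_sector <= b.end_sector && b.start_sector <= a.end_sector }
        .map { |a, b| [a.number, b.number] }
end
```

Run over prtvtoc output for the mirror disk, it should pair every slice with slice 2; an overlap that doesn’t involve slice 2 would be worth a closer look.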

Next jobs: try out pkg, get the grub config mirrored, find out what the hell I did with fmthard.

Opensolaris adventure, part 1

I have a shiny new Sun X4240 to play with, and I decided that this would be a good candidate for our first OpenSolaris test box. So, I’m going to try setting it up, and blog how I get on here:

Step 0: Racking the box has got marginally easier; it came with a nice screwless rack that installed with much less tearing of flesh than the usual fiddly little bolt thingies.

Step 1: Ilom. Never actually tried configuring one of these before, but it turns out to be pretty simple if you’ve got the right hardware. A laptop with a USB-to-serial connector plugged into the serial port on the back of the server, a copy of Tera Term, and we’re in. I’ve got an IP address allocated for the ilom, so I just need to configure it.

cd /SP/network
set pendingipaddress={ip}
set pendingipdiscovery=static
set pendingipgateway={gateway ip}
set pendingipnetmask={netmask}
set commitpending=true
cd ../console

hey presto, I can now log into the ilom over ssh, and better still, I can access the GUI console via the ilom’s java console applet thingy.

Step 2: install OpenSolaris

I downloaded the OpenSolaris 2008.05 CD image, burned it, and stuck the resulting CD into the X4240 (actually, I did this before configuring the ilom). When I connected to the GUI console, the LiveCD was already booting. I clicked on ‘install’ and answered a few basic questions about timezones and a default user, and the install started. The only thing I couldn’t see how to do was create a mirrored boot disk; hopefully that’s something I’ll be able to do post-install.

I then let the box reboot. Unfortunately this turned out to be a mistake, as it booted back into the LiveCD (guess I wasn’t paying attention enough to switch it at the grub prompt). I ejected the CD and tried to reboot, but the LiveCD doesn’t seem to be able to run reboot once the CD is ejected (maybe it needs to load a binary off the LiveCD!). A swift power cycle from the ILOM fixed that though. The fans whizzed, and in a remarkably short space of time (compared with Solaris 10) I had a login prompt. Enter the user details from the install screen, and I’m in :-)

So, I have a working system. Now I need to configure TCP/IP, fix the unmirrored root disk, and do something with the remaining 10 disks. Then I’ll be finding out how the new pkg system works for installing stuff, and trying out LU to bring the kernel up-to-date with the latest OpenSolaris build.

September 24, 2007

solaris NFS performance weirdness

Spanky new Sun X4600 box. Solaris 10u4. Multipathed e1000g gigabit interfaces. NFS-mounted volume, totally default.

$ nfsstat -m /package/orabackup
/package/orabackup from nike:/vol/orabackup/dionysus-sbr
 Flags:         vers=4,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=32768,wsize=32768,retrans=5,timeo=600
 Attr cache:    acregmin=3,acregmax=60,acdirmin=30,acdirmax=60

$ /opt/filebench/bin/filebench
filebench> load webserver
filebench> run 60
IO Summary:      255700 ops 4238.6 ops/s, (1366/138 r/w)  23.1mb/s,   4894us cpu/op,  66.0ms latency

mutter, grumble,... remount the NFS vol with -v3

$ nfsstat -m /package/orabackup
/package/orabackup from nike:/vol/orabackup/dionysus-sbr
 Flags:         vers=3,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=32768,wsize=32768,retrans=5,timeo=600
 Attr cache:    acregmin=3,acregmax=60,acdirmin=30,acdirmax=60

$ /opt/filebench/bin/filebench
filebench> load webserver
filebench> run 60

IO Summary:      4397877 ops 72839.3 ops/s, (23495/2351 r/w) 396.4mb/s,    221us cpu/op,   3.1ms latency

What the…? The default configuration for an NFS 4 mount on this box appears to be about 17 times slower than the equivalent v3 mount, with more than 20 times the latency. How can this be right? Either there’s something very weird going on with our network topology, or there’s something badly broken about the way the mount is configured. Either way, it’s beyond me to work out what it is. NFS 3 ain’t broken (well, not very) so, unless Sun support can offer some illumination, we’ll be sticking with that.
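
To check how big the gap actually is, here is a small Ruby sketch that pulls the headline figures out of the two filebench ‘IO Summary’ lines above and compares them. The regular expression is an assumption based only on the two lines shown here, not on any documented filebench output format.

```ruby
# Headline numbers from a filebench "IO Summary" line (format assumed from the runs above).
SUMMARY_RE = %r{([\d.]+) ops/s,.*?([\d.]+)mb/s,\s*([\d.]+)us cpu/op,\s*([\d.]+)ms latency}

def parse_summary(line)
  m = SUMMARY_RE.match(line) or raise ArgumentError, "not an IO Summary line: #{line}"
  ops, mbps, cpu, latency = m.captures.map(&:to_f)
  { ops_per_s: ops, mb_per_s: mbps, cpu_us_per_op: cpu, latency_ms: latency }
end

# How much better v3 did than v4 on each metric (bigger number = bigger gap).
def ratios(v4, v3)
  {
    throughput: (v3[:ops_per_s] / v4[:ops_per_s]).round(1),
    bandwidth:  (v3[:mb_per_s] / v4[:mb_per_s]).round(1),
    cpu_per_op: (v4[:cpu_us_per_op] / v3[:cpu_us_per_op]).round(1),
    latency:    (v4[:latency_ms] / v3[:latency_ms]).round(1)
  }
end

# The two runs quoted above: NFS v4 first, then NFS v3.
NFS_V4 = parse_summary('IO Summary:      255700 ops 4238.6 ops/s, (1366/138 r/w)  23.1mb/s,   4894us cpu/op,  66.0ms latency')
NFS_V3 = parse_summary('IO Summary:      4397877 ops 72839.3 ops/s, (23495/2351 r/w) 396.4mb/s,    221us cpu/op,   3.1ms latency')

ratios(NFS_V4, NFS_V3).each { |metric, x| puts "#{metric}: #{x}x" }
```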

August 16, 2007

Monitoring Environmental status on Sun servers with ipmitool + nagios

So, fresh from a successful integration of fmadm and nagios, I thought I’d have a go at doing the same thing with some of the environmental information that ipmitool provides. Specifically, temperature, voltage, and fan status throughout the machine.

On solaris 10u3, this turns out to be pretty easy. ipmitool can be called from inside a nagios plugin, and the results parsed. In theory I could send every sensor value to Nagios, and draw pretty and complicated graphs. In practice, though, I’m not that bothered by the actual values, I just want to know whether anything is reporting an error. If it does, I can go to the iLOM to get more info.


#!/usr/bin/env ruby
# NRPE check: summarise IPMI sensor status for temperature, fan and voltage.
# Exit codes follow the Nagios plugin convention: 0 = OK, 1 = WARNING, 2 = CRITICAL.

OK   = 0
WARN = 1
CRIT = 2
STATUS_NAMES = { OK => 'ok', WARN => 'warn', CRIT => 'critical' }.freeze

IPMITOOL = '/usr/sfw/bin/ipmitool -I bmc -U root sdr type'.freeze

# Return the worst status reported by any sensor of the given type.
def parse_status(sensor)
  status = OK
  `#{IPMITOOL} #{sensor}`.each_line do |line|
    # sdr lines look like "name | id | status | entity | reading";
    # the third column is the status code (ok, ns, nc, cr, nr, ...)
    code = line.split('|')[2].to_s.strip
    case code
    when 'cr', 'nr' then status = CRIT               # critical / non-recoverable
    when 'nc'       then status = [status, WARN].max # non-critical
    end
    # other codes (ok, ns, ...) are deemed OK
  end
  status
end

result = OK
info = []
%w[temperature fan voltage].each do |sensor|
  st = parse_status(sensor)
  info << "#{sensor} #{STATUS_NAMES[st]}"
  result = [result, st].max # Array#max is the built-in that replaces a higher_of helper
end

puts "#{STATUS_NAMES[result]}: #{info.join(' ')}"
exit result

I plug this into an nrpe check, and that’s that. Easy.

One caveat: on solaris 10u3, ipmitool runs in a second or so, with no noticeable effect on the machine. On u2, however, it takes about a minute to run the same query, sucking up about 10% CPU on a V40. Not so good. I’m not sure yet whether this is a problem with ipmitool (unlikely; they seem to be the same binary), the server itself, or the bmc driver. Once I’ve got a V40 running s10u2 with a working iLOM network connection, I’ll try the lan driver and see if that’s better.

August 15, 2007

Monitoring solaris FMD with nagios

Solaris 10 has a very cool subsystem called FMD, the Fault Management Daemon. This bit of code monitors your server, looking for failed hardware, and re-configuring the system to avoid it. So if a CPU fails, FMD will offline it; if a memory DIMM reports an unacceptable number of errors, FMD will mark the bad segments off-limits. This is very cool, but it’s all done terribly quietly. I wanted a way to have FMD tell me about it when it spotted a problem. This blog entry describes using fmadm to report on any known faults and email details, which is cool, but I wanted something that would hook into our nagios management server. Here’s how I did it:

1) add a crontab entry to run fmadm every 10 minutes and dump the output into a file:

0,10,20,30,40,50 * * * * /usr/sbin/fmadm faulty > /tmp/fmadm.out

This needs to run as root, or as a user who has the SYS_CONFIG privilege. SYS_CONFIG appears to be fairly wide-ranging, so I didn’t want to grant it to the nagios user (which is a bit of a shame really; it would have made things much simpler, and also more timely, if I could have run fmadm inside the nagios check).

Next step, the nagios check. I’m using a local script, which is invoked by nrpe (available from blastwave). I’ve written mine in ruby, but it would be easy to port to perl or even sh:

#!/usr/bin/env ruby
# NRPE check: report hardware faults found by 'fmadm faulty'.
# Usage: check_fmadm.rb /tmp/fmadm.out  (the file written by the root crontab entry)

def exit_unknown(message)
  puts message
  exit 3
end

def exit_critical(message)
  puts message
  exit 2
end

def exit_warning(message)
  puts message
  exit 1
end

def exit_ok(message)
  puts message
  exit 0
end

fname = ARGV[0] or exit_unknown("Usage: #{$PROGRAM_NAME} STATUS_FILE")
exit_warning("Status file #{fname} not found") unless File.exist?(fname)

now = Time.now
mtime = File.mtime(fname)
# How many minutes old can the check file be?
# cron rewrites it every 10 mins, so allow no more than 11 before warning
warn_mins = 11
crit_mins = 15
warn_threshold = now - (warn_mins * 60)
crit_threshold = now - (crit_mins * 60)
if mtime < crit_threshold
  exit_critical("Marker file #{fname} more than #{crit_mins} mins old")
elsif mtime < warn_threshold
  exit_warning("Marker file #{fname} more than #{warn_mins} mins old")
end

# Exactly two lines of fmadm output is treated as "no faults";
# anything longer is treated as a fault report.
text = File.readlines(fname)
if text.length < 2
  exit_warning("Status file does not appear valid")
elsif text.length == 2
  exit_ok("Marker file #{fname} is up to date")
else
  exit_critical("Hardware faults found: check #{fname} for details")
end

Now just configure a check in nrpe.conf:

command[check_fmadm]=/usr/local/nagios-plugins/check_fmadm.rb /tmp/fmadm.out

and you’re good to go! Add in an nrpe check on your nagios server, and you’ll get notified whenever fmadm detects a hardware failure.

May 22, 2007

Three cheers for the Fair Share Scheduler


The more I use my Solaris Zones boxes, the more (mostly) I like them. Yes, there are some niggles about how you cope with container failures, how you clone zones between boxes, the odd un-killable process, and so on, but for the most part, they just do exactly what you’d expect them to, all the time.

Take the FSS for instance. This little widget takes care of allocating CPU between your zones. A big problem in server consolidation, at least in web-land, is the “spiky” nature of CPU usage; Web-apps tend to be relatively low consumers of CPU most of the time, but occasionally will want a much larger amount.
If you’re consolidating apps, you don’t want one busy app to steal CPU off all the others, but if all the others are idle, then you might as well let the busy app take whatever it needs.

The FSS solves this problem elegantly. Each zone is allocated “shares” representing the minimum proportion of CPU it is guaranteed when it needs it. So if I have 3 zones, and give them each 1 share, then if each zone is working flat out, they’ll each get 1/3 of the CPU time. But if one zone goes idle, the other two will get 50% each. If only one zone is busy, it’ll get 100%. Better still, if one zone has 100% of CPU, and another zone becomes busy, the first is reined in instantly to give the other one the CPU it’s entitled to.
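
The share arithmetic above is easy to model. This is a simplified Ruby sketch, not how the kernel’s FSS actually works: it assumes every busy zone wants all the CPU it can get and idle zones want none, and it ignores the global zone and time-averaging. In Solaris the shares themselves are set with the zone.cpu-shares resource control; the zone names here are made up.

```ruby
# Simplified model of FSS under contention: each busy zone's CPU fraction is
# its shares divided by the total shares of all busy zones; idle zones get 0.
def fss_allocation(shares, busy)
  busy_total = shares.select { |zone, _| busy.include?(zone) }.values.sum
  shares.map do |zone, n|
    fraction = busy.include?(zone) && busy_total.positive? ? n.fdiv(busy_total) : 0.0
    [zone, fraction]
  end.to_h
end

shares = { 'app1' => 1, 'app2' => 1, 'app3' => 1 } # hypothetical zones, 1 share each
p fss_allocation(shares, %w[app1 app2 app3]) # each gets 1/3
p fss_allocation(shares, %w[app1 app2])      # app1 and app2 get 0.5, app3 gets 0.0
p fss_allocation(shares, %w[app1])           # app1 gets 1.0
```

Unequal shares work the same way: with 3 shares against 1, the bigger zone would get 75% when both are busy.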

And does it work in real life? Oh yes… here’s one of our apps getting a bit overloaded. You can see the box load go up to 20 (on an 8-way box; in this case it was about 99% busy for 15-20 minutes), and the zone that’s causing all the trouble gets pretty unresponsive. But the other zone doesn’t even register the extra CPU load. Awesome.

[Graphs: maia CPU load; maia forums response time; maia rtv response time]


    Powered by BlogBuilder
    © MMXXI