All 2 entries tagged Zfs
September 17, 2010
While I was working on a Perl script for automating ZFS snapshots, I happened to discover the automatic ZFS snapshot service for OpenSolaris by Tim Foster.
It is, of course, exactly what I need. The problem is that I need it to run on Solaris 10, for which it is not currently available. However, a thread on Tim’s blog (especially the comments) suggests that it is possible to get the service working on Solaris 10, without the Time Slider element.
So yesterday I attempted to get this compiled into an SVR4 package with pkgmk and pkgtrans and installed on Solaris 10. It was remarkably straightforward, and early testing shows it to work very well.
First, get the ‘bits’ and make the pkg as described on Tim’s blog, using hg to clone and make to create the pkg tree.
$ hg clone ssh://firstname.lastname@example.org/hg/jds/zfs-snapshot
Before actually creating the package, there are 3 changes to make, as detailed by Eli Kleinamm in the comments on the same blog entry: http://blogs.sun.com/timf/entry/zfs_automatic_snapshots_0_12
These changes are made in
# vi zfs-auto-snapshot
1) Change the shell the script runs in to /usr/dt/bin/dtksh
2) Remove on lines 511 and 517 the -o com.sun:auto-snapshot-desc="$EVENT",
3) Remove the space/tab on line 970 between $SWAPVOLS and $(echo, giving:
SWAPVOLS="$SWAPVOLS$(echo $swap | sed -e 's#/dev/zvol/dsk/##')"
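The first two edits could be scripted rather than made by hand. This is only a rough sketch: it matches by pattern rather than by line number, it assumes the option appears in the script exactly as -o com.sun:auto-snapshot-desc="$EVENT" preceded by a single space, and the function name patch_for_solaris10 is my own invention. The third edit (the whitespace on line 970) is left out and still needs doing by hand.

```shell
# Filter: reads the original script on stdin, writes the patched one to stdout.
# Assumption: the description option is written exactly as matched below.
patch_for_solaris10() {
    sed -e '1s|^#!.*|#!/usr/dt/bin/dtksh|' \
        -e 's| -o com\.sun:auto-snapshot-desc="\$EVENT"||'
}

# Tiny stand-in for the real script, just to show the effect.
patch_for_solaris10 <<'EOF'
#!/bin/ksh
zfs snapshot -o com.sun:auto-snapshot-desc="$EVENT" pool@snap
EOF
```

On the real file this would be something like patch_for_solaris10 < zfs-auto-snapshot > zfs-auto-snapshot.new, followed by a diff to check that only the expected lines changed.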
Once done, run make. It should create a pkg format file tree for you in the current working directory (read the Makefile, it’s not long).
Next, create the pkg datastream for portability:
$ pkgtrans -s . /var/spool/pkg/SUNWzfs-auto-snapshot.pkg SUNWzfs-auto-snapshot
Transferring <SUNWzfs-auto-snapshot> package instance
This will now simply pkgadd to a Solaris 10 system. I can’t help but think this is particularly awesome but, as a colleague pointed out, it really has to go through some very extensive testing before any rollout. I still think it’s a lot better than my dangerous Perl scripting attempts.
This will now (if I can get agreement internally) be added as a package to the standard build for all Solaris 10 hosts using ZFS within our control.
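The package provides 5 SMF service instances for taking snapshots on a schedule (a sixth instance, event, also shows up in the svcs output further down). These are:
svc:/system/filesystem/zfs/auto-snapshot:frequent
svc:/system/filesystem/zfs/auto-snapshot:hourly
svc:/system/filesystem/zfs/auto-snapshot:daily
svc:/system/filesystem/zfs/auto-snapshot:weekly
svc:/system/filesystem/zfs/auto-snapshot:monthly
Enable these to have snapshots taken at the regularity suggested by the name. Each has properties to tune, the most notable being how many snapshots to retain: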
-bash-3.00$ svccfg -s auto-snapshot:daily listprop zfs/keep
zfs/keep  astring  31
-bash-3.00$ svccfg -s auto-snapshot:frequent listprop zfs/keep
zfs/keep  astring  4
-bash-3.00$ svccfg -s auto-snapshot:frequent listprop zfs/period
zfs/period  astring  15
The defaults for daily and frequent are therefore one month’s worth of dailies (31) and 4 frequents. Frequent is initially defined as once every 15 minutes, as shown by the zfs/period property.
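To see what those defaults mean in terms of history, multiply the number of snapshots kept by the gap between them. A minimal sketch; the function name retention_minutes is made up, and it assumes you pass the gap in minutes (so a daily schedule is 1440).

```shell
# Minutes of history covered by a schedule: snapshots kept * minutes between them.
# Arguments: the zfs/keep value, then the snapshot interval expressed in minutes.
retention_minutes() {
    keep=$1
    period_minutes=$2
    echo $((keep * period_minutes))
}

retention_minutes 4 15      # frequent: prints 60 (about an hour)
retention_minutes 31 1440   # daily: prints 44640 (31 days)
```

So with the defaults, the frequent snapshots only ever give you about the last hour, which is worth knowing before relying on them.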
Whether a ZFS dataset is snapshotted or not is controlled by a user-level ZFS property, com.sun:auto-snapshot, which can be true or false for a dataset, plus per-schedule properties (com.sun:auto-snapshot:frequent, com.sun:auto-snapshot:daily and so on) for each periodicity as required. These are inherited, so it is probably a good idea to set com.sun:auto-snapshot to false at the root of a pool and then change the required sub-datasets to true as needed. Perhaps an example would help:
-bash-3.00$ pfexec zfs set com.sun:auto-snapshot=true datapool/zfstestz1_ds
-bash-3.00$
-bash-3.00$ pfexec zfs set com.sun:auto-snapshot=false rpool
-bash-3.00$ pfexec zfs set com.sun:auto-snapshot=true rpool/export
-bash-3.00$ pfexec zfs set com.sun:auto-snapshot:frequent=true rpool/export
-bash-3.00$ pfexec zfs set com.sun:auto-snapshot:frequent=true datapool/zfstestz1_ds
-bash-3.00$ <getting bored with pfexec for each cmd>
-bash-3.00$ pfexec bash
bash-3.00# zfs set com.sun:auto-snapshot=false rpool/dump
bash-3.00# zfs set com.sun:auto-snapshot=false rpool/swap
bash-3.00# cat /etc/release
Solaris 10 10/09 s10x_u8wos_08a X86
Copyright 2009 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 16 September 2009
bash-3.00#
bash-3.00# svcs -a |grep snapshot
disabled  17:59:54  svc:/system/filesystem/zfs/auto-snapshot:monthly
disabled  17:59:54  svc:/system/filesystem/zfs/auto-snapshot:weekly
disabled  17:59:54  svc:/system/filesystem/zfs/auto-snapshot:daily
disabled  17:59:54  svc:/system/filesystem/zfs/auto-snapshot:hourly
online    18:00:44  svc:/system/filesystem/zfs/auto-snapshot:event
online    18:09:18  svc:/system/filesystem/zfs/auto-snapshot:frequent
bash-3.00#
bash-3.00#
bash-3.00# zfs list -t snapshot | grep frequent
datapool@zfs-auto-snap_frequent-2010-09-23-1809                0  -  28.0K  -
datapool/zfstestz1_ds@zfs-auto-snap_frequent-2010-09-23-1809   0  -  28.0K  -
rpool/export@zfs-auto-snap_frequent-2010-09-23-1809            0  -    23K  -
rpool/export/home@zfs-auto-snap_frequent-2010-09-23-1809       0  -   954K  -
bash-3.00#
bash-3.00# pkginfo SUNWzfs-auto-snapshot
application SUNWzfs-auto-snapshot ZFS Automatic Snapshot Service
bash-3.00#
bash-3.00# date
Thursday, 23 September 2010 18:15:48 BST
bash-3.00# zfs list -t snapshot | grep frequent
datapool@zfs-auto-snap_frequent-2010-09-23-1809                0  -  28.0K  -
datapool@zfs-auto-snap_frequent-2010-09-23-1815                0  -  28.0K  -
datapool/zfstestz1_ds@zfs-auto-snap_frequent-2010-09-23-1809   0  -  28.0K  -
datapool/zfstestz1_ds@zfs-auto-snap_frequent-2010-09-23-1815   0  -  28.0K  -
rpool/export@zfs-auto-snap_frequent-2010-09-23-1809            0  -    23K  -
rpool/export@zfs-auto-snap_frequent-2010-09-23-1815            0  -    23K  -
rpool/export/home@zfs-auto-snap_frequent-2010-09-23-1809       0  -   954K  -
rpool/export/home@zfs-auto-snap_frequent-2010-09-23-1815       0  -   954K  -
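During testing it is handy to check that the number of snapshots per dataset never climbs past the zfs/keep value. The following is a sketch of one way to count them; count_snapshots is a hypothetical helper of mine, and it relies on the zfs-auto-snap_<schedule>-<timestamp> naming seen in the listing above.

```shell
# Count snapshots per dataset for one schedule (frequent, daily, ...).
# Reads snapshot names on stdin, one per line.
count_snapshots() {
    label=$1
    awk -F'@' -v prefix="zfs-auto-snap_${label}-" '
        index($2, prefix) == 1 { count[$1]++ }
        END { for (ds in count) print ds, count[ds] }
    ' | LC_ALL=C sort
}

# Sample input copied from the listing above; on a live system you would
# pipe in the output of: zfs list -H -t snapshot -o name
count_snapshots frequent <<'EOF'
rpool/export@zfs-auto-snap_frequent-2010-09-23-1809
rpool/export@zfs-auto-snap_frequent-2010-09-23-1815
rpool/export/home@zfs-auto-snap_frequent-2010-09-23-1809
rpool/export/home@zfs-auto-snap_frequent-2010-09-23-1815
EOF
```

For the sample this prints a count of 2 for each of the two datasets. After an hour or so the frequent count should settle at 4 and stay there.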
I think this should prove to be very useful for us. I would be very interested, as usual, in hearing your comments.
September 15, 2010
This blog entry wasn’t intended to be published. It was simply a place for me to collect my thoughts on the state of one of our servers and its remaining capacity for extra zones.
The server is an X4170 with 2 Quad-Core AMD Opterons and 64gb of ram. It is currently running 11 zones, each running an instance of apache and an instance of tomcat. The load currently looks reasonably light, certainly from a CPU viewpoint:
ZONEID  NPROC   SWAP    RSS  MEMORY       TIME   CPU  ZONE
     0     61  1314M  1308M    2.0%  804:24:29  2.9%  global
    13     58  1381M   936M    1.4%   35:16:01  0.5%  zoneA
    16     45  1238M   496M    0.8%   11:20:16  0.2%  zoneB
    29     55  1378M   576M    0.9%    4:30:13  0.1%  zoneC
     9     47  1286M   496M    0.8%   21:08:49  0.1%  zoneD
     2     60  1290M   464M    0.7%    7:35:19  0.0%  zoneE
     7     51  1298M   460M    0.7%    6:50:11  0.0%  zoneF
    24     57   835M   896M    1.4%   17:14:18  0.0%  zoneG
    27     45   138M   191M    0.3%    7:52:47  0.0%  zoneH
     4     47   345M   408M    0.6%   11:26:22  0.0%  zoneI
Total: 573 processes, 2649 lwps, load averages: 0.38, 0.29, 0.27
Note that each zone is consuming several hundred meg of ram (see the RSS column). This consumption will almost certainly be due to the java heap settings / sizings of the various tomcat instances.
Let’s take a look at one of the zones:
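To put a single number on the RSS column, the zone rows can be summed. A rough sketch: sum_zone_rss_mb is my own name, and it assumes the eight-column zone summary layout shown in the prstat output above, with K, M or G suffixes on the sizes.

```shell
# Sum the RSS column (4th field) of the zone summary rows, in MB.
# Zone rows are recognised by a numeric zone ID and exactly 8 fields,
# which skips the header and the "Total:" line.
sum_zone_rss_mb() {
    awk '
        $1 ~ /^[0-9]+$/ && NF == 8 {
            v = $4
            unit = substr(v, length(v))
            sub(/[KMG]$/, "", v)
            if (unit == "K") v /= 1024
            else if (unit == "G") v *= 1024
            total += v
        }
        END { printf "%d\n", total }
    '
}

# Three rows copied from the output above as sample input.
sum_zone_rss_mb <<'EOF'
ZONEID NPROC SWAP RSS MEMORY TIME CPU ZONE
0 61 1314M 1308M 2.0% 804:24:29 2.9% global
13 58 1381M 936M 1.4% 35:16:01 0.5% zoneA
16 45 1238M 496M 0.8% 11:20:16 0.2% zoneB
Total: 573 processes, 2649 lwps, load averages: 0.38, 0.29, 0.27
EOF
```

The three sample rows add up to 2740 MB. Fed all ten rows shown above, the total comes to 6231 MB, which is small next to 64gb of physical ram.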
> zlogin zoneA ps -futomcat
UID     PID    PPID   C  STIME   TTY  TIME     CMD
tomcat  29407  29406  0  Jun 07  ?    1294:57  /usr/java/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/conf/loggi
tomcat  29406  20781  0  Jun 07  ?    0:47     /usr/local/sbin/cronolog /usr/local/tomcat/logs/catalina.out.%Y-%m-%d
How do we interrogate this process to find its resident set size? There are a few options: hunt down the config entries that launch this tomcat server (usually set in a variable such as CATALINA_OPTS), or take the easy route and use prstat or the ‘process tools’ such as pmap. Ever used pmap? pmap -x will show more detail about the address space mapping of a process than you ever wanted to know. In the output below, we can see that the tomcat process in zoneA is using more than 700mb of ram (the RSS total is 780360 KB).
bash-3.00# pmap -x 29407
29407: /usr/java/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/c
 Address   Kbytes      RSS     Anon   Locked  Mode   Mapped File
08008000       12        -        -        -  -----  [ anon ]
0803D000       44       44       44        -  rwx--  [ stack ]
<... truncated ...>
FEFF0000        4        4        4        -  rwx--  [ anon ]
FEFFB000        8        8        8        -  rwx--  ld.so.1
FEFFD000        4        4        4        -  rwx--  ld.so.1
--------  -------  -------  -------  -------
total Kb  1349932   780360   744800        -
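If you only want the headline figure, the total line can be picked out and converted to MB. A small sketch, with rss_mb_from_pmap being a made-up helper name; it assumes the "total Kb" summary line has the layout shown above.

```shell
# Print the total RSS in MB from pmap -x output read on stdin.
# On the "total Kb" line the RSS is the 4th whitespace-separated field.
rss_mb_from_pmap() {
    awk '$1 == "total" && $2 == "Kb" { printf "%d\n", $4 / 1024 }'
}

# Summary line copied from the output above; on a live system you would
# pipe in the output of: pmap -x 29407
rss_mb_from_pmap <<'EOF'
total Kb 1349932 780360 744800 -
EOF
```

This prints 762, i.e. the tomcat JVM in zoneA has roughly 762 MB resident.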
Thought 1: there is only 6gb of free memory (defined as actually being on the freelist). Where is it all? I would expect the ZFS arc cache to be consuming the lion’s share, and this blog entry will show you how to check. As applications use more ram, ZFS should behave and relinquish the previously “unused” ram; but remember that reducing ZFS’s available allocation (by default 7/8ths of RAM) will in theory impact read performance, as we will likely see more cache misses. ZFS should be pretty good at reducing the arc size as the system requests more and more ram. Let’s check the numbers, both the ram used for the arc and the cache efficiency statistics.
Here is the mdb memstat that shows the memory usage detail:
bash-3.00# echo "::memstat" | mdb -k
Page Summary         Pages        MB  %Tot
----------------  --------  --------  ----
Kernel            13015620     50842   78%
Anon               1416662      5533    8%
Exec and libs        42400       165    0%
Page cache          695383      2716    4%
Free (cachelist)     27757       108    0%
Free (freelist)    1577146      6160    9%
Total             16774968     65527
Physical          16327309     63778
bash-3.00#
Rather a lot is used by the kernel (indicative of ZFS arc usage) rather than by anon, exec and libs. Note also the currently rather small freelist as a percentage of total physical ram but, as mentioned, I suspect ZFS will be consuming a lot (perhaps 40g plus?) of the otherwise unused ram.
The free memory values and zfs arc stats can conveniently be confirmed with a quick look at the kstats via arc_summary, an incredibly useful tool written by Ben Rockwood (thanks Ben), which saves us all an immense amount of time remembering how, where and which kstats to interrogate for memory and zfs statistics.
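The Pages and MB columns in the memstat output are related by the page size. As a sanity check, here is a sketch of the conversion; pages_to_mb is a hypothetical helper and the 4096-byte default is my assumption for this x86 box (the pagesize command will tell you the real value).

```shell
# Convert a page count to whole MB.
# Second argument is the page size in bytes; 4096 is assumed if omitted.
pages_to_mb() {
    pages=$1
    page_bytes=${2:-4096}
    echo $((pages * page_bytes / 1048576))
}

pages_to_mb 13015620   # Kernel: prints 50842
pages_to_mb 1577146    # Free (freelist): prints 6160
```

Both results match the MB column above, so the kernel really is holding close to 50gb.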
Get it here: http://cuddletech.com/arc_summary/
System Memory:
Physical RAM:  65527 MB
Free Memory :  4369 MB
LotsFree:      996 MB
The arcsize, as suspected, is large, at around 45gb:
ARC Size:
Current Size:  46054 MB (arcsize)
The ARC cache is, however, doing a good job with that 45gb; a hit ratio of 96% isn’t bad.
ARC Efficency:
Cache Access Total:       4384785832
Cache Hit Ratio:    96%   4242292557   [Defined State for buffer]
Cache Miss Ratio:    3%   142493275    [Undefined State for Buffer]
REAL Hit Ratio:     92%   4060875546   [MRU/MFU Hits Only]
Notice also that the ARC is doing a fairly reasonable job of correctly predicting our prefetch requirements: 36% isn’t too bad compared to a directly demanded cache hit rate of 72%.
Data Demand Efficiency:    72%
Data Prefetch Efficiency:  36%
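The percentages are just the raw counters divided by the access total. A quick sketch to reproduce them; percent_of is my own helper name, and truncating (rather than rounding) is an assumption that happens to match the figures shown.

```shell
# Integer percentage of part in total, truncated toward zero.
percent_of() {
    part=$1
    total=$2
    echo $((part * 100 / total))
}

percent_of 4242292557 4384785832   # hits: prints 96
percent_of 142493275 4384785832    # misses: prints 3
percent_of 4060875546 4384785832   # MRU/MFU hits only: prints 92
```

Because of the truncation the hit and miss figures add up to 99 rather than 100, which is why the two ratios above don’t quite sum.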
OK, so in theory if we keep adding zones we have to be very careful about the available ram allocations given that these zones are running some reasonably memory-hungry (tomcat) java instances. As more and more memory is allocated to apache/tomcat and java within new zones, less and less ram will be available for the zfs arc. This will need some careful monitoring.
Currently the cpu usage on this server is also fairly light, but I have no view of the future levels of usage or traffic intended to run through these services. I’ll park this one for now.
Thought 2: disk space. Currently the zones have a zfs dataset allocated in the root pool for each zone root:
rpool/ROOT/s10x10-08/zones/zoneB 6.23G 5.77G 6.23G /zones/zoneB
Each zone root has a 12gb quota. The output above (from zfs list) is typical of the usage for the zones, around 50% of the 12g quota. There are 11 zones, each with this 12gb quota, giving a potential total usage of 132gb. The entire root pool is only 136gb and already has just 32.5gb available, as shown by zpool list:
root > zpool list
NAME      SIZE  USED   AVAIL  CAP  HEALTH  ALTROOT
datapool  816G  31.2G  785G   3%   ONLINE  -
rpool     136G  104G   32.5G  76%  ONLINE  -
It seems, then, that rather than concerning myself with memory or cpu usage on this server, I should worry about disk: unless we find somewhere else for the zone root filesystems, we are likely to run out of local storage space way before we stretch the capabilities of this server.
Coming soon: a blog entry on increasing the size of your root pool with some mirror juggling. I’ve done this successfully on a VM; I just need a test server to perfect the process.
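The quota arithmetic is worth keeping to hand when someone asks for one more zone. A minimal sketch, assuming every zone gets the same quota and ignoring everything else that lives in the root pool (boot environment, swap, dump); quota_headroom_gb is a made-up name.

```shell
# GB left in the pool if every zone filled its quota.
# A negative result means the quotas are overcommitted.
quota_headroom_gb() {
    zones=$1
    quota_gb=$2
    pool_gb=$3
    echo $((pool_gb - zones * quota_gb))
}

quota_headroom_gb 11 12 136   # today: prints 4
quota_headroom_gb 12 12 136   # one more zone: prints -8
```

So even on paper a twelfth zone does not fit, and in practice the pool has other things in it besides zone roots.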