
ZFS

All of our network storage is now hosted on ZFS filesystems. We have three physical ZFS servers: zfs1, zfs2, and zfs3. There is an additional DNS CNAME, zfs4, which simply points to zfs1. These servers run SPARC Solaris 11.3 and act as head nodes in front of our SAN storage. The SAN is primarily made up of Nexsan storage appliances, including E48 series and SATABeast disk arrays.

We have also deployed two new Linux-based ZFS file servers, named corezfs01/02. These serve the new user and project filesystems (see below), and their storage is housed on a Hitachi G200 disk array.

You can find a listing of all ZFS Pools and Datasets here.

Pools

Our disk storage is broken up into storage volumes by the disk arrays. These volumes are then added to ZFS pools on the ZFS servers. The pools created on the new system all start with zf; they hold the 25 zf partitions, zf1-zf25. There are also several legacy pools that were imported or migrated to ZFS and still start with af, if, or uf.

We can see a listing of the pools on a server by running zpool list:

root@zfs2:~# zpool list
NAME        SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
if1.tank   5.96T  2.98T  2.98T  49%  1.00x  ONLINE  -
if2.tank   5.96T  4.00T  1.96T  67%  1.00x  ONLINE  -
if3.tank   5.96T  2.04T  3.92T  34%  1.00x  ONLINE  -
if4.tank   5.96T   530G  5.44T   8%  1.00x  ONLINE  -
if5.tank   5.96T  2.77T  3.19T  46%  1.00x  ONLINE  -
if6.tank   5.96T  4.47T  1.49T  74%  1.00x  ONLINE  -
rpool       136G  19.6G   116G  14%  1.00x  ONLINE  -
zf10.tank  5.97T  2.50T  3.47T  41%  1.00x  ONLINE  -
zf11.tank  5.97T  11.1G  5.96T   0%  1.00x  ONLINE  -
zf12.tank  5.97T   587G  5.40T   9%  1.00x  ONLINE  -
zf7.tank   5.97T  1.03T  4.94T  17%  1.00x  ONLINE  -
zf8.tank   5.97T  5.11T   878G  85%  1.00x  ONLINE  -
zf9.tank   5.97T  1.13T  4.84T  18%  1.00x  ONLINE  -

Importing and Exporting zpools

If a server has issues with a pool, you will sometimes find that the pool is no longer imported. If you do not see a pool that should be on a server, you may need to re-import it. Running zpool import with no arguments shows a list of all pools on the SAN that the server can see; we can then import a pool by running zpool import poolname.

root@zfs2:~# zpool import
  pool: af4
    id: 13970819169355586171
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://support.oracle.com/msg/ZFS-8000-EY
config:

        af4                                        ONLINE
          c0t6000402001E016FE7C91F4EB00000000d0s1  ONLINE

  pool: af12.tank
    id: 11343109093909297116
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://support.oracle.com/msg/ZFS-8000-EY
config:

        af12.tank                                  ONLINE
          c0t6000402001E016FE6BACBE7300000000d0s1  ONLINE
...
root@zfs2:~# zpool import poolname.tank

We need to be careful, because this listing also includes pools that are actively imported and in use by other servers; never force-import a pool that is still in use elsewhere. You should normally be able to import a pool without the '-f' flag, but you may need to force the import under certain circumstances (for example, if there are errors or faults).
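
The reverse operation is zpool export, which cleanly unmounts a pool's datasets and releases the pool so another head node can import it (for example before SAN maintenance). A minimal sketch, using the same placeholder pool name as above:

# Release the pool from this server; its datasets are unmounted first.
root@zfs2:~# zpool export poolname.tank

# The pool should no longer appear in this server's pool listing.
root@zfs2:~# zpool list poolname.tank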

Datasets

These pools house ZFS datasets, which we ultimately refer to as partitions. For example, the pool zf15-18.tank on zfs3 contains two datasets, zf15-18.tank/zf15 and zf15-18.tank/zf18. These datasets are mounted on the ZFS servers as local filesystems and are then shared to the network via NFS and Samba.
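
To see exactly which datasets a pool contains and where they are mounted, zfs list can be run against the pool. A sketch against the example pool above (the column list is just a convenient selection of standard properties):

# Recursively list every dataset in the pool with its usage and mountpoint.
root@zfs3:~# zfs list -r -o name,used,avail,mountpoint zf15-18.tank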

Network Filesystems

ZFS has built-in integration with network filesystem protocols. We can see a listing of shares by running:

root@zfs2:~# share
IPC$            smb     -       Remote IPC
if1.tank/if1    /if1    nfs     sec=sys,root=argo.cs.virginia.edu:ares.cs.virginia.edu:athena.cs.virginia.edu:zfs1.cs.virginia.edu:zfs2.cs.virginia.edu:zfs3.cs.virginia.edu:coresrv01.cs.virginia.edu:coresrv02.cs.virginia.edu        
if1_smb /if1    smb     -       
if2.tank/if2    /if2    nfs     sec=sys,root=argo.cs.virginia.edu:ares.cs.virginia.edu:athena.cs.virginia.edu:zfs1.cs.virginia.edu:zfs2.cs.virginia.edu:zfs3.cs.virginia.edu:coresrv01.cs.virginia.edu:coresrv02.cs.virginia.edu        
if2_smb /if2    smb     -       
if3.tank/if3    /if3    nfs     sec=sys,root=argo.cs.virginia.edu:ares.cs.virginia.edu:athena.cs.virginia.edu:zfs1.cs.virginia.edu:zfs2.cs.virginia.edu:zfs3.cs.virginia.edu:coresrv01.cs.virginia.edu:coresrv02.cs.virginia.edu        
if3_smb /if3    smb     -       
if4.tank/if4    /if4    nfs     sec=sys,root=argo.cs.virginia.edu:ares.cs.virginia.edu:athena.cs.virginia.edu:zfs1.cs.virginia.edu:zfs2.cs.virginia.edu:zfs3.cs.virginia.edu:coresrv01.cs.virginia.edu:coresrv02.cs.virginia.edu        
if4_smb /if4    smb     -       
...
zf9.tank_zf9    /zf9    nfs     sec=sys,root=argo.cs.virginia.edu:ares.cs.virginia.edu:athena.cs.virginia.edu:zfs1.cs.virginia.edu:zfs2.cs.virginia.edu:zfs3.cs.virginia.edu    
zf9.tank_zf9    /zf9    smb     -       
zf9_smb /zf9    smb     -       
c$      /var/smb/cvol   smb     -       Default Share

Sometimes, if there are issues at boot time (e.g. a pool is faulted), Solaris will put NFS and SMB into maintenance mode and no shares will be active. If this happens, you will get no output from running share. This can sometimes be fixed by running share -a. The output of dmesg is often useful for troubleshooting NFS/SMB issues.
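
To confirm whether the services really did drop into maintenance, and to clear them once the underlying pool problem has been dealt with, the usual SMF tools apply. A sketch (the FMRIs are the standard Solaris ones shown elsewhere on this page):

# Show unhealthy services, including anything in maintenance, with an explanation.
root@zfs2:~# svcs -xv

# After fixing the faulted pool, clear the maintenance state so SMF restarts the services.
root@zfs2:~# svcadm clear svc:/network/nfs/server:default
root@zfs2:~# svcadm clear svc:/network/smb/server:default

# Re-share everything and confirm the shares are back.
root@zfs2:~# share -a
root@zfs2:~# share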

NFS

In Solaris 11, NFS is handled by SMF (the Service Management Facility). There are several NFS-related services:

root@zfs2:~# svcs | grep nfs
online         Nov_20   svc:/network/nfs/fedfs-client:default
online         Nov_20   svc:/network/nfs/status:default
online         Nov_20   svc:/network/nfs/mapid:default
online         Nov_20   svc:/network/nfs/rquota:default
online         Nov_20   svc:/network/nfs/nlockmgr:default
online         Nov_20   svc:/network/nfs/server:default

You can restart the main NFS server by running:

root@zfs2:~# svcadm restart nfs/server

Service manifests and log files can be found in /var/svc/manifest and /var/svc/log, respectively.
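
For example, svcs -l shows the exact log file a service is writing to, which can then be followed while reproducing a problem. A sketch (the log path follows the usual SMF naming convention):

# Show detailed information for the NFS server service, including its logfile.
root@zfs2:~# svcs -l nfs/server

# Follow the service log while troubleshooting.
root@zfs2:~# tail -f /var/svc/log/network-nfs-server:default.log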

SMB

Samba is also controlled through SMF; however, there are important pieces outside of these services (such as Kerberos and Active Directory).

To restart the main smb service:

root@zfs2:~# svcadm restart svc:/network/smb/server

We have had several issues with Samba and Active Directory that have broken Samba shares. To test the SMB server from the ZFS server, run the following command:

root@zfs1:~# smbadm show-shares -A localhost      
c$                  Default Share
IPC$                Remote IPC
zf1.tank_zf1        
zf1_smb             
zf2.tank_zf2        
zf20_smb            
zf21_smb            
zf22_smb            
zf23_smb            
zf24_smb            
zf25_smb            
zf2_smb             
zf3_smb             
zf4.tank_zf4        
zf4_smb             
zf5.tank_zf5        
zf5_smb             
zf6.tank_zf6        
18 shares (total=18, read=18)

You should see a listing of all Samba shares. If you see no output, something is wrong and it is most likely related to Active Directory. Fixing this issue involves un-joining and re-joining the domain. Start by deleting the computer object in AD: log on to one of the AD servers and find the ZFS server under Active Directory Users and Computers → Computers. Right-click the computer object (e.g. ZFS2) and select "Delete".

Before joining the domain, we must make sure that one of the Active Directory servers is the first entry in /etc/resolv.conf as a nameserver:

root@zfs2:~# cat /etc/resolv.conf
search  cs.virginia.edu

nameserver      128.143.67.108               <- Demeter's IP address
nameserver      128.143.136.15

Then run the following and follow the prompts to re-join the domain:

root@zfs2:~# smbadm join -u administrator cs.virginia.edu
After joining cs.virginia.edu the smb service will be restarted automatically.
Would you like to continue? [no]: yes
Enter domain password: 
Locating DC in cs.virginia.edu ... this may take a minute ...
Joining cs.virginia.edu ... this may take a minute ...
Successfully joined cs.virginia.edu
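
After the join completes, it is worth confirming that the smb service came back online and that shares are visible again. A quick sanity check using the same commands as above:

# The smb server service should be back in the online state.
root@zfs2:~# svcs svc:/network/smb/server

# Re-run the share listing; all Samba shares should be listed again.
root@zfs2:~# smbadm show-shares -A localhost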

Snapshots

See the main article on Backups for more information.

Snapshots are taken automatically every 12 hours, at 6am and 6pm. We have no mechanism in place to monitor utilization of our partitions, so partitions will occasionally fill up because of accumulated snapshots. Follow these steps to free up space by deleting old snapshots.

First, get a list of the snapshots on the partition where you want to free up space:

root@zfs3:~# zfs list -r -t snap zf15-18.tank/zf15
NAME                                        USED  AVAIL  REFER  MOUNTPOINT
zf15-18.tank/zf15@12.06.17-00               544M      -  3.94T  -
zf15-18.tank/zf15@12.06.17-06               538M      -  3.94T  -
zf15-18.tank/zf15@12.06.17-12              5.28G      -  3.95T  -
zf15-18.tank/zf15@12.06.17-18              89.3M      -  3.95T  -
zf15-18.tank/zf15@12.07.17-00              67.4M      -  3.95T  -
zf15-18.tank/zf15@12.07.17-06              56.7M      -  3.95T  -
zf15-18.tank/zf15@12.07.17-12               116M      -  3.96T  -
zf15-18.tank/zf15@12.07.17-18              87.2M      -  3.96T  -
...

Say we want to delete the oldest five snapshots. We can pipe the output of this into head -6 to take the top six lines (the header plus the five oldest snapshots), and then pipe that into tail -5 to remove the header line that we don't want:

root@zfs3:~# zfs list -r -t snap zf15-18.tank/zf15 | head -6 | tail -5  
zf15-18.tank/zf15@12.06.17-00               544M      -  3.94T  -
zf15-18.tank/zf15@12.06.17-06               538M      -  3.94T  -
zf15-18.tank/zf15@12.06.17-12              5.28G      -  3.95T  -
zf15-18.tank/zf15@12.06.17-18              89.3M      -  3.95T  -
zf15-18.tank/zf15@12.07.17-00              67.4M      -  3.95T  -

We then use awk to grab only the first column (the snapshot names). Redirect this into a temporary file and run the following loop to delete the snapshots:

root@zfs3:~# zfs list -r -t snap zf15-18.tank/zf15 | head -6 | tail -5 | awk '{print $1}' > /tmp/zf15.snaps
root@zfs3:~# for i in `cat /tmp/zf15.snaps`; do zfs destroy $i; done
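
Since zfs destroy is irreversible, it can be worth double-checking the list before running the loop and, if the installed zfs supports the -n (dry-run) and -v (verbose) flags, previewing the destroy first. A cautious sketch:

# Sanity-check the file: these are exactly the snapshots the loop will destroy.
root@zfs3:~# cat /tmp/zf15.snaps

# Optional preview: -n performs a dry run and -v prints what would be destroyed.
root@zfs3:~# for i in `cat /tmp/zf15.snaps`; do zfs destroy -nv $i; done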

CoreZFS servers

zpool persistence

After installing the ZFS on Linux packages on CentOS, we have several ZFS-related systemd services:

[root@corezfs01 ~]# systemctl list-unit-files | grep zfs
zfs-import-cache.service                      enabled 
zfs-import-scan.service                       disabled
zfs-mount.service                             enabled 
zfs-share.service                             enabled 
zfs-zed.service                               enabled 
zfs-import.target                             enabled 
zfs.target                                    enabled 

With these ZFS services in their default post-install state, any zpools created on the system will disappear once the system reboots. There is a fix for this behavior, but I have not yet found a good way to apply it using Puppet. In the meantime, we need to run the following command on new ZFS servers:

[root@corezfs01 ~]# systemctl preset zfs-import-cache zfs-import-scan zfs-mount zfs-share zfs-zed zfs.target
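
To confirm the change took effect, the import service can be checked afterwards, and each pool's cachefile property should point at the cache file that zfs-import-cache.service reads at boot. A sketch, using the u pool from below as an example:

# The cache-based import service should now report "enabled".
[root@corezfs01 ~]# systemctl is-enabled zfs-import-cache.service

# "-" here means the default cache file, /etc/zfs/zpool.cache, which is what
# zfs-import-cache.service reads at boot.
[root@corezfs01 ~]# zpool get cachefile u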

ZFS quotas

Each ZFS dataset has disk space quotas set. To see the quotas, log in to the ZFS server and use the zfs command:

[root@corezfs01 ~]# zfs get refquota,quota u/av6ds
NAME     PROPERTY  VALUE     SOURCE
u/av6ds  refquota  19G       local
u/av6ds  quota     20G       local

The refquota is the soft limit: when a user reaches it they start getting out-of-space errors but can still remove files to recover. The quota is a hard limit that cannot be exceeded; once it is reached the user may be locked out of acting on their files at all. We only set the refquota. Do not set the hard quota! To set a quota, use the zfs set command:

[root@corezfs01 ~]# zfs set refquota=29g u/av6ds
[root@corezfs01 ~]# zfs get refquota u/av6ds
NAME     PROPERTY  VALUE     SOURCE
u/av6ds  refquota  29G       local
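
To see how close users are to their soft quotas, the same properties can be pulled for every dataset in the pool at once. A sketch against the u pool (the column list is just a convenient selection):

# Show usage against the soft quota for every user dataset under the u pool.
[root@corezfs01 ~]# zfs list -r -o name,used,avail,refquota u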