Anyone who deals with huge amounts of critical data takes steps to ensure that data is redundant and backed up. Big companies do it as standard practice. But small and medium businesses, SOHOs (small office/home office) and even individuals have data they can't afford to lose.
Unlike big companies, however, many smaller businesses, SOHOs and individuals don’t have IT departments, much less CTOs and CIOs.
I personally deal with a huge amount of data: I have over 6TB of storage. I run a file server at home; it's one of the most critical machines in my life, holding family photos and all my documents.
There are multiple advantages to this approach:
- Since all of my data is stored in one location, I can easily manage, maintain and secure it.
- It’s a centralized file server so anyone in my local network can access these files.
- I can work across multiple devices – laptop, PC, tablet, phone – without worrying about copying files between them.
- If my main PC or laptop breaks, I don't have to worry about rescuing data stored on that machine, because nothing is stored only on those devices in my house.
But it's not a good idea to keep all your eggs in one basket, so you have to add redundancy and keep multiple physical backups.
ZFS is the answer
There are two excellent file systems that can do this job: ZFS and Btrfs. Btrfs is very new and still under heavy development, so I don't recommend using it yet. ZFS, on the other hand, is time-tested, stable and extremely feature rich: It includes pooled storage (zpool), copy-on-write, snapshots, data integrity verification, automatic repair, protection against corruption, etc. But there are some licensing issues around ZFS, which I've written about previously.
Initially I considered FreeNAS and ran it for a while, but I am more comfortable with Linux and I prefer complete control over my server. So I chose the ZFS file system on Ubuntu.
Here's how to get started with ZFS on Ubuntu. First, what you'll need:
- A fully patched Ubuntu 14.04 server.
- At least three hard drives: one for Ubuntu (which must be installed on a separate SSD or hard drive) and two for the storage pool.
- At least 8GB of RAM (1GB for Ubuntu plus roughly 1GB of RAM per TB of pool data).
- Any decent CPU.
- I strongly recommend using the LTS (long term support) release of Ubuntu on any file server.
- To create RAID-Z, you need a minimum of two hard drives with the same storage capacity. If your drives have different capacities, ZFS treats each one as if it were the size of the smallest, so your usable storage will be limited by the smaller drive.
- I strongly recommend having a third external hard drive so that you can take regular backups of your data. (In my case, I take two external backups and keep one at my in-laws' place.)
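The two sizing rules above (RAM and RAID-Z capacity) are easy to sanity-check with shell arithmetic. A quick sketch with hypothetical numbers:

```shell
# RAM rule of thumb: 1GB for Ubuntu plus 1GB per TB of pool data.
data_tb=6                                   # hypothetical pool size in TB
echo "suggested RAM: $((1 + data_tb)) GB"   # round up to what your DIMMs allow

# RAID-Z with mismatched drives: every drive counts as the smallest one.
drive1=4000; drive2=3000                    # hypothetical drive sizes in GB
echo "usable space limited by the $(( drive1 < drive2 ? drive1 : drive2 ))GB drive"
```

So for my 6TB of data the guideline works out to 7GB, which rounds up to 8GB of RAM.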
Once you have your hard drives, you can install ZFS. While ZFS support ships with the upcoming Ubuntu 16.04 release, on 14.04 we need a PPA to install it:
sudo apt-add-repository ppa:zfs-native/stable
Once it's added, update the package lists:
sudo apt-get update
Now install the package and load the module:
sudo apt-get install ubuntu-zfs
sudo /sbin/modprobe zfs
Then run the following command to check whether any ZFS datasets already exist on the system:
sudo zfs list
The output should be 'no datasets available'. We know no datasets have been created yet, of course; the point is to confirm that ZFS is installed and working properly.
Now it's time to create a pool of drives. For that you need to identify the drives on your system, so first run the 'lsblk' command to list each drive.
Note down the block device names, which should be something like 'sda', 'sdb' and 'sdc'.
Now we need to find the ID of each drive that we are going to use in the pool:
sudo ls -l /dev/disk/by-id/
This command will give you a long output. Here is an example from my system (I have removed the lines for sda, the drive where Ubuntu is installed; the two drives I will be using for the pool are sdc and sdd):
swapnil@nas:~$ sudo ls -l /dev/disk/by-id/
lrwxrwxrwx 1 root root 9 Feb 20 17:48 ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0318132 -> ../../sdc
lrwxrwxrwx 1 root root 9 Feb 20 17:48 ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E4LRJ111 -> ../../sdd
lrwxrwxrwx 1 root root 9 Feb 20 17:48 wwn-0x50014ee20922e8a8 -> ../../sdc
lrwxrwxrwx 1 root root 9 Feb 20 17:48 wwn-0x50014ee20b453ec7 -> ../../sdd
From this output, note down the IDs that start with 'wwn'; in my case they were 'wwn-0x50014ee20922e8a8' and 'wwn-0x50014ee20b453ec7'. These IDs are tied to the drive hardware, so they stay stable even if the 'sdX' names change between boots.
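If the listing is long, you can filter it down to just the wwn entries with a small awk one-liner. The sketch below runs against a copy of the sample listing via a here-doc so it works anywhere; in practice, pipe in the output of 'ls -l /dev/disk/by-id/' instead:

```shell
# Print only the wwn-* IDs and the device each one points to.
# The here-doc reproduces part of the sample listing above.
awk '$9 ~ /^wwn-/ {print $9, "->", $11}' <<'EOF'
lrwxrwxrwx 1 root root 9 Feb 20 17:48 ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0318132 -> ../../sdc
lrwxrwxrwx 1 root root 9 Feb 20 17:48 wwn-0x50014ee20922e8a8 -> ../../sdc
lrwxrwxrwx 1 root root 9 Feb 20 17:48 wwn-0x50014ee20b453ec7 -> ../../sdd
EOF
```

In an 'ls -l' listing the name is field 9 and the symlink target is field 11, which is all the filter relies on.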
Let’s now create the zpool in this format:
sudo zpool create -f -o ashift=12 -m <mount_point> <pool_name> raidz <ID_of_first_drive> <ID_of_second_drive>
In my case the mount point was '/mnt/bank' and the pool name was 'swapool':
sudo zpool create -f -o ashift=12 -m /mnt/bank swapool raidz wwn-0x50014ee20922e8a8 wwn-0x50014ee20b453ec7
This may take a while. Once it's done, check the status:
sudo zfs list
This time you will see your pool in the output, which means it was created successfully. The two hard drives now store your files redundantly, so the pool can survive the failure of one drive. To further increase capacity and redundancy, you can build the pool with more drives.
Before you start copying files to the pool, you still have some more work to do. You must create ‘datasets’ and store everything inside these datasets. Datasets are like directories or folders created with ZFS features. You can create a few datasets for different file types or just one dataset. And once a dataset is created, you can then create regular directories or folders inside it.
This is the command to create a dataset:
sudo zfs create <pool_name>/<dataset_name>
Here is an example on my system where I created different datasets for different types of files:
sudo zfs create swapool/images
sudo zfs create swapool/music
sudo zfs create swapool/movies
sudo zfs create swapool/documents
sudo zfs create swapool/ebooks
This way you will have a very organized structure for your data.
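If you have many datasets to create, the same commands can be generated in a loop. Here is a dry-run sketch that only prints each command ('swapool' is the pool name from this article; remove the echo to actually run them):

```shell
# Dry run: print one zfs create command per dataset.
for ds in images music movies documents ebooks; do
  echo "sudo zfs create swapool/${ds}"
done
```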
Once all desired datasets are created, you need to change file permissions on the pool so that normal system users can write to it.
sudo chown -R <user_name>:<group_name> <path_of_pool>
sudo chown -R swapnil:users /mnt/bank/
Now you are all set. Start storing your files on your newly created redundant storage. In follow-up articles I will cover regular maintenance of this storage, how to automate backups of the data to an external drive, and how to access this storage over the local network to turn it into a full-fledged file server.
If you have any questions, feel free to ask in the comments below.
This article is published as part of the IDG Contributor Network. Want to Join?