ESP8266 Temperature logger for Nagios

In an earlier post I gave a brief review of the ESP8266 microcontroller, a lovely piece of hardware.
This time I'm posting a new schematic and a Nagios module that turn it into a real-time temperature logger for a Nagios server in a data centre.

Briefly, what it does and how it works:

– The ESP8266 module reads the temperature sensor every 10 seconds and sends the reading via UDP
– The Nagios server processes the UDP data received from the ESP module and compares it with the thresholds set in Nagios

If the Nagios server picks up a value that triggers the alarm, then it sends a warning or an alarm to the Nagios admin.
Below you will find the electronics schematic and the program code for both the ESP and the Nagios server.

Connect the ESP and the Dallas sensor as shown in the picture below. This is the easiest way to wire them up (1-Wire).
I'm not going to go into the details of programming the ESP module; there are dozens of articles on the net about that.
I use LuaLoader myself, which I think is the easiest tool to use. If you just need the code files, jump to the end of this article, where you can download all the Lua files.
You must set the Nagios server's address in this line: cu:connect(7,"") and also set your SSID and password so the module can connect to your access point.


For the power supply I used an old USB cable to power the ESP module from a server in the data centre. After all, this is meant to check the temperature of the racks and servers in the DC. :)
USB supplies 5V, so you need to bring this down to 3.3V. You can use an AMS1117-3.3 regulator (5V to 3.3V); please check the link below about this.
Temperature Sensor = Dallas
ESP module = ESP
AMS1117-3.3 = AMS

And here is the code for the ESP8266 module and for the Nagios server.
Two files need to be uploaded to the ESP8266:

First file, init.lua:


function startup()
    if abort == true then
        print('startup aborted')
        return
    end
    print('Starting xmitTemp')
    dofile('xmitTemp.lua')
end

abort = false
print('Startup in 5 seconds')
tmr.alarm(0, 5000, 0, startup)

Second file, xmitTemp.lua:


function getTemp()

    local addr = nil
    local count = 0
    local data = nil
    local pin = 3 -- pin connected to DS18B20
    local s = ''

    -- setup gpio pin for oneWire access
    ow.setup(pin)

    -- do search until addr is returned
    repeat
        count = count + 1
        addr = ow.reset_search(pin)
        addr = ow.search(pin)
    until((addr ~= nil) or (count > 100))

    -- if addr was never returned, abort
    if (addr == nil) then
        print('DS18B20 not found')
        return -999999
    end

    -- validate addr checksum
    crc = ow.crc8(string.sub(addr,1,7))
    if (crc ~= addr:byte(8)) then
        print('DS18B20 Addr CRC failed')
        return -999999
    end

    -- family code must be 0x10 (DS18S20) or 0x28 (DS18B20)
    if not((addr:byte(1) == 0x10) or (addr:byte(1) == 0x28)) then
        print('DS18B20 not found')
        return -999999
    end

    ow.reset(pin) -- reset onewire interface
    ow.select(pin, addr) -- select DS18B20
    ow.write(pin, 0x44, 1) -- store temp in scratchpad
    tmr.delay(1000000) -- wait 1 sec

    present = ow.reset(pin) -- returns 1 if dev present
    if present ~= 1 then
        print('DS18B20 not present')
        return -999999
    end
    ow.select(pin, addr) -- select DS18B20 again
    ow.write(pin,0xBE,1) -- read scratchpad

    -- rx data from DS18B20 (9 bytes: 8 data bytes plus CRC)
    data = nil
    data = string.char(ow.read(pin))
    for i = 1, 8 do
        data = data .. string.char(ow.read(pin))
    end

    -- hex dump of the scratchpad, handy when debugging (format string assumed)
    s = string.format('%02X-%02X-%02X-%02X-%02X-%02X-%02X-%02X',
        data:byte(1),data:byte(2),data:byte(3), data:byte(4),
        data:byte(5),data:byte(6), data:byte(7),data:byte(8))

    -- validate data checksum
    crc = ow.crc8(string.sub(data,1,8))
    if (crc ~= data:byte(9)) then
        print('DS18B20 data CRC failed')
        return -999999 -- same error value as above, so xmitTemp() catches it
    end

    -- compute and return temp as 99V9999 (V is the implied decimal point, a little COBOL there)
    return (data:byte(1) + data:byte(2) * 256) * 625

end -- getTemp

function xmitTemp()
    local temp = 0

    temp = getTemp()
    if temp == -999999 then
        return -- no valid reading, send nothing
    end
    cu:send(tostring(temp)) -- payload is the reading as text (assumed)

end -- xmitTemp

function initUDP()

    -- setup UDP port
    cu = net.createConnection(net.UDP)
    cu:connect(7,"") -- put the Nagios server's IP address between the quotes
end -- initUDP

function initWIFI()

    print("Setting up WIFI...")
    wifi.setmode(wifi.STATION)
    wifi.sta.config("Your SSID","SSID Password")

    tmr.alarm(1, 1000, 1,
        function()
            if wifi.sta.getip()== nil then
                print("IP unavailable, Waiting...")
            else
                tmr.stop(1) -- stop polling once an IP is assigned
                print("Config done, IP is "..wifi.sta.getip())
            end
        end -- function
    )
end -- initWIFI

initWIFI()
initUDP()

-- read and transmit the temperature every 10 seconds
tmr.alarm(0, 10000, 1, xmitTemp)
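
The value returned by getTemp() is the raw 16-bit sensor reading multiplied by 625. One raw step of the DS18B20 is 0.0625 °C, so the result is the temperature in ten-thousandths of a degree: 25.0625 °C comes out as 250625. As a worked example, the same arithmetic can be sketched in shell. The function name raw_to_celsius is made up for this illustration, and the handling of readings below zero is an addition that the Lua code above does not have.

```sh
#!/bin/sh
# Convert the two DS18B20 temperature bytes (LSB, MSB) to degrees Celsius.
# One raw step is 0.0625 degC, so raw * 625 is ten-thousandths of a degree.
raw_to_celsius() {
    lsb=$1
    msb=$2
    raw=$(( (msb * 256) + lsb ))
    # Two's complement: values above 0x7FFF are temperatures below zero.
    if [ "$raw" -gt 32767 ]; then
        raw=$(( raw - 65536 ))
    fi
    scaled=$(( raw * 625 ))
    sign=""
    if [ "$scaled" -lt 0 ]; then
        sign="-"
        scaled=$(( 0 - scaled ))
    fi
    printf '%s%d.%04d\n' "$sign" $(( scaled / 10000 )) $(( scaled % 10000 ))
}

raw_to_celsius 145 1     # bytes 0x91 0x01 -> 25.0625
raw_to_celsius 94 255    # bytes 0x5E 0xFF -> -10.1250
```

In a data centre the reading should never go below zero, which is probably why the Lua code gets away without the sign handling.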

Here follow the Nagios server modules:

Add the following configuration to your localhost.cfg.
This is usually in /usr/local/nagios/etc/objects/

define service{
use                             local-service         ; Name of service template to use
host_name                       Telehouse
service_description             Telehouse_Temperature
check_command                   check_temp
#check_interval                 0.5
#retry_interval                 1
#max_check_attempts             5
notification_interval           1
check_interval          1
retry_check_interval    1
max_check_attempts      5
}


Create a file called check_temp in the libexec directory (/usr/local/nagios/libexec) and make it executable.

DIRS="/var/log /tmp"

temp1=`/usr/bin/cut -c 1-2 /home/nagios/current_temp.txt`
temp2=`/usr/bin/cut -c 3-4 /home/nagios/current_temp.txt`


count=$(/usr/bin/tail -n 1 /home/temp/current_temp.txt)


if [[ "$count2" < "$op1" ]] ; then


elif [[ "$count2" < "$op2" ]] ; then



echo "$status Temperature:$temp1.$temp2; Triggers: 22.00;25.00;0; $statustxt - $count2"
exit $status
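
Several lines of the script above did not survive, so here is a minimal sketch of how such a check could look. It is not the original check_temp. It assumes that the last line of current_temp.txt holds four digits (2350 meaning 23.50 °C), it takes the 22.00 and 25.00 thresholds from the echo line above, and it uses the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names temp_status and check_temp are made up for the illustration.

```sh
#!/bin/sh
# Sketch of a Nagios temperature check; not the original check_temp.
# Assumption: the reading is four digits, e.g. 2350 means 23.50 degC.
WARN=2200   # warning threshold, 22.00 degC
CRIT=2500   # critical threshold, 25.00 degC

# Print the Nagios status code for a reading in hundredths of a degree.
temp_status() {
    case "$1" in
        ''|*[!0-9]*) echo 3; return ;;   # UNKNOWN: empty or not a number
    esac
    # Strip leading zeros so the value is never read as octal.
    value=$(printf '%s' "$1" | sed 's/^0*//')
    value=${value:-0}
    if [ "$value" -lt "$WARN" ]; then
        echo 0
    elif [ "$value" -lt "$CRIT" ]; then
        echo 1
    else
        echo 2
    fi
}

# Read the last line of the given file, print a status line and
# return the matching Nagios exit code.
check_temp() {
    reading=$(tail -n 1 "$1" 2>/dev/null | cut -c 1-4)
    status=$(temp_status "$reading")
    case "$status" in
        0) text="OK" ;;
        1) text="WARNING" ;;
        2) text="CRITICAL" ;;
        *) text="UNKNOWN" ;;
    esac
    whole=$(printf '%s' "$reading" | cut -c 1-2)
    frac=$(printf '%s' "$reading" | cut -c 3-4)
    echo "$text - Temperature: $whole.$frac"
    return "$status"
}

# A real plugin would end with something like:
#   check_temp /home/temp/current_temp.txt
#   exit $?
```

Keeping the threshold comparison in its own function makes it easy to try values by hand before wiring the script into Nagios.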


Add a new crontab entry to run tshark, which captures the UDP echo messages from the ESP module.
If you don't have tshark/wireshark installed, install it on your box first.
CentOS: yum install wireshark
Debian: apt-get install wireshark

nano /etc/crontab

01 * * * * root cd /home/temp && /usr/bin/tshark -a duration:3600 -i eth0 src -T fields -e data -w temp2.pcap & > /dev/null
* * * * * root /home/temp/


Create a new directory in /home as temp

mkdir /home/temp

Create a file called


cat /home/temp/temp2.pcap | tr -dc '[:alnum:]\n\r' | cut -c 2-5 | awk 'length($0) > 2' | tail -n 1 -c 5 > /home/temp/current_temp.txt

To check that the ESP8266 is sending the correct UDP packets, run this command:

tcpdump -i eth0 udp

You should see similar UDP packets from the ESP module every 10 seconds:

18:10:40.248116 IP > nagiosnew.echo: UDP, length 6

And in Nagios you will hopefully see this:


References: AT Firmware.bin



VMware networking setup for vMotion/iSCSI & VM traffic

VMware ESX/ESXi network setup.

In this post I will show you a networking setup for VMware servers.
It involves Cisco switches (2960/3750 series) and HP or Dell servers.
I have had these configurations running in production for quite some time now (2+ years) without any issues.

As we know, networking for VMware servers is much more complicated than the setup we used to have with "classic" physical Linux or Windows boxes.
The classic single uplink with a regular VLAN is no longer enough for VMware.
You must separate virtual machine traffic from management traffic, and you must also separate storage and vMotion traffic.
VMware says you can use a separate vSwitch with its own VLANs for each physical connection, but failover to the other physical connections is then more complicated than with one or two vSwitches. A VMware server needs a minimum of 2 network uplinks for VM traffic and management traffic, but VMware recommends 4 uplinks per physical server.

The following picture gives a brief overview of the current setup.


So let's take a look at the 4-uplink configuration on the VMware ESXi host:


We have all 4 uplinks connected to the same vSwitch. With this configuration it is very easy to set up failover for the management traffic and to separate the storage and vMotion traffic as well. Let's take a look at the vSwitch properties:


Also take a look at the NIC teaming settings of the vSwitch.
As you can see, all adapters are active in this vSwitch:



Now take a look at the management uplink settings.
The management network has one active adapter and two standby adapters.
If the physical connection of the active vnic0 adapter fails (switch issue or cable issue), the VMkernel activates one of the standby adapters.
With this setup the management network stays available as long as at least one of these uplinks is up, so you are very unlikely to lose the connection to the VMware box.

Now we check the vMotion settings.
Here we have an added VMkernel port for vMotion and IP storage, with an extra IP address for vMotion.
As you can see, it has one active adapter and three unused adapters. To separate this kind of traffic properly, tick the override failover order box and move down the adapters that you don't want this VMkernel port to use. The setting is the same for iSCSI storage.

Now take a look at the storage IP VMkernel settings.
Here we also have an extra VMkernel port with its own IP address.
In this setup, too, the other active vSwitch adapters have been moved to unused, as you can see in the picture.
Without this you won't be able to add the software iSCSI storage properly. The VMkernel IP settings create a one-to-one connection to the storage, so only one active adapter should be enabled in any such VMkernel port group. With this setup you can have more than one path to the iSCSI storage, but you need to enable that feature in the iSCSI setup.
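
The same active/standby layout can also be set from the ESXi shell with esxcli instead of the vSphere Client. The sketch below only prints the commands so you can review them before running anything on a host. The helper name failover_cmd, the port group names and the vmnicN uplink names are assumptions for this example; use the names your own host shows.

```sh
#!/bin/sh
# Print (not run) the esxcli command that sets the failover order of a
# standard vSwitch port group.
# $1 = port group name, $2 = active uplinks, $3 = standby uplinks (optional)
failover_cmd() {
    cmd="esxcli network vswitch standard portgroup policy failover set"
    cmd="$cmd --portgroup-name=\"$1\" --active-uplinks=$2"
    if [ -n "${3:-}" ]; then
        cmd="$cmd --standby-uplinks=$3"
    fi
    printf '%s\n' "$cmd"
}

# Management: one active uplink and two standby uplinks.
failover_cmd "Management Network" vmnic0 vmnic1,vmnic2
# vMotion or iSCSI VMkernel port: a single active uplink, no standby.
failover_cmd "vMotion" vmnic3
```

Printing the command first is a deliberate choice: a wrong failover order on the management port group can cut you off from the host.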


So now take a brief look at the Virtual Machine Port Group settings and their VLAN settings.
You can add new VLANs here and set up load balancing and failover for the virtual machines.
I used two of the physical adapters for the virtual machines, and they are activated as vnic5/vnic1 and vnic1/vnic5, opposite to each other.
If you have 4 or 6 uplink adapters, you could activate 3-4 adapters for the virtual machines; it's up to you.
This also depends on how heavily loaded your virtual machines are: if they are heavily loaded, it is better to separate the loads and keep vMotion and management traffic off those physical uplinks.



I know it's getting a bit confusing, so here is the binding between the VLANs, management traffic and vMotion traffic again:


The storage traffic is not part of any of those; it is just a one-to-one connection via the VMkernel port group IP address:


In this setup I use the same VLAN for vMotion, because it is only used for maintenance, but if you use vMotion heavily, it is better to separate it into a different VLAN. You might also dedicate a physical uplink to this traffic, which separates it not just at the VLAN level but at the physical level as well.

And finally the physical uplink ports to the Cisco switch:

interface GigabitEthernet1/0/22
description vnic0
switchport trunk allowed vlan 100,200,300,400
switchport trunk native vlan 999
switchport mode trunk
switchport nonegotiate
speed 1000
duplex full

interface GigabitEthernet1/0/23
description vnic2
switchport trunk allowed vlan 100,200,300,400
switchport trunk native vlan 999

switchport mode trunk
switchport nonegotiate
speed 1000
duplex full

The native vlan 999 command changes the VLAN used for untagged traffic, which is VLAN 1 by default.
With this command you keep unnecessary layer 2 traffic, such as flooding and broadcasts, away from the VMware server.
Also, if you have a system already managed by vCenter, sometimes you cannot change the management VLAN, because vCenter loses contact with the host and the change fails or the host gets dropped from vSphere. In that case, disconnect the server from vCenter and create a second VMkernel interface in a different IP subnet on a different physical interface than the one currently in use, then connect to the box via that VMkernel interface. With this in place you can make major changes to the main interface (native VLAN, VLAN tagging, etc.). A few times when I made such changes I lost the connection to the server and had to reset the VMkernel management network, roll back the switch configuration, or change the native VLAN on the switch. So be careful with these changes if you cannot physically reach the box (for example, when the server is in a data centre or a different office).

So now let's take a look at the Cisco switch side after the native VLAN and trunking configuration:

Port        Mode             Encapsulation  Status        Native vlan
Gi1/0/22    on               802.1q         trunking      999

Port        Vlans allowed on trunk
Gi1/0/22    100,200,400

Port        Vlans allowed and active in management domain
Gi1/0/22    100,200,400




IoT Temperature logger with ESP8266 and DS18B20 sensor

Current living room temperature:

I will post the circuit schematic and the code shortly; in the meantime, this is the module that I used.
I'm also posting the firmware flasher and the firmware that I used for this project.
There are many versions available on the net and it is easy to get confused, so follow these links to see what I bought and used.

ESP8266 used for this project: ESP-01:


ESP8266 on ebay:

DS18B20 sensor on ebay:

Nodemcu Firmware:


Programming and testing with Lualoader:

Programming and testing with ESPlorer:





Linux server migration with VMware converter

This post shows how to migrate a live Linux/Windows machine remotely from any source to any destination.

I'll do this on the website and post all the pictures related to this migration.
I am going to use VMware Converter, which deals with everything.


– Running VMware server

– Source machine with SSH or RDP connection (Linux/Windows)

– Destination VMware server

– VMware converter (free to download from VMware site)

OK, let's start VMware Converter and connect to the source machine.

Here you need to select "Powered-on machine" as the source type.
You also need to enter the login name and password.




On the next tab, enter the destination machine's IP address and login details.




On the next tab you must enter the machine name.
On Linux this is picked up from the hosts file automatically; you can leave it as it is if you prefer.



At the next step you must be really careful with the virtual machine version.
VMware automatically offers version 10, which can only be managed from vCenter, and that is not free.
So change this to version 8 or lower; then you will be able to manage the machine from the vSphere Client connected to ESXi.
The version is the virtual hardware version of the machine. I use version 8, which is the highest version that I can manage for free, without any licensing issue or cost.



Also, on the destination page, choose which datastore to use for the machine.

On the next tab the converter asks for the final parameters of the conversion.
Here you must edit the Helper VM network tab and add an extra IP address on the local network of the destination VMware server.
Without this the converter usually dies at around 1% or 2% without any notification.





You should also tick "Reconfigure destination virtual machine" under the advanced options.
This fixes the initramdisk on the destination machine.


After this we can start the real migration process.



This creates the machine on the destination server, starts it automatically and begins pulling the data from the source machine.


Destination server console with the running machine while it’s pulling down the data from source:


You can see the progress is quite quick; it depends on the actual network speed and on the CPU and disk speed of the source and destination machines.


On the source machine VMware Converter uses the tar command to compress the full disk into an image and sends it over the network to the destination machine as a compressed file.
The source gets a bit overloaded by this process, but of course that depends on the source box. This one is not a physical machine, just a virtual one with 1 CPU socket and 512 MB of RAM.
It is definitely not a strong box and it runs other web sites at the moment, which is why the load shown in top is high.


Now let's see the finished, converted machine:


Indeed it reached the final stage; pulling down the whole machine took about an hour.
The destination machine is currently switched off on the destination server, but it contains a full copy of the source.
So let's see if it boots without any issues:


It seems we got a kernel panic because the local disk UUID was missing.
So now we need a CentOS disk to boot the box in rescue mode and fix the disk UUID.
Upload the ISO of the CentOS version used on the source box to the VMware server.
I've got 64-bit on 7layer, so I'll use that to fix the destination machine.




Boot the machine, but be quick: you have only about 1 second to press Escape at the BIOS screen and choose to boot from CD-ROM.
At the CentOS boot screen, choose the "Rescue installed system" option to fix the box.



Then choose Continue, which gives you the option to modify the root file system.

On the next screen you can see the root file system mounted under /mnt/sysimage.




Then choose the shell option from the menu.





Now that we have the whole root file system mounted, we can check the fstab and boot entries on the box.
Check the grub device map in /boot/grub/ and /boot/grub/grub.conf with regard to the HDD type.
Also check that /etc/fstab is correct.

Old fstab:



New fstab, corrected by VMware during the conversion:


The device map also looks correct:


After checking these we need to fix the grub loader and the initramdisk.

This procedure can also be found on the VMware site:

Rebuild the initramdisk:


mkinitrd -v -f /boot/initramfs-2.6.32-431.29.2.el6.x86_64.img 2.6.32-431.29.2.el6.x86_64

The kernel version in this line should match your grub kernel config, so check it in the /boot directory.
It takes about 5-10 seconds to rebuild the whole initramdisk.

We also need to correct the disk UUID in the grub boot loader. Follow either of the methods below:

Correcting the grub loader by changing the UUID:

Run ls -l /dev/disk/by-uuid to find the correct UUID for the sda3 disk.

The UUID is a very long string and it is easy to make a mistake typing it, so it's better to append the ls output to grub.conf and then move the UUID to the correct place.

ls -l /dev/disk/by-uuid >> /boot/grub/grub.conf then move it from the end.

In this case the last line contains the correct UUID, the one for /dev/sda3.
So move it to the root=UUID= part of the kernel line.
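
If you would rather not paste the whole ls listing into grub.conf, the UUID for one partition can be picked out of it with a short filter. This is only a sketch: uuid_for_dev is a made-up helper name, and it assumes the usual "UUID -> ../../sda3" symlink layout of the ls -l output. Where blkid is available in the rescue shell, blkid -s UUID -o value /dev/sda3 should give the same answer.

```sh
#!/bin/sh
# Print the UUID that /dev/disk/by-uuid maps to a device name such as sda3.
# Reads the output of "ls -l /dev/disk/by-uuid" on stdin.
uuid_for_dev() {
    awk -v dev="$1" '
        NF >= 3 && $(NF-1) == "->" {
            # The link target looks like ../../sda3; keep the last path part.
            n = split($NF, parts, "/")
            if (parts[n] == dev) print $(NF-2)
        }'
}

# Typical use in the rescue shell:
#   ls -l /dev/disk/by-uuid | uuid_for_dev sda3
```

The printed UUID is then the value that goes after root=UUID= on the kernel line.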





Correcting the grub loader by changing only the kernel line in /boot/grub/grub.conf:


So now you can run grub-install with the corrected disk name:



When it has finished, try to reboot the box. The reboot and halt commands usually won't work here in rescue mode.
Use the VMware console's top menu ==>> VM ==>> Guest ==>> Send Ctrl+Alt+Del to reboot out of the rescue disk.
Then wait for the box to reboot and see if it works.




Voilà, it's booting up! :)

If you have trouble with /lib/modules/{kernel-number}/modules.dep at boot, you need to rebuild the initramdisk again.
Check the procedure on the VMware site and carefully compare the initramdisk name and the kernel name with the files in the /boot directory.

I'll mark the important parts which must be corrected, otherwise the kernel won't boot:


[ grub]# cat grub.conf | head -n 20

# Hetzner Online AG – installimage
# GRUB bootloader configuration file

timeout 5
default 0

title CentOS (2.6.32-431.29.2.el6.x86_64)
root (hd0,1)
kernel /vmlinuz-2.6.32-431.29.2.el6.x86_64 ro root=UUID=c8fbeb09-a9d6-449f-8a99-6f83b7cf4362 rd_NO_LUKS rd_NO_DM nomodeset crashkernel=auto SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=de
initrd /initramfs-2.6.32-431.29.2.el6.x86_64.img


cat menu.lst | head -n 20
# Hetzner Online AG – installimage
# GRUB bootloader configuration file

timeout 5
default 0

title CentOS (2.6.32-358.6.1.el6.x86_64)
root (hd0,1)
kernel /boot/vmlinuz-2.6.32-358.6.1.el6.x86_64 ro root=UUID=c8fbeb09-a9d6-449f-8a99-6f83b7cf4362 rd_NO_LUKS rd_NO_DM nomodeset
initrd /boot/initramfs-2.6.32-358.6.1.el6.x86_64.img


(hd0) /dev/sda






SPF record setup for mail server

How to set up and test an SPF record for a mail server:

Let's check Google's SPF record first with the dig command.

[root@mail ~]# dig txt

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.30.rc1.el6_6.1 <<>> txt
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52169
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0


;; ANSWER SECTION:
3599 IN TXT "v=spf1 ip4: ip4: ~all"

;; Query time: 12 msec
;; WHEN: Fri Feb 27 08:47:05 2015
;; MSG SIZE rcvd: 116

[root@mail ~]#

In the answer section you can see the IP addresses. These are the servers that are allowed to send mail for the domain.
Say you have your own domain name with a mail server on it that has an A record. This server can send mail under its own name, but no other server is allowed to send mail for the domain. With an SPF record like the one above, you authorise the listed IP addresses as well, so those servers can also send (relay) mail for the domain, for example via Google's mail servers.

You can also use domain names in an SPF record instead of IP addresses and tell the receiving server to check those.

[root@mail ~]# dig txt

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.30.rc1.el6_6.1 <<>> txt
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64785
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0


;; ANSWER SECTION:
21599 IN TXT "v=spf1 ip4: ~all"
21599 IN TXT "v=DMARC1\; p=none\; adkim=r\; aspf=r\; sp=none"

;; Query time: 36 msec
;; WHEN: Fri Feb 27 08:59:31 2015
;; MSG SIZE rcvd: 225

[root@mail ~]#
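
To see at a glance which addresses a record authorises, the ip4 entries can be pulled out of the TXT string with a short filter. This is just a sketch: spf_ip4 is a made-up helper name, and the addresses in the example come from the documentation ranges, not from a real record.

```sh
#!/bin/sh
# List the ip4 mechanisms of an SPF record, one per line.
# $1 = the TXT record text, e.g. "v=spf1 ip4:192.0.2.10 ~all"
spf_ip4() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^ip4://p'
}

spf_ip4 "v=spf1 ip4:192.0.2.10 ip4:198.51.100.0/24 ~all"
```

With dig you could feed it live data along the lines of spf_ip4 "$(dig +short txt yourdomain | tr -d '"')", where yourdomain stands for the domain you want to check.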


Create and check SPF records:

Header check for emails to analyse SPF and other issues:




MIPS Development

A MIPS development board from Imagination Technologies.

I just received my new CI20 MIPS based development board from Imgtec and I must say this is a piece of engineering art! :)

I have already set up an Apache2 web server, an SSH server, a MySQL server, a basic firewall (with iptables) and a Postfix mail server.
I'm going to build an SMS server, hook it up to my CCTV system and publish everything about it shortly.

Thank you again Imgtec!

Related links to CI20 and other Linux based hardware development:


Debian Distro Upgrade

So let's get our hands dirty with a Debian Linux distro upgrade!
This week I received a complaint about one of our servers, which had some dodgy, outdated PHP packages installed.
I had to investigate what had happened to the box and fix the issue.
I found out it had Debian lenny installed, which is quite old and past its end of life.
This release has had no security updates since 2012, so it must be upgraded to a newer release to fix the issue.
Although this box is behind a firewall, it is still dangerous to have an outdated box sitting on the net.
So I had to do a full distro upgrade on the box, which went as follows:
First I installed the latest packages from the original distro, which was lenny.
After the update I rebooted the box and changed the sources to the next release, first squeeze and then wheezy (the listing shows the final wheezy entries):
# nano /etc/apt/sources.list
deb wheezy main contrib non-free
deb-src wheezy main contrib non-free
deb wheezy/updates main contrib non-free
deb-src wheezy/updates main contrib non-free
Then I started the upgrade process like this:
# aptitude update
# aptitude safe-upgrade
# aptitude dist-upgrade
Follow the instructions from aptitude; it will ask what you want to do with conflicting packages and config files.
For example, if php.ini has been modified locally, what should happen?
Keep the current modified version or use the one provided by the distro?
Sometimes you need to use the distro-provided config file, otherwise the service won't be able to start.
For example, I kept the old mysql-server config and the new version could not start.
So I replaced it with the new one, carried over some of the old settings, and voilà, it started up just fine.
So to upgrade from lenny to wheezy you must upgrade to squeeze first, then to wheezy:
lenny -> squeeze -> wheezy
Be patient and make a few good coffees for the upgrade, because it will take some time!
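
Switching sources.list from one release to the next is a plain search and replace, so it can be scripted for each hop. A minimal sketch follows; switch_release is a made-up helper name and the mirror URL in the usage note is a placeholder, not the one from this box.

```sh
#!/bin/sh
# Rewrite sources.list content (stdin) from one release name to another.
# $1 = current release name, $2 = next release name
switch_release() {
    sed "s/$1/$2/g"
}

# Typical use, one hop at a time:
#   switch_release lenny squeeze < /etc/apt/sources.list > /tmp/sources.list.new
#   switch_release squeeze wheezy < /etc/apt/sources.list > /tmp/sources.list.new
```

A line such as "deb http://mirror.example/debian lenny main contrib non-free" becomes the same line with squeeze in place of lenny, and "lenny/updates" becomes "squeeze/updates". Look over the new file before copying it over /etc/apt/sources.list.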

VMware free backup solution from virtuallyGhetto


VMware free backup solution for ESXi servers:

Download the script from GitHub, then modify it to fit your system.

I'm going to explain the important parts that I usually change in this script:

In file:

– Backup path
– Rotation
– Backup format
– Email server
– Email to
– Email from


# directory that all VM backups should go (e.g. /vmfs/volumes/SAN_LUN1/mybackupdir)


# Format output of VMDK backup
# zeroedthick
# 2gbsparse
# thin
# eagerzeroedthick


# Number of backups for a given VM before deleting

Also in ghettoVCB.conf







Once you have uploaded the script and the ghettovcb.conf file, you need to add the execute flag to the script:

chmod +x

Then you can start backing up your virtual machines.

To back up only one machine, run this:

./ -m vm_to_backup

To back up all machines:

./ -a

If you want to exclude machines from the backup, use an exclusion file:
./ -a -e vm_exclusion_list

The VMware firewall won't allow the script to send outgoing emails; this needs to be fixed.
Upload the smtp.xml file to the VMware server and refresh the firewall; without this you will get an error on the VMware SSH console.

Script:  smtp.xml

Upload it to /etc/vmware/firewall and refresh the firewall with esxcli:

esxcli network firewall refresh

Then click on the server name, Configuration, Security Profile, and you will see the new SMTP outbound port as a new outgoing firewall rule that allows outgoing SMTP traffic from the server.




Although you can use the restore script to restore machines from backup, you can also run them straight from the backup by adding the backed-up machine to the inventory.
This is much quicker than the restore process, but obviously the machine will reside on the backup path, not on the original path. This gets the machine back ASAP; afterwards, create a backup onto the original path, shut down the machine on the backup path, add the machine on the original path to the inventory and start it up.


The only thing left to do is to automate this process.
Edit the crontab on your server and add this to it:
10 00 * * 1-5 /vmfs/volumes/ -f /vmfs/volumes/Fuji-NAS/backuplist > /vmfs/volumes/ghettoVCB-backup-$(date +\%s).log

On VMware ESXi 5.5 the crontab lives in /var/spool/cron/crontabs/, and the root file contains the current crontab configuration.

cat /var/spool/cron/crontabs/root

#min hour day mon dow command
1 1 * * * /sbin/
1 * * * * /sbin/
0 * * * * /usr/lib/vmware/vmksummary/
*/5 * * * * /sbin/hostd-probe ++group=host/vim/vmvisor/hostd-probe
10 00 * * 1-5 /vmfs/volumes/datastore1/root/opt/ -f /vmfs/volumes/datastore1/root/opt/vmbackup.txt

This runs the backup at 00:10 (ten past midnight) every Monday to Friday, but you can change it according to your needs.


NAS4Free High available iSCSI failover VMware server. 

This post shows how to install and set up a NAS4Free server as iSCSI storage for your ESXi/ESX VMware server.
NAS4Free is based on FreeBSD and has all the services required to act as highly available storage (HAST and CARP).
Of course you can also use this solution in your network as highly available storage or as a Windows CIFS/Samba server if you change the services on NAS4Free.
I'll stick to the iSCSI setup first and later show how to set up NFS and Windows (Samba) shares.

The following setup is used here:

Node1 primary IP address for serving iSCSI and CARP services:
Node1 secondary IP address for HAST synchronisation:

Node2 primary IP address for serving iSCSI and CARP services:
Node2 secondary IP address for HAST synchronisation:

Virtual IP address(CARP address) for iSCSI service:

Node1 host name: has1
Node2 host name: has2

Install both nodes with the latest NAS4Free edition.

– Change node names according to your set up for example: node1 and node2.


– Add node names to host file on both nodes.

– Setup carp services under Network/Interface management:


– Advertisement skew on has1 node: 0
– Advertisement skew on has2 node: 10

If the has1 node dies, the has2 node takes over all the services.


You must use the same link up and link down actions on both nodes, otherwise the switchover won't work properly!
So everything should be the same except the advertisement skew value.

Next step: set up the HAST service:



As you can see, the second network interface card is used for HAST synchronisation, not the main interface.
After you set up the HAST service, reboot both nodes; for some reason Apply alone does not start the services.

– Switch on the SSH service and SSH into both nodes.

On Master issue these commands:

hastctl role init disk1
hastctl create disk1
hastctl role primary disk1

On Slave issue these commands:

hastctl role init disk1
hastctl create disk1
hastctl role secondary disk1

Check both nodes with: hastctl status

Then configure ZFS
On Master:

Add disks (Disks->Management)

disk1: N/A (HAST device)
Advanced Power Management: Level 254
Acoustic level: Maximum performance
S.M.A.R.T.: Checked
Preformatted file system: ZFS storage pool device

Format as zfs (Disks->Format)

Add ZFS Virtual Disks (Disks->ZFS->Pools->Virtual Device)

Add Pools(Disks->ZFS->Pools->Management)

Add a PostInit script on both nodes on the System/Advanced/Command scripts tab:
/usr/local/sbin/carp-hast-switch slave

Shut down the master and import the pool on the slave through the GUI (tab: ZFS/Configuration/Detected).
Then synchronise the pool on the slave!

When the slave has finished, start the master and switch the VIP back to the master.

zpool status disk1
hastctl status

Troubleshooting commands from SSH terminal:

zpool status

nast1: ~ # zpool status mvda0
  pool: mvda0
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
  scan: none requested

        NAME                   STATE     READ WRITE CKSUM
        mvda0                  UNAVAIL      0     0     0
          2144332937472371213  REMOVED      0     0     0  was /dev/hast/hast

If the state is UNAVAIL, you could try:

zpool clear "pool name"

It will scan and scrub the local disks.

nast1: ~ # zpool status mvda0
  pool: mvda0
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
  scan: scrub in progress since Mon Jun  2 15:26:25 2014
        1.19G scanned out of 1.43G at 28.3M/s, 0h0m to go
        0 repaired, 82.75% done

        NAME         STATE     READ WRITE CKSUM
        mvda0        ONLINE       0     0     0
          hast/hast  ONLINE       0     0     0

Then check pool again:
zpool status

nast1: ~ # zpool status
  pool: mvda0
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Mon Jun  2 15:27:17 2014

        NAME         STATE     READ WRITE CKSUM
        mvda0        ONLINE       0     0     0
          hast/hast  ONLINE       0     0     0

Recreate sync on disks or split brain:

On Master issue these commands:

hastctl role init disk1
hastctl create disk1
hastctl role primary disk1

On Slave issue these commands:

hastctl role init disk1
hastctl create disk1
hastctl role secondary disk1

If you lose sync because of a disk error or a network error, you can recreate the sync between the HAST disk(s).
Just recreate the roles and the nodes will start syncing the data (use the commands above). Be careful with the roles and the nodes, don't mix them up!
If you recreate the roles and the disks as above, the data on the primary is kept: the nodes only start synchronising the disk(s) from the primary to the secondary.

In a split-brain scenario you should decide which node has the newer data and issue the above commands accordingly. For example, if the secondary node has newer data than the primary, issue role primary on the second node and role secondary on the primary node, and vice versa.
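
Because mixing up the roles is the main risk here, it can help to print the command sequence for each node and read it through before typing anything. The sketch below only prints the commands already listed above; hast_role_cmds is a made-up helper name and disk1 is the resource name used in this post.

```sh
#!/bin/sh
# Print (not run) the hastctl commands that put a resource into a role.
# $1 = role (primary or secondary), $2 = HAST resource name, e.g. disk1
hast_role_cmds() {
    case "$1" in
        primary|secondary) ;;
        *) echo "usage: hast_role_cmds primary|secondary resource" >&2
           return 1 ;;
    esac
    printf 'hastctl role init %s\n' "$2"
    printf 'hastctl create %s\n' "$2"
    printf 'hastctl role %s %s\n' "$1" "$2"
}

# On the node that holds the newer data:
#   hast_role_cmds primary disk1
# On the other node:
#   hast_role_cmds secondary disk1
```

Run the printed commands by hand on the matching node once you are sure which side holds the data you want to keep.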