Configuring NMX Server Redundancy
Specifically, NMX redundancy sets up database replication from a Master NMX server database
onto a Backup NMX server database. The two servers maintain identical databases using SQL
replication mechanisms, but only the Master server runs all the processes that control head-end
devices. The Master and Backup servers exchange messages for arbitration purposes, and
when the active server fails, the Backup server takes over.
Replication can only be set up for NMX catalogs. There is no support for SMS (reports) catalog
replication. The SMS catalog is copied from the Master NMX to the Backup NMX during redundancy
setup, and no data is pushed to the Backup when SMS data changes. Instead, the Backup NMX pulls
the SMS data once a day at 6:00 AM using an automated SQL Server job (PullSMSData). You can see
this job on the Backup NMX in SQL Server Management Studio (SSMS) -> SQL Server Agent -> Jobs.
The job runs on the Backup NMX only while the Master NMX is active. If the Backup takes over
before 6:00 AM, all changes made to the SMS catalog on the Master NMX that day are lost.
Because this SMS data is not critical to the customer, this is not a major issue, and the system
is designed this way.
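To see which SMS changes a fail-over can miss, it helps to work out when the most recent pull would have run. The following is a minimal sketch, assuming the PullSMSData job ran on schedule at 6:00 AM; the function name last_sms_pull is hypothetical and not part of NMX.

```python
from datetime import datetime, time, timedelta

PULL_TIME = time(6, 0)  # from the text: the job runs daily at 6:00 AM


def last_sms_pull(failover: datetime) -> datetime:
    """Return the most recent scheduled pull at or before the fail-over.

    Hypothetical helper: it assumes the job actually ran on schedule.
    SMS changes made on the Master after this moment are not on the Backup.
    """
    todays_pull = datetime.combine(failover.date(), PULL_TIME)
    if failover >= todays_pull:
        return todays_pull
    return todays_pull - timedelta(days=1)
```

For a fail-over at 5:30 AM the helper returns 6:00 AM of the previous day, which is the last point at which the Backup copy of the SMS catalog was refreshed.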
Please note that when redundancy is set up and the Master NMX is active, the catalog name on the
Backup is NMX_REPLICATION. The SMS catalog name is the same on the Master and on the
Backup NMX.
1.2 How do the 2 NMX servers decide which is the active controller of the head
end equipment
NMX Master and Backup servers are peers that monitor each other through a constant
messaging mechanism (heartbeat messages). When one of the servers does not receive a
heartbeat message, that server first pings its own gateway to ensure that it is itself
connected to the network.
If the Backup NMX server is connected to the network and misses 3 consecutive heartbeat
messages from the Master server, it takes control and becomes the active server. This is
considered a fail-over. Note that in this case only the Backup NMX is monitoring the devices, and
there is no protection available if the active Backup NMX were to fail. As soon as you see this
happen, perform either the Master Recovery or the Set Backup as Master operation to put the
NMX system back in redundancy.
If the Master NMX server is connected to the network and misses 3 consecutive heartbeat
messages from the Backup server, it retains control. As soon as the Backup NMX comes back
up, it starts sending heartbeat (HB) messages to the Master. On detecting them, the Master NMX
server can put the Backup NMX in Standby mode without any user intervention on the Master.
Once the Backup server takes control, it renames and prepares the databases under the standard
database names the customer originally gave them on the Master. All connected client
applications time out and shut down until reconnected.
The two servers exchange HB messages at a fixed interval of 5 seconds.
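The arbitration rules above can be summarized in a small model. This is an illustrative sketch only, not NMX code: the class name, role labels, and return values are hypothetical, and the "isolated" outcome is an assumption, because the text does not say what a server does when its own gateway is unreachable.

```python
from dataclasses import dataclass

HEARTBEAT_INTERVAL_S = 5  # from the text: HB messages are exchanged every 5 seconds
MISSED_LIMIT = 3          # from the text: 3 consecutive missed heartbeats


@dataclass
class PeerMonitor:
    role: str        # "master" or "backup" (hypothetical labels)
    missed: int = 0  # consecutive intervals without a heartbeat

    def on_heartbeat(self) -> None:
        """A heartbeat arrived from the peer, so the miss counter resets."""
        self.missed = 0

    def on_interval_elapsed(self, gateway_reachable: bool) -> str:
        """Call once for every interval in which no heartbeat arrived."""
        self.missed += 1
        if not gateway_reachable:
            # Assumption: our own link is suspect, so do not judge the peer.
            return "isolated"
        if self.missed < MISSED_LIMIT:
            return "waiting"
        # The Backup takes over; the Master simply keeps control.
        return "take_over" if self.role == "backup" else "retain_control"
```

With these two constants, a Backup that stays connected would decide to take over roughly 3 x 5 = 15 seconds after the last heartbeat it received, ignoring the time spent pinging the gateway.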
• Determine in advance whether an SMS server is needed. If it is, add it before NMX
redundancy is set up to avoid duplicating the effort, since redundancy will have to be
removed if these databases are added later.
• Keep a backup copy of all the databases before setting up redundancy as a precaution.
• If the server has multiple NIC cards, ensure that the default gateway is defined for the
management NIC interface (to view the default gateway, run ipconfig from the command
prompt).
• Ensure that the Master and Backup NMX servers are configured with the same time zone.
The NMX redundancy wizard does not currently perform this validation.
• Ensure that the PC server name is less than 15 characters long (NetBIOS limit). Longer
names will cause NMX redundancy setup to fail.
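The server name requirement is easy to check before running the wizard. The sketch below is a hypothetical pre-check and not part of NMX; it applies the "less than 15 characters" rule exactly as stated above and uses only the Python standard library.

```python
import socket

NAME_LIMIT = 15  # from the text: the name must be less than 15 characters


def name_ok(name: str) -> bool:
    """Return True if the name is non-empty and shorter than the limit."""
    return 0 < len(name) < NAME_LIMIT


def local_name_ok() -> bool:
    """Check this machine's own host name (the part before any domain suffix)."""
    return name_ok(socket.gethostname().split(".")[0])
```

Running local_name_ok() on both the Master and the Backup before setup gives an early warning about a name that would make redundancy setup fail.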
The NMX redundancy wizard runs many validations on the Master and Backup NMX. The
validations that are carried out are listed here.
After NMX redundancy is set up, use the Auto-restart menu under the Server Fail Safe top-level
menu to set up both the Master and Backup for auto restart, and provide the appropriate login
and password information. Typically the Windows Administrator login is used for this purpose.
The password cannot be an empty string. If any user other than Administrator is specified, that
user must belong to the Administrators group on that machine.
After the wizard completes successfully, the active NMX transitions through various states
and turns green, and the Backup NMX turns yellow and stays in the Standby state. If the Backup
shows ‘Unreachable’ or ‘Stopped’, ping it from the Master to ensure that the Backup is reachable.
If the ping succeeds, try shutting down and restarting the Master.
The Virtual IPs setup screen allows the user to enter virtual IPs and select the physical MAC
addresses to which they are assigned. The user can select a virtual IP for every physical NIC
card on the PC.
The Main option on the VIP menu identifies the management NIC.
• Ensure that the virtual IP addresses belong to the same subnet as the selected Master and
Backup NICs and are not already assigned on the network. This check is now handled
by the validation wizard.
• When setting up on a network WITHOUT any DNS servers, make sure that the hosts file is
properly set up with the IP addresses and PC names of the peer machine; that is, the hosts
file on the Master should have entries for the Backup PC and vice versa. This check is now
handled by the validation wizard.
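The subnet rule for virtual IPs can be illustrated with Python's standard ipaddress module. This is a sketch of the rule as described above, not the validation wizard's actual logic. The function name and the addresses used in the example are hypothetical, and the sketch only rules out a clash with the two NIC addresses themselves; it cannot tell whether some other host on the network already uses the address.

```python
import ipaddress


def vip_is_valid(vip: str, master_nic: str, backup_nic: str) -> bool:
    """Check a candidate virtual IP against both NICs.

    master_nic and backup_nic are given as "address/prefix",
    for example "192.168.10.11/24" (example values only).
    """
    address = ipaddress.ip_address(vip)
    for nic in (master_nic, backup_nic):
        interface = ipaddress.ip_interface(nic)
        if address not in interface.network:
            return False  # the virtual IP is outside this NIC's subnet
        if address == interface.ip:
            return False  # the virtual IP collides with the NIC's own address
    return True
```

For example, vip_is_valid("192.168.10.50", "192.168.10.11/24", "192.168.10.12/24") accepts the address, while a candidate in 192.168.20.0/24 would be rejected.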
These virtual IP addresses are assigned on the active Domain after successful domain startup
and remain assigned until shutdown or a redundancy switch-over. Note, however, that
disabling and enabling the NIC card using Windows Network settings removes the virtual IP
assignment. The user should refrain from doing so on an active Domain.
NMX automatically checks various validation rules during redundancy setup. Once
redundancy is set up, the user can also run these checks manually by clicking the
Server Fail Safe -> Validate Backup Configuration option.
The spreadsheet below shows all the checks that NMX verifies for you during
redundancy setup or after setup.
The redundancy validator carries out various checks on the Master and Backup NMX servers. If it
finds any errors, it reports them and gives the user the option to fix them. It runs a
comprehensive list of checks during redundancy setup so that, once redundancy is set up, the
chances of replication failure are very low. The user is given the option to save the
redundancy validation results to a file.
Background Color – Indicates that the check can be fixed automatically by NMX
Strike through – Indicates that the check has been removed
The redundancy wizard shows the user a spreadsheet with all the validation
results. If a validation fails and cannot be fixed automatically, the user needs to
intervene and correct the error manually. After correcting the manual failures, the user can
revalidate or set up redundancy again. If there are failed validations that can be fixed
automatically, the wizard offers the user a fix option. Note that while any manual validation
failure is present, the auto fix button is disabled; automatic fix is enabled only after the
user corrects those errors.
Check Site ID matches on both servers - The site ID can be verified by inspecting the
HLORManager.ini file on the two machines that will participate in NMX redundancy. If
the IDs are different, the software corrects them during the setup process.
Check NMX software version on the two servers - The software version can be verified either
by looking under the Harmonic registry entry or by launching the Domain manager GUI and
checking the version in the about box.
Verify Network connectivity – Ensure that the Master can ping the Backup NMX by machine name,
and vice versa. If proper DNS servers are not configured, add hosts file entries to enable
pinging by machine name.
Machine name and SQL name - For NMX redundancy to work correctly, each of the two servers
must have distinct hostnames and distinct SQL server names. Validation wizard makes sure this
is true.
Embedded software - When NMX redundancy is set up, the system copies files.ini, which is
a listing of the embedded software present on the Master server, to the Backup server for its
use when fail-over happens. If the embedded software on the Master and Backup NMX does not
match, the validation wizard warns the user, and on domain startup it automatically
syncs the Master firmware onto the Backup NMX. After redundancy setup, whenever new
embedded software is manually added to the TFTP directory, it must be copied to both the Master
and Standby machines. Alternatively, use the Tools -> Transfer Software option from the Master
DM GUI.
Backup and Restore of catalogs - When NMX redundancy has been applied and the catalogs
backed up, the redundancy information will also be backed up, including virtual address settings.
Care must be taken to restore these catalogs only on systems that have identical settings for the
restored database to work correctly.
Connections to serial devices - When setting up NMX redundancy, the user must verify that the
serial device connections in the system can be controlled from either server. This can be
achieved either directly, if the serial device supports two or more connection ports, or by
using 3rd party port servers that simulate serial COM connections, like the DigiServer port.
Please refer to the corresponding manuals for using such devices. Failure to comply with this
requirement will result in switches being in timeout after server fail-over.
Outlook email account – The user should ensure that an identical Outlook email setup exists on
the Backup NMX PC, so that when the Backup server becomes active, NMX can continue
to send VOD statistics and alarm statistics reports.
Note: When you perform a new NMX installation, you must restart your computer, log in to
Domain Manager, and create a new catalog. Refer to the procedure shown below.
1. Make sure ECL license is configured on both Master and Backup NMX.
3. Configure the default gateway and ensure that the Backup and master servers can
access it.
4. Provide different computer names for the Master and Backup servers. If you have a
DNS server established, skip step 5.
5. Edit the hosts file in each computer so that both computers can access each other by
computer name and IP numeric address.
On the Master computer, add a line with the Backup machine's management IP
address and its name.
On the Backup computer, add a line to this file with the Master machine's
management IP address and its name.
6. Test the access (by pinging) by numeric IP address and access by computer name to
verify connections.
From the Command window, enter ping <computername>. If the system responds
with "unknown host," the computers are not set up correctly.
7. Open the Domain Manager and stop the domain manager on the Backup server if it is
running.
8. Back up all databases on the Master server. (NMX and SMS catalogs)
9. On the Master computer, click on Database -> Catalog Management to open the
Database Configuration.
10. Create all necessary database catalogs on the Master server, including the NMX and
SMS catalogs.
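Step 5 of the procedure above asks for a hosts file entry for the peer on each machine. The sketch below checks the text of a hosts file for such an entry. The function name, IP address, and machine names are hypothetical examples. On Windows the hosts file is normally found at C:\Windows\System32\drivers\etc\hosts.

```python
def hosts_has_entry(hosts_text: str, ip: str, name: str) -> bool:
    """Return True if hosts_text maps ip to name (names compared case-insensitively)."""
    for line in hosts_text.splitlines():
        # Drop comments, which start with '#' and run to the end of the line.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if fields[0] == ip and name.lower() in (f.lower() for f in fields[1:]):
            return True
    return False


# Example: the line expected on the Master for a Backup called NMX-BACKUP.
sample = "192.168.10.12   NMX-BACKUP   # Backup management NIC\n"
print(hosts_has_entry(sample, "192.168.10.12", "NMX-BACKUP"))
```

The same check, with the Master's management IP address and name, applies to the hosts file on the Backup.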
1. In the Domain Manager, select Server Fail Safe > Setup. The NMX Redundancy Setup
Wizard opens.
2. Read the first screen, then click Next and ensure that the system is compliant with the
suggested guidelines.
6. Optionally, click Save to file... to save the validation results to an *.xml file so you can
troubleshoot validation issues later.
7. Click Close to proceed to the third step of the Redundancy Setup Wizard.
The Define Virtual IPs dialogue box appears.
8. Click Add to define a unique virtual IP address. The virtual IP address is not mandatory, but is
needed if external clients are connected to the system and are designed to handle the
concept of server redundancy. The virtual IP address is assigned to active NMX NICs only.
When a failover occurs, NMX moves this virtual IP to the Backup server.
9. In the Virtual IP field, add an IP address. The virtual IP address must belong to the same
subnet and must not already be assigned on the network.
11. From the Master MAC Address drop-down list, select the MAC address of a NIC card on
the Master server. The virtual IP address is allocated when you start the domain, not during
redundancy setup.
12. From the Backup MAC Address drop-down list, select the MAC address of the
corresponding NIC card on the Backup server. If you want to add another virtual IP address
for a different NIC, click Add, and follow the same process for the Master and Backup.
13. Select the Main check box to show which virtual address corresponds to the management
NIC. Main signifies the management NIC of the NMX server.
14. Click OK. The next step of the wizard shows the process of establishing the replication
between the Master and Backup databases and updates the registry keys. If the setup is
successful, you will see a Redundancy Setup Completed Successfully message.
15. Click Finish to close the wizard. Once you have set up NMX redundancy, you will see three
columns in the Domain Manager window: Component, State on Master, and State on
Backup. The last two columns display the state of the component on the Master and
Backup PCs. The status of any firmware copying is displayed at the bottom of the Domain
Manager interface. During the copying process, the master server is accessible but the
Backup server is not available.
16. (Optional) Test the setup by forcing a switch, followed by recovering the Master server, to
verify that the redundancy setup is working correctly.
Once the firmware version sync is done, the virtual IP is set, and the Backup NMX then goes
into “Standby” mode.
After you restore the failed master server, you should immediately perform one of the following
operations:
Master Recovery: Recover the master server. Choose this option if the restored Master server is
better or faster than the Backup server. This operation, performed when the domain is not
running, copies the Backup database to the Master database and sets the Master to "Active" and
the Backup to "Standby."
Set the Backup server as the master server. Choose this role-swapping option if you want to set
the Backup server as the new Master server, and set the failed and restored master server as the
new Backup server. This function is available while the server is running.
All DSM GUI clients time out and shut down when a fail-over occurs. TCP clients
also detect a timeout, and after the fail-over the Backup NMX is available for reconnection.
Because the virtual IP address migrates to the server that is actually in control, the
GUI and the clients need to know only one IP address to connect to.
The Backup PC can be in one of 2 modes: ‘Backup Standby’ or ‘Backup Active’. When in ‘Backup
Active’ mode, two menu items, ‘Master Recovery’ and ‘Set Backup as Master’, are available to the
user.
When the Backup takes over after detecting a failure, it asserts two alarms: one to indicate that
the Master has failed and another to indicate that the Backup server has taken over successfully.
These alarms can be seen in the alarm viewer at the site level on the Backup server that is
active.
The user also has the option to switch to the Backup NMX manually by clicking the Server Fail
Safe -> Switch To Backup option. This option is enabled only when the server is running. The
manual switch follows exactly the same procedure as an automatic fail-over.
The Master Recovery operation runs the recovery process, which copies the Backup catalogs to the
Master NMX, sets the mode on the Master back to ‘Master Active’, and sets the Backup state to
‘Standby’.
The Set Backup as Master operation sets the active Backup as the new Master, while the failed
Master is set as the new Backup server. The role swap operation can be invoked even when the
server is running.
Note: The site ID is a unique number for each machine except in the case of NMX redundancy,
in which case both machines share the same ID.
4. Click OK.
When the operation completes, a message is displayed indicating that NMX redundancy was
removed successfully.
This option is available only when the server is not running. The operation removes all
redundancy definitions, setting all PCs back and releasing the Backup for normal use.
The user is prompted to enter a distinct site ID for the Backup server.
Redundancy can be removed when the Master is active or when the Backup is active.
If redundancy is removed when the Master is active, the catalog on the Master is up to date.
If redundancy is removed when the Backup is active, the catalog on the Backup is up to date.
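The conditions under which each fail-safe operation can be used are spread across the preceding sections, so a compact summary may help. The sketch below is one reading of those statements and not NMX behavior taken from code. The function name is hypothetical, "Remove Redundancy" is a placeholder label because the text does not give the menu item's name, and offering Switch To Backup only while the Backup is in standby is an assumption.

```python
def available_operations(backup_mode: str, server_running: bool) -> set:
    """Summarize which operations the text describes as usable.

    backup_mode is 'Backup Standby' or 'Backup Active' (the two modes named above).
    """
    ops = set()
    if backup_mode == "Backup Active":
        # The role swap is described as available even while the server runs.
        ops.add("Set Backup as Master")
        # Master Recovery is described as performed when the domain is not running.
        if not server_running:
            ops.add("Master Recovery")
    elif backup_mode == "Backup Standby" and server_running:
        # Assumption: a manual switch only makes sense while the Backup is standing by.
        ops.add("Switch To Backup")
    if not server_running:
        # Removal is allowed with either server active, but only when stopped.
        ops.add("Remove Redundancy")
    return ops
```

For example, with the Backup active and the server stopped, the sketch lists Master Recovery, Set Backup as Master, and Remove Redundancy.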
If there is a replication failed alarm, the manual “Switch To Backup” option pops up a dialog box
with a stale catalog warning and the last replication time of the catalog. If the user is willing
to proceed, clicking OK continues the switch.
After an automatic fail-over, if the domain manager detects a stale catalog on the Backup, the
domain will not come up. The user must start the domain manually; NMX then pops up a
dialog box with a stale catalog warning and the last replication time of the catalog. At this
point the user has two options. If the catalog on the Master is believed to be up to date, the
user can go back to the Master catalog, then remove redundancy manually and reconfigure it. If
the user knows that no changes were made to the catalog during that period, the user can proceed
with the stale catalog on the Backup domain.
A Backup NMX with a stale catalog can cause service outages, but the main difficulty is detecting
the stale catalog.
Our approach to detecting stale catalogs is for the Master domain to periodically (every 5
seconds) insert a timestamp (the current time) into a database table (Domain_ProductVersion).
The Backup NMX periodically (every 5 seconds) checks the timestamp in the Backup table and
compares it against the current time. If the two differ by more than 2 minutes, Backup database
replication is considered to be out of order. When this is detected, a “Replication Failed” alert
is flashed on the DM GUI.
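The comparison the Backup performs can be sketched in a few lines. This is an illustration of the rule described above, not the NMX implementation. The function name is hypothetical, and the sketch assumes that both timestamps come from clocks that agree with each other.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=2)  # from the text: a gap above 2 minutes means failure


def replication_failed(replicated_stamp: datetime, now: datetime) -> bool:
    """Return True if the replicated timestamp is more than 2 minutes from now.

    replicated_stamp is the value read from the Backup copy of the table;
    abs() treats a stamp that is ahead of the local clock the same way.
    """
    return abs(now - replicated_stamp) > MAX_GAP
```

Because the Master writes a new timestamp every 5 seconds, a healthy replica should normally show a stamp only a few seconds old, so a gap above 2 minutes leaves a wide margin before the alert is raised.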
2. If the Redundancy Wizard complains about NIC (Network Interface Card) Priority
• Change the NIC order by going to: Control Panel > Network and Internet > Network
Connections. Select Advanced > Advanced Settings. In the Connections box, use the
arrows to move the required NIC to top of the list.
3. Log Files
• To help you and Harmonic Support identify any NMX issues, the DomainManager.log
file is generated with any critical failures and saved to:
C:\ProgramData\Harmonic\NMX\SharedFiles
Send it to Harmonic Support for further debugging and investigation.
Mark the file name with Master or Backup so that support can identify which
log file belongs to which server.
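Labelling the log before sending it can be scripted. The following is a minimal sketch using only the Python standard library. The function name and the naming pattern (for example DomainManager_Master.log) are hypothetical choices, since the text only asks that the file name identify the server.

```python
import shutil
from pathlib import Path


def labelled_copy(log_path: Path, role: str, dest_dir: Path) -> Path:
    """Copy a log file into dest_dir with the server role added to its name."""
    if role not in ("Master", "Backup"):
        raise ValueError("role must be 'Master' or 'Backup'")
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Example result: DomainManager.log becomes DomainManager_Master.log
    target = dest_dir / f"{log_path.stem}_{role}{log_path.suffix}"
    shutil.copy2(log_path, target)  # copy2 also preserves the file's timestamps
    return target
```

On the Master, log_path would point to DomainManager.log in the folder named above, and the copy with the role in its name is the file to send to support.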