Implementing IBM VM
Recovery Manager
for IBM Power Systems
Dino Quintero
Jose Martin Abeleira
Adriano Almeida
Bernhard Buehler
Primitivo Cervantes
Stuart Cunliffe
Jes Kiran
Byron Martinez Martinez
Antony Steel
Oscar Humberto Torres
Stefan Velica
IBM Redbooks
October 2019
SG24-8426-00
Note: Before using this information and the product it supports, read the information in “Notices” on
page vii.
Notices  vii
Trademarks  viii
Preface  ix
Authors  ix
Now you can become a published author, too!  xi
Comments welcome  xii
Stay connected to IBM Redbooks  xii
Chapter 3. Planning and deploying IBM VM Recovery Manager High Availability for IBM Power Systems  265
3.1 VM Recovery Manager HA requirements  266
3.1.1 Software requirements  266
3.1.2 Firmware requirements  267
Notices
This information was developed for products and services offered in the US. This material might be available
from IBM in other languages. However, you may be required to own a copy of the product or product version in
that language in order to access it.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user’s responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not grant you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, MD-NC119, Armonk, NY 10504-1785, US
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you provide in any way it believes appropriate without
incurring any obligation to you.
The performance data and client examples cited are presented for illustrative purposes only. Actual
performance results may vary depending on specific configurations and operating conditions.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
Statements regarding IBM’s future direction or intent are subject to change or withdrawal without notice, and
represent goals and objectives only.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to actual people or business enterprises is entirely
coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are
provided “AS IS”, without warranty of any kind. IBM shall not be liable for any damages arising out of your use
of the sample programs.
The following terms are trademarks or registered trademarks of International Business Machines Corporation,
and might also be trademarks or registered trademarks in other countries.
AIX®, DB2®, GDPS®, IBM®, IBM Cloud™, IBM Spectrum®, Parallel Sysplex®, POWER®, POWER8®, POWER9™, PowerHA®, PowerVM®, Redbooks®, Redbooks (logo)®, Storwize®, System Storage™, and SystemMirror®.
The registered trademark Linux® is used pursuant to a sublicense from the Linux Foundation, the exclusive
licensee of Linus Torvalds, owner of the mark on a worldwide basis.
Windows and the Windows logo are trademarks of Microsoft Corporation in the United States, other
countries, or both.
Red Hat is a trademark or registered trademark of Red Hat, Inc. or its subsidiaries in the United States and
other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Other company, product, or service names may be trademarks or service marks of others.
This IBM® Redbooks® publication describes IBM VM Recovery Manager for Power Systems,
and addresses the complex high availability (HA) and disaster recovery (DR) requirements of
customers running IBM AIX® and Linux on IBM Power Systems servers. It aims to help
maximize systems availability and resource usage, and to provide technical documentation
that transfers the how-to skills to users and support teams.
The IBM VM Recovery Manager for Power Systems product is an easy-to-use and
economical HA and DR solution. Automation software, installation services, and
remote-based software support help you streamline the recovery process, increase
availability and recovery testing, and maintain a state-of-the-art HA and DR solution. Built-in
functions and IBM Support can decrease the need for expert-level skills, shorten your
recovery time objective (RTO), improve your recovery point objective (RPO), optimize
backups, and better manage growing data volumes.
This book examines the IBM VM Recovery Manager solution, tools, documentation, and other
resources that are available to help technical teams develop, implement, and support
business resilience solutions in IBM VM Recovery Manager for IBM Power Systems
environments.
Authors
This book was produced by a team of specialists from around the world working at IBM
Redbooks, Austin Center.
Jose Martin Abeleira is a Senior Systems and Storage Administrator at DGI Uruguay. He is
a former IBMer, a prior IBM Redbooks author, a Certified Consulting IT Specialist, and an IBM
Certified Systems Expert Enterprise Technical Support for AIX and Linux in Montevideo,
Uruguay. He worked with IBM for 8 years and has 15 years of AIX experience. He holds an
Information Systems degree from Universidad Ort Uruguay. His areas of expertise include
IBM Power Systems, AIX, UNIX, and Linux; Live Partition Mobility (LPM), IBM PowerHA®
SystemMirror®, storage area network (SAN), and Storage on IBM DS line; and V7000,
HITACHI HUSVM, and G200/G400.
Bernhard Buehler is an IT Specialist in Germany. He works for IBM Systems Lab Services in
Nice, France. He has worked at IBM for 37 years and has 28 years of experience in AIX and
the availability field. His areas of expertise include AIX, Linux, PowerHA SystemMirror, HA
architecture, script programming, and AIX security. He is a co-author of several IBM
Redbooks publications. He is also a co-author of several courses in the IBM AIX curriculum.
Stuart Cunliffe is a senior IBM Systems Consultant with IBM UK. He has worked for
IBM since graduating from Leeds Metropolitan University in 1995, and has held roles in
IBM Demonstration Group, IBM Global Technologies Services (GTS) System Outsourcing,
eBusiness hosting, and ITS. He currently works for IBM System Group Lab Services where
he specializes in IBM Power Systems, helping customers gain the most out of their Power
Systems infrastructure with solutions that offer PowerHA SystemMirror, PowerVM, IBM
PowerVC, AIX, Linux, IBM Cloud Private, IBM Cloud Automation Manager, and IBM DevOps.
Jes Kiran is a Development Architect for IBM VM Recovery Manager for HA and DR
products. He has worked in the IT industry for the last 18 years, and has experience in the
HA, DR, cloud, and virtualization areas. He is an expert in the Power, IBM Storage, and AIX
platforms.
Antony Steel is a Senior IT Specialist in Singapore. He has had over 25 years of field
experience in AIX, performance tuning, clustering, and HA. He worked for IBM for 19 years in
Australia and Singapore, and is now CTO for Systemethix in Singapore. He has co-authored
many IBM Redbooks publications about Logical Volume Manager (LVM) and PowerHA
SystemMirror, and helped prepare certification exams and runbooks for IBM Lab Services.
Stefan Velica is an IT Specialist who currently works for IBM GTS in Romania. He has 10
years of experience with IBM Power Systems. He is a Certified Specialist for IBM System p
Administration, HACMP for AIX, high-end and entry/midrange IBM DS Series, and Storage
Networking Solutions. His areas of expertise include IBM System Storage™, SAN, PowerVM,
AIX, and PowerHA SystemMirror. Stefan holds a bachelor’s degree in Electronics and
telecommunications engineering from the Polytechnic Institute of Bucharest.
Thanks to the following people for their contributions to this project:
Wade Wallace
IBM Redbooks, Austin Center
P I Ganesh, Denise Genty, Kam Lee, Luis Pizaña, Ravi Shankar, Thomas Weaver,
IBM Austin
Maria-Katharina Esser
IBM Germany
Luis Bolinches
IBM Finland
Kelvin Inegbenuda
IBM West Africa
Shawn Bodily
Clear Technologies, an IBM Business Partner
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our books to be as helpful as possible. Send us your comments about this book or
other IBM Redbooks publications in one of the following ways:
Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
Send your comments in an email to:
redbooks@us.ibm.com
Mail your comments to:
IBM Corporation, IBM Redbooks
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
Note: In most cases, when people talk about high availability (HA), they actually mean continuous availability (CA).
An active/active solution requires that the application is aware of the redundant components;
active/passive and active/inactive solutions do not have this requirement.
Internal-managed
In an internal-managed environment, the operating system or the application is aware that it
is part of a cluster. In most cases, HA components are installed that check whether the
cluster partners, hardware, and software components are reachable or available.
External-managed
In an external-managed environment, the operating system is not aware that it is part of a
cluster.
As shown in Figure 1-3, an update or upgrade of the underlying hardware or firmware (FW)
can be hidden from the user. The outage period is the time that it takes to move to the new
hardware and FW environment. In both cases, you can make the change on the backup
system while the application is still running on the production environment. From a planning
point of view, you need a second downtime to move back to the production environment.
Make the same hardware and FW changes on the production environment before you move
the application back to it.
In an internal-managed environment, the update of the operating system and application can
be hidden from the user because you have multiple independent installations. As described in
1.1.1, “General concepts” on page 3, the update can be made on the backup system while
the application is still running on the current environment, as shown in Figure 1-3.
Unplanned outages
As shown in Figure 1-4 on page 6, the unexpected outage of the hardware, operating system,
or application can in some cases be covered in a similar way for both architectures. However,
if data corruption occurs, the outage can be different.
Because the data in both cases is available only once, the effect of data corruption is the
same. This is the worst case scenario because it requires a restore of your data from your
backup media. This restoration can take a long time, and your recovery point might be old.
The outage time also can differ based on the cause of the outage. An outage of one of the
components is detected much faster than data corruption on one of the components. The
outage detection time does not differ much regarding the selected architecture.
Measuring availability
In many cases, people measure availability in percentage (%) numbers. You might find
comments that are related to the three nines or four nines of HA. Table 1-1 shows what these
percentage values mean in hours or minutes.
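As a quick illustration of how such figures are derived (the authoritative values are in Table 1-1), the allowed downtime per year for a given availability percentage can be computed with a one-line calculation, shown here for three nines (99.9%):

# echo "scale=2; (100 - 99.9) * 365 * 24 / 100" | bc
8.76

That is, a system with 99.9% availability can be down for roughly 8.76 hours per year.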
RPO defines how much data can be lost without a major business impact or, equivalently,
how far back in your backups you must go to find consistent data.
Depending on the selected solution and the used application, these items have different
values.
The following list includes but is not limited to the items that are represented in Figure 1-6:
People
The knowledge and experience of the system administrators managing the environment is
important to the stability and usability of the availability solution.
Data
An important aspect is that critical data must be redundant (by using a RAID 1, 5, or 6
setup) and backups must exist.
Hardware
The hardware must be able to handle the expected workload because a slow responding
system is as bad as a non-existing one.
Software
The software (application) must be able to automatically recover after a system crash.
Environment
The location of your data center is important, and it must not be too close to the coastline
or a river due to the high risk of flooding. Additionally, electrical power support must be
redundant.
The main purpose of DR is to have a defined and possibly automated procedure for recovery
from a major business impact, such as an outage of the whole data center as the result of an
earthquake, flooding, or storm.
RTO and RPO for DR are normally different from the RPO and RTO values for availability.
Note: RPO means how much data can be lost without suffering a major impact to the
business, or how far back in time you must go to get consistent data. The RPO for DR
is greater than the RPO for availability. The worst case is the time between the disaster
and when the last successful backup was completed.
Figure 1-7 on page 10 shows the RTO summary of the four major components:
Outage: How long until the outage is recognized.
Prepare to repair: How long until a decision is made about whether a DR situation must be declared.
Repair or fallover: How long until fallover to the backup system occurs, or how long until the original system is repaired or reinstalled.
Minimum service: How long until full service is available on the same or different hardware.
As for availability, a DR solution must also address the items that are described in “Other
considerations for high availability” on page 8.
A few publications list the recommended minimum distance as 5 km. However, this distance
is not sufficient when you consider major natural events such as earthquakes, hurricanes, or
volcanic eruptions. In such cases, even 100 km might not be sufficient.
Active partition mobility moves AIX, IBM i, and Linux LPARs that are running, including the
operating system and applications, from one system to another. The LPAR and the
applications running on that migrated LPAR do not need to be shut down.
Inactive partition mobility moves a powered-off AIX, IBM i, or Linux LPAR from one system to
another.
However, while partition mobility provides many benefits, it does not provide the following
functions:
Partition mobility does not provide automatic workload balancing.
Partition mobility does not provide a bridge to new functions. LPARs must be restarted and
possibly reinstalled to take advantage of new features.
When an LPAR is moved by using LPM, a profile is automatically created on the target server
that matches the profile on the source server. The partition’s memory is then copied
asynchronously from the source system to the target server, which creates a clone of a
running partition. Memory pages that changed on the partition (“dirty” pages) are recopied.
When a threshold is reached that indicates that enough memory pages were successfully
copied to the target server, the LPAR on that target server becomes active, and any remaining
memory pages are copied synchronously. The original source LPAR is then automatically
removed.
Because the Hardware Management Console (HMC) always migrates the last activated
profile, an inactive LPAR that has never been activated cannot be migrated. For inactive
partition mobility, you can either select the partition state that is defined in the hypervisor or
select the configuration data that is defined in the last activated profile on the source server.
There are many prerequisites that must be met before an LPAR can be classified as LPM
ready. You must verify that the source and destination systems are configured correctly so
that you can successfully migrate the mobile partition from the source system to the
destination system. You must verify the configuration of the source and destination servers,
the HMC, the Virtual I/O Server (VIOS) LPARs, the mobile partition, the virtual storage
configuration, and the virtual network configuration.
For more information about preparing for partition mobility, see IBM Knowledge Center.
In addition to the minimum required FW, HMC versions, and VIOS versions, the high-level
prerequisites for LPM include:
The LPAR to be moved can be in an active or inactive state.
The source and target systems must be in an active state.
The source and target systems VIOSs that provide the I/O for the LPAR must be active.
You can use the HMC migrlpar command or the HMC GUI to validate whether an LPAR is
LPM-capable without performing the actual migration. For more information, see IBM Knowledge Center.
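For example, a validation-only run from the HMC CLI looks similar to the following sketch, where the managed system and partition names are placeholders for your environment:

# Validate (option v), but do not perform, the migration of an LPAR.
migrlpar -o v -m SOURCE_SYSTEM -t TARGET_SYSTEM -p LPAR_NAME

If the validation succeeds, the same command with -o m performs the actual migration.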
Initiating LPM can be done by using the HMC GUI, the HMC command-line interface (CLI),
IBM Power Virtualization Center (IBM PowerVC), IBM Lab Services LPM Automation Toolkit,
and IBM VM Recovery Manager. However, there are a few restrictions:
The HMC GUI can initiate only a single LPM migration at a time.
The HMC CLI can initiate only a single LPM migration at a time, unless scripted.
IBM PowerVC can initiate only a single LPM migration at a time, unless frame evacuation
is used.
IBM PowerVC cannot migrate inactive LPARs.
You can confirm whether an IBM Power Systems server is licensed for LPM by going to the
HMC and clicking Resources → All Systems → System Name → Licensed Capabilities.
If the source server has a physical fault that requires fixing, you can use SRR to recover the
key LPARs in a more timely manner. Similarly, if the failed server takes a long time to restart,
SRR can reprovision the partition on another server faster than waiting for the failed server
to restart and then restarting the partition.
SRR (available starting with IBM POWER8® processor-based systems that use FW level 8.2.0 or
later and HMC version 8.2.0 or later) removes the need for a reserved storage device that
is assigned to each partition. As a best practice, use SRR.
When an LPAR is restarted by using SRR, a new profile is automatically created on the target
server that matches the profile on the source server. That new profile is then mapped to the
storage LUNs that were being used by the original partition (which is inactive). The new
profile on the target server is then activated, and the partition is active again. When the source
server becomes active, the old profile must be removed so that the partition is not
accidentally restarted on that server; this cleanup runs automatically. The automatic cleanup runs
without the force option, which means that if a failure occurs during the cleanup (for example,
RMC communication with the VIOS fails), the LPAR is left on the original source server and
its status is marked as Source Side Cleanup Failed.
As with LPM, many prerequisites must be met for SRR to work. In general, if LPM does not
work for an LPAR, SRR does not work either.
Other than the minimum required FW, HMC versions, and VIOS versions, the high-level SRR
prerequisites include:
The source system must be in a state of Initializing, Power Off, Powering Off, No
connection, Error, or Error - Dump in progress.
The source system's VIOSs that provide the I/O for the LPAR must be inactive.
The target system must be in an active state.
The target system's VIOSs that provide the I/O for the LPAR must be active.
The LPAR that will be restarted must be in an inactive state.
The logical memory block (LMB) size must be the same on the source and the target system.
The target system must have enough resources (processors and memory) to host the
partition.
The target system VIOSs must be able to provide the networks that are required for the
LPAR.
For an LPAR to perform a remote restart operation, a flag must be set for it by using the
HMC. You can view and set this flag dynamically by using the HMC.
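As an illustrative sketch (the managed system and partition names are placeholders for your environment), the flag can be inspected and set from the HMC CLI with the lssyscfg and chsyscfg commands:

# List the SRR capability flag for all partitions on a managed system.
lssyscfg -r lpar -m YOUR_SYSTEM -F name,simplified_remote_restart_capable

# Enable SRR for a specific partition.
chsyscfg -r lpar -m YOUR_SYSTEM -i "name=YOUR_LPAR,simplified_remote_restart_capable=1"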
You can confirm whether an IBM POWER® server is licensed for SRR by opening the HMC
and selecting Resources → All Systems → System Name → Licensed Capabilities, as
shown in Figure 1-9.
If PowerHA detects an event within the cluster, it automatically acts to ensure that the
resource group is placed on the most appropriate node in the cluster to ensure availability. A
correctly configured PowerHA cluster after setup requires no manual intervention to protect
against a single point of failure, such as failures of physical servers, nodes, applications,
adapters, cables, ports, network switches, and storage area network (SAN) switches. The
cluster can also be controlled manually if the resource groups must be balanced across the
cluster or moved for planned outages.
PowerHA Enterprise Edition also provides cross-site clustering where shared storage might
not be an option but SAN-based replication is available. In this environment, PowerHA uses
the remote copy facilities of the storage controllers to ensure that the nodes at each site have
access to the same data, but on different storage devices. It is possible to combine both local
and remote nodes within a PowerHA cluster to provide local HA and cross-site DR.
Ease of use
Both LPM and SRR are relatively simple to configure and use. Although this book does not
describe how to enable a VM to be LPM- and SRR-ready, you can do this task in a few simple
steps and verify your success online. Both LPM and SRR are included in IBM PowerVM
Enterprise Edition, so no extra software is required. You can start LPM by using the HMC GUI,
the HMC CLI, IBM PowerVC, the IBM Lab Services LPM Automation Toolkit, or IBM VM Recovery
Manager. You can start SRR by using the HMC CLI, REST APIs, IBM PowerVC, the IBM Lab
Services LPM and SRR Automation Toolkit, or IBM VM Recovery Manager.
IBM PowerHA is a more complex product to set up because it requires a license and you must
install the cluster code on all nodes in the cluster. It also requires an experienced
IBM PowerHA consultant to design and implement a successful cluster. After the cluster is
active, it must be administered by an experienced PowerHA administrator.
VM Recovery Manager for HA is simpler to configure than IBM PowerHA, but relies on LPM
and SRR being configured to function correctly.
The key differentiators in this section are availability and recovery times as shown in Table 1-2
on page 17. You can see how each product differs in terms of storage requirements,
automation, server status, and outages.
Starting with LPM, the source and target servers (and their VIOSs) must be running and
active. Although the VM that you want to move can be offline, the server it is hosted on must
be online. If these criteria are met, using LPM on an active VM does not result in any
downtime. If that VM crashes, LPM does not automatically restart the VM or migrate it to
another server; manual intervention is required. Therefore, for applications and VMs
where only a basic form of planned HA is required, LPM alone is sufficient. However, it is not
sufficient when automatic mobility or recovery from an offline source server is required.
As with LPM, SRR requires the destination server to see the same shared storage as the
source server to rebuild the VM. The key difference between SRR and LPM is that the source
server must be offline so that the rebuild is successful.
The VM Recovery Manager HA solution implements recovery of the VMs based on the VM
restart technology. The VM restart technology relies on an out-of-band monitoring and
management component that restarts the VMs on another server when the host infrastructure
fails. The VM restart technology is different from the conventional cluster-based technology
that deploys redundant hardware and software components for a near-real-time failover
operation when a component fails. The cluster-based HA solutions are commonly deployed to
protect critical workloads.
The VM Recovery Manager HA solution is ideal to ensure HA for many VMs. Additionally, the
VM Recovery Manager HA solution is easier to manage because it does not have clustering
complexities.
The VM Recovery Manager HA solution provides the capabilities that are described in the
following sections.
Unplanned HA management
During an unplanned outage, when the VM Recovery Manager HA solution detects a failure
in the environment, the VMs are restarted automatically on other hosts. You can also change
the auto-restart policy to advisory mode. In advisory mode, failed VMs are not relocated
automatically, but instead email or text messages are sent to the administrator. An
administrator can use the interfaces to manually restart the VMs.
Planned HA management
During a planned outage, when you plan to update FW for a host, you can use the LPM
operation of the VM Recovery Manager HA solution to vacate a host by moving all the VMs
on the host to the remaining hosts in the group. After the upgrade operation is complete, you
can use the VM Recovery Manager HA solution to restore the VM to its original host in a
single operation.
Advanced HA policies
The VM Recovery Manager HA solution provides advanced policies to define relationships
between VMs, such as collocation and anticollocation of VMs, the priority in which the VMs
are restarted, and the capacity of VMs during failover operations.
Introduction
Implementation of scripts for switching IP addresses when moving an IBM AIX VM between
sites can pose a challenge because AIX configures the IP address early in the startup
process. You can update the IP address after the AIX VM is running and operational at the
DR site, but that means the primary site IP address is active, even for a short period, at the
DR site. This situation can be problematic for network switches, custom applications that
depend on the network environment, and the application configuration.
One method to meet this challenge is by using the AIX ghostdev system flag. This flag is
primarily used for duplicating AIX images to multiple environments. With this flag, you can
make an AIX mksysb image of a VM and use that image to build more VMs on other systems.
When the AIX operating system comes up on this new VM, AIX detects that it is on another
system and automatically removes the host name and IP address configuration. In addition,
the flag also removes the volume group (VG) information and resets devices, such as host
bus adapters (HBAs), to their default values. Many customers customize device values for
performance reasons and do not want the default values. We address how to return these
values to their wanted settings.
Even if you do not use the ghostdev flag to make an AIX mksysb image, it is useful because
of its automatic system detection and its ability to prevent the primary site IP address from
being used at the DR site.
Note: The sample scripts in the VM Recovery Manager DR product and in this publication
are provided AS-IS and without any IBM Support Services.
Example environment
In our example, we use AIX 7.2.2 in our AIX VMs, but any supported version of AIX works.
The environment consists of two sites, “sitert11” and “sitert12”. The AIX VMs are configured
such that when the VMs are in sitert11, they have a different IP address and subnet than on
site sitert12. We demonstrate by using one AIX VM. The VM has a host name of rt11006
when the VM is on host rt11, and a host name of rt11006b when the VM is on host rt12.
The goal here is to make sure that the scripts configure the VM such that when it is on system
rt11, it has the appropriate IP and network configuration for that system. Likewise, when the
VM is moved to system rt12, it must have the appropriate IP and network configuration for
that system.
Figure 1-10 AIX VM before the VM Recovery Manager DR move to the DR site
Figure 1-11 AIX VM after the VM Recovery Manager DR move to the DR site
Without the ghostdev parameter, a copied AIX VM comes up with the previous host name, IP
address, and device configuration. Because those devices no longer exist on the copied VM,
they are in Defined state. We call these devices ghost devices, hence the name of the
ghostdev parameter.
You use the ghostdev setting because you want to prevent the primary site IP addresses from
becoming active, even for a short period, at the DR site. However, the clearing of the VG
information and the resetting of devices to their default values mean that you must use scripts
to return those values to their customized settings.
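As a minimal sketch of what such a restore step might look like (the device name and attribute values here are generic examples, not values from this publication), customized Fibre Channel adapter settings can be reapplied with chdev:

# Reapply customized FC adapter attributes; -P defers the change to the next boot.
chdev -l fscsi0 -a fc_err_recov=fast_fail -a dyntrk=yes -P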
The machine type and serial number of the host are obtained by running the AIX prtconf
command. The network parameters (for example, the IP address) can be acquired from your
network administrator.
Example 1-1 shows an example of using prtconf from the AIX VM to gather system model
and serial number information.
To change the ghostdev parameter and make it active on the system, change the value from 0
to 1 by running the following command:
chdev -l sys0 -a ghostdev=1
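You can verify the current value of the attribute before and after the change by running the standard lsattr command, which displays the attribute name and its current value (0 or 1):

# lsattr -El sys0 -a ghostdev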
Note: If the DR scripts are successfully configured, the AIX VM restarts after the initial
configuration is performed by the AIX DR scripts.
a. Run prtconf to verify the model and serial number of the current system:
root:rt11006a: / >
# prtconf | grep -iE "System Model|Machine Serial"
System Model: IBM,8286-42A
Machine Serial Number: 2100E5W
b. Run hostname to verify the VM host name:
root:rt11006a: / >
# hostname
rt11006a
c. Run netstat to verify the VM's network routing table:
root:rt11006a: / >
# netstat -rn
Routing tables
Destination Gateway Flags Refs Use If Exp Groups
Introduction
The implementation of scripts for switching IP addresses when moving a VM between sites is
specific to Linux implementations because Linux typically uses the systemd processes for
starting up resources. Using systemd for starting Linux system processes has many
advantages due to its ability to start processes in parallel. When trying to implement a
process that depends on other processes, you must understand how and where to insert your
process.
We do not describe in detail what systemd does or how it is configured in this book, but we
provide enough information to understand our scripts, which change the IP addresses
depending on the site where the VM is running.
Systemd consists of multiple components, one of which is the unit. Systemd manages units
and can enable and disable them. Units are described by unit files, which are contained in the
/etc/systemd/system directory. There are many types of units that systemd manages; here
we focus on the .service unit, which describes how to run and manage an application or
script on the system. In our case, this unit updates the IP address of the VM.
Note: For purposes of brevity, error checking is not included in the scripts in this chapter,
but it is a best practice to include error checking in your scripts to determine whether the
commands ran successfully.
In addition, the sample scripts in the IBM VM Recovery Manager DR product and in this
publication are provided as is and without any IBM Support services.
The serial number of the host is obtained from the file /proc/device-tree/system-id and can
be displayed on a Linux CLI by running cat /proc/device-tree/system-id. The network
parameters (for example, IP address, and so on) can be acquired from your network
administrator.
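For example, the file contains a string similar to the following illustrative output, where the field after the comma is the serial number portion that our scripts extract:

# cat /proc/device-tree/system-id
IBM,022100E5W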
Example environment
In our example hardware and software setup, we use Red Hat Enterprise Linux (RHEL)
server release 7.6 (Maipo). The environment consists of two sites, sitert11 and sitert12.
The Linux VMs are configured such that when the VMs are in sitert11, they have a different IP
address and subnet than on site sitert12. We demonstrate that setup by using one Linux VM.
The VM has a host name of rt11009 when the VM is on host rt11, and it has a host name of
rt11009b when the VM is on host rt12.
The goal here is to make sure that the scripts configure the VM such that when it is on system
rt11, it has the appropriate IP and network configuration for that system. Likewise, when
the VM is moved to system rt12, it must have the appropriate IP and network configuration for
that system. For more information, see Figure 1-12 and Figure 1-13 on page 32.
Figure 1-12 Linux VM before the VM Recovery Manager DR move to the DR site
We then create copies of system files by using the serial number as part of the name to
identify which network file applies to which host. In our case, we use serial number identifiers
“020607585” for host rt11 and “022100E5W” for host rt12.
In total, we have all of the files that are shown in Table 1-6 and Table 1-7.
Creating the script to copy the appropriate customized files to the system files
Create the script that copies the appropriate customized files to the system files. This script is
named /usr/sbin/vmr_dr_ip_conf.sh. This file uses the Bash shell, and its full content is as
follows:
#!/bin/bash
#
# Determine the serial number of the current host: the system ID has the
# form "IBM,<serial>", so keep the field after the comma.
SERIAL=$(/usr/bin/cat /proc/device-tree/system-id | /usr/bin/awk -F, '{print $2}')
# Copy the per-host network configuration and host name files into place.
cp -p /etc/sysconfig/network-scripts/$SERIAL.ifcfg-net0 /etc/sysconfig/network-scripts/ifcfg-net0
cp -p /etc/$SERIAL.hostname /etc/hostname
This script copies each of the customized system files to the appropriate standard system file,
depending on the host (identified by serial number) on which the VM is operational.
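Assuming that the script is saved as /usr/sbin/vmr_dr_ip_conf.sh, it must be made executable so that systemd can run it:

# chmod 755 /usr/sbin/vmr_dr_ip_conf.sh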
The systemd unit file that runs this script has the following content (the [Unit] and [Service]
section headers, which a valid unit file requires, are included; Type=oneshot is our assumption,
consistent with RemainAfterExit=yes for a run-once script):

[Unit]
Before=network-pre.target
Wants=network-pre.target
DefaultDependencies=no
Requires=local-fs.target
After=local-fs.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/vmr_dr_ip_conf.sh
RemainAfterExit=yes

[Install]
WantedBy=network.target
By using this file, the vmr_dr_ip_conf service runs before the network services start, and it
customizes the system files with the appropriate parameters for the host on which it is
running.
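Assuming that the unit file is saved as /etc/systemd/system/vmr_dr_ip_conf.service (a path that we infer from the service name), the service is registered and enabled with the standard systemd commands:

# systemctl daemon-reload
# systemctl enable vmr_dr_ip_conf.service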
Verifying that the IP address scripts change the IP addresses during the
DR event
Now that the DR scripts are configured and the vmr_dr_ip_conf systemd service is set up and
operational, the environment is ready to be tested by moving the Linux VM to the DR site.
Note: When a specific concept, term, or abbreviation is used for the first time, it is
emphasized in italics and the surrounding statement can be considered as its definition.
2.1.1 Overview
VM Recovery Manager HA can be deployed across an existing PowerVM virtualized
infrastructure that consists of Hardware Management Consoles (HMCs), managed systems,
Virtual I/O Servers (VIOSs), and virtual machines (VMs). Figure 2-1 presents a generic
architecture overview diagram for the VM Recovery Manager HA solution. The term host is
used from now on for the physical POWER managed systems. Multiple POWER hosts that are
in the same data center and share common network connectivity and storage access
configuration can be placed together in a host group so that if one host has an issue, the
affected VMs are relocated to neighboring healthy hosts in the group.
The core component of the solution is the KSYS controller, which is software that runs on a
dedicated AIX logical partition (LPAR). The KSYS controller acts as an orchestrator by
continuously monitoring the managed infrastructure. When a failure is detected, it makes VM
relocation decisions and acts by restarting VMs on different hosts. Configurable user policies
are accounted for during relocation. The limit of 24 VIOSs results in up to 12 managed
systems per host group for typical dual-VIOS configurations. Up to four host groups are
supported by a KSYS controller in large environments. If needed, you can also deploy
multiple independent KSYS controller instances, each responsible for its separate group of
managed systems.
Host monitoring
Managed hosts are monitored continuously for failures. When the KSYS controller concludes
that a host failure happened, all the VMs previously running on the failed host are
automatically restarted on neighboring healthy hosts. The decision that a host failure
happened is made by assessing the health state of the VIOSs running on the host. Both
VIOSs must be declared unhealthy based on specific logic to conclude that a host failure
occurred.
For more information about the specific logic, see “Host relocation decision” on page 95.
For more information about the decision process, see “VM relocation decision” on page 96.
Application monitoring
An application management framework, a component of the installed VM Agent, monitors the
application by using a user-provided monitoring script. This local framework can restart the
application by using user-provided stop and start scripts up to a user-defined number of times.
If this repeated local application restart does not solve the problem, the situation is considered
a permanent application failure, and the KSYS controller can take over.
The KSYS controller monitors all applications that are configured as critical by the user. When
such a critical application faces a permanent application failure, the KSYS controller restarts
the hosting VM. KSYS first restarts the VM on the same host so that the VM Agent can regain
control for a new sequence of local application restart cycles. If the permanent application
failure state is reached again, the local VM restarts and the inner application restart cycle is
repeated for a limited pre-configured number of times. When the pre-configured restart limit
count is reached, then the VM is restarted on a different host.
For more information about the application management framework and critical application
management topics, see 3.7.2, “Application Management Engine” on page 312.
For more information about the HA policies, see 3.6, “Setting up HA policies” on page 297.
For more information, see Chapter 4, “IBM VM Recovery Manager High Availability GUI
deployment” on page 323.
Figure 2-2 shows that from a high-level installation and deployment perspective, the product
consists of three primary components:
KSYS controller
VM Agent
GUI server
KSYS controller
The KSYS controller acts as an overall orchestrator for the environment. You can refer to the
KSYS controller as the KSYS subsystem or KSYS. The registered infrastructure is monitored,
and if a failure is detected, the KSYS analyzes the situation, notifies the administrator, and
can automatically relocate the affected VMs to other healthy hosts. The KSYS interacts with
the registered HMCs, hosts, and VIOSs to collect configuration and status information about
their managed resources.
The KSYS controller runs on an AIX LPAR and can be separated from the managed
infrastructure to avoid simultaneous KSYS and managed infrastructure outage. The KSYS
remains operational even when parts of the managed infrastructure fail. Ensure that the AIX
instance running the KSYS subsystem is monitored by local monitoring and incident
management solutions. You can periodically check the KSYS subsystem health either by
using the CLI or the VM Recovery Manager HA GUI dashboard.
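For example, because the KSYS subsystem runs under the AIX System Resource Controller as the IBM.VMR resource manager (see Example 2-15 later in this chapter), a quick check that the daemon is active can be as simple as the following command (the PID shown is from our environment):

# lssrc -s IBM.VMR
Subsystem         Group            PID          Status
 IBM.VMR          rsct_rm          16646518     active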
GUI server
The administrator can operate the tool either directly on the KSYS controller, by using CLI
commands in terminal sessions, or through a web browser by using the GUI server.
GUI-based access offers a user-friendly management alternative to the CLI approach, and it
can act as a single point of control for multiple KSYS instances within the environment. The
GUI server topic is covered in Chapter 4, “IBM VM Recovery Manager High Availability GUI
deployment” on page 323.
After this preliminary host group setup is complete, run a simple host group discovery
command. The host group discovery scans and collects the required input and performs all
the required settings. Allow the discovery to complete and do not manually intervene while it
is running. Access the HMC and VIOS only if errors are detected after the discovery process.
Consider an initial discovery that successfully completed for a host group. New components
show up or are activated, as shown in Figure 2-3 on page 44, in addition to those in 2.2.1,
“Simplified component model” on page 42. We introduce these new components as follows:
Host monitor
Shared Storage Pool cluster and Health Status Database
Virtual switch
VM monitor and Application Monitoring Framework
Host monitor
The KSYS accesses the HMCs and activates the host monitor (HM) daemon on each VIOS.
The HM code is included as part of the VIOS V3.1 enhancements, so any VIOS 3.1 or later
installation has it. To see whether any specific update must be applied, see IBM Knowledge
Center. The HM daemon is activated on a VIOS when the user performs the first KSYS
discovery operation for the host group containing the host where the VIOS is. The HM is
ready to communicate with the KSYS controller through the HMC and VIOS.
A key role of the HM is to monitor the heartbeats coming from the VMs running on the host
and to update the KSYS subsystem. The HM also mediates communication between the
VM Agent and KSYS.
A shared file system space and database are created on the shared disk for SSP
management. KSYS creates its own schema inside this SSP management database. Health
status information about VMs, HMs, and VIOSs of all hosts on the host group is then saved
and kept inside that schema. This database is called the Health Status Database (HSDB).
When the KSYS controller retrieves a piece of data by querying the HSDB, it uses a single
VIOS in the host group and does not need to interrogate each VIOS separately. KSYS polls
the HSDB every 20 seconds for health status updates. The retrieved updates are empty
under normal operating conditions. Only when something is not healthy is information
returned to KSYS.
This is the bare minimum HA management that you obtain if you initialize only KSYS,
configure the HMCs, and create the host group. KSYS initiates health management by using
the deployed SSP cluster and HSDB, and it is ready to detect any subsequent host failure.
This host monitoring is independent of any VM running on the host itself. Additionally, you
can install the VM Agent. It is optional, but if you install and configure this VM Agent on your
VMs, you get extra benefits, as explained in “Virtual switch” and “VM monitor and Application
Monitoring Framework” on page 47.
Virtual switch
During a host group discovery, if at least one VM on a member host is enabled for monitoring,
then KSYS creates a virtual switch in each host hypervisor. One trunk Virtual Ethernet
adapter is created on each VIOS and connected to the new virtual switch. The trunk adapters
have different virtual LAN (VLAN) IDs (101 on one VIOS and 102 on the other VIOS). During
discovery, two standard Virtual Ethernet adapters are also created on each VM that is
enabled for monitoring. One Virtual Ethernet adapter has VLAN ID 101 and the other Virtual
Ethernet adapter has VLAN ID 102. The virtual switch is configured in Virtual Ethernet Port
Aggregator (VEPA) mode to isolate the VMs from each other. Each VM can communicate
only with the trunk ports on the VIOS side and cannot reach the other VMs through this VEPA
mode switch.
Figure 2-4 shows a virtual switch and related Virtual Ethernet adapters on a host with three
VMs, all enabled for monitoring at the KSYS level.
By using this protocol, you get a complete communication path between the KSYS controller
on one side and each VM Agent with its monitored applications on the other side.
Redundancy is provided by an intrinsic design that is based on double components at the
HMC and VIOS levels. Security is ensured by the virtual switch VEPA mode, which isolates
the VM private interfaces.
For more information about the VM Agent structure and functions, see 2.2.8, “VM Agent
architecture” on page 164.
Assuming that the KSYS cluster is successfully installed and initialized, then its state can be
Online, as shown in Example 2-1.
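For reference, the cluster state can be queried at any time with the ksysmgr query command; the following abridged output is illustrative, with the cluster name taken from our test environment:

# ksysmgr query ksyscluster
Name:   rbRMHA
State:  Online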
Immediately after registering an HMC, you can see the universally unique identifier (UUID)
for each of its managed systems, as shown in Example 2-2. For dual-HMC configurations,
managed systems appear once under each managing HMC.
Name: rthmc3
Ip: 9.3.18.159
Login: hscroot
Example 2-3 shows the VIOSs running on the hosts that we added in our test KSYS cluster.
Name: rt13-8286-42A-21E0B2V
UUID: 4462c37c-65c6-3614-b02a-aa09d752c2ee
FspIp: Must run discovery first to populate
Host_group: No host_group defined
VIOS: rt13v2
rt13v1
HMCs: rthmc3
HA_monitor: enable
VM_failure_detection_speed: normal
When a host has more than two VIOSs, keep only the two VIOSs that provide access to the
shared storage resources in the KSYS configuration. The other VIOSs must be explicitly
unregistered from the KSYS cluster instance, as suggested in Example 2-4.
The host group of our two hosts is created, as shown in Example 2-5.
When we installed and started the VM Agent on the VMs, we decided to keep them as
managed in our configuration. Then, we performed the initial host group discovery and
verification operations. Example 2-6 shows the output of the initial discovery. The SSP was
created, but there is nothing about the virtual switches and trunk adapters.
Example 2-8 Trunk adapter creation at host group discovery after enabling monitoring
# ksysmgr modify system ha_monitor=enable
KSYS ha_monitor has been updated
# ksysmgr modify vm
rt13001,rt13002,rt13005,rt13007,rt14001,rt14002,rt14005,rt14007 ha_monitor=enable
For VM rt13001 attribute(s) 'ha_monitor' was successfully modified.
...
For VM rt14007 attribute(s) 'ha_monitor' was successfully modified.
# ksysmgr discover host_group rbHG verify=yes
Running discovery on Host_group rbHG, this may take few minutes...
Creating HA trunk adapter for VIOS rt14v2
Finished creating HA trunk adapter for VIOS rt14v2
Creating HA trunk adapter for VIOS rt13v2
Finished creating HA trunk adapter for VIOS rt13v2
Creating HA trunk adapter for VIOS rt14v1
Finished creating HA trunk adapter for VIOS rt14v1
Creating HA trunk adapter for VIOS rt13v1
Finished creating HA trunk adapter for VIOS rt13v1
Preparing VIOS in rt13-8286-42A-21E0B2V for HA management
VIOS in rt13-8286-42A-21E0B2V prepared for HA management
Preparing VIOS in rt14-8286-42A-SN2100DEW for HA management
VIOS in rt14-8286-42A-SN2100DEW prepared for HA management
Discovery has started for VM rt13001
Configuration information retrieval started for VM rt13001
...
Configuration information retrieval completed for VM rt14007
Discovery for VM rt14007 is complete
Discovery has finished for rbHG
8 out of 8 managed VMs have been successfully discovered
Example 2-9 shows the vSwitch and Virtual Ethernet adapter connectivity details of an AIX
VM and its underlying VIOSs after the new discovery operation.
ent0
Port VLAN ID: 1
ent1
Port VLAN ID: 101
Switch ID: rbRMHA_VSWITCH
ent2
Port VLAN ID: 102
Switch ID: rbRMHA_VSWITCH
#
The VEPA mode of the switch ensures complete isolation between connected VM Virtual
Ethernet adapters. They have connectivity only with the VIOS trunk adapter.
RMC is a subsystem of RSCT that is a generalized framework for managing, monitoring, and
manipulating system resources. By system resources, we mean various physical or logical
entities that are represented in an abstracted way in the RMC framework. Any resource
abstraction, simply called resource, has attributes that describe configuration and state
characteristics of the modeled entity and actions to model behavior and changes, like in usual
object-oriented approaches. Resources with similar characteristics are instances of
predefined resource classes.
You can configure ctrmc daemons on multiple systems to communicate with each other in a
cluster configuration. With such a setup, you can manage and monitor resources of all
systems (nodes) in the cluster. There are two cluster setups that are available: peer domain
clusters and management domain clusters.
A peer domain is a set of nodes that have a consistent knowledge of the existence of each
other and of the resources that are shared among them. In a peer domain setup, RMC
activates and uses core RSCT subsystems such as Topology Services, Group Services, and
cluster security services subsystems.
A management domain is defined as a set of nodes whose resources can be managed and
monitored from one of the nodes, which is designated as the management control point
(MCP). All other nodes are considered to be managed nodes. Topology Services and Group
Services are not used in a management domain.
The registry component provides data persistency and consistency services. Persistent data
is stored in local registry files. In cluster configurations, data consistency is ensured by
maintaining the local registry replicas in sync on all cluster nodes.
An RMC client application can call API subroutines directly or can use command utilities,
which are RMC clients themselves. A resource manager implements its own functions so that
the RMC daemon can pass requests from client applications.
A resource manager daemon can create and initialize its own specific in-memory constructs,
mainly Resource Control Points (RCPs) and Resource Class Control Points (RCCPs) to
handle its managed resources and resource classes in the registry. RCPs and RCCPs are
instantiated objects of a class type that are initialized by a constructor and expose their
functions by implemented methods. A resource manager itself might have a Resource
Manager Control Point (RMCP) object.
Resource managers use the RSCT multi-page tracing capability for logging purposes. Code
inside resource managers can send trace records (relevant to an internal functional context,
or grouped by some other criteria) into a common logical trace file that is referred to by a
base name. Multiple distinct base names can be used. The tracing facility writes these
records into a set of physical trace files that have the same base name and suffixes in the
form of .number.sp. These physical files are referred to as trace page files and are written in a
typical rotating manner.
When the last page file that is associated with a logical trace file is full, the trace records wrap
around and are written to the first page file. The fixed number of these trace files limits the
history information that is available for debugging purposes. If needed, you can modify the
number and size of the page files. Additionally, you can activate the trace spooling feature to
copy the completed files to another directory on the system where more space is available.
Trace spooling can be turned on or off. Manual or automated (crontab-based or similar)
cleanup maintenance is needed to avoid file-system-full situations for the spooling repository.
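As a minimal sketch of such automated cleanup (the spool directory and the retention period are assumptions for illustration only, not values from this publication), a crontab entry can delete spooled page files older than seven days:

# Run daily at 02:00; remove spooled trace page files older than 7 days.
0 2 * * * /usr/bin/find /var/ct/trace_spool -name '*.sp' -mtime +7 -exec rm -f {} \;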
Example 2-10 on page 53 shows the local AIX resource that represents the HMC MCP.
This management domain setup supports HMC and LPAR operations, such as graceful
shutdown, DLPAR, LPM, and others. In a dual-HMC configuration, there are two
management domains.
For more information about the RMC framework, see the IBM Knowledge Center and A
Practical Guide for Resource Monitoring and Control (RMC), SG24-6615. These resources
cover the default AIX resource managers and associated resources and classes that you
have seen so far in the example outputs of this section.
Let us now examine how the RMC framework is used by VM Recovery Manager HA to
implement its KSYS controller function.
The KSYS daemon is implemented in the form of the IBM.VMR resource manager. Figure 2-6 on
page 57 shows the KSYS controller components that are interconnected within the RMC
framework:
The ksysmgr utility
IBM.VMR resource manager
Resource and Class Registry
The RMC resource manager IBM.VMR implements its log files by using the common RSCT
trace facility. Example 2-15 shows the default trace setup for the IBM.VMR resource manager.
Example 2-15 Default tracing setup for the IBM.VMR resource manager
# lssrc -ls IBM.VMR|grep -p trace
Information about trace levels:
...
_VMR Errors=255 Info=1 LPAR=4 Verify=4 KREST=5 KRI=4
FDE=4 FDELong=4 CONFLICT=4 MSG=4 Debug=0
/var/ct/3EmzyyOpynuvLXYdEkdZSO/log/mc/IBM.VMR/trace.1.sp -> spooling not enabled
/var/ct/rbRMHA/log/mc/IBM.VMR/trace.conflict.1.sp -> spooling not enabled
/var/ct/rbRMHA/log/mc/IBM.VMR/trace.fde.1.sp -> spooling not enabled
/var/ct/rbRMHA/log/mc/IBM.VMR/trace.fdelong.1.sp -> spooling not enabled
/var/ct/rbRMHA/log/mc/IBM.VMR/trace.krest.1.sp -> spooling not enabled
There are nine distinct logical trace files, each with a specific base name and a
preconfigured number of physical page trace files. The product documentation mentions five
of them (user, ksys, fde, krest, and krestlong), which are used for problem determination.
The user and ksys files both contain information about the functions that are run for each
operation, but ksys provides more details. When investigating a problem, start by checking
the user log file and, if needed, go to ksys. Similarly, trace.fdelong.*.sp and
trace.fde.*.sp are the detailed and the abridged trace files for the Failure Detection Engine
(FDE), which is a core functional module that has its own dedicated trace files.
The KSYS daemon communicates with HMCs and VIOSs in the infrastructure by using a
wrapper library for the HMC Representational State Transfer (REST) API that is called
libkrest. Inputs, outputs, and errors of libkrest calls are traced by dedicated high-level
(trace.krest.*.sp) and detailed trace (trace.krestlong.*.sp) files.
Typical trace file records have as their third field the pthread ID (pTID) of the thread logging
the record. Example 2-16 shows sample records taken from the trace.user.*.sp files (after
these are formatted by the rpttr -odict /var/ct/rbRMHA/log/mc/IBM.VMR/trace.user*
command).
We also analyzed the ksys trace files looking for the most active threads during the daemon
startup. The result of this analysis also influenced the selection of the threads that are kept as
relevant for the procstack output, as shown in Example 2-17 on page 60. The analysis is
detailed in 2.2.5, “KSYS daemon restart exercise” on page 62. The field T(90a) of the last
record in Example 2-16 on page 59 means that the record itself is written by the thread with
pTID 2314, as shown in Example 2-17 on page 60, with the 90a value in parentheses being
the hexadecimal representation of decimal 2314.
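A quick way to cross-check such hexadecimal and decimal pTID values at the shell prompt is
the printf utility, as in this short sketch:
# printf "%d\n" 0x90a
2314
# printf "%x\n" 2314
90a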
Example 2-18 shows the kernel thread ID (TID) and the user space pTID decimal value as
listed by the procstack command for some of the threads that are shown in Example 2-17 on
page 60. In the last column, we added the matching thread pTID hexadecimal value.
A thread can log records in any logical trace file. The main thread (pTID 1), for example,
writes at the daemon startup a time stamp message with the daemon process ID (PID) in
each trace file. The FDE thread (having the distinctive FDEthread() function invoked) is
writing with predilection in the trace.fde.*.sp files, as shown in Example 2-19, but it can also
log records in other trace files.
Again, the number in the third field, T(9598), repeated in each line, is simply the
hexadecimal value of the decimal pTID 38296, which you saw as the thread identifier in
the procstack output of Example 2-17 on page 60. The rest of the record is self-explanatory:
date, time, source file plus the line of the code with the instruction throwing the message, and
the descriptive message itself.
Note: In a peer domain configuration with multiple nodes, a registry replica is kept
consistently on each node by the RMC framework. The KSYS subsystem supports only a
one-node peer domain, but the framework is prepared for multiple-node extensions.
We expect that a restart of the KSYS daemon ends with a process memory context and
thread layout similar to what was there before restart.
It is not that difficult to compare thread stack snapshots that are taken before and after restart
as a first confirmation. Example 2-22 presents the thread stack snapshot that is taken after a
restart.
We see that the same layout that is shown in Example 2-17 on page 60 is created after
restart, with 142 threads, among which are one event handler (starting
eventHandler()), one FDE thread (starting FDEthread()), one scheduler thread (starting
rsct_rmf3v::RMSchedule::run()), and the pool of 128 task handlers (starting
threadHandler() and TaskQueue::getTaskFromQueue()), as expected.
Example 2-23 Collecting the content of the ksys and fde trace files after the daemon restarts
# lssrc -a|grep VMR
IBM.VMR rsct_rm 16646518 active
# ksysmgr trace log=fde > fde.log
# cat fde.log|sed '/Trace Started - Pid = 16646518/,$!d' > post-restart_fde.log
# head -1 post-restart_fde.log
[06] 11/19/18 T( 1) ____ 13:48:35.140684 ******************* Trace Started - Pid
= 16646518 **********************
# tail -1 post-restart_fde.log
[06] 11/19/18 T(9192) _VMR 13:53:37.658738 DEBUG FDEthread.C[709]: Sleep for 20
sec. sleepCounter 13
# ksysmgr trace log=ksys > ksys.log
# cat ksys.log|sed '/Trace Started - Pid = 16646518/,$!d' > post-restart_ksys.log
# head -1 post-restart_ksys.log
[15] 11/19/18 T( 1) ____ 13:48:35.161478 ******************* Trace Started - Pid
= 16646518 **********************
# tail -1 post-restart_ksys.log
[15] 11/19/18 T(9192) _VMR 14:01:23.787096 DEBUG VMR_HMC.C[6938]: getQuickQuery
[315CE56B-9BA0-46A1-B4BC-DA6108574E7E] JobStatus: COMPLETED_OK, ReturnCode: 0
#
The ksysmgr trace log=fde and ksysmgr trace log=ksys commands in Example 2-23 are
the KSYS equivalents of the RSCT commands rpttr -odict
/var/ct/rbRMHA/log/mc/IBM.VMR/trace.fde* and rpttr -odict
/var/ct/rbRMHA/log/mc/IBM.VMR/trace.ksys*. Running the sed command deletes the rows
before the distinctive record signaling daemon restart with the new daemon PID (pattern
Trace Started - Pid = 16646518). From the time stamps in the first and last row of the
truncated log files, we see that the traces are from the first minutes after restart (12 minutes
for ksys).
Example 2-24 Thread writing frequency in the ksys trace file after restart
# cat post-restart_ksys.log|wc -l
1203
# grep "11/19/18 T(" post-restart_ksys.log|grep "T( 1)"|wc -l
5
# for t in `grep "11/19/18 T(" post-restart_ksys.log|grep -v "T( 1)"|while read
f1 f2 f3 others; do echo $f3; done|sort -u`; do echo "$t \c"; grep "11/19/18 T("
post-restart_ksys.log|grep -v "T( 1)"|while read f1 f2 f3 others; do echo $f3;
done|grep -c $t; done|sort -nrk2
T(304) 626
T(9192) 66
T(90a) 66
T(4041) 56
T(8788) 50
T(8687) 50
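The same per-thread tally can be produced more compactly with a single awk invocation. This
is only a sketch that assumes the same post-restart_ksys.log file and the record layout
shown earlier (date in the second field, pTID in the third field); main thread records are
skipped because their T( 1) identifier contains a space and does not match the pattern:
# awk '$2 == "11/19/18" && $3 ~ /^T\([0-9a-f]+\)$/ { cnt[$3]++ }
END { for (t in cnt) print t, cnt[t] }' post-restart_ksys.log | sort -nrk2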
Table 2-1 shows decimal pTID values in the second column. The last column shows the
distinctive call that is started in the thread’s stack.
Table 2-1 Writing frequency of threads showing up in the ksys trace file after daemon restart
Log pTID (hex)   pTID   Trace writes   Distinctive thread stack call taken from procstack output
T( 1)            1      5              main()
Example 2-25 RCP and RCCP traces in the ksys trace file after the daemon restart
# grep -i rcp post-restart_ksys.log|wc -l
404
# grep -i rccp post-restart_ksys.log|wc -l
112
# grep -i rccp post-restart_ksys.log|head -4
[15] 11/19/18 T(304) _VMR 13:48:38.197869 VMR_SITERccp::VMR_SITERccp Entered
[15] 11/19/18 T(304) _VMR 13:48:38.198199 VMR_SITERccp::VMR_SITERccp Leaving
[15] 11/19/18 T(304) _VMR 13:48:38.198395 VMR_HMCRccp::VMR_HMCRccp Entered
[15] 11/19/18 T(304) _VMR 13:48:38.198452 VMR_HMCRccp::createRcp Entered
#
The occurrences of the first matching string, VMR_SITERccp, suggest that a constructor of a
VMR_SITERccp class ran. Searching for other VMR_SITERccp occurrences produces the output
in Example 2-26.
In Example 2-27, we then filter all output that is logged by the VMR_HMCRccp::VMR_HMCRccp
constructor, which corresponds to the next RCCP match in Example 2-25 on page 66.
Two HMC RCPs are created inside the HMC RCCP constructor, one for each HMC in the
infrastructure.
The Entered and Leaving strings appear frequently in our previous examples as start and end
markers for class constructors and methods. So, we use this technique to delimit the log
sections of each RCCP constructor. Example 2-28 shows the results of this technique for the
Rccp Entered and Rccp Leaving patterns.
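A minimal sketch of this delimitation technique on the same formatted log file follows; the
egrep command lists the numbered marker lines, and the sed command prints each delimited
section:
# egrep -n "Rccp Entered|Rccp Leaving" post-restart_ksys.log
# sed -n '/Rccp Entered/,/Rccp Leaving/p' post-restart_ksys.log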
You can easily associate the RCCPs that are shown in Example 2-28 on page 67 with their
counterpart resource classes, which are identified in Example 2-14 on page 58.
With the last four sed commands, we check that no threads other than T(304) log in the line
range 13 - 660. Thread T(304) logs 602 records, as counted by the grep -c "T(304)"
command. The remaining 46 lines prove to be 45 white space lines plus one line consisting of
only the “Added VM's List =” string. So, only T(304) logs all of these entries in the
sequence. From the row number of each start and end marker, we see that the RCCP
constructors run one after the other and log their messages in subsequent compact sections.
The largest RCCP constructor section is that of the LPAR RCCP,
VMR_LPARRccp::VMR_LPARRccp, spanning lines 90 - 549, because one LPAR RCP is created for
each of the 16 existing VMs, as shown in Example 2-29.
The CEC RCCP constructor generated records in lines 38 - 88. Example 2-30 on page 69
lists only the first 10 lines in the range.
Notice the message about the creation of the pool with 128 threads. An RCP is created for
each host that is registered as managed by KSYS.
The next log section in the sequence is about the host group RCCP constructor.
Example 2-31 shows the complete details of this section.
The VMR_HGRccp::createRcp method is entered once, and inside this method the
VMR_HGRcp::VMR_HGRcp RCP constructor is also entered only one time. We defined only the
rbHG host group in our example, so these log entries apply to this unique host group.
Among the various configuration and state attributes being initialized, we notice that an FDE
thread is referred to inside the VMR_HGRccp::createRcp call (setting its spawnFDE to 1). We
also notice that an EventProcessor thread is created (evtThID is 37009), which, by pTID, is
exactly our earlier assumed event handler thread, that is, the one with the distinctive
eventHandler() function launched in its stack. We derive from these findings that a dedicated
FDE thread and a dedicated event handler thread are created for each existing host group.
The last relevant action that we notice in the HGRccp creation log is the scheduling of a
resyncApps action (ResyncAPPs Scheduler added), which is supposed to run once, with a
delay parameter of 10000. The value appears to be in milliseconds because 10 seconds
later the scheduler thread, T(203), calls the HG_ResyncShed::callback function followed by
the VMR_HGRcp::resyncAppInfo method. These calls are logged at 13:49:36.234278 (see ksys
log lines 761 - 762 in Example 2-42 on page 80), whereas in Example 2-31 on page 69,
DSchedCb::schedOnce is called at 13:49:26.231643, which is a 10-second difference.
The next RCCP section in the sequence is for the SSP, and it is shown in Example 2-32.
Again, only one SSP RCP is created because we have only one host group that is defined, so
there is only one SSP.
The last RCCP section in the sequence is for the managed applications (Example 2-33).
Only one application RCP is created because we have only one application that is defined
(Example 2-34).
Example 2-35 Scheduler records in the ksys trace file after daemon restart
# sed '/VMR_APPRccp::VMR_APPRccp Leaving/,$!d;=' post-restart_ksys.log |sed
'N;s/\n/:/'|head -25
660:[15] 11/19/18 T(304) _VMR 13:49:26.232224 VMR_APPRccp::VMR_APPRccp Leaving
661:
662:[15] 11/19/18 T(304) _VMR 13:49:26.232225 DEBUG VMR_APP.C[274]: Leaving VMR_APPRccp
663:[15] 11/19/18 T(304) _VMR 13:49:26.232226 VMR_HMCRccp::schedule Entered.
664:[15] 11/19/18 T(304) _VMR 13:49:26.232228 DSchedCb::schedPeriodic starting HMC Ping
Interval=10
665:[15] 11/19/18 T(304) _VMR 13:49:26.232230 VMR_HMCRccp::schedule Leaving, pCycle=10.
666:
667:[15] 11/19/18 T(304) _VMR 13:49:26.232231 DEBUG VMRRmcp.C[241]: VMRRmcp Init, START
After finishing with the last RCCP, as part of a VMRRmcp Init piece of code, a periodic
QuickCheck action is scheduled at a 1-hour interval and a DetailCollectOnce action is
scheduled at an epoch time. Converting the epoch time 1542693600 to a human-readable format,
we see that it is the next midnight moment (local CST time; Greenwich mean time: Tuesday
20 Nov 2018 6:00:00 AM) relative to the execution moment (11/19/18 T(304) _VMR
13:49:26.232288). The scheduling is done at the VMR_SITERccp method level, so we conclude
that it applies to all host groups in the environment. This is the scheduling setup for the
periodic Quick Discovery and Deep Discovery tasks. For more information, see “Quick
Discovery and Deep Discovery scheduling records” on page 72.
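The epoch conversion itself can be reproduced with a perl one-liner (perl is shipped with
AIX); the output matches the Greenwich mean time value cited above:
# perl -le 'print scalar gmtime(1542693600)'
Tue Nov 20 06:00:00 2018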
We also see from the last command in Example 2-35 on page 72 that the last record that is
generated by the T(304) thread in our ksys log is about a global initialization of the krest
library module in line 682 (Calling kri_global_init.). So after line 682, we expect records
that are logged by the other threads: the FDE thread, the event handler thread, task threads
in the pool, the scheduler thread, and so on.
Review the restart moment and the initial records that are logged in the ksys log up until the
first RCCP is created (Example 2-36).
You can easily count the first five entries that are posted by the main thread (T( 1)), which
matches the number of trace records left by the main thread in our whole ksys log excerpt, as
shown in Table 2-1 on page 65.
The main thread then hands off the execution control to the created thread T(304). This
created thread logs six initial records, and then handles the whole sequence of RCCPs and
RCPs initialization, as shown in Example 2-28 on page 67. We counted 602 records that are
signed by T(304). The number of records that are signed by T(304) in Example 2-35 on
page 72 after the return from the last RCCP constructor is 18 (19 - 1). Adding up all these
occurrences, we get exactly the number for T(304) that is shown in Table 2-1 on page 65
(6+602+18=626).
Example 2-38 Initial records in the fde trace file after daemon restart
# sed '1,$!d;=' post-restart_fde.log|sed 'N;s/\n/:/'|head -25
1:[06] 11/19/18 T( 1) ____ 13:48:35.140684 ******************* Trace Started -
Pid = 16646518 **********************
2:[06] 11/19/18 T(9192) _VMR 13:49:26.243690 DEBUG FDEthread.C[69]: Entering
3:[06] 11/19/18 T(9192) _VMR 13:49:26.265358 DEBUG FDEthread.C[127]: Monitoring
Enabled for HG rbHG.
4:[06] 11/19/18 T(9192) _VMR 13:49:26.265375 DEBUG FDEthread.C[145]: CEC is
4462c37c-65c6-3614-b02a-aa09d752c2ee
5:[06] 11/19/18 T(9192) _VMR 13:49:26.265383 DEBUG FDEthread.C[158]: VIOS
315CE56B-9BA0-46A1-B4BC-DA6108574E7E in CEC
6:[06] 11/19/18 T(9192) _VMR 13:49:26.265391 DEBUG FDEthread.C[208]: Use VIOS
rt13v2: 315CE56B-9BA0-46A1-B4BC-DA6108574E7E for polling
7:[06] 11/19/18 T(9192) _VMR 13:49:26.265394 DEBUG FDEthread.C[285]: Current scan
[ 1 ]
8:[06] 11/19/18 T(9192) _VMR 13:49:26.265412 DEBUG VMR_VIOS.C[6134]:
setCAAtopology
9:[06] 11/19/18 T(9192) _VMR 13:49:26.265424 DEBUG VMR_VIOS.C[6134]:
setCAAtopology
10:[06] 11/19/18 T(9192) _VMR 13:49:26.265454 DEBUG VMR_VIOS.C[6134]:
setCAAtopology
11:[06] 11/19/18 T(9192) _VMR 13:49:26.265463 DEBUG VMR_VIOS.C[6134]:
setCAAtopology
12:[06] 11/19/18 T(9192) _VMR 13:49:26.265471 DEBUG VMR_HG.C[11664]: FDE
performing doQuickQuery to 315CE56B-9BA0-46A1-B4BC-DA6108574E7E
13:[06] 11/19/18 T(9192) _VMR 13:49:26.265475 DEBUG VMR_retry.C[1151]: Doing
operation with opCode: 33(VMDR_QUICK_QUERY)
14:[06] 11/19/18 T(9192) _VMR 13:49:26.265495 DEBUG VMR_retry.C[178]: INFO: Trying
with HMC: rthmc3.
15:[06] 11/19/18 T(9192) _VMR 13:49:26.517072 DEBUG VMR_HMC.C[6905]:
getQuickQuery: Calling kriSubmitQuickQuery!. HMC:9.3.18.159, viosUuid:
315CE56B-9BA0-46A1-B4BC-DA6108574E7E
16:[06] 11/19/18 T(9192) _VMR 13:49:26.657903 DEBUG VMR_HMC.C[6925]:
getQuickQuery: Job submitted. Now doing WaitTillJobCompletion() ..
17:[06] 11/19/18 T(9192) _VMR 13:49:26.657908 DEBUG VMR_HMC.C[3426]: Calling
krigetJobResult(). HMC: 9.3.18.159, jobid: 1540961034191, retCnt = 1
18:[06] 11/19/18 T(9192) _VMR 13:49:27.776247 DEBUG VMR_HMC.C[3426]: Calling
krigetJobResult(). HMC: 9.3.18.159, jobid: 1540961034191, retCnt = 2
19:[06] 11/19/18 T(9192) _VMR 13:49:27.865264 DEBUG VMR_HMC.C[6938]: getQuickQuery
[315CE56B-9BA0-46A1-B4BC-DA6108574E7E] JobStatus: COMPLETED_OK, ReturnCode: 0
20:[06] 11/19/18 T(9192) _VMR 13:49:27.865286 DEBUG VMR_retry.C[345]: In doRetry
function, for opCode = 33(VMDR_QUICK_QUERY), rc = 0, retCode is 0, errstr is:
,retry flag is 22
21:[06] 11/19/18 T(9192) _VMR 13:49:27.865296 DEBUG VMR_HG.C[11671]: FDE
doQuickQuery success to 315CE56B-9BA0-46A1-B4BC-DA6108574E7E
22:[06] 11/19/18 T(9192) _VMR 13:49:27.865298 DEBUG VMR_VIOS.C[6134]:
setCAAtopology
23:[06] 11/19/18 T(9192) _VMR 13:49:27.865303 DEBUG needAttn.C[1566]: START
handleVIOResponse scan [ 1 ].
Comparing the moment when the FDE thread entered its run (13:49:26.243690) with the
moment the event handler thread entered (13:49:26.242487 in Example 2-37 on page 74),
we see that the FDE thread started 1.2 ms later. The FDE thread logs at its start details about
the VIOS that it uses for polling and then initiates its first scan, which is a QuickQuery. The
steps of the QuickQuery are as follows:
1. This first FDE polling scan starts with a doQuickQuery call, which is logged in the
fde trace file at 13:49:26.265471; the attempt through the rthmc3 HMC is logged at
13:49:26.265495.
2. So, the FDE thread logs its first message in the ksys trace file, which is the record about
the empty HMC session found, at 13:49:26.265532 (Example 2-37 on page 74).
3. After the new HMC session is opened at 13:49:26.265536 (Example 2-37 on page 74), the
QuickQuery request is submitted to a VIOS at 13:49:26.517072 (Example 2-38 on
page 76).
4. The response is obtained by job polling, and a record about its completion is logged in the
ksys trace file at 13:49:27.865262.
5. The first FDE doQuickQuery call logs its return at 13:49:27.865296, 1.6 seconds after its
invocation.
The whole sequence of HMC and VIOS low-level steps for this first QuickQuery request after
the daemon restart as they are logged in the krest and krestlog trace files is described in
“QuickQuery asynchronous job example” on page 105. Also, a complete analysis of a generic
QuickQuery request flow, including the way its response payload content is parsed and
further processed by the KSYS side, is available in “QuickQuery flow” on page 234.
After the first QuickQuery occurrence at line 705 of our ksys log file and the subsequent two
lines that are shown in Example 2-37 on page 74 (706 - 707), we continue with the next
section of 25 rows from the ksys log, as shown in Example 2-39. All these lines are ksys
records that are logged by the FDE thread about VIOS- and CEC-related initialization actions
performed while parsing and processing this first QuickQuery response payload.
Example 2-39 FDE thread traces in the ksys log after the first QuickQuery
# sed '708,$!d;=' post-restart_ksys.log|sed 'N;s/\n/:/'|head -25
708:[15] 11/19/18 T(9192) _VMR 13:49:27.865476 DEBUG VMR_VIOS.C[4158]:
709:
710: CEC IS UP, READY FOR RELOCATION
711:
712:
713:[15] 11/19/18 T(9192) _VMR 13:49:27.866167 DEBUG VMR_VIOS.C[4172]: Adding
cleanLpars task for the cec 4462c37c-65c6-3614-b02a-aa09d752c2ee
714:[15] 11/19/18 T(9192) _VMR 13:49:27.866172 DEBUG VMR_LPAR.C[11257]: adding
task for clean_lpars
715:[15] 11/19/18 T(9192) _VMR 13:49:27.866180 DEBUG VMR_LPAR.C[11260]: Add task
is done for CLEAN_LPARS
716:[15] 11/19/18 T(9192) _VMR 13:49:27.866198 DEBUG VMR_SITE.C[6261]: INFO:
eventNotify entering. event:VIOS_CAA_STATE_UP_DETECTED, event type:3, comp:VIOS,
notificationLevel:low, dupEventProcess:yes
717:[15] 11/19/18 T(9192) _VMR 13:49:27.866540 DEBUG VMR_SITE.C[7096]: ERROR:
catopen Failed
We can easily identify the following actions, which are performed for each VIOS separately
(starting from line 708 in Example 2-37 on page 74, then going through Example 2-39 on
page 77 and Example 2-40, and finishing at line 758 in Example 2-42 on page 80):
- The RelocInitiated flag is set to 0, so the underlying CEC enters the READY FOR
RELOCATION state.
- The FDE thread hands off a clean_lpars task to one thread in the pool of task threads.
- The eventNotify service is launched for the VIOS_CAA_STATE_UP_DETECTED and
VIOS_HM_RESP_DETECTED events.
Example 2-40 FDE thread traces in the ksys log after the first QuickQuery (continued)
# sed '733,$!d;=' post-restart_ksys.log|sed 'N;s/\n/:/'|head -25
733:[15] 11/19/18 T(9192) _VMR 13:49:27.893914 DEBUG VMR_VIOS.C[4156]: set the
RelocInitiated value for VIOS 166C89EF-57D8-4E75-815C-AA5B885560B1 as 0
734:
735:[15] 11/19/18 T(9192) _VMR 13:49:27.893914 DEBUG VMR_VIOS.C[4158]:
736:
737: CEC IS UP, READY FOR RELOCATION
738:
739:
740:[15] 11/19/18 T(9192) _VMR 13:49:27.894557 DEBUG VMR_VIOS.C[4172]: Adding
cleanLpars task for the cec 6e2f37f5-1824-347a-99fd-488f818086dd
741:[15] 11/19/18 T(9192) _VMR 13:49:27.894562 DEBUG VMR_LPAR.C[11257]: adding
task for clean_lpars
All threads handling the CLEAN_LPARS tasks are listed in Example 2-41. You can easily check
in Table 2-1 on page 65 that they are all from the task thread pool.
Because our test environment has two hosts, each with two VIOSs, we are done with these
side actions that were triggered by the first successful FDE QuickQuery. Checking after the
moment FDE logs its last line, 758, as shown in Example 2-42 on page 80, we see a long
interval of 8 seconds during which nothing happens. Then, a new thread, T(203), wakes up at
13:49:36.234247 and starts logging resyncAppInfo records continuously.
The T(203) scheduler thread initiates the VMR_HGRcp::resyncAppInfo method call under
HG_ResyncShed::callback, as shown in Example 2-42. Then, a complex ResyncAPPs action is
performed by multiple threads in parallel.
T(203) is the thread that we assume to be the scheduler thread because of its distinctive
rsct_rmf3v::RMSchedule::run() invoked function, which shows up in the procstack command
output in Example 2-22 on page 62. All 30 messages that this thread logs in our ksys trace
file excerpt, as counted in the T(203) row of Table 2-1 on page 65, appear contiguously in
Example 2-42 on page 80 and Example 2-43. The sed command at the end of Example 2-43
confirms this fact. The range of 30 lines is 759 - 788, and there is no other occurrence of a
message from this T(203) scheduler thread in the ksys trace file excerpt.
Table 2-2 lists the resyncAPP task threads in the first column as they show up in the log (see
Example 2-43 on page 81) and the assigned managed VM for each thread separately in the
second column.
T(4041) rt13007
T(3d3e) rt13002
T(90a) rt13001
T(8889) rt14005
T(8788) rt14002
T(8687) rt14001
T(8586) rt14007
The first records of each thread are already presented in Example 2-43 on page 81. Similar
sequences of records are logged by each thread, and the records are interlaced as the
threads run in parallel. The last record of each sequence is shown in Example 2-52 on
page 88. We further examine in detail the sequence of the first resyncAPP task thread
T(4041), which handles the rt13007 VM.
In Example 2-45, we select the records of the T(4041) thread that is present in Example 2-43
on page 81 and also include the particular record showing up there as logged by the T(405)
thread because it is also related to the rt13007 VM.
These first records of the resyncAPP flow for the rt13007 VM show the required VM state
changes to start the actual flow. A high-order state bit is set so that the VM HASTATE goes
from 0x4 to 0x20000004. The state update appears to be persisted in the RMC registry by the
T(405) thread, as suggested by the name of the chgResourceCommited function call. We
already saw in Table 2-1 on page 65 that the distinctive stack call for the T(405) thread is
rsct_gscl_V1::GSController::dispatch(), so we rather associate the T(405) thread with
core RSCT-related activity. The initial 0x4 state value matches the HAState attribute value
of the IBM.VMR_LPAR RMC resource for the rt13007 VM (Example 2-46). We also checked at
that moment the HA_monitor_state VM attribute at the ksysmgr level, which was set to the
STARTED value.
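As a side note, the same persistent attribute can be queried directly with the generic RSCT
lsrsrc command. The following line is a sketch only; it assumes that the Name persistent
attribute of the IBM.VMR_LPAR resource holds the VM name:
# lsrsrc -s 'Name == "rt13007"' IBM.VMR_LPAR Name HAState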
Continuing with the resyncAPP flow of rt13007 VM, we then see that a lock is acquired for the
VM. Then, the resyncAllAppsInfo function is entered and a dbell:0x2 code appears that is
associated with the rt13007 VM. The dbell identifier is a short name for Door Bell. A Door
Bell is a short notification coming from the VM Agent toward the KSYS to signal that some
status or configuration changes happened at the application level. KSYS reacts to the
notification by initiating a communication session where all relevant application details are
requested and obtained from the VM Agent. The dbell concept is covered together with the
Application Reporting (AR) Messaging protocol in 2.3.3, “Application Reporting flow” on
page 262.
Let us examine the rest of the records in the sequence that are logged by the T(4041) thread
(Example 2-47 and Example 2-48 on page 85).
A messaging flow is initiated by KSYS exactly as in the case where a dbell:0x2 notification
was received from the VM Agent. The retrieved result in this particular case, “No data from
XML”, means that no application is configured on the rt13007 VM.
Example 2-49 lists one record for each of the task threads that resulted from the previous
addTask invocations. Each thread handles the resyncapp action for its assigned VM. All
threads are from the task thread pool, which can be checked by pTID in Table 2-1 on page 65.
We expect all threads to follow the same flow of execution, but we are more interested in the
result of the thread T(90a), which deals with the VM rt13001, as we know it is the only VM
that has a configured application (Example 2-50).
Again, we do not get into many details of the resyncAPP flow of VM rt13001 because this topic
is covered in 2.3.3, “Application Reporting flow” on page 262. Example 2-51 shows the result
for the application running on the VM rt13001.
As in the ksysmgr output of Example 2-50 on page 87, we see in the log records of
Example 2-51 on page 87 that the resyncAPP retrieved result contains details about only one
application that was discovered, which is AppUuid:1541473063965437000 with a GREEN status.
Now that we clarified the resyncAPP flow, we identify the latest logged line of all resyncAPP
tasks and continue from there. Example 2-52 shows the last line of each thread separately so
that it is easy to identify that the latest line is 1169.
We also must identify the lines in the range 789 - 1169 that are logged by threads other than
those handling the resyncAPP tasks (Example 2-53).
Example 2-53 Records in resyncAPP flows not logged by resyncAPP task threads
# sed '789,1169!d;=' post-restart_ksys.log|sed 'N;s/\n/:/'|egrep -v
"T\(4041\)|T\(8586\)|T\(8687\)|T\(8788\)|T\(8889\)|T\(90a\)|T\(3d3e\)"
803:[15] 11/19/18 T(405) _VMR 13:49:36.238105 DEBUG VMR_LPAR.C[9124]: In
chgResourceCommited for HASTATE of VM rt13007 to 0x20000004.
818:[15] 11/19/18 T(405) _VMR 13:49:36.244447 DEBUG VMR_LPAR.C[9124]: In
chgResourceCommited for HASTATE of VM rt14007 to 0x20000004.
835:[15] 11/19/18 T(405) _VMR 13:49:36.284252 DEBUG VMR_LPAR.C[9124]: In
chgResourceCommited for HASTATE of VM rt14001 to 0x20000004.
852:[15] 11/19/18 T(405) _VMR 13:49:36.313110 DEBUG VMR_LPAR.C[9124]: In
chgResourceCommited for HASTATE of VM rt14002 to 0x20000004.
869:[15] 11/19/18 T(405) _VMR 13:49:36.377773 DEBUG VMR_LPAR.C[9124]: In
chgResourceCommited for HASTATE of VM rt14005 to 0x20000004.
886:[15] 11/19/18 T(405) _VMR 13:49:36.428750 DEBUG VMR_LPAR.C[9124]: In
chgResourceCommited for HASTATE of VM rt13001 to 0x20000004.
902:[15] 11/19/18 T(405) _VMR 13:49:36.455869 DEBUG VMR_LPAR.C[9124]: In
chgResourceCommited for HASTATE of VM rt13002 to 0x20000004.
978:[15] 11/19/18 T(405) _VMR 13:49:42.061246 DEBUG VMR_LPAR.C[9124]: In
chgResourceCommited for HASTATE of VM rt13007 to 0x4.
1082:[15] 11/19/18 T(405) _VMR 13:49:43.346419 DEBUG VMR_LPAR.C[9124]: In
chgResourceCommited for HASTATE of VM rt13001 to 0x4.
1095:[15] 11/19/18 T(809) _VMR 13:49:43.374562 DEBUG VMR_LPAR.C[9124]: In
chgResourceCommited for HASTATE of VM rt13002 to 0x4.
1108:[15] 11/19/18 T(809) _VMR 13:49:43.437520 DEBUG VMR_LPAR.C[9124]: In
chgResourceCommited for HASTATE of VM rt14007 to 0x4.
1122:[15] 11/19/18 T(809) _VMR 13:49:43.539527 DEBUG VMR_LPAR.C[9124]: In
chgResourceCommited for HASTATE of VM rt14005 to 0x4.
1153:[15] 11/19/18 T(405) _VMR 13:49:48.561214 DEBUG VMR_LPAR.C[9124]: In
chgResourceCommited for HASTATE of VM rt14001 to 0x4.
1167:[15] 11/19/18 T(809) _VMR 13:49:48.702620 DEBUG VMR_LPAR.C[9124]: In
chgResourceCommited for HASTATE of VM rt14002 to 0x4.
#
So, only T(405) and T(809) inserted some records in this range (among the bulk resyncAPP
related records). All of the records are about updating the HASTATE persistent attribute, as
shown in Example 2-45 on page 83.
The hourly QuickDiscovery and the daily DeepDiscovery actions happen as described in
“Quick Discovery and Deep Discovery scheduling records” on page 72. The DeepDiscovery
start moment of the day can be manually changed by running ksysmgr, as detailed in
“Discovery and verification” on page 92.
We further detail the role of each grouping module in the following sections.
During this initial discovery of a host group, the KSYS subsystem deploys a special
monitoring setup for the involved VMs and hosts. It creates an SSP cluster that is based on
the input that is specified in the preliminary configuration steps. The real SSP cluster
(Example 2-7 on page 50) of a host group is represented in the KSYS configuration by an
associated IBM.VMR_SSP resource (Example 2-56).
Subsequent discoveries can be started by the user or by the KSYS scheduler. You normally run
a discovery command each time a change is made in your environment. By default, the KSYS
scheduler starts a so-called Deep Discovery once every 24 hours at midnight, as described
in “Quick Discovery and Deep Discovery scheduling records” on page 72. You can change
this setting, as shown in Example 2-57.
Deep Discovery means that discovery is followed by a complete verification step. Every hour,
the system performs a Quick Discovery, which is discovery only with no verification. The
KSYS subsystem scans the environment for any changes that occurred since the last
discovery and adapts to the modified environment. Collected changes are saved persistently
such that they survive any daemon or KSYS node crash or restart event. Any changes in the
configuration or issues that are found during a discovery are reported to the user, and
related change or error records are written to log and trace files.
A discovery must be completed successfully before running the first verification. The
verification process ensures that the backup host can accommodate the VM. Verification
consists of an LPM validate operation that is performed for every VM on all the available
backup hosts in the host group. Any issues that are found during verification trigger a
notification to registered users and are logged to log files.
Such verification can be started either for a host group, as shown in Example 2-58, or only for
a host or VM, as shown in Example 2-59.
In the case of a single host or VM, the operation is referred to as LPM validation rather
than verification.
The responsibility of the FDE module is to check periodically the health status of the
managed VMs and hosts and to initiate VM relocation actions when needed. The FDE
module does not perform the actual VM relocation or cleanup. It just triggers the relocation by
informing the Relocation Engine module.
The health status of the managed VMs is monitored by heartbeats. VM Agents periodically
send heartbeats to the HMs running on the underlying VIOSs. HMs save received VM
heartbeat details in the HSDB. Each HM also updates periodically its own status to the same
HSDB. VIOSs monitor each other’s health among themselves through the SSP and
underlying CAA cluster mechanisms.
The FDE thread of a host group retrieves a consolidated health status for all hosts and VMs in
that host group by submitting periodic requests to a designated VIOS that is selected among
all the VIOSs in the host group. These FDE requests reach the designated VIOS through
specific REST API requests that are addressed to the HMC, which intermediates
between the KSYS and the VIOS. For more information about how the HMC intermediates the
communication with the VIOS, see “The libkrest library and HMC pass-through APIs” on
page 98.
Example 2-60 shows an excerpt with a typical sequence of QuickQuery and Need Attention
records from an FDE trace file.
We see that the FDE thread sends two QuickQuery requests followed by a Need Attention
request at 20-second intervals on average. FDE uses two types of periodic requests:
- QuickQuery (QQ)
- NeedsAttention (NA)
Example 2-102 on page 139 shows that when an issue is present at the VIOS level, it is
reported in both the NeedsAttention and the QuickQuery replies.
Although it can provide VIOS details, a NeedsAttention request obtains VM health and state
change information. We describe the specific XML elements, subelements, and attributes of a
NeedsAttention reply later and mention here only two relevant VM attributes for the current
context:
state State of the VMM monitoring agent running on the VM as reported to
HMs and registered in the HSDB at the VIOS level.
missedHb Time in seconds since the heartbeat stopped being received as
expected by the VIOS HMs from the VM Agent (when the
heartbeat is received, this attribute is not reported in the NA reply).
The QQ and NA requests, their XML replies and contained XML elements, subelements, and
attributes, together with what is happening at the KSYS and VIOS levels since the moment
FDE initiates doQuickQuery or doNeedAttn actions until the moment they return as logged in
Example 2-60 on page 94, are detailed in 2.3.2, “Failure Detection Engine: QuickQuery and
NeedsAttention flows” on page 234.
If both VIOSs of a host are unhealthy, then the FDE decides that all managed VMs of the
affected host must be relocated to other hosts on the host group. If VIOSs of a host are
healthy but some VMs are not, then the FDE initiates relocation of only those unhealthy VMs.
FDE first checks for host relocation and then for VM relocation. Let us see in more detail how
these decisions are made.
VM relocation decision
The primary factor for initiating a VM relocation is the lack of heartbeat from the VM. Only
VMs that are registered as managed and have the ha_monitor attribute set to enable at the
ksysmgr level are considered for VM relocation. The sensitivity of the FDE reaction to a
missed heartbeat event can be adjusted at the ksysmgr level by using two parameters:
host_failure_detection_time and vm_failure_detection_speed. You can set both
parameters either at the global KSYS system or at a particular host group level.
Example 2-61 shows the parameters' default values and ranges. The
vm_failure_detection_speed parameter can also be adjusted at the VM level.
So, the default value of the VMFDT is 190, and its minimum and maximum values are 140 and
750.
Also, FDE has an HBoffset parameter for each managed and monitored VM. HBoffset is
used to ensure that the VM is not relocated again immediately after a successful relocation
and that it is left enough time to be activated on the new host. HBoffset is set to 0 by default
and increases each time FDE finds the VM in the Not Activated state.
The HBmissed value that is obtained is compared with the VMFDT value that is derived for the
specific VM. If (HBmissed < VMFDT), then no action is performed. Otherwise, FDE performs
the following process (a summary sketch appears after the list):
1. Checks whether the VM is pingable. If the VM is pingable, then the missed heartbeat event
is ignored and FDE does not relocate the VM.
2. Retrieves the LPAR runtime state (from the lssyscfg -r lpar -F state output), current
reference code (by viewing the refcode attribute in the lsrefcode -r lpar command
output), and current profile details (from the lssyscfg -r prof output) of the VM from the
HMC. The HMC GUI help page of the Activate Partitions task for a partition lists the
following possible LPAR state values:
– Running: The LPAR is powered on and the OS instance on the LPAR is running.
– Not Activated: The LPAR is powered off.
– Open Firmware: The LPAR is in the System Management Services (SMS) menu.
– Shutting Down: The OS instance on the LPAR is shutting down.
– Error: The LPAR is in an error state.
– Not Available: The HMC cannot retrieve the state of the LPAR from the host.
– Migrating - Running: Live LPM is in progress.
– Migrating - Not Activated: The LPAR is shut down and being migrated to another
host.
– Hardware Discovery: The HMC is performing hardware discovery.
3. Relocate the VM if the LPAR is in the Running state, (HBmissed - HBoffset) > VMFDT,
and its current reference code indicates that the system is started. Here are the reference
codes:
– 0 or ‘’ (empty): AIX is fully started.
– Linux ppc64: Big Endian 64-bit Red Hat is started.
– Linux ppc64le: Little Endian 64-bit Red Hat is started.
– SUSE Linux: SUSE Linux is started.
4. If the LPAR state is Not Activated, then complete the following steps:
a. Update the offset (HBoffset = HBmissed) so that the VM is not relocated immediately
if it goes to the Running state in the future.
b. If HBmissed > 2*VMFDT, check the reference code history of the LPAR for the word3
attribute of the most recent entry with the refcode attribute having a value of D200A200.
If this word3 value is NULL, then relocate the VM. Otherwise, it contains the reason for
the shutdown (D200A200) and no action is performed.
5. If the LPAR is in the Open Firmware state, HBmissed > VMFDT, and the bootMode in the
LPAR profile is not sms (SMS), then relocate the VM.
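The following ksh-style sketch summarizes this decision logic. It is not actual product code:
the vm_is_pingable, lpar_state, os_is_started, word3_of_last_D200A200, lpar_bootmode, and
relocate_vm helpers are hypothetical stand-ins for the ping, lssyscfg, lsrefcode, and
relocation steps that are described above, and the HBmissed, HBoffset, and VMFDT variables
are assumed to be already set:
fde_check_vm() {
    typeset vm=$1
    (( HBmissed < VMFDT )) && return      # heartbeat still within limits: no action
    vm_is_pingable $vm && return          # pingable: ignore the missed heartbeat
    case "$(lpar_state $vm)" in           # from lssyscfg -r lpar -F state
    "Running")                            # relocate only if the OS is fully started
        (( HBmissed - HBoffset > VMFDT )) && os_is_started $vm && relocate_vm $vm ;;
    "Not Activated")
        HBoffset=$HBmissed                # do not relocate again right after activation
        if (( HBmissed > 2 * VMFDT )) && [ -z "$(word3_of_last_D200A200 $vm)" ]; then
            relocate_vm $vm               # NULL word3: no known shutdown reason
        fi ;;
    "Open Firmware")
        (( HBmissed > VMFDT )) && [ "$(lpar_bootmode $vm)" != "sms" ] && relocate_vm $vm ;;
    esac
}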
Operations that are provided by the HMC REST API can be synchronous or asynchronous.
Some of the requests coming to the HMC REST server need a significant amount of elapsed
time to complete. To optimize the usage of HMC resources and give the client application the
opportunity to perform other processing, these long-running operations can run
asynchronously by using HMC REST API job resources, which are also called jobs.
In the synchronous case, the HMC REST server does not respond to the client request until
all the processing that is associated with the request completes. When this processing
completes either successfully or in error, the server provides the final result to the client side.
The client remains blocked during this time.
In the asynchronous case, the HMC REST server performs some minimal validation and
quickly returns an HTTP 200 (OK) result to the client, which indicates that the request was
accepted. Along with the HTTP 200 (OK) result, the REST server replies with an identifier that
represents the asynchronous job resource that is created to serve the request. The response
payload is formatted in a particular Atom Publishing Protocol XML structure. The
Content-Type header of the response is set to application/atom+xml, and the client is
provided with details about the newly created job resource, including the job URI, inside the
XML payload of the response.
After receiving the HTTP 200 (OK) response status code and the job details in the response
payload, the client can launch a Job Status GET request to determine whether the job
completed. When the job is not completed and is still running, the returned job status code
is set to RUNNING. If the job completed successfully, the returned job status code is set to
COMPLETED_OK. Errors and other special cases are handled with extra status codes. These job
status codes and related details are provided inside the XML payload of the Job Status GET
method response.
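On the client side, the resulting pattern is a simple polling loop. The following curl sketch
illustrates it for a generic job; the session token and job URI are placeholders, and the
grep on the two status codes is a crude simplification of real XML parsing:
$ TOKEN="<X-API-Session value from the Logon response>"
$ JOB="https://rthmc3:12443/rest/api/uom/jobs/<JOBID>"
$ while :; do
>   status=$(curl -sk -H "X-API-Session: $TOKEN" "$JOB" | \
>            grep -o "RUNNING\|COMPLETED_OK" | head -1)
>   [ "$status" != "RUNNING" ] && break   # leave the loop when the job is done
>   sleep 2                               # poll again after a short delay
> done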
The {R} identifier is a REST resource of a so-called root element type. The {UUID} identifier is
a UUID of a resource instance. The {OP} identifier is the name of a job operation. The {JOBID}
is the identifier of a submitted job. We selected the URL model details that are relevant to
these four matched URL patterns. They are cited exactly as they appear in the HMC REST
API reference documentation.
URL model
Concepts:
– Anchor URL patterns provide services for a type of root/child element.
– Instance URL patterns provide services for a uniquely identified root/child element.
URL Pattern Grammar building blocks:
{R} Root element type.
{UUID} A unique UUID value.
{OP} The name of a type of job (an operation).
{JOBID} The ID of a submitted job.
Resource /rest/api/web/Logon
The API receives a user ID and password as a logon request and responds with
X-API-Session. This transaction establishes a valid user session.
When a job is started, a status is returned. The status provides information about the result of
the job and other related details.
Example 2-62 shows the matches of URL occurrences in our krestlong log excerpt with the
URL patterns in the model.
Example 2-62 Matching model URL patterns with URL occurrences in the krestlong log excerpt
$ cat post-restart_krestlong.log|wc -l
38915
$ grep "/rest/api/" post-restart_krestlong.log|head -5
[10] 11/19/18 T(9192) _VMR 13:49:26.265679 DEBUG libkrest_query.c[1536]: PUT at
'/rest/api/web/Logon', with header 'Content-Type:
application/vnd.ibm.powervm.web+xml; type=LogonRequest', payload size 173 bytes
[10] 11/19/18 T(9192) _VMR 13:49:26.265681 DEBUG libkrest_query.c[1540]: URL:
[https://9.3.18.159:12443/rest/api/web/Logon]
[10] 11/19/18 T(9192) _VMR 13:49:26.517243 DEBUG libkrest_query.c[1536]: PUT at
'/rest/api/uom/VirtualIOServer/315CE56B-9BA0-46A1-B4BC-DA6108574E7E/do/PassThrough
Interface', with header 'Content-Type: application/vnd.ibm.powervm.web+xml;
type=JobRequest', payload size 1373 bytes
[10] 11/19/18 T(9192) _VMR 13:49:26.517245 DEBUG libkrest_query.c[1540]: URL:
[https://9.3.18.159:12443/rest/api/uom/VirtualIOServer/315CE56B-9BA0-46A1-B4BC-DA6
108574E7E/do/PassThroughInterface]
<link rel="SELF"
href="https://9.3.18.159:12443/rest/api/uom/jobs/1540961034191"/>
$ grep "/rest/api/" post-restart_krestlong.log|tail -5
<link rel="MANAGEMENT_CONSOLE"
href="https://9.3.18.159:12443/rest/api/uom/ManagementConsole/ec3e2415-2e77-3a0d-b
107-20f0df07ec66"/>
[04] 11/19/18 T(9192) _VMR 13:57:28.863213 DEBUG libkrest_query.c[1536]: GET at
'/rest/api/uom/jobs/1540961034289', with header 'none', payload size 0 bytes
[04] 11/19/18 T(9192) _VMR 13:57:28.863215 DEBUG libkrest_query.c[1540]: URL:
[https://9.3.18.159:12443/rest/api/uom/jobs/1540961034289]
<link rel="SELF"
href="https://9.3.18.159:12443/rest/api/uom/jobs/1540961034289/9095c232-1d0e-49ad-
bc1c-6817a9edc62a"/>
<link rel="MANAGEMENT_CONSOLE"
href="https://9.3.18.159:12443/rest/api/uom/ManagementConsole/ec3e2415-2e77-3a0d-b
107-20f0df07ec66"/>
$ grep -c "/rest/api/" post-restart_krestlong.log
480
$ grep -c "/rest/api/web/Logon" post-restart_krestlong.log
10
$ grep "/rest/api/uom/VirtualIOServer/" post-restart_krestlong.log|grep -c
"/do/PassThroughInterface"
108
$ grep -c "/rest/api/uom/jobs/" post-restart_krestlong.log
285
$ grep -c "/rest/api/uom/ManagementConsole/" post-restart_krestlong.log
77
$
We identified only two {R} root element types in our log: VirtualIOServer and
ManagementConsole.
The URL occurrences for the VirtualIOServer root element type always match the
/rest/api/uom/{R}/{UUID}/do/{OP} pattern, with {OP} taking the PassThroughInterface
string value and {UUID} taking the UUID values of our managed VIOSs. Plenty of job status
requests that match the /rest/api/uom/jobs/{JOBID} pattern are also present, so we have
good reasons to consider that the predominant requests are PassThroughInterface job
operations for instances of the VirtualIOServer root element type.
Example 2-63 Defined job operations for the VirtualIOServer root element
<entry xmlns="http://www.w3.org/2005/Atom"
xmlns:ns2="http://a9.com/-/spec/opensearch/1.1/"
xmlns:ns3="http://www.w3.org/1999/xhtml">
<id>b704f5cd-20b0-34c6-b33f-3c082aa34d51</id>
<title>OperationSet</title>
<published>2018-12-09T11:48:24.157-06:00</published>
<link rel="SELF"
href="https://rthmc3:12443/rest/api/uom/VirtualIOServer/operations/b704f5cd-20b0-3
4c6-b33f-3c082aa34d51"/>
<link rel="MANAGEMENT_CONSOLE"
href="https://rthmc3:12443/rest/api/uom/ManagementConsole/ec3e2415-2e77-3a0d-b107-
20f0df07ec66"/>
<author>...</author>
<content type="application/vnd.ibm.powervm.web+xml; type=OperationSet">
<OperationSet:OperationSet
xmlns:OperationSet="http://www.ibm.com/xmlns/systems/power/firmware/web/mc/2012_10
/" xmlns="http://www.ibm.com/xmlns/systems/power/firmware/web/mc/2012_10/"
xmlns:ns2="http://www.w3.org/XML/1998/namespace/k2" schemaVersion="V1_8_0">
<Metadata>...</Metadata>
<SetName kb="ROR" kxe="false">VirtualIOServer</SetName>
<DefinedOperations kxe="false" kb="ROO" schemaVersion="V1_8_0">
<Metadata>...</Metadata>
<Operation schemaVersion="V1_8_0">...</Operation>
...
<Operation schemaVersion="V1_8_0">...</Operation>
<Operation schemaVersion="V1_8_0">
<Metadata>...</Metadata>
<OperationName kb="ROR" kxe="false">PassThroughInterface</OperationName>
<GroupName kb="ROR" kxe="false">VirtualIOServer</GroupName>
<ProgressType kb="ROR" kxe="false">DISCRETE</ProgressType>
<AllPossibleParameters kb="ROO" kxe="false" schemaVersion="V1_8_0">
<Metadata>...</Metadata>
<OperationParameter schemaVersion="V1_8_0">
<Metadata>...</Metadata>
<ParameterName kxe="false" kb="ROR">inputXML</ParameterName>
</OperationParameter>
The result is an Atom entry XML element response payload that contains the details for all
defined operations. In Example 2-63 on page 101, we manually formatted the element
content for easier reading. We kept only the details about the input and result parameters for
the PassThroughInterface operation. The request itself,
/rest/api/uom/VirtualIOServer/operations, is shown in Example 2-64, together with the
required user session housekeeping.
Example 2-64 Obtaining the defined job operations for the VirtualIOServer root element
$ cat login.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<LogonRequest
xmlns="http://www.ibm.com/xmlns/systems/power/firmware/web/mc/2012_10/"
schemaVersion="V1_1_0">
<Metadata> <Atom/> </Metadata>
<UserID kb="CUR" kxe="false">hscroot</UserID>
<Password kb="CUR" kxe="false">abc123</Password>
</LogonRequest>
$ curl -k -X PUT -H "Content-Type: application/vnd.ibm.powervm.web+xml;
type=LogonRequest" https://rthmc3:12443/rest/api/web/Logon -d @login.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<LogonResponse
xmlns="http://www.ibm.com/xmlns/systems/power/firmware/web/mc/2012_10/"
xmlns:ns2="http://www.w3.org/XML/1998/namespace/k2" schemaVersion="V1_8_0">
<Metadata> <Atom/> </Metadata>
<X-API-Session kxe="false" kb="ROR">JF_rY...WDQI=</X-API-Session>
</LogonResponse>
Running curl on a Linux client, we first established a user session with the HMC REST
server and obtained the session token. Then, we performed our request on the
VirtualIOServer root resource and then deleted the session. The session token is shortened
for easier reading. All requests that are used here can be considered as good examples of
synchronous operations.
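For completeness, the remaining requests of that session look as sketched below, with the
same shortened session token placeholder; the GET retrieves the defined operations shown in
Example 2-63 on page 101, and the final DELETE on the Logon resource closes the session:
$ curl -sk -H "X-API-Session: JF_rY...WDQI=" \
  https://rthmc3:12443/rest/api/uom/VirtualIOServer/operations
$ curl -sk -X DELETE -H "X-API-Session: JF_rY...WDQI=" \
  https://rthmc3:12443/rest/api/web/Logon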
Content that is exchanged by the PassThroughInterface operation between KSYS and VIOS,
like inputXML, stdOut, and the rest of the parameters in Example 2-63 on page 101, might be in
XML format, so it must be encoded and encapsulated in the HTTP XML payload that is
exchanged between the KSYS and the HMC. The HMC REST server uses the RMC daemon
running on the HMC and the RMC connection that is maintained by the RMC daemon running
on the VIOS.
Figure 2-9 shows this encapsulation mechanism of the XML data that is exchanged by KSYS
with the VIOS itself.
The structure and meaning of the XML element messages that are exchanged by KSYS with
a VIOS are shown in the new vioHADR2.00 XML Schema Definition (XSD) that was
introduced with VIOS V3.1, as shown in Example 2-65.
<xsd:schema
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://ausgsa.austin.ibm.com/projects/v/vios/schema/vioHADR2.00"
#
Known resources:
capabilities
cmd
ioscli
padmin
viosmgr
debug
debug_level (a number that indicates the debug level)
resource (a separate resource)
lib
libvio
cluster
lu
partition
query
sp
version
libviopass
passthru
pcm
resources
version
Options:
lib/*/* [-n ElementsPerChunk] [-t tracelevel] [-v version]
Example Commands:
vioservice cmd/ioscli/cluster -status
vioservice lib/libvio/lu -n 10
vioservice debug/8/lib/libvio/cluster
Note:
Any one of these fields can be terminated with "/resources" to display
available subcommands.
Examples:
lib/libvio/resources
cmd/resources
#
We first see the inputXML content as extracted by the HMC REST server from the HTTP
request payload and passed through RMC as input to the vioservice command. Then, the
vioservice command runs, generates its result on standard output, and exits with a return
code. Standard output content and the return code value are passed back through RMC to
the HMC REST server to be encoded and sent back to libkrest and KSYS as parameter
values for the stdOut and CMDReturnCode parameters in the response XML payload.
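This standard output and return code capture can be mimicked locally on the VIOS. The
following two lines are a sketch only, assuming that the request XML is saved in a qq.xml
file and that the passthru resource reads it from standard input:
# /usr/ios/sbin/vioservice lib/libviopass/passthru < qq.xml
# echo "rc=$?"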
Example 2-68 The krest log immediately after the KSYS daemon restart
# sed '1,10!d;=' post-restart_krest.log|sed 'N;s/\n/:/'
1:[00] 11/19/18 T( 1) ____ 13:48:35.136313 ******************* Trace Started -
Pid = 16646518 **********************
Now that we have introduced the HMC REST API and the mechanics of an asynchronous job, we
see in Example 2-68 on page 105 that the FDE thread, T(9192), opens a session and a
session token is returned. Then, the thread submits a QuickQuery job request to the HMC at
IP address 9.3.18.159 for the VIOS with UUID 315CE56B-9BA0-46A1-B4BC-DA6108574E7E. An
entire job progress pattern follows, and the job with the jobid 1540961034191 is created. The
first job status check shows the job in the RUNNING state, so it is still running. Finally, the
second check shows that the job is complete with the status COMPLETED_OK.
We now check the progress of the same job request in the corresponding krestlong trace file.
Example 2-69 shows an excerpt from the beginning of the krestlong log file, covering the
initial logon request and the immediate request that follows, which is the first QuickQuery
request submission after our KSYS daemon restart.
Example 2-69 The krestlong log immediately after the KSYS daemon restart
[10] 11/19/18 T( 1) ____ 13:48:35.132087 ******************* Trace Started - Pid
= 16646518 **********************
[10] 11/19/18 T(304) _VMR 13:48:35.172701 [ INFO,VMRDaemon,] Start KREST category
logging
[10] 11/19/18 T(304) _VMR 13:49:26.232328 DEBUG libkrest.c[4444]: libkrest global
initialization...
...
[10] 11/19/18 T(9192) _VMR 13:49:26.265637 DEBUG libkrest.c[521]: libkrest context
initialization successful
[10] 11/19/18 T(9192) _VMR 13:49:26.265679 DEBUG libkrest_query.c[1536]: PUT at
'/rest/api/web/Logon', with header 'Content-Type:
application/vnd.ibm.powervm.web+xml; type=LogonRequest', payload size 173 bytes
[10] 11/19/18 T(9192) _VMR 13:49:26.265681 DEBUG libkrest_query.c[1540]: URL:
[https://9.3.18.159:12443/rest/api/web/Logon]
...
[10] 11/19/18 T(9192) _VMR 13:49:26.517109 DEBUG libkrest.c[521]: libkrest context
initialization successful
[10] 11/19/18 T(9192) _VMR 13:49:26.517114 DEBUG libkrest.c[9462]: Using 2.00 as
vio hadr version [ Default ]
[10] 11/19/18 T(9192) _VMR 13:49:26.517176 DEBUG <?xml version="1.0"?>
Entries are logged by the FDE thread T(9192). The XML input for the QuickQuery request that
is addressed to the VIOS is logged as it is generated, followed by the XML payload of the
HTTP PUT request that is addressed to the HMC. Inside the XML payload of the PUT request, we recognize
the CDATA section with the specific QuickQuery XML content for the ParameterValue element
that is associated with the inputXML ParameterName element, as shown in the definition of the
PassThroughInterface job operation that is detailed in Example 2-63 on page 101. This
content is supposed to be passed by the HMC REST server to the vioservice command that
is run by RMC on the VIOS that is identified with the UUID that is specified in the
/rest/api/uom/VirtualIOServer/315CE56B-9BA0-46A1-B4BC-DA6108574E7E/do/PassThroughI
nterface resource identifier.
The response for the PUT request, both header and payload, is shown in Example 2-70.
<entry xmlns="http://www.w3.org/2005/Atom"
xmlns:ns2="http://a9.com/-/spec/opensearch/1.1/"
xmlns:ns3="http://www.w3.org/1999/xhtml">
<id>8bda9318-74d8-48a7-8ee4-054a2245a8d6</id>
<title>JobResponse</title>
<published>2018-11-19T18:49:27.879Z</published>
<link rel="SELF"
href="https://9.3.18.159:12443/rest/api/uom/jobs/1540961034191"/>
<author>
<name>IBM Power Systems Management Console</name>
</author>
<content type="application/vnd.ibm.powervm.web+xml; type=JobResponse">
<JobResponse:JobResponse
xmlns:JobResponse="http://www.ibm.com/xmlns/systems/power/firmware/web/mc/2012_10/
" xmlns="http://www.ibm.com/xmlns/systems/power/firmware/web/mc/2012_10/"
xmlns:ns2="http://www.w3.org/XML/1998/namespace/k2" schemaVersion="V1_8_0">
<Metadata> <Atom/> </Metadata>
<RequestURL kb="ROR" kxe="false"
href="VirtualIOServer/315CE56B-9BA0-46A1-B4BC-DA6108574E7E/do/PassThroughInterface
" rel="via" title="The URL to which the JobRequest was submitted."/>
<TargetUuid kb="ROR" kxe="false">
315CE56B-9BA0-46A1-B4BC-DA6108574E7E
</TargetUuid>
<JobID kxe="false" kb="ROR">1540961034191</JobID>
<TimeStarted kb="ROR" kxe="false">0</TimeStarted>
<TimeCompleted kxe="false" kb="ROR">0</TimeCompleted>
<Status kb="ROR" kxe="false">NOT_STARTED</Status>
<JobRequestInstance kb="ROR" kxe="false" schemaVersion="V1_8_0">
<Metadata> <Atom/> </Metadata>
<RequestedOperation kxe="false" kb="CUR" schemaVersion="V1_8_0">
<Metadata> <Atom/> </Metadata>
<OperationName kb="ROR" kxe="false">PassThroughInterface</OperationName>
<GroupName kb="ROR" kxe="false">VirtualIOServer</GroupName>
</RequestedOperation>
<JobParameters kxe="false" kb="CUR" schemaVersion="V1_3_0">
<Metadata> <Atom/> </Metadata>
<JobParameter schemaVersion="V1_3_0">
<Metadata> <Atom/> </Metadata>
<ParameterName kxe="false" kb="ROR">inputXML</ParameterName>
<ParameterValue kxe="false" kb="CUR"><?xml version="1.0"?> <VIO
xmlns="http://ausgsa.austin.ibm.com/projects/v/vios/schema/vioHADR2.00"
version="2.00" author="LIBKREST" title="Req Quick Query"> <Request
action_str="VIO_HS_QUICK_QUERY"/> </VIO>
</ParameterValue>
</JobParameter>
</JobParameters>
</JobRequestInstance>
<Progress kb="ROO" kxe="false" schemaVersion="V1_8_0">
As already suggested by the krest log excerpt in Example 2-68 on page 105, we expect to
see in the krestlong file the detailed log entries about the way the KSYS daemon is polling
for the job status through Job Status GET requests. We obtain a first GET at
'/rest/api/uom/jobs/1540961034191' entry, as shown in Example 2-71.
Example 2-71 First job status GET request for jobid 1540961034191
[10] 11/19/18 T(9192) _VMR 13:49:26.657930 DEBUG libkrest.c[521]: libkrest context
initialization successful
[10] 11/19/18 T(9192) _VMR 13:49:26.657935 DEBUG libkrest_query.c[1536]: GET at
'/rest/api/uom/jobs/1540961034191', with header 'none', payload size 0 bytes
[10] 11/19/18 T(9192) _VMR 13:49:26.657937 DEBUG libkrest_query.c[1540]: URL:
[https://9.3.18.159:12443/rest/api/uom/jobs/1540961034191]
...
The response for this first job status GET request is shown in Example 2-72.
Example 2-72 The response of the first job status GET request
[10] 11/19/18 T(9192) _VMR 13:49:26.774897 DEBUG HTTP/1.1 200 OK
Date: Mon, 19 Nov 2018 18:49:27 GMT
...
This time, we again emphasized the job status, which is now RUNNING. Then, we removed the
job parameter content, which is the same as in the response of the previous PUT request, and
emphasized the progress details for the PassThroughInterface operation itself. Its status is
PERFORMPASSTHROUGHAPI_STARTED.
Example 2-73 Second job status GET request for jobid 1540961034191
[10] 11/19/18 T(9192) _VMR 13:49:27.776273 DEBUG libkrest.c[521]: libkrest context
initialization successful
[10] 11/19/18 T(9192) _VMR 13:49:27.776280 DEBUG libkrest_query.c[1536]: GET at
'/rest/api/uom/jobs/1540961034191', with header 'none', payload size 0 bytes
[10] 11/19/18 T(9192) _VMR 13:49:27.776282 DEBUG libkrest_query.c[1540]: URL:
[https://9.3.18.159:12443/rest/api/uom/jobs/1540961034191]
The response for this second job status GET request is shown in Example 2-74.
Example 2-74 The response of the second job status GET request
[10] 11/19/18 T(9192) _VMR 13:49:27.864132 DEBUG HTTP/1.1 200 OK
Date: Mon, 19 Nov 2018 18:49:29 GMT
...
The response in Example 2-74 on page 111 is long because it contains the whole progress
history of the PassThroughInterface operation and the final job results. We again
emphasized the job status, which is now COMPLETED_OK, so this is the last job status GET
request. The remainder of the payload must then contain the expected result for the job.
Examining the remaining payload, we see the expected result. The job input parameter
content was again removed for easier reading and the progress history of the
PassThroughInterface operation itself was again emphasized. This time, there is no current
state and the progress history contains all traversed states, with the final one being
PERFORMPASSTHROUGHAPI_COMPLETED.
The expected job result is in the last emphasized section, which is the Results XML element.
Inside it is the content for the StdOut parameter as returned by the vioservice command,
then the content of the CMDReturnCode parameter, which is the returned value of the
vioservice command execution, and finally the content of the RMCReturnCode parameter,
which is the returned value of the RMC command execution in corresponding ParameterValue
XML elements.
The last krestlong entries about this job operation, which immediately follow the response
payload with the job results, are listed in Example 2-75.
Example 2-75 Job results as extracted from the last job status response payload
[10] 11/19/18 T(9192) _VMR 13:49:27.864945 DEBUG libkrest.c[2473]: <StdOut>
[10] 11/19/18 T(9192) _VMR 13:49:27.864954 DEBUG libkrest.c[2555]:
job_result->result : <VIO><Response>
<quickQuery>
<viosStatList><viosStatus machine_type="8286" model="42A" serial="21E0B2V">
<viosStat uuid="5ac3e0dc-13fc-4221-9617-8ef61c4cdd83" state="UP"
hmResponsive="1" hmResponseSec='1' hasData='0' reason='0x00000000' syncMsg="0" />
<viosStat uuid="315ce56b-9ba0-46a1-b4bc-da6108574e7e" state="UP"
hmResponsive="1" hmResponseSec='1' hasData='0' reason='0x00000000' syncMsg="0" />
</viosStatus></viosStatList>
</quickQuery>
<quickQuery>
<viosStatList><viosStatus machine_type="8286" model="42A" serial="2100DEW">
<viosStat uuid="166c89ef-57d8-4e75-815c-aa5b885560b1" state="UP"
hmResponsive="1" hmResponseSec='1' hasData='0' reason='0x00000000' syncMsg="0" />
This is the final job result that is obtained for the first QuickQuery job request that was
submitted by the FDE thread to its known servicing VIOS after the KSYS daemon restart. The
way that the FDE thread uses this XML response payload is described in “QuickQuery flow” on
page 234.
We started the analysis of this QuickQuery request, which is also the first HMC REST API job
request after the KSYS daemon restart, in Example 2-69 on page 106. We presented in detail
a complete asynchronous job operation that is performed by KSYS to communicate with a
managed VIOS through the HMC pass-through APIs.
2.2.7 Host monitor, Health Status Database, and related VIOS internals
Section “HMC pass-through APIs” on page 101 showed how the HMC to VIOS RMC
connection and the vioservice command enable KSYS to communicate with the VIOS side
through the HMC pass-through APIs. On the VIOS side, the component endpoints that are
involved in this communication include the HSDB database on the SSP cluster and the HM
daemon. The HM daemon also intermediates KSYS communication with the VM Agents
running on the VMs.
Figure 2-10 on page 115 shows a view of the internal VIOS subsystems supporting the
various communication flows that are involved. It also suggests possible flow initiators, either
external to the VIOS, like KSYS, the HMC, and the VMM, or internal, like the vio_daemon and
HM (ksys_hsmond) daemons.
We now consider the VIOS components in Figure 2-10 that were not addressed so far:
Health library
The SSP cluster and HSDB database
Host monitor daemon
Health library
The libviohealth.a library has APIs to support KSYS to VIOS communication. The
vioservice lib/libviopass/passthru command is run for any request to the VIOS side that
originated on KSYS. The XML input of each request, which is generated on the KSYS side in
the vioHADR2.00 format, is extracted by the HMC REST server from the HTTP request
payload. Example 2-67 on page 105 shows how this occurs for a VIO_HS_QUICK_QUERY request.
We did not yet see how the message reaches its destination endpoint, but now we can
describe the next step. The vioservice command calls into the libviopass.a library, which
in turn invokes the appropriate API from the libviohealth.a library, as shown in Example 2-76.
Example 2-76 The libviohealth.a APIs that were launched for KSYS requests
# dump -H /usr/ios/sbin/vioservice|grep -p "Import File Strings"
***Import File Strings***
INDEX PATH BASE MEMBER
0 /usr/lib:/lib:/usr/lpp/xlC/lib
1 libc.a shr.o
2 libviolog.a shr.o
You can easily see that most of the function names in the last column of the last output in
Example 2-76 on page 115 match a corresponding KSYS to VIOS request type in the set of
all such request types, which is extracted in Example 2-77 from the
/usr/lib/vioHADR2.00.xsd XSD file.
Example 2-77 KSYS to VIOS request types in the /usr/lib/vioHADR2.00.xsd XSD file
<xsd:enumeration value="VIO_HS_NEEDS_ATTENTION"/>
<xsd:enumeration value="VIO_HS_QUICK_QUERY"/>
<xsd:enumeration value="VIO_HS_ACKNOWLEDGE"/>
<xsd:enumeration value="VIO_HS_QUERY_NODE_DATA"/>
<xsd:enumeration value="VIO_HS_QUERY_VM"/>
<xsd:enumeration value="VIO_HS_QUERY_MSG"/>
<xsd:enumeration value="VIO_HS_QUERY_MS_FOR_VMS"/>
<xsd:enumeration value="VIO_HS_QUERY_CONFLICT"/>
</xsd:restriction>
</xsd:simpleType>
...
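You can extract this enumeration yourself with a simple grep against the XSD file (a sketch;
the previous excerpt shows part of its output):
# grep "xsd:enumeration" /usr/lib/vioHADR2.00.xsd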
You can refer to the /usr/lib/vioHADR2.00.xsd XSD file for descriptions of these request
type definitions and for more information about the related input and response XML elements
format and attributes.
KSYS to VIOS communication traces are logged in the viohs.log file in the
/home/ios/logs/health directory. In Example 2-78, we examine the occurrences of the
previously identified APIs, which start with the vioHs prefix, in the viohs.log file. We look
first for the vioHs pattern in a snapshot of viohs.log that was taken on one VIOS of our test
environment. Then, we count the occurrences of each identified KSYS to VIOS request to the
libviohealth.a API.
So, any string containing the vioHs substring in our log excerpt in Example 2-78 on page 117
is an occurrence of an API call from the libviohealth.a library, and the per-API counts add
up to the total: 4045 = 1747 + 873 + 276 + 1104 + 45.
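A count of this kind can be sketched with the alog and grep commands (the alog usage is the
same as the one that is shown later in Example 2-97). Only the vioHsQuickQuery API name is
confirmed by our excerpt; the placeholder in the loop stands for the remaining API names
that are identified in Example 2-76:
# alog -f /home/ios/logs/health/viohs.log -o > viohs.snapshot.txt
# grep -c vioHs viohs.snapshot.txt
# for api in vioHsQuickQuery <other vioHs API names>; do
>   echo "$api: `grep -c $api viohs.snapshot.txt`"
> done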
Another legitimate question about our viohs.log file excerpt is which processes write to it.
From our presentation so far, the best candidate appears to be the vioservice command. We
choose the last entry of the tail output in Example 2-78 on page 117, which is a
vioHsQuickQuery trace that was left by thread 66847175 of process 17170756. Checking
for this PID-TID combination in viosvc.log of the vioservice command, we obtain the
request and response details of a related VIO_HS_QUICK_QUERY request, as shown in
Example 2-79.
We also added in Example 2-79 on page 118 the traces that were written by our vioservice
thread (17170756 66847175) in viohs.log around the examined vioHsQuickQuery call. By
looking at the time stamps of these traces in both logs, you can easily visualize the whole
sequence of steps that is performed.
HM to VIOS communication traces are logged in the viohm.log file in the same
/home/ios/logs/health directory. In Example 2-81, we look for the occurrences of each
identified HM to VIOS request to the libviohealth.a API in a snapshot of viohm.log that was
taken on one VIOS of our test environment.
The libviohealth.a library is closely linked with libvio.a and uses APIs that are already
available there, for example, APIs to parse XML input and APIs to access the HSDB database
in the SSP, as shown in Example 2-82.
Example 2-84 shows the same daemons on a VIOS in our test environment host group after
the SSP cluster creation with hdisk4 as repo_disk and hdisk3 as ha_disk.
Example 2-84 SSP daemons on a VIOS node in a host group SSP cluster
# lssrc -a|head -1; lssrc -a|egrep "caa|cthags|^ pool|vio_daemon"|sort
Subsystem Group PID Status
clcomd caa 8716570 active
clconfd caa 5833058 active
cthags cthags 13500738 active
pool 18809098 active
vio_daemon 9634178 active
# /usr/ios/cli/ioscli cluster -status
Cluster Name State
KSYS_rbRMHA_1 OK
The PowerVM product documentation for SSP mentions that in VIOS V3.1 the SSP
Management database was migrated from SolidDB in previous VIOS versions to the
PostgreSQL database, as described in this excerpt from the “Getting started with shared
storage pools by using the VIOS command line” section of IBM Power Systems Virtual I/O
Server:
“In VIOS version 3.1 the SSP Management data is stored in a PostgreSQL database. All data
files of the database are stored in the file system of the SSP cluster pool. If the VIOS node
that manages the SSP database is unable to access the file system of the SSP cluster pool,
while the PostgreSQL database process is performing an I/O operation, the PostgreSQL
database aborts all operations and generates a core memory dump. The PostgreSQL
database also generates the pool file system errors and stores them in the system error log
file. The SSP database automatically recovers when the VIOS node that manages the SSP
database regains access to the file system of the SSP cluster pool.”
First, we identify the file system of the SSP cluster pool and the VIOS node that manages the
SSP database for our test environment host group SSP cluster. We expect the file system to
be on the disk that is provided through the ha_disk parameter at the host group creation and
to show up under the same mount point on all VIOS nodes (Example 2-85).
Example 2-85 The shared file system of the host group SSP cluster pool
# pooladm pool list
Pool Path
------------------------------
/var/vio/SSP/KSYS_rbRMHA_1/D_E_F_A_U_L_T_061310
# pooladm pool lsdisk /var/vio/SSP/KSYS_rbRMHA_1/D_E_F_A_U_L_T_061310 -v
Tier: SYSTEM
Name: /dev/hdisk3
UID: 000000000A2819F3000000005C420426
Capacity: 20 GB
State: Up
Failure Group: Default
Partitions: 319
Spare Partitions: 1
Stale Partitions: 0
ECCR Partitions: 0
Sector Size: 512 bytes
Max Transfer: 512 KB
Data Start: 8388608 bytes
# mount|head -2;mount|grep "/var/vio/SSP/KSYS_rbRMHA_1/D_E_F_A_U_L_T_061310"
-------------------------------
NODE rt13v2.ausprv.stglabs.ibm.com
-------------------------------
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/var/vio/SSP/KSYS_rbRMHA_1/D_E_F_A_U_L_T_061310 19.88 19.36 3% -
- /var/vio/SSP/KSYS_rbRMHA_1/D_E_F_A_U_L_T_061310
-------------------------------
NODE rt13v1.ausprv.stglabs.ibm.com
-------------------------------
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/var/vio/SSP/KSYS_rbRMHA_1/D_E_F_A_U_L_T_061310 19.88 19.36 3% -
- /var/vio/SSP/KSYS_rbRMHA_1/D_E_F_A_U_L_T_061310
-------------------------------
NODE rt14v1.ausprv.stglabs.ibm.com
-------------------------------
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/var/vio/SSP/KSYS_rbRMHA_1/D_E_F_A_U_L_T_061310 19.88 19.36 3% -
- /var/vio/SSP/KSYS_rbRMHA_1/D_E_F_A_U_L_T_061310
-------------------------------
NODE rt14v2.ausprv.stglabs.ibm.com
-------------------------------
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/var/vio/SSP/KSYS_rbRMHA_1/D_E_F_A_U_L_T_061310 19.88 19.36 3% -
- /var/vio/SSP/KSYS_rbRMHA_1/D_E_F_A_U_L_T_061310
#
For more information about the pooladm options that are used in Example 2-85 on page 123,
run the pooladm help and pooladm pool help commands. The VIOS node that manages the
SSP database is known as a database node (DBN). Some details about the way that the
VIOS node that manages the SSP database regains access to the file system of the SSP
cluster pool are described in the “Problem summary” and “Problem conclusion” sections of
APAR IJ08692: VIOS SSP CANNOT ELECT DBN.
Though APAR IJ08692 applies to AIX 6.1 and was fixed in VIOS 2.2.6.32, we expect the
functions that are described to also be relevant for VIOS V3.1 and subsequent versions.
Going further with our SSP cluster examination, we see a PostgreSQL database server
instance that runs on the DBN node under the control of the vio_daemon subsystem
(Example 2-86).
The PostgreSQL database server instance (PID 17563970) was started as the vpgadmin user,
and its listening port is specified on the command line as -p 6090. The database files are kept
in the shared file system of the SSP cluster pool (the -D data directory option). Options on
the command line have priority over the ones in the configuration files. The default
configuration files are in the same location (the -D option), so the remaining database
settings can be checked there.
Example 2-87 Log location and trust authentication settings in the configuration files
# lssrc -ls vio_daemon|grep DBN
DBN NODE ID: 43947aac1b4111e9803e001018affe76
DBN Role: Primary
# cd /var/vio/SSP/KSYS_rbRMHA_1/D_E_F_A_U_L_T_061310/VIOSCFG/DB/PG
# ls *.conf
pg_hba.conf pg_ident.conf postgresql.auto.conf postgresql.conf
# grep log_directory postgresql.conf
log_directory = '/home/ios/logs/pg_sspdb' # directory where log files are written,
# grep -i socket postgresql.conf
#unix_socket_directories = '/tmp' # comma-separated list of directories
#unix_socket_group = '' # (change requires restart)
#unix_socket_permissions = 0777 # begin with 0 to use octal notation
# grep ^local pg_hba.conf
local all all trust
# ls -l /tmp|grep vpgadmin|grep ^s
srwxrwxrwx 1 vpgadmin bin 0 Jan 19 03:11 .s.PGSQL.6080
srwxrwxrwx 1 vpgadmin bin 0 Jan 19 03:18 .s.PGSQL.6090
#
For more insights into the database layout, we use a psql client interactive session through a
local UNIX domain socket connection on the DBN node, as shown in Example 2-88.
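Such a session can be opened locally on the DBN node. The following is a minimal sketch,
assuming that the psql client that ships with the VIOS PostgreSQL instance is in the PATH,
and using the /tmp socket directory and the 6090 port from Example 2-87 (the \dn
meta-command lists the schemas of the database):
# psql -h /tmp -p 6090 -d vios
vios=# \dn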
The health status tables are grouped under the vioshs schema as part of the vios SSP
cluster database. Using the same database as the SSP cluster provides some benefits:
Using existing database access code
Rolling upgrade that is automatically supported
Database resiliency as ensured by the DBN election mechanism
In Example 2-89, we continue with the same psql session, and suggest two methods that you
can use to list the column names of a table to help you select only the columns of interest in a
subsequent query.
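Two such methods, sketched here for the vioshs.vios table by using standard PostgreSQL
facilities, are the psql \d meta-command and a query against the information_schema.columns
catalog view:
vios=# \d vioshs.vios
vios=# SELECT column_name FROM information_schema.columns
vios-#  WHERE table_schema = 'vioshs' AND table_name = 'vios';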
Example 2-90 Autonomic Health Advisor File System setup before SSP cluster creation
# /usr/ios/cli/ioscli ioslevel
3.1.0.10
# oslevel -s
7200-03-02-1846
# df -vT ahafs
Filesystem 512-blocks Used Free %Used Iused Ifree %Iused
Mounted on
/ahafs - - - - 42 32726 1% /aha
# find /aha -name "*.mon"
/aha/fs/modFile.monFactory/var/sea/SEAevents.mon
# fuser -V /aha/fs/modFile.monFactory/var/sea/SEAevents.mon
/aha/fs/modFile.monFactory/var/sea/SEAevents.mon:
inode=37 size=1 fd=3 6881786
# ps -o pid,args -p 6881786
PID COMMAND
6881786 /usr/sbin/SEAmon /aha
# ls -d /aha/*/*.monFactory
/aha/cluster/cap.monFactory /aha/disk/repDiskState.monFactory
/aha/cluster/hostnameChange.monFactory /aha/disk/vgState.monFactory
/aha/cluster/networkAdapterState.monFactory /aha/fs/modDir.monFactory
/aha/cluster/nodeAddress.monFactory /aha/fs/modFile.monFactory
/aha/cluster/nodeContact.monFactory /aha/fs/modFileAttr.monFactory
/aha/cluster/nodeList.monFactory /aha/fs/utilFs.monFactory
/aha/cluster/nodeState.monFactory /aha/mem/vmo.monFactory
/aha/cluster/siteState.monFactory /aha/mem/waitTmPgInOut.monFactory
/aha/cpu/pidProcessMon.monFactory /aha/mem/waitersFreePg.monFactory
/aha/cpu/processMon.monFactory /aha/net/inetsock.monFactory
/aha/cpu/schedo.monFactory /aha/pool/pool.membership.monFactory
/aha/cpu/waitTmCPU.monFactory /aha/pool/tier.capacity.monFactory
/aha/disk/clDiskList.monFactory /aha/pool/tier.ecLowWaterMark.monFactory
/aha/disk/clDiskState.monFactory /aha/pool/tier.freespace.monFactory
/aha/disk/diskState.monFactory /aha/pool/tier.overcommit.monFactory
#
When an SSP cluster is deployed, several other monitor files show up on its member
nodes, as shown in Example 2-91, where we picked the DBN node. The same monitor files,
except for the ones in the /aha/pool base directory, show up on all the other nodes.
Example 2-91 AHAFS monitor files on the DBN node after the SSP cluster creation
# lssrc -ls vio_daemon|grep DBN
DBN NODE ID: 43947aac1b4111e9803e001018affe76
DBN Role: Primary
# find /aha -name "*.mon"
/aha/cpu/processMon.monFactory/usr/sbin/rsct/bin/hagsd.mon
/aha/fs/modFile.monFactory/var/sea/SEAevents.mon
/aha/fs/utilFs.monFactory/var.mon
/aha/cluster/hostnameChange.monFactory/nodeHostNameEvent.mon
/aha/cluster/cap.monFactory/capEvent.mon
/aha/cluster/siteState.monFactory/siteEvent.mon
/aha/cluster/nodeList.monFactory/nodeListEvent.mon
/aha/cluster/networkAdapterState.monFactory/networkAdapterStateEvent.mon
/aha/cluster/nodeAddress.monFactory/nodeAddressEvent.mon
/aha/cluster/nodeState.monFactory/nodeStateEvent.mon
/aha/disk/repDiskState.monFactory/repDiskStateEvent.mon
/aha/disk/clDiskState.monFactory/clDiskStateEvent.mon
/aha/pool/tier.freespace.monFactory/poolId:000000000A2819F3000000005C420425,tierNa
me:SYSTEM,unitName:percentage.mon
/aha/pool/tier.ecLowWaterMark.monFactory/poolId:000000000A2819F3000000005C420425,t
ierName:SYSTEM.mon
#
The vio_daemon (PID 2032030) is among the event consumers. It monitors CAA cluster
events and SSP pool events (Example 2-92).
Example 2-92 AHAFS monitor files and event consumers on the DBN node
# for f in `find /aha -name "*.mon"`; do fuser -V $f; done
/aha/cpu/processMon.monFactory/usr/sbin/rsct/bin/hagsd.mon:
inode=44 size=1 fd=11 11207164
/aha/fs/modFile.monFactory/var/sea/SEAevents.mon:
inode=37 size=1 fd=3 6291956
/aha/fs/utilFs.monFactory/var.mon:
inode=45 size=1 fd=4 24838578
/aha/cluster/hostnameChange.monFactory/nodeHostNameEvent.mon:
inode=60 size=1 fd=38 10617192
/aha/cluster/cap.monFactory/capEvent.mon:
inode=59 size=1 fd=37 10617192
/aha/cluster/siteState.monFactory/siteEvent.mon:
inode=54 size=1 fd=14 11207164
/aha/cluster/nodeList.monFactory/nodeListEvent.mon:
inode=53 size=2 fd=12 10617192
inode=53 size=2 fd=13 11207164
/aha/cluster/networkAdapterState.monFactory/networkAdapterStateEvent.mon:
Table 2-3 presents the documented CAA event types for each of the CAA event producers
showing up as configured for the vio_daemon event consumer in Example 2-92 on page 129.
This whole monitoring framework is extended for KSYS so that vio_daemon can update the
HSDB about received CAA events. When vio_daemon is notified about a CAA event
occurrence, it uses libvio.a and ultimately libviohealth.a to update tables in the
vios.vioshs schema. The VIOS state in the vios table is updated with the node state that is
received from CAA, and a row with the CAA event details is inserted into the trans table. To
see how some of these event flows happen, we manually shut down a non-DBN VIOS node
and then start it again.
Example 2-93 The vios and trans tables before the VIOS node shutdown
vios=# select viosuuid, state, reason from vios;
viosuuid | state | reason
--------------------------------------+-------+--------
315ce56b-9ba0-46a1-b4bc-da6108574e7e | 1 | 0
4e0f6f60-214b-4d2c-a436-0eac46f7f71f | 1 | 0
5ac3e0dc-13fc-4221-9617-8ef61c4cdd83 | 1 | 0
166c89ef-57d8-4e75-815c-aa5b885560b1 | 1 | 0
(4 rows)
vios=# select * from trans;
vmuuid | viosuuid | msid | tag | opcode | state | data | integer | txstarted
--------+----------+------+-----+--------+-------+------+---------+-----------
(0 rows)
vios=#
To reduce the rate of the updates in the database log and make sure that the CAA event
updates in the database are not affected by KSYS, we stop KSYS (by running stopsrc -s
IBM.VMR) during the test. We also stop HMs by running clcmd stopsrc -s ksys_hsmon on one
VIOS. At the database level, we modify some log parameters for easier reading, as shown in
Example 2-94.
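The statements that are involved are of the following kind; this is a sketch only, and the exact
parameters and values that we modified are the ones in Example 2-94. PostgreSQL's ALTER
SYSTEM command and pg_reload_conf() function are run through the local trust connection:
vios=# ALTER SYSTEM SET log_statement = 'all';
vios=# SELECT pg_reload_conf();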
In Example 2-95, we show the SSP cluster state after the node shutdown and an excerpt
from the CAA log on the DBN node with related data about the generated CAA NODE_DOWN
event.
Example 2-95 SSP cluster state and CAA log excerpt for the NODE_DOWN event
# lssrc -ls vio_daemon|grep DBN
DBN NODE ID: 43947aac1b4111e9803e001018affe76
DBN Role: Primary
# /usr/ios/cli/ioscli cluster -status
Cluster Name State
KSYS_rbRMHA_1 DEGRADED
We obtain the expected CAA NODE_DOWN event that is logged around 09:08:04, as marked in
bold in Example 2-95 on page 131. Now, we check the vio_daemon log entries on the DBN
node around the same 09:08:04 moment and look for their TID. We also do a quick check of
the thread stack for the thread that logged the entries, which has the TID 0x2134
(Example 2-96).
Example 2-96 The vio_daemon log entries and involved thread details
# pwd
/home/ios/logs
# cp viod.log viod.log.caaevttst.txt
# more viod.log.caaevttst.txt
...
Jan 20 2019, 09:07:58.675 0x129 viod_dbhandler.c viod_start_db_timer 1.83 340
TRC -- Setting up the DB handler timer dbcl 110099400
Jan 20 2019, 09:08:04.201 0x2134 violibDB.c vioCheckSchemaExists
1.132.12.3 23524 DEB -- Parms schemaName=vioshs
Jan 20 2019, 09:08:04.219 0x2134 violibDB.c modifyDB
1.132.12.3 2040 DEB -- Statement = SET search_path TO vios
Jan 20 2019, 09:08:04.222 0x2134 violibDB.c vioGetEADB
1.132.12.3 16415 DEB -- eaName = VIO_DB_RWLEVEL has no value
Jan 20 2019, 09:08:04.222 0x2134 violibDB.c vioQueryDB
1.132.12.3 2692 DEB -- Statement= SELECT COUNT(schema_name) FROM
information_schema.schemata WHERE schema_name='vioshs'
Jan 20 2019, 09:08:04.225 0x2134 violibDB.c bindQueryResults
1.132.12.3 2497 DEB -- Query column count = 1
Jan 20 2019, 09:08:21.114 0x1 viod_misc_handler.c viod_cache_timercb 1.3 283
TRC -- Misc task handler condition set 0x1100994b0
viod.log.caaevttst.txt (91%)
# printf "%d\n" 0x2134
8500
# lssrc -s vio_daemon
Subsystem Group PID Status
vio_daemon 2032030 active
# procstack 2032030|grep tid|grep 8500
---------- tid# 62521639 (pthread ID: 8500) ----------
# procstack 2032030|more
...
---------- tid# 62521639 (pthread ID: 8500) ----------
0x090000000016590c __fd_select(??, ??, ??, ??, ??) + 0xcc
0x090000000128b7a0 caa_aha_monitor(??) + 0x760
0x0900000000b40674 vio_caa_aha_monitor(??, ??) + 0x54
0x000000010006e8fc viod_cluster_caa_thread(??) + 0x21c
0x090000000059ffe8 _pthread_body(??) + 0xe8
---------- tid# 64029023 (pthread ID: 8243) ----------
/62521639
#
Example 2-97 Health library entries around the CAA NODE_DOWN event occurrence moment
# pwd
/home/ios/logs/health
# alog -f viohs.log -o > viohs.log.20jan.caaevttst.txt
# more viohs.log.20jan.caaevttst.txt
...
[END 19661164 79298955 01/20/19-08:38:55.639 ha_util.c 1.43 253] exited with rc=0
[START 2032030 62521639 01/20/19-09:08:04.226 ha_util.c 1.43 230]
[3 2032030 62521639 01/20/19-09:08:04.226 hs_util.c hs_libvio_traces 1.66 60]
HEALTH:62521639 -- violibExchange.c initTransaction 1.27 646 Got
cluster info from vioGetCluste
rInfoFromODM! cluster name='KSYS_rbRMHA_1' id=43b050741b4111e9803e001018affe76
[3 2032030 62521639 01/20/19-09:08:04.226 hs_util.c hs_libvio_traces 1.66 60]
HEALTH:62521639 -- violibDB.c _allocHandleDB 1.132.12.3 589
HDBC = 1100987d0
[3 2032030 62521639 01/20/19-09:08:04.280 hs_util.c hs_libvio_traces 1.66 60]
HEALTH:62521639 -- violibDB.c _allocHandleDB 1.132.12.3 589
HDBC = 1105341b0
viohs.log.20jan.caaevttst.txt: END
#
The entries in the database log around the same moment, 09:08:04, match perfectly in terms
of content and time stamps with the previous entries that were logged by the vio_daemon in its
own log and in the health library log (Example 2-98).
Example 2-98 Database log around the CAA NODE_DOWN event occurrence moment
# ifconfig en0|grep inet
inet 10.40.25.243 netmask 0xfffffe00 broadcast 10.40.25.255
# pwd
/home/ios/logs/pg_sspdb
# more pg_sspdb-20-08-27.log
...
2019-01-20 09:07:58.675 CST|5c448ece.bb01f0|10.40.25.243(49883)LOG: disconnection: session
time: 0:00:00.030 user=viosadmin database=vios host=10.40.25.243 port=49883
2019-01-20 09:08:04.206 CST|5c448ed4.18d01ce|10.40.25.243(49884)LOG: connection received:
host=10.40.25.243 port=49884
2019-01-20 09:08:04.210 CST|5c448ed4.18d01ce|10.40.25.243(49884)LOG: connection authorized:
user=viosadmin database=vios
2019-01-20 09:08:04.216 CST|5c448ed4.18d01ce|10.40.25.243(49884)LOG: statement: SET DateStyle =
'ISO';SET extra_float_digits = 2;show transaction_isolation
2019-01-20 09:08:04.217 CST|5c448ed4.18d01ce|10.40.25.243(49884)LOG: statement: select oid,
typbasetype from pg_type where typname = 'lo'
2019-01-20 09:08:04.218 CST|5c448ed4.18d01ce|10.40.25.243(49884)LOG: statement: set
client_encoding to 'UTF8'
2019-01-20 09:08:04.219 CST|5c448ed4.18d01ce|10.40.25.243(49884)LOG: statement: BEGIN;SET
search_path TO vios
2019-01-20 09:08:04.220 CST|5c448ed4.18d01ce|10.40.25.243(49884)LOG: statement: COMMIT
2019-01-20 09:08:04.222 CST|5c448ed4.18d01ce|10.40.25.243(49884)LOG: statement: BEGIN;SELECT
COUNT(schema_name) FROM information_schema.schemata WHERE schema_name='vioshs'
2019-01-20 09:08:04.231 CST|5c448ed4.18d01ce|10.40.25.243(49884)LOG: disconnection: session
time: 0:00:00.024 user=viosadmin database=vios host=10.40.25.243 port=49884
END_EVPROD_INFO
END_EVENT_INFO
', 64, NOW())
2019-01-20 09:08:04.312 CST|5c448ed4.13c0198|10.40.25.243(49886)LOG: statement: COMMIT
2019-01-20 09:08:04.313 CST|5c448ed4.13c0198|10.40.25.243(49886)LOG: disconnection: session
time: 0:00:00.023 user=viosadmin database=vios host=10.40.25.243 port=49886
2019-01-20 09:08:18.743 CST|5c448ee2.19b0118|10.40.25.242(34310)LOG: connection received:
host=10.40.25.242 port=34310
pg_sspdb-20-08-27.log (99%)
#
We see in Example 2-98 on page 134 that three successive database sessions opened from
the local IP 10.40.25.243 on ports 49884, 49885, and 49886. The session starting entry
(containing the connection received string) for each of them is marked in bold for easier
orientation in this long excerpt. The %c escape in the log_line_prefix parameter prints a
quasi-unique session identifier that consists of two 4-byte hexadecimal numbers (without
leading zeros) that are separated by a dot. The numbers are the process start time and the
process ID (PID), so %c can also be used for easier orientation.
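As a quick worked example, the two halves of the 5c448ed4.18d01ce identifier can be
decoded with the same printf technique as in Example 2-96. The first number decodes to the
backend start time in epoch seconds, which matches the TIME_tvsec=1547996884 value of the
CAA event that is shown in Example 2-103, and the second number decodes to the backend
PID (the perl decoding sketch assumes a CST time zone, matching the database log):
# printf "%d\n" 0x5c448ed4
1547996884
# perl -le 'print scalar localtime 1547996884'
Sun Jan 20 09:08:04 2019
# printf "%d\n" 0x18d01ce
26018254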
Comparing the time stamps of the actions that are performed on the database in the first
session (5c448ed4.18d01ce on port 49884) with the time stamps and actions that are written
by the vio_daemon thread 0x2134 in viod.log from Example 2-96 on page 133, we see that
both log the same query, SELECT COUNT(schema_name) FROM information_schema.schemata
WHERE schema_name='vioshs', performed at the same moment, 09:08:04.222.
We now examine the actions of the second database session, 5c448ed4.18d01d0, initiated
from the same DBN node, IP address 10.40.25.243 and port 49885. The same vio_daemon
thread logs an _allocHandleDB entry at 09:08:04.226 in the health library (Example 2-97 on
page 134) and then we see how this second database session starts at 09:08:04.237 and
performs an update in the vioshs.vios table for the row that is identified by the WHERE
nodeid_1=-3230213171645967895 AND nodeid_2=-9197194794887020938 clause.
The nodeid_1 and nodeid_2 values in the WHERE clause are derived from the 128-bit CAA
cluster node UUID that is delivered within the CAA event message itself. It shows up first in
Example 2-95 on page 131, inside the syslog.caa excerpt, as
NODE_ID=0xD32BFA3C1B4111E9805D001018AFFE76. The 64 most significant bits and the 64 least
significant bits of the NODE_ID convert to the nodeid_1 and nodeid_2 field values of type
bigint in the vios table (Example 2-99).
Example 2-99 Database nodeid_1, nodeid_2, SSP cluster node_id, and CAA cluster node UUID
vios=# select viosuuid, nodeid_1, nodeid_2 from vios;
viosuuid | nodeid_1 | nodeid_2
--------------------------------------+----------------------+----------------------
166c89ef-57d8-4e75-815c-aa5b885560b1 | 4869651976704561641 | -9205920519165051274
4e0f6f60-214b-4d2c-a436-0eac46f7f71f | 9220310481544221161 | -9201135444560970122
5ac3e0dc-13fc-4221-9617-8ef61c4cdd83 | -6297861994804342295 | -9199165119723995530
315ce56b-9ba0-46a1-b4bc-da6108574e7e | -3230213171645967895 | -9197194794887020938
(4 rows)
vios=# select to_hex(-3230213171645967895);
      to_hex
------------------
 d32bfa3c1b4111e9
(1 row)
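The reverse check can be sketched in the same psql session: PostgreSQL casts a bit string
to bigint by reinterpreting it as a signed two's complement value, so both halves of the
NODE_ID convert directly to the expected vios table values:
vios=# SELECT x'D32BFA3C1B4111E9'::bigint AS nodeid_1,
vios-#        x'805D001018AFFE76'::bigint AS nodeid_2;
       nodeid_1       |       nodeid_2
----------------------+----------------------
 -3230213171645967895 | -9197194794887020938
(1 row)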
The reason=64 value in the UPDATE command appears to be taken from the CAA event
message, where it shows up as Reason 0x40 (64 in decimal). The state=2 value and all the
other values appear to be hardcoded for this type of event.
Similarly, the third session (5c448ed4.13c0198) starts at 09:08:04.290, just after the
vio_daemon thread logs the _allocHandleDB entry at 09:08:04.280 in the health library
(Example 2-97 on page 134). The code inside this third session extracts the viosuuid and
msid values from the vios table row that is identified by the same WHERE clause as in the
previous session. Then, the code inserts a row with appropriate field values in the trans table.
The vmuuid is not relevant here (it is a constant 'none' string value), the viosuuid and msid
values are the ones that were extracted previously from the vios table, and the tag appears
to be generated by code. Both opcode and state appear to be hardcoded, data and integer are
taken from the event message itself, and txstarted is computed by the NOW() function.
Example 2-100 shows the resulting vios and trans tables in the database.
Example 2-101 Manual VIO_HS_QUICK_QUERY request after the CAA event occurrence
# cat qq.xml
<VIO xmlns="http://ausgsa.austin.ibm.com/projects/v/vios/schema/vioHADR2.00"
version="2.00" author="LIBKREST" title="Req Quick Query">
<Request action_str="VIO_HS_QUICK_QUERY"/>
</VIO>
# /usr/ios/sbin/vioservice lib/libviopass/passthru <qq.xml
<VIO><Response>
<quickQuery>
<viosStatList><viosStatus machine_type="8286" model="42A" serial="21E0B2V">
<viosStat uuid="5ac3e0dc-13fc-4221-9617-8ef61c4cdd83" state="UP" hmResponsive="0"
hmResponseSec='206498' hasData='0' reason='0x00000000' syncMsg="0" />
<viosStat uuid="315ce56b-9ba0-46a1-b4bc-da6108574e7e" state="DOWN"
hmResponsive="0" hmResponseSec='206498' hasData='1' reason='0x00000040'
syncMsg="0" />
</viosStatus></viosStatList>
</quickQuery>
<quickQuery>
<viosStatList><viosStatus machine_type="8286" model="42A" serial="2100DEW">
<viosStat uuid="166c89ef-57d8-4e75-815c-aa5b885560b1" state="UP" hmResponsive="0"
hmResponseSec='206499' hasData='0' reason='0x00000000' syncMsg="0" />
<viosStat uuid="4e0f6f60-214b-4d2c-a436-0eac46f7f71f" state="UP" hmResponsive="0"
hmResponseSec='206498' hasData='0' reason='0x00000000' syncMsg="0" />
</viosStatus></viosStatList>
</quickQuery>
</Response></VIO>
#
Let us compare the response in Example 2-101 with a typical VIO_HS_QUICK_QUERY response
of a normal steady state where all VIOSs and their HMs are in the OK state. As a reference,
we use the VIO_HS_QUICK_QUERY response in Example 2-210 on page 235, where we also detail
the meaning of each viosStat element attribute of a VIOS. The normal expected values of the
attributes are state="UP" hmResponsive="1" hmResponseSec='1' hasData='0'
reason='0x00000000'. Here we have a series of abnormal values, state="DOWN"
hmResponsive="0" hmResponseSec='206498' hasData='1' reason='0x00000040', for the
VIOS that we stopped. Each value is as expected. The state, hasData, and reason attributes
provide CAA status data: the node state is reported as DOWN, and there are extra details in the
HSDB for this NODE_DOWN CAA event with reason code 0x40. Also, because we stopped the HM
daemons on all four VIOSs, all of them are reported as unresponsive (hmResponsive="0"),
with the last heartbeat received from them 206498 seconds before.
The vioHADR2.00.xsd file mentions how KSYS retrieves this special CAA event data from
HSDB by using the VIO_HS_QUERY_NODE_DATA query. Related data about the node and
repository disk CAA events is sent to KSYS with a tag value. Then, KSYS must acknowledge
the tag back to the VIOS side, which results in clearing that data from the HSDB. The
question is when and how KSYS decides it is the correct moment to use the
VIO_HS_QUERY_NODE_DATA request. We look for this course of action in the KSYS and VIOS
logs after we restart the KSYS daemon, VIOS node, and all HMs that we stopped at the
beginning of this exercise. We also check the database for the expected data clearing.
Example 2-103 shows the trans table shortly after the VIOS node restart.
Example 2-103 HSDB trans table shortly after the VIOS restart
vios=# select data, integer, txstarted from trans order by txstarted;
data | integer | txstarted
-----------------------------------------------+---------+------------------------
BEGIN_EVENT_INFO +| 64 | 2019-01-20 09:08:04.301
TIME_tvsec=1547996884 +| |
TIME_tvnsec=200198833 +| |
SEQUENCE_NUM=4 +| |
We notice that two extra CAA events were inserted into the trans table, one ADAPTER_UP event
type and one NODE_UP event type, for the VIOS node that we restarted. The occurrence of these
events is expected, but we also expect a VIO_HS_QUERY_NODE_DATA request to appear,
followed by VIO_HS_ACKNOWLEDGE, which clears the database. The correct places to look for
their direct traces are the krestlong log on the KSYS side and viosvc.log on the VIOS side.
Checking the krest log file around the same moment, 22:04:35.57, we see the sequence of a
few requests that are submitted by KSYS before and after the kriSubmitQueryNodeData
specific call, which corresponds to the VIO_HS_QUERY_NODE_DATA request (Example 2-105).
Example 2-105 Submitted requests in krest log around the VIO_HS_QUERY_NODE_DATA moment
# grep kriSubmit 24jan_krest.log.nodeupcaaevttst.txt|more
...
[04] 01/23/19 T(9192) _VMR 22:03:52.483187 DEBUG libkrest.c[9398]:
kriSubmitQuickQuery:hmc->(9.3.207.78),vios->(166C89EF-57D8-4E75-815C-AA5B885560B1)
[04] 01/23/19 T(9192) _VMR 22:03:52.602859 DEBUG libkrest.c[9514]:
kriSubmitQuickQuery returning 0 with jobid->(1547106777613).
[04] 01/23/19 T(9192) _VMR 22:04:12.705990 DEBUG libkrest.c[9398]:
kriSubmitQuickQuery:hmc->(9.3.18.159),vios->(5AC3E0DC-13FC-4221-9617-8EF61C4CDD83)
[04] 01/23/19 T(9192) _VMR 22:04:12.819585 DEBUG libkrest.c[9514]:
kriSubmitQuickQuery returning 0 with jobid->(1547106660320).
[04] 01/23/19 T(9192) _VMR 22:04:34.055212 DEBUG libkrest.c[6738]:
kriSubmitNeedAttn:hmc->(9.3.18.159),vios->(5AC3E0DC-13FC-4221-9617-8EF61C4CDD83)
[04] 01/23/19 T(9192) _VMR 22:04:34.188767 DEBUG libkrest.c[6854]:
kriSubmitNeedAttn returning 0 with jobid->(1547106660325).
[04] 01/23/19 T(9192) _VMR 22:04:35.374900 DEBUG libkrest.c[9541]:
kriSubmitAck:hmc->(9.3.18.159),vios->(5AC3E0DC-13FC-4221-9617-8EF61C4CDD83)
[04] 01/23/19 T(9192) _VMR 22:04:35.483532 DEBUG libkrest.c[9664]: kriSubmitAck
returning 0 with jobid->(1547106660326).
Up the stack at the ksys log level, we see in Example 2-106 the same sequence of actions in
the time interval around 22:04:35.57: getNeedAttn, getAcknowledge, getCAAevent, and
getAcknowledge. The condition for this sequence to happen appears to be that all the HMs
got responses back a bit earlier, as suggested by the VIOS_HM_RESP_DETECTED events that are
logged before the getNeedAttn call as an effect of processing the preceding getQuickQuery.
Example 2-106 Sequencing of events and actions at the ksys log level
# more 24jan_ksys.log.nodeupcaaevttst.txt
...
[10] 01/23/19 T(9192) _VMR 22:03:52.694790 DEBUG VMR_HMC.C[6938]: getQuickQuery
[166C89EF-57D8-4E75-815C-AA5B885560B1] JobStatus: COMPLETED_OK, ReturnCode: 0
[10] 01/23/19 T(9192) _VMR 22:04:14.004125 DEBUG VMR_HMC.C[6938]: getQuickQuery
[5AC3E0DC-13FC-4221-9617-8EF61C4CDD83] JobStatus: COMPLETED_OK, ReturnCode: 0
[10] 01/23/19 T(9192) _VMR 22:04:14.019399 DEBUG VMR_SITE.C[6261]: INFO:
eventNotify entering. event:VIOS_HM_RESP_DETECTED, event type:4, comp:VIOS,
notificationLevel:low, dupEventProcess:yes
[10] 01/23/19 T(9192) _VMR 22:04:14.032162 DEBUG VMR_SITE.C[6261]: INFO:
eventNotify entering. event:VIOS_HM_RESP_DETECTED, event type:4, comp:VIOS,
notificationLevel:low, dupEventProcess:yes
[10] 01/23/19 T(9192) _VMR 22:04:14.044447 DEBUG VMR_SITE.C[6261]: INFO:
eventNotify entering. event:VIOS_HM_RESP_DETECTED, event type:4, comp:VIOS,
notificationLevel:low, dupEventProcess:yes
[10] 01/23/19 T(9192) _VMR 22:04:35.364432 DEBUG VMR_HMC.C[6827]: getNeedAttn
[5AC3E0DC-13FC-4221-9617-8EF61C4CDD83] JobStatus: COMPLETED_OK, ReturnCode: 0
[10] 01/23/19 T(9192) _VMR 22:04:35.374873 DEBUG VMR_retry.C[1151]: Doing
operation with opCode: 34(VMDR_ACK)
[10] 01/23/19 T(9192) _VMR 22:04:35.374891 DEBUG VMR_retry.C[178]: INFO: Trying
with HMC: rthmc3.
[10] 01/23/19 T(9192) _VMR 22:04:35.570806 DEBUG VMR_HMC.C[7050]: getAcknowledge
[5AC3E0DC-13FC-4221-9617-8EF61C4CDD83] JobStatus: COMPLETED_OK, ReturnCode: 0
Example 2-106 on page 142 shows how internal events are conveyed toward the event
notification subsystem of the KSYS itself. We see this mechanism acting in two distinct
instances:
The first instance occurs after the getQuickQuery processing, when the
VIOS_HM_RESP_DETECTED events, corresponding to the manual start of the HM daemons that
we performed, are passed by the eventNotify call to the KSYS event notification subsystem.
In the second instance, three other events, VIOS_NODE_FAILURE,
NETWORK_INTERFACE_ACTIVE, and VIOS_NODE_ACTIVE, corresponding to the CAA events that
were generated by our manual VIOS node shutdown and restart, are passed to the KSYS
event notification subsystem by an eventNotify call, this time after a getCAAevent call.
Coming back to the VIOS level, we first correlate the HM daemon start moment with the
immediately preceding and subsequent VIO_HS_QUICK_QUERY request entries in the
vioservice logs on the involved VIOS (Example 2-107).
Example 2-107 HM restart moment versus the preceding and subsequent VIO_HS_QUICK_QUERY
requests
# lsattr -El vios0|grep uuid
vios_uuid 5ac3e0dc-13fc-4221-9617-8ef61c4cdd83 VIOS Unique Identifier
False
# more /home/ios/logs/health/host_monitor.log.24jan.nodeupcaaevttst.txt
...
[2 13894106 64225577 01/20/19-09:12:28.187 main.C:main(int, char **)
: 190] HostMonitor has been shutdown.
We notice in Example 2-107 on page 143 that the HMs started on all nodes except the VIOS
node that we shut down, that is, three HMs. In the reply that was returned before the HM
restart, they were reported as unresponsive since the same moment (hmResponsive="0"
hmResponseSec='301561'), and in the reply immediately after, as responsive
(hmResponsive="1" hmResponseSec='0'). We stopped and started the HMs simultaneously by
running the clcmd stopsrc -s ksys_hsmon (stop) and clcmd startsrc -s ksys_hsmon -a
"-l2" (start) commands, so the observed behavior is consistent.
In the last VIO_HS_QUICK_QUERY of Example 2-107 on page 143, for the first time since the
KSYS restart, the health library does not log any errors after retrieving the VIOS status
data from the HSDB. Example 2-108 shows this behavior for the instances of the vioservice
command, 22610190 26214731 and 22610194 58130901, which ran before and after the
HM restart.
The next VIO_HS_NEEDS_ATTENTION request is replied to by the health library with a tag
attribute value and is followed by an acknowledgment request, VIO_HS_ACKNOWLEDGE, from the
KSYS side with the same tag value, hsTag="-6630805255682309212" (Example 2-109). Then,
the sequence continues, as shown in the KSYS side logs, by the acknowledged
VIO_HS_QUERY_NODE_DATA request, with the hsTag="378617663756949861" (Example 2-110 on
page 147).
We reached the moment when the details about the CAA events are retrieved from the HSDB
and sent up the stack toward KSYS by the VIO_HS_QUERY_NODE_DATA request. The request is
acknowledged (Example 2-110).
Following the VIO_HS_QUERY_NODE_DATA acknowledgment, the vios and trans tables look
“clean” like they did before this exercise. The last exercise output in Example 2-111 is
identical to the output in Example 2-93 on page 131.
Example 2-111 The vios and trans tables after VIO_HS_QUERY_NODE_DATA acknowledgment
vios=# select viosuuid, state, reason from vios;
viosuuid | state | reason
--------------------------------------+-------+--------
5ac3e0dc-13fc-4221-9617-8ef61c4cdd83 | 1 | 0
4e0f6f60-214b-4d2c-a436-0eac46f7f71f | 1 | 0
166c89ef-57d8-4e75-815c-aa5b885560b1 | 1 | 0
315ce56b-9ba0-46a1-b4bc-da6108574e7e | 1 | 0
(4 rows)
vios=# select data, integer, txstarted from trans order by txstarted;
data | integer | txstarted
------+---------+-----------
(0 rows)
vios=#
This whole CAA event monitoring exercise presented the case where the event happens
on a non-DBN node and the HSDB remains operational. Things become even more
complicated if the event affects the DBN node and the running SSP database crashes.
Throughout this section, we used various CAA, SSP, and PostgreSQL concepts and
commands. For extensive coverage of these topics, see the following resources:
AIX Event Infrastructure for AIX and AIX clusters-AHAFS
Cluster management
IBM PowerVM Enhancements What is New in 2013, SG24-8198
VIOS Shared Storage Pool 2.2.5 Enhancements
VIOS Shared Storage Pool phase 6 - New Features
PostgreSQL 10
The -R action attribute value automatically restarts the ksys_hsmon subsystem if it stops
abnormally. The ksys_hsmon subsystem is activated when the first discovery for the involved
host group occurs. Example 2-113 shows some related excerpts from KSYS trace files that
were collected shortly after discovery.
Example 2-113 Host monitor in the host group discovery trace files
# ksysmgr discover hg rbHG
# ksysmgr trace log=ksys > ksys.log.HGdiscov; ksysmgr trace log=krest >
krest.log.HGdiscov; ksysmgr trace log=krestlong > krestlong.log.HGdiscov
# ls
krest.log.HGdiscov krestlong.log.HGdiscov ksys.log.HGdiscov
# grep hsmon *
krestlong.log.HGdiscov: <ParameterValue kb="CUR" kxe="false">0513-059 The
ksys_hsmon Subsystem has been started. Subsystem PID is 22086114.
krestlong.log.HGdiscov:[04] 04/26/19 T(91b5) _VMR 18:32:33.015960 DEBUG
libkrest.c[2558]: job_result->result : 0513-059 The ksys_hsmon Subsystem has been
started. Subsystem PID is 22086114.
krestlong.log.HGdiscov: <ParameterValue kb="CUR" kxe="false">0513-059 The
ksys_hsmon Subsystem has been started. Subsystem PID is 16122210.
krestlong.log.HGdiscov:[04] 04/26/19 T(91b5) _VMR 18:32:38.325017 DEBUG
libkrest.c[2558]: job_result->result : 0513-059 The ksys_hsmon Subsystem has been
started. Subsystem PID is 16122210.
ksys.log.HGdiscov:[02] 04/26/19 T(91b5) _VMR 18:32:33.016320DEBUG VMR_HMC.C[7355]:
jresultP->result: 0513-059 The ksys_hsmon Subsystem has been started. Subsystem
PID is 22086114.
ksys.log.HGdiscov:[02] 04/26/19 T(91b5) _VMR 18:32:38.325368DEBUG VMR_HMC.C[7355]:
jresultP->result: 0513-059 The ksys_hsmon Subsystem has been started. Subsystem
PID is 16122210.
# grep -p 18:32:33.015960 krestlong.log.HGdiscov
[04] 04/26/19 T(91b5) _VMR 18:32:33.015958 DEBUG libkrest.c[2476]: <StdOut>
[04] 04/26/19 T(91b5) _VMR 18:32:33.015960 DEBUG libkrest.c[2558]:
job_result->result : 0513-059 The ksys_hsmon Subsystem has been started. Subsystem
PID is 22086114.
<?xml version="1.0"?>
The host group that is used for this analysis consists of two frames, each of which has two
VIOS LPARs. The trace files contain entries about the activation of only two ksys_hsmon
subsystems even though there are four VIOSs. In the filtered krestlong trace file, the output
of the last grep command (Example 2-113 on page 150) shows the activation as part of a
VIOS ADD_MS pass-through API request. By checking the logged PIDs at the VIOS level, we
see that the selected VIOSs that are used by KSYS for the ADD_MS requests are
usaxvib053ccpx1 and usaxvib063ccpx1 (one is the designated VIOS for each host)
(Example 2-114).
Then, we filter the trace of this thread instance in the health library log, which contains the
expected SRC subsystem activation command. Example 2-116 shows this trace with the
vioHsAddMs health library API call.
On the partner VIOS (usaxvia053ccpx1) of the same host as usaxvib053ccpx1, we obtain a
different health library sequence of records. As shown in Example 2-117, this time the action
is performed by a vio_daemon thread.
Example 2-117 The ksys_hsmon call started by vio_daemon on the partner VIOS
# uname -uL
IBM,0221282FW 2 usaxvia053ccpx1
# alog -f /home/ios/logs/health/viohs.log -o > viohs.log.postHGdiscovery
# grep ksys_hsmon viohs.log.postHGdiscovery
[3 8323398 20971909 04/26/19-18:32:28.124 hs_ms.c startHsmon 1.83 318]
cmd=/usr/bin/startsrc -s ksys_hsmon -a '-l3', rc=0, errno=0
[3 8323398 20971909 04/26/19-18:32:28.124 hs_ms.c startHsmon 1.83 374] ksys_hsmon
daemon started.
# grep "8323398 20971909" viohs.log.postHGdiscovery
[START 8323398 20971909 04/26/19-18:32:27.909 ha_util.c 1.43 230]
[3 8323398 20971909 04/26/19-18:32:27.910 hs_util.c hs_libvio_traces 1.66 60]
HEALTH:20971909 -- violibExchange.c initTransaction 1.27 646 Got
Looking at the vio_daemon log around the 18:32:27.909 - 18:32:28.130 interval, we see
specific messages that are logged by the thread with tid# 20971909 (Example 2-118).
Example 2-118 ADD_HM message processing at the vSCSI kernel extension layer on the partner
VIOS
# cp /home/ios/logs/viod.log.bkp viod.log.postHGdiscovery.bkp
# more viod.log.postHGdiscovery.bkp
...
Apr 26 2019, 18:32:26.844 0x2124 viod_misc_handler.c
viod_start_cache_timer 1.3 72 TRC -- Setting up the miscellaneous
timers 11008abd0
Apr 26 2019, 18:32:27.909 0xf10 viod_ke.c viod_ke_recv_thread 1.81
1085 TRC -- VIOKE Recvd pkt
Apr 26 2019, 18:32:27.909 0xf10 viod_ke.c viod_ke_recv_thread 1.81
1111 TRC -- VIOKE Recvd pkt pyld 64
Apr 26 2019, 18:32:27.909 0xf10 viod_ke.c viod_ke_recv_thread 1.81
1153 TRC -- VIODKE enqueue msg 110089e90
Apr 26 2019, 18:32:27.909 0x1314 viod_ke.c viod_ke_prcs_thread
1.81 1270 TRC -- VIODKE condition is true
Apr 26 2019, 18:32:27.909 0x1314 viod_ke.c viod_ke_prcs_thread
1.81 1284 TRC -- VIODKE dequeue msg 0x110089e90
Apr 26 2019, 18:32:27.909 0x1314 viod_protocol.c viodDoReq
1.28 52 TRC -- viodDoReq: Enter
The user space pthread ID 4884 in decimal is 0x1314 in hexadecimal. So, we deduce that the
partner VIOS receives the ADD_HM message at the health library layer from the designated
VIOS on the same frame, through the virtual Small Computer System Interface (vSCSI) kernel
extension (VKE) cluster communication layer and the local vio_daemon. Matching the time
stamps, the received message is apparently processed inside the viod_dispatch_hs_msg call
of the 0x1314 vio_daemon thread. For more information about the VIOS VKE and its related
cluster communication protocol, see Communication protocol for virtual input/output server
(VIOS) cluster communication.
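The hexadecimal conversion can be checked with the same printf technique as in
Example 2-96:
# printf "%d\n" 0x1314
4884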
Now that we clarified the way that the HM daemon activates on each VIOS, let us examine its
thread structure after activation.
HM threads
Example 2-119 shows the threads of the /usr/sbin/ksys_hsmond monitor daemon (PID
24445232). Each one is identified by a kernel TID value and a user space pTID value. We are
interested in the pTID value because it shows up in various logs that we check for problem
determination and other purposes.
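Such a listing can be obtained with the proctree command, which prints the kernel TID and
user space pTID pairs of the daemon's threads (the same technique is used for the VM Agent
daemon in Example 2-129):
# proctree -t -p 24445232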
Table 2-4 describes each thread type that is observed. In the first column, there are some
abbreviations for further use.
Initial process thread N/A Initial thread of the process, pTID=1, running
main() function call.
UnixSocket UnixSocket Gets the commands from the VIOS side and does
the related processing.
The main job of the HM is to monitor the health of the VMs and to update their status in the
HSDB. The Processing thread of the HM receives heartbeat messages that originated at the
VM level, and updates the VM status in the HSDB. The VM heartbeat path is shown by the
red arrows with “HB” labels coming from VM, through the HM, to the SSP HSDB. Another
heartbeating mechanism is used to provide status updates about the HM itself, as shown by
the red arrow from the ProcessingDbHb thread to the SSP HSDB.
The HM also acts as a gateway between the VM and KSYS in the AR Messaging protocol.
The protocol supports the transfer of the application status and configuration updates from
VM to the KSYS side. The communication flow for this data transfer is represented by a green
line with arrows at its ends in Figure 2-11. The flow is a sequence of subflows that are listed
here in the order they occur:
Door Bell (dbell) Short notification message from the VMM to the KSYS that signals
that application changes happened.
Put Message Request & Response (PMSG_REQ, PMSG_RSP)
When DBELL is received at the KSYS side, a request message is sent
by KSYS to the VMM asking for details. VMM replies with a response
message containing the requested details.
Get Message (GMSG)
A confirmation is expected by VMM that the Put Message response
reached the KSYS. After the confirmation is received, the sent payload
information is discarded from the local AR cache on the VM side.
Clear DB A subflow of messages between the VMM on one side and HM and
HSDB on the other side that ensures that any entry that is related to
the pending AR Messaging cycle is cleared out from the HSDB. It
closes the whole cycle that is initiated by dbell.
For more information about the AR Messaging protocol, see 2.3.3, “Application Reporting
flow” on page 262.
On the VIOS side, the HM updates the heartbeat and messaging details that are received
from the VMs in the HSDB through the callback APIs that are described in “Health library” on
page 115. HM handles its own heartbeat updates in a similar fashion. KSYS retrieves these
updates from HSDB through periodic polling that is performed by the HMC and pass-through
REST APIs. Communication from KSYS toward a target VM goes through the HMC and
pass-through APIs to the VIOS RMC level. The VIOS hands commands from KSYS to the HM
by using the UNIX socket of the UnixSocket thread. Based on the type of the received
command, the UnixSocket thread cooperates with other HM threads, which perform related
processing and forward specific messages to the target VM by using HVNCP.
HM startup sequence
Example 2-122 shows the initial 10 records that are logged by the HM daemon instance with
PID 24445232 when it is activated on VIOS usaxvia053ccpx1.
Now that we have these syntax details, we can parse the log entries that are shown in
Example 2-122 on page 159. The main thread, with TID 70189439, generates most of the log
entries. After the main function call, the thread sets the log level to 3 and obtains a transaction
handle for HSDB access. The MAC address value 16DC662CB90E of the VEPA trunk adapter is
retrieved from the HSDB, where it was stored during the previous host group discovery stage.
This MAC address value is used to identify the trunk adapter logical device, which is ent22
Transaction handles are further obtained for the rest of HM daemon threads that are going to
access the HSDB, which are Receiver thread calls, Processing thread calls, HM
heartbeating, Error Handler, and messaging, as shown in Example 2-123.
Among the various initializations that are performed, note the update of the HM identifier in
row 27 and the related message: HostMonitor identifier updated to
'1556303548.719328000.38', a 'Hello' message will be broadcast every 30.00 seconds.
A unique identifier for each HM daemon instance is generated based on the VIOS system
clock. It changes at any subsequent HM restart so that the components around it become
aware of the new HM instance incarnation and act accordingly.
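As a quick check, the first field of this identifier decodes to the HM daemon start moment
that is logged by startHsmon in Example 2-117 (a sketch; perl performs the epoch
conversion, and the VIOS clock here runs on UTC):
# perl -le 'print scalar gmtime 1556303548'
Fri Apr 26 18:32:28 2019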
The last entry in Example 2-123 on page 160 is logged by the ProcessingMsg thread, TID
20578667, which announces that it was created.
A socket is further configured and bound to the trunk VEPA interface that was identified
earlier by the MAC address, as shown in Example 2-124.
The Processing thread (TID 67437005), which handles VM heartbeats, and the HvncpSender
thread (TID 15204793), which is dedicated to sending HVNCP packets over the trunk
interface, get ready to communicate with the VM Agents around them. The first Hello
broadcast message is sent at this stage, as logged in rows 44-49 of the excerpt in
Example 2-124 on page 161. Row 50 and the subsequent entries in the HM log file mostly
record repeated Hello broadcast messages once every 30 seconds, as shown in Example 2-125.
For the HM daemon startup sequence scenario, we did not install the VM Agent on any of the
managed VMs, which explains why the HM does not receive any messages from the VM side
in response to its repeated broadcasts. A complete HM to VMM communication establishment
flow, where an active agent on the VM side reacts to a received Hello message, is described
in “HM-VMM communication establishment” on page 215.
Example 2-125 on page 162 also contains traces of the ProcessingDbHb thread (TID
70123875), which inform us about the HM heartbeat that is sent to the HSDB. Checking for
their occurrences, we see that they show up every 30 seconds, as shown in Example 2-126.
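One way to verify this cadence is to filter the HM log by the TID of the ProcessingDbHb
thread and compare the time stamps of the resulting entries (a sketch, using the HM log
location that is shown in Example 2-107):
# grep 70123875 /home/ios/logs/health/host_monitor.log | tail -4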
Section “VM Agent activation” describes basic common aspects like VMM daemon
initialization, threads and other local components, and VMM to HM communication
establishment. It also details the activation of the VMM subsystem that is responsible for the
heartbeat with the HMs.
Section “Application Management Engine” on page 176 describes the specific aspects of the
Application Management Framework subsystem.
VM Agent activation
The initial installation of the VM Agent fileset creates the ksys_vmm SRC subsystem so that a
VMM daemon starts automatically at any subsequent AIX restart. Example 2-127 shows
some details about the VMM configuration as an SRC subsystem. We also mention the
location of the related binary, configuration, and log files that were the result of the fileset
installation.
To get insights about the daemon’s thread structure, we examine a simpler case where the
AIX VM has monitoring disabled at the KSYS level, as shown in Example 2-128.
Example 2-129 shows that the VMM daemon starts with six threads.
Example 2-129 Kernel threads of the ksys_vmm daemon: VM monitoring not enabled yet
# uname -uL
IBM,0221282FW 4 vmraix2
# proctree -t -p 14549402
590204 /usr/sbin/srcmstr
TID : 8257809 (pTID : 1)
14549402 /usr/sbin/ksys_vmmd
TID : 24641969 (pTID : 1)
TID : 21234107 (pTID : 1286)
TID : 28967291 (pTID : 1029)
TID : 26870267 (pTID : 772)
TID : 22872321 (pTID : 515)
TID : 15860149 (pTID : 258)
#
The symbolic names of the functions in the thread stacks provide valuable hints about the
purpose and function of each thread. We removed the most recent calls in each thread stack
to make the output shorter. Let us see how these threads are created at the daemon start.
The excerpts in Example 2-131 on page 167 show us some initial entries that are logged at
the daemon start in both the ksys_vmm.log and ksys_vmm.debug files. They are all logged by
the main thread (pTID 1). The first entry begins in both places with the distinctive START string
as the value of the leftmost field in the row, followed by the daemon PID value and the main
thread pTID value, and then followed by the daemon start time stamp.
The ksys_vmm.debug excerpt in Example 2-131 on page 167 shows that the first action that is
performed by the main thread is to check for the parameters in the
/var/ksys/config/HAMonitoring.xml configuration file. Then, the uptime and who -b system
commands ran, as logged in both the ksys_vmm.log and ksys_vmm.debug files.
Checking further in both log files, you can see an entry about a kernel extension and then how
the AppsMngt (pTID 258), AppReport (pTID 515), and VmNetIntfHandler (pTID 772) threads
are started in that order. Example 2-132 shows the related entries in ksys_vmm.log, which are
the bolded rows 11, 13, 14, and 24.
Example 2-133 VMM steady state: Listening for the Hello message
sed '31,$!d;=' ksys_vmm.log|sed 'N;s/\n/:/'|head -20
31:[2 14549402 1 05/19/19-02:16:43 VmMonitor.C:VmMonitor::VmMonitor() :
213] Critical VG was set for rootvg
32:
33:[2 14549402 1 05/19/19-02:16:43 VmMonitor.C:VmMonitor::VmMonitor() :
226] Kernel Extension vmmke loaded successfully
34:
35:[2 14549402 1 05/19/19-02:16:43 VmMonitor.C:VmMonitor::VmMonitor() :
276] VmMonitor thread successfully started.
36:[2 14549402 1 05/19/19-02:16:43 VmMonitor.C:VmMonitor::VmMonitor() :
286] List of network interfaces created
37:[2 14549402 1 05/19/19-02:16:43 VmUtils.C:vmutils::runSystemCommand(string) :
561] Command: lscfg -vl ent* > /var/ksys/log/cmdop_14549402_1 2>&1,
38: Command output:
39: ent0 U8408.E8E.21282FW-V4-C32-T1 Virtual I/O Ethernet Adapter (l-lan)
40:
41: Network Address.............FA858F9A8C20
42: Displayable Message.........Virtual I/O Ethernet Adapter (l-lan)
43: Hardware Location Code......U8408.E8E.21282FW-V4-C32-T1
44:
45:, returnCode: 0
46:[1 14549402 772 05/19/19-02:16:48 VmNetIntfHandler.C:VmNetIntfHandler::selectFD() :
133] Unable to find a valid interface to listen for 'Hello' messages.
47:[1 14549402 772 05/19/19-02:16:53 VmNetIntfHandler.C:VmNetIntfHandler::selectFD() :
133] Unable to find a valid interface to listen for 'Hello' messages.
48:[1 14549402 772 05/19/19-02:16:58 VmNetIntfHandler.C:VmNetIntfHandler::selectFD() :
133] Unable to find a valid interface to listen for 'Hello' messages.
49:[1 14549402 772 05/19/19-02:17:03 VmNetIntfHandler.C:VmNetIntfHandler::selectFD() :
133] Unable to find a valid interface to listen for 'Hello' messages.
50:[1 14549402 772 05/19/19-02:17:08 VmNetIntfHandler.C:VmNetIntfHandler::selectFD() :
133] Unable to find a valid interface to listen for 'Hello' messages.
#
Finally, we see in Example 2-133 how the VMM enters a steady state loop in which the
VmNetIntfHandler thread repeatedly looks, every 5 seconds, for interfaces on which to listen
for Hello messages.
Because VMM kernel extension-related records appeared twice already, we take a quick look
at its log, as shown in Example 2-134.
After the initialization steps, the kernel extension enters a steady state in which it monitors the
VMM daemon by using a local heartbeat mechanism and is ready to take appropriate actions
if the daemon fails unexpectedly.
Let us now look at the complete thread structure of a fully functional VM Agent case where
communication with the underlying HMs is established. In Example 2-135, we see that our
agent daemon, with PID 6029594 and running the /usr/sbin/ksys_vmmd binary, has more
threads than in Example 2-129 on page 165.
In the process stack snapshot that is taken in Example 2-136, we again removed the most
recent calls in each thread stack and even the whole stack for some threads to make the
output shorter and more readable.
You can easily identify in the output of Example 2-136 on page 171 the expected initial
thread of the process (pTID 1), which runs the main() function. Then, you see that a specific
class method, <class>::threadMain(), was started for each of the remaining threads. The
class names in the filtered list suggest the role of each thread, as shown in Example 2-137.
Comparing with the threads in Example 2-130 on page 166 where VM monitoring is not
enabled, we notice here two extra triplets of communication threads: HvncpSender,
HvncpReceiver, and VlanCommunication. These triplets are created when the communication
with the underlying HMs is established, with one triplet per HM counterpart, as described in
“HM-VMM communication establishment” on page 215.
We now organize the bulk of information that was acquired by examining the daemon
logs and thread structure. Table 2-5 describes each listed thread type.
Initial process thread N/A Initial thread of the process, pTID=1, running the
main() function call.
We also identified the kernel extension component and the HAMonitoring.xml configuration
file. To better understand the way these VMM daemon threads interact with each other and
with related components, we placed them all in the diagram in Figure 2-12.
Threads responsible for communication with the HMs are grouped to simplify the diagram. Let
us first summarize what we know at this moment about the setup of communication with the
HMs.
Two scenarios are possible for the adapter detection that happens during the
VmNetIntfHandler scan:
- KSYS discovery creates the adapters before the VM Agent is installed. Virtual Ethernet
adapters that are created at the LPAR container level are discovered by the first
subsequent operating system device scan. During the VM Agent fileset installation, the
VMM daemon starts and detects the adapters, which are either already discovered by the
operating system or first discovered by its own device scan.
- The VM Agent is installed first, and then KSYS discovery creates the adapters. VMM
scans repeatedly until the adapters are created. When host group discovery creates the
adapters at the LPAR container level, they show up at the OS level and are detected by
the first subsequent VM Agent scan.
In both scenarios, the Hello packets that are broadcast by an HM eventually reach an
interface on the VMM side. This situation is described in “HM-VMM communication
establishment” on page 215. We summarize it here.
If a Hello packet from the HM side arrives at an interface on the VMM side, then the newly
created VlanCommunication, HvncpSender, and HvncpReceiver threads take over the
communication through that interface, and a communication session is established with the
HM by a handshake protocol. VMM uses this session to send heartbeat messages to KSYS
and to exchange application management-related messages with the KSYS side.
The VMM heartbeats are sent once every 10 seconds and are coordinated by the VmMonitor
thread. The kernel extension stays in sync with the VmMonitor thread and takes over this
heartbeating task with the HM if the VMM daemon hangs or fails. We can therefore delimit
the VMM subsystems that are responsible for the heartbeats: the VmMonitor thread and the
kernel extension. The kernel extension does not cover the application management messaging.
For this communication, VMM and HM must use a protocol stack on top of the physical link
Ethernet layer that is similar to the TCP/IP stack. Because the communication volume
requirements are not that high and the media is reliable (hypervisor switch), a simplified
proprietary protocol stack that is called HVNCP is used. For more information about this
stack, see 2.2.9, “HVNCP protocol” on page 199.
In the upper left area of Figure 2-12 on page 173, we placed the ksysvmmgr command and the
HAMonitoring.xml configuration file. The ksysvmmgr command and VMM daemon
communicate directly, when needed, by UNIX sockets. On the VMM side, this communication
is the responsibility of the UnixSocket thread. Example 2-138 shows some samples for the
ksysvmmgr command that are used for VMM daemon management.
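Although Example 2-138 is not reproduced here, the following hedged sample recalls daemon
management commands of the kind it contains (all of them are used elsewhere in this chapter):

ksysvmmgr status        # show the ksys_vmm subsystem state
ksysvmmgr q             # display the VmMonitor configuration attributes
ksysvmmgr stop vmm      # stop the VMM daemon
ksysvmmgr start vmm     # start the VMM daemon again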
We also notice in Example 2-138 on page 174 that the VMM status change that is performed
at the VM level is propagated to KSYS.
Example 2-139 lists the last records of the VMM daemon (PID 6095308), which stopped its
threads before exiting.
We also notice how the last heartbeats (HB, 3 bytes) that were sent to the HMs before
entering the shutdown sequence are followed by the SHDN:DAEMON packets (11 bytes),
which eventually cause the VM status to become SUSPENDED.
In terms of configuration updates, the ksysvmmgr command applies the changes that are
entered by the user to the HAMonitoring.xml configuration file, and it can notify the
VMM daemon only through the UNIX socket interface (see the -s flag or the sync option of
ksysvmmgr). The VMM daemon reloads the XML configuration when it receives such a
notification.
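As a minimal sketch that combines commands shown in this chapter (the one-step -s form is
our assumption of how the flag is applied), the two notification styles look as follows:

ksysvmmgr mod app app1 monitor_failure_threshold=2      # update HAMonitoring.xml only
ksysvmmgr sync                                          # then notify the daemon to reload it
ksysvmmgr -s mod app app1 monitor_failure_threshold=2   # assumed one-step form with -s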
For more information about the ksysvmmgr command, see 3.7, “Setting up the VM Agent” on
page 309.
To conclude, we covered in this section the default functions that are obtained by a blind
installation of the VM Agent on a managed VM. When the VM is enabled for monitoring at the
KSYS level, heartbeat-based VM monitoring is activated after the HM to VMM communication
is established. We also introduced the thread structure and some other involved components.
The group of communication threads handles the low-level communication housekeeping,
which supports higher-level heartbeats and exchange of application-related messages with
the neighboring HMs. The ksysvmmgr command that was demonstrated earlier in
Example 2-138 on page 174 can also be used for application-level management.
We are now ready to approach the other core functions of the VM Agent, which are
application monitoring and management.
AME normally refers to the thread that is the core component handling application
management and monitoring.
During the initial VM Agent installation, the VMM daemon starts with some default parameter
values and with no application configured, as shown in Example 2-140.
Example 2-140 VMM and application status just after the agent installation
[vmraix2:root:/home/root:] ksysvmmgr q
VmMonitor
log=2
period=1
version=1.0
comment=2019-05-19 02:16:43
rebootappstarttype=VMM
VmUUID=1950678c-18bf-4c87-860c-f9e6ec24b513
[vmraix2:root:/home/root:] ksysvmmgr q app
[vmraix2:root:/home/root:] ksysvmmgr status
Subsystem Group PID Status
ksys_vmm 6881590 active
[vmraix2:root:/home/root:] tail -3 /var/ksys/log/ksys_vmm_ame.log
[START 6881590 1 07/22/19-14:26:17 ]
[1 6881590 258 07/22/19-14:26:17 AppsMngt.C:AppsMngt::doXML() : 1322]
No application to monitor into configuration (into "/var/ksys/config/HAMonitoring.xml" file).
This is not an error. You can add application into configuration using "ksysvmmgr add" command,
and synchronize configuration with "ksysvmmgr sync" command.
[2 6881590 258 07/22/19-14:26:17 AppsMngt.C:AppsMngt::doXML() : 1350]
No app at startup : AME thread going to sleep.
[vmraix2:root:/home/root:] procstack 6881590|sed '/(pthread ID: 258) ----------/,$!d'
---------- tid# 9568555 (pthread ID: 258) ----------
0xd057a534 _event_sleep(??, ??, ??, ??, ??, ??) + 0x4f4
0xd057b21c _event_wait(??, ??) + 0x35c
0xd058a8bc _cond_wait_local(??, ??, ??) + 0x19c
0xd058b1d4 _cond_wait(??, ??, ??) + 0x34
0xd058bc2c pthread_cond_wait(??, ??) + 0x1ac
0x1021b910 AppsMngt::doXML()(??) + 0xd0
0x10247424 AppsMngt::doMainLoop()(??) + 0x44
0x100ed1ec AppsMngt::threadMain()(??) + 0x12c
0x102d02d0 Runnable::runThread(void*)(??) + 0x30
0xd0565f64 _pthread_body(??) + 0xe4
[vmraix2:root:/home/root:]
In Example 2-140, the AME thread (pTID 258) goes to sleep in the doXML() call, which is
inside the doMainLoop() main call.
period
Specifies the time duration in seconds between two consecutive occurrences of the
checks that are performed by the Application Management Engine (AME). By default,
the value of this attribute is 1 second. The value of this attribute must be in the
range 0 - 6. For best monitoring performance, do not modify the default value.
...
Our purpose here is to come up with a simple application setup and obtain an AME thread log
that is detailed enough to capture relevant insights about the AME internal operation.
Because of the period parameter default value of 1, a pattern of records is expected to show
up repeatedly in the AME log every second for the AME checks. So, in Example 2-142, we opt
for a close monitor_period value for our test application to limit the length of the log
section that we must examine. Also, the log level is raised to 3 at this moment to capture
all possible details.
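A hedged sketch of such a setup follows. The add and sync subcommands are the ones quoted by
the AME log in Example 2-140; the script-path attribute names and the chosen values are
illustrative assumptions:

ksysvmmgr add app app1 monitor_script=/tmp/HA/app1_mon.sh \
    start_script=/tmp/HA/app1_start.sh stop_script=/tmp/HA/app1_stop.sh \
    monitor_period=5 monitor_timeout=3    # attribute names and values assumed
ksysvmmgr sync                            # push the new configuration to the daemon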
We ignore the start and stop scripts for the moment. This test monitoring script returns,
after a sleep of 1 second, either 1, if the /data/FAILAPP1MON file exists, or 0 otherwise.
In a normal steady state for the application, the script returns 0. If the application is in
a normal state and we want to simulate a failure, we can manually create the FAILAPP1MON file
by running the >/data/FAILAPP1MON command. To simulate a recovery, we remove the file by
running the rm /data/FAILAPP1MON command.
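The script body itself is not reproduced at this point, but a minimal sketch with the
described behavior might look like the following (the file content is an assumption based on
the description above):

#!/bin/ksh
# Hypothetical reconstruction of /tmp/HA/app1_mon.sh; the actual content may differ.
sleep 1                            # the described 1-second sleep
if [ -e /data/FAILAPP1MON ]; then
    exit 1                         # failure indicator file present: report an error
fi
exit 0                             # normal steady state: report success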
At any moment after it is started, the script can be found in one of four possible states:
- Still running, before its monitor_timeout interval expires
- Already returned OK
- Already returned with an error
- Not returned after a timeout
We can manually force this last timeout state by editing the script online and increasing the
sleep interval. For our monitor_timeout=3 case, a value of 7 for the sleep parameter enforces
the timeout state.
Example 2-145 shows how the application configuration is loaded from the .xml file at the
thread startup.
You can see that all records are logged in the same second and that we skipped most of the
parsing and processing records that the doXML() method logged while loading the XML
configuration file, up to the first occurrence of the doLaunch() method. Note the Populating
map of applications entry that is logged about our application, which has the
1567267349133928000 unique identifier.
With app1 application monitoring activated in this way, we repeatedly examined the AME
thread stack that is shown in Example 2-147. In all our captures, the thread showed up in the
doSleep() call under the doMainLoop() call, which suggests that the AME thread stays in this
sleep mode most of the time; the other method calls that show up as logging records in the
AME log remain on the stack for such short periods that we can hardly capture them.
In Example 2-148 through Example 2-154 on page 185, we extracted excerpts from the AME log,
keeping only the called method and code line details fields, for the first 7 seconds from the
start of our normal steady state scenario. AME uses its own counter for these rounds,
starting from a 0 value, and logs this counter value as a first action each time it enters
the doLaunch() method. The doLaunch() method also logs a high-precision decimal time stamp
at this moment. The round counter is initialized to zero at each VMM start.
In Example 2-148 on page 182, we see how the doLaunch() call does its core job, which is to
check the internal application status in memory (various maps, like the application map that
is mentioned in Example 2-145 on page 180) and start the appropriate script. It might be one
of the monitoring, stop, or start scripts, depending on the application status that is
retrieved from the memory structures. The expected timeout moment for the started script is
also computed and saved for later use by adding the monitor_timeout value (here 3) to the
call entering time: 3624865.469372535 = 3 + 3624862.469372535. The doNonBlockingWait()
method is called after the return from doLaunch(), here at 3624862.470544447, which means
that doLaunch() was short (around 0.00117191 seconds). The doNonBlockingWait() method checks
the started child processes to see whether they returned; if so, the application status is
updated. The last records print the status of the applications in the synthetic form of a
table with 11 columns and one row per monitored application.
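As a conceptual aid only, the per-round behavior that we deduced from the log can be sketched
as the following self-contained loop. The structure and names mimic doLaunch(),
doNonBlockingWait(), and doInspectForTimeout(), but they are our assumptions, not the product
code:

#!/bin/ksh
# Conceptual sketch of the AME monitoring cycle (assumed structure, not product code)
period=1               # AME period attribute, 1 second by default
monitor_timeout=3      # timeout value used in this scenario
while true; do
    /tmp/HA/app1_mon.sh &                # doLaunch(): fork the monitoring script
    pid=$!
    elapsed=0
    while kill -0 $pid 2>/dev/null; do   # doNonBlockingWait(): poll, do not block
        if [ $elapsed -ge $monitor_timeout ]; then
            kill $pid                    # doInspectForTimeout(): kill a script in timeout
            break
        fi
        sleep $period
        elapsed=$((elapsed + period))
    done
    wait $pid                            # reap the child process
    rc=$?                                # 0 maps to NORMAL, nonzero to ABNORM or TIMOUT
    sleep $period                        # move on to the next round
done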
In the next round, 1 second later, as shown in Example 2-149, doLaunch() and
doNonBlockingWait() are called but leave no relevant trace because there is no script to start
or a returned script to update afterward. The status remains unchanged.
In the next two rounds, as shown in Example 2-151 and Example 2-152 on page 185,
doLaunch() and doNonBlockingWait() return without performing any relevant actions.
In round 5, as shown in Example 2-153, doLaunch() discovers that it is time for the monitoring
script to start again and starts it. It also computes and saves the related timeout moment. The
doNonBlockingWait() method notices that the script is running, so we see that the RUN field
changed to 1. At this moment, we are in the same status as at the end of round 0, so we enter
a new similar cycle.
The last round, round 6 in Example 2-154, is identical to round 1 in terms of the doLaunch()
and doNonBlockingWait() actions and resulting status.
One more observation relates to the records about the AME notifying the AR: they show up at
the beginning of the round that immediately follows a round with configuration or status
changes.
Example 2-155 also lists the script files for a second application, app2, and the content of
its monitoring script. Notice the similarity of the monitoring scripts.
The output is logged by both monitoring scripts in the same file, as shown in Example 2-157.
We see that both monitoring scripts start at 08:25:34, with app1_mon.sh returning after
1 second and app2_mon.sh after 2 seconds. Then, 10 seconds later, app2_mon.sh is called
again and returns in 2 seconds, as expected. After another 10-second interval, everything
repeats, which is consistent with our monitor_period settings and sleep intervals that are
inside the scripts.
Example 2-158 shows the records of the first round for this new scenario, with two
applications that are configured and started simultaneously by a VMM start after a prior VMM
stop.
Example 2-158 First round: AME counter 0 for a scenario with two applications
[vmraix1:root:/home/root:] grep "08/31/19-08:25:34" /var/ksys/log/ksys_vmm_ame.log|sed
'/doLaunch/,$!d'|while read lglvl pid tid ts method rest; do a=${method##*:}; b=${a%%\(*}; echo
">
doLaunch() : AME counter is 0 (at 3596275.926736902). List of 2 app uuid managed is
"1567235918072992000 1567235971589285000".
doLaunch() : For app "1567235918072992000": this application monitoring was already active.
dependencyCheck() : Dependency list is empty
doLaunch() : For app "1567235918072992000": application script "/tmp/HA/app1_mon.sh" being
forked ! pidParent(7995782) forks pidChild(8454448).
doLaunch() : For app "1567235918072992000": computing time of timeout by adding
xmlAppTimeout(10) to appEntry.m_timeScriptTimeoutAt => 3596285.926736902.
doLaunch() : For app "1567235971589285000": this application monitoring was already active.
dependencyCheck() : Dependency list is empty
doLaunch() : For app "1567235971589285000": application script "/tmp/HA/app2_mon.sh" being
forked ! pidParent(7995782) forks pidChild(6095140).
doLaunch() : For app "1567235971589285000": computing time of timeout by adding xmlAppTimeout(5)
to appEntry.m_timeScriptTimeoutAt => 3596280.926736902.
doLaunch() : Synthesis: list of 2 processes forked during this loop is
"(uuid=1567235918072992000;monitor;/tmp/HA/app1_mon.sh)
(uuid=1567235971589285000;monitor;/tmp/HA/app2_mon.sh) ".
doNonBlockingWait() : AME waiting for 2 script(s) (at 3596275.946924373).
doNonBlockingWait() : +Synthesis of all (2) applications:
doNonBlockingWait() : | UUID STATE RESULT RUN MON DEL FAIL
STOP START RESTART DESCRIPTION
doNonBlockingWait() : | 1567235918072992000 1( UNSET) 99(NOTSET) 1 1 0 0/ 5 0/
3 0/ 3 0/3
doNonBlockingWait() : | 1567235971589285000 1( UNSET) 99(NOTSET) 1 1 0 0/ 5 0/
3 0/ 3 0/3
doNonBlockingWait() : +
[vmraix1:root:/home/root:]
Example 2-160 Enforcing a monitoring script failure for the app1 application
[vmraix1:root:/home/root:] ksysvmmgr mod app app1 monitor_failure_threshold=2 max_restart=1
[vmraix1:root:/home/root:] date;ksysvmmgr start vmm
Sun Sep 1 17:28:49 CUT 2019
0513-059 The ksys_vmm Subsystem has been started. Subsystem PID is 11469174.
[vmraix1:root:/home/root:] date; > /data/FAILAPP1MON
Sun Sep 1 17:28:57 CUT 2019
[vmraix1:root:/home/root:] ksysvmmgr stop vmm
0513-044 The ksys_vmm Subsystem was requested to stop.
[vmraix1:root:/home/root:]
Example 2-161 State changes during the monitoring script failure scenario for the app1 application
[vmraix1:root:/home/root:] while true; do dt=`date`; echo "$dt :\c"; ksysvmmgr -a status q app
app1|grep status; sleep 1; done
Sun Sep 1 17:28:49 CUT 2019 : status=NOT_MONITORED
Sun Sep 1 17:28:50 CUT 2019 : status=UNSET (YELLOW)
Sun Sep 1 17:28:51 CUT 2019 : status=UNSET (YELLOW)
Sun Sep 1 17:28:52 CUT 2019 : status=NORMAL (GREEN)
...
Sun Sep 1 17:29:00 CUT 2019 : status=NORMAL (GREEN)
Sun Sep 1 17:29:01 CUT 2019 : status=FAILING (YELLOW)
...
Sun Sep 1 17:29:06 CUT 2019 : status=FAILING (YELLOW)
Sun Sep 1 17:29:07 CUT 2019 : status=TO_STOP (YELLOW)
...
Sun Sep 1 17:29:11 CUT 2019 : status=TO_STOP (YELLOW)
Sun Sep 1 17:29:12 CUT 2019 : status=TO_START (YELLOW)
From the AME log, we extracted only the relevant rounds, as shown in Example 2-162
through Example 2-165 on page 191. In round 12, as shown in Example 2-162, the
monitoring script is found to have failed for the first time, so some status fields are
updated from normal as follows: application state (STATE) to 3(FAILING), last script return
code (RESULT) to 1(ABNORM), RUN to 0 because no script is running, and the current number of
failures counter (FAIL) to 1/2. Then, the next launch of the monitoring script, the second
one here, happens in round 16. Only the RUN field changes to 1 because we now have a script
running.
Example 2-163 The next relevant AME log rounds for the monitoring script failure scenario: 18 and 19
09/01/19-17:29:07
doLaunch() : AME counter is 18 (at 3715288.700335582). List of 1 app uuid managed is
"1567267349133928000".
doNonBlockingWait() : AME waiting for 1 script(s) (at 3715288.700378443).
doNonBlockingWait() : For app "1567267349133928000": rc is "1", result set to "ABNORM"
doNonBlockingWait() : For app "1567267349133928000": app state is "3(FAILING)", previous result
was "1(ABNORM)", current result is "1(ABNORM)".
doNonBlockingWait() : For app "1567267349133928000": monitor failures(2)/max(2). Needs to be
stopped.
doNonBlockingWait() : app '1567267349133928000' restartcount '0' appEntry.m_statsCollectionDone
: 'false'
doNonBlockingWait() : appEntry.m_lastStatsCollectedAt : '0' currentTime : '3715288'
doNonBlockingWait() : +Synthesis of all (1) applications:
doNonBlockingWait() : | UUID STATE RESULT RUN MON DEL FAIL
STOP START RESTART DESCRIPTION
doNonBlockingWait() : | 1567267349133928000 4(TO_STOP) 1(ABNORM) 0 1 0 2/ 2 0/
3 0/ 3 0/1
doNonBlockingWait() : +
09/01/19-17:29:08
doLaunch() : AME counter is 19 (at 3715289.708493896). List of 1 app uuid managed is
"1567267349133928000".
doLaunch() : For app "1567267349133928000": this application monitoring was already active.
dependencyCheck() : Dependency list is empty
doLaunch() : For app "1567267349133928000": application script "/tmp/HA/app1_stop.sh" being
forked ! pidParent(11469174) forks pidChild(12059062).
doLaunch() : Synthesis: list of 1 processes forked during this loop is
"(uuid=1567267349133928000;stop;/tmp/HA/app1_stop.sh) ".
doNonBlockingWait() : AME waiting for 1 script(s) (at 3715289.710508228).
doNonBlockingWait() : +Synthesis of all (1) applications:
doNonBlockingWait() : | UUID STATE RESULT RUN MON DEL FAIL
STOP START RESTART DESCRIPTION
doNonBlockingWait() : | 1567267349133928000 4(TO_STOP) 1(ABNORM) 1 1 0 2/ 2 0/
3 0/ 3 0/1
doNonBlockingWait() : +
In the next round (19), as shown in Example 2-163, the stop script is started and RUN is
changed to 1.
The next relevant round, 23, is shown in Example 2-164. It finds the stop script returned OK,
and then updates STATE to 6(TO_START) because we are in a restart stage, RESULT to
0(NORMAL), and RUN to 0. So the next round, 24, starts the start script and updates RUN to 1.
Example 2-164 The next two relevant AME log rounds for a monitoring script failure scenario: 23 and 24
09/01/19-17:29:12
doLaunch() : AME counter is 23 (at 3715293.704649728). List of 1 app uuid managed is
"1567267349133928000".
doLaunch() : For app "1567267349133928000": script running.
doNonBlockingWait() : AME waiting for 1 script(s) (at 3715293.704713107).
doNonBlockingWait() : For app "1567267349133928000": rc is "0", result set to "NORMAL"
The start script returned OK in round 28, as shown in Example 2-165. Because max_restart
(1 here) is reached, STATE goes to 9(FAILURE), RUN goes to 0, and RESTART to 1/1. This is the
final state for our failure scenario.
Example 2-165 The last relevant AME log round for the monitoring script failure scenario: 28
09/01/19-17:29:17
doLaunch() : AME counter is 28 (at 3715298.700496138). List of 1 app uuid managed is "1567267349133928000".
doLaunch() : For app "1567267349133928000": script running.
doNonBlockingWait() : AME waiting for 1 script(s) (at 3715298.700559031).
doNonBlockingWait() : For app "1567267349133928000": rc is "0", result set to "NORMAL"
doNonBlockingWait() : For app "1567267349133928000": app state is "6(TO_START)", previous result was
"0(NORMAL)", current result is "0(NORMAL)".
doNonBlockingWait() : For app "1567267349133928000": start ok
...
doNonBlockingWait() : Application : '1567267349133928000' status changed
doNonBlockingWait() : For app "1567267349133928000": cycles(1)/max(1). State is now FAILURE.
doNonBlockingWait() : app '1567267349133928000' restartcount '1' appEntry.m_statsCollectionDone : 'false'
doNonBlockingWait() : appEntry.m_lastStatsCollectedAt : '0' currentTime : '3715298'
doNonBlockingWait() : +Synthesis of all (1) applications:
doNonBlockingWait() : | UUID STATE RESULT RUN MON DEL FAIL STOP START
RESTART DESCRIPTION
doNonBlockingWait() : | 1567267349133928000 9(FAILURE) 0(NORMAL) 0 1 0 2/ 2 0/ 3 0/ 3
1/1
doNonBlockingWait() : +
Example 2-166 Enforcing the timeout for the monitoring script while in the normal steady state
[vmraix1:root:/home/root:] date;ksysvmmgr start vmm
Sun Sep 1 20:15:47 CUT 2019
0513-059 The ksys_vmm Subsystem has been started. Subsystem PID is 12648876.
[vmraix1:root:/home/root:] ksysvmmgr q app app1|grep status
status=NORMAL (GREEN)
[vmraix1:root:/home/root:] vi /tmp/HA/app1_mon.sh
[vmraix1:root:/home/root:] date
Sun Sep 1 20:16:07 CUT 2019
[vmraix1:root:/home/root:] grep sleep /tmp/HA/app1_mon.sh
sleep 7
[vmraix1:root:/home/root:]
In Example 2-167, we probed the application state by using ksysvmmgr on a 1-second basis
and kept only the captures before and after the state transitions.
Example 2-167 State transitions during the monitoring script timeout scenario for the app1 application
[vmraix1:root:/home/root:] while true; do dt=`date`; echo "$dt :\c"; ksysvmmgr -a
status q app app1|grep status; sleep 1; done
Sun Sep 1 20:15:47 CUT 2019 : status=NOT_MONITORED
Sun Sep 1 20:15:48 CUT 2019 : status=UNSET (YELLOW)
Sun Sep 1 20:15:49 CUT 2019 : status=UNSET (YELLOW)
Sun Sep 1 20:15:50 CUT 2019 : status=NORMAL (GREEN)
...
Sun Sep 1 20:16:10 CUT 2019 : status=NORMAL (GREEN)
Sun Sep 1 20:16:11 CUT 2019 : status=FAILING (YELLOW)
...
Sun Sep 1 20:16:16 CUT 2019 : status=FAILING (YELLOW)
Sun Sep 1 20:16:17 CUT 2019 : status=TO_STOP (YELLOW)
...
Sun Sep 1 20:16:21 CUT 2019 : status=TO_STOP (YELLOW)
Sun Sep 1 20:16:22 CUT 2019 : status=TO_START (YELLOW)
...
Sun Sep 1 20:16:27 CUT 2019 : status=TO_START (YELLOW)
Sun Sep 1 20:16:28 CUT 2019 : status=FAILURE (RED)
^C[vmraix1:root:/home/root:]
From the AME log, we extracted only the relevant rounds, as shown in Example 2-168,
Example 2-169 on page 193, and Example 2-170 on page 194. In round 20, as shown in
Example 2-168, the monitoring script is started after a prior normal return and the time of
timeout is computed and saved.
Example 2-168 The first two relevant AME log rounds for the monitoring script timeout scenario: 20 and 23
09/01/19-20:16:07
doLaunch() : AME counter is 20 (at 3725309.045299853). List of 1 app uuid managed is
"1567267349133928000".
doLaunch() : For app "1567267349133928000": this application monitoring was already active.
dependencyCheck() : Dependency list is empty
The next relevant round, which is shown in Example 2-168 on page 192, is 23, where a new
method, doInspectForTimeout(), shows up with two records mentioning that the script is
found in timeout and that its process is killed.
The next round, 24, shown in Example 2-169, finds the script killed and updates the
application status fields: STATE is changed to 3(FAILING), RESULT to 2(TIMOUT), RUN to 0, and
FAIL to 1/2.
Example 2-169 The next two relevant AME log rounds for the monitoring script timeout scenario: 24 and 25
09/01/19-20:16:11
doLaunch() : AME counter is 24 (at 3725313.044644154). List of 1 app uuid managed is
"1567267349133928000".
doNonBlockingWait() : AME waiting for 1 script(s) (at 3725313.044686074).
doNonBlockingWait() : For app "1567267349133928000": pid "11075936" killed, result set to
"TIMOUT"
doNonBlockingWait() : For app "1567267349133928000": app state is "2(NORMAL)", previous result
was "0(NORMAL)", current result is "2(TIMOUT)".
doNonBlockingWait() : For app "1567267349133928000": monitor failures(1)/max(2).
doNonBlockingWait() : app '1567267349133928000' restartcount '0' appEntry.m_statsCollectionDone
: 'false'
doNonBlockingWait() : appEntry.m_lastStatsCollectedAt : '0' currentTime : '3725313'
doNonBlockingWait() : +Synthesis of all (1) applications:
doNonBlockingWait() : | UUID STATE RESULT RUN MON DEL FAIL
STOP START RESTART DESCRIPTION
doNonBlockingWait() : | 1567267349133928000 3(FAILING) 2(TIMOUT) 0 1 0 1/ 2 0/
3 0/ 3 0/1
doNonBlockingWait() : +
The next round in Example 2-169 on page 193, 25, starts the monitoring script after this
first failure, computes its timeout time, and updates the status (RUN is set to 1).
Example 2-170 shows the next relevant rounds, 29, 30, and 31. In round 29, the monitoring
script that restarted earlier in round 25 is found again in timeout by doInspectForTimeout(),
so its process is killed again.
Example 2-170 The next three relevant AME log rounds for the monitoring script timeout scenario: 29, 30, and 31
09/01/19-20:16:16
doLaunch() : AME counter is 29 (at 3725318.043083808). List of 1 app uuid managed is
"1567267349133928000".
doNonBlockingWait() : AME waiting for 1 script(s) (at 3725318.043125326).
doNonBlockingWait() : +Synthesis of all (1) applications:
doNonBlockingWait() : | UUID STATE RESULT RUN MON DEL FAIL
STOP START RESTART DESCRIPTION
doNonBlockingWait() : | 1567267349133928000 3(FAILING) 2(TIMOUT) 1 1 0 1/ 2 0/
3 0/ 3 0/1
doNonBlockingWait() : +
doInspectForTimeout() : For app "1567267349133928000": timeout was set to 3725317.046365310.
doInspectForTimeout() : For app "1567267349133928000": script was in timeout, killing now script
process (pid "11731218").
09/01/19-20:16:17
doLaunch() : AME counter is 30 (at 3725319.044798625). List of 1 app uuid managed is
"1567267349133928000".
doLaunch() : For app "1567267349133928000": script was killed because of timeout.
doNonBlockingWait() : AME waiting for 1 script(s) (at 3725319.044863876).
doNonBlockingWait() : For app "1567267349133928000": pid "11731218" killed, result set to
"TIMOUT"
doNonBlockingWait() : For app "1567267349133928000": app state is "3(FAILING)", previous result
was "2(TIMOUT)", current result is "2(TIMOUT)".
doNonBlockingWait() : For app "1567267349133928000": monitor failures(2)/max(2). Needs to be
stopped.
doNonBlockingWait() : app '1567267349133928000' restartcount '0' appEntry.m_statsCollectionDone
: 'false'
doNonBlockingWait() : appEntry.m_lastStatsCollectedAt : '0' currentTime : '3725319'
doNonBlockingWait() : +Synthesis of all (1) applications:
In the next round of Example 2-170 on page 194, 30, the script is discovered as killed while
running, so it fails a second time. The monitor_failure_threshold is reached, so a first
restart must be initiated: STATE changes to 4(TO_STOP), RUN to 0, and FAIL to 2/2. The
subsequent round, 31, starts the stop script as the first restart step and sets RUN to 1. We
do not continue here with the subsequent rounds that evolve toward the final FAILURE state
because they are similar to the rounds following round 19 of the previous monitoring script
failure scenario.
Example 2-171 Enforcing a script not usable case for the monitoring script while in the normal steady
state
[vmraix1:root:/home/root:] date;ksysvmmgr start vmm
Mon Sep 2 03:22:29 CUT 2019
0513-059 The ksys_vmm Subsystem has been started. Subsystem PID is 8585668.
[vmraix1:root:/home/root:] ksysvmmgr q app app1|grep status; mv
/tmp/HA/app1_mon.sh /tmp/HA/app1_mon.shh; date
status=NORMAL (GREEN)
Mon Sep 2 03:22:39 CUT 2019
[vmraix1:root:/home/root:]
In Example 2-172, we probed the application state by using ksysvmmgr on a 1-second basis
and captured the sequence of state transitions similar to the way we did in previous
scenarios.
Example 2-172 State transitions during the monitoring script not usable scenario for the app1
application
[vmraix1:root:/home/root:] while true; do dt=`date`; echo "$dt :\c"; ksysvmmgr -a
status q app app1|grep status; sleep 1; done
Mon Sep 2 03:22:29 CUT 2019 : status=NOT_MONITORED
We check the AME log rounds around 03:22:39. Example 2-173 shows the first round,
9, logged at 03:22:39, which shows a normal application state and a prior successful return.
The next round in Example 2-173 on page 196, 10, shows how the script that must be started is
not found in a usable state and how the application state is set to 8(ABNORMAL). This is a
final state, as the next rounds confirm.
Similar findings can easily be obtained for the start and stop script not usable scenarios.
Finally, putting together all the state transitions of our test application that we noticed
in the presented scenarios, we come up with the state machine diagram in Figure 2-14.
The HVNCP protocol provides the low-level communication means so that, on top of the
Ethernet layer, the HM and VMM can exchange messages for the following three main types of
communication flows:
- Communication session establishment
- Heartbeat messages from VMM toward KSYS
- Application management messages between the VMM and KSYS sides
Example 2-65 on page 103 shows the complete list of possible KSYS to VIOS requests, as
documented in the /usr/lib/vioHADR2.00.xsd file at the VIOS level. Some of them are used
by KSYS to trigger communication flows by itself. Others are used in intermediate stages of
the flows that are initiated by other components. If the stages of different flows are similar, we
describe the first occurrence in detail and refer to it later. Section “CAA event monitoring” on
page 128 shows an example of a communication flow that is initiated outside of KSYS at the
VIOS CAA subsystem level underneath the host group SSP cluster.
From the KSYS subsystem perspective, first we consider all the flows running periodically
during the normal steady state operation of the cluster.
As described in “Quick Discovery and Deep Discovery scheduling records” on page 72,
periodic discovery actions are scheduled to identify configuration changes that might have
occurred in the whole infrastructure. A Quick Discovery action is performed hourly and a
Deep Discovery action every 24 hours. Detailed flow examples for each of these actions are
shown in 2.3.1, “Discovery of a VM that is enabled for monitoring” on page 203. To shorten
the explanations, we use the simplest possible environment of a host group with two hosts
and two VMs, one on each host. VMs themselves are configured in the simplest possible way,
which still allows us to capture the aspects of interest for the scenario under study.
Conversely, as described in “Failure Detection Engine” on page 94, two QuickQuery requests
and one NeedsAttention request are submitted every 20 seconds, one after the other, by the
host group FDE thread. The flows of these requests are described in 2.3.2, “Failure Detection
Engine: QuickQuery and NeedsAttention flows” on page 234.
The status of the ms, vm, map, and trans tables in the HSDB before we start this exercise is
shown in Example 2-174.
Example 2-174 The ms, vm, map, and trans tables in HSDB just before the discovery command
vios=# select * from ms;
msid | clusterid | mtm | machinetype | model | serial
----------------------+-----------+-----------------+-------------+-------+---------
7926057748306591969 | 1 | 8408-E8E21282FW | 8408 | E8E | 21282FW
-5878939440565042805 | 1 | 8408-E8E212424W | 8408 | E8E | 212424W
(2 rows)
vios=# select * from vm;
vmuuid | name | msid | misshistory | state | up_major | up_minor | lo_major | lo_minor |
shortmsg | msgsync | msgsdeleted | misshistoryts | suspendts
--------+------------+------+-------------+-------+----------+----------+----------+----------+-
---------+---------+-------------+---------------+-----------
| default_vm | | 0 | 1 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | |
(1 row)
vios=# select * from map;
vmuuid | viosuuid | misshistory | hmrecovery | viosrecovery | vmhbmissed
--------+----------+-------------+------------+--------------+------------
(0 rows)
vios=# select * from trans;
vmuuid | viosuuid | msid | tag | opcode | state | data | integer | txstarted
--------+----------+------+-----+--------+-------+------+---------+-----------
(0 rows)
vios=#
The ms table is already populated with host entries, the vm table contains only a default
placeholder row, and the map and trans tables are empty.
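Because the HSDB is an ordinary PostgreSQL database, it can be queried directly, as shown
above. For instance, a hedged one-liner (assuming the vioshs schema that appears later in
this section and the vios database name suggested by the prompt) that lists the monitored
VMs is:

echo "SELECT vmuuid, name, state FROM vioshs.vm;" | psql -d vios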
Example 2-175 shows the output of the manual discovery that is performed shortly after the
VM monitoring enablement command.
Example 2-175 Host group discovery after enabling monitoring for the vmraix1 VM
# ksysmgr q vm vmraix1|grep -e UUID -e HA_monitor
UUID: 61BE20F0-09A8-4753-B5E5-564F8A545ED5
HA_monitor: disable
# date; ksysmgr mod vm vmraix1 ha_monitor=enable
Tue Apr 30 08:58:25 CUT 2019
For VM vmraix1 attribute(s) 'ha_monitor' was successfully modified.
# date;ksysmgr -t discover hg rbHG
Tue Apr 30 08:59:54 CUT 2019
09:00:01 Running discovery on Host_group rbHG, this may take few minutes...
09:00:32 Existing HA trunk adapter found for VIOS usaxvib063ccpx1
09:00:37 Existing HA trunk adapter found for VIOS usaxvia063ccpx1
From the time stamps, we see that the whole discovery took about 4 minutes. We are
interested in the VM-related actions (the underlying infrastructure-related actions are
covered in 2.3.1, “Discovery of a VM that is enabled for monitoring” on page 203). The test
environment consists of a simple dual-VIOS configuration on two hosts with one managed VM
on each host. At least one previous discovery was performed since the rbHG host group was
added, so the SSP cluster is already created at the VIOS level, and the hypervisor switch
exists on each component host. The presence of the hypervisor switches on each host is
confirmed in the output of Example 2-175 on page 203 by the Existing HA trunk adapter found
messages for each VIOS.
The vmraix1 VM on the first host was enabled for monitoring before the current discovery, but
the vmraix2 VM remained disabled. You can see in the output of the discovery command how
vmraix1 is Prepared for HA management by the creation of two HA client adapters, followed by
the action of Starting HA monitoring for VM vmraix1. Also, we see that the VIOS HA trunk
adapters are found as existing from a previous host group discovery. The final message, that
the vmraix1 VMM state is not moved to started, shows up because the VM Agent is not yet
installed on the VM.
The KSYS trace files that we intend to examine are collected and shown in Example 2-176.
Checking the entries in the user trace file that are logged after the discovery command start
moment, Tue Apr 30 08:59:54 CUT 2019, we see a first trace of the discovery flow in an
abridged form that is more detailed than the command output in Example 2-141 on page 178.
We divide this user trace file content, as shown in Example 2-177 and Example 2-178 on
page 206. In Example 2-177, we see that the host group discovery starts with a successful
verification block for the HMC, followed by a first CEC verification block, inside which VIOS
and trunk adapter verifications are passed successfully.
Example 2-177 The user trace file records for VM monitoring enable discovery
# more user.log.posthamonitorenable
[09] 04/30/19 T(203) _VMR 08:59:13.165979 [ INFO]: SCHEDULER initiated QUICK discoverVerify
Completed
[09] 04/30/19 T(9113) _VMR 08:59:57.241032 DEBUG VMR_HG.C[4919]: ------------- HG Verification
process START for HG : rbHG ------------
[09] 04/30/19 T(9113) _VMR 08:59:57.324091 [ INFO,VMR_HMCRccp,] Starting verify_HMCRccp
[09] 04/30/19 T(9113) _VMR 08:59:57.324110 [ INFO,VMR_HMCRcp,usaxhmc013ccpf1] Verify_HMCRcp
for HMC = usaxhmc013ccpf1, IP = 129.41.127.3
[09] 04/30/19 T(9113) _VMR 08:59:57.324116 [ INFO,VMR_HMCRcp,usaxhmc013ccpf1] Start
discover_HMC usaxhmc013ccpf1
[09] 04/30/19 T(9113) _VMR 08:59:57.636845 [ INFO,VMR_HMCRcp,usaxhmc013ccpf1] End
discover_HMC usaxhmc013ccpf1
[09] 04/30/19 T(9113) _VMR 08:59:57.636887 [ INFO,VMR_HMCRcp,usaxhmc013ccpf1] CEC List &
SessionID is same. So no update is required.
[09] 04/30/19 T(9113) _VMR 08:59:57.636894 [ INFO,VMR_HMCRcp,usaxhmc013ccpf1] Verify_HMCRcp
for HMC = usaxhmc013ccpf1 completed
[09] 04/30/19 T(9113) _VMR 08:59:57.636901 [ INFO,VMR_HMCRccp,] Completed verify_HMCRccp
The sequence of actions continues with a similar CEC verification block for the second host in
the configuration, as shown in Example 2-178.
Example 2-178 The user trace file records for VM monitoring enable discovery (continued)
# more user.log.posthamonitorenable
[09] 04/30/19 T(9113) _VMR 09:00:35.406256 [
INFO,VMR_CECRcp,33613bcd-7ca1-3558-874a-c1e1d3ceee32] verify_CECRcp
CEC:Server-8408-E8E-SN21282FW(33613bcd-7ca1-3558-874a-c1e1d3ceee32), cluster_type:HA
[09] 04/30/19 T(9113) _VMR 09:00:48.562443 [
INFO,VMR_VIOSRcp,72DEE902-1210-4BD7-A35F-3A6C771C6453] Performing verify_VIOSRcp on VIOS
usaxvib053ccpx1
[09] 04/30/19 T(9113) _VMR 09:00:49.824186 DEBUG VMR_VIOS.C[5751]: verifyTrunkAdapter passed
on usaxvib053ccpx1
[09] 04/30/19 T(9113) _VMR 09:00:49.826664 [
INFO,VMR_VIOSRcp,72DEE902-1210-4BD7-A35F-3A6C771C6453] verify_VIOSRcp done for VIOS
usaxvib053ccpx1
[09] 04/30/19 T(9113) _VMR 09:00:49.826680 [
INFO,VMR_VIOSRcp,10875B47-D737-44F9-A745-554F4DF4ADF8] Performing verify_VIOSRcp on VIOS
usaxvia053ccpx1
[09] 04/30/19 T(9113) _VMR 09:00:51.204494 DEBUG VMR_VIOS.C[5751]: verifyTrunkAdapter passed
on usaxvia053ccpx1
[09] 04/30/19 T(9113) _VMR 09:00:51.207185 [
INFO,VMR_VIOSRcp,10875B47-D737-44F9-A745-554F4DF4ADF8] verify_VIOSRcp done for VIOS
usaxvia053ccpx1
[09] 04/30/19 T(9113) _VMR 09:00:53.173723 [
INFO,VMR_CECRcp,33613bcd-7ca1-3558-874a-c1e1d3ceee32] Leaving verify_CECRcp
CEC:Server-8408-E8E-SN21282FW
[09] 04/30/19 T(9113) _VMR 09:01:14.254966 DEBUG VMR_HG.C[12360]: Add managed system success for
CEC Server-8408-E8E-SN21282FW(33613bcd-7ca1-3558-874a-c1e1d3ceee32)
[09] 04/30/19 T(9113) _VMR 09:01:14.523511 DEBUG VMR_HG.C[12360]: Add managed system success for
CEC Server-8408-E8E-SN212424W(3a53a6fc-8dcb-3671-a853-70526651f83d)
[09] 04/30/19 T(9113) _VMR 09:01:14.785362 DEBUG VMR_CEC.C[13243]: add_VM() passed for host
Server-8408-E8E-SN212424W so now setting the status to VIOs managed for VM vmraix1
[09] 04/30/19 T(9113) _VMR 09:01:38.928457 DEBUG VMR_CEC.C[14972]: ClientEthernetAdapter
state:OK for VM:vmraix1(61BE20F0-09A8-4753-B5E5-564F8A545ED5)
[09] 04/30/19 T(9113) _VMR 09:01:44.801030 DEBUG VMR_CEC.C[14588]: StartVM() passed for host
Server-8408-E8E-SN212424W, VM vmraix1 so Moving VMMonitor state to starting
[09] 04/30/19 T(9113) _VMR 09:03:54.849762 DEBUG VMR_HG.C[5741]: ------------- HG Verification
process END for HG : rbHG ------------
user.log.posthamonitorenable: END
The krest trace file reveals the counterpart pass-through API requests toward the involved
VIOSs, as shown in Example 2-179, so we know what to look for at the VIOS level.
Example 2-179 The krest trace file records for VM monitoring enabled discovery
# egrep -i "addms|addvm|startvm" krest.log.posthamonitorenable
...
[00] 04/30/19 T(9113) _VMR 09:01:13.826110 DEBUG libkrest.c[6561]:
kriSubmitaddMS:hmc->(129.41.127.3),
[00] 04/30/19 T(9113) _VMR 09:01:14.119470 DEBUG libkrest.c[6712]: kriSubmitaddMS returning 0
with jobid->(1555954664653).
[00] 04/30/19 T(9113) _VMR 09:01:14.259633 DEBUG libkrest.c[6561]:
kriSubmitaddMS:hmc->(129.41.127.3),
[00] 04/30/19 T(9113) _VMR 09:01:14.395984 DEBUG libkrest.c[6712]: kriSubmitaddMS returning 0
with jobid->(1555954664654).
[00] 04/30/19 T(9113) _VMR 09:01:14.537656 DEBUG libkrest.c[6884]:
kriSubmitaddVM:hmc->(129.41.127.3),
[00] 04/30/19 T(9113) _VMR 09:01:14.670640 DEBUG libkrest.c[7036]: kriSubmitaddVM returning 0
with jobid->(1555954664655).
[00] 04/30/19 T(9113) _VMR 09:01:39.437633 DEBUG libkrest.c[10006]:
kriSubmitstartVM:hmc->(129.41.127.3),
[00] 04/30/19 T(9113) _VMR 09:01:39.612644 DEBUG libkrest.c[10158]: kriSubmitstartVM returning 0
with jobid->(1555954664660).
#
In Example 2-180, we select only the krest API requests that are submitted during the
discovery interval and then filter out the ones that are less relevant for our purposes. To
shorten the output, we also remove the beginning fields.
For the first kriSubmitaddMS call (the Server-8408-E8E-SN21282FW host), we identify the
counterpart VIO_HS_ADD_MS pass-through API request at the VIOS level, as shown in
Example 2-181.
Using the vioservice thread PID - TID combination, we identify the health library log trace,
as shown in Example 2-182.
Example 2-183 VIO_HS_ADD_VM request for the vmraix1 VM as received at the VIOS level
[START 20644150 79626541 04/30/19-09:01:14.675 vioservice.c 1.17 306] /usr/ios/sbin/vioservice
lib/libviopass/passthru
[0 20644150 79626541 04/30/19-09:01:14.675 viosvc_res.c 1.26 456] stdin pipe input:[<?xml
version="1.0"?>
<VIO xmlns="http://ausgsa.austin.ibm.com/projects/v/vios/schema/vioHADR2.00" version="2.00"
author="LIBKREST" title="Add VM">
<Request action_str="VIO_HS_ADD_VM" hsTag="0">
<addVM>
<vmList machine_type="8408" model="E8E" serial="212424W">
<vm uuid="61be20f0-09a8-4753-b5e5-564f8a545ed5"/>
</vmList>
</addVM>
</Request>
</VIO>
]
[0 20644150 79626541 04/30/19-09:01:14.714 viosvc_res.c 1.26 464] vio_response.result=[<?xml
version="1.0"?>
<VIO xmlns="http://ausgsa.austin.ibm.com/projects/v/vios/schema/vioHADR2.00"
version="2.00"
title="VIOS ADD_VM"
published="2019-04-30T01:17:00Z"
author="IBM Power Systems VIOS"
>
<Response>
</Response>
</VIO>
]
[END 20644150 79626541 04/30/19-09:01:14.714 vioservice.c 1.17 309] exited with rc=0
Using the vioservice thread PID - TID combination, we identify the health library log trace, as
shown in Example 2-184.
The hsParseVmList entry that is logged before the successful vioHsAddVm return is rather
vague: Found 61be20f0-09a8-4753-b5e5-564f8a545ed5 in addvm document. However, the PostgreSQL
log entry around this return moment, as shown in Example 2-185, clarifies it.
Example 2-185 SQL INSERT into the VM table for the vmraix1 VM
# pwd
/home/ios/logs/pg_sspdb
# more pg_sspdb-30-08-40.log
...
2019-04-30 09:01:14.712 CUT|5ccdbfa7.ed017a|LOG: statement: RELEASE
_EXEC_SVP_20122d28;SAVEPOINT _EXEC_SVP_20122d28;INSERT INTO vioshs.vm(vmuuid, name, state)
VALUES('61be20f0-09a8-4753-b5e5-564f8a545ed5', '', 1)
...
So, an entry for the vmraix1 VM is inserted into the vm table as a result of the VIO_HS_ADD_VM
request from KSYS. Note the value 1 for the state field at this insert moment. The vm table
status at the end of this discovery command is shown in Example 2-188 on page 213, where
the state field value is 2 at that moment.
The next action that we investigate is the kriSubmitstartVM call with its counterpart
VIO_HS_START_VM at the VIOS level, as shown in Example 2-186.
Example 2-186 VIO_HS_START_VM request for the vmraix1 VM as received at the VIOS level
[START 22282714 72745423 04/30/19-09:01:39.618 vioservice.c 1.17 306] /usr/ios/sbin/vioservice
lib/libviopass/passthru
[0 22282714 72745423 04/30/19-09:01:39.619 viosvc_res.c 1.26 456] stdin pipe input:[<?xml
version="1.0"?>
<VIO xmlns="http://ausgsa.austin.ibm.com/projects/v/vios/schema/vioHADR2.00" version="2.00"
author="LIBKREST" title="Start VM">
<Request action_str="VIO_HS_START_VM" hsTag="0">
<startVM>
<vmList machine_type="8408" model="E8E" serial="212424W">
<vm uuid="61be20f0-09a8-4753-b5e5-564f8a545ed5"/>
</vmList>
</startVM>
</Request>
</VIO>
]
[0 22282714 72745423 04/30/19-09:01:39.761 viosvc_res.c 1.26 464] vio_response.result=[]
[END 22282714 72745423 04/30/19-09:01:39.761 vioservice.c 1.17 309] exited with rc=0
The sequence of health library records in Example 2-187 reveals that actions are performed
at both the HSDB and HM levels.
A tag with a D7E3D316D0D88C47 hexadecimal value is generated at the health library level for
the StartVM request, and the HSDB is updated with details about this request, including this
tag value. Example 2-188 on page 213 shows the ms, vm, map, and trans tables at the end of
the discovery command.
...
vmStateType
===========
ADDED: The VM has been added via the addVM flow.
STARTING: HM is attempting to start monitoring of the VM in response to a request
from KSYS
STARTED: KSYS has requested that a particular VM is monitored on a managed system
and both HMs on that system have received acknowledgements from the respective VMM.
VIOS asserts need attention when VM moves from starting to started state. KSYS can
choose to fail adding a VM after some reasonable time period if it has not moved to
started state.
..
In our case, the vmraix1 VM changed from the ADDED (1) state to the STARTING (2) state,
where it remained after the completion of the discovery command. To understand why it
remained in STARTING state, we look into the HM logs for traces that are related to the last
entries in Example 2-187 on page 212. The HM log records around the same interval for the
VIOS with the 5c3196a3-c41f-4b5c-9861-dee6077081c6 UUID are shown in Example 2-189.
Example 2-189 UNIX Socket command that is received by an HM to start monitoring on the vmraix1 VM
# uname -uL
IBM,02212424W 3 usaxvib063ccpx1
# lsattr -El vios0 -a vios_uuid
vios_uuid 5c3196a3-c41f-4b5c-9861-dee6077081c6 VIOS Unique Identifier False
# alog -f /home/ios/logs/health/host_monitor.log -o > host_monitor.log.posthamonitorenable
# more host_monitor.log.posthamonitorenable
[3 16122210 91554293 04/30/19-09:01:36.356 Processing.C:Processing::cleanMsgBuffer(const s...:
833] Removing message to VmMon at macAddr 'FFFFFFFFFFFF' from send queue. Reason: Fully sent
[3 16122210 91095433 04/30/19-09:01:39.756 HostMonitor.C:HostMonitor::socketReceiveMessage...:
134] Unix Socket: Received command 0x00000101 (tag 0xD7E3D316D0D88C47)
[3 16122210 91095433 04/30/19-09:01:39.758 HostMonitor.C:HostMonitor::socketReceiveMessage...:
137] Received Valid Request Payload.
[2 16122210 91095433 04/30/19-09:01:40.798 DatabaseAccess.C:DatabaseAccess::getVmList(vect...:
219] vioHmStartVMs: VMs to Start=1, rc=0
[2 16122210 91095433 04/30/19-09:01:40.798 Processing.C:Processing::controlVm(const string...:
505] HostMonitor is asked to START monitoring VM '61be20f0-09a8-4753-b5e5-564f8a545ed5' (command
from SSP Database)
[3 16122210 91095433 04/30/19-09:01:40.798 VmHealthStatusMgr.C:VmHealthStatusMgr::monitorV...:
183] Start monitoring VM '61be20f0-09a8-4753-b5e5-564f8a545ed5'.
[2 16122210 91095433 04/30/19-09:01:40.798 VmHealthStatusMgr.C:VmHealthStatusMgr::monitorV...:
209] Monitoring won't start for VM '61be20f0-09a8-4753-b5e5-564f8a545ed5' : VmMonitor has not
yet been discovered.
[2 16122210 91095433 04/30/19-09:01:40.798 Processing.C:Processing::controlVm(const string...:
540] VM '61be20f0-09a8-4753-b5e5-564f8a545ed5' has not yet been discovered. Cannot process
action.
[3 16122210 91095433 04/30/19-09:01:40.798 UnixSocket.C:UnixSocket::disconnectClient() :
93] Unix Socket: Closing connnection to client
[3 16122210 91619721 04/30/19-09:02:05.303 ProcessingDbHb.C:ProcessingDbHb::threadMain() :
86] HM: Sending HM heartbeats to health library
host_monitor.log.posthamonitorenable (99%)
Let us install the fileset and see what happens. As described in “HM startup sequence” on
page 159, the running HM instances are already continuously broadcasting Hello messages and
waiting for the agents on the managed VMs to respond. A complete HM to VMM handshake flow is
detailed in “HM-VMM communication establishment”.
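As a hedged sketch (the fileset name and image directory are assumptions; see 3.7, “Setting
up the VM Agent” on page 309 for the documented procedure), the installation step might look
as follows:

installp -acgXYd /tmp/vmagent ksys.vmmon.rte    # assumed fileset name and image path
lssrc -s ksys_vmm                               # the VMM daemon starts with the fileset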
Here we present the case where the VMM agent is deployed on a VM enabled for monitoring
and discovered by KSYS after enablement, as described in 2.3.1, “Discovery of a VM that is
enabled for monitoring” on page 203. The focus here is on the handshake protocol between
the VMM and the HMs, but we are also interested in the HSDB updates about the VM status.
The running HMs on the VIOS side are repeatedly broadcasting Hello messages and waiting for
possible agents on the managed VMs to respond, as described in “HM activation” on
page 150.
In Example 2-190, we filter the output of the procstack command on our test VM, vmraix1, so
that we can easily identify later the thread type for each TID.
Example 2-190 VMM threads for the HM to VMM communication establishment scenario
# uname -uL
IBM,02212424W 5 vmraix1
[vmraix1:root:/var/ksys/log/hmvmmhs:] lssrc -s ksys_vmm
Subsystem Group PID Status
ksys_vmm 10682746 active
# procstack 10682746|egrep "tid|main|threadMain"
---------- tid# 17563989 (pthread ID: 1) ----------
0x10007e9c main(??, ??) + 0xf1c
---------- tid# 19202453 (pthread ID: 1286) ----------
0x100e6110 UnixSocket::threadMain()(??) + 0x50
---------- tid# 20513041 (pthread ID: 1029) ----------
0x102cc564 VmMonitor::threadMain()(??) + 0x3c4
---------- tid# 27918627 (pthread ID: 2828) ----------
0x1007185c HvncpSender::threadMain()(??) + 0x8fc
---------- tid# 23724513 (pthread ID: 2571) ----------
0x10083e24 HvncpReceiver::threadMain()(??) + 0x6a4
---------- tid# 17826247 (pthread ID: 2314) ----------
0x1008ee44 VlanCommunication::threadMain()(??) + 0x44
---------- tid# 17760557 (pthread ID: 2057) ----------
0x1007185c HvncpSender::threadMain()(??) + 0x8fc
---------- tid# 24183173 (pthread ID: 1800) ----------
The initial steps of the VMM daemon startup sequence, up to the scan for the Virtual Ethernet
adapters, are omitted in Example 2-191 because they are similar to the ones that are
described in “VM Agent activation” on page 164. For the case that we present here, the prior
host group discovery created the HA client Virtual Ethernet adapters at the LPAR container
level, as shown in Example 2-175 on page 203. So, the VmNetIntfHandler thread (pTID 772) can
identify its counterpart logical devices at the operating system level, ent1 and ent2, among
the other scanned adapters.
Example 2-191 Discovery of the Virtual Ethernet adapters for HM to VMM communication
# uname -uL; pwd
IBM,02212424W 5 vmraix1
/var/ksys/log
# more ksys_vmm.debug
[START 10682746 1 05/14/19-01:50:26
...
[2 10682746 1 05/14/19-01:50:26 VmUtils.C:vmutils::runSystemCommand(string) : 561]
Command: lsdev -c adapter | grep '^ent'> /var/ksys/log/cmdop_10682746_1 2>&1,
Command output:
ent0 Available Virtual I/O Ethernet Adapter (l-lan)
ent1 Available Virtual I/O Ethernet Adapter (l-lan)
ent2 Available Virtual I/O Ethernet Adapter (l-lan)
, returnCode: 0
[3 10682746 1 05/14/19-01:50:26 HACommon.C:hacommon::runSystemCommand(string) : 117]
Execution of command 'lsdev -c adapter | grep '^ent' | grep -w Defined | awk '{print $1
}' > /var/ksys/log/interfaceList.txt' succeeded (rc=0).
[2 10682746 1 05/14/19-01:50:26 VmNetIntfHandler.C:VmNetIntfHandler::startThread(): 75]
Starting VmNetIntfHandler thread.
[3 10682746 1 05/14/19-01:50:26 VmMonitor.C:VmMonitor::check_and_set_crit_VG_at...: 1636]
get CRITICAL VG value: LC_ALL=C /usr/sbin/lsvg rootvg | /usr/bin/grep "CRITICAL VG:" |
/usr/bin/sed 's/.*CRITICAL VG:[ ]*//g' 2>/dev/null
[3 10682746 772 05/14/19-01:50:26 VmNetIntfHandler.C:VmNetIntfHandler::selectFD() :
125] Found 7 possible interfaces to listen (not filtered).
[3 10682746 772 05/14/19-01:50:26 VmUtils.C:vmutils::getNetIntf(int, list<std::ba...:
253] intfAlias=en0
[3 10682746 772 05/14/19-01:50:26 VmUtils.C:vmutils::getNetIntf(int, list<std::ba...:
254] intfName=ent0
[3 10682746 772 05/14/19-01:50:26 VmUtils.C:vmutils::getNetIntf(int, list<std::ba...:
277] Interface en0 has IP address, discard it.
[3 10682746 772 05/14/19-01:50:26 VmUtils.C:vmutils::getNetIntf(int, list<std::ba...:
253] intfAlias=en1
[3 10682746 772 05/14/19-01:50:26 VmUtils.C:vmutils::getNetIntf(int, list<std::ba...:
254] intfName=ent1
[3 10682746 772 05/14/19-01:50:26 VmUtils.C:vmutils::getNetIntf(int, list<std::ba...:
265] Interface en1 has no IP address, keep it.
The child interfaces, en1 and en2, have no IP address configured, so ent1 and ent2 are good
candidates to be the wanted HA client adapters. By contrast, en0 has an IP address
configured, so it is discarded.
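This can be verified by hand with illustrative commands like the following (the VMM performs
its own internal scan; these checks are only for confirmation):

ifconfig en0    # shows an inet line, so ent0 is discarded
ifconfig en1    # no inet line, so ent1 remains an HA client adapter candidate
ifconfig en2    # no inet line, so ent2 remains a candidate too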
The last three records in Example 2-191 on page 216 show how the same thread (pTID 772)
binds to the ent1 interface to listen for expected Hello packets coming from the counterpart
HM on the VIOS side. Though not shown, the same happens for the identified ent2 interface
and the other HM instance on the second VIOS of the host.
In Example 2-192, we collected the Ethernet connectivity details for the HA client adapters
of the considered VM, vmraix1, and for their counterpart VIOS trunk adapters. These details
help with identifying the communication endpoints for the handshake protocol that we
present.
The VLAN ID of the ent1 interface on vmraix1 is 101, so the Hello packets that are expected
on ent1 are sent from the VIOS that is connected to the same VLAN, that is,
usaxvib063ccpx1. The excerpt that is shown in Example 2-193 from the HM log on the
usaxvib063ccpx1 VIOS shows the Hello packets that are broadcast during the time interval of
interest, once every 30 seconds. Clocks on the client VM and VIOSs are synced by the NTP
protocol for easier examination of the sequencing of events.
Each Hello packet contains as payload a specific identifier string that is generated randomly
at the HM level based on the system clock. This string is used by the VMM to identify the
originator HM of the received messages. If the HM restarts for any reason, then a new and
unique identifier is generated for this identification purpose. In the case of an already
established session, the VMM notices the new identity of the counterpart communication
endpoint, drops the current session, and triggers a new protocol handshake sequence.
Example 2-193 Hello broadcast records in the HM log on the VIOS side
# uname -uL
IBM,02212424W 3 usaxvib063ccpx1
# lsattr -El vios0 -a vios_uuid
vios_uuid 5c3196a3-c41f-4b5c-9861-dee6077081c6 VIOS Unique Identifier False
# alog -f /home/ios/logs/health/host_monitor.log -o > host_monitor.log.hmvmmhandshake
# more host_monitor.log.hmvmmhandshake
...
[3 16122210 91554293 05/14/19-01:50:13.351 Processing.C:Processing::sendMessage(bool, stri...:
682] Broadcasting 'Hello' message with identifier '1556303553.811512000.38'
[3 16122210 89719261 05/14/19-01:50:13.351 HvncpSender.C:HvncpSender::threadMain() :
814] Sending Packet to addr: FFFFFFFFFFFF
[3 16122210 89719261 05/14/19-01:50:13.351 HvncpPacket.C:HvncpPacket::display() const :
292] +Packet Frame (23 bytes) 1556303553.811512000.38
[3 16122210 91554293 05/14/19-01:50:13.844 Processing.C:Processing::cleanMsgBuffer(const s...:
833] Removing message to VmMon at macAddr 'FFFFFFFFFFFF' from send queue. Reason:
Fully sent
[3 16122210 91619721 05/14/19-01:50:34.353 ProcessingDbHb.C:ProcessingDbHb::threadMain() :
86] HM: Sending HM heartbeats to health library
[3 16122210 91554293 05/14/19-01:50:43.339 Processing.C:Processing::sendMessage(bool, stri...:
682] Broadcasting 'Hello' message with identifier '1556303553.811512000.38'
[3 16122210 89719261 05/14/19-01:50:43.339 HvncpSender.C:HvncpSender::threadMain() :
814] Sending Packet to addr: FFFFFFFFFFFF
[3 16122210 89719261 05/14/19-01:50:43.339 HvncpPacket.C:HvncpPacket::display() const :
292] +Packet Frame (23 bytes) 1556303553.811512000.38
[3 16122210 91881983 05/14/19-01:50:43.796 HvncpPacket.C:HvncpPacket::display() const :
292] +Packet Frame (84 bytes) uuid=61be20f0-09a8-4753-b5e5-564f8a545ed5:ostype=AI
X:verVmmLower=2.0:verVmmUpper=2.0
[3 16122210 77726013 05/14/19-01:50:43.796 HvncpReceiver.C:HvncpReceiver::threadMain() :
390] Received Packet from macAddr '6A6A0FFDF707'.
[3 16122210 77726013 05/14/19-01:50:43.796 HvncpPacket.C:HvncpPacket::display() const :
292] +Packet Frame (84 bytes) uuid=61be20f0-09a8-4753-b5e5-564f8a545ed5:ostype=AI
X:verVmmLower=2.0:verVmmUpper=2.0
[3 16122210 77726013 05/14/19-01:50:43.796 HvncpReceiver.C:HvncpReceiver::threadMain() :
411] Received SYN packet 1 from macAddr '6A6A0FFDF707'.
host_monitor.log.hmvmmhandshake (99%)
#
Example 2-194 Receiving the Hello packet on the VMM side and sending back the SYN packet
# more ksys_vmm.debug
...
[3 10682746 772 05/14/19-01:50:43 VmNetIntfHandler.C:VmNetIntfHandler::checkMessa...:
387] Ethertype 1536
[3 10682746 772 05/14/19-01:50:43 VmNetIntfHandler.C:VmNetIntfHandler::checkMessa...:
402] Received a 'Hello' packet from macAddr '6A6A0884EE11' with helloIdentifier
'1556303553.811512000.38'.
[3 10682746 772 05/14/19-01:50:43 VmNetIntf.C:VmNetIntf::getIntfName() const :
58] getIntfName()
[2 10682746 772 05/14/19-01:50:43 VmNetIntfHandler.C:VmNetIntfHandler::selectFD() :
300] Interface ent1 : Found a HostMonitor with MacAddr '6A6A0884EE11'
[2 10682746 772 05/14/19-01:50:43 VmNetIntfHandler.C:VmNetIntfHandler::selectFD() :
302] Interface ent1 : Now used by the VmMonitor and won't be listened anymore
[3 10682746 772 05/14/19-01:50:43 VmNetIntf.C:VmNetIntf::~VmNetIntf() :
45] ~VmNetIntf()
[3 10682746 772 05/14/19-01:50:43 VmNetIntf.C:VmNetIntf::~VmNetIntf() :
45] ~VmNetIntf()
[3 10682746 772 05/14/19-01:50:43 VmNetIntfHandler.C:VmNetIntfHandler::threadMain() :
461] New HostMonitor '6A6A0884EE11' is found
[3 10682746 772 05/14/19-01:50:43 VmNetIntfHandler.C:VmNetIntfHandler::threadMain() :
466] Pipe is being establshed between Interface 'ent1' and HostMonitor '6A6A0884EE11'
[3 10682746 772 05/14/19-01:50:43 ethernet.C:configureEthernetSocketAddr(socketHa...:
255] +Socket '12' parameters (SNDD_8022)
[3 10682746 772 05/14/19-01:50:43 ethernet.C:configureEthernetSocketAddr(socketHa...:
256] | family : 23
[3 10682746 772 05/14/19-01:50:43 ethernet.C:configureEthernetSocketAddr(socketHa...:
257] | len : 36
[3 10682746 772 05/14/19-01:50:43 ethernet.C:configureEthernetSocketAddr(socketHa...:
258] | filtertype : 4
[3 10682746 772 05/14/19-01:50:43 ethernet.C:configureEthernetSocketAddr(socketHa...:
259] | ethertype : 1536
[3 10682746 772 05/14/19-01:50:43 ethernet.C:configureEthernetSocketAddr(socketHa...:
260] | filterlen : 12
[3 10682746 772 05/14/19-01:50:43 ethernet.C:configureEthernetSocketAddr(socketHa...:
261] | nddname : ent1
[3 10682746 772 05/14/19-01:50:43 ethernet.C:configureEthernetSocketAddr(socketHa...:
262] +
[3 10682746 772 05/14/19-01:50:43 ethernet.C:configureEthernetSocketAddr(socketHa...:
304] Found interface 'ent1' : macAddr '6A6A0FFDF707'.
[3 10682746 772 05/14/19-01:50:43 VmMonitor.C:VmMonitor::addAdapter(string &, str...:
767] Interface ent1 : successfully connected to VmMonitor.
[2 10682746 1029 05/14/19-01:50:43 VmMonitor.C:VmMonitor::threadMain() :
1289] Connect to the found HostMonitor at macAddr '6A6A0884EE11'.
A socket is configured on the ent1 interface, which has the MAC address 6A6A0FFDF707, to
establish a pipe with the HM that originated the received Hello message and that acts as the
remote counterpart at the VIOS endpoint interface identified by the MAC address 6A6A0884EE11.
A network controller starts for this pipe on the VMM side by creating three threads:
HvncpSender, VlanCommunication, and HvncpReceiver, with pTIDs 2057, 1543, and 1800. These
pTID values are confirmed by the final log entries in Example 2-194 on page 219 and the
subsequent entries in Example 2-197 on page 222.
This triplet takes over the communication, and the HvncpSender thread reacts to the received
Hello packet by sending an initial synchronization (SYN) packet to the HM endpoint at the
address 6A6A0884EE11. The 84-byte payload of this initial SYN packet contains various VMM
details, among them the VM UUID value of 61be20f0-09a8-4753-b5e5-564f8a545ed5, as shown in
Example 2-194 on page 219.
On the HM side, a similar thread triplet handles the HM to VMM communication at the trunk
adapter endpoint. In Example 2-195, note this triplet among all the HM daemon threads: the
HvncpSender thread, the HvncpReceiver thread, and the VlanCommunication thread, with kernel
thread IDs (tid#) 89719261, 77726013, and 91881983.
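These tid# values can also be cross-checked directly on the VIOS by listing the threads of the
HM daemon process with the ps command. A minimal sketch, assuming the HM daemon process ID
16122210 that appears in the log excerpts:
# ps -mo THREAD -p 16122210 | egrep "TID|89719261|77726013|91881983"
The header line and the three matching rows show the kernel thread IDs of the communication
triplet.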
The SYN packet that is sent to the 6A6A0884EE11 MAC address from the vmraix1 VM MAC
address 6A6A0FFDF707 is received as “SYN packet 1” by the HvncpReceiver thread on the
usaxvib063ccpx1 VIOS HM side, as shown by the first record in Example 2-196.
Example 2-196 Receiving the SYN packet on the HM side and replying with the SYN ACK and SYN packets
# more host_monitor.log.hmvmmhandshake
...
[3 16122210 77726013 05/14/19-01:50:43.796 HvncpReceiver.C:HvncpReceiver::threadMain() :
411] Received SYN packet 1 from macAddr '6A6A0FFDF707'.
[3 16122210 77726013 05/14/19-01:50:43.796 HvncpReceiver.C:HvncpReceiver::threadMain() :
444] SYN begin, invalidate the previous communcaiton.On the sender. mac: 6A6A0FFDF707.
[3 16122210 77726013 05/14/19-01:50:43.796 Processing.C:Processing::notify(string, HvncpMe...:
384] Received a SYN payload
'uuid=61be20f0-09a8-4753-b5e5-564f8a545ed5:ostype=AIX:verVmmLower=2.0:verVmmUpper=2.0'
[2 16122210 77726013 05/14/19-01:50:43.796 Processing.C:Processing::notify(string, HvncpMe...:
398] VmMonitor '6A6A0FFDF707' connection> HostMonitor is now able to receive messages
...
[2 16122210 77726013 05/14/19-01:50:43.797 Processing.C:Processing::notify(string, HvncpMe...:
425] +
[2 16122210 89719261 05/14/19-01:50:43.797 HvncpSender.C:HvncpSender::revokeSendSyn(const ...:
401] Delete any communication pending to 6A6A0FFDF707 (except discovery)
[3 16122210 89719261 05/14/19-01:50:43.797 HvncpSender.C:HvncpSender::sendSyn(const string...:
523] Send SYN packet to 6A6A0FFDF707
[3 16122210 89719261 05/14/19-01:50:43.797 HvncpSender.C:HvncpSender::threadMain() :
726] Send SYN acknowledgment 1 to 6A6A0FFDF707
[3 16122210 89719261 05/14/19-01:50:43.797 HvncpSender.C:HvncpSender::threadMain() :
814] Sending Packet to addr: 6A6A0FFDF707
[3 16122210 89719261 05/14/19-01:50:43.797 HvncpPacket.C:HvncpPacket::display() const :
292] +Packet Frame (58 bytes) uuid=5c3196a3-c41f-4b5c-9861-dee6077081c6:verHmRunning=2.0
[3 16122210 89719261 05/14/19-01:50:43.797 HvncpSender.C:HvncpSender::threadMain() :
814] Sending Packet to addr: 6A6A0FFDF707
[3 16122210 89719261 05/14/19-01:50:43.797 HvncpPacket.C:HvncpPacket::display() const :
292] +Packet Frame (0 bytes)
[3 16122210 91881983 05/14/19-01:50:43.798 HvncpPacket.C:HvncpPacket::display() const :
292] +Packet Frame (0 bytes)
host_monitor.log.hmvmmhandshake (99%)
Example 2-197 Receiving SYN ACK and SYN packets on the VMM side and replying with SYN ACK
# more ksys_vmm.debug
...
[3 10682746 2057 05/14/19-01:50:43 HvncpSender.C:HvncpSender::threadMain() :
814] Sending Packet to addr: 6A6A0884EE11
[3 10682746 2057 05/14/19-01:50:43 HvncpPacket.C:HvncpPacket::display() const :
294] +Packet Frame (84 bytes)
uuid=61be20f0-09a8-4753-b5e5-564f8a545ed5:ostype=AIX:verVmmLower=2.0:verVmmUpper=2.0
[3 10682746 1543 05/14/19-01:50:43 HvncpPacket.C:HvncpPacket::display() const :
294] +Packet Frame (58 bytes) uuid=5c3196a3-c41f-4b5c-9861-dee6077081c6:verHmRunning=2.0
[3 10682746 1543 05/14/19-01:50:43 HvncpPacket.C:HvncpPacket::display() const :
294] +Packet Frame (0 bytes)
[3 10682746 1800 05/14/19-01:50:43 HvncpReceiver.C:HvncpReceiver::threadMain() :
394] Received Packet from macAddr '6A6A0884EE11'.
[3 10682746 1800 05/14/19-01:50:43 HvncpPacket.C:HvncpPacket::display() const :
294] +Packet Frame (58 bytes) uuid=5c3196a3-c41f-4b5c-9861-dee6077081c6:verHmRunning=2.0
[3 10682746 1800 05/14/19-01:50:43 HvncpReceiver.C:HvncpReceiver::threadMain() :
415] Received SYN packet 9 from macAddr '6A6A0884EE11'.
[3 10682746 1800 05/14/19-01:50:43 VmMonitor.C:VmMonitor::notify(string, HvncpMess...:
1121] Received a SYN payload 'uuid=5c3196a3-c41f-4b5c-9861-dee6077081c6:verHmRunning=2.0'.
[2 10682746 1800 05/14/19-01:50:43 VmMonitor.C:VmMonitor::notify(string, HvncpMess...:
1127] HostMonitor '6A6A0884EE11' connection> VmMonitor is now able to receive messages.
[2 10682746 1800 05/14/19-01:50:43 VmMonitor.C:VmMonitor::notify(string, HvncpMess...:
1141] vmm major version '2' and HM major version '2' matches, ignoring vmm minor version '0' and
HM minor version '0' processing packet
[2 10682746 1800 05/14/19-01:50:43 VmMonitor.C:VmMonitor::notify(string, HvncpMess...:
1155] +Parameters of the HostMonitor being connected:
[2 10682746 1800 05/14/19-01:50:43 VmMonitor.C:VmMonitor::notify(string, HvncpMess...:
1156] | MacAddr : 6A6A0884EE11
[2 10682746 1800 05/14/19-01:50:43 VmMonitor.C:VmMonitor::notify(string, HvncpMess...:
1157] | UUID : 5c3196a3-c41f-4b5c-9861-dee6077081c6
[2 10682746 1800 05/14/19-01:50:43 VmMonitor.C:VmMonitor::notify(string, HvncpMess...:
1158] | HsMon running version : 2.0
[2 10682746 1800 05/14/19-01:50:43 VmMonitor.C:VmMonitor::notify(string, HvncpMess...:
1159] +VmMon <> HsMon new relation status :
[2 10682746 1800 05/14/19-01:50:43 VmMonitor.C:VmMonitor::notify(string, HvncpMess...:
1160] | Version compatibility : YES
[2 10682746 1800 05/14/19-01:50:43 VmMonitor.C:VmMonitor::notify(string, HvncpMess...:
1164] | VmMon running version : 2.0
[2 10682746 1800 05/14/19-01:50:43 VmMonitor.C:VmMonitor::notify(string, HvncpMess...:
1166] +
[3 10682746 1800 05/14/19-01:50:43 HvncpReceiver.C:HvncpReceiver::threadMain() :
394] Received Packet from macAddr '6A6A0884EE11'.
[3 10682746 1800 05/14/19-01:50:43 HvncpPacket.C:HvncpPacket::display() const :
294] +Packet Frame (0 bytes)
[3 10682746 1800 05/14/19-01:50:43 HvncpReceiver.C:HvncpReceiver::threadMain() :
469] Received SYN acknowledgment from macAddr '6A6A0884EE11'.
[3 10682746 1800 05/14/19-01:50:43 HvncpReceiver.C:HvncpReceiver::threadMain() :
470] Notify main thread when comm is fully established.
The VlanCommunication thread (pTID 1543) on the VMM side receives the frames and passes
them to the HvncpReceiver thread (pTID 1800). The communication pipe with the HM is now
opened from the VMM perspective, and a final SYN acknowledgment 9 is sent to the HM as a
reply to the received 0-byte payload SYN packet 9. At this moment, the VMM expects a
START_HB message before it starts sending heartbeats to the HM at the other end of the pipe.
In Example 2-198, the HM receives the SYN ACK packet and the communication pipe is now also
opened from the HM perspective, so the HM further checks and updates the VM status in the
HSDB by using a vioHmVmDiscover() call.
Example 2-198 Receiving SYN ACK on the HM side and replying with START_HB
# more host_monitor.log.hmvmmhandshake
...
[3 16122210 91881983 05/14/19-01:50:43.798 HvncpPacket.C:HvncpPacket::display() const :
292] +Packet Frame (0 bytes)
[3 16122210 77726013 05/14/19-01:50:43.798 HvncpReceiver.C:HvncpReceiver::threadMain() :
390] Received Packet from macAddr '6A6A0FFDF707'.
[3 16122210 77726013 05/14/19-01:50:43.798 HvncpPacket.C:HvncpPacket::display() const :
292] +Packet Frame (0 bytes)
[3 16122210 77726013 05/14/19-01:50:43.799 HvncpReceiver.C:HvncpReceiver::threadMain() :
465] Received SYN acknowledgment from macAddr '6A6A0FFDF707'.
[3 16122210 77726013 05/14/19-01:50:43.799 HvncpReceiver.C:HvncpReceiver::threadMain() :
466] Notify main thread when comm is fully established.
[3 16122210 89719261 05/14/19-01:50:43.799 HvncpSender.C:HvncpSender::threadMain() :
631] Communication pipe to 6A6A0FFDF707 is now opened
[3 16122210 89719261 05/14/19-01:50:43.799 HvncpSender.C:HvncpSender::threadMain() :
676] Deleting the packet from commWindow. src: 6A6A0FFDF707.
[2 16122210 77726013 05/14/19-01:50:43.799 Processing.C:Processing::notify(string, HvncpMe...:
268] VmMonitor '6A6A0FFDF707' connection> HostMonitor is now able to send messages
[2 16122210 77726013 05/14/19-01:50:43.799 Processing.C:Processing::notify(string, HvncpMe...:
271] HM Running version: 2.0
VM Running version: 2.0
Example 2-199 Excerpt from the HSDB PostgreSQL log for a vioHmVmDiscover callback
2019-05-14 01:50:43.727 CUT|5cda1ef3.16801a2|LOG: duration: 0.018 ms
2019-05-14 01:50:43.810 CUT|5cda1ef3.16801a4|LOG: statement: BEGIN;SET search_path TO vios
2019-05-14 01:50:43.811 CUT|5cda1ef3.16801a4|LOG: statement: BEGIN;SELECT msid FROM vioshs.vios
where viosuuid='5c3196a3-c41f-4b5c-9861-dee6077081c6'
2019-05-14 01:50:43.818 CUT|5cda1ef3.16801a4|LOG: statement: SAVEPOINT _EXEC_SVP_31256208;BEGIN
WORK; LOCK TABLE vioshs.locked IN EXCLUSIVE MODE NOWAIT
2019-05-14 01:50:43.818 CUT|5cda1ef3.16801a4|WARNING: there is already a transaction in
progress
2019-05-14 01:50:43.818 CUT|5cda1ef3.16801a4|LOG: statement: RELEASE
_EXEC_SVP_31256208;SAVEPOINT _EXEC_SVP_31256208;SELECT viosuuid, tid, count, CAST ((EXTRACT(
epoch FROM CLOCK_TIMESTAMP()::timestamp ) - EXTRACT( epoch FROM
vioshs.locked.lockts::timestamp)) AS INT) FROM vioshs.locked
2019-05-14 01:50:43.821 CUT|5cda1ef3.16801a4|LOG: statement: RELEASE
_EXEC_SVP_31256208;SAVEPOINT _EXEC_SVP_31256208;UPDATE vioshs.locked SET viosuuid
='5c3196a3-c41f-4b5c-9861-dee6077081c6', tid=82969031, count=1, lockTs=NOW()
Only the 5cda1ef3.16801a4 PostgreSQL session logged records during that interval, so we are
certain that they were performed by our callback run. The relevant actions that we observe
there are as follows:
1. Retrieve the entry in the vm table where the vmuuid field is equal to the VM UUID that was
passed by the VMM in the initial SYN packet.
2. Update the state field of the row corresponding to the identified VM in the vm table to a
value of 3 (STARTED).
3. Update the vmHbMissed field to NULL for the entry in the map table corresponding to our VM
and involved VIOS (WHERE vmuuid='61be20f0-09a8-4753-b5e5-564f8a545ed5' AND
viosuuid='5c3196a3-c41f-4b5c-9861-dee6077081c6').
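For illustration only, the net effect of these three actions corresponds roughly to statements
like the following at the HSDB prompt. This is a sketch modeled on the captured PostgreSQL log,
not the exact SQL that the callback issues:
vios=# select vmuuid,state from vm where vmuuid='61be20f0-09a8-4753-b5e5-564f8a545ed5';
vios=# update vm set state=3 -- 3 = STARTED
where vmuuid='61be20f0-09a8-4753-b5e5-564f8a545ed5';
vios=# update map set vmhbmissed=NULL
where vmuuid='61be20f0-09a8-4753-b5e5-564f8a545ed5'
and viosuuid='5c3196a3-c41f-4b5c-9861-dee6077081c6';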
The VMM reacts to the received START_HB message by preparing to send a heartbeat to the HM
every 10 seconds. The kernel extension is also informed about this setup in case it must take
over. The log also shows that an initial AR flow is going to be triggered by sending a DBELL
frame, but this topic is covered in 2.3.3, “Application Reporting flow” on page 262, so we
skip it for now. The cycle is closed by sending an acknowledgment (Packet Frame (0 bytes))
back to the HM, which is received and processed in turn by the HM communication thread
triplet, as shown in Example 2-201.
Shortly after receiving the SEND_HB acknowledgment, the HM also receives at 01:50:46.580
(Example 2-201 on page 226) the first VMM heartbeat (Packet Frame (3 bytes) HB) and
updates the HSDB by using a vioHmHbDetected call. Subsequent VMM heartbeats are received
periodically every 10 seconds and logged in an interlaced manner with the already familiar
entries of HM heartbeat updates to the HSDB and Hello broadcasts toward VMMs, both every
30 seconds, in a steady state pattern.
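A quick way to observe this steady state on the VIOS is to count the periodic records in the
HM log. A sketch based on the record texts shown earlier; rerun the commands after a minute
and the counts grow accordingly:
# alog -f /home/ios/logs/health/host_monitor.log -o | grep -c "Sending HM heartbeats"
# alog -f /home/ios/logs/health/host_monitor.log -o | grep -c "Broadcasting 'Hello' message"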
We removed records from the log excerpt in Example 2-202 on page 228 to shorten the
output. The listing was still long because the VMM log contains interlaced records about
messages that are exchanged both with the HM instance listening at the 6A6A0884EE11
address (which we used so far in our analysis) and with the other HM instance listening at the
6A6A01581F11 address on the partner VIOS. The removed records were mainly traces of the
similar handshake sequence that our VMM performed with this second HM on the partner
VIOS. The VM heartbeat and Hello broadcast messages that were exchanged with our HM
instance (listening at the 6A6A0884EE11 address) are marked in bold and italics for easier
identification.
Example 2-203 shows the HSDB vm, map, and trans tables in this steady state situation after
successful HM to VMM communication establishment.
Example 2-203 The vm, map, and trans tables in HSDB after successful HM to VMM communication establishment
vios=# select vmuuid,name,msid,state from vm;
vmuuid | name | msid | state
--------------------------------------+------------+----------------------+-------
61be20f0-09a8-4753-b5e5-564f8a545ed5 | | -5878939440565042805 | 3
| default_vm | | 1
(2 rows)
vios=# select * from map;
vmuuid | viosuuid | misshistory |
hmrecovery | viosrecovery | vmhbmissed
--------------------------------------+--------------------------------------+-------------+----
--------+--------------+------------
61be20f0-09a8-4753-b5e5-564f8a545ed5 | 5c3196a3-c41f-4b5c-9861-dee6077081c6 | 0|
0 | 0 |
61be20f0-09a8-4753-b5e5-564f8a545ed5 | 0ab5001f-2c3e-4fbb-97ea-41d3f8c4f898 | 0|
0 | 0 |
(2 rows)
vios=# select * from trans;
vmuuid | viosuuid | msid | tag | opcode | state | data | integer | txstarted
--------+----------+------+-----+--------+-------+------+---------+-----------
(0 rows)
vios=#
Example 2-204 Happy case steady state NeedsAttention request and reply
# ksysmgr trace log=fde | grep -i polling|tail -1
[00] 08/25/19 T(34c) _VMR 18:16:37.891076 DEBUG FDEthread.C[208]: Use VIOS usaxvib063ccpx1:
5C3196A3-C41F-4B5C-9861-DEE6077081C6 for polling
#
--------
# uname -uL
IBM,02212424W 3 usaxvib063ccpx1
# cat na.xml
<VIO xmlns="http://ausgsa.austin.ibm.com/projects/v/vios/schema/vioHADR2.00"
version="2.00" author="LIBKREST" title="Req Needs Attention">
<Request action_str="VIO_HS_NEEDS_ATTENTION" dataType="GLOBAL_DATA"/>
</VIO>
# /usr/ios/sbin/vioservice lib/libviopass/passthru <na.xml
<VIO><Response>
<needsAtt>
<vmStatList>
<vmList machine_type="8408" model="E8E" serial="212424W">
</vmList>
</vmStatList>
</needsAtt>
</Response></VIO>
#
Checking the KSYS requests that are logged on the designated VIOS around the session
establishment moment, we notice two NeedsAttention requests followed by acknowledgment
requests and also a VIO_HS_QUERY_MSG/VIO_HS_GET_MSG messaging sequence, as shown in
Example 2-205.
We use these details to comment on the typical NeedsAttention request in Example 2-204 on
page 230 and on those two acknowledged NeedsAttention requests that show up around the
session establishment moment. Various other NeedsAttention request cases with their
specific response payloads are covered in 2.3.2, “Failure Detection Engine: QuickQuery and
NeedsAttention flows” on page 234.
Returning to the typical happy case NeedsAttention request in Example 2-204 on page 230:
the VIOS continuously returns this kind of short payload with no VM status, which means that
no heartbeat was missed and no VM status change happened since the last acknowledged
NeedsAttention request. This pattern repeats until there is a further change for that VM,
such as a missed heartbeat, a VM state transition, or an application-related change.
Example 2-206 and Example 2-207 on page 232 show occurrences of the last two cases.
The stateChg attribute in Example 2-206 on page 231 is documented in the same reference
file, /usr/lib/vioHADR2.00.xsd:
...
stateChg: This attribute when present indicates that this VM is undergoing a state
change and state field will indicate the current state. When this attribute is
present, the payload may include a tag that will need to be acknowledged.
...
The shortMsg attribute conveys the initial hardcoded DBELL message shown in Example 2-200 on
page 226, which is generated after the first four heartbeats that the VMM sends after a
successful HM to VMM handshake. The VIO_HS_QUERY_MSG/VIO_HS_GET_MSG messaging sequence in
Example 2-205 on page 230 is part of this AR flow. The AR topic is covered in 2.3.3,
“Application Reporting flow” on page 262.
Example 2-208 shows the status of our VM at the KSYS level.
The VM is monitored for its heartbeat, as confirmed by the STARTED value of the
HA_monitor_state VM attribute at the ksysmgr level, which matches the HMstate attribute value
of the counterpart RMC resource instance of vmraix1. The Last_response VM attribute at the
ksysmgr level and the HBmissed attribute at the RMC level are both 0, which means that the
heartbeat is being received normally. The missed heartbeat case is described in
“NeedsAttention flow” on page 240.
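As a sketch of how these two views can be checked side by side, assuming that IBM.VMR_LPAR is
the RMC resource class backing the VM resources on the KSYS node and that Name is its VM name
attribute:
# ksysmgr query vm vmraix1 | egrep "HA_monitor_state|Last_response"
# lsrsrc -s 'Name == "vmraix1"' IBM.VMR_LPAR HMstate HBmissed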
In “Failure Detection Engine” on page 94, we saw that the main job of an FDE thread is to
retrieve the health status of the hosts and VMs from the HSDB. Example 2-60 on page 94
reveals a normal steady state sequence of three polling requests repeated continuously by an
FDE thread: two QQ requests followed by an NA request, at 20-second intervals on average.
We now examine an end-to-end QQ request flow and then repeat the same analysis for an
NA request flow.
QuickQuery flow
A QQ request is initiated by the FDE thread of a host group, so we start our check by looking
at the fde log. Example 2-38 on page 76 shows the trace in the fde log that was left by the
first QQ request at KSYS daemon startup. Comparing that first QQ trace with the trace of our
current QQ in Example 2-209, we notice a similar pattern for the sequence of records.
Example 2-209 Excerpt of a QuickQuery request trace from the fde log
...
[05] 12/22/18 T(9192) _VMR 16:25:08.551636 DEBUG FDEthread.C[709]: Sleep for 20
sec. sleepCounter 37
[05] 12/22/18 T(9192) _VMR 16:25:28.551712 DEBUG FDEthread.C[127]: Monitoring
Enabled for HG rbHG.
[05] 12/22/18 T(9192) _VMR 16:25:28.551752 DEBUG FDEthread.C[145]: CEC is
4462c37c-65c6-3614-b02a-aa09d752c2ee
[05] 12/22/18 T(9192) _VMR 16:25:28.551764 DEBUG FDEthread.C[158]: VIOS
315CE56B-9BA0-46A1-B4BC-DA6108574E7E in CEC
[05] 12/22/18 T(9192) _VMR 16:25:28.551773 DEBUG FDEthread.C[208]: Use VIOS
rt13v2: 315CE56B-9BA0-46A1-B4BC-DA6108574E7E for polling
[05] 12/22/18 T(9192) _VMR 16:25:28.551777 DEBUG FDEthread.C[285]: Current scan [
38 ]
[05] 12/22/18 T(9192) _VMR 16:25:28.551796 DEBUG VMR_VIOS.C[6134]: setCAAtopology
[05] 12/22/18 T(9192) _VMR 16:25:28.551810 DEBUG VMR_VIOS.C[6134]: setCAAtopology
[05] 12/22/18 T(9192) _VMR 16:25:28.551820 DEBUG VMR_VIOS.C[6134]: setCAAtopology
[05] 12/22/18 T(9192) _VMR 16:25:28.551825 DEBUG VMR_VIOS.C[6134]: setCAAtopology
[05] 12/22/18 T(9192) _VMR 16:25:28.551829 DEBUG VMR_HG.C[11664]: FDE performing
doQuickQuery to 315CE56B-9BA0-46A1-B4BC-DA6108574E7E
Note the entry marking the 20-second sleep interval between the previous request
(sleepCounter 37) and the current request (Current scan [ 38 ]). By comparison, there is no
sleep interval before the first QQ request (Current scan [ 1 ]), as shown in Example 2-38 on
page 76. The same VIOS, rt13v2, is used for polling in both cases. A doQuickQuery call is
performed for this VIOS, and inside this call is the HMC REST API job request pattern that is
handled at the krest library level by the kriSubmitQuickQuery() and krigetJobResult() calls.
This pattern is covered in “QuickQuery asynchronous job example” on page 105. Here, we
focus on the actions at the FDE and VIOS levels and on the content of the request and
response payloads that they exchange with each other.
Example 2-210 shows a typical QQ request as originated at the FDE level on the KSYS side and
delivered to the vioservice command at the VIOS level, together with its corresponding XML
response, which is obtained at the VIOS level from the HSDB.
Example 2-210 QuickQuery request and response in the vioservice log on the VIOS side
# alog -f /home/ios/logs/viosvc.log -o > viosvc.log.QQ.txt
# more viosvc.log.QQ.txt
...
[START 12058936 62259481 12/22/18-15:20:23.789 vioservice.c 1.18 320]
/usr/ios/sbin/vioservice lib/libviopass/passthru
[0 12058936 62259481 12/22/18-15:20:23.789 viosvc_res.c 1.26 456] stdin pipe
input:[<?xml version="1.0"?>
<VIO xmlns="http://ausgsa.austin.ibm.com/projects/v/vios/schema/vioHADR2.00"
version="2.00" author="LIBKREST" title="Req Quick Query">
<Request action_str="VIO_HS_QUICK_QUERY"/>
</VIO>
]
[0 12058936 62259481 12/22/18-15:20:23.862 viosvc_res.c 1.26 464]
vio_response.result=[<VIO><Response>
<quickQuery>
Comparing the time stamps in the vioservice log excerpt in Example 2-210 on page 235 with
the ones in the viohs.log excerpt that is shown in Example 2-211, which were taken from
the same vioservice command execution instance (same process and thread IDs), we
discover that the HSDB was queried for the status details by the vioHsQuickQuery call.
Example 2-211 vioHsQuickQuery call that is performed by the vioservice command to query the HSDB
# alog -f /home/ios/logs/health/viohs.log -o > viohs.log.22dec.txt
# grep "12058936 62259481" viohs.log.22dec.txt
[START 12058936 62259481 12/22/18-15:20:23.790 ha_util.c 1.43 230]
[3 12058936 62259481 12/22/18-15:20:23.790 hs_util.c hs_libvio_traces 1.66 60]
HEALTH:62259481 -- violibExchange.c initTransaction 1.27 646 Got
cluster info from vioGetClusterInfoFromODM! cluster name='KSYS_rbRMHA_1'
id=730caed6da2211e8800498be9454b8e0
[3 12058936 62259481 12/22/18-15:20:23.792 hs_util.c hs_libvio_traces 1.66 60]
HEALTH:62259481 -- violibDB.c _allocHandleDB 1.132.12.3 589
HDBC = 200176e8
[3 12058936 62259481 12/22/18-15:20:23.812 hs_util.c hs_libvio_traces 1.66 60]
HEALTH:62259481 -- violibDB.c _allocHandleDB 1.132.12.3 589
HDBC = 200176e8
[3 12058936 62259481 12/22/18-15:20:23.862 hs_atten.c vioHsQuickQuery 1.107 4669]
End, rc=0
[END 12058936 62259481 12/22/18-15:20:23.862 ha_util.c 1.43 253] exited with rc=0
#
The same VIOS response XML, now transferred to the KSYS side by the job request (jobid:
1543584834864), is shown as logged in the fdelong log file (Example 2-212 on page 237).
Both Example 2-210 on page 235 and Example 2-212 show the same QQ request XML
response. The response contains a quickQuery/viosStatList/viosStatus structure of nested
subelements for each host in the host group. Inside each viosStatus element are
self-closing viosStat subelements, one for each VIOS on the host. We examine the attributes
of the viosStat subelement. Each referred VIOS is identified by a uuid attribute. The rest of
the attributes provide health status details for the referred VIOS as follows:
state State of the VIOS as a node in the CAA cluster underlying the SSP.
Possible values are UP or DOWN.
hmResponsive A value of 1 indicates that the HM daemon on the VIOS is running and
sending heartbeats periodically as expected; otherwise, the value is 0.
hmResponseSec Gives the time elapsed in seconds since the last moment the HM on
the VIOS updated its heartbeat within the HSDB.
hasData A value of 1 indicates that HSDB was updated with some information
about a CAA event that happened on the VIOS node; 0 is used
otherwise.
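Based on this structure, a skeleton of the QQ reply payload looks like the following sketch
for a single host with two VIOSs; all attribute values here are illustrative placeholders,
not captured output:
<VIO><Response>
<quickQuery>
<viosStatList>
<viosStatus> <!-- one viosStatus element per host in the host group -->
<viosStat uuid="5c3196a3-c41f-4b5c-9861-dee6077081c6" state="UP" hmResponsive="1"
hmResponseSec="5" hasData="0"/>
<viosStat uuid="0ab5001f-2c3e-4fbb-97ea-41d3f8c4f898" state="UP" hmResponsive="1"
hmResponseSec="8" hasData="0"/>
</viosStatus>
</viosStatList>
</quickQuery>
</Response></VIO>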
The values of these attributes in Example 2-212 on page 237 are for the typical happy case
where the status is OK on all monitored VIOSs. Cases with VIOS issues are covered in “CAA
event monitoring” on page 128.
We continue with our typical happy case. The XML response that is received on the KSYS
side is now parsed by the FDE thread in the handleVIOResponse call starting immediately after
the doQuickQuery call return (scan [ 38 ]). Example 2-213 lists the XML parsing-related
records of this handleVIOResponse section, which is also logged in the fdelong log. To shorten
the output, we kept only the records for the first viosStat XML element, which corresponds to
the rt13v2 VIOS.
Nothing abnormal is detected in this happy case example, so the flow finishes and the FDE
thread enters a new 20-second sleep period before starting a new QQ or NA check. This
happy case pattern repeats in a normal steady state operation.
The same QQ flow pattern is followed when events happen. An abnormal situation is
simulated in “CAA event monitoring” on page 128, where we simulate a VIOS failure by
shutting down a VIOS node and then restarting it. We describe the way that the generated
CAA events are conveyed to the HSDB, the FDE, and up the stack to other KSYS components,
such as the event notification subsystem.
Fresh VIOS health state data that is obtained this way feeds continuously into the FDE
thread. The FDE thread decides on the next action, which is to do nothing if all is OK or to
notify the user and decide on host relocation if something abnormal is found, as described in
“Host relocation decision” on page 95.
This section continues with the analysis of some relevant NA request cases. We start with the
case in Example 2-204 on page 230, which has a short reply payload because we used the
simple setup from “HM-VMM communication establishment” on page 215. The test
environment that we used in that section consists of only two hosts in a host group, with one
AIX VM on each host and only one of these two VMs enabled for monitoring. At the end of
that exercise, the VMM was sending heartbeats every 10 seconds to the HSDB by using its
underlying HMs, and KSYS was continuously retrieving the VM consolidated status by
sending NA requests every 60 seconds. Example 2-215 shows the NA reply payload.
Example 2-216 Returned payload of the NA request after the migration of the VM
# ksysmgr lpm vm vmraix1
Running LPM on VM(S), this may take few minutes...
LPM has started for vmraix1
LPM has completed for vmraix1 on Target host Server-8408-E8E-SN21282FW
Waiting for rediscovery.
1 out of 1 VMs have been successfully performed LPM
root:vmrksys: /home/root >
# ksysmgr q vm state=manage|egrep "^Name|^Host|^HA_monitor"
Name: vmraix1
Host: Server-8408-E8E-SN21282FW
HA_monitor: enable
Name: vmraix2
Host: Server-8408-E8E-SN21282FW
HA_monitor: disable
root:vmrksys: /home/root >
------------------
From the serial attribute value, we see that the vmList element now refers to the other host,
where the VM is located after the LPM migration.
We enable the second VM for monitoring, run a discovery, migrate the first VM back to its
home host, and recheck the NA reply payload (Example 2-217).
Example 2-217 Reply payload for NA request: Two heartbeat VMs each on distinct hosts
# ksysmgr mod vm vmraix2 ha_monitor=enable
For VM vmraix2 attribute(s) 'ha_monitor' was successfully modified.
root:vmrksys: /home/root >
# ksysmgr discover hg rbHG
Running discovery on Host_group rbHG, this may take few minutes...
...
Discovery has finished for rbHG
2 out of 2 managed VMs have been successfully discovered
root:vmrksys: /home/root >
# ksysmgr lpm vm vmraix1
Running LPM on VM(S), this may take few minutes...
LPM has started for vmraix1
---------
As expected, the response payload contains a vmList element inside the nested
needsAtt/vmStatList structure for each host with at least one heartbeat VM. The vmList
elements are empty, which means that heartbeats are being received for both VMs.
We now analyze the case where the heartbeat from one VM is lost on both VMM to HM paths. We
force this missed heartbeat event by disabling the HA client adapters for one of the
heartbeat VMs, as shown in Example 2-218.
Example 2-219 VM heartbeat missed time stamps in the HSDB map table
vios=# select vmuuid,viosuuid,vmhbmissed from map where
vmuuid='61be20f0-09a8-4753-b5e5-564f8a545ed5';
vmuuid | viosuuid |
vmhbmissed
--------------------------------------+--------------------------------------+------------------
----------
61be20f0-09a8-4753-b5e5-564f8a545ed5 | 5c3196a3-c41f-4b5c-9861-dee6077081c6 | 2019-08-27
22:09:14.258417
61be20f0-09a8-4753-b5e5-564f8a545ed5 | 0ab5001f-2c3e-4fbb-97ea-41d3f8c4f898 | 2019-08-27
22:09:54.292171
(2 rows)
vios=#
The excerpts in Example 2-220 show the records that are logged by both HMs at the moment
when each one notices that the missed heartbeat event happened, which is around 19
seconds after the last received VM heartbeat.
-------------
The effect at the NA request level is evaluated by probing 1 second at a time, as shown in
Example 2-221.
We observe that a new vmStat element shows up inside the vmList for our VM only when
both the first and the second HM have updated their missed heartbeat time stamps in the HSDB.
Example 2-221 on page 245 shows the new element at 22:09:55, shortly after the second HM
performed its time stamp update at 22:09:54.303 (vioHmClientMissed return). Inside the new
vmStat element, a VM subelement is now present with the state and missedHb attributes set to
values retrieved from the HSDB. The update of the VM state to the STARTED state is
described in “HM-VMM communication establishment” on page 215. The VM state is now
retrieved from the HSDB as it was set then.
We also observe that the missedHb attribute value looks like an elapsed time in seconds
rather than an integer count of missed heartbeats. At 22:09:55, the value is 42; 1 second
later, it is 43; and so on. The /usr/lib/vioHADR2.00.xsd VIOS reference file
documents this attribute as missedHB: integer regarding number of Heartbeats missed by
the VM. We check the health library and HSDB logs around 22:09:55 to get more insight into
how the value of this missedHb attribute is computed.
Example 2-222 shows the entries that are logged in vioshs.log by the NA request that is
issued at 22:09:55, the first one that returns a missedHb value, and the final entries that are
logged by the prior NA request issued a second earlier.
Example 2-222 Health library log around the start of missedHb reporting
# alog -f /home/ios/logs/health/viohs.log -o > viohs.log.NeedsAttn.missedHb
# more viohs.log.NeedsAttn.missedHb
[START 19333444 71762285 08/27/19-22:09:54.695 ha_util.c 1.43 230]
...
[3 19333444 71762285 08/27/19-22:09:54.770 hs_atten.c hsGetHighest 1.108 429] ====>>
61be20f0-09a8-4753-b5e5-564f8a545ed5: count1:40 count2=0
[3 19333444 71762285 08/27/19-22:09:54.770 hs_atten.c hsGetHighest 1.108 514] ====>>
61be20f0-09a8-4753-b5e5-564f8a545ed5: highCnt:40 whichVios=1
[3 19333444 71762285 08/27/19-22:09:54.770 hs_atten.c hs_check_and_segregate_vms 1.108 3727]
>>Missing HB from VIOS:1 cnt=40 noEntry=0
For the prior NA request, Example 2-222 on page 246 shows that one VIOS has a missed
heartbeat counter of 40 but the other VIOS received fresh heartbeats (count1:40 count2=0),
so no missed heartbeat event is returned. For the NA request that is issued at 22:09:55,
however, both VIOSs have positive missed heartbeat counters (count1:42 count2=2). So, this
time the health library encounters a full missed heartbeat event reported by both HMs and
returns the new elements and a positive value of the missedHb attribute for the affected VM
(61be20f0-09a8-4753-b5e5-564f8a545ed5). The value that is returned is the maximum of the
two counters, which is 42 for our sample here.
To understand how these missed heartbeat counters are computed, we go to the DBN node
and look for database log entries that correspond to one of the hsGetMissCount calls. We look
at the hsGetMissCount call that was issued around 22:09:55.862, and the findings are shown
in Example 2-223.
------------
# uname -uL
IBM,0221282FW 2 usaxvia053ccpx1
# ls -latr /home/ios/logs/pg_sspdb/*.log
...
-rw-r--r-- 1 vpgadmin bin 20971642 Aug 27 21:36 pg_sspdb-27-21-15.log
-rw-r--r-- 1 vpgadmin bin 20971706 Aug 27 21:56 pg_sspdb-27-21-36.log
-rw-r--r-- 1 vpgadmin bin 20971701 Aug 27 22:14 pg_sspdb-27-21-56.log
-rw-r--r-- 1 vpgadmin bin 20971546 Aug 27 22:35 pg_sspdb-27-22-14.log
...
-rw-r--r-- 1 vpgadmin bin 20971710 Aug 28 13:52 pg_sspdb-28-13-31.log
drwxrwxr-x 8 bin bin 4096 Aug 28 14:00 ..
-rw-r--r-- 1 vpgadmin bin 20971545 Aug 28 14:12 pg_sspdb-28-13-52.log
drwx------ 2 vpgadmin bin 4096 Aug 28 14:30 .
-rw-r--r-- 1 vpgadmin bin 17718556 Aug 28 14:30 pg_sspdb-28-14-12.log
# cp /home/ios/logs/pg_sspdb/pg_sspdb-27-21-56.log pg_sspdb-27-21-56.log.missedHb
# more pg_sspdb-27-21-56.log.missedHb
...
2019-08-27 22:09:55.861 CUT|5d65aa33.19e0118|LOG: statement: RELEASE
_EXEC_SVP_20122a88;SAVEPOINT _EXEC_SVP_20122a88;select DISTINCT vmuuid from vioshs.map where
viosuuid='5c3196a3-c41f-4b5c-9861-dee6077081c6' OR
viosuuid='0ab5001f-2c3e-4fbb-97ea-41d3f8c4f898'
2019-08-27 22:09:55.862 CUT|5d65aa33.19e0118|LOG: statement: RELEASE
_EXEC_SVP_20122a88;SAVEPOINT _EXEC_SVP_20122a88;SELECT CAST ((EXTRACT( epoch FROM NOW():
:timestamp) - EXTRACT( epoch FROM vioshs.map.vmHbMissed::timestamp)) AS BIGINT) FROM vioshs.map
WHERE vioshs.map.vmuuid='61be20f0-09a8-4753-b5e5-564f8a545ed5' AND
vioshs.map.viosuuid='5c3196a3-c41f-4b5c-9861-dee6077081c6'
2019-08-27 22:09:55.862 CUT|5d65aa33.19e0118|LOG: statement: RELEASE
_EXEC_SVP_20122a88;SAVEPOINT _EXEC_SVP_20122a88;SELECT CAST ((EXTRACT( epoch FROM NOW():
:timestamp) - EXTRACT( epoch FROM vioshs.map.vmHbMissed::timestamp)) AS BIGINT) FROM vioshs.map
WHERE vioshs.map.vmuuid='61be20f0-09a8-4753-b5e5-564f8a545ed5' AND
vioshs.map.viosuuid='0ab5001f-2c3e-4fbb-97ea-41d3f8c4f898'
pg_sspdb-27-21-56.log.missedHb (74%)
So, the missed heartbeat counters are computed as the difference between the current
moment and the missed heartbeat time stamp that was reported by the HM in the HSDB map
table ((EXTRACT( epoch FROM NOW()::timestamp) - EXTRACT( epoch FROM
vioshs.map.vmHbMissed::timestamp)) AS BIGINT).
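The same computation can be reproduced manually at the HSDB prompt while the vmhbmissed time
stamp is set; a sketch with the UUIDs from our test environment:
vios=# select cast((extract(epoch from now()::timestamp) - extract(epoch from
vmhbmissed::timestamp)) as bigint) as misssec from map where
vmuuid='61be20f0-09a8-4753-b5e5-564f8a545ed5' and
viosuuid='5c3196a3-c41f-4b5c-9861-dee6077081c6';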
Example 2-224 shows an NA reply payload that is captured by the probing shown in
Example 2-221 on page 245 during an LPM operation that is performed by ksysmgr.
Example 2-224 NA reply payload during an LPM operation that is performed by ksysmgr
# ksysmgr lpm vm vmraix1
Running LPM on VM(S), this may take few minutes...
LPM has started for vmraix1
LPM has completed for vmraix1 on Target host Server-8408-E8E-SN21282FW
Waiting for rediscovery.
1 out of 1 VMs have been successfully performed LPM
root:vmrksys: /home/root >
#
------------------------
# uname -uL
IBM,02212424W 3 usaxvib063ccpx1
# while true; do sleep 2;date; /usr/ios/sbin/vioservice lib/libviopass/passthru <na.xml;done
...
Thu Aug 29 08:06:36 CUT 2019
<VIO><Response>
<needsAtt hsTag="-8403607219422560833">
<vmStatList>
<vmList machine_type="8408" model="E8E" serial="21282FW">
<vmStat uuid="61be20f0-09a8-4753-b5e5-564f8a545ed5"><VM state="STARTED" stateChg='1'
missedHMEntry="10875b47-d737-44f9-a745-554f4df4adf8" >
<discoveryInfo type='MOVED'/>
</VM>
</vmStat>
</vmList>
</vmStatList>
</needsAtt>
<needsAtt hsTag="-8403607219422560833">
<vmStatList>
<vmList machine_type="8408" model="E8E" serial="21282FW">
<vmStat uuid="61be20f0-09a8-4753-b5e5-564f8a545ed5"><VM state="STARTED" stateChg='1' >
</VM>
</vmStat>
</vmList>
</vmStatList>
</needsAtt>
</Response></VIO>
...
^C#
The NA request in Example 2-224 on page 249 leaves a trace in the health library log, as
shown in Example 2-225.
Example 2-225 Health library log entries for an NA request that is issued during a ksysmgr lpm operation
# alog -f /home/ios/logs/health/viohs.log -o > viohs.log.NeedsAttn.LPM
# more viohs.log.NeedsAttn.LPM
[START 17498526 83362155 08/29/19-08:06:36.346 ha_util.c 1.43 230]
[3 17498526 83362155 08/29/19-08:06:36.347 hs_util.c hs_libvio_traces 1.66 60] HEALTH:83362155
-- violibExchange.c initTransaction 1.27 646 Got cluster info from
vioGetClusterInfoFromODM! cluster name='KSYS_rbRMHA_1' id=1d1bbef6c72d11e98004fa2424246026
[3 17498526 83362155 08/29/19-08:06:36.348 hs_util.c hs_libvio_traces 1.66 60] HEALTH:83362155
-- violibDB.c _allocHandleDB 1.132.12.4 589 HDBC = 20019248
[3 17498526 83362155 08/29/19-08:06:36.364 hs_util.c hs_libvio_traces 1.66 60] HEALTH:83362155
-- violibDB.c _allocHandleDB 1.132.12.4 589 HDBC = 20019248
[3 17498526 83362155 08/29/19-08:06:36.392 hs_atten.c hs_get_all_VM_short_msg 1.108 1245] Got
zero short msg entries from trans!
[3 17498526 83362155 08/29/19-08:06:36.392 hs_atten.c hs_handle_all_VM_short_msg 1.108 1056] No
short msg entries found in trans table!
[3 17498526 83362155 08/29/19-08:06:36.399 hs_atten.c hsGetMissCount 1.108 338]
[1950678c-18bf-4c87-860c-f9e6ec24b513 72dee902-1210-4bd7-a35f-3a6c771c6453] - 0 sec
[3 17498526 83362155 08/29/19-08:06:36.399 hs_atten.c hsGetMissCount 1.108 338]
[1950678c-18bf-4c87-860c-f9e6ec24b513 10875b47-d737-44f9-a745-554f4df4adf8] - 0 sec
[3 17498526 83362155 08/29/19-08:06:36.399 hs_atten.c hsGetHighest 1.108 429] ====>>
1950678c-18bf-4c87-860c-f9e6ec24b513: count1:0 count2=0
[3 17498526 83362155 08/29/19-08:06:36.399 hs_atten.c hsGetHighest 1.108 514] ====>>
1950678c-18bf-4c87-860c-f9e6ec24b513: highCnt:0 whichVios=0
[3 17498526 83362155 08/29/19-08:06:36.400 hs_atten.c hs_check_and_segregate_vms 1.108 3727]
>>Missing HB from VIOS:0 cnt=0 noEntry=0
[3 17498526 83362155 08/29/19-08:06:36.400 hs_atten.c hsGetMissCount 1.108 338]
[61be20f0-09a8-4753-b5e5-564f8a545ed5 72dee902-1210-4bd7-a35f-3a6c771c6453] - 0 sec
[3 17498526 83362155 08/29/19-08:06:36.400 hm_time.c tsDiff 1.39 240] Error: Query SELECT CAST
((EXTRACT( epoch FROM NOW()::timestamp) - EXTRACT( epoch FROM vios
hs.map.vmHbMissed::timestamp)) AS BIGINT) FROM vioshs.map WHERE
vioshs.map.vmuuid='61be20f0-09a8-4753-b5e5-564f8a545ed5' AND
vioshs.map.viosuuid='10875b47-d737-44f9-a745-554f4df4adf8' found no entry in the DB table.
[3 17498526 83362155 08/29/19-08:06:36.400 hs_atten.c hsGetMissCount 1.108 343] Error (rc=-2)
getting miss HB info for
61be20f0-09a8-4753-b5e5-564f8a545ed5-10875b47-d737-44f9-a745-554f4df4adf8
[3 17498526 83362155 08/29/19-08:06:36.400 hs_atten.c hsGetHighest 1.108 419] There is no map
entry with vm 61be20f0-09a8-4753-b5e5-564f8a545ed5, vios 72dee902-1210-4bd7-a35f-3a6c771c6453
rc=-2
[3 17498526 83362155 08/29/19-08:06:36.400 hs_atten.c hsGetHighest 1.108 429] ====>>
61be20f0-09a8-4753-b5e5-564f8a545ed5: count1:0 count2=0
[3 17498526 83362155 08/29/19-08:06:36.400 hs_atten.c hsGetHighest 1.108 514] ====>>
61be20f0-09a8-4753-b5e5-564f8a545ed5: highCnt:9999 whichVios=2
Compared to the similar trace in Example 2-222 on page 246, we observe that the same
actions are performed inside the vioHsNeedsAttention call, together with more actions, in the
following order:
1. Look for short message entries in the trans table.
2. Check and report the VM missed heartbeats.
3. Look for VM state changes and report the state changes and short messages.
4. Look for VM discoveries.
To conclude our findings so far, we list the attributes that were encountered in the NA reply
payloads for the analyzed cases. These attributes describe either VM elements or adjacent
elements.
missedHb Gives the time elapsed in seconds since the HMs reported missed
heartbeats for the VM. It is returned only if both HMs report positive
values, and the maximum of the two values is chosen.
stateChg Indicates that the VM is undergoing a state change. An accompanying
state field indicates the current state. When this attribute is present,
the payload may include a tag that must be acknowledged.
shortMsg A short message that is sent by the VM toward KSYS to notify it that a
configuration or status change happened at the application level of our
VM.
state The VM can be in one of the following states:
ADDED The VM was added by the addVM flow.
STARTING The HMs are attempting to start monitoring of the VM in response
to a request from KSYS.
STARTED The KSYS has enabled the VM to be monitored, and both HMs on
the host where the VM is located received heartbeats from the
respective VMM.
STOPPING HM is attempting to stop monitoring of the VM on a managed
system.
As shown in Example 2-102 on page 139, elements and attributes describing a VIOS health
state can also appear in NA reply payloads in special cases, as in the QQ reply payload. The
information that was stated about the QQ reply payload attributes describing the VIOS health
state in “QuickQuery flow” on page 234 also applies to this case, so we do not cover it again.
Now, we look at the NA request flow. We expect it to be similar to the steady state QQ request
flow in “QuickQuery flow” on page 234. The NA request flow starts with the request payload
that is sent by the host group FDE thread on the KSYS side to the VIOS side. It continues with
the response payload that is retrieved from the HSDB and conveyed back to the FDE thread,
where it is processed and, depending on the results, either triggers appropriate actions or
notifies users with appropriate messages. If nothing new or abnormal is detected in the reply
payload about a VM, such as a VM state change, an application-level change, or a missed
heartbeat, then we have a steady state happy case and the flow finishes without triggering
any action. The FDE thread enters a new 20-second sleep period, after which it starts the
next QQ request.
Let us now approach the VM missed heartbeat scenario and present its NA request
flow in detail. Earlier in this section, we described how the missed heartbeat event is
reported by the HMs to the HSDB and how an NA request retrieves the missedHb attribute values
for such a case. Here, our focus is on the actions happening at the KSYS level. The event is
forced in the same way, by disabling the HA client adapters of the VM at the HMC level. Then,
through the probing in Example 2-226, we capture the effect at the ksysmgr level.
Example 2-226 Capturing the effect of the missed heartbeat at the ksysmgr level
hscroot@usaxhmc013ccpf1:~> chhwres -r virtualio -m Server-8408-E8E-SN212424W -o d --rsubtype eth
-p vmraix1 -s 7
hscroot@usaxhmc013ccpf1:~> chhwres -r virtualio -m Server-8408-E8E-SN212424W -o d --rsubtype eth
-p vmraix1 -s 8;date
Thu Aug 29 11:37:41 UTC 2019
hscroot@usaxhmc013ccpf1:~>
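The ksysmgr-level probing itself can be as simple as a loop that samples the VM attributes
discussed for Example 2-208. A sketch in the same style as the VIOS-side probing loop used
earlier in this section; the 10-second period is arbitrary:
# while true; do sleep 10; date; ksysmgr query vm vmraix1 | egrep "HA_monitor_state|Last_response"; done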
To get more insight into the actions at the KSYS level, we check directly the most
recent NA request before 11:41:53, when our probing notified us that the VMFDT threshold was
reached. So, we look in the fdelong trace file for the records of this NA request, as shown in
Example 2-227.
Example 2-227 The fdelong trace records for the NA request with VMFDT threshold passed
# ksysmgr trace log=fdelong > fdelong.log.missedHb
# more fdelong.log.missedHb
[14] 08/29/19 T(34c) _VMR 11:41:15.477873 DEBUG FDEthread.C[709]: Sleep for 20 sec. sleepCounter
16343
[14] 08/29/19 T(34c) _VMR 11:41:35.477911 DEBUG FDEthread.C[127]: Monitoring Enabled for HG
rbHG.
[14] 08/29/19 T(34c) _VMR 11:41:35.477929 DEBUG FDEthread.C[145]: CEC is
33613bcd-7ca1-3558-874a-c1e1d3ceee32
[14] 08/29/19 T(34c) _VMR 11:41:35.477938 DEBUG FDEthread.C[158]: VIOS
72DEE902-1210-4BD7-A35F-3A6C771C6453 in CEC
[14] 08/29/19 T(34c) _VMR 11:41:35.477946 DEBUG FDEthread.C[208]: Use VIOS usaxvib053ccpx1:
72DEE902-1210-4BD7-A35F-3A6C771C6453 for polling
[14] 08/29/19 T(34c) _VMR 11:41:35.477967 DEBUG FDEthread.C[285]: Current scan [ 21522 ]
[14] 08/29/19 T(34c) _VMR 11:41:35.477970 DEBUG VMR_HG.C[11824]: FDE performing doNeedAttn
GLOBAL_DATA to 72DEE902-1210-4BD7-A35F-3A6C771C6453
Notice in Example 2-227 on page 253 the prior 20-second sleep interval, how the VIOS that
is used for polling is assigned, how the NA job request is submitted through the HMC to that
VIOS, and how it completed. The reply payload is shown in Example 2-228.
Example 2-228 Reply payload for the NA request as retrieved by the FDE thread
[14] 08/29/19 T(34c) _VMR 11:41:35.671676 DEBUG VMR_HMC.C[6902]: JobOutput
[14] 08/29/19 T(34c) _VMR 11:41:35.671676 DEBUG <VIO><Response>
<needsAtt>
<vmStatList>
<vmList machine_type="8408" model="E8E"
serial="21282FW">
</vmList>
</vmStatList>
</needsAtt>
<needsAtt>
<vmStatList>
<vmList machine_type="8408" model="E8E"
serial="212424W">
<vmStat
uuid="61be20f0-09a8-4753-b5e5-564f8a545ed5"><VM state="STARTED" missedHb='232' >
</VM>
</vmStat>
</vmList>
</vmStatList>
</needsAtt>
</Response></VIO>
[14] 08/29/19 T(34c) _VMR 11:41:35.671680 DEBUG VMR_retry.C[345]: In doRetry function, for
opCode = 21(VMDR_NEED_ATTN), rc = 0, retCode is 0, errstr is: ,retry flag is 22
[14] 08/29/19 T(34c) _VMR 11:41:35.671687 DEBUG VMR_HG.C[11832]: FDE doNeedAttn success
GLOBAL_DATA to 72DEE902-1210-4BD7-A35F-3A6C771C6453
[14] 08/29/19 T(34c) _VMR 11:41:35.671690 DEBUG needAttn.C[1484]: START handleVIOResponse scan [
21522 ].
We see how the final processing stage of the reply payload is performed in the
handleVIOResponse call, as shown in Example 2-229 on page 255.
The setHBmissed call inside the handleVIOResponse call evaluates the missed heartbeat
counter against the threshold and further ascertains that the VM is reachable by ping, so the
missed heartbeat is ignored, as logged before the handleVIOResponse call finishes (setHBmissed
- Able to ping vmraix1, ignore HBmissed). We ignore the last local data step in this final
NA reply payload processing stage. No other action is performed afterward, and the NA
request flow finishes, as marked by the logged record for the subsequent 20-second sleep
interval.
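To follow how the missed heartbeat counter progressed across the successive NA scans before
the threshold was passed, the same trace can be filtered for the setHBmissed records; a
sketch using the commands shown earlier:
# ksysmgr trace log=fdelong > fdelong.log.missedHb
# grep setHBmissed fdelong.log.missedHb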
Example 2-230 Final processing stage for the NA requests issued before passing the VMFDT threshold
[14] 08/29/19 T(34c) _VMR 11:37:32.971597 DEBUG VMR_HG.C[11832]: FDE doNeedAttn success
GLOBAL_DATA to 72DEE902-1210-4BD7-A35F-3A6C771C6453
[14] 08/29/19 T(34c) _VMR 11:37:32.971600 DEBUG needAttn.C[1484]: START handleVIOResponse scan [
21510 ].
[14] 08/29/19 T(34c) _VMR 11:37:32.980301 DEBUG needAttn.C[1521]: FINISH handleVIOResponse.
[14] 08/29/19 T(34c) _VMR 11:37:32.980326 DEBUG VMR_HG.C[11848]: FDE handleVIOResponse success
GLOBAL_DATA
...
[14] 08/29/19 T(34c) _VMR 11:38:33.639717 DEBUG VMR_HG.C[11832]: FDE doNeedAttn success
GLOBAL_DATA to 72DEE902-1210-4BD7-A35F-3A6C771C6453
[14] 08/29/19 T(34c) _VMR 11:38:33.639719 DEBUG needAttn.C[1484]: START handleVIOResponse scan [
21513 ].
[14] 08/29/19 T(34c) _VMR 11:38:33.639826 DEBUG needAttn.C[341]: Working on VM vmraix1:
61BE20F0-09A8-4753-B5E5-564F8A545ED5
[14] 08/29/19 T(34c) _VMR 11:38:33.639832 DEBUG VMR_LPAR.C[16228]: setHBmissed 50 for vmraix1:
61BE20F0-09A8-4753-B5E5-564F8A545ED5 vlanid 0 notAvail 0 scan [ 21513 ]
[14] 08/29/19 T(34c) _VMR 11:38:33.642482 DEBUG VMR_LPAR.C[16309]: setHBmissed HBmissed(50) <
VM Monitoring interval(190) No Action
[14] 08/29/19 T(34c) _VMR 11:38:33.647120 DEBUG needAttn.C[1521]: FINISH handleVIOResponse.
[14] 08/29/19 T(34c) _VMR 11:38:33.647141 DEBUG VMR_HG.C[11848]: FDE handleVIOResponse success
GLOBAL_DATA
...
[14] 08/29/19 T(34c) _VMR 11:39:34.324399 DEBUG VMR_HG.C[11832]: FDE doNeedAttn success
GLOBAL_DATA to 72DEE902-1210-4BD7-A35F-3A6C771C6453
[14] 08/29/19 T(34c) _VMR 11:39:34.324401 DEBUG needAttn.C[1484]: START handleVIOResponse scan [
21516 ].
[14] 08/29/19 T(34c) _VMR 11:39:34.324538 DEBUG needAttn.C[341]: Working on VM vmraix1:
61BE20F0-09A8-4753-B5E5-564F8A545ED5
[14] 08/29/19 T(34c) _VMR 11:39:34.324544 DEBUG VMR_LPAR.C[16228]: setHBmissed 111 for vmraix1:
61BE20F0-09A8-4753-B5E5-564F8A545ED5 vlanid 0 notAvail 0 scan [ 21516 ]
[14] 08/29/19 T(34c) _VMR 11:39:34.327346 DEBUG VMR_LPAR.C[16309]: setHBmissed HBmissed(111) <
VM Monitoring interval(190) No Action
[14] 08/29/19 T(34c) _VMR 11:39:34.332367 DEBUG needAttn.C[1521]: FINISH handleVIOResponse.
[14] 08/29/19 T(34c) _VMR 11:39:34.332387 DEBUG VMR_HG.C[11848]: FDE handleVIOResponse success
GLOBAL_DATA
...
[14] 08/29/19 T(34c) _VMR 11:40:35.002247 DEBUG VMR_HG.C[11832]: FDE doNeedAttn success
GLOBAL_DATA to 72DEE902-1210-4BD7-A35F-3A6C771C6453
[14] 08/29/19 T(34c) _VMR 11:40:35.002249 DEBUG needAttn.C[1484]: START handleVIOResponse scan [
21519 ].
[14] 08/29/19 T(34c) _VMR 11:40:35.002356 DEBUG needAttn.C[341]: Working on VM vmraix1:
61BE20F0-09A8-4753-B5E5-564F8A545ED5
[14] 08/29/19 T(34c) _VMR 11:40:35.002362 DEBUG VMR_LPAR.C[16228]: setHBmissed 171 for vmraix1:
61BE20F0-09A8-4753-B5E5-564F8A545ED5 vlanid 0 notAvail 0 scan [ 21519 ]
[14] 08/29/19 T(34c) _VMR 11:40:35.005031 DEBUG VMR_LPAR.C[16309]: setHBmissed HBmissed(171) <
VM Monitoring interval(190) No Action
[14] 08/29/19 T(34c) _VMR 11:40:35.009624 DEBUG needAttn.C[1521]: FINISH handleVIOResponse.
[14] 08/29/19 T(34c) _VMR 11:40:35.009645 DEBUG VMR_HG.C[11848]: FDE handleVIOResponse success
GLOBAL_DATA
For completeness, we also simulated the case when ping is not working so that VM relocation
is decided, as shown in Example 2-231.
Example 2-231 Final processing stage for the NA request triggering the relocation
hscroot@usaxhmc013ccpf1:~> lshwres -r virtualio --rsubtype eth -m Server-8408-E8E-SN212424W
--level lpar -F lpar_name,slot_num,port_vlan_id,vswitch,mac_addr --filter lpar_names=vmraix1
vmraix1,7,101,rbRMHA_VSWITCH,6A6A0FFDF707
vmraix1,8,102,rbRMHA_VSWITCH,6A6A0FFDF708
vmraix1,32,2030,ETHERNET0,FA643EBA1D20
hscroot@usaxhmc013ccpf1:~> chhwres -r virtualio -m Server-8408-E8E-SN212424W -o d --rsubtype eth
-p vmraix1 -s 32
hscroot@usaxhmc013ccpf1:~> chhwres -r virtualio -m Server-8408-E8E-SN212424W -o d --rsubtype eth
-p vmraix1 -s 7
hscroot@usaxhmc013ccpf1:~> chhwres -r virtualio -m Server-8408-E8E-SN212424W -o d --rsubtype eth
-p vmraix1 -s 8;date
Thu Aug 29 14:27:08 UTC 2019
hscroot@usaxhmc013ccpf1:~>
-----
We further detail the flow for the NA request case when the shortMsg attribute in the reply
payload is set. Such an NA request flow shows up as part of the AR flow, as described in
2.3.3, “Application Reporting flow” on page 262.
Similar to the QQ request flow, we start our analysis by checking the start records in the fde
log, as shown in Example 2-232. From the time stamp and the sleepCounter value of 38 in
the first record, you can see that it is the FDE polling request that immediately follows the
QQ request that we examined in Example 2-213 on page 238.
Example 2-232 Excerpt of a NeedsAttention request trace from the fdelong log
# ksysmgr trace log=fdelong > fdelong.log.NAshortMsg
# more fdelong.log.NAshortMsg
[05] 12/22/18 T(9192) _VMR 16:25:29.913186 DEBUG FDEthread.C[709]: Sleep for 20
sec. sleepCounter 38
[05] 12/22/18 T(9192) _VMR 16:25:49.913226 DEBUG FDEthread.C[127]: Monitoring
Enabled for HG rbHG.
[05] 12/22/18 T(9192) _VMR 16:25:49.913280 DEBUG FDEthread.C[145]: CEC is
4462c37c-65c6-3614-b02a-aa09d752c2ee
[05] 12/22/18 T(9192) _VMR 16:25:49.913292 DEBUG FDEthread.C[158]: VIOS
315CE56B-9BA0-46A1-B4BC-DA6108574E7E in CEC
[05] 12/22/18 T(9192) _VMR 16:25:49.913304 DEBUG FDEthread.C[208]: Use VIOS
rt13v2: 315CE56B-9BA0-46A1-B4BC-DA6108574E7E for polling
[05] 12/22/18 T(9192) _VMR 16:25:49.913307 DEBUG FDEthread.C[285]: Current scan [
39 ]
[05] 12/22/18 T(9192) _VMR 16:25:49.913311 DEBUG VMR_HG.C[11584]: FDE performing
doNeedAttn GLOBAL_DATA to 315CE56B-9BA0-46A1-B4BC-DA6108574E7E
[05] 12/22/18 T(9192) _VMR 16:25:49.913315 DEBUG VMR_retry.C[1151]: Doing
operation with opCode: 21(VMDR_NEED_ATTN)
[05] 12/22/18 T(9192) _VMR 16:25:49.913336 DEBUG VMR_retry.C[178]: INFO: Trying
with HMC: rthmc3.
[05] 12/22/18 T(9192) _VMR 16:25:49.913344 DEBUG VMR_HMC.C[6793]: getNeedAttn:
Calling kriSubmitNeedAttn!. HMC:9.3.18.159, viosUuid:
315CE56B-9BA0-46A1-B4BC-DA6108574E7E
[05] 12/22/18 T(9192) _VMR 16:25:50.034284 DEBUG VMR_HMC.C[6813]: getNeedAttn: Job
submitted. Now doing WaitTillJobCompletion() ..
[05] 12/22/18 T(9192) _VMR 16:25:50.034287 DEBUG VMR_HMC.C[3426]: Calling
krigetJobResult(). HMC: 9.3.18.159, jobid: 1543584834865, retCnt = 1
[05] 12/22/18 T(9192) _VMR 16:25:51.123733 DEBUG VMR_HMC.C[3426]: Calling
krigetJobResult(). HMC: 9.3.18.159, jobid: 1543584834865, retCnt = 2
[05] 12/22/18 T(9192) _VMR 16:25:51.220926 DEBUG VMR_HMC.C[6827]: getNeedAttn
[315CE56B-9BA0-46A1-B4BC-DA6108574E7E] JobStatus: COMPLETED_OK, ReturnCode: 0
[05] 12/22/18 T(9192) _VMR 16:25:51.220935 DEBUG VMR_retry.C[345]: In doRetry
function, for opCode = 21(VMDR_NEED_ATTN), rc = 0, retCode is 0, errstr is: ,retry
flag is 22
[05] 12/22/18 T(9192) _VMR 16:25:51.220945 DEBUG VMR_HG.C[11592]: FDE doNeedAttn
success GLOBAL_DATA to 315CE56B-9BA0-46A1-B4BC-DA6108574E7E
Inside the doNeedAttn call, we recognize the krest HMC REST API job request pattern, which
is handled by the kriSubmitNeedAttn() and krigetJobResult() calls. This pattern is similar to
the case covered in “QuickQuery asynchronous job example” on page 105, so we do not need to
go into details. The NA request, as it is delivered at the VIOS level, together with its
corresponding response, which is obtained from the HSDB, is shown in Example 2-233.
Example 2-234 The vioHsNeedsAttention call performed by the vioservice command to query the
HSDB
# alog -f /home/ios/logs/health/viohs.log -o > viohs.log.NAshortMsg.txt
# grep "12058942 62259487" viohs.log.NAshortMsg.txt
[START 12058942 62259487 12/22/18-15:20:45.146 ha_util.c 1.43 230]
[3 12058942 62259487 12/22/18-15:20:45.146 hs_util.c hs_libvio_traces 1.66 60]
HEALTH:62259487 -- violibExchange.c initTransaction 1.27 646 Got
cluster info from vioGetClusterInfoFromODM! cluster name='KSYS_rbRMHA_1'
id=730caed6da2211e8800498be9454b8e0
[3 12058942 62259487 12/22/18-15:20:45.148 hs_util.c hs_libvio_traces 1.66 60]
HEALTH:62259487 -- violibDB.c _allocHandleDB 1.132.12.3 589
HDBC = 200176e8
[3 12058942 62259487 12/22/18-15:20:45.168 hs_util.c hs_libvio_traces 1.66 60]
HEALTH:62259487 -- violibDB.c _allocHandleDB 1.132.12.3 589
HDBC = 200176e8
...
[3 12058942 62259487 12/22/18-15:20:45.240 hs_atten.c vioHsNeedsAttention 1.107
4587] End, rc=0
[END 12058942 62259487 12/22/18-15:20:45.240 ha_util.c 1.43 253] exited with rc=0
#
The VIOS response payload that is retrieved on the KSYS side by the job request is then
parsed by the FDE thread. The handleVIOResponse call section in Example 2-235 contains
the XML parsing-related records that are logged in the fde log for our NA polling request
(scan [ 39 ]). The shortMsg="0x28" attribute for the rt13001 VM makes the FDE thread add
a SHORTMSG task for that VM.
The continuation by way of the SHORTMSG task is described in 2.3.3, “Application
Reporting flow” on page 262. Here, we notice that immediately after the return from the
handleVIOResponse call, an acknowledgment job is submitted and completes successfully
for the tag that was passed in the NA request under the hsTag attribute.
We review the application-related KSYS functions that are available at the time of writing. As
shown in Example 2-236, the ksysmgr man page documents that a query command is
available on the KSYS side to check the application status and that the application
configuration can be done only by the ksysvmmgr command at the VM level.
Example 2-236 The application status check command available from the KSYS side
# man ksysmgr
...
* To display the health status of the registered
applications:
An application with the critical attribute set to yes makes the KSYS subsystem react when
the repeated application restart cycle at the VM level fails and the application reaches the
FAILURE state, as documented by the ksysvmmgr man page shown in Example 2-237.
The KSYS subsystem reacts by notifying the user about the issue and restarting the VM on
the same host. If the application reaches the FAILURE (RED) state again, the local restart
repeats, but no more times than a predetermined counter value allows. Currently, this counter
value is 3 and is hardcoded. If this counter value is reached, the KSYS subsystem restarts the
VM on another host within the host group of the host that currently hosts the VM.
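For reference, a sketch of how an application would be marked as critical from inside the VM
by using the VM Agent CLI; the application name app1 is a placeholder, and the exact syntax
should be verified against the ksysvmmgr man page:
# ksysvmmgr query app
# ksysvmmgr modify app app1 critical=yes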
We encountered this kind of situation in Example 2-149 on page 183 and Example 2-151 on
page 184. Those two examples show the AME log entries about an AME Notification
Vector being sent to AR by the doSendNotification2Ar() method.
To implement the VM Recovery Manager HA solution, you must review your current HA
recovery plan and consider how the VM Recovery Manager HA solution can be integrated
into your current environment.
You must meet the following requirements before you can install the VM Recovery Manager
HA solution:
Software requirements
Firmware (FW) requirements
Installation and configuration requirements
Hardware Management Console (HMC) requirements
Host group requirements
Networks requirements
GUI requirements
Logical partition (LPAR) Each host must have one of the following operating systems:
– AIX 6.1 or later
– Red Hat Enterprise Linux (RHEL) (Little Endian) Version 7.4 or later (kernel version 3.10.0-693)
– SUSE Linux Enterprise Server (Little Endian) Version 12.3 or later (kernel version 4.4.126-94.22)
– Ubuntu Linux distribution Version 16.04
– IBM i Version 7.2 or later
Virtual machine (VM) Agent At the time of writing, the VM Agent to monitor the VM and applications can be installed only on the following operating systems:
– AIX Version 6.1 and later
– RHEL (Little Endian) Version 7.4 or later (kernel version 3.10.0-693)
– SUSE Linux Enterprise Server (Little Endian) Version 12.3 or later (kernel version 4.4.126-94.22)
The latest version of the OpenSSL software is also included in the AIX base media.
3.1.4 Host group requirements
This section describes the host group requirements for VM Recovery Manager HA:
A host group can be assigned a logical name of up to 64 characters.
A single KSYS LPAR can manage up to four host groups. A host group can have up to 12
hosts.
All the hosts on the host group must be configured for network and storage such that any
VM from any host can be migrated to any other host within the host group.
For each host group, the KSYS subsystem requires two disks for health cluster
management. A disk of at least 10 GB, called the repository disk, is required for health
monitoring of the hosts, and another disk of at least 10 GB, called the HA disk, is required
for health data tracking for each host group. All these disks must be accessible to all the
VIOSs on each of the hosts on the host group, as shown in Figure 3-1.
Figure 3-2 VIO partition properties: Checking the Enable Time Reference
To migrate an IBM i VM from the source host to the destination host, verify that the
Restricted I/O Partition check box for the IBM i LPAR is selected in the HMC. For more
information about the steps to verify the restricted I/O mode, see IBM Knowledge Center.
Ensure that the automatic reboot attribute is not set for any VM in the HMC. The KSYS
validates this attribute and notifies you to disable this attribute. If you set this attribute, it
can lead to unpredictable results, such as a VM restart on two hosts simultaneously.
When you add a host or manage a VM that is co-managed by the HMC and PowerVM
NovaLink, set the HMC to the master mode. Otherwise, the discovery operation fails, and
the VMs on the host are not monitored for HA.
The same virtual LAN (VLAN) must be configured across the hosts.
Ensure that redundant connections are established from the KSYS to HMC and from HMC
to VIOS LPARs, as shown in Figure 3-3. Any connectivity issues between KSYS, HMC,
and VIOS LPARs can lead to disruption in the regular data collection activity and DR
operations.
The VM Recovery Manager HA solution can coexist with other cluster technologies, such as
Oracle RAC and Veritas Cluster Server.
3.3 VM Recovery Manager HA restrictions
This section describes some restrictions of the VM Recovery Manager HA solution that you
must take into account.
The following sections describe the key components of the VM Recovery Manager HA
solution.
KSYS
KSYS is the base product software that must be installed on an AIX LPAR. It provides the
technical foundation of VM Recovery Manager HA and command line-based administration
by using the ksysmgr command.
VM Agent
The VM Agent is an optional fileset that can be installed on the guest VMs that run either the
AIX or Linux operating system. If you install the VM Agent, you can monitor the individual
VMs and the applications that run in them. Otherwise, only host-level monitoring is
supported.
AIX VM Agent fileset:
ksys.vmmon.rte
RHEL VM Agent package:
vmagent-1.3.0-1.0.el7.ppc64le
SUSE Linux Enterprise Server package:
vmagent-1.3.0-1.0.suse123.ppc64le
GUI
An optional fileset that can be installed on an AIX LPAR to access the VM Recovery
Manager HA solution by using a GUI. You can also install the GUI server on the KSYS LPAR.
ksys.ui.agent: The GUI agent fileset that must be installed on the KSYS nodes.
ksys.ui.server: A GUI server fileset that must be installed on the system that manages
the KSYS nodes. This fileset can be installed on one of the KSYS nodes.
ksys.ui.common: A GUI common fileset that must be installed along with both the
ksys.ui.server (GUI server) fileset and the ksys.ui.agent (GUI agent) fileset.
3.4 Installing the VM Recovery Manager HA solution
The VM Recovery Manager HA solution provides HA management for IBM Power Systems
servers with PowerVM virtualization. After you plan the implementation of the VM Recovery
Manager HA solution, you can install the VM Recovery Manager HA software. The VM
Recovery Manager HA solution uses other subsystems such as HMC and VIOSs that must
exist in your production environment.
The VM Recovery Manager HA software can be enabled when you have the following
subsystems in your production environment:
VIOS Version 3.1.0.1 or later must be installed on all VIOS partitions that are part of the
host group. Host Monitor (HM), which is a key component of the VM Recovery Manager
HA solution, is installed on the VIOS by default. The HM component is enabled and used
when you install the VM Recovery Manager HA filesets. The VM Recovery Manager HA
solution requires two VIOSs per host.
HMC V9 R9.1.0 or later must be used to manage all hosts that are part of the cluster.
You can optionally install the VM Agents in the VMs that run AIX or Linux operating systems
to monitor the health of an individual VM and applications that run in the VMs. You can also
install the GUI server for the VM Recovery Manager HA solution to use the GUI by using a
browser.
Example 3-1 Listing the VIOS interim fix directory in the VM Recovery Manager HA package
# pwd
/mnt/GDR_BUILDS/1844B_VMR
# ls -la
total 16
drwxr-xr-x 5 root system 256 Nov 04 01:31 .
drwxrwxrwx 143 root system 8192 Nov 04 01:32 ..
drwxr-xr-x 3 root system 256 Nov 04 01:31 installp
drwxr-xr-x 3 root system 256 Nov 04 01:31 usr
drwxr-xr-x 2 root system 256 Nov 04 01:31 vios_3.1.0.10_ifixes
Important: Install the interim fix before you initialize the KSYS subsystem.
If you already have a shared pool that is configured in your environment, ensure that any
cluster services are not active. Stop any active cluster services by running the following
command:
clstartstop -stop -n clustername -m hostname
Run the command in each of the managed VIOS instances, as shown in Example 3-2.
*******************************************************************************
EFIX MANAGER PREVIEW START
*******************************************************************************
+-----------------------------------------------------------------------------+
Efix Manager Initialization
+-----------------------------------------------------------------------------+
Initializing log /var/adm/ras/emgr.log ...
+-----------------------------------------------------------------------------+
Processing Efix Package 1 of 1.
+-----------------------------------------------------------------------------+
Efix package file is: /tmp/fix/IJ10896m2a.181102.epkg.Z
MD5 generating command is /usr/bin/csum
MD5 checksum is 961dcf33ab5bcbd2d8b0adefcbc57f10
Accessing efix metadata ...
Processing efix label "IJ10896m2a" ...
Verifying efix control file ...
Example 3-3 shows the continuation of the installation of the VIOS interim fix.
+-----------------------------------------------------------------------------+
Operation Summary
+-----------------------------------------------------------------------------+
Log file is /var/adm/ras/emgr.log
ATTENTION: system reboot will be required by the actual (not preview) operation.
Please see the "Reboot Processing" sections in the output above or in the
/var/adm/ras/emgr.log file.
Example 3-4 shows the summary of the installation of the VIOS interim fix.
Note: The installation output of the VIOS interim fix is split across these three examples.
Verify whether the installation of the interim fix is successful by running lssw, as shown in
Example 3-5.
STATE codes:
S = STABLE
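As a sketch, the check is run from the VIOS restricted shell, and the interim fix should be
listed with the STABLE state:
$ lssw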
If the cluster services were stopped, start the cluster services by running the following
command:
clstartstop -start -n clustername -m hostname
3. Verify whether the installation of filesets is successful by running the command that is
shown in Example 3-7.
ksys.hautils.rte 1.3.0.0 COMMITTED Base Server Runtime
ksys.main.cmds 1.3.0.0 COMMITTED Base Server Runtime
ksys.main.msg.en_US.cmds 1.3.0.0 COMMITTED Base Server Runtime
ksys.main.rte 1.3.0.0 COMMITTED Base Server Runtime
Path: /etc/objrepos
ksys.hautils.rte 1.3.0.0 COMMITTED Base Server Runtime
ksys.main.cmds 1.3.0.0 COMMITTED Base Server Runtime
ksys.main.rte 1.3.0.0 COMMITTED Base Server Runtime
5. After the successful installation of the KSYS filesets, check whether the class IDs are
reserved by running the command that is shown in Example 3-9.
6. If the IBM.VMR_APP class is not available in the output, manually add the "IBM.VMR_APP 522"
entry to the /usr/sbin/rsct/cfg/ct_class_ids file and refresh the Reliable Scalable
Cluster Technology (RSCT) subsystem by running the command that is shown in
Example 3-10.
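A minimal sketch of this step, assuming that appending the entry with echo is acceptable in
your environment and that the standard rmcctrl options are used to stop and restart the
RMC subsystem:
# echo "IBM.VMR_APP 522" >> /usr/sbin/rsct/cfg/ct_class_ids
# rmcctrl -z    (stops the RMC subsystem and its resource managers)
# rmcctrl -A    (adds the RMC entry to /etc/inittab and starts the subsystem)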
3. To verify whether the installation of VM Agent is successful, run the lslpp command
shown in Example 3-12.
Path: /etc/objrepos
ksys.vmmon.rte 1.3.0.0 COMMITTED Base Server Runtime
4. Ensure that the ksysvmmgr command that is shown in Example 3-13 and the binary file for
the VM Agent daemon that is shown in Example 3-14 are both in place.
5. To verify whether the VM Agent daemon is enabled, run the lssrc -s ksys_vmm command.
The status of the ksys_vmm subsystem must be Active in the output of this command, as
shown in Example 3-15.
For more information about configuring the repository to easily install those packages, see
IBM Knowledge Center.
2. Install the VM Agent RPM packages based on the following Linux distributions in the VM:
– In RHEL (Little Endian) VMs, run the command that is shown in Example 3-17.
Ensure that the Resource Monitoring and Control (RMC) connection between the VMs
and HMC exists. If the firewall is enabled on the RHEL VM, the RMC connection might
be broken. Modify the firewall on the VMs to allow the RMC connection with the HMC.
– In SUSE Linux Enterprise Server (Little Endian) VMs, run the command that is shown
in Example 3-18.
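A minimal sketch of these installation commands, assuming the package file names that are
listed earlier in this chapter (run the appropriate command inside the VM):
# rpm -ivh vmagent-1.3.0-1.0.el7.ppc64le.rpm
# rpm -ivh vmagent-1.3.0-1.0.suse123.ppc64le.rpm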
You can use the ksysmgr command or the VM Recovery Manager HA GUI (see Chapter 4,
“IBM VM Recovery Manager High Availability GUI deployment” on page 323) to interact with
the KSYS daemon to manage the entire environment for HA.
The VM Recovery Manager HA solution monitors the hosts and the VMs when you add
information about your environment to the KSYS configuration settings. Complete the
following steps to set up the KSYS subsystem:
1. Initialize the KSYS cluster.
2. Add HMCs.
3. Add hosts.
4. Create host groups.
5. Configure VMs.
6. Configure VIOS.
7. Set contacts for event notification.
8. Enable the HA monitoring.
9. Discover and verify the KSYS configuration.
10.Optional: Back up the configuration data.
[<ksysclustername>]
type=<HA>
ksysnodes=<ksysnode>
[sync=<yes|no>]
To find the syntax and aliases of every action or parameter that is accepted by the ksysmgr
command, use the -h option, as shown in Example 3-19.
Example 3-19 Finding the syntax and aliases of every action or parameter that is accepted by ksysmgr
# ksysmgr -h
No command arguments found
Here is a list of available actions for ksysmgr:
Available action
add
delete
discover
help
manage
unmanage
modify
move
query
recover
restore
restart
resync
report
cleanup
sync
pair
verify
monitor
lpm
To create and initialize a KSYS cluster, complete the following steps in each of the KSYS
LPARs:
1. Configure a cluster and add the KSYS node to the cluster by running the command that is
shown in Example 3-20.
Example 3-20 Creating a cluster and adding a KSYS node to the cluster
# ksysmgr add ksyscluster ITSO_HA ksysnodes=ksys7005rbdr type=HA
Adding node to current cluster configuration
Ksyscluster has been created, please run: "ksysmgr verify ksyscluster
<ksysclustername>"
2. Verify the KSYS cluster configuration by running the command that is shown in
Example 3-21.
3. Deploy the KSYS cluster by running the command that is shown in Example 3-22.
4. Alternatively, you can perform steps 1 - 3 by running the single command that is shown in
Example 3-23 (a sketch of the commands for steps 2 - 4 follows Example 3-24).
5. Verify that the KSYS cluster is created successfully by running one of the commands that
are shown in Example 3-24.
# lsrpdomain
Name OpState RSCTActiveVersion MixedVersions TSPort GSPort
ITSO_HA Online 3.2.4.0 No 12347 12348
# lssrc -s IBM.VMR
Subsystem Group PID Status
IBM.VMR rsct_rm 11731258 active
Important: The output message must display the state of the KSYS cluster as Online.
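For steps 2 - 4, a minimal sketch of the corresponding commands, assuming the cluster
name from Example 3-20 and the sync=yes option from the cluster definition syntax shown
above:
# ksysmgr verify ksyscluster ITSO_HA
# ksysmgr sync ksyscluster ITSO_HA
# ksysmgr add ksyscluster ITSO_HA ksysnodes=ksys7005rbdr type=HA sync=yes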
Note: The HMC user, whose user name and password details are provided to the KSYS,
must have at least hmcsuperadmin privileges and remote access. The KSYS subsystem
uses the Representational State Transfer (REST) API to communicate with the HMCs in
the environment. Therefore, ensure that your environment allows HTTPS communication
between the KSYS and HMC subsystems.
To add the HMCs to the KSYS configuration setting, complete the following steps in the KSYS
LPAR:
1. Add the HMC with user name hscroot and password xyz123 by running the command that
is shown in Example 3-25.
Name: rthmc3
Ip: 9.3.18.159
Login: hscroot
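A minimal sketch of the step-1 command, assuming the login, password, and ip parameter
names of the add hmc action and the HMC details from the query output above:
# ksysmgr add hmc rthmc3 login=hscroot password=xyz123 ip=9.3.18.159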
To add hosts to the KSYS configuration, complete the following steps in the KSYS LPAR:
1. Add the managed hosts rt11-8286-42A-0607585 and rt12-8286-42A-2100E5W to the KSYS
by running the command that is shown in Example 3-27.
Tip: If the host is connected to more than one HMC, you must specify the universally
unique identifier (UUID) of the host.
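A minimal sketch of the step-1 commands, assuming that the add host action takes the host
name (add the UUID, for example as a uuid= parameter, when the host is connected to more
than one HMC):
# ksysmgr add host rt11-8286-42A-0607585
# ksysmgr add host rt12-8286-42A-2100E5W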
2. Repeat step 1 for all hosts that you want to add to the KSYS subsystem.
3. Verify the hosts that you added by running the command that is shown in Example 3-28.
Host_group: No host_group defined
VIOS: rt12v2
rt12v1
HMCs: rthmc6
HA_monitor: enable
VM_failure_detection_speed: normal
Name: rt11-8286-42A-0607585
UUID: d04d8a4a-99fa-3e4b-80bc-c9d9716bd8f8
FspIp: Must run discovery first to populate
Host_group: No host_group defined
VIOS: rt11v2
rt11v1
HMCs: rthmc3
HA_monitor: enable
VM_failure_detection_speed: normal
The KSYS subsystem creates a health monitoring Shared Storage Pool (SSP) cluster across
the VIOSs that are part of the host group. The health cluster monitors the health of all VIOSs
across the cluster and makes the health data available to the KSYS subsystem through a
VIOS in the host group. The SSP cluster is used only by the KSYS; you must not use it for
any other purpose. You can continue to use the virtual Small Computer System Interface
(vSCSI) or N_Port ID Virtualization (NPIV) modes of the cluster. If an SSP cluster already
exists in your environment, the KSYS subsystem does not deploy a new SSP cluster, but
instead uses the existing SSP cluster for health management. However, if an existing SSP
cluster is used, the KSYS subsystem might not support VIOS management.
The KSYS subsystem requires two disks to create the health monitoring SSP cluster across
the VIOSs on the host group. A disk of at least 10 GB, which is called the repository disk, is
required to monitor the health of the hosts, and another disk of at least 10 GB, which is called
the HA disk, is required to track health data for each host group. These disks must be
accessible to all the managed VIOSs on each of the hosts on the host group. You must
specify the disk details when you create the host group or before you run the first discovery
operation. You cannot modify the disks after the discovery operation runs successfully. If you
want to modify the disks, you must delete the host group and re-create it with the new disk
details.
To create a host group in the KSYS subsystem, complete the following steps in the KSYS
LPAR:
1. Identify all VIOSs that are managed by KSYS, as shown in Example 3-29.
Name: rt11v1
UUID: 5F48ABC5-8188-4C70-8F9F-84403FB29DC3
Host: rt11-8286-42A-0607585
Version: VIOS 3.1.0.00
State: MANAGED
HM_versions: Unknown
Name: rt11v2
UUID: 55F05794-AF34-45E2-83DF-4DE40A7D6B7E
Host: rt11-8286-42A-0607585
Version: VIOS 3.1.0.00
State: MANAGED
HM_versions: Unknown
2. To identify all the available shared disks by VIOS so that you can designate the repository
disk and the HA disk for the SSP cluster, run the commands that are shown in
Example 3-30.
These are the shared free disks which appear on all VIOS in the list provided:
DiskNames are as they appear on VIOS rt12v2
It is also possible to identify all the available shared disks by host, as shown in Example 3-31.
These are the shared free disks which appear on all VIOS in the list provided:
DiskNames are as they appear on VIOS rt12v2
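A minimal sketch of these queries, assuming a viodisk resource of the query action with
vios= and hosts= selectors (the names are from our test environment):
# ksysmgr query viodisk vios=rt11v1,rt11v2,rt12v1,rt12v2
# ksysmgr query viodisk hosts=rt11-8286-42A-0607585,rt12-8286-42A-2100E5W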
3. Create a host group and add the hosts and disks (by using the ViodiskID information) that
you want in this host group by running the command that is shown in Example 3-32 (a
sketch of this command follows step 5).
4. Repeat step 1 on page 286 for all host groups that you want to create in the KSYS
subsystem.
5. Verify the host groups that you created by running the command shown in Example 3-33.
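A minimal sketch of the step-3 command, assuming the repo_disk and ha_disk parameter
names and placeholder ViodiskID values from the step-2 output:
# ksysmgr add host_group ITSO_HG hosts=rt11-8286-42A-0607585,rt12-8286-42A-2100E5W repo_disk=<ViodiskID1> ha_disk=<ViodiskID2>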
You must enable HA monitoring for the KSYS subsystem to start monitoring the environment.
To check the System-Wide Persistent Attributes, run the ksysmgr query system command, as
shown in Example 3-34.
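To enable HA monitoring, a minimal sketch, assuming that the ha_monitor attribute is set
through the modify action at the system level:
# ksysmgr modify system ha_monitor=enable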
You can check whether the HA monitor is enabled, as shown in Example 3-36.
Example 3-36 Listing System-Wide Persistent Attributes after the HA monitor is enabled
# ksysmgr query system
System-Wide Persistent Attributes
auto_discovery_time: 00:00 hours
notification_level: low
dup_event_processing: yes
ha_monitor: enable
host_failure_detection_time: 90 seconds
vm_failure_detection_speed: normal
For example, when you add a host or when you run the LPM operation from one host to
another host that is outside of the current KSYS subsystem, the KSYS configuration settings
are updated in the next discovery operation.
By default, the KSYS subsystem automatically rediscovers sites once every 24 hours at
midnight. You can change this period by modifying the auto_discovery_time system attribute.
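As a sketch, assuming the attribute is changed through the modify action at the system level
(the value shown is a hypothetical time of day):
# ksysmgr modify system auto_discovery_time=01:00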
After the KSYS subsystem discovers the resources, a verification is required to ensure that
the VMs can be restarted on another host without any errors during a failover operation. The
first discovery operation can take a few minutes because the SSP health cluster is deployed
during the first discovery operation.
To discover and verify the configuration for a specific host group, complete the following steps:
1. Discover the resources by running the following command:
ksysmgr discover host_group hg_name
2. Verify the resources by running the following command:
ksysmgr verify host_group hg_name
Important: You must run the discovery and verification commands each time that you modify
the resources in the KSYS subsystem.
To perform both the discovery and verification operations, run the command that is shown in
Example 3-37.
rt12004 verification has completed
ERROR: Verify has encountered an error for VM rt11007
ERROR: Verify has encountered an error for VM rt11006
Verification has finished for ITSO_HG
13 out of 15 VMs have been successfully verified
Unverified VMs:
rt11007
rt11006
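For reference, a minimal sketch of the combined operation, assuming that the discover
action accepts a verify=true option:
# ksysmgr discover host_group ITSO_HG verify=true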
Example 3-37 on page 290 shows that the VMs rt11006 and rt11007 failed verification
because of RMC connection problems.
Configuring VIOS
When you add hosts to the KSYS subsystem, all the VIOSs in the hosts are also added to the
KSYS subsystem. The VM Recovery Manager HA solution monitors the hosts and VMs by
using VIOSs on the host.
The VM Recovery Manager HA solution requires at least two VIOSs per host. You can have a
maximum of 24 VIOSs across different hosts in a single host group. If a host has more than
two VIOSs, you can exclude specific VIOS partitions from the HA management.
To exclude specific VIOS partitions from HA management, run the following command:
ksysmgr unmanage vios viosname
In Example 3-38, the VMs rt11006, rt11007, rt11008, rt11009, and rt11010 were unmanaged.
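A minimal sketch of one such command, assuming a vm variant of the unmanage action that
parallels the vios form above:
# ksysmgr unmanage vm rt11006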
To list all VMs and query all managed and unmanaged VMs, run the ksysmgr query vm
command, as shown in Example 3-39.
rt11001
rt12004
rt12003
rt12002
rt12001
rt11004
rt11003
Unmanaged VMs:
rt11009
rt11007
rt11010
rt11008
rt11006
After you change the host group ITSO_HG, perform discovery and verification, as shown in
Example 3-40.
After installing the VM Agent for HA monitoring, you start the daemon agent by running the
ksysvmmgr start command in each VM.
Example 3-41 on page 295 shows how to start VM monitor (VMM) on AIX VMs.
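As a sketch, starting and then checking the daemon on one AIX VM:
# ksysvmmgr start
# ksysvmmgr status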
Now that the VMM is enabled on the VMs, you must enable HA monitoring in the KSYS
subsystem at the VM level for each VM by running the following command:
ksysmgr modify vm vm1[,vm2,...] ha_monitor=enable
In Example 3-44, the VMs rt11001, rt11002, rt11003, rt11004, rt11005, rt12001, rt12002,
rt12003, rt12004, and rt12005 were changed to enable the ha_monitor.
For VM rt11002 attribute(s) 'ha_monitor' was successfully modified.
For VM rt12005 attribute(s) 'ha_monitor' was successfully modified.
For VM rt11005 attribute(s) 'ha_monitor' was successfully modified.
For VM rt11001 attribute(s) 'ha_monitor' was successfully modified.
For VM rt12004 attribute(s) 'ha_monitor' was successfully modified.
For VM rt12003 attribute(s) 'ha_monitor' was successfully modified.
For VM rt12002 attribute(s) 'ha_monitor' was successfully modified.
For VM rt12001 attribute(s) 'ha_monitor' was successfully modified.
For VM rt11004 attribute(s) 'ha_monitor' was successfully modified.
For VM rt11003 attribute(s) 'ha_monitor' was successfully modified.
This section describes the VM Recovery Manager HA solution options that you can
customize.
Note: You must run the discovery and verification command after you set any policy.
You can enable HA monitoring for VMs only after you install the VM Agent on each VM and
start the VM Agent successfully. If you do not set up the VM Agent, the KSYS subsystem
might return error messages for HA monitoring at the VM level.
Example 3-46 shows the ksysmgr query system command that checks the HA monitoring for
the KSYS.
To check HA monitoring for the host group, run the ksysmgr query host_group command that
is shown in Example 3-47.
To check the HA monitoring for the VM rt11001, run the ksysmgr query vm command that is
shown in Example 3-48.
Restart policy
The restart policy specifies whether the KSYS subsystem restarts the VMs automatically
during a failure. This attribute can have the following values:
auto If you set this attribute to auto, the KSYS subsystem automatically
restarts the VMs on the destination hosts. The KSYS subsystem
identifies the most suitable host based on free CPUs, memory, and
other specified policies. In this case, the KSYS subsystem also notifies
the registered contacts about the host or VM failure and the restart
operations. This is the default value of the restart_policy attribute.
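To set the policy explicitly, a minimal sketch, assuming that the restart_policy attribute is
set through the modify action at the host group level:
# ksysmgr modify host_group ITSO_HG restart_policy=auto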
For example, to check the restart policy of the host group ITSO_HG, we run the command
ksysmgr query host_group, as shown in Example 3-49.
To set the host failure detection time, run the following command:
ksysmgr modify system|host_group name host_failure_detection_time=time_in_seconds
To check the KSYS host failure detection time, we run the command ksysmgr query system,
as shown in Example 3-50.
host_failure_detection_time: 90 seconds
vm_failure_detection_speed: normal
To check the host group failure detection time, we use the ksysmgr query host_group
command, as shown in Example 3-51.
Example 3-51 Checking the host group host failure detection time
# ksysmgr query host_group
Name: ITSO_HG
Hosts: rt12-8286-42A-2100E5W
rt11-8286-42A-0607585
Memory_capacity: Priority Based Settings
high:100
medium:100
low:100
CPU_capacity: Priority Based Settings
high:100
medium:100
low:100
Skip_power_on: No
HA_monitor: enable
Restart_policy: auto
VM_failure_detection_speed: normal
Host_failure_detection_time: 90
For example, you can define the following flexible capacity values at the host group level:
100% CPU and 100% memory for high priority VMs; 70% CPU and 80% memory for medium
priority VMs; and 60% CPU and 75% memory for low priority VMs. When a medium priority
VM is migrated from its home host to another host on the host group, its capacity is adjusted
to 70% CPU and 80% memory. If the VM is restored back to its home host, the VM is restored
with 100% resources.
The flexible capacity policy does not consider I/O slots, adapters, and resources that are
available in the hosts. You must ensure that all the I/O virtualization requirements of the VMs
are met within the host group environment. Also, the flexible capacity policy is applicable only
to VM relocation that is based on restart operations. LPM operations do not follow the flexible
capacity policy.
Example 3-52 on page 301 shows the default settings of the flexible capacity policy for the
host group ITSO_HG.
Example 3-54 shows the host group ITSO_HG after changing its flexible capacity policies.
Example 3-54 Checking the flexible capacity policies after changing them for ITSO_HG
# ksysmgr query host_group
Name: ITSO_HG
Hosts: rt12-8286-42A-2100E5W
rt11-8286-42A-0607585
Memory_capacity: Priority Based Settings
high:100
medium:60
low:50
CPU_capacity: Priority Based Settings
high:100
medium:70
low:50
Affinity policies
An affinity policy specifies affinity rules for a set of VMs that define how the VMs must be
placed within a host group during a relocation. The following affinity policies are supported if
all VMs in an affinity group have the same priority.
Collocation
A collocation policy indicates that the set of VMs must always be placed on the same host
after relocation, as shown in Figure 3-5.
Example 3-55 shows the collocation policy policy_1 for the VMs rt12001, rt12002, and rt12003.
To list the collocation policies that are available, run the command ksysmgr query
collocation, as shown in Example 3-56.
Example 3-58 shows the anticollocation policy policy_2 with the VMs rt12001 and rt12002.
To list the anticollocation policies that are available, run the command ksysmgr query
anticollocation, as shown in Example 3-59.
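A minimal sketch of the creation commands behind Example 3-55 and Example 3-58,
assuming add collocation and add anticollocation actions that take a policy name and a
vm list:
# ksysmgr add collocation policy_1 vm=rt12001,rt12002,rt12003
# ksysmgr add anticollocation policy_2 vm=rt12001,rt12002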
Workgroup
A workgroup policy indicates that the set of VMs must be prioritized based on the assigned
priority.
Example 3-61 shows the creation of a workgroup policy with the VMs rt11001, rt11002, and
rt11003.
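A minimal sketch, assuming an add workgroup action that parallels the collocation
commands (the policy name is hypothetical):
# ksysmgr add workgroup wg_policy1 vm=rt11001,rt11002,rt11003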
To list the workgroup policies that are available, run the ksysmgr query workgroup command,
as shown in Example 3-62.
Note: When you set affinity policies, ensure that the host group has sufficient capacity for
the policies to be implemented during host failure, VM failure, or application failure. For
example, if the host group contains only two hosts, you cannot set an anticollocation policy
on VMs of a specific host because the host group does not contain multiple target hosts to
restart the VMs.
Host blacklist
The host blacklist policy specifies the list of hosts that must not be used to relocate a
specific VM during a failover operation. For a VM, you can add hosts within the host group to
the blacklist based on performance and licensing preferences, as shown in Figure 3-7.
Example 3-64 shows that VM rt11004 does not have a blacklist policy.
Example 3-65 shows that host rt12-8286-42A-2100E5W was set as the blacklisted host for
VM rt11004.
Note: The warning appeared only because this test environment is composed only of two
hosts on the host group.
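A minimal sketch of the step shown in Example 3-65, assuming a blacklist attribute on the
vm resource that accepts a host name:
# ksysmgr modify vm rt11004 blacklist=rt12-8286-42A-2100E5W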
To list the blacklist host that is configured on VM rt11004, run the command ksysmgr query
vm rt11004, as shown in Example 3-66.
Example 3-67 shows how to delete the blacklist policy from VM rt11004.
Failover priority
The failover priority policy specifies the order in which multiple VM restart operations are processed.
For example, if a host fails and all the VMs must be relocated to other hosts on the host group,
the priority of the VM determines which VM is processed first. The supported values for this
attribute are High, Medium, or Low. You can set this attribute at the VM level only. You must
specify the UUID of the VM if you have two or more VMs with the same name. By default, all
VMs on the host group have a priority of Medium.
Example 3-70 shows that all VMs on the host rt12-8286-42A-2100E5W were set to High
priority.
Example 3-70 Setting all VMs on the host rt12-8286-42A-2100E5W to High priority
# ksysmgr modify vm ALL host=rt12-8286-42A-2100E5W priority=High
For VM rt12005 attribute(s) 'host', 'priority' was successfully modified.
For VM rt12004 attribute(s) 'host', 'priority' was successfully modified.
For VM rt12003 attribute(s) 'host', 'priority' was successfully modified.
For VM rt12002 attribute(s) 'host', 'priority' was successfully modified.
For VM rt12001 attribute(s) 'host', 'priority' was successfully modified.
Home host
The home host policy specifies the home host of a VM. By default, the KSYS subsystem
initially sets this value to the host where the VM was first discovered. You can change the
home host value of a VM even when the VM is running on another host. In such a case, the
specified home host is used for all future operations. This attribute is useful when a host is
repaired after a failure and you want to restart the VMs on their home host.
To check that VM rt11001 has the home host rt12-8286-42A-2100E5W, run the ksysmgr
query vm command, as shown in Example 3-71.
In Example 3-72, there is a request to change the home host of VM rt11001 to
rt11-8286-42A-0607585.
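A minimal sketch of that request, assuming the homehost attribute of the vm resource:
# ksysmgr modify vm rt11001 homehost=rt11-8286-42A-0607585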
In Example 3-73, you verify that VM rt11001 now has the home host rt11-8286-42A-0607585.
You can install the VM Agent to monitor the VM and applications on the VMs that run only the
following operating systems:
AIX Version 6.1 or later
PowerLinux:
– RHEL (Little Endian) Version 7.4 or later
– SUSE Linux Enterprise Server (Little Endian) Version 12.3 or later
Currently, the HA feature at the VM level or application level is not supported for the IBM i
operating system or other Linux distributions. However, you can enable these VMs for
host-level health management. In addition, you can perform manual restart and LPM
operations on these VMs.
Note: You can configure the VM Agent only by using the ksysvmmgr command. You cannot
configure the VM Agent by using the VM Recovery Manager HA GUI.
After you install the VM Agent filesets successfully, complete the procedures in the following
sections to set up the VMM in each guest VM.
# ksysvmmgr status
Subsystem Group PID Status
ksys_vmm 6488526 active
When you start the VMM, the VMM daemon sends heartbeats to the HM when requested by
the HM so that the KSYS subsystem can monitor the VMs.
OS Specifies that the applications must be started by the operating
system or by a user after the VM restarts on a new host. If the
application is not started by the operating system or a user, the VM
Agent starts the applications.
To register applications in the VMM daemon so that they are monitored for HA, run the
command that is shown in Example 3-77.
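A minimal sketch of an application registration, assuming an add app action of the
ksysvmmgr command; the application name and script paths are hypothetical:
# ksysvmmgr add app app1 monitor_script=/ha/app1_monitor.sh start_script=/ha/app1_start.sh stop_script=/ha/app1_stop.sh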
To check the attributes of all applications, run the command ksysvmmgr query app, as shown
in Example 3-78.
When you mark an application as critical, if the application fails or stops working correctly, the
VM Agent attempts to restart the application for the number of times that is specified in the
max_restart attribute for the VM. If the application is still not working correctly, the KSYS
subsystem notifies you about the issue and attempts to restart the VM on the same host. If
the application is still not working correctly, that is, if the application status is displayed as RED
when you run the ksysmgr query app command, the KSYS restarts the VM on another host
within the host group based on the specified policies.
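As a sketch, marking an application as critical, assuming the critical attribute is set
through the modify action (the application name is hypothetical):
# ksysvmmgr modify app app1 critical=yes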
Application class
The application class contains the following mandatory attributes:
monitor_script: A mandatory script that is used by the VM Agent to verify application
health. This script is run regularly (based on the monitor_period attribute value) and the
result is checked for the following values:
– 0: Application is working correctly.
– Any value other than 0: Application is not working correctly or failed.
After several successive failures (based on the monitor_failure_threshold attribute
value), the application is declared as failed. Based on the specified policies, the KSYS
subsystem determines whether to restart the VM.
stop_script: A mandatory script that is used by the VM Agent to stop the application if the
application must be restarted.
start_script: A mandatory script that is used by the VM Agent to start the application if
the application must be restarted.
The application class also contains the following optional attributes:
monitor_failure_threshold: Specifies the number of successive failures of the
monitor_script script that is necessary before the VMM restarts the application. A restart
operation is performed by successively calling the stop_script and start_script scripts.
stop_stabilization_time: Specifies the waiting time in seconds to receive a response
from the stop_script script. The default value is 25 seconds, which means that the VMM
waits for 25 seconds to receive a response from the stop_script script, after which the
script is considered as failed.
stop_max_failures: Specifies the number of successive failures of the stop_script script
that is necessary before the VMM considers that it cannot stop the application. The default
value is set to 3.
start_stabilization_time: Specifies the waiting time in seconds to receive a response
from the start_script script. The default value is 25 seconds, which means the VMM
waits for 25 seconds to receive a response from the start_script script, after which the
script is considered as failed.
start_max_failures: Specifies the number of successive failures of the start_script
script that is necessary before the VMM considers that it cannot start the application. The
default value is set to 3.
max_restart: Specifies the number of cycles of successive VM restart operations that
result in a monitoring failure before the daemon pronounces that restarting at the VM level
is insufficient. By default, this attribute is set to 3.
status: Specifies the dynamic status of the application, which is returned by the AME. This
attribute cannot be modified.
version: Specifies the application version. This attribute does not have a default value.
critical: Marks the application as critical. The valid values are Yes and No (default). If you
mark an application as critical, the failure of the application is sent to the KSYS subsystem
for further action.
type: Specifies the type of application. By default, the type attribute does not have any
value that indicates general applications. Other supported values are ORACLE, IBM DB2, and
SAPHANA. This attribute is case-sensitive and you must use uppercase characters. For
these types of applications, if you do not specify start, stop, and monitor scripts, internal
scripts of the VMM are used.
instancename: Specifies the instance name for applications. This attribute is applicable
only for applications whose scripts need an instance name as an argument. For example:
– If the application type is ORACLE, the instancename attribute must be specified with the
Oracle instance name.
– If the application type is DB2, the instancename attribute must be specified with the
IBM DB2® instance owner.
– If the application type is SAPHANA, the instancename attribute must be specified with the
SAP HANA instance name.
database: Specifies the database that the applications must use. This attribute is
applicable only for applications whose scripts require database as an argument. For
example:
– If the application type is ORACLE, the database attribute must be specified with the
Oracle system identifier (SID).
– If the application type is DB2, the database attribute is not required.
– If the application type is SAPHANA, the database attribute must be specified with the SAP
HANA database.
Table 3-2 shows the version support matrix for the application types on the AIX operating
system and on the Linux operating systems (RHEL and SUSE Linux Enterprise Server).
Specific parameters for the built-in supported application agents are shown in Table 3-3; for
each of these application types, the version parameter is taken from the application itself.
Application dependencies
If some applications have dependencies, for example, if you must specify a sequence of
applications to start or stop, specify the dependencies as shown in Figure 3-10.
In Example 3-80, the start_sequence and stop_sequence dependency types were added to
the applications app1, app2, and app3.
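A minimal sketch of one of these commands, assuming an add dependency action that takes
the attributes of the dependency class that is described later in this section:
# ksysvmmgr add dependency dependency_type=start_sequence dependency_list=app1,app2,app3 strict=YES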
Example 3-81 uses the ksysvmmgr q dependency command to list the dependencies.
Example 3-81 Listing dependencies after the start_sequence and stop_sequence dependencies
# ksysvmmgr q dependency
Dependency depuuid=1542239347575681000
dependency_type=start_sequence
dependency_list=app1,app2,app3
strict=YES
In Example 3-82, the parent_child dependency was added between the applications app4
and app5.
To list the dependencies after adding the parent_child dependency, run the command
ksysvmmgr q dependency, as shown in Example 3-83.
Application dependency class
The application dependency class contains the following mandatory attributes:
dependency_type: Specifies the type of dependency between applications. This attribute
can have the following values:
– start_sequence: Specifies the order in which the applications must be started as
mentioned in the dependency_list attribute. The dependency_list attribute must have
more than one application for this dependency type.
– stop_sequence: Specifies the order in which the applications must be stopped as
mentioned in the dependency_list attribute. The dependency_list attribute must have
more than one application for this dependency type.
– parent_child: Specifies the parent-child relationship of the two specified applications
in which one application is the parent and the other is the child. The parent application
must start first and then the child application must start. You must stop the child
application first and then stop the parent application. If the parent application fails, the
child application stops automatically.
dependency_list: Specifies the list of applications that have a dependency among them.
The dependency class also contains the following optional attributes:
– strict: Specifies whether to continue the script or command if the dependency policy
cannot be followed. If the strict attribute is set to Yes, the next application is not started
until the previous application starts and is in the normal state. If the strict attribute is set
to No, the next application is started immediately after the first application is started
regardless of the state of the first application. This attribute is applicable only for the
start_sequence dependency.
You can add the following contact information for a specific user:
Email address
Phone number with the phone carrier email address
Pager email address
Note: The KSYS node must have a public IP address to send the event notifications
successfully.
To register contact details so that you can receive notification from KSYS, run the following
commands in the KSYS LPAR:
To add the email address of a specific user to receive notifications, run the following
command:
ksysmgr add notify user=username contact=email_address
Example 3-84 Adding and listing an email address for a specific user
# ksysmgr add notify user=Francisco contact=francisco.almeida@testamail.com
successfully added user info
You can add multiple email addresses for a specific user. However, you cannot add
multiple email addresses simultaneously. You must run the command multiple times to add
multiple email addresses, as shown in Example 3-85.
User: Francisco
Contact: francisco.almeida@mailtes.com
To add a specific user to receive a System Management Services (SMS) notification, run
the following command:
ksysmgr add notify user=username
contact=10_digit_phone_number@phone_carrier_email_address
Example 3-86 shows how to add a specific user to receive an SMS notification.
You must specify the phone number along with the email address of the phone carrier to
receive an SMS notification. To determine the email address of your phone carrier, contact
the phone service provider.
To add a specific user to receive a pager notification, run the following command:
ksysmgr add notify user=username contact=pager_email_address
Example 3-87 shows how to add a specific user to receive a pager notification.
Contact: 1234567890@tomymail.com
User: Dayana
Contact: 1234567890@skytel.com
Note: You must have root authority to perform any uninstallation tasks.
Example 3-89 Removing the KSYS cluster before removing the KSYS filesets
# ksysmgr remove ksyscluster ITSO_HA
WARNING: This action will remove all configuration and destroy the KSYS setup, its
recommended to create a backup "ksysmgr add snapshot -h"
Do you want to a backup to be created now ? [y|n]
y
Created: /var/ksys/snapshots/oldclust_BASIC_2018-11-27_15:23:46.xml.tar.gz
Successfully created a configuration snapshot:
/var/ksys/snapshots/oldclust_BASIC_2018-11-27_15:23:46.xml.tar.gz
Do you wish to proceed? [y|n]
y
Tip: If you plan to reinstall the filesets and re-create the environment later, create a
snapshot before you remove the cluster so that you can restore it after you reinstall the
filesets.
(0) root @ rt11001: /
# ksysvmmgr status
Subsystem Group PID Status
ksys_vmm inoperative
2. Uninstall the VM Agent filesets from the AIX VM by running the command that is shown in
Example 3-92.
# ksysvmmgr stop
ksys_vmm has been requested to stop.
# ksysvmmgr status
ksys_vmm daemon is currently inoperative.
2. Uninstall the VM Agent package from the Linux VM, as shown in Example 3-94.
Example 4-1 shows how to install the GUI server filesets on the KSYS node.
Example 4-1 Installing the GUI server filesets on the KSYS node
# installp -acFXYd . ksys.ui.server
.
.
.
#######################################################################
#######################################################################
##
## The IBM VMRestart for AIX graphical user interface
## (GUI) server installation is starting. To complete the process,
## you must install additional files when the installation completes.
## These additional files were not included in the server fileset
## because they are licensed under the General Public License (GPL).
## However, you can automatically download the required files by
## running the following script:
##
## /opt/IBM/ksys/ui/server/dist/server/bin/vmruiinst.ksh
##
#######################################################################
#######################################################################
.
.
.
Installation Summary
--------------------
Name Level Part Event Result
-------------------------------------------------------------------------------
ksys.ui.server 1.3.0.1 USR APPLY SUCCESS
ksys.ui.server 1.3.0.1 ROOT APPLY SUCCESS
Example 4-2 Installing open source software packages on LPAR to connect to the internet
# /opt/IBM/ksys/ui/server/dist/server/bin/vmruiinst.ksh
The files that are required for Node.js are used on the server, and are also
installed on each cluster node automatically during the cluster discovery
operation.
IBM does not offer support for these files if the files are used outside the
context of the VM Restart GUI.
https://9.x.x.x:3000/login
After you log in, you can add existing clusters in your environment to the
VM Restart GUI.
If the GUI server LPAR is configured to use an HTTP/HTTPS proxy to access the internet,
run the following command in the GUI server LPAR to specify the proxy information:
/opt/IBM/ksys/ui/server/dist/server/bin/vmruiinst.ksh [-p <HTTP_PROXY>] [-P
<HTTPS_PROXY>]
Tip: You can also specify the proxy information by using the http_proxy environment
variable.
If the GUI server LPAR is not connected to the internet, complete the following steps:
a. Copy the vmruiinst.ksh file from the GUI server LPAR to a system that is running the
AIX operating system and that has internet access.
b. Run the vmruiinst.ksh -d /directory command, where /directory is the location
where you want to download the remaining files. For example: vmruiinst.ksh -d
/tmp/vmrui_rpms.
c. Download the following prerequisite packages for the GUI server:
• info-4.13-3.aix5.3.ppc.rpm
• cpio-2.11-2.aix6.1.ppc.rpm
• readline-6.2-2.aix5.3.ppc.rpm
• libiconv-1.13.1-2.aix5.3.ppc.rpm
• bash-4.2-5.aix5.3.ppc.rpm
• gettext-0.17-6.aix5.3.ppc.rpm
• libgcc-4.9.2-1.aix6.1.ppc.rpm
• libgcc-4.9.2-1.aix7.1.ppc.rpm
• libstdc++-4.9.2-1.aix6.1.ppc.rpm
• libstdc++-4.9.2-1.aix7.1.ppc.rpm
d. Copy the downloaded files to a directory in the GUI server LPAR.
e. In the GUI server LPAR, run the vmruiinst.ksh -i /directory command where
/directory is the location where you copied the downloaded files.
f. After the vmruiinst.ksh command completes, a message displays a URL for the VM
Recovery Manager HA GUI server. Enter the specified URL in a web browser in the
GUI server LPAR and start using the VM Recovery Manager HA GUI.
After logging in to the VM Recovery Manager HA GUI, the window that is shown in
Figure 4-2 opens.
3. In the Host Group Details window, add the host group name, as shown in Figure 4-4.
5. In the Host Selection window, you are prompted to select the hosts that will be part of this
host group. In this case, select host rt12, as shown in Figure 4-6.
6. In the Virtual I/O Server (VIOS) Selection window, select a VIOS from the hosts, as shown
in Figure 4-8.
7. In the Disk Selection window, select the disks to use as the Repository Disk and HA Pool
Disk, as shown in Figure 4-9 on page 331.
8. In the VM Selection window, you are prompted to select the virtual machine (VM) that will
be managed in this host group. Select the VM and click Policies to change the
VM-specific policies, as shown in Figure 4-10.
The summary shows details of the host group that is configured, as shown in Figure 4-13
on page 333.
9. Click Submit & Deploy. The deployment of the host group configuration starts, as shown
in Figure 4-14.
10.Click Go in the dashboard to access the management options, as shown in Figure 4-16.
To unregister the KSYS and set the Notifications Preferences, go to the Settings window and
click Notification Preferences, as shown in Figure 4-18.
To check the ITSO_HG host group components and policies, click the ITSO_HG host group, as
shown in Figure 4-20.
To add or remove a host from the host group, click Edit, as shown in Figure 4-21.
Figure 4-23 shows the configuration of the Anti-Collocation policy for the VMs rt11005 and
rt12005.
Clicking Discovery & Verify shows three discovery and verify options: Discovery Host Group,
Verify Host Group, and Discovery and Verify Host Group, as shown in Figure 4-24.
Click Discovery and Verify and all running activities are displayed, as shown in Figure 4-26.
To filter the event logs by Critical, Warning, Maintenance, and Informational events, click
Events, as shown in Figure 4-27.
As shown in Figure 4-20 on page 336 (left side), you can also check the host group, HMC,
and hosts, as shown in Figure 4-29.
Click Policies, and the policy information for host rt11 appears, as shown in Figure 4-31 on
page 340.
Figure 4-32 Checking the VMs that are related to host rt11
Click Policies, and the policy information for host rt12 appears, as shown in Figure 4-34.
You can check all VMs that are related to host rt12, as shown in Figure 4-35.
When you click SSP Cluster for the rt11v1 VIOS, the cluster name and status appear, as
shown in Figure 4-37.
Figure 4-37 Checking the SSP Cluster information from the VIO server
Click Policies on the rt11v1 VIOS, and the information policies appear, as shown in
Figure 4-38.
To check the application that is managed by the VM monitor (VMM) daemon, click
Application, as shown in Figure 4-40.
Select VM rt11001 and click Migrate → Migrate to New Host Using LPM, as shown in
Figure 4-42.
After the LPM operation finishes, the dashboard shows VM rt11001 at host rt12, as shown in
Figure 4-46.
Figure 4-48 Signal that is shown while migrating back to the home host
Select VM rt12001, and then select Restart → Restart & Bring to New Host, as shown in
Figure 4-50.
During the restart activity, a red signal appears, as shown in Figure 4-52.
Now, we test the option to Restart at Home Host, as shown in Figure 4-54.
Figure 4-55 Signal that is shown while migrating back to the home host
After the restart operation finishes, the dashboard shows VM rt12001 at host rt12, as shown
in Figure 4-56.
Select host rt12, and then click Migrate → Migrate to New Host Using LPM, as shown in
Figure 4-57.
Figure 4-57 Migrating all VMs for one host by using LPM
After the LPM operation finishes, the dashboard shows that all VMs that are managed from
host rt12 are migrated to host rt11, as shown in Figure 4-60.
Figure 4-61 Migrating back all VMs for one host by using LPM
The message "restore has started for vm" appears, as shown in Figure 4-62.
The HMCs that are used in this environment are listed in Example 5-2.
Name: hmc1
Ip: 129.40.180.24
Login: hscroot
The host group that is used in this scenario is shown in Example 5-3.
On HMC p18vhmc1, VM p18lnx02 changed the reference code, as shown in Example 5-9.
p18lnx02:B200A101 LP=00009
6. Monitor the KSYS to see the VM restart of p18lnx02, as shown in Example 5-11.
6. Monitor the KSYS to see the restart of VM p18lnx02, as shown in Example 5-22.
8. Check the home host of VM p18lnx02 by running the ksysmgr command, as shown in
Example 5-25.
3. Simulate a host crash by immediately shutting down both Virtual I/O Servers (VIOSs) on
host Server-8284-22A-SN10EE85P, as shown in Example 5-28.
5. List the VMs on host Server-8284-22A-SN10EE85P. You see that all partitions that are
monitored by the KSYS on the crashed host Server-8284-22A-SN101AFDR restarted on
host Server-8284-22A-SN10EE85P, as shown in Example 5-31.
6. List the VMs on host Server-8284-22A-SN101AFDR. You see that the referenced VMs
restarted, as shown in Example 5-32.
8. Perform discovery and verification on host group HG_TCHU, as shown in Example 5-34.
After the discovery and verification of host group HG_TCHU, the VMs that are referenced on
host Server-8284-22A-SN101AFDR are cleared, as shown in Example 5-35.
Search for SG248426, select the title, and then click Additional materials to open the
directory that corresponds to the IBM Redbooks form number, SG248426.
The publications that are listed in this section are considered suitable for a more detailed
description of the topics that are covered in this book.
IBM Redbooks
The following IBM Redbooks publications provide more information about the topics in this
document. Some publications that are referenced in this list might be available in softcopy
only.
IBM PowerVM Enhancements What is New in 2013, SG24-8198
A Practical Guide for Resource Monitoring and Control (RMC), SG24-6615
You can search for, view, download, or order these documents and other Redbooks,
Redpapers, web docs, drafts, and additional materials at the following website:
ibm.com/redbooks
Online resources
These websites are also relevant as further information sources:
IBM Knowledge Center: Hardware Management Console (HMC) REST APIs
https://ibm.co/2Xol4Gf
IBM Knowledge Center: Preparing for partition mobility
https://ibm.co/2KrXsdQ
Service and Productivity Tools
https://ibm.co/2x3aPbt