AIX Various Notes
Link aggregation (802.3ad) on a VIO Server, as padmin:
Sometimes after a system update (say a Technology Level upgrade), you don't get the right TL version with
oslevel. For example, I tried to update my AIX 6.1 TL5 to TL6, but here is what I got:
Keyword:Fileset:ReqLevel:InstLevel:Status:Abstract
We've got our answer! We need to update the Java SDK in order to get the right oslevel output.
----------------------------------------------------------------------------
Path: /usr/lib/objrepos
Path: /etc/objrepos
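The colon-separated instfix output shown above can be filtered for the filesets holding the level back; a minimal sketch over sample data (the fileset levels below are made up, only the `-` in the Status column matters):

```shell
# Filter instfix-style output (Keyword:Fileset:ReqLevel:InstLevel:Status:Abstract),
# keeping only filesets whose Status is "-" (below the required level).
# The fileset levels below are made-up sample data.
cat <<'EOF' | awk -F: '$5 == "-" {print $2 " required " $3 " installed " $4}'
6100-06_AIX_ML:Java5.sdk:5.0.0.400:5.0.0.250:-:AIX 6100-06 Update
6100-06_AIX_ML:bos.rte:6.1.6.0:6.1.6.0:+:AIX 6100-06 Update
EOF
```

On a real system you would feed this from `instfix -icqk <ML keyword>` instead of the here-document.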
Let's update it!
smitty install_latest (yeah I know, smit)
PS: if your NIM master is nicely configured, you can even update with the nim command on your client:
perform a validation :
perform a migration :
virtual_fc_mappings="5/VIOS/1,6/VIOS3/3"' -v
virtual_fc_mappings:
Comma-separated list of virtual fibre channel adapters. Each item in this list has the format
slot_num/vios_lpar_name/vios_lpar_id. For example, 4/vios2/3 specifies a virtual fibre channel adapter on a
client logical partition with a virtual slot number of 4, a VIOS partition name of vios2, and the ID of the
destination VIOS logical partition of 3.
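To illustrate the format, each mapping in the list can be split on commas and slashes; a quick sketch using the example string from above:

```shell
# Split a virtual_fc_mappings value into its slot_num/vios_lpar_name/vios_lpar_id
# parts (names and IDs taken from the example above).
mappings="5/VIOS/1,6/VIOS3/3"
echo "$mappings" | tr ',' '\n' | while IFS=/ read -r slot name id; do
    echo "slot=$slot vios_name=$name vios_id=$id"
done
```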
IBM Systems Director : Regenerate Tivguid
root@lpar /root #> cp /etc/ibm/director/twgagent/twgagent.uid /etc/ibm/director/twgagent/twgagent.uid.ORIG
cas_agent inoperative
Guid:4a.96.a7.78.78.d0.55.e1.b4.a4.66.7c.73.ef.46.07
0513-059 The cas_agent Subsystem has been started. Subsystem PID is 23003600.
machstat -f
0 0 0
The third digit corresponds to the version (version of what exactly? EPOW?)
As padmin:
rmtcpip -all
mkvdev -sea ent4 -vadapter ent3 -default ent3 -defaultid 999 -attr ha_mode=auto
ctl_chan=ent2
771001801410af03.010506
Whoops, we are about 2 years late! We can check that on IBM's Fix Central; we need to download the latest version.
This will commit all uncommitted updates to the Virtual I/O Server.
updateios -commit
updateios will apply any fix present in the directory specified, where we put the previously downloaded fixes:
Let's go root!
oem_setup_env
Now we need to unconfigure any logical devices attached to the physical adapter that we want to update:
rmdev -l ent6    # set the VLAN interface to Defined (because we are in trunk mode)
ent6 Defined
rmdev -l ent4    # set the EtherChannel interface (802.3ad mode) to Defined
ent4 Defined
Now that the physical adapter is freed, we can update its microcode.
====================================================================================
#shutdown -Fr
This command will boot up an LPAR (powering it off if it's on, so be careful), set up an IP address, and try to ping
another IP (like the NIM master server) to test network connectivity (and thus to know whether we can launch an
install/restore on it).
# Connecting to infvio2.
# Connected
# Power on complete.
# Type Location Code MAC Address Full Path Name Ping Result Device Type
ROM Level.(alterable).......010506
> We need to update the Firmware of the 5708 FCoE adapter, which is 2 levels late (last one is 010522):
In order to get LPM to work properly, we need to set the no_reserve attribute on all disks (here with the Hitachi
disk example):
/usr/DynamicLinkManager/bin/dlmchpdattr -a reserve_policy=no_reserve -s
/usr/DynamicLinkManager/bin/dlmchpdattr -o
uniquetype = disk/fcp/Hitachi
reserve_policy : no_reserve
max_transfer : 0x40000
queue_depth : 8
rw_timeout : 60
dlmpr -k
dlmpr -c
Release the reserve on the rootvg disk (I'm actually not so sure about this one, given that I had some
problems during the boot of some LPARs with boot error code 555, so please be very cautious!):
dlmpr -c hdisk0
Extra command (free of charge): changing the queue_depth parameter on all disks managed by HDLM:
/usr/DynamicLinkManager/bin/dlmchpdattr -a queue_depth=8 -s
22 TCP Inbound
If the following command doesn't show you anything, then you either have a problem with the agent you're
trying to contact, or with the common ports.
URL: service:management-software.IBM:platform-agent://10.240.122.11
URL: service:management-software.IBM:platform-agent://10.246.58.21
URL: service:management-software.IBM:platform-agent://10.246.70.11
URL: service:wbem:http://10.246.70.11:5988
URL: service:wbem:https://10.246.58.21:5989
URL: service:wbem:https://10.246.70.11:5989
URL: service:wbem:https://10.240.122.11:5989
URL: service:wbem:http://10.240.122.11:5988
URL: service:wbem:http://10.246.58.21:5988
A new update for Systems Director came out a few days ago (here are the new
features: http://publib.boulder.ibm.com/infocenter/director/pubs/index.jsp?topic=%2Fcom.ibm.director.main.helps.doc%2Ffqm0_r_whats_new_in_release_631.html)
and the update method is quite simple!
ATKUPD573I Running compliance for all new updates that were found.
ATKUSC209I The install needed task found updates that need to be installed for system
"ISD_server":
com.ibm.director.core.manager.aix_6.3.1
com.ibm.lwi.activemq.feature_8.1.1.1-LWI
com.ibm.lwi.eclipse.help.feature_8.1.1.1-LWI
ATKUSC210I This operation will install the updates listed above. To continue, type "1" for
yes or "0" for no.
ATKUPD633I The Installation Staging task has finished processing system "ISD_server".
ATKUPD795I You must manually restart the IBM Systems Director management server after
this install completes for the updates to take effect.
The starting process may take a while. Please use smstatus to check if the server is
active.
Error
Starting
Updating
Starting
Active
Unfortunately this doesn't work with AIX 5.3; you should check this out
instead: https://www.ibm.com/developerworks/community/blogs/glukacs/entry/zoning_info_script_to_follow_the_vscsi_mapped_luns1?lang=en
NPIV
# echo "vfcs" |kdb
(0)> vfcs
VSCSI
# echo "vfcs" |kdb | grep -E "NAME|vscsi"
The lwilog.sh script dynamically enables or disables the trace utility in the running application.
On the ISD server, try this:
# cd /opt/ibm/director/lwi/bin/
[ERROR|WARNING|INFO|VERBOSE|FINE|FINER|FINEST]
/opt/ibm/director/lwi/logs/error-log.html
example:
On a virtual client, this command changes the profile by adding an adapter with WWN1 (NPIV) and
WWN2 (used for LPM).
Get started with the profile
Be careful: when you include these commands in a shell script, double check your double/triple quotes
around the WWNs; it is a real pain in the ass when you write the command through ssh.
For some unknown reason, IBM decided to use the same field separator (the comma) for the list of
WWNs AND for every attribute you can set within the same command, including the attribute setting the WWNs.
Weird.
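A sketch of the quoting problem described above: because the comma between the two WWNs is also the HMC attribute separator, the adapter specification needs its own layer of double quotes, plus an extra escaping layer when sent through ssh. All values below (slot numbers, partition names, WWNs) are made-up examples:

```shell
# Build the virtual_fc_adapters attribute value with the nested quoting the HMC
# parser expects. Every field and both WWNs here are illustrative only.
wwn1="c0507605a7e90010"
wwn2="c0507605a7e90011"
spec="6/client1/2/vios1/31/$wwn1,$wwn2/0"
# Quoting layer needed locally inside the -i argument:
printf 'virtual_fc_adapters="%s"\n' "$spec"
# Extra escaping layer needed when the command goes through ssh:
printf 'virtual_fc_adapters=\\"%s\\"\n' "$spec"
```

Echoing the constructed string before running the real command is a cheap way to check the quoting survived all the shells in between.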
MPIO : Nice and easy way to check if all paths are consistent and available
On a server which has had some Fibre Channel difficulties, I always check whether there are missing or
failed paths by issuing this command (sometimes the numbers are not equal, and we have to reconfigure the
missing paths):
18 Enabled fscsi0
6 Enabled fscsi1
12 Failed fscsi1
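The count-per-state summary above can be produced by piping lspath through awk, sort and uniq; a runnable sketch over sample lspath-style lines (status, device, parent; the hdisk names are made up):

```shell
# Count paths per (status, parent adapter) pair, reproducing the summary format
# above. The sample lines mimic lspath output; hdisk names are made up.
cat <<'EOF' | awk '{print $1, $3}' | sort | uniq -c
Enabled hdisk0 fscsi0
Enabled hdisk0 fscsi1
Enabled hdisk1 fscsi0
Failed hdisk1 fscsi1
EOF
```

On a real system, replace the here-document with `lspath`.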
If there are some failed paths, maybe you should try to re-enable them (quick and painless, it can't do any harm)
with this one-liner:
paths Changed
paths Changed
paths Changed
paths Changed
paths Changed
paths Changed
paths Changed
paths Changed
paths Changed
paths Changed
paths Changed
paths Changed
18 Enabled fscsi1
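The logic of that re-enabling one-liner can be sketched as: select the Failed paths and emit a chpath command for each. Here the commands are only printed, not executed, and the sample lines (with made-up hdisk names) stand in for lspath output:

```shell
# For each Failed path, print the chpath command that would re-enable it
# (printed only, not executed; the sample lines mimic lspath output).
cat <<'EOF' | awk '$1 == "Failed" {print "chpath -l " $2 " -p " $3 " -s enable"}'
Enabled hdisk0 fscsi0
Failed hdisk1 fscsi1
Failed hdisk2 fscsi1
EOF
```

On the real system you would feed this from `lspath` and either run the generated chpath calls or pipe them into a shell.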
You can also delete some paths before re-discovering them if you have some missing or Defined paths:
We had a situation at work where we could not perform concurrent live partition migrations between our two
p795s, whatever we tried (multiple HMC/SDMC, multiple MSPs, cross-site migrations), and we
opened a PMR with IBM about it.
The IBM guy told me about the max_virtual_slots value (which is set in the partition profile), which can
cause problems if greater than 1000. A fix is on the way, according to IBM:
http://www-01.ibm.com/support/docview.wss?uid=isg1IV20409
Indeed, for various internal reasons (our adapter ID numbering policy), we had set this value to 2048 or even 4096
on our VIO servers. Big mistake: no concurrent migration could be performed.
Workaround
First of all, we need to decrease the max_virtual_slots value to, let's say, 256 in all our VIOS profiles:
But we have to stop the partition in order to load the profile with the new max_virtual_slots value.
More importantly, we need to change all the adapter IDs greater than 1000 (we numbered our FC adapter IDs
according to the client ID times 100, which grows rapidly on a p795: a client ID of 44 could give an FC adapter
ID of 4400, and to allow that, you need to increase the max_virtual_slots value) to smaller values. This is a
pain in the ass, because you have to delete/recreate the virtual FC adapter on the VIO (system and profile), do a
cfgmgr, remap the vfchost to the physical fcs, and modify the profile on the VIO AND on the virtual server to
get things proper. And pray that your multipathing is fully working. Good luck.
So we came up with a different method, using fake virtual adapters and LPM, to get smaller device IDs for
our virtual Fibre Channel adapters (and to shut down/restart the VIOS, one by one), without any downtime at all:
To modify the max_virtual_slots, I would have to change the IDs on the client side (e.g. 2101 and 2102 as server
adapter IDs), then on the VIOS side.
But without stopping the partition and altering the profile, there is only one solution:
I created fake virtual FC adapters on the VIO servers of the other frame (the target frame), with the same
IDs (2101 and 2102) and a partner adapter ID of 99, just to keep an eye on them.
I migrate the virtual server (LPM)
On arrival on the target frame, there is a cfgmgr and a check: if the adapter ID is not already in use, the
adapter keeps the same ID.
If it is already in use, it takes the first IDs available (let's say 5 and 6 are available).
Migration is complete; my client is now connected to VIO server adapter IDs 5 and 6, instead of 2101
and 2102.
I can now migrate back with my new tiny IDs (even if 5 and 6 are taken on the source side, it won't
go back to 2101; it will again be set to the next available IDs, like 7, 10, or 23, whatever).
Now (and only now that the virtual server is gone to the other side) I can change the profile on the target
VIO servers, and set max_virtual_slots to a value that keeps clear of the IBM bug, like 256, instead
of the previous 4096.
I can also delete the fake adapter used to spoof the server adapter ID (preceded by an rmdev on the
vfchost discovered by the cfgmgr run on the VIOS when the partition migrated).
The last thing I need to do is shut down each VIO server (one after another, of course) and restart it, loading
the new profile I just modified. With fully working redundancy between my VIO servers, it should not
be a problem.
> I also changed the max_virtual_slots of all the virtual servers; it was set to 255 and I changed it to 32
(the default value; it shouldn't be higher anyway).
Results
Indeed, it changes everything: we can now run 8 concurrent migrations on each p795 (4 per
MSP, which is the current limitation; I'm confident it will grow one day), as it was expected to
in the first place.
It also shortens each migration (we used to see 10-15 minutes per migration; now it is closer to
2-6 minutes).
I also discovered that even if the source/target VIO servers (not the MSPs) have adapter IDs > 1024, as long as the
MSPs have good values (256), concurrent migrations are possible. So the concurrent-migration problem is
only caused by the MSPs' max_virtual_slots value (I thought every VIOS had to be below the limit,
not only the MSPs).
Besides, I also discovered that high IDs on the VIOS side (not the MSPs) affect the duration of the
migration, even with concurrent migrations.
So with our high-ID policy, we had 2 problems in one: same cause, multiple consequences!
Hints :
If you need to know if your frame is LPM capable :
> Here, we can achieve LPM migrations (active AND inactive, which means even when the LPAR is shut down)
If you want more information about the LPM capabilities of your pSeries:
If you want to know which of your VIOS is an MSP (sorry for the ugliest grep I've ever done):
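A tidier alternative, assuming your HMC exposes the msp field of lssyscfg (check your HMC level): something like `lssyscfg -r lpar -m <frame> -F name,msp` reports the flag directly, and the filtering can be sketched on sample output:

```shell
# Keep only the partitions flagged as mover service partitions (msp=1).
# The sample "name,msp" lines below are made up.
cat <<'EOF' | awk -F, '$2 == 1 {print $1}'
vios1,1
vios2,0
vios3,1
EOF
```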