
MQ | MB | Install checklist (ext) | hastop | halink | HACMP best practices and recommendations | Perl | HACMP migration | HACMP tips | Multi-Instance | Comparison | Links | End

HACMP


HACMP configurations

Cluster Configurations :

A standby configuration is the most basic cluster configuration in which one node performs work whilst the other node acts only as standby. The standby node does not perform work and is referred to as idle; this configuration is sometimes called "cold standby".

A takeover configuration is a more advanced configuration in which all nodes perform some kind of work, and critical work can be taken over in the event of a node failure. A "one-sided takeover" configuration is one in which a standby node performs some additional, non-critical and non-movable work. This is rather like a standby configuration, but with (non-critical) work being performed by the standby node. A "mutual takeover" configuration is one in which all nodes are performing highly available (movable) work. This type of cluster configuration is also sometimes referred to as "Active/Active" to indicate that all nodes are actively processing critical workload.

HACMP, VCS, ServiceGuard, Heartbeat and MSCS all use a "shared nothing" clustering architecture. A shared nothing cluster has no concurrently shared resources, and works by transferring ownership of resources from one node to another, to work around failures or in response to operator commands. Resources are things like disks, network addresses, or critical processes.

Configuration :

All HA products have the concept of a unit of failover. This is a set of definitions that contains all the processes and resources needed to deliver a highly available service and ideally should contain only those processes and resources.

In HACMP, the unit of failover is called a resource group. On other HA products the name might be different, but the concept is the same. On VCS, it is known as a service group, on MC/ServiceGuard it is a package, in Heartbeat it is a resource group and in MSCS it is a group. The smallest unit of failover for WMQ is a queue manager, since you cannot move part of a queue manager without moving the whole thing. It follows that the optimal configuration is to place each queue manager in a separate resource group, with the resources upon which it depends. The resource group should therefore contain the shared disks used by a queue manager, which should be in a volume group or disk group reserved exclusively for the resource group, the IP address used to connect to the queue manager (the service address) and an object which represents the queue manager.

Failover - Invoking a secondary system to take over when the primary system fails.

HACMP software samples

Required HA cluster software examples:

When not to use HA WebSphere MQ queue manager clusters

HA WebSphere MQ queue manager clusters require additional proprietary HA hardware (shared disks) and external HA clustering software (such as HACMP). This increases the administration costs of the environment because you also need to administer the HA components. This approach also increases the initial implementation costs because extra hardware and software are required. Therefore, balance these initial costs with the potential costs incurred if a queue manager fails and messages become trapped.
If trapped messages are not a problem for the applications (for example, the response time of the application is irrelevant or the data is updated frequently), then HA WebSphere MQ queue manager clusters are probably not required.

General recommendations

Some of the advice pertinent to an HA environment in general is:

Highly Available WebSphere Business Integration Solutions, SG24-6328-00, chapter 8.2, page 122.


MC91 - HA for MQ

This SupportPac has now been withdrawn. The support is now included in the WebSphere MQ V7.0.1 product and documentation.

url MC91 - high availability for MQ on UNIX. Install it into /MQHA/bin: the sample scripts assume that location.

This SupportPac provides notes and sample scripts to assist with the installation and configuration of WebSphere MQ (WMQ) V6 and V7 in High Availability (HA) environments. Three different platforms and environments are described here, but they share a common design and this design can also be extended for many other systems.

Specifically this SupportPac deals with the following HA products:

MC91 installation :

16/03/2009  20:40  1.310.445  mc91.tar.Z

{mqm - /MQHA/bin/} $ uncompress mc91.tar.Z
{mqm - /MQHA/bin/} $ tar -xvf mc91.tar

MC91 configuration :

[1] Configure the HA Cluster
  1. Configure TCP/IP on the cluster nodes for HACMP. Remember to configure ~root/.rhosts, /etc/rc.net, etc.
  2. Configure the cluster, cluster nodes and adapters to HACMP as usual.
  3. Synchronise the Cluster Topology.
[2] Configure the shared disks

This step creates the volume group (or disk group) and filesystems needed for the queue manager. So that this queue manager can be moved from one node to another without disrupting any other queue managers, you should designate a group containing shared disks which is used exclusively by this queue manager and no others. For performance, it is recommended that a queue manager uses separate filesystems for logs and data. The suggested layout therefore creates two filesystems within the volume group.

You can optionally protect each of the filesystems from disk failures by using mirroring or RAID.

Mount points must all be owned by the mqm user.

You will need the following filesystems: /MQHA/<qmgr>/data and /MQHA/<qmgr>/log.

The steps are :

  1. Create the volume group that will be used for this queue manager's data and log files.
  2. Create the /MQHA/<qmgr>/data and /MQHA/<qmgr>/log filesystems using the volume group created above.
  3. For each node in turn, import the volume group, vary it on, ensure that the filesystems can be mounted, then unmount the filesystems and vary off the volume group (a command sketch follows below).
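A minimal AIX command sketch of these steps, assuming a hypothetical volume group mqvg on hdisk2 and a queue manager QM1 (adapt names, disks and sizes to your environment):

  # On the first node: create the volume group and the two filesystems
  mkvg -y mqvg -s 64 hdisk2
  crfs -v jfs2 -g mqvg -m /MQHA/QM1/data -a size=2G
  crfs -v jfs2 -g mqvg -m /MQHA/QM1/log -a size=1G
  mount /MQHA/QM1/data ; mount /MQHA/QM1/log
  chown -R mqm:mqm /MQHA/QM1/data /MQHA/QM1/log     # mount points must be owned by mqm
  umount /MQHA/QM1/data ; umount /MQHA/QM1/log
  varyoffvg mqvg

  # On each of the other nodes in turn:
  importvg -y mqvg hdisk2
  varyonvg mqvg
  mount /MQHA/QM1/data ; mount /MQHA/QM1/log        # check they mount cleanly
  umount /MQHA/QM1/data ; umount /MQHA/QM1/log
  varyoffvg mqvg
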
[3] Create the Queue Manager
  1. Select a node on which to perform the following actions
  2. Ensure the queue manager's filesystems are mounted on the selected node.
  3. Create the queue manager on this node, using the hacrtmqm script
  4. Start the queue manager manually, using the strmqm command
  5. Create any queues and channels
  6. Test the queue manager
  7. End the queue manager manually, using endmqm
  8. On the other nodes, which may take over the queue manager, run the halinkmqm script
[4] Configure the movable resources

The resource group will use the IP address as the service label. This is the address which clients and channels will use to connect to the queue manager.

  1. Create a resource group and select the type as discussed above.
  2. Configure the resource group in the usual way adding the service IP label, volume group and filesystem resources to the resource group.
  3. Synchronise the cluster resources.
  4. Start HACMP on each cluster node in turn and ensure that the cluster stabilizes, that the respective volume groups are varied on by each node and that the filesystems are mounted correctly.
[5] Configure the Application Server or Agent

The queue manager is represented within the resource group by an application server or agent.

  1. Define an application server which will start and stop the queue manager. The start and stop scripts contained in the SupportPac may be used unmodified, or may be used as a basis from which you can develop customized scripts. The examples are called hamqm_start and hamqm_stop.
  2. Add the application server to the resource group definition created in the previous step.
  3. Optionally, create a user exit in /MQHA/bin/rc.local
  4. Synchronise the cluster configuration.
  5. Test that the node can start and stop the queue manager, by bringing the resource group online and offline.
[6] Configure a monitor

You can configure an application monitor which will monitor the health of the queue manager and trigger recovery actions as a result of MQ failures, not just node or network failures. Recovery actions include the ability to perform local restarts of the queue manager or to cause a failover of the resource group to another node.

To benefit from queue manager monitoring you must define an Application Monitor. If you created the queue manager using hacrtmqm, then one of these will have been created for you, in the /MQHA/bin directory, and is called hamqm_applmon.$qmgr.

  1. To enable queue manager monitoring, define a custom application monitor for the Application Server created in previous step, providing the name of the monitor script and tell HACMP how frequently to invoke it. Set the stabilisation interval to 10 seconds, unless your queue manager is expected to take a long time to restart. This would normally be if your environment has long-running transactions that might cause a substantial amount of recovery/replay to be required.
  2. To configure for local restarts, specify the Restart Count and Restart Interval.
  3. Synchronise the cluster resources.
  4. Test the operation of the application monitoring, and in particular verify that the local restart capability is working as configured. A convenient way to provoke queue manager failures is to identify the Execution Controller process (called amqzxma0) associated with the queue manager, and kill it (for example, as shown below).
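For example, assuming a queue manager named QM1, the Execution Controller can be killed like this:

  ps -ef | grep amqzxma0 | grep QM1 | grep -v grep | \
      awk '{print $2}' | xargs kill -9
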

Conclusion : the files we have to copy into /hacmp/, and adapt for our system, are :

Then, using smitty, we have to


IC91 - HA for MB

url IC91 - high availability for MB on distributed platforms. Install into /MQHA/bin : samples use it.

A broker runs as a pair of processes, called bipservice and bipbroker. The latter in turn creates the execution groups that run message flows. It is this collection of processes that is managed by the HA software.

When creating the queue manager, don't configure the application server or application monitor described in SupportPac MC91. You will create an application server that covers the broker, queue manager and broker database instance.

When creating channels between queue managers, the sender channel should use the service address of the broker resource group and the broker queue manager's port number.

[0] Configure the HA Cluster
  1. Configure TCP/IP on the cluster nodes as described in your cluster software documentation.
  2. Configure the cluster, cluster nodes and adapters to HA Software as usual.
  3. Synchronise the Cluster Topology.
  4. Now would be a good time to create and configure the user accounts that will be used to run the database instances, brokers and UNS. Home directories, (numeric) user ids, passwords, profiles and group memberships should be the same on all cluster nodes.
[1] Create and configure the queue manager
  1. On one node, create a clustered queue manager as described in SupportPac MC91, using the hacrtmqm command. Use the volume group that you created for the broker and place the volume group and queue manager into a resource group to which the broker will be added. Don't configure the application server or application monitor described in SupportPac MC91 - you will create an application server that covers the broker, queue manager and broker database instance.
  2. Set up queues and channels between the broker queue manager and the Configuration Manager queue manager (an MQSC sketch follows this list):
    • On the Configuration Manager queue manager create a transmission queue for communication to the broker queue manager. Ensure that the queue is given the same name and case as the broker queue manager. The transmission queue should be set to trigger the sender channel.
    • On the Configuration Manager queue manager create a sender and receiver channel for communication with the broker queue manager. The sender channel should use the service address of the broker resource group and the broker queue manager's port number.
    • On the broker queue manager create a transmission queue for communication to the Configuration Manager queue manager. Ensure that the queue is given the same name and case as the Configuration Manager queue manager. The transmission queue should be set to trigger the sender channel.
    • On the broker queue manager create sender and receiver channels to match those just created on the Configuration Manager queue manager. The sender channel should use the IP address of the machine where the Configuration Manager queue manager runs, and the corresponding listener port number.
  3. If you are using a UNS, set up queues and channels between the broker queue manager and the UNS queue manager:
    • On the broker queue manager create a transmission queue for communication to the UNS queue manager. Ensure that the queue is given the same name and case as the UNS queue manager. The transmission queue should be set to trigger the sender channel.
    • On the broker queue manager create a sender and receiver channel for communication with the UNS queue manager. If the UNS is clustered, the sender channel should use the service address of the UNS resource group and the UNS queue manager's port number.
    • On the UNS queue manager create a transmission queue for communication to the broker queue manager. Ensure that the queue is given the same name and case as the broker queue manager. The transmission queue should be set to trigger the sender channel.
    • On the UNS queue manager create a sender and receiver channel for communication with the broker queue manager, with the same names as the receiver and sender channel just created on the broker queue manager. The sender channel should use the service address of the broker resource group and the broker queue manager's port number.
  4. Test that the above queue managers can communicate regardless of which node owns the resource groups they belong to.
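A minimal MQSC sketch of the definitions in step 2, assuming hypothetical queue manager names CFGMGRQM and BRKQM, channels named after them, and port 1414 (adapt names, addresses and ports to your environment):

  * On the Configuration Manager queue manager (CFGMGRQM):
  DEFINE QLOCAL(BRKQM) USAGE(XMITQ) TRIGGER TRIGTYPE(FIRST) INITQ(SYSTEM.CHANNEL.INITQ) TRIGDATA(CFGMGRQM.TO.BRKQM) REPLACE
  DEFINE CHANNEL(CFGMGRQM.TO.BRKQM) CHLTYPE(SDR) TRPTYPE(TCP) CONNAME('broker-service-address(1414)') XMITQ(BRKQM) REPLACE
  DEFINE CHANNEL(BRKQM.TO.CFGMGRQM) CHLTYPE(RCVR) TRPTYPE(TCP) REPLACE

  * On the broker queue manager (BRKQM):
  DEFINE QLOCAL(CFGMGRQM) USAGE(XMITQ) TRIGGER TRIGTYPE(FIRST) INITQ(SYSTEM.CHANNEL.INITQ) TRIGDATA(BRKQM.TO.CFGMGRQM) REPLACE
  DEFINE CHANNEL(BRKQM.TO.CFGMGRQM) CHLTYPE(SDR) TRPTYPE(TCP) CONNAME('cfgmgr-host(1414)') XMITQ(CFGMGRQM) REPLACE
  DEFINE CHANNEL(CFGMGRQM.TO.BRKQM) CHLTYPE(RCVR) TRPTYPE(TCP) REPLACE

The step 3 definitions for the UNS queue manager follow the same pattern.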
[2] Create and configure the broker database

There are two options regarding where the broker database is run, either inside or outside the cluster. If you choose to run the database outside the cluster then simply follow the instructions in the WMB documentation for creating the broker database but ensure that you consider whether the database is a single point of failure and make appropriate provision for the availability of the database.

[3] Create the message broker
  1. Create the broker on the node hosting the logical host using the hamqsicreatebroker command.
  2. Ensure that you can start and stop the broker manually using the mqsistart and mqsistop commands.
  3. On any other nodes in the resource group's nodelist (i.e. excluding the one on which you just created the broker), run the hamqsiaddbrokerstandby command to create the information needed by these nodes to enable them to host the broker.
[4] Place the broker under cluster control
  1. Create an application server which will run the broker, its queue manager and the database instance, using the example scripts provided in this SupportPac. The example scripts are called hamqsi_start_broker_as and hamqsi_stop_broker_as.
  2. You can also specify an application monitor using the hamqsi_applmon.<broker> script created by hamqsicreatebroker. An application monitor script cannot be passed parameters, so just specify the name of the monitor script. Also configure the other application monitor parameters, including the monitoring interval and the restart parameters you require.
  3. Synchronise the cluster resources.
  4. Ensure that the broker, queue manager and database instance are stopped, and start the application server.
  5. Check that the components started and test that the resource group can be moved from one node to the other and that they run correctly on each node.
  6. Ensure that stopping the application server stops the components.
  7. With the application server started, verify that the HACMP local restart capability is working as configured. A convenient way to cause failures is to identify the bipservice for the broker and kill it.

MQ and HA

HisCock HA MQ whitepaper

HA in Clustering

A key problem in using MQ clustering for high availability is the problem of stuck messages (in Xmit queue).

Message expiry. If the message isn't delivered to its target by the time the end-user would have timed out, get it to self-destruct.

(fjb_saper) Easy answer: MQ clustering => load balancing. Hardware clustering => HA (high availability).

If you've got an HA setup there are 2 main options:

  1. Use HA software so that "the queue manager" is presented on a single IP/port no matter where it happens to be running, so the entry in the TAB file is always valid.
  2. Use the TAB file to define multiple instances of "THEQM" which identify QMA, QMB, etc. (an MQSC sketch follows below)
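A hedged MQSC sketch of option 2, assuming hypothetical queue managers QMA/QMB on hosta/hostb, the generic name THEQM, and SVRCONN channels with the same names already defined on QMA and QMB; define all the CLNTCONN entries on one queue manager so that its AMQCLCHL.TAB can be copied to the clients:

  DEFINE CHANNEL(THEQM.A) CHLTYPE(CLNTCONN) TRPTYPE(TCP) CONNAME('hosta(1414)') QMNAME(THEQM) REPLACE
  DEFINE CHANNEL(THEQM.B) CHLTYPE(CLNTCONN) TRPTYPE(TCP) CONNAME('hostb(1414)') QMNAME(THEQM) REPLACE

Clients then connect specifying the queue manager name *THEQM (the leading asterisk means "any queue manager in the group") and pick up whichever entry is reachable.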

Complete (MQ&MB) Schema

The first schema looks like this:

.-------------------.     .-------------------.
|                   |     |                   |
|  AIX-1 (active)   |     |  AIX-2 (passive)  |
|                   |     |                   |
|    .---------.    |     |    .---------.    |
|    |         |    |     |    |         |    |
|    | MB1(a)  |    |     |    | MB1(p)  |    |
|    |         |    |     |    |         |    |
|    .---------.    |     |    .---------.    |
|                   |     |                   |
|    .---------.    |     |    .---------.    |
|    |         |    |     |    |         |    |
|    | QM1(a)  |    |     |    | QM1(p)  |    |
|    |         |    |     |    |         |    |
|    .---------.    |     |    .---------.    |
|                   |     |                   |
.-------------------.     .-------------------.

We have an active machine, AIX-1, running QM1 and MB1, and a passive machine, AIX-2, which is almost always stopped.

So, in order to improve the utilization of this second machine, we can create a second queue manager and a second broker on AIX-2, and place their backup replicas on AIX-1:

.--------------------------------.     .--------------------------------.
|                                |     |                                |
|         AIX-1 (active)         |     |         AIX-2 (active)         |
|                                |     |                                |
|    .---------.                 |     |    .---------.                 | \
|    |         |                 |     |    |         |                 |  |
|    | MB1(a)  |                 |     |    | MB1(p)  |                 |  |
|    |         |                 |     |    |         |                 |  |
|    .---------.                 |     |    .---------.                 |  |
|                                |     |                                |  |  => Service address 1
|    .---------.                 |     |    .---------.                 |  |
|    |         |                 |     |    |         |                 |  |
|    | QM1(a)  |                 |     |    | QM1(p)  |                 |  |
|    |         |                 |     |    |         |                 |  |
|    .---------.                 |     |    .---------.                 | /
|                                |     |                                |
|    .---------.                 |     |    .---------.                 | \
|    |         |                 |     |    |         |                 |  |
|    | MB2(p)  |                 |     |    | MB2(a)  |                 |  |
|    |         |                 |     |    |         |                 |  |
|    .---------.                 |     |    .---------.                 |  |
|                                |     |                                |  |  => Service address 2
|    .---------.                 |     |    .---------.                 |  |
|    |         |                 |     |    |         |                 |  |
|    | QM2(p)  |                 |     |    | QM2(a)  |                 |  |
|    |         |                 |     |    |         |                 |  |
|    .---------.                 |     |    .---------.                 | /
|                                |     |                                |
.--------------------------------.     .--------------------------------.

Finally, we join QM1 and QM2 in an MQ cluster, so that while one machine is moving to its backup image, the source messages are still being processed.

An n+1 architecture is also possible: "n" machines running, plus one more acting as the backup of all those "n" machines - assuming they fail one at a time!

Install MC91 (HACMP for MQ) first, and then IC91 (HACMP for MB).


Complete list : MB_HACMP (ext, ***)

Install checklist : MB_HACMP


/MQHA/bin/hamqproc

/MQHA/bin/hamqproc contains the list of processes to be killed by hamqm_stop_su :

for process in `cat /MQHA/bin/hamqproc`
do
    ps -ef | grep $process | grep -v grep | \
        egrep "$srchstr" | awk '{print $2}' | \
        xargs kill -9
done
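A sample /MQHA/bin/hamqproc could simply list, one name per line, the same processes that the original stop script hard-codes:

  runmqlsr
  amqpcsea
  amqhasmx
  amqharmx
  amqzllp0
  amqzlaa0
  runmqchi
  amqrrmfa
  amqzxma0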

WMQ in HA Clusters - best practices

TMM04 {BCN}


Perl
You will need to ensure the shebang line points to the local Perl interpreter.
So you may need to set it to #!/usr/bin/perl or #!/bin/perl or even #!/usr/perl - whatever the local standard is.

How do you find the "local standard"?
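A quick way to check (the paths listed are just the usual candidates):

  which perl
  whereis perl
  ls -l /usr/bin/perl /bin/perl /usr/perl 2>/dev/null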


HACMP "stop" script
#!/bin/ksh
# DESCRIPTION:
#   /MQHA/bin/ha_mqm_stop_su <qmname>
#
# Stops the QM.

QM=$1                     # queue manager name passed as the first argument

# Check to see if the QM is already stopped.
# If so, just make sure no processes are lying around.
online=`/MQHA/bin/hamqm_running ${QM}`
if [ ${online} != "1" ]
then
    # QM is reported as offline; ensure no processes remain.
    # Note that this whole script should be executed under su,
    # which is why there's no su in the following loop.
    # The regular expression in the next line contains a tab character.
    # Edit only with tab-friendly editors.
    srchstr="( |-m)$QM[ ]*.*$"
    for process in runmqlsr amqpcsea amqhasmx amqharmx amqzllp0 \
                   amqzlaa0 runmqchi amqrrmfa amqzxma0
    do
        ps -ef | grep $process | grep -v grep | \
            egrep "$srchstr" | awk '{print $2}' | \
            xargs kill -9
    done
    exit 0
fi

A newer version of the stop_su script does the same but reads the process names from a file: see here.

HACMP "link" script

The core is :

# Args:
#   $1: Qmgr name
#   $2: Mangled qmgr directory name -- may or may not be the same as qmgr
#   $3: Shared Prefix -- e.g. /MQHA/<qmgr>/data

if [ -r $3/qmgrs/$2/qm.ini ]
then
    # We're running on the master node that owns the queue manager
    # so we will create symlinks back to /var/mqm/ipc subdirs
    for topdir in @ipcc @qmpersist @app
    do
        for subdir in esem isem msem shmem spipe
        do
            rm -fr $ipcorig/$subdir
            rm -fr $ipcorig/$topdir/$subdir
            ln -fs $ipcbase/$subdir $ipcorig/$subdir
            ln -fs $ipcbase/$topdir/$subdir $ipcorig/$topdir/$subdir
        done
    done
    rm -rf $ipcorig/qmgrlocl
    ln -fs $ipcbase/qmgrlocl $ipcorig/qmgrlocl
else
    # We're running on a standby node, so all we have to do is to
    # update the config file that tells us where the queue manager lives
    cat >> /var/mqm/mqs.ini <<EOF
QueueManager:
   Name=$1
   Prefix=$3
   Directory=$2
EOF
fi
HACMP "MQ monit" script
"simple" one

Just does a "ping qmgrname" :

dy0608:/MQHA/bin # more hamqm_applmon.QMPROD01
#!/bin/ksh
su mqm -c /MQHA/bin/hamqm_applmon_su QMPROD01

dy0608:/MQHA/bin # more hamqm_applmon_su
#!/bin/ksh
QM=$1

# Test the operation of the QM.
echo "ping qmgr" | runmqsc ${QM} > /dev/null 2>&1
pingresult=$?
# pingresult will be 0 on success; non-zero on error (man runmqsc)
if [ $pingresult -eq 0 ]
then
    # ping succeeded
    echo "hamqm_applmon: Queue Manager ${QM} is responsive"
    result=0
else
    # ping failed
    result=$pingresult
fi
exit $result

New alternative : dspmq -n <qmname> | grep "RUNNING"

"complex" one

Verifies that a few key processes are still running:

Check_qmgr:
    # Check for the main processes
    for pid in amqzxma0 amqhasmx amqzllp0
    do
        if ps -u mqm -o pid,args | eval /usr/xpg4/bin/grep -E '$PATTERN' | \
           grep -w $pid > /dev/null
        then
            rc=0
        else
            rc=1
        fi
    done

Thanks, Vicente!

HACMP "MB monit" script
STATE="stopped"
#
cnt=`ps -ef | grep db2sysc | grep -v grep | grep $DBINST | wc -l`
if [ $cnt -gt "0" ]
then
    # Found one or more db2sysc processes, so the database instance is
    # assumed to be running normally
    echo "hamqsi_monitor_broker_as: Broker database is running"
    STATE="started"
else
    # Did not find a db2sysc process, but check to see whether db2start is
    # still running and only report an error if there is not one.
    cnt=`ps -ef | grep db2start | grep -v grep | grep $DBINST | wc -l`
    if [ $cnt -gt "0" ]
    then
        echo "hamqsi_monitor_broker_as: Broker database is starting"
        STATE="starting"
    else
        echo "hamqsi_monitor_broker_as: Broker database is not running correctly"
        STATE="stopped"
    fi
fi

# Decide whether to continue or to exit
case $STATE in
stopped)
    echo "hamqsi_monitor_broker_as: Database instance ($DBINST) is not running correctly"
    exit 1
    ;;
starting)
    echo "hamqsi_monitor_broker_as: Database instance ($DBINST) is starting"
    echo "hamqsi_monitor_broker_as: WARNING - Stabilisation Interval may be too short"
    echo "hamqsi_monitor_broker_as: WARNING - No test of broker $BROKER will be conducted"
    exit 0
    ;;
started)
    echo "hamqsi_monitor_broker_as: Database instance ($DBINST) is running"
    continue      # proceed by testing broker
    ;;
esac

# ------------------------------------------------------------------
# Check the MQSI Broker is running
#
# Re-initialise STATE for safety
STATE="stopped"
#
# The broker runs as a process called bipservice, which is responsible for
# starting and re-starting the admin agent process (bipbroker).
# The bipbroker is responsible for starting any DataFlowEngines.
# If no execution groups have been assigned to the broker there will be no
# DataFlowEngine processes.
# There should always be a bipservice and bipbroker process pair.
# This monitor script only tests for bipservice, because bipservice should
# restart bipbroker if necessary - the monitor script should not attempt to
# restart bipbroker, and it may be premature to report the absence of a
# bipbroker as a failure.
cnt=`ps -ef | grep "bipservice $BROKER" | grep -v grep | wc -l`
if [ $cnt -eq 0 ]
then
    echo "hamqsi_monitor_broker_as: MQSI Broker $BROKER is not running"
    STATE="stopped"
else
    echo "hamqsi_monitor_broker_as: MQSI Broker $BROKER is running"
    STATE="started"
fi

# Decide how to exit
case $STATE in
stopped)
    echo "hamqsi_monitor_broker_as: Broker ($BROKER) is not running correctly"
    exit 1
    ;;
started)
    echo "hamqsi_monitor_broker_as: Broker ($BROKER) is running"
    exit 0
    ;;
esac

HA logs

An easy way to monitor cluster events and messages is to tail the HACMP log files, for example /tmp/hacmp.out.

Application monitoring in HACMP has its own set of log files.


HA sanity tests

Manual system start up tests :

  1. Node 1 and node 2 are both active:
    1. disable the cluster on both nodes by using smitty clstop
    2. stopping the cluster unmounts the shared drives. Mount the shared drives.
    3. start QM1 on node 1
    4. start QM2 on node 2
    5. observe the results to verify that no errors are reported during MQ operation.
  2. Node 2 is the only active node:
    1. start QM1 on node 2
    2. start QM2 on node 2
    3. observe the results to verify that no errors are reported during MQ operation.
  3. Node 1 is the only active node:
    1. start QM2 on node 1
    2. start QM1 on node 1
    3. observe the results to verify that no errors are reported during MQ operation.
Verify HA configuration

The tests cases for verifying the HA configuration are:

  1. Shared files - the following files should be located in shared directories:
    • MQ logs
    • Queue manager data for every queue manager should reside in a shared location.

/MQHA/<qmgr>/data and /MQHA/<qmgr>/log

Separate disks for data files and logs - while not essential for HA reasons, it is recommended for performance

Test HA control

The objective of this test suite is to verify if HACMP is able to start, restart, and monitor all the individual applications that are part of the cluster.

  1. Automatic system startup/restart under HACMP control
    • end message broker gracefully : "mqsistop -i MB_NAME"
    • end message broker abruptly
      ps -ef | grep bipservice | grep -v grep | \ awk '{print $2}' | xargs kill -9
    • end queue manager gracefully : "endmqm -i QM_NAME"
    • end queue manager abruptly
      ps -ef | grep AMQXSSVN.EXE | grep -v grep | \ awk '{print $2}' | xargs kill -9
    • end queue manager abruptly : kill "AMQZXMA0", Execution Controller process. url
    • disable qmgr restarting by:
      • changing the owner of AMQERR01.LOG
      • chmod 400 of active log file S0000000.LOG
      • renaming general mqs.ini or specific qm.ini

  2. Restart attempts setting

    Verify the number of retry attempts. Each WebSphere Business Integration application will be restarted three times before a resource group failover is initiated. The number of retry attempts can be configured in HACMP.

  3. Failover

  4. Fallback

    Fallback refers to the movement of a resource group from a secondary or a failover node to the primary node, which is being reintegrated into the cluster. In the current WebSphere Business Integration cluster, automatic fallback is disabled. However, manual reintegration should be validated:

    1. Bring node 1 down:
      shutdown -r now
    2. Node 1 resource group fails over to node 2.
    3. Verify failover by looking at the HACMP log file:
      tail -f /tmp/hacmp.out
    4. Start up cluster on node 1 after node 1 is back up:
      smitty clstart
    5. Observe that the resource groups are still running on node 2 even though the cluster on node 1 is back up.
    6. Repeat the above test for node 2 fallback.

Migration / maintenance

Assuming a two-node active/active cluster, the steps are

  1. Select one machine to upgrade first
  2. At a suitable time, when the moving of a queue manager will not cause a serious disruption to service, manually force a migration of the active queue manager to its partner node
  3. On the machine that is now running both queue managers, disable the failover capabilities for the queue managers.
  4. Upgrade the software on the machine that is not running any queue managers
  5. Re-enable failover, and move both queue managers across to the newly upgraded machine
  6. Disable failover again
  7. Upgrade the original box
  8. Re-enable failover
  9. When it will cause least disruption, move one of the queue managers across to balance the workload

HACMP tips

FileSystem Requirements

What are the requirements for a HACMP (MQ) filesystem ?

And using Multi-Instance ? NFS v4 !


Multi-instance MQ & MB

multi-instance queue managers - good intro.

In multi-instance terminology, there is an Active qmgr and a Standby qmgr that are both running and watching the file locks. Read the NFSv4 specs, RFC 3530: lease period

Increase messaging availability : url

Creating a multi-instance qmgr on Linux

Both machines have a different IP!
So, page 19 says "do NOT use multi-instance queue managers as full repositories", but here it says "if you still need better availability, consider hosting the full repository queue managers as multi-instance queue managers"!

CONNAME has been expanded to support more than one "ipaddress(port)" combination, across all channel types that use it :

define channel(CH_NAME) chltype(SDR) trptype(TCP) xmitq(XQN) conname('<ip>(<port>)') replace

Sample:

DEFINE CHANNEL(CHANNEL1) CHLTYPE(CLNTCONN) TRPTYPE(TCP) CONNAME('server1(2345),server2(2345)') QMNAME(QM1) REPLACE

Mind the 48 character limit for "CONNAME" !

developerWorks : complete sample, part 2 ; creating a multi-instance queue manager for MQ on Linux.

  1. create shared directories : url
  2. create multi-instance MQ : url
  3. create multi-instance MB : url

When you intend to use a queue manager as a multi-instance queue manager, create a single queue manager on one of the servers using the WebSphere MQ crtmqm command, placing its queue manager data and logs in shared network storage. On the other server, rather than create the queue manager again, use the WebSphere MQ addmqinf command to create a reference to the queue manager data and logs on the network storage.

You can now run the queue manager from either of the servers. Each of the servers references the same queue manager data and logs; there is only one queue manager, and it is active on only one server at a time.

Once the standby instance has started, you can swap the active instance to the other server by stopping the active instance with the switchover option, which transfers control to the standby.
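For example, assuming a queue manager called QM1 (a hedged sketch):

  dspmq -x -m QM1      # show the instances of QM1 and which is active / standby
  endmqm -s QM1        # end the active instance, switching over to the standby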

The active instance of QM1 has exclusive access to the shared queue manager data and logs folders when it is running. The standby instance of QM1 detects when the active instance has failed, and becomes the active instance. It takes over the QM1 data and logs in the state they were left by the active instance, and accepts reconnections from clients and channels. The active instance might fail for various reasons that result in the standby taking over:

You can add the queue manager configuration information to multiple servers, and choose any two servers to run as the active/standby pair.

A multi-instance queue manager is one part of a high availability solution. You need some additional components to build a useful high availability solution.

WebSphere MQ Clients and channels reconnect automatically to the standby queue manager when it becomes active. Reconnection, and the other components in a high availability solution are discussed in related topics. Automatic client reconnect is not supported by WebSphere MQ classes for Java.

MQ, MB.

NFS specs and samples

Filesystem requisites : The storage must be accessed by a network file system protocol which is Posix-compliant and supports lease-based locking. Network File System version 4 (NFS v4) satisfies this requirement. Also "NAS" or "GPFS".

Probe Id ZX155001: If you are using the NFS V4 file system as the shared file system, you must use hard mounts and synchronous writes, and disable write caching, to fulfill these requirements.

Verification tool : amqmfsck

Required highly available network-attached storage (NAS) examples:

2-instance creation summary

Summary:

  1. Set up shared filesystems for QM data and logs
  2. Create the queue manager on machine1
    crtmqm -md /shared/qmdata -ld /shared/qmlog QM1
  3. Define the queue manager on machine2 (or edit mqs.ini)
    addmqinf -vName=QM1 -vDirectory=QM1 -vPrefix=/var/mqm -vDataPath=/shared/qmdata/QM1
  4. Start an instance on machine1 - it becomes Active
    strmqm -x QM1
  5. Start another instance on machine2 - it becomes Standby
    strmqm -x QM1
Filesystem verification tool

Mind "File System Check tool" : amqmfsck, ( applies only to UNIX and IBM i systems ). Details here

Verifying the multi-instance queue manager on Linux

Use the sample programs amqsghac, amqsphac and amqsmhac to verify a multi-instance queue manager configuration.

url
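For example (hypothetical queue Q1 on QM1): start the samples against the active instance, force a switchover or failover, and watch them report their reconnection:

  amqsphac Q1 QM1      # puts a sequence of messages, reporting any reconnections
  amqsghac Q1 QM1      # gets the messages back, reporting any reconnections

amqsmhac copies messages between two queues in the same way; see the linked article for its parameters.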

Conversion (unix)

Implementation considerations for Multi-Instance queue managers in MQ cluster environment : How to convert queue manager to be multi-instance

Windows domains and multi-instance queue managers

The only way to ensure each of the servers running queue manager instances use the same local mqm group with the same SID as the owner of the queue manager data and log directories on the file server is to make the local mqm group a domain local group.

In order to use domain local groups, you must run multi-instance queue managers on a domain controller. On a domain controller all local groups are implicitly domain local groups.

url

Failover mechanism

How does the standby queue manager take over?

Actions that cause a failover. Failover of a multi-instance queue manager can be triggered by hardware or software failures, including networking problems which prevent the queue manager writing to its data or log files. To be confident that a shared file system will provide integrity and work with a multi-instance queue manager when such a problem occurs unexpectedly, test all possible failure scenarios. A list of actions that would cause a failover includes:

Two IPs

A remote client can access the multi-instance qmgr with:

DEFINE CHANNEL(CHANNEL1) CHLTYPE(SVRCONN) TRPTYPE(TCP) MCAUSER('mqm') REPLACE
DEFINE CHANNEL(CHANNEL1) CHLTYPE(CLNTCONN) TRPTYPE(TCP) CONNAME('ipaddr1(1414),ipaddr2(1414)') QMNAME(QM1) REPLACE
START CHANNEL(CHANNEL1)
Multi-instance MB

MB starts/stops as an MQ service ...

Configuring a multi-instance Message Broker for High Availability support :
A multi-instance broker is created using the mqsicreatebroker command, with an additional -e option that specifies the location in shared network storage of the broker registry and other configuration data. Additional instances of the broker can then be created on other machines in the network using a new command called mqsiaddbrokerinstance, using the -e option to target the same location in shared network storage. Broker logging, error handling and shared Java Classes remain local to the machine that hosts the broker or broker instance.
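A hedged command sketch (the broker, queue manager and shared-path names are hypothetical):

  # On the first machine: create the multi-instance broker with its registry on shared storage
  mqsicreatebroker MB1 -q QM1 -e /shared/mqsi/MB1

  # On each additional machine: add an instance pointing at the same shared location
  mqsiaddbrokerinstance MB1 -e /shared/mqsi/MB1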

Configuring a WebSphere Message Broker to run in multi-instance mode

Active/active multi-instance MB

You can create an ACTIVE/ACTIVE scenario using multi-instance brokers/queue managers. This would be set up along the lines described above:

QMA & QMB working in a cluster to provide load balancing.

HA manager

Using a broker with an existing high availability manager, using a broker with an existing Windows cluster


Multi-instance or HA cluster?

Multi-instance queue manager advantages

HA cluster advantages

Storage distinction


Links

KC : IIB Active/passive HA, IIB Active/active HA for HTTP, MQ HA Cluster configurations, MQDev blog on attaching MQ clients to active/active qmgrs, Testing and support statement for multi-instance

HACMP-MB is IC91 : High Availability for WebSphere Message Broker on Distributed Platforms

Redbooks : Highly Available WebSphere Business Integration Solutions. High Availability in WebSphere Messaging Solutions.

All Support Packs

HACMP on AIX : complete list (***), migration checklist

HACMP install, etc

Impact 2008 HA.

HA & disaster recovery chat


Updated 20141223 (a)  