Purpose : workload balancing, simplified administration, scalability
Problem : message(s) stuck on the transmission queue(s)!
Requirement : "DEFINE QL(name) DEFBIND(NOTFIXED)" or MQOO_BIND_NOT_FIXED, instead of "DEFBIND(OPEN)"
An MQ cluster is not something you define as such; you define cluster attributes in each queue manager, and each queue manager becomes a member of a logical entity that is referred to as a "queue manager cluster".
A cluster typically has two full repositories (FRs) and many partial repositories (PRs).
An FR holds information about the cluster topology: the participating qmgrs and the shared queues.
How do you add a queue manager as a partial repository, other than by creating a cluster-sender and a cluster-receiver channel? You don't. A qmgr becomes a partial repository (PR) when an object (queue, channel, etc.) is defined with a CLUSTER() attribute naming the cluster in which the object is to be known.
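The join just described can be sketched in MQSC. This is a minimal sketch: the host names, port numbers and the queue name QL.DEMO are placeholders; SAGCLUSTER and CLFR1 are the cluster and FR names used elsewhere in these notes.

```mqsc
* On the joining queue manager (here called QMPR1): a cluster-receiver
* channel that tells the rest of the cluster how to reach us ...
DEFINE CHANNEL('SAGCLUSTER.QMPR1') CHLTYPE(CLUSRCVR) +
       TRPTYPE(TCP) CONNAME('qmpr1.host(1414)') +
       CLUSTER('SAGCLUSTER') REPLACE

* ... and one manual cluster-sender channel pointing at a full repository.
DEFINE CHANNEL('SAGCLUSTER.CLFR1') CHLTYPE(CLUSSDR) +
       TRPTYPE(TCP) CONNAME('clfr1.host(2401)') +
       CLUSTER('SAGCLUSTER') REPLACE

* Defining any object with the CLUSTER() attribute makes QMPR1 a PR
* of SAGCLUSTER and advertises the queue to the whole cluster.
DEFINE QLOCAL('QL.DEMO') CLUSTER('SAGCLUSTER') DEFBIND(NOTFIXED)
```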
On the SMQ qmgr we can display the cluster-sender channels, all of which serve one transmission queue:
To optimize the draining of that queue, the parameter PipeLineLength=2 can be added to qm.ini.
PipeLineLength=2 enables overlap: messages are put onto TCP while waiting for acknowledgment of the previous batch.
This enables overlap of sending messages while waiting for batch synchronization at the remote system.
URL : To allow an MCA to transfer messages using multiple threads, type the number of concurrent threads that the channel will use. The default is 1; if you type a value greater than 1, it is treated as 2. Make sure that you configure the queue manager at both ends of the channel to have a Pipeline length that is greater than 1. Pipelining is effective only for TCP/IP channels.
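In qm.ini the parameter lives in the CHANNELS stanza; a minimal fragment (remember to set it on the queue managers at both ends of the channel):

```ini
CHANNELS:
   PipeLineLength=2
```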
Curious:
We have to start the channels, and we obtain:
Real sample :
To receive information about the cluster configuration
- the queues offered / visible to the cluster -
we only need [1], the cluster name.
But to activate the CLUSRCVR channel, the CLUSSDR channel must be activated,
that is, we need the remote ip/port [2], and therefore the channel name [3].
It is difficult to stop a queue manager that is a member of a cluster from defining a queue. Therefore, there is a danger that a rogue queue manager can join a cluster, learn what queues are in it, define its own instance of one of those queues, and so receive messages that it should not be authorized to receive.
To prevent a queue manager from receiving messages that it should not, you can write:
MQ v7 queue manager clusters, csqzah09.pdf, pg 77
First idea is to name
[TR] The naming convention I try to use instead is <cluster name>.<qmgr name>, as in CLUSNAME.QMGRNAME, meaning "only one cluster per channel".
MQ cluster best practices {sagpdf}
See commands in "\\MQ\Eines\Clustering_Demo\"
You don't, as such. A cluster isn't an "entity" that can be deleted.
Once you have altered the clustered objects to remove the cluster attribute,
you can issue the REFRESH CLUSTER command on that qmgr.
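A sketch of the sequence on a queue manager leaving the cluster (object and cluster names are illustrative, taken from elsewhere in these notes):

```mqsc
* Stop attracting new work, then strip the CLUSTER attribute
* from every clustered object this qmgr owns.
SUSPEND QMGR CLUSTER('SAGCLUSTER')
ALTER QLOCAL('QL.DEMO') CLUSTER(' ')
ALTER CHANNEL('SAGCLUSTER.QMPR1') CHLTYPE(CLUSRCVR) CLUSTER(' ')
ALTER CHANNEL('SAGCLUSTER.CLFR1') CHLTYPE(CLUSSDR) CLUSTER(' ')

* Finally, discard locally held cluster information.
REFRESH CLUSTER('SAGCLUSTER') REPOS(YES)
```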
You have to be able to deduce that QM2 and QM4 are the FRs, QM1 is the gateway, and QM3 is external to the cluster ...
See GNF
Available commands are :
DISPLAY QCLUSTER(*) CLUSQMGR - displays the queues in the cluster :

    display qcluster(*) clusqmgr
    AMQ8409: Display Queue details.
       QUEUE(QL.CLSAG.CLFR1.SEBAS)  TYPE(QCLUSTER)  CLUSQMGR(CLFR1)
       QUEUE(QL.CLSAG.CLFR2.SEBAS)  TYPE(QCLUSTER)  CLUSQMGR(CLFR2)
       QUEUE(QL.DELPHI.GRAW.IN)     TYPE(QCLUSTER)  CLUSQMGR(QMAS)
       QUEUE(QL.DELPHI.GRAW.IN)     TYPE(QCLUSTER)  CLUSQMGR(CLFR1)
       QUEUE(QL.DELPHI.GRAW.IN)     TYPE(QCLUSTER)  CLUSQMGR(CLFR2)

DISPLAY CLUSQMGR(*) CONNAME QMTYPE STATUS - displays the queue managers in the cluster :

    DISPLAY CLUSQMGR(*) CONNAME QMTYPE STATUS
    AMQ8441: Display Cluster Queue Manager details.
       CLUSQMGR(CLFR1)  CHANNEL(SAGCLUSTER.CLFR1)  CLUSTER(SAGCLUSTER)
       CONNAME(99.137.164.25(2401))   QMTYPE(REPOS)   STATUS(RUNNING)
    AMQ8441: Display Cluster Queue Manager details.
       CLUSQMGR(CLFR2)  CHANNEL(SAGCLUSTER.CLFR2)  CLUSTER(SAGCLUSTER)
       CONNAME(99.137.164.153(2401))  QMTYPE(REPOS)   STATUS(RUNNING)
    AMQ8441: Display Cluster Queue Manager details.
       CLUSQMGR(QMAS)   CHANNEL(SAGCLUSTER.QMAS)   CLUSTER(SAGCLUSTER)
       CONNAME(6q(1491))              QMTYPE(NORMAL)  STATUS(RUNNING)

SUSPEND QMGR - use the SUSPEND QMGR command to remove a queue manager from a cluster temporarily, for example for maintenance.
Syntax : SUSPEND QMGR CLUSTER(cluster_name) [ MODE( QUIESCE | FORCE ) ]
When removing a QM from a cluster, I always conclude by issuing the REFRESH CLUSTER command on the QM leaving the cluster.
RESUME QMGR - use the RESUME QMGR command to reinstate a queue manager into a cluster, after having temporarily removed it.
Syntax : RESUME QMGR CLUSTER(cluster_name)

REFRESH CLUSTER - issue the REFRESH CLUSTER command from a queue manager to discard all locally held information about a cluster.
Using REFRESH CLUSTER(clustername) REPOS(YES) specifies that, in addition to the default behavior, objects representing full repository cluster queue managers are also refreshed. This option may not be used if the queue manager is itself a full repository.
Issuing REFRESH CLUSTER is disruptive to the cluster.
It is strongly recommended that all cluster-sender channels for the cluster are stopped before the REFRESH CLUSTER command is issued.

RESET CLUSTER - used to forcibly remove a queue manager from a cluster. You can do this from a full repository queue manager by issuing either the command:
RESET CLUSTER(clustername) QMNAME(qmname) ACTION(FORCEREMOVE) QUEUES(NO)
or the command:
RESET CLUSTER(clustername) QMID(qmid) ACTION(FORCEREMOVE) QUEUES(NO)
Using the RESET CLUSTER command is the only way to delete auto-defined cluster-sender channels.
Chapter 6, "Queue Manager Clusters", SC34-6589-00.
UK 2013 :
RESET is used to forcibly remove the information from the full repository.
SUSPEND leaves the information in the full repository intact and merely signals that you no longer want to be included in the load balancing.
The same way:
REFRESH checks the information in the FR and tries to add you if you are not there.
RESUME tells the FR that you are ready to rejoin the load balancing.
How do you convert an FR into a PR?
How do you convert a PR into an FR?
Perform a "cold start" of the cluster, that is, refresh the cluster configuration (use on the PR qmgr).
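Both conversions hinge on the REPOS attribute of the queue manager; a sketch (SAGCLUSTER is illustrative):

```mqsc
* PR -> FR : make this qmgr host a full repository for the cluster.
ALTER QMGR REPOS('SAGCLUSTER')

* FR -> PR : clear the repository attribute, then refresh so the
* qmgr rebuilds its view of the cluster as a partial repository.
ALTER QMGR REPOS(' ')
REFRESH CLUSTER('SAGCLUSTER') REPOS(YES)
```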
(one of the) full repository QMs fails. When it comes back, it does not see the remote cluster queues.
Solution : SUSPEND QMGR + RESUME QMGR
Use the DISPLAY CLUSQMGR command to display cluster information about queue managers in a cluster. If you issue this command from a queue manager with a full repository, the information returned pertains to every queue manager in the cluster. If you issue this command from a queue manager that does not have a full repository, the information returned pertains only to the queue managers in which it has an interest. That is, every queue manager to which it has tried to send a message and every queue manager that holds a full repository.
Use the SUSPEND QMGR command and RESUME QMGR command to remove a queue manager from a cluster temporarily, for example for maintenance, and then to reinstate it.
In an emergency where a queue manager is temporarily damaged, you might want to inform the rest of the cluster before the other queue managers try to send it messages. RESET CLUSTER can be used to remove the damaged queue manager. Later, when the damaged queue manager is working again, you can use the REFRESH CLUSTER command to reverse the effect of RESET CLUSTER and put it back into the cluster.
Use the DISPLAY QCLUSTER(*) command to display all queues visible from a given cluster queue manager.
The DISPLAY QUEUE or DISPLAY QCLUSTER command returns the name of the queue manager that hosts the queue (or the names of all queue managers if there is more than one instance of the queue). It also returns the system name for each queue manager that hosts the queue, the queue type represented, and the date and time at which the definition became available to the local queue manager.
Cluster symptoms and solutions :
"Queue Manager Clusters", SC34-6589-00, csqzah07.pdf, apendix A.
Quite often we have seen a queue manager removed without first deleting its cluster resources. This leaves the rest of the cluster thinking the queue manager still exists. If you find this has occurred, you will need to use the RESET CLUSTER command to force the removed queue manager's definitions out of the cluster.
TMM10 - Introduction to WMQ Clustering.
The queue is being opened for the first time and the queue manager cannot make contact with any full repositories. Make sure that the CLUSSDR channels to the full repositories are not in retry state.
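A quick way to check for that condition from runmqsc (the channel name pattern SAGCLUSTER.* is an assumption based on the naming used in these notes):

```mqsc
* Show only channels currently in retry state.
DISPLAY CHSTATUS('SAGCLUSTER.*') WHERE(STATUS EQ RETRYING)

* Or list every cluster queue manager with its channel state.
DISPLAY CLUSQMGR(*) CHANNEL STATUS
```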
If a queue manager has certain values (such as the listener port) at the moment the cluster is created, a change to those values will not be propagated to the cluster (repository) unless the following procedure is used:
Repeat with QM2, the other FR.
pending to expand
This is a temporary situation: the temporary name goes away once the repositories are brought into sync with each other. This is documented in the MQ Queue Manager Clusters manual.
Let the cluster settle down while you verify all cluster channel status
When defining cluster-sender and cluster-receiver channels, choose a value for HBINT or KAINT that will detect a network or queue manager failure in a useful amount of time, but will not burden the network with too many heartbeat or keep-alive flows.
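For example, a CLUSRCVR with a 60-second heartbeat (channel name, host and port are placeholders):

```mqsc
DEFINE CHANNEL('SAGCLUSTER.QMPR1') CHLTYPE(CLUSRCVR) +
       TRPTYPE(TCP) CONNAME('qmpr1.host(1414)') +
       CLUSTER('SAGCLUSTER') HBINT(60) REPLACE
```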
MQ v 5.3, "Clustering", SC34-6061-02, page 79 [95/183]
On platforms other than z/OS, if you need the functionality provided by the KAINT (Keep Alive) parameter, use the Heartbeat Interval (HBINT) parameter instead.
MQ v 6.0, "MQSC Reference", SC34-6597-00, page 130 [150/501]
You need not alter any of your applications to set up a simple MQ cluster. Applications name the target queue on the MQOPEN(queue_name) call as usual and need not be concerned about the location of the queue manager [MQCONN(qmgr_name)].
Clustering, SC34-6061-02, page 63/183
You can organize your cluster such that the queue managers in it are clones of each other, able to run the same applications and have local definitions of the same queues.
The advantages of using clusters in this way are:
Any one of the queue managers that hosts an instance of a particular queue can handle messages destined for that queue. This means that applications need not explicitly name the queue manager when sending messages. A workload management algorithm determines which queue manager should handle the message.
When you have clusters containing more than one instance of the same queue, MQ uses a workload management algorithm to determine the best queue manager to route a message to. The workload management algorithm selects the local queue manager as the destination whenever possible. If there is no instance of the queue on the local queue manager, the algorithm determines which destinations are suitable. Suitability is based on the state of the channel (including any priority you might have assigned to the channel), and also the availability of the queue manager and queue. The algorithm uses a round-robin approach to finalize its choice between the suitable queue managers.
If an application opens a target queue so that it can write messages to it, the MQOPEN call chooses between all available instances of the queue. Any local version of the queue is chosen in preference to other instances. This might limit the ability of your applications to exploit clustering.
If it is not appropriate to modify your applications to remove message affinities, there are a number of other possible solutions to the problem. For example, you can
Clustering, SC34-6061-02, page 65 to 70/183
v6, pg 60 [78/201]
If a local queue within the cluster becomes unavailable while a message is in transit, the message is forwarded to another instance of the queue, but only if the queue was opened (MQOPEN) with the MQOO_BIND_NOT_FIXED open option, or if MQOPEN specified MQOO_BIND_AS_Q_DEF and the queue's DEFBIND attribute is NOTFIXED.
MQ 6.0 Queue Manager Clusters, csqzah07.pdf, SC34-6589-00, page 51
To route all messages put to a queue using MQPUT to the same queue manager by the same route, use the MQOO_BIND_ON_OPEN option on the MQOPEN call. To specify that a destination is to be selected at MQPUT time, that is, on a message-by-message basis, use the MQOO_BIND_NOT_FIXED option on the MQOPEN call.
MQ 6.0 Programming Guide, page 96 [116/601]
The workload management algorithm selects the local queue manager as the destination whenever possible.
from MQ 5.3 Clustering, SC34-6061-02, page 49
On v6 you can change the workload balancing algorithm so that it does not use a prefer-local strategy.
On v5.x, you can use a cluster workload exit, or you can use a different queue manager for your PUTs than for your GETs; this other qmgr would be in the cluster but not have a local queue X.
CLWLUSEQ(ANY)   { possible values: Local, Any, Queue Manager }
With ANY, the queue manager treats the local queue as just another instance of the cluster queue for the purposes of workload distribution.
MQ v6 "MQSC" SC34-6587-00, pg 50 [70/501]
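The attribute can be set queue-manager-wide or per queue; a sketch (QL.DEMO is a placeholder queue name):

```mqsc
* Queue-manager-wide default:
ALTER QMGR CLWLUSEQ(ANY)

* Or per queue (the value QMGR means "inherit the qmgr setting"):
ALTER QLOCAL('QL.DEMO') CLWLUSEQ(ANY)
```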
How to code it
Let's make it run!
If TQM3 has an alias queue WLMAQ, whose TARGETQ is WLMQ1, then WLG.EXE can write to it, and the messages still reach the (split) queues.
If another qm TQM4, external to the cluster, writes into RMQ99, a remote queue pointing to queue WLMAQ and manager TQM3, the messages go into TQM3DLQ, TQM3's dead-letter queue, with Reason 2082 (MQRC_UNKNOWN_ALIAS_BASE_Q) in the dead-letter header, because the message carries the destination queue manager field ... and there is no such queue there!
Solution : in the Gateway queue manager (TQM3), set a queue manager alias
See Put & Destination !
The cluster I have for testing is like this
Shared queues are QL.DELPHI.GRAW.IN & QL.DELPHI.GRAW.OUT, user is MQ_USER_RAW of group MQ_GROUP_RAW
Some definitions I have
A message sent by T400 into the cluster is addressed to queue QL.IN, so it reaches p7029 via the cluster. It has ReplyToQueue(QR.MH) and ReplyToQmgr(PATAN), so we get "mqrc = 2087", as there is no QL.RSP at the PATAN qmgr.
Peter idea :
Have the putting application, the pseudo-requester,
specify the real reply queue name in the ReplyToQueue field of the MQMD of the 'request' message,
and fill in the ReplyToQMgr field with a value called VITOR_WUZ_HERE, or any other value you like.
Just don't leave it blank, and don't fill it in with the name of a real QM.
The message will arrive at the 'replying' app with the reply to queue field filled in with the real reply q name,
and the Reply To QM filled in with VITOR_WUZ_HERE.
When the app 'replies', it opens the reply queue specifying both the destination queue (the real reply queue) and the destination QM (VITOR_WUZ_HERE).
Ensure there is a QM alias called VITOR_WUZ_HERE that routes messages to an XMITQ that gets you back to a queue manager in the cluster.
I'm assuming the replying app is connected to a QM outside the cluster.
On the QM in the cluster that has the RCVR channel from the QM outside the cluster
create a QM Alias called VITOR_WUZ_HERE that has a blank Remote Q, blank Remote QM Name and blank XMITQ attribute.
As messages arrive destined for a QM called VITOR_WUZ_HERE,
this alias blanks out the destination QM,
MQ name resolution kicks in looking for that reply queue without a specific QM,
and the message is workload-balanced inside the cluster.
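The blank-everything alias described above can be sketched in MQSC (defined on the cluster QM that hosts the RCVR channel from the outside QM):

```mqsc
* A queue-manager alias with blank remote queue, remote qmgr and XMITQ:
* the destination QM name is stripped and normal cluster name
* resolution (and workload balancing) takes over.
DEFINE QREMOTE('VITOR_WUZ_HERE') RNAME(' ') RQMNAME(' ') XMITQ(' ')
```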
On any qmgr:
On FR:
MQ cluster best practices {sagpdf}, publib
Input - a file with
Output shall be
In all PRs of the cluster, we can install a "responder" waiting on a specific queue.
The monitor program has a list of queue managers and sends a message to each of them, verifying that a message can reach it and come back.
This asserts, to some level, the availability of the cluster's shared objects.
*** publib ***
All you do to use multiple cluster transmission queues is change the default cluster transmission queue type on the gateway queue manager, by changing the value of the queue manager attribute DEFCLXQ.
Changing the default to separate cluster transmission queues to isolate message traffic
The default cluster transmission queue is set as a queue manager attribute, DEFCLXQ. Its value is either SCTQ or CHANNEL. New and migrated queue managers are set to SCTQ. You can alter the value to CHANNEL.
Cluster transmission queues and cluster-sender channels
The values of DefClusterXmitQueueType are MQCLXQ_SCTQ or MQCLXQ_CHANNEL.
DefClusterXmitQueueType (MQLONG)
You have some choices to make when you are planning how to configure a queue manager to select a cluster transmission queue.
Clustering: Planning how to configure cluster transmission queues
If you set the queue manager attribute DEFCLXQ to CHANNEL, a different cluster transmission queue is created automatically from SYSTEM.CLUSTER.TRANSMIT.MODEL.QUEUE for each cluster-sender channel.
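The switch itself is one command:

```mqsc
* Give each cluster-sender channel its own transmission queue,
* created automatically from SYSTEM.CLUSTER.TRANSMIT.MODEL.QUEUE.
ALTER QMGR DEFCLXQ(CHANNEL)
```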
Enabling SSL in an existing WebSphere MQ cluster, developerWorks, Ian Vanstone : runmqckm commands, complete sample
About cluster security on a tricky configuration
Also "moving full repositories"
Read this:
Never pretend that two different installations are the same queue manager
(by trying to give a new installation the same QMGR name, IP address etc)
If you want to have a look into the cluster repository, use this command:
#1 Regardless of how many FRs you have, each FR should have a manual CLUSSNDR defined to every other FR.
#2 If every FR has a CLUSSNDR to every other FR, each FR will know about every cluster attribute on every QM in the cluster.
#3 A PR will only ever publish info to 2 FRs. A PR will only ever subscribe to 2 FRs. Period. It doesn't matter how many manual CLUSSNDRs you define on that PR. A PR will only ever send its info (publish) to 2 FRs and will only get updates (subscribe) from 2 FRs.
#4 You should only define one CLUSSNDR to one FR from a PR.
#5 If 2 FRs go down in your cluster, your cluster will still be able to send messages just fine, but any changes to cluster definitions become a problem. Any PRs that used both of those down FRs will still function for messaging, but they will not be made aware of any changes in the cluster, because both of their FRs are unavailable.
#6 If two of your FRs are down, and you still have other FRs, you could go to your PRs and delete the CLUSSNDR to the down FR, define a CLUSSNDR to an available FR and issue REFRESH CLUSTER(*) REPOS(YES). This would cause your PR to register with an available FR and thus pick up cluster changes.
#7 In a properly designed system the likelihood of 2 FRs being down is next to zero, so the need for more than 2 FRs is next to zero. And even if both FRs are down it doesn't mean your cluster will come to a screeching halt.
Just use 2 FRs.
If you want to keep the IP or QMGR name, keep in mind that the QMID (which includes CRDATE and CRTIME) will certainly be different.
On the local qmgr, use DISPLAY Q(*) WHERE (CLUSTER NE ' ') to see which queues are shared in the cluster.
When objects in the cluster repository cache are modified (for example, changing an attribute on a cluster queue), the details for that object are republished to the cluster. Previous records for the object may persist for some time in the cluster cache, so that applications currently using them (for instance, having opened the queue for output) can continue processing without interruption. Periodically, the repository process attempts to 'garbage collect' these older records, checking whether they are still in use.

Where multiple such records exist for a particular cluster queue manager object (the record in the cache which stores information about the channel definition to reach a remote queue manager), and these are held in use for a prolonged period, an error in the logic leads to the possibility that the storage for parts of these queue manager records can be reused (for example, overwritten to hold another object) while actually still required.
Solution:
Updated 20180829 (a)