don't 'tweak' the Workflow Server Group without really checking

Rickayzen, Alan alan.rickayzen at sap.com
Thu Oct 12 08:46:01 EDT 2006


Hi Mike,

That's an absolute peach of a post :-)

Any chance of you putting it on the NetWeaver BPM forum on
www.sdn.sap.com?

Sue, I know I'm poaching but I hope you'll forgive me this once bearing
in mind that everyone stands to benefit from this advice.

It's too late for TechEd US now, but I can repay in kind by setting up a
birds-of-a-feather session at TechEd Amsterdam and Bangalore just so
that some of you get the chance to meet each other face-to-face.

Best regards,

Alan Rickayzen

-----Original Message-----
From: sap-wug-bounces at mit.edu [mailto:sap-wug-bounces at mit.edu] On Behalf Of Mike Gambier
Sent: Wednesday, 11 October 2006 11:26
To: sap-wug at mit.edu
Subject: Tip: don't 'tweak' the Workflow Server Group without really checking

Hi,

This is not a question so feel free to ignore it if you're not
interested.

Recently our Basis people decided to play with the Server Group for the
Event Delivery job in Workflow (SWEQSRV) without keeping a proper eye on
the results. (The settings are visible in SWEQADM for those who don't
know.)

We have millions of Workflow instances running in our system (that's no
joke) and deal with about 50,000 triggering events every day. That's
okay though, because we have 16 Application Servers on tap and Workflow
had access to four of them (before the changes) with 50 dialog processes
on each, all adding up to a whopping 200 dialog processes on demand. We
used to be able to deliver up to 200 events per minute (12,000 per hour),
depending on the nature of the events being delivered.
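
As a quick sanity check of those figures, here is a minimal sketch in
Python. The variable names are made up for illustration, and the
per-minute rate is simply the peak quoted above, not something derived
from the process count.

```python
# Figures quoted in this post; the names are illustrative only.
servers_in_group = 4             # app servers Workflow could use before the change
dialog_procs_per_server = 50     # dialog processes available on each of them
peak_events_per_minute = 200     # best-case delivery rate quoted above
triggering_events_per_day = 50_000

total_dialog_procs = servers_in_group * dialog_procs_per_server
peak_events_per_hour = peak_events_per_minute * 60
# Hours of peak-rate delivery needed to clear one day's triggering events.
hours_for_daily_load = triggering_events_per_day / peak_events_per_hour

print(total_dialog_procs)               # 200
print(peak_events_per_hour)             # 12000
print(round(hours_for_daily_load, 1))   # 4.2
```

So at the peak rate a day's worth of events needs a little over four
hours of delivery time, which is why there was comfortable headroom
before the change.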

The Basis guys decided to increase the number of App Servers in the
Server Group but decreased the number of processes on each. Logical
thinking, you may argue, as the total number of dialog processes
remained at 200, but what happened was that Workflow started to lose the
battle for dialog processes on ALL servers during the online day,
because we have thousands of users hogging them.

You may or may not know that if Workflow fails to secure a dialog
process at runtime it creates a tRFC entry for WORKFLOW_LOCAL_100 in
table ARFCSSTATE so that a batch job called RSARFCEX can run later on
and pick up the slack. It does this for Events, Method Calls, New
Tasks... pretty much everything.

Had the Basis people looked a bit harder they would have spotted a
sudden surge of entries being written to ARFCSSTATE following their
changes.

It's been a month now and Workflows have 'stalled' everywhere... We have
20 MILLION tRFCs for Workflows that cannot be processed by RSARFCEX
because SAP's code simply can't cope with that many records (it blows
its own internal storage even trying to process a single day's work!!).

Basis have reverted their changes and the tRFC queue is hardly being hit
by Workflow at all anymore, so we're almost back to where we were. We
are also trying to reprocess the tRFC queue using our own tools and
asking SAP for help with theirs via OSS.

Granted, the numbers we are dealing with are extreme, but I urge
Workflow and Basis people alike to keep an eye on tRFCs for Workflow
whenever they meddle with the SWEQADM settings for Parallel Event
Delivery.
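
One way to "keep an eye" on this could be sketched as follows, assuming
you already collect a daily count of Workflow tRFC entries by whatever
means you prefer. The function name, the 7-day window, the factor of 3
and the sample numbers are all made up for illustration; nothing here
talks to SAP.

```python
from statistics import mean

def find_surges(daily_counts, window=7, factor=3.0):
    """Return the indexes of days whose count exceeds `factor` times
    the average of the preceding `window` days."""
    surges = []
    for i in range(window, len(daily_counts)):
        baseline = mean(daily_counts[i - window:i])
        # Treat a baseline below 1 as 1, so a near-empty queue does not
        # raise a flag over a handful of entries.
        if daily_counts[i] > factor * max(baseline, 1):
            surges.append(i)
    return surges

# Hypothetical daily counts of new Workflow tRFC entries (not real data):
# a quiet week, then a jump after a configuration change.
counts = [120, 90, 150, 110, 100, 130, 95, 105, 4800, 9200]
print(find_surges(counts))   # [8, 9]
```

A check like this, run for a few days after any change to the Server
Group, would have raised a flag on the first day of the surge rather
than a month later.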

MGT


_______________________________________________
SAP-WUG mailing list
SAP-WUG at mit.edu
http://mailman.mit.edu/mailman/listinfo/sap-wug



