Tip: don't 'tweak' the Workflow Server Group without really checking

Mike Gambier madgambler at hotmail.com
Wed Oct 11 05:26:09 EDT 2006


Hi,

This is not a question so feel free to ignore it if you're not interested.

Recently our Basis people decided to play with the Server Group for the 
Event Delivery job in Workflow (SWEQSRV) without keeping a proper eye on the 
results. (The settings are visible in SWEQADM for those who don't know.)

We have millions of Workflow instances running in our system (that's no 
joke) and deal with about 50,000 triggering events every day. That's okay 
though, because we have 16 Application Servers on tap. Before the changes, 
Workflow had access to four of them, each with 50 dialog processes, adding 
up to a whopping 200 dialog processes on demand. We used to be able to 
deliver up to 200 events per minute (12,000 per hour), depending on the 
nature of the events being delivered.
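For anyone sizing their own setup, the capacity arithmetic above can be checked with a quick Python sketch. The constants are the figures quoted in this post; the helper function names are hypothetical and have nothing to do with SAP itself.

```python
# Figures quoted in the post, before the Basis changes.
APP_SERVERS_IN_GROUP = 4          # servers in the SWEQSRV server group
DIALOG_PROCESSES_PER_SERVER = 50  # dialog processes available on each
PEAK_EVENTS_PER_MINUTE = 200      # best case; varies with the events delivered
EVENTS_PER_DAY = 50_000           # approximate daily triggering events


def total_dialog_processes(servers: int, per_server: int) -> int:
    """Upper bound on dialog processes the group can offer Workflow."""
    return servers * per_server


def events_per_hour(per_minute: int) -> int:
    """Convert a per-minute delivery rate to a per-hour rate."""
    return per_minute * 60


def hours_to_deliver(events: int, per_hour: int) -> float:
    """Hours needed to deliver `events` at a sustained `per_hour` rate."""
    return events / per_hour


if __name__ == "__main__":
    print(total_dialog_processes(APP_SERVERS_IN_GROUP, DIALOG_PROCESSES_PER_SERVER))
    print(events_per_hour(PEAK_EVENTS_PER_MINUTE))
    print(round(hours_to_deliver(EVENTS_PER_DAY, events_per_hour(PEAK_EVENTS_PER_MINUTE)), 2))
```

Note that the total only tells you what the group could offer at best; it says nothing about how many of those processes Workflow actually wins against online users.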

The Basis guys decided to increase the number of App Servers in the Server 
Group but decreased the number of processes on each. Logical thinking, you 
may argue, as the total number of dialog processes remained at 200, but what 
happened was that Workflow started to lose the battle for dialog processes 
on ALL servers during the online day, because we have thousands of users 
hogging them.

You may or may not know that if Workflow fails to secure a dialog process at 
runtime, it creates a tRFC entry for destination WORKFLOW_LOCAL_100 in table 
ARFCSSTATE so that a batch job running RSARFCEX can pick up the slack later 
on. It does this for Events, Method Calls, New Tasks... pretty much everything.

Had the Basis people looked a bit harder they would have spotted a sudden 
surge of entries being written to ARFCSSTATE following their changes.
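A minimal sketch of the kind of check that would have caught this, assuming you can export a daily count of new ARFCSSTATE entries for destination WORKFLOW_LOCAL_100 (how you extract those counts, e.g. from SM58 or a table query, is left open here). The function name, the 7-day window and the 3x threshold are illustrative assumptions, and the sample counts are made up.

```python
from statistics import mean


def find_surges(daily_counts: list[int], window: int = 7, factor: float = 3.0) -> list[int]:
    """Return the indices of days whose count of new tRFC entries exceeds
    `factor` times the average of the preceding `window` days.

    Hypothetical helper: `daily_counts` is assumed to hold one count per day
    of ARFCSSTATE entries for the Workflow destination, oldest first.
    """
    surges = []
    for day in range(window, len(daily_counts)):
        baseline = mean(daily_counts[day - window:day])
        # Floor the baseline at 1 so a quiet week of zeros doesn't make
        # every single entry look like a surge.
        if daily_counts[day] > factor * max(baseline, 1):
            surges.append(day)
    return surges


if __name__ == "__main__":
    # Made-up numbers: a quiet week, then the day after a server group change.
    counts = [120, 95, 110, 130, 100, 90, 105, 480_000, 510_000]
    print(find_surges(counts))
```

Run daily after any SWEQADM change, a check like this flags the jump within a day rather than a month later.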

It's been a month now and Workflows have 'stalled' everywhere. We have 20 
MILLION tRFCs for Workflows that cannot be processed by RSARFCEX, because 
SAP's code simply can't cope with that many records (it blows its own 
internal storage even trying to process a single day's work!).

Basis have reverted their changes and the tRFC queue is hardly being hit by 
Workflow at all anymore, so we're almost back to where we were. We are also 
trying to reprocess the tRFC queue using our own tools and asking SAP for 
help with theirs via OSS.

Granted, the numbers we are dealing with are extreme, but I urge Workflow and 
Basis people alike to keep an eye on tRFCs for Workflow whenever they meddle 
with the SWEQADM settings for Parallel Event Delivery.

MGT

More information about the SAP-WUG mailing list