Setting number of executors to a jobflow

hewills

April 18, 2017 00:00

I have a jobflow with 8 graphs that I'm running on Clover Server. Each graph truncates a target table, selects data from my source, and loads the target table. The graphs are connected to each other and run sequentially in the jobflow.
I created a "global" parameter called EXECUTORS, with a value of 5.

I set the jobflow's "Number of Executors" to this parameter, but the graphs still run only one at a time.
I also tried:
- Setting this value on each of the graphs.
- Setting it on all the graphs plus the jobflow.
- Making the parameter a string.
- Making the parameter an integer.
- Combinations of all of these.

Every time, it still runs only one graph at a time. From reading the documentation, I thought the jobflow would run more than one graph at once.
What am I doing wrong? A picture of the jobflow is below.

jobflow.png

admin

April 21, 2017 08:49
Hi,

Actually, the "Number of Executors" attribute applies to the situation where a single ExecuteGraph component executes one child graph multiple times. For example, you can have a ListFiles component at the beginning, and the ExecuteGraph then starts a child graph for each file (that is, for each record coming from the ListFiles component to the ExecuteGraph component). You can then set the maximum number of runs of the same child graph at the same time by setting the Number of Executors attribute.
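As a rough analogy only (this is not CloverDX code), "Number of Executors" behaves like a bounded worker pool: one child run is started per incoming record, but no more than N run at once. The sketch below models that with Python's `concurrent.futures.ThreadPoolExecutor`; `run_child_graph`, `execute_graph`, and the file names are hypothetical stand-ins for the child graph, the ExecuteGraph component, and the ListFiles output.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyTracker:
    """Records how many simulated child runs are active at the same time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0

    def enter(self) -> None:
        with self._lock:
            self.running += 1
            self.peak = max(self.peak, self.running)

    def exit(self) -> None:
        with self._lock:
            self.running -= 1


def run_child_graph(file_name: str, tracker: ConcurrencyTracker) -> str:
    """Hypothetical stand-in for one execution of the child graph."""
    tracker.enter()
    try:
        time.sleep(0.05)  # simulated work
        return f"processed {file_name}"
    finally:
        tracker.exit()


def execute_graph(records, number_of_executors, tracker):
    """Start one child run per input record, at most `number_of_executors` at once."""
    with ThreadPoolExecutor(max_workers=number_of_executors) as pool:
        # map() keeps results in input order; the with-block waits for all runs.
        return list(pool.map(lambda r: run_child_graph(r, tracker), records))


if __name__ == "__main__":
    files = [f"file_{i}.csv" for i in range(12)]  # e.g. records from ListFiles
    tracker = ConcurrencyTracker()
    results = execute_graph(files, number_of_executors=5, tracker=tracker)
    print(len(results), "runs, peak concurrency:", tracker.peak)
```

All 12 records are processed, but the tracker never sees more than 5 runs active at once, which is the limit the attribute is meant to impose on a single ExecuteGraph component.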

In your design, the graphs do not run simultaneously because each ExecuteGraph component starts (executes its graph) only when it receives input data on its connected input port from the previous component. As long as each component needs data from the previous one, they have to be executed one after another and cannot run simultaneously.

My question would be: is there a business reason to connect the ExecuteGraph components to each other? Could you, by any chance, place them in parallel with no input edges connected and just connect their outputs to the TokenGather? This way they would all run at the same time. See the example picture below (in a design like this, the components run in parallel).
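To illustrate the difference between the two designs, here is a minimal Python sketch using `asyncio`. It is only an analogy, not CloverDX code: `chained` mimics ExecuteGraph components connected output-to-input, so each run waits for the previous one, while `parallel` mimics unconnected components whose outputs all go to TokenGather, modeled here with `asyncio.gather`. The `run_graph` coroutine and the graph names are hypothetical.

```python
import asyncio
import time


async def run_graph(name: str, seconds: float = 0.1) -> str:
    """Hypothetical stand-in for one ExecuteGraph run."""
    await asyncio.sleep(seconds)
    return name


async def chained(names):
    # Each run starts only after the previous one has produced its output,
    # like ExecuteGraph components connected by edges.
    results = []
    for name in names:
        results.append(await run_graph(name))
    return results


async def parallel(names):
    # No input edges: every run starts immediately and the outputs are
    # collected at the end, roughly what TokenGather does in the jobflow.
    return list(await asyncio.gather(*(run_graph(n) for n in names)))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    graphs = [f"graph_{i}" for i in range(8)]
    _, t_chain = timed(chained(graphs))
    _, t_par = timed(parallel(graphs))
    print(f"chained: {t_chain:.2f}s, parallel: {t_par:.2f}s")
```

With 8 simulated graphs of 0.1 s each, the chained version takes roughly the sum of all run times, while the parallel version takes roughly the time of the longest single run.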

Capture.PNG

I hope this helps. Have a nice day.

Eva
hewills

April 21, 2017 16:51
Thanks for the reply Eva, that makes sense.

You're right that in my example the jobs could run in parallel; however, if too many run at a time, it causes issues.
We have about 500 graphs that need to be executed, so I'm trying to figure out the best way to organize them.

I was hoping there was a way to place ~100 graphs in one jobflow and have the jobflow itself limit the number running in parallel, using the 'Number of Executors' parameter.
hewills

April 24, 2017 18:50
I decided to try setting up the "dynamic table load", where I feed a list of table names and parameters to one ExecuteGraph. This looks like a better solution for us.
admin

April 25, 2017 06:39
Hi,

You are right, this is definitely one of the options. This way, you can use "Number of Executors" to control the number of simultaneously running instances of the child graph. In addition, similarly to "Number of Executors", you can set a maximum number of running instances of the same graph or jobflow when you don't execute it from a single component. This is managed by CloverETL Server.

To do so, go to the CloverETL Server application -> Sandboxes -> Config Properties. The properties that you might be looking for are as follows:
1. max_running_concurrently (the maximum number of concurrently running instances of the job), and
2. enqueue_executions (a boolean value; if it is true, executions above max_running_concurrently are enqueued; if it is false, executions above max_running_concurrently fail).

In the above-mentioned menu on the Server, navigate to any sandbox, jobflow, or graph. In the "Create new config property" section, choose a property from the list, enter the desired value, and add the property to the selected location.

Please note that the limit is applied per file, which means it works only if you always call the same graph (with different parameters, for example).
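The behaviour described for these two properties can be modeled as a per-file admission check. The Python sketch below is a hypothetical model, not the CloverETL Server implementation: only the property names max_running_concurrently and enqueue_executions come from the Server, while `JobConfig`, `ExecutionLimiter`, and the `.grf` paths are invented for illustration. Limits are keyed by the job file path, so runs of the same graph with different parameters share one limit, while a different graph file is counted separately.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class JobConfig:
    # Mirrors the two config properties described above.
    max_running_concurrently: int
    enqueue_executions: bool


@dataclass
class ExecutionLimiter:
    """Hypothetical model of per-file concurrency limits (not Server code)."""

    configs: dict  # job file path -> JobConfig
    running: dict = field(default_factory=lambda: defaultdict(int))
    queued: dict = field(default_factory=lambda: defaultdict(deque))

    def submit(self, job_file: str, params: dict) -> str:
        cfg = self.configs.get(job_file)
        # A file without a limit (or one below its limit) starts right away.
        if cfg is None or self.running[job_file] < cfg.max_running_concurrently:
            self.running[job_file] += 1
            return "RUNNING"
        # Above the limit: enqueue or fail, depending on enqueue_executions.
        if cfg.enqueue_executions:
            self.queued[job_file].append(params)
            return "ENQUEUED"
        return "FAILED"

    def finish(self, job_file: str):
        """Mark one run as done and start the next queued one, if any."""
        self.running[job_file] -= 1
        if self.queued[job_file]:
            self.running[job_file] += 1
            return self.queued[job_file].popleft()
        return None


if __name__ == "__main__":
    limiter = ExecutionLimiter({"graph/load_table.grf": JobConfig(2, True)})
    for table in ["A", "B", "C"]:
        print(table, limiter.submit("graph/load_table.grf", {"TABLE": table}))
```

Running it prints RUNNING for tables A and B and ENQUEUED for C, because all three calls target the same graph file. Submitting a different file would not be affected by that limit.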

Please give this a try and let me know if this is what you have been looking for.

Eva
hewills

April 25, 2017 22:42
Now that I understand how the executors work, I don't think the 'max_running_concurrently' parameter is needed. But it's nice to know in case we need it for a different scenario.
I set up the ExecuteGraph with dynamic table parameters and metadata, and so far it's working great. Thanks!