Distributed Processing mode
callas pdfToolbox Server/CLI can run in distributed processing mode, in which tasks are distributed over the network to as many "Satellites" as are available and the results are sent back to the origin of processing. For this purpose, pdfToolbox Server/CLI can be started in different modes:
- "Dispatcher" controls which tasks are processed by which machines (the "Satellites"). There must be at least one Dispatcher in the network at all times.
- "Satellite" receives tasks from the "Clients" or directly from the Dispatcher (if the Dispatcher is run with hotfolders), processes them and sends the results back to the Clients.
- "Client" asks the Dispatcher for a Satellite. Once it has been assigned an available Satellite, it sends the tasks to that Satellite and receives the results after processing.
- "Monitor" monitors the Dispatcher and displays the current situation.
All of these modules can run on the same or on different machines. There needs to be at least one Dispatcher and at least one Satellite in the network. In order to submit tasks, at least one Client is required.
Distributed processing is supported on Windows, macOS, Linux and Sun Solaris. It is not available on AIX.
- The Client sends a request for a Satellite to the Dispatcher
- The Dispatcher assigns a Satellite and sends its address to the Client
- The Client sends the task to the Satellite
- The Satellite sends the result back to the Client
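The four steps above can be sketched as a small in-memory model. This is only an illustration of the message flow, not how pdfToolbox is implemented: the class names, the `busy` flag and the `release` step are assumptions made for the example, and the Dispatcher here hands over a Python object where the real one sends a network address.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Satellite:
    address: str
    busy: bool = False  # hypothetical bookkeeping, not described in the text

    def process(self, task: str) -> str:
        # Stand-in for real PDF processing: just label the task as done.
        return f"{task} processed by {self.address}"


@dataclass
class Dispatcher:
    satellites: List[Satellite] = field(default_factory=list)

    def assign(self) -> Optional[Satellite]:
        # Steps 1 and 2: answer a Client request with a free Satellite, if any.
        for satellite in self.satellites:
            if not satellite.busy:
                satellite.busy = True
                return satellite
        return None

    def release(self, satellite: Satellite) -> None:
        # Assumed step: mark the Satellite as free again after the task.
        satellite.busy = False


def run_client_task(dispatcher: Dispatcher, task: str) -> Optional[str]:
    satellite = dispatcher.assign()
    if satellite is None:
        return None  # no Satellite available
    try:
        # Steps 3 and 4: send the task and receive the result.
        return satellite.process(task)
    finally:
        dispatcher.release(satellite)
```

Calling `run_client_task(Dispatcher([Satellite("10.0.0.101:1201")]), "job.pdf")` returns the labelled result, while a Dispatcher without Satellites returns `None`.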
Starting a Dispatcher
pdfToolbox --dispatcher [--port=<port number>]
Here, 'port' is the port number on which the Dispatcher can be reached over the network. The default is 1200.
Starting a Dispatcher using the ServerUI
On Windows and macOS, the server can also be started as a Dispatcher from the user interface (Desktop). Hotfolder processing can be set up there as well. In this mode, the Dispatcher also distributes tasks that are sent by other Clients.
Starting a Satellite
pdfToolbox --satellite --endpoint=<dispatcher IP number>[:<dispatcher port>] [--port=<port number>] [--connections=<number of concurrent connections>]
pdfToolbox --satellite --endpoint=10.0.0.100:1200 --port=1201
In order to process tasks, at least one Satellite is required.
Here, 'endpoint' is the IP address and port of the Dispatcher. The Dispatcher port defaults to 1200, but it can be changed when the Dispatcher is started (see above).
'port' is the port the Satellite uses to communicate with the Clients. It defaults to 1201 and can optionally be set to a different port at startup.
It is highly recommended to use a different port number for the communication between Satellite and Dispatcher than for the communication between Satellite and Client.
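As an illustration of the defaults described above, the following sketch builds the argument list for starting a Satellite. The helper names are hypothetical. It assumes an IPv4 address or host name as endpoint, and it treats the port recommendation as a hard rule, which is stricter than the text requires.

```python
DEFAULT_DISPATCHER_PORT = 1200  # default stated in the text
DEFAULT_SATELLITE_PORT = 1201   # default stated in the text


def parse_endpoint(endpoint):
    """Split '<ip>[:<port>]' into (host, port), using 1200 if no port is given."""
    host, separator, port = endpoint.partition(":")
    return host, int(port) if separator else DEFAULT_DISPATCHER_PORT


def satellite_command(endpoint, port=DEFAULT_SATELLITE_PORT,
                      connections=None, executable="pdfToolbox"):
    """Return the argument list for starting a Satellite."""
    _, dispatcher_port = parse_endpoint(endpoint)
    if port == dispatcher_port:
        # Assumption: enforce the recommendation instead of only warning.
        raise ValueError("Satellite port should differ from the Dispatcher port")
    args = [executable, "--satellite", f"--endpoint={endpoint}", f"--port={port}"]
    if connections is not None:
        args.append(f"--connections={connections}")
    return args
```

The returned list can be passed to `subprocess.run` on a machine where pdfToolbox is installed.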
Starting a Satellite using the ServerUI
On Windows and macOS, the server can also be started as a Satellite from the user interface (Desktop). In this mode, the Satellite will not process any hotfolder jobs on the computer.
By default, a Satellite uses the number of CPUs on the respective machine as the number of concurrent connections/processes. To limit this number, the Satellite has to be started from the CLI with the --connections parameter.
The number of connections should not exceed the number of CPUs, as this might reduce the performance per process and could result in system instability.
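A value for --connections that respects this advice could be derived as in the sketch below. The function name is hypothetical, and `os.cpu_count()` reports logical CPUs, which may differ from the way pdfToolbox counts CPUs.

```python
import os


def safe_connections(requested=None):
    """Return a connection count that never exceeds the local CPU count."""
    cpus = os.cpu_count() or 1  # cpu_count() may return None on some systems
    if requested is None:
        return cpus  # mirrors the described default: one connection per CPU
    if requested < 1:
        raise ValueError("at least one connection is required")
    return min(requested, cpus)
```

The result can be used as the value of the --connections parameter when starting the Satellite from the CLI.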
Assign more than one Dispatcher to a Satellite
To connect a Satellite to more than one Dispatcher, the --endpoint parameter can be specified more than once. See the section "Fallback for Dispatcher" below.
Distribute a process using a Client
The client is called using any regular pdfToolbox command line call.
In order to distribute the call over the network, the command line parameters --dist and --endpoint are added. The Client first asks the Dispatcher for a Satellite connection, then sends the command to the Satellite and waits until the Satellite sends back the result.
pdfToolbox --dist --endpoint=<dispatcher IP number>[:<dispatcher port>] <any regular pdfToolbox call>
pdfToolbox --dist --endpoint=10.0.0.100:1200 <anyProfile.kfpx> <myPDF.pdf>
pdfToolbox --dist --endpoint=10.0.0.100:1200 --redistill <myPDF.pdf>
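Turning a regular call into a distributed one only means adding the two parameters. A minimal sketch, assuming the call is held as an argument list whose first element is the executable (the function name is hypothetical):

```python
def distribute(regular_call, endpoint):
    """Insert --dist and --endpoint directly after the executable name."""
    executable, *rest = regular_call  # raises ValueError for an empty list
    return [executable, "--dist", f"--endpoint={endpoint}", *rest]
```

For example, `distribute(["pdfToolbox", "anyProfile.kfpx", "myPDF.pdf"], "10.0.0.100:1200")` yields the argument list for the first example call above.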
Variables and resources with Distributed processing
When using normal Profiles, no special steps are needed when processing a file. All required resources (like ICC profiles or "Place content" Templates) are included in the Profile, the kfpx file.
But sometimes, some enhanced scripting of, e.g. a Template, requires external resources, which are defined/referenced by a Variable.
To ensure that these resources are transferred to the Satellite during Distributed processing, a variant of the --setvariable=<variable> option can be used:
--setvariablepath=<path to resources file or folder>
Set the type of satellite (Optional)
As some kinds of tasks should only be processed on a specific type of Satellite, a Satellite can be started with one or more types set.
Every CLI call that processes a task can be restricted to one or more types of allowed Satellites.
Set typification for Satellite:
pdfToolbox --satellite --endpoint=<dispatcher IP number> --satellite_type=<type> [--satellite_type=<type>]
pdfToolbox --satellite --endpoint=10.0.0.100 --satellite_type=A
pdfToolbox --satellite --endpoint=10.0.0.100 --satellite_type=A --satellite_type=B
Set typification for Client:
pdfToolbox --dist --endpoint=<dispatcher IP number> --satellite_type=<type> [--satellite_type=<type>] <any regular pdfToolbox call>
pdfToolbox --dist --endpoint=10.0.0.100 --satellite_type=A <any regular pdfToolbox call>
• If a Satellite has been started with a typification, only Client calls with the same type set will be sent to this Satellite.
• If a Client call contains several typifications, all of them must match those set for a Satellite.
• If a Client call has no typification set, it can be processed on all Satellites, even if they have been started with a typification.
• The <type>-string has to be alpha-numeric and is case sensitive.
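One way to read the matching rules above is as a subset test: an untyped Client call fits every Satellite, and a typed call fits only Satellites that carry all of its types. The sketch below encodes this reading. The function names are hypothetical, and limiting type strings to ASCII letters and digits is an assumption about what "alpha-numeric" means here.

```python
def valid_type(name):
    """Accept only non-empty ASCII alphanumeric type strings (assumed rule)."""
    return name.isascii() and name.isalnum()


def satellite_accepts(client_types, satellite_types):
    """Return True if a call with client_types may run on this Satellite."""
    wanted = set(client_types)
    if not wanted:
        return True  # untyped calls can be processed on all Satellites
    # Every requested type must be set on the Satellite (case sensitive).
    return wanted <= set(satellite_types)
```

With this reading, a call with types A and B is not sent to a Satellite started with type A only, and a call with type `a` does not match a Satellite of type `A`.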
Avoid local processing
As a fallback, processing might happen locally (on the Client or on Dispatcher if run in hotfolder mode) if an action cannot be distributed, a Satellite cannot be assigned within a timeframe or if no Dispatcher is available.
Local processing might not be desired for several reasons.
To avoid such local processing, the Client call as well as the start of a Dispatcher (when used as a server for hotfolders) can be extended with the --nolocal option.
Example for Client:
pdfToolbox --dist --endpoint=<dispatcher IP number> --nolocal <any regular pdfToolbox call>
Local processing will be disabled and tasks will fail if no Satellite is ready for processing.
Example for Dispatcher:
pdfToolbox --dispatcher --nolocal
Here --nolocal is forwarded to child processes for hotfolder jobs. It has no effect on the processing of non-hotfolder files from a Client distributed by the Dispatcher.
If a Client wants to disable local processing, the --nolocal setting has to be set in each CLI call of the Client.
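The effect of --nolocal on the fallback can be summarized as a small decision function. This is a simplified model with hypothetical names. It treats all three fallback triggers (no Dispatcher, no Satellite in time, action not distributable) the same way, which the text does not spell out for every case.

```python
from enum import Enum


class Outcome(Enum):
    SATELLITE = "processed on a Satellite"
    LOCAL = "processed locally"
    FAILED = "failed"


def resolve(dispatcher_reachable, satellite_assigned, distributable, nolocal):
    """Decide where a task ends up under the described fallback rules."""
    if dispatcher_reachable and satellite_assigned and distributable:
        return Outcome.SATELLITE
    # Distribution is not possible: fall back locally unless --nolocal is set.
    return Outcome.FAILED if nolocal else Outcome.LOCAL
```

For example, `resolve(True, False, True, nolocal=True)` models a call with --nolocal for which no Satellite could be assigned and returns `Outcome.FAILED`.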
Fallback for Dispatcher
In some workflow systems, a fallback for a Dispatcher might be required to ensure production stability.
To cover this, a number of Dispatchers can be set up, which will run individually.
One or multiple Dispatchers can be assigned to a Satellite.
Assign multiple Dispatchers to a Satellite
Connects a Satellite to two (or more) Dispatchers.
pdfToolbox --satellite --endpoint=<dispatcher 1 IP> [--endpoint=<dispatcher 2 IP>] [--endpoint=<dispatcher IP>]
Set multiple Dispatchers in a Client call
Distributes a Client call via two (or more) Dispatchers. The first reachable Dispatcher with a free Satellite will handle the task.
pdfToolbox --dist --endpoint=<dispatcher 1 IP> --endpoint=<dispatcher 2 IP> [--endpoint=<dispatcher IP>] <any regular pdfToolbox call>
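The sketch below shows how repeated --endpoint parameters could be generated and how the "first reachable Dispatcher with a free Satellite" rule can be modelled. The function names and the status tuples are hypothetical, and evaluating the Dispatchers in command line order is an assumption.

```python
def endpoint_args(endpoints):
    """Return one --endpoint parameter per Dispatcher."""
    if not endpoints:
        raise ValueError("at least one Dispatcher endpoint is required")
    return [f"--endpoint={endpoint}" for endpoint in endpoints]


def pick_dispatcher(status):
    """Pick the first reachable Dispatcher that has a free Satellite.

    status is a list of (endpoint, reachable, free_satellites) tuples,
    assumed to be in the same order as on the command line.
    """
    for endpoint, reachable, free_satellites in status:
        if reachable and free_satellites > 0:
            return endpoint
    return None  # no Dispatcher can take the task
```

A Client call with a fallback Dispatcher could then be assembled as `["pdfToolbox", "--dist", *endpoint_args(["10.0.0.100", "10.0.0.200"]), ...]`.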
Define a timeout for processing
In some workflow systems, long-running processes might not be allowed and should be cancelled once a given time limit is reached.
Due to the flexibility of distributed processing, a variety of timeouts for the individual parts can be set:
- for the Client call
- for the Satellite
- for the Dispatcher
Timeout for processing on a Satellite
- When defining a timeout for the Client call, the execution will be cancelled after the given period.
- When defining a timeout when starting a Satellite, all tasks processed by this Satellite will be cancelled after the given period.
- If both are defined, the shorter timeframe will be used.
Example for Client:
pdfToolbox --dist --endpoint=<dispatcher IP> --timeout_satellite=<seconds> <any regular pdfToolbox call>
Example for Satellite:
pdfToolbox --satellite --endpoint=<dispatcher IP> --timeout=<seconds>
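The "shorter timeframe wins" rule amounts to taking the minimum of the timeouts that are actually defined. A minimal sketch with a hypothetical function name, using `None` for "no timeout set":

```python
def effective_timeout(*timeouts):
    """Return the shortest defined timeout in seconds, or None if none is set."""
    defined = [timeout for timeout in timeouts if timeout is not None]
    return min(defined) if defined else None
```

With a Client value of 60 seconds (--timeout_satellite) and a Satellite value of 30 seconds (--timeout), `effective_timeout(60, 30)` gives 30.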
Timeout for local processing of Dispatcher or Client
A timeout can also be defined for the fallback to local processing on the Client or the Dispatcher (when used as a server for hotfolders), which takes place if no Satellite is available or if the type of task cannot be distributed.
If both are defined, the shorter defined timeframe will be used.
Example for Client:
pdfToolbox --dist --endpoint=<dispatcher IP> --timeout=<seconds> <any regular pdfToolbox call>
Example for Dispatcher:
pdfToolbox --dispatcher --timeout=<seconds>
Timeout for Dispatcher to search for Satellites
In addition, a timeout can be set for the Dispatcher, which defines the timeframe in which it searches for a Satellite.
This can be set individually for every Client call.
Example for Client:
pdfToolbox --dist --endpoint=<dispatcher IP> --timeout_dispatcher=<seconds> <any regular pdfToolbox call>
When running the Dispatcher in hotfolder mode, the setting can be defined when starting the Dispatcher. It then applies to all files distributed from hotfolders:
Example for Dispatcher:
pdfToolbox --dispatcher --timeout_dispatcher=<seconds>
If a timeout for Satellites or the Dispatcher is set and the --nolocal option has been defined, the task will not be processed locally and processing will end with an error.
Setting --timeout_... or --nolocal parameters in the "Additional CLI parameter" area of the Server UI when defining hotfolder jobs is not supported.
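To keep the three Client-side timeouts apart, the following sketch assembles a Client call from the parameters described in this section. The function and argument names are hypothetical. Only the command line options themselves are taken from the text.

```python
def client_command(endpoint, call_args, timeout_dispatcher=None,
                   timeout_satellite=None, timeout_local=None,
                   nolocal=False, executable="pdfToolbox"):
    """Return the argument list for a distributed Client call with timeouts."""
    args = [executable, "--dist", f"--endpoint={endpoint}"]
    if timeout_dispatcher is not None:
        # Time the Dispatcher may spend searching for a Satellite.
        args.append(f"--timeout_dispatcher={timeout_dispatcher}")
    if timeout_satellite is not None:
        # Time the task may run on the assigned Satellite.
        args.append(f"--timeout_satellite={timeout_satellite}")
    if timeout_local is not None:
        # Time the task may run if it falls back to local processing.
        args.append(f"--timeout={timeout_local}")
    if nolocal:
        args.append("--nolocal")
    return args + list(call_args)
```

Setting `nolocal=True` together with a Satellite or Dispatcher timeout corresponds to the case described above in which the task ends with an error instead of being processed locally.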
Using the CLI-Monitor
pdfToolbox --monitor --endpoint=<dispatcher IP>:<dispatcher port> [--endpoint=<dispatcher IP>:<dispatcher port>]
Monitor is optional and mirrors the command line output of the Dispatcher to another computer. 'endpoint' is the IP address and port of the Dispatcher.
When using more than one Dispatcher, multiple Dispatcher endpoints can be specified and observed.
- Server: Regular pdfToolbox Server/CLI license required
- Dispatcher: Dispatcher pdfToolbox Server/CLI license required
- Satellite: Regular pdfToolbox Server/CLI license required
- Monitor: No license required
- Client: No license required
Distributed processing can be combined with the License Server in order to activate some or all instances of these modules via this License Server. The setup is described here.
Distributed Processing in Enfocus Switch
Distributed Processing can also be used within Enfocus Switch.
Just configure the respective settings within the configurator for steps which shall be distributed. If all tasks shall be processed on other machines (Satellites), no local Server license is needed.
Some installations performed better when the setting "Concurrent transfers to the same site" in Switch was set to "Automatic". In addition, the "Default number of slots for concurrent elements" should not be "0" (zero).
Known limitations until version 11.1
Due to technical limitations of the communication method used (SOAP), files larger than 2 GB cannot be processed using distributed processing in pdfToolbox versions before 11.1.
callas removed this technical limitation in version 11.1. In earlier versions, such files are processed locally on the Client.
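A workflow that still has to support older installations could check in advance whether a file will be distributed. The sketch below rests on two assumptions that the text does not confirm: that "2 GB" means 2 × 1024³ bytes, and that version 11.1 itself is already free of the limit. The function name is hypothetical.

```python
SIZE_LIMIT_BYTES = 2 * 1024**3  # assumption: 2 GB counted in binary units


def can_distribute(size_bytes, version):
    """Return True if a file of this size can be distributed.

    version is a (major, minor) tuple such as (11, 0).
    """
    if version >= (11, 1):
        return True  # limitation removed (assumed to include 11.1 itself)
    return size_bytes <= SIZE_LIMIT_BYTES
```

The file size can be obtained with `os.path.getsize(path)` before the Client call is issued.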