Invoke Server Developer's Manual

This document describes how to develop a Ninf-G Invoke Server.

1. Introduction
2. Specifications of Invoke Server
- 2.1 Detailed overview of Invoke Server
- 2.2 Protocol between a Ninf-G Client and Invoke Server
Appendix A. How to specify the Invoke Server
Appendix B. Miscellaneous information
- B.1. Job Timeout
- B.2. Redirect stdout/stderr is implemented using files

1. Introduction

A Ninf-G Client invokes a Ninf-G Executable on the server machine when a function that initializes function/object handles, such as grpc_function_handle_init(), is called. In Ninf-G Version 2, remote process invocation is implemented with the Globus Toolkit's Pre-WS GRAM, and the invocation mechanism, built on the Globus API, is embedded in Ninf-G itself. To allow other systems, such as WS GRAM, UNICORE, or Condor, to be used for remote process invocation, Ninf-G Version 4 implements the invocation mechanism as a separate module called "Invoke Server." This design enables users and developers to implement and add new Invoke Servers for any job invocation mechanism.

Ninf-G Version 4.2.0 includes the following Invoke Servers:

- Invoke Server for WS GRAM, implemented in Python (GT4py)
- Invoke Server for SSH, implemented in C (SSH)
- Invoke Server for Condor, implemented in Java (Condor)
- Invoke Server for Pre-WS GRAM, implemented in C (GT2c)
- Invoke Server for WS GRAM, implemented in Java (GT4java)
- Invoke Server for UNICORE, implemented in Java (UNICORE)
- Invoke Server for NAREGI Super Scheduler, implemented in Java (NAREGISS)

1.1 Overview of a typical client application

Here is a typical flow of a Ninf-G Client application:

(1) grpc_initialize()
Initializes data structures used by the Ninf-G Client.
(2) grpc_function_handle_init()
Creates a function/object handle, which requires remote process invocation. The request is processed and a Ninf-G Executable is started on the server machine. Once started, the Ninf-G Executable connects back to the Ninf-G Client to establish a TCP connection between the two.
(3) grpc_call() or grpc_call_async()/grpc_wait_any()
Calls the remote function, i.e. (3.1) the Ninf-G Client sends arguments to the Ninf-G Executable, (3.2) the Ninf-G Executable performs some form of computation, and (3.3) the Ninf-G Executable sends the results to the Ninf-G Client.
(4) grpc_function_handle_destruct()
Requests the Ninf-G Executable to terminate its process. If an error occurs during the termination, the Ninf-G Client requests the Invoke Server to kill the Ninf-G Executable.
(5) grpc_finalize()
Frees the data structures used by the Ninf-G Client.

Invoke Server must implement the initialization and finalization of function/object handles described in steps (2) and (4).

1.2 Requirements for underlying middleware

The only requirement for the underlying middleware is that it must be capable of remote process invocation. Examples of such middleware include the Globus Toolkit Pre-WS GRAM, Globus Toolkit WS GRAM, UNICORE, Condor, and SSH.

1.3 Implementation overview

Invoke Server is an adapter between the Ninf-G Client and the underlying middleware. It analyzes and processes each request sent from the Ninf-G Client and replies to the Ninf-G Client. For example, when Invoke Server receives a JOB_CREATE request from the Ninf-G Client, it creates a Job ID, returns the Job ID to the Ninf-G Client, and invokes the job processes specified in the request.

Invoke Server can be implemented in any language. The protocol between the Ninf-G Client and Invoke Server is described in detail in Section 2.

1.4 Execution flow

This section describes a sample RPC flow to a server called serverA via the Invoke Server IS_SAMPLE.

(Prerequisite)
(1) A client configuration file must be prepared that specifies that Invoke Server IS_SAMPLE is used for RPC to serverA.
(grpc_function_handle_init())
(2) The Ninf-G Client requests Invoke Server IS_SAMPLE to create a function/object handle.
(3) The first time IS_SAMPLE is needed to create a function/object handle, the Ninf-G Client spawns the IS_SAMPLE process on the same machine by running ${NG_DIR}/bin/ng_invoke_server.IS_SAMPLE.
(4) The Ninf-G Client and IS_SAMPLE communicate using three pipes (stdin, stdout, and stderr).
(5) When grpc_function_handle_init() is called, the Ninf-G Client sends a JOB_CREATE request to IS_SAMPLE, followed by the required information (e.g., the hostname and port number of the remote server) and JOB_CREATE_END.
(6) When IS_SAMPLE receives the JOB_CREATE request, it returns "S" to the Ninf-G Client to indicate that the request has been received.
(7) IS_SAMPLE generates a new Job ID that corresponds to the Request ID transferred with the JOB_CREATE request, and notifies the Ninf-G Client of the Job ID. IS_SAMPLE then invokes the remote processes (Ninf-G Executable) on serverA using its underlying middleware.
(8) The Ninf-G Client waits for the reply and the Job ID notification from IS_SAMPLE. Once it has received both, it resumes execution without waiting for the actual job invocation on serverA.
(grpc_call())
(9) When the Ninf-G Executable is invoked on serverA, it connects to the Ninf-G Client using Globus IO. This connection is used for communication between the Ninf-G Client and the Ninf-G Executable (e.g., transferring arguments from the Ninf-G Client to the Ninf-G Executable). IS_SAMPLE does nothing for grpc_call(). However, if the underlying middleware returns an error during remote process invocation, IS_SAMPLE must notify the Ninf-G Client that the job invocation has failed.
(grpc_function_handle_destruct())
(10) When grpc_function_handle_destruct() is called, the Ninf-G Client requests the Ninf-G Executable to exit. This communication takes place directly between the Ninf-G Client and the Ninf-G Executable. The Ninf-G Client does not wait for the Ninf-G Executable to terminate.
(11) When the Ninf-G Executable exits, IS_SAMPLE changes the status of the corresponding job to DONE and notifies the Ninf-G Client of the change.
(12) The Ninf-G Client sends a JOB_DESTROY request to IS_SAMPLE.
(13) IS_SAMPLE returns "S" to the Ninf-G Client when it receives the JOB_DESTROY request.
(14) If the corresponding job is already DONE, IS_SAMPLE reports DONE to the Ninf-G Client. Otherwise, IS_SAMPLE cancels the job and notifies the Ninf-G Client of the change in status to DONE once the cancellation has completed and the job has actually reached DONE.
(grpc_finalize())
(15) When grpc_finalize() is called, the Ninf-G Client sends an EXIT request to IS_SAMPLE.
(16) IS_SAMPLE returns "S" to the Ninf-G Client when it receives the EXIT request. After that, the pipes between IS_SAMPLE and the Ninf-G Client (stdin, stdout, stderr) are closed.
(17) IS_SAMPLE cancels all jobs, waits for all of them to terminate, and exits.
(18) When the Ninf-G Client receives "S" from IS_SAMPLE, it continues execution without waiting for the jobs to terminate.

The following figure illustrates the interaction between the Ninf-G Client, Invoke Server, and the Ninf-G Executable.

Figure 1: Interaction between the Ninf-G Client, Invoke Server and the Ninf-G Executable
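The work IS_SAMPLE does for JOB_CREATE and JOB_DESTROY, steps (5) to (7) and (12) to (14), can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of any shipped Invoke Server: the class SampleJobManager and the callables reply, notify, submit, and cancel are hypothetical stand-ins for the stdout pipe, the stderr pipe, and the underlying middleware. Reporting an invocation failure as a STATUS_NOTIFY with status FAILED is likewise an assumption.

```python
import itertools


class SampleJobManager:
    """Hypothetical job bookkeeping for IS_SAMPLE.

    reply(msg)     -- writes a Reply message (stdout pipe)
    notify(msg)    -- writes a Notify message (stderr pipe)
    submit(job_id) -- asks the underlying middleware to start the job
    cancel(job_id) -- asks the underlying middleware to cancel the job
    """

    def __init__(self, reply, notify, submit, cancel):
        self.reply, self.notify = reply, notify
        self.submit, self.cancel = submit, cancel
        self.ids = itertools.count(1)
        self.status = {}  # Job ID -> PENDING / ACTIVE / DONE / FAILED

    def job_create(self, request_id):
        self.reply("S")                      # (6) request received
        job_id = str(next(self.ids))         # (7) generate a new Job ID
        self.status[job_id] = "PENDING"
        self.notify("CREATE_NOTIFY %s S %s" % (request_id, job_id))
        try:
            self.submit(job_id)              # (7) invoke on the server
        except Exception as exc:             # (9) invocation failed
            self.status[job_id] = "FAILED"
            reason = " ".join(str(exc).split())
            self.notify("STATUS_NOTIFY %s FAILED %s" % (job_id, reason))

    def job_destroy(self, job_id):
        if job_id not in self.status:
            self.reply("F unknown Job ID " + job_id)
            return
        self.reply("S")                      # (13) request received
        if self.status[job_id] == "DONE":    # (14) nothing left to cancel
            self.notify("STATUS_NOTIFY %s DONE job already done" % job_id)
        else:
            # (14) DONE is reported later, once the cancellation completes.
            self.cancel(job_id)
```

The point of the sketch is the ordering in job_create: the Reply is sent first, the Job ID is notified next, and only then is the job submitted, so the client can resume at step (8) without waiting for serverA.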

2. Specifications of Invoke Server

This section gives a detailed overview of Invoke Server and specifies the protocol between a Ninf-G Client and Invoke Server.

2.1 Detailed overview of Invoke Server

- Invoke Server is started when a Ninf-G Client initializes a function/object handle for a remote server that the Ninf-G Client is configured to access through that Invoke Server.
- The number of jobs per Invoke Server can be limited (see Appendix A.5). If the number of jobs exceeds the limit, a new Invoke Server is started.
- Invoke Server exits when it receives an EXIT request from the Ninf-G Client. This request is sent when the Ninf-G Client calls grpc_finalize(). Invoke Server also exits if it has already been assigned the maximum number of jobs and all of them have terminated.
- The Ninf-G Client and Invoke Server communicate using three pipes, which the Ninf-G Client creates when it starts Invoke Server.
- The Ninf-G Client does not wait for Invoke Server to terminate after sending it an EXIT request.
- If the Ninf-G Client exits abnormally, the pipes are disconnected. When Invoke Server detects the disconnection, it must cancel all jobs and exit.
- Invoke Server is implemented as a Unix executable or script file, which should be located in the ${NG_DIR}/bin directory. It can be located in another directory if the absolute path to the executable file is specified (see Appendix A.6).
- The file name of an Invoke Server must follow the naming convention "ng_invoke_server" + suffix (e.g., ng_invoke_server.IS_SAMPLE), where the suffix identifies the underlying middleware used for remote process invocation.
- A log file for Invoke Server can be specified as an optional argument of the Invoke Server command.

Example:
```
-l [Log file name]
```
If this option is specified, Invoke Server outputs logs to the file specified by this argument. Otherwise, logs are not recorded.
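A minimal sketch, assuming a Python Invoke Server, of how the optional -l argument might be handled with the standard argparse and logging modules. The function names and the prog string are hypothetical. Because stdout and stderr carry protocol messages (see Section 2.2), the sketch makes sure log records can never reach them.

```python
import argparse
import logging


def parse_args(argv):
    """Parse the Invoke Server command line; only -l [Log file name] is handled."""
    parser = argparse.ArgumentParser(prog="ng_invoke_server.IS_SAMPLE")
    parser.add_argument("-l", dest="log_file", default=None,
                        help="write logs to this file (no log if omitted)")
    return parser.parse_args(argv)


def setup_logging(log_file):
    """Return a logger that writes only to log_file, or drops records if None."""
    logger = logging.getLogger("invoke_server")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # never fall through to a root handler on stderr
    if log_file is None:
        # Logs are not recorded. The NullHandler also keeps logging's
        # last-resort handler from printing warnings to stderr.
        logger.addHandler(logging.NullHandler())
    else:
        logger.addHandler(logging.FileHandler(log_file))
    return logger
```

A server would typically call setup_logging(parse_args(sys.argv[1:]).log_file) once at startup.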

2.2 Protocol between a Ninf-G Client and Invoke Server

2.2.1 Overview

A Ninf-G Client and Invoke Server exchange three types of messages: Request, Reply, and Notify. Request messages are sent from the Ninf-G Client to Invoke Server; Reply and Notify messages are sent from Invoke Server to the Ninf-G Client. For every Request message it sends, the Ninf-G Client expects a Reply message from Invoke Server. Notify messages are used by Invoke Server to send messages to the Ninf-G Client asynchronously. Each of the three message types is sent over its own pipe.

Name	fd	Direction
Request	stdin	Ninf-G Client ----> Invoke Server
Reply	stdout	Ninf-G Client <---- Invoke Server
Notify	stderr	Ninf-G Client <---- Invoke Server

2.2.2 Protocol

All messages are sent as plain text. The return code <RET> is CR LF (0x0d 0x0a) and serves as the delimiter that marks the end of each message. Job IDs are generated by Invoke Server.
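The framing rule can be sketched as a pair of Python helpers operating on binary streams. These helpers are illustrative assumptions, not part of Ninf-G: write_message flushes after every message because the Ninf-G Client blocks while waiting for Replies, read_message treats end-of-file as a disconnected pipe, and text is assumed to be ASCII.

```python
import io

RET = b"\r\n"  # <RET>: 0x0d 0x0a


def write_message(stream, text):
    """Write one message followed by <RET>, then flush so the peer sees it now."""
    stream.write(text.encode("ascii", errors="replace") + RET)
    stream.flush()


def read_message(stream):
    """Read one <RET>-terminated message; return None at EOF (pipe closed)."""
    line = stream.readline()  # binary readline stops after b"\n"
    if not line:
        return None
    if line.endswith(RET):
        line = line[:-2]
    elif line.endswith(b"\n"):
        line = line[:-1]  # lenient: accept a bare LF as well
    return line.decode("ascii", errors="replace")


if __name__ == "__main__":
    # Demo only: an in-memory stand-in for sys.stdin.buffer.
    requests = io.BytesIO(b"JOB_STATUS 1\r\nEXIT\r\n")
    msg = read_message(requests)
    while msg is not None:
        print(repr(msg))
        msg = read_message(requests)
```

In a real server, read_message would be given sys.stdin.buffer, and write_message would be given sys.stdout.buffer for Replies and sys.stderr.buffer for Notify messages.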

2.2.2.1 Request

Four Request messages, JOB_CREATE, JOB_STATUS, JOB_DESTROY, and EXIT are supported.

JOB_CREATE
- Format
  JOB_CREATE <Request ID><RET>
  hostname .....<RET>
  port .....<RET>
  ... (snip)
  JOB_CREATE_END<RET>
- Explanation
  This request is used to create and invoke a new job. Required information for job invocation is described as a set of attributes that is transferred along with a JOB_CREATE request. The details of these attributes are described in Section 2.2.2.4. JOB_CREATE is the only request that is described using multiple lines. All the other requests can be described with a single line.
  
  A Ninf-G Client transfers a Request ID to Invoke Server. Invoke Server generates a unique Job ID and returns it to the Ninf-G Client. The Job ID is used by the Ninf-G Client to specify the job.
  
  When Invoke Server receives a JOB_CREATE request, it must send a Reply message to the Ninf-G Client. Then, Invoke Server generates a unique Job ID and notifies the Ninf-G Client of the Job ID. Finally, Invoke Server requests job invocation on remote servers via the underlying middleware used with the Invoke Server.
JOB_STATUS
- Format
  JOB_STATUS <Job ID><RET>
- Explanation
  This request queries Invoke Server for the status of a job. Ninf-G Version 4 and earlier versions do not use the JOB_STATUS request.
JOB_DESTROY
- Format
  JOB_DESTROY <Job ID><RET>
- Explanation
  This request is used to terminate and destroy the job specified by <Job ID>. If the job has not completed when Invoke Server receives this request, Invoke Server cancels it. When Invoke Server confirms that the job has been cancelled, it notifies the Ninf-G Client that the job status is DONE.
EXIT
- Format
  EXIT<RET>
- Explanation
  This request is used to terminate Invoke Server. If Invoke Server receives this EXIT request, it must cancel all outstanding jobs and wait for their termination.
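The Request side of the protocol can be sketched as a read-and-dispatch loop. In this hypothetical Python outline, read_line returns one message without <RET> (or None once stdin is closed), each handler sends exactly one Reply through the reply callable, and cancel_all stands in for cancelling outstanding jobs and waiting for them. None of these names come from Ninf-G.

```python
def read_request(read_line):
    """Read one Request as (name, argument, attribute_lines), or None at EOF.

    read_line() returns the next message without <RET>, or None once
    the stdin pipe has been closed.
    """
    line = read_line()
    if line is None:
        return None
    name, _, rest = line.partition(" ")
    attributes = []
    if name == "JOB_CREATE":
        # The only multi-line request: collect lines up to JOB_CREATE_END.
        while True:
            attr = read_line()
            if attr is None:
                return None  # client vanished mid-request
            if attr == "JOB_CREATE_END":
                break
            attributes.append(attr)
    return name, rest.strip(), attributes


def serve(read_line, reply, handlers, cancel_all):
    """Dispatch Requests until EXIT or until the client disconnects.

    handlers maps a request name to handler(argument, attributes, reply);
    each handler must send exactly one Reply before doing slower work.
    """
    while True:
        request = read_request(read_line)
        if request is None:
            cancel_all()  # pipes disconnected: cancel all jobs, then exit
            return
        name, argument, attributes = request
        handler = handlers.get(name)
        if handler is None:
            reply("F unknown request " + name)
        else:
            handler(argument, attributes, reply)
        if name == "EXIT":
            cancel_all()  # cancel outstanding jobs and wait for them
            return
```

Passing reply into each handler, rather than sending the handler's return value, lets a JOB_CREATE handler send "S" before it emits CREATE_NOTIFY and submits the job.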

2.2.2.2 Reply

Invoke Server must send a Reply message to a Ninf-G Client if Invoke Server receives a Request message from that Ninf-G Client.

The reply to JOB_CREATE, JOB_DESTROY, and EXIT messages is:

[S | F <Error String>]<RET>

where S indicates success. On failure, F is returned, followed by an <Error String>.

The reply to a JOB_STATUS message is:

[S <Status> | F <Error String>]<RET>

where <Status> is one of:

<Status> : [PENDING | ACTIVE | DONE | FAILED]

The statuses have the following meanings:

PENDING : the Ninf-G Executable is waiting to be invoked.
ACTIVE : the Ninf-G Executable has been invoked.
DONE : the Ninf-G Executable has finished.
FAILED : the Ninf-G Executable has exited abnormally.
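A small sketch of how these Reply formats might be produced in Python. The Status enumeration and the helper names are hypothetical. Collapsing whitespace in the error string is a defensive choice, since an embedded CR LF would otherwise split one Reply into two messages.

```python
from enum import Enum


class Status(Enum):
    PENDING = "PENDING"
    ACTIVE = "ACTIVE"
    DONE = "DONE"
    FAILED = "FAILED"


def success_reply(status=None):
    """'S' for JOB_CREATE, JOB_DESTROY and EXIT; 'S <Status>' for JOB_STATUS."""
    return "S" if status is None else "S " + status.value


def failure_reply(error):
    """'F <Error String>' with all whitespace runs (including CR/LF) collapsed."""
    return "F " + " ".join(error.split())
```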

2.2.2.3 Notify

A Notify message is used to send an asynchronous message from Invoke Server to a Ninf-G Client. Two types of Notify message are provided.

CREATE_NOTIFY
- Format
  CREATE_NOTIFY <Request ID> [S <Job ID> | F <Error String>]<RET>
- Explanation
  This message is used to notify the Ninf-G Client of the Job ID. Job IDs are case-sensitive and must not contain invisible characters.
STATUS_NOTIFY
- Format
  STATUS_NOTIFY <Job ID> <Status> <String><RET>
```
<Status> : [PENDING | ACTIVE | DONE | FAILED]
```
- Explanation
  This message is used to send notification that the status of a job has been changed.
  
  <String> can be any string; it is recorded in the output log. Note that the status of a job can change directly from PENDING to DONE.
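The two Notify formats can be sketched as string builders. The function names are hypothetical, and rejecting Job IDs that contain whitespace or non-printable characters is one way, assumed here, to enforce the "no invisible characters" rule; whitespace in particular would break the space-separated message fields.

```python
STATUSES = ("PENDING", "ACTIVE", "DONE", "FAILED")


def _one_line(text):
    """Collapse whitespace so free text cannot contain a stray <RET>."""
    return " ".join(text.split())


def create_notify(request_id, job_id=None, error=None):
    """CREATE_NOTIFY <Request ID> [S <Job ID> | F <Error String>]"""
    if job_id is None:
        return "CREATE_NOTIFY %s F %s" % (request_id, _one_line(error or "unknown error"))
    if not job_id or any(ch.isspace() or not ch.isprintable() for ch in job_id):
        raise ValueError("Job ID must be non-empty and contain no invisible characters")
    return "CREATE_NOTIFY %s S %s" % (request_id, job_id)


def status_notify(job_id, status, text):
    """STATUS_NOTIFY <Job ID> <Status> <String>"""
    if status not in STATUSES:
        raise ValueError("unknown status: %r" % status)
    return "STATUS_NOTIFY %s %s %s" % (job_id, status, _one_line(text))
```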

2.2.2.4 JOB_CREATE Request

This section describes the details of a JOB_CREATE Request.

Format
JOB_CREATE <Request ID><RET>
hostname .....<RET>
port .....<RET>
... (snip)
JOB_CREATE_END<RET>

Attributes are placed between the JOB_CREATE <Request ID><RET> line and JOB_CREATE_END<RET>. Each line contains exactly one attribute, and attributes can appear in any order. There are two types of attributes: mandatory and optional. Invoke Server must return an error if any mandatory attribute is missing, and must ignore any unknown optional attributes.

Attributes

The following table lists the attributes supported by Ninf-G. Some of these attributes are provided specifically for the Globus Toolkit's Pre-WS GRAM and WS GRAM. New attributes can be defined using the invoke_server_option attribute in the <SERVER> section of the client configuration file.

name	mandatory	meanings
hostname	yes	Host name of the server
port	yes	Port number
jobmanager	no	Job Manager
subject	no	Subject of the GRAM
client_name	yes	Host name of the Ninf-G Client
executable_path	yes	Path of the Ninf-G Executable
backend	yes	Backend of the remote function (e.g., MPI)
count	yes	Number of Ninf-G Executables
staging	yes	A flag indicating if staging is used or not
argument	yes	Arguments for the Ninf-G Executable
work_directory	no	Working directory of the remote function
gass_url	no	The URL of GASS
redirect_enable	yes	A flag indicating redirection of stdout/stderr
stdout_file	no	file name of stdout
stderr_file	no	file name of stderr
environment	no	Environment variables
tmp_dir	no	temporary files directory
status_polling	yes	Interval of status polling
refresh_credential	yes	Interval of credential refresh
max_time	no	Maximum execution time
max_wall_time	no	Maximum wall clock time
max_cpu_time	no	Maximum CPU time
queue_name	no	Name of the queue
project	no	Name of the project
host_count	no	Number of executables per host
min_memory	no	Minimum size of requested memory
max_memory	no	Maximum size of requested memory
rsl_extensions	no	RSL extension
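As a minimal sketch, the attribute lines of a JOB_CREATE request might be parsed into a Python dictionary as follows. The mandatory set is taken from the table above; treating argument and environment as repeatable follows their detailed descriptions. The function name is hypothetical, and unknown attributes are kept in the result so that the caller can ignore them, or use them if it defines its own attributes via invoke_server_option.

```python
MANDATORY = {
    "hostname", "port", "client_name", "executable_path", "backend",
    "count", "staging", "argument", "redirect_enable",
    "status_polling", "refresh_credential",
}
# Attributes that may appear on several lines, one value per line.
REPEATABLE = {"argument", "environment"}


def parse_attributes(lines):
    """Turn JOB_CREATE attribute lines ("name value") into a dict.

    Repeatable attributes become lists; all others are strings.
    Raises ValueError if a mandatory attribute is missing.
    """
    attrs = {}
    for line in lines:
        name, _, value = line.partition(" ")
        if name in REPEATABLE:
            attrs.setdefault(name, []).append(value)
        else:
            attrs[name] = value
    missing = MANDATORY.difference(attrs)
    if missing:
        raise ValueError("missing mandatory attributes: " + ", ".join(sorted(missing)))
    return attrs
```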

Detailed description

hostname
Host name of the server machine.
port
The port number on which the server is listening. The default value depends on the underlying middleware.
jobmanager
The job manager used on the server machine.
subject [subject]
The certificate subject of the resource manager contact.
client_name [client name]
The host name of the client machine.
executable_path [path to the executable]
Absolute path of the Ninf-G Executable. If staging is off, this is a path on the remote machine; if staging is on, it is a path on the local machine.
backend [backend]
Specifies the method for launching the Ninf-G Executable: NORMAL, MPI, or BLACS. If MPI or BLACS is specified, the Ninf-G Executable must be invoked via the mpirun command.
count [N]
The number of Ninf-G Executables to be invoked. If the backend is MPI or BLACS, count means the number of nodes.
staging [true/false]
The value is true if staging is on, in which case Invoke Server must transfer the Ninf-G Executable file from the local machine to the remote machine.
argument [argument]
An argument for the Ninf-G Executable. Each argument attribute carries exactly one argument, so multiple arguments are specified by repeating the attribute, once per argument. Invoke Server must pass these values to the Ninf-G Executable as its command-line arguments.

Example:
```
argument --client=...
argument --gass_server=...
```
work_directory [directory]
This attribute specifies the directory in which the Ninf-G Executable is invoked.
gass_url
This attribute specifies the URL of the GASS server on the client machine. It is used for the Globus Toolkit's Pre-WS GRAM.
redirect_enable [true/false]
This attribute is set to true if the stdout/stderr of the Ninf-G Executable has been requested to be transferred to the Ninf-G Client.
stdout_file [filename]
If redirect_enable is set to true, this attribute specifies the name of the output file for stdout. Invoke Server must output the stdout to this file. The Ninf-G Client reads this file and writes its contents to the stdout of the Ninf-G Client.
stderr_file [filename]
If redirect_enable is set to true, this attribute specifies the name of the output file for stderr. Invoke Server must output the stderr to this file. The Ninf-G Client reads this file and writes its contents to the stderr of the Ninf-G Client.
environment [ENV=VALUE]
An environment variable for the Ninf-G Executable. The variable name and its value are joined by =; if the variable takes no value, only its name is given. Multiple environment variables are specified by repeating the attribute, once per variable.
tmp_dir [directory]
The directory in which temporary files are placed.
status_polling [interval]
Invoke Server may need to check the status of jobs by polling the status of existing jobs. This attribute specifies the interval of the polling. The value is in seconds, and if it is not specified, the default value 0 is passed.
refresh_credential [interval]
This attribute specifies the interval for refreshing credentials. The value is in seconds, and if it is not specified, the default value 0 is passed.
max_time [time]
This attribute specifies the maximum time of the job.
max_wall_time [time]
This attribute specifies the maximum wall clock time of the job.
max_cpu_time [time]
This attribute specifies the maximum CPU time of the job.
queue_name [queue]
This attribute specifies the name of the queue to which the Ninf-G Executable should be submitted.
project [projectname]
This attribute specifies the name of the project.
host_count [number of nodes]
This attribute specifies the number of nodes.
min_memory [memory size (MB)]
This attribute specifies the minimum memory size required by the job.
max_memory [memory size (MB)]
This attribute specifies the maximum memory size of the job.
rsl_extensions [RSL extension]
This attribute can be used to specify the RSL extension which is available for the Globus Toolkit's WS GRAM.
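The following hypothetical Python helper shows how several of these attributes might be combined into a launch description before it is handed to the underlying middleware. The input is assumed to be a dictionary in which argument and environment map to lists (one entry per attribute line) and every other attribute maps to a string; the output keys are illustrative, and how the description is actually submitted depends entirely on the middleware.

```python
def build_launch_spec(attrs):
    """Translate parsed JOB_CREATE attributes into a launch description.

    attrs maps 'argument' and 'environment' to lists of strings (one entry
    per attribute line) and every other attribute name to a string.
    """
    argv = [attrs["executable_path"]] + list(attrs.get("argument", []))
    env = {}
    for entry in attrs.get("environment", []):
        name, sep, value = entry.partition("=")
        # Assumption: None marks a variable that was given without a value.
        env[name] = value if sep else None
    redirect = attrs.get("redirect_enable") == "true"
    return {
        "argv": argv,
        "env": env,
        "cwd": attrs.get("work_directory"),  # None when not specified
        "count": int(attrs.get("count", "1")),
        "backend": attrs.get("backend", "NORMAL"),
        "staging": attrs.get("staging") == "true",
        # The output files only matter when redirection was requested.
        "stdout_file": attrs.get("stdout_file") if redirect else None,
        "stderr_file": attrs.get("stderr_file") if redirect else None,
    }
```

If staging is true, executable_path names a local file that must first be transferred to the remote machine; that transfer is not shown here.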

Appendix A. How to specify the Invoke Server

Invoke Server is specified by the Ninf-G Client using a Client configuration file.

A.1. How to specify Invoke Server

Invoke Server is specified by using the invoke_server attribute in the <SERVER> section.


invoke_server [type]

Type specifies the type of the Invoke Server, such as GT4py or UNICORE.

A.2. How to pass information to Invoke Server

Invoke Server may require options for its execution. Such options can be specified by an option attribute in the <INVOKE_SERVER> section or by an invoke_server_option attribute in the <SERVER> section.


option [String]
invoke_server_option [String]

Each of these attributes can be specified multiple times in the <SERVER> or <INVOKE_SERVER> section.

A.3. Polling interval

Invoke Server must check the status of jobs, and this may be implemented using polling. The polling interval can be specified by the status_polling attribute in the <INVOKE_SERVER> section.


status_polling [interval (seconds)]
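If an Invoke Server relies on polling, the loop might look like the following Python sketch. query_status and notify are hypothetical callables (a middleware status query and a writer for the stderr Notify pipe). The one-second fallback used when the interval is 0 is an assumption made only to avoid a busy loop; this manual states only that 0 is passed when status_polling is not specified.

```python
import threading

FALLBACK_INTERVAL = 1.0  # assumption: used when status_polling is 0


def poll_once(jobs, query_status, notify):
    """One pass: report every job whose status differs from the last report.

    jobs maps Job ID -> last reported status and is updated in place;
    jobs that reach DONE or FAILED are removed after being reported.
    """
    for job_id, last in list(jobs.items()):
        current = query_status(job_id)
        if current != last:
            notify("STATUS_NOTIFY %s %s changed from %s" % (job_id, current, last))
            jobs[job_id] = current
        if current in ("DONE", "FAILED"):
            del jobs[job_id]


def poll_until_stopped(jobs, query_status, notify, interval, stop):
    """Call poll_once every `interval` seconds until the stop Event is set."""
    delay = interval if interval > 0 else FALLBACK_INTERVAL
    while not stop.is_set():
        poll_once(jobs, query_status, notify)
        stop.wait(delay)
```

In a real server this loop would run in its own thread next to the request loop, so access to jobs would also need a lock.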

A.4. Logfile

The filename of the Invoke Server's execution log can be specified by the invoke_server_log attribute in the <CLIENT> section.


invoke_server_log [filename]

If this attribute is specified, Invoke Server outputs logs to a file whose name is formed from the specified filename and the type of that Invoke Server.

The log_filePath attribute in the <INVOKE_SERVER> section can be used to specify a log file for a specific Invoke Server.


log_filePath [Log file name]

A.5. Maximum number of jobs per Invoke Server

The maximum number of jobs per Invoke Server can be limited by the max_jobs attribute in the <INVOKE_SERVER> section. If the number of requested jobs exceeds this value, the Ninf-G Client starts a new Invoke Server and has it manage the new jobs.


max_jobs [maximum number of jobs]

A.6. How to specify the path of the Invoke Server

If Invoke Server is not located in the predefined directory (${NG_DIR}/bin), the path attribute in the <INVOKE_SERVER> section can be used to specify its path.


path [path of the Invoke Server]

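Appendix B. Miscellaneous information

B.2. Redirect stdout/stderr is implemented using files

When redirect_enable is true, redirection of the stdout/stderr of the Ninf-G Executable is implemented using files:
(1) The Ninf-G Client passes the file names to Invoke Server as the stdout_file and stderr_file attributes of the JOB_CREATE request.
(2) Invoke Server outputs the stdout/stderr of the Ninf-G Executable to these files.
(3) The Ninf-G Client outputs the contents of the files to its own stdout/stderr.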

last update : $Date: 2006/09/20 04:57:42 $