Ninf-G is a reference implementation of the GridRPC API. Ninf-G makes remote procedure calls via various protocols and middleware such as SSH and the Globus Toolkit.
GridRPC is middleware that provides a model for access to remote libraries and parallel programming for tasks on a grid. Typical GridRPC middleware includes Ninf and NetSolve. Other GridRPC middleware includes GridSolve, DIET, and OmniRPC.
GridRPC is considered effective for use in the following cases.
Commercial programs or libraries are sometimes provided only in binary format for particular computers on the grid and cannot be executed on other computers. There are also problems concerning licensing and source code compatibility. Furthermore, resources that are tied to particular machines, such as video cameras, electron microscopes, telescopes, and sensors, can only be used by processing that runs on those machines.
In such cases, an environment that allows the resources (including software) to be used on a particular computer is needed.
When there are many programs that execute routines that do a large amount of computation on broadband servers on the grid, it takes a lot of time just to run parts of the program.
The time required to run the program can be shortened by off-loading such program parts to a broadband server.
In cases when there are strong demands on memory and disk space on the client machine so that broadband computation cannot be done, it is desirable to be able to do easily-understood offloading with no consideration given to argument marshalling.
Execution of Parameter Sweep by multiple servers on the grid
Parameter Sweep is a program that enables execution of computation on multiple servers in parallel, using some subset of the parameters. The respective servers run independently using different parameters, with virtually no dependence on other servers.
There are surprisingly many programs like Parameter Sweep.
The Monte Carlo method program is one of them.
Although Parameter Sweep can also be implemented with a Message Passing Interface (MPI), programming is rather simple with GridRPC and Parameter Sweep can be executed to match the (dynamically changing) scale of the grid (execution by multiple clusters, taking resource management, security, etc., into account).
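The independence of the servers in a parameter sweep can be pictured with a small C sketch that splits a parameter range into non-overlapping chunks, one per server. The names `sweep_chunk` and `sweep_partition` are hypothetical and are not part of Ninf-G or the GridRPC API; in a real client, each chunk would be handed to a remote call on its own handle rather than computed locally.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [begin, end) of parameter indices for one server. */
typedef struct {
    size_t begin;
    size_t end;
} sweep_chunk;

/*
 * Assign parameter indices 0..n_params-1 to n_servers servers as evenly as
 * possible. Chunks do not overlap, so each server can run independently.
 * (Hypothetical helper for illustration; not a Ninf-G function.)
 */
sweep_chunk sweep_partition(size_t n_params, size_t n_servers, size_t server)
{
    sweep_chunk c;
    size_t base, extra;

    assert(n_servers > 0 && server < n_servers);

    base  = n_params / n_servers;   /* every server gets at least this many */
    extra = n_params % n_servers;   /* the first `extra` servers get one more */

    c.begin = server * base + (server < extra ? server : extra);
    c.end   = c.begin + base + (server < extra ? 1 : 0);
    return c;
}
```

With 10 parameters and 3 servers, the chunks are [0, 4), [4, 7), and [7, 10). If the number of available servers changes, only `n_servers` changes; the per-parameter computation is untouched.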
Ordinary or large-scale task parallel programs on a grid
Task arrangement programs are easy to write with GridRPC. An API that supports the synchronization of various task arrangements with mixed exchange among multiple clients and servers can be used.
GridRPC not only provides an interface for easy mathematical computation and scheduling of tasks for parallel execution, but the execution of processing that matches the (dynamically changing) scale of the grid is possible, as in the case of Parameter Sweep.
New features and functions have been added to Ninf-G Version 5 (Ninf-G5).
Ninf-G5 does not require any specific Grid middleware as a prerequisite. Unlike past versions of Ninf-G (e.g. Ninf-G2, Ninf-G4), Ninf-G5 works in environments without the Globus Toolkit.
Major functions of Ninf-G5 include (1) remote process invocation, (2) communications between a client and servers, and (3) information services and retrievals. These functions are realized by the following three external modules, so Ninf-G5 can implement them according to the available software environment.
Invoke Server is a module to invoke remote processes according to the available Grid middleware such as Globus Toolkit WS GRAM, Pre-WS GRAM, Condor, and SSH.
Function/object handle management functions such as grpc_function_handle_init() interact with the Invoke Server to control remote processes via function/object handles.
Communication Proxy is a module to implement communications between a Ninf-G Client and Ninf-G Executables.
In order to utilize a specific communication library such as Globus IO, a Communication Proxy module for that library is required. Otherwise, Ninf-G uses native TCP/IP for communications between a Ninf-G Client and Ninf-G Executables.
Information Service is a module to provide information about Ninf-G Executables to the Ninf-G Client.
Information Service NRF (Ninf-G Remote Information File) provides file based information services.
Ninf-G5 implements two types of communication between a Ninf-G Client and Ninf-G Executables: connection-full and connection-less. The connection-full type keeps the connection open until the Ninf-G Executable exits. The connection-less type closes the connection and re-establishes it on demand, for example for heartbeats and for the transfer of arguments and results. Users can select either type according to the characteristics of the application and the runtime environment.
Ninf-G is a set of library functions that provide an RPC capability in a Grid environment, based on the GridRPC API specifications.
Ninf-G and the application programs that use Ninf-G consist of Ninf-G Executables that execute computation on server machines, and Ninf-G Clients that issue requests for computation to the Ninf-G Executables from client machines.
The Ninf-G Executables consist of functions that perform calculations (calculation functions) and a Ninf-G stub program that calls the calculation functions. Communication between clients and servers is accomplished by TCP/IP using a proprietary Ninf-G protocol.
The relationships between clients and servers are illustrated in Fig. 1.
Figure 1: Clients and servers
Ninf-G employs the capabilities provided by the Grid middleware (e.g. Globus Toolkit), through External Modules, for server machine authentication, information search, job start-up, and communication. The relationships between applications, Ninf-G, Grid middleware and the OS are illustrated in Fig. 2.
Figure 2: Program hierarchy
Ninf-G Clients are comprised of the following elements.
Ninf-G Executables are comprised of the following elements.
Ninf-G is supplied to the user as a source package which includes library functions, utility commands, and external modules. The software required by specific components is shown in Table 1. Versions in parentheses are the supported versions.
Software | Requirements | Required by
Globus Toolkit | 4.0.5 or later (4.0.5, 4.0.6, 4.0.7, 4.0.8, 4.2.0, 4.2.1, 5.0.0, 5.0.1) | Invoke Server GT2c, GT4py, Communication Proxy GT, Information Service MDS4(*1)
Java JDK | 1.5 or later (1.5.0) | Ninf-G Java Client, Invoke Server Condor, SSH_Condor, NAREGISS, Information Service MDS4
ant | 1.6 or later (1.6.2) | Ninf-G Java Client, Invoke Server Condor, NAREGISS, Information Service MDS4
Python | 2.3 or later (2.3) | Invoke Server GT4py, SSH_Condor, Communication Proxy SSH
NAREGI Middleware | V1.1 or later | Invoke Server NAREGISS
(*1) Invoke Server GT4py and Information Service MDS4 work on Globus Toolkit 4.x only. They do not work on Globus Toolkit 5.x.
The operating environment required for the library functions and utility commands is shown in Table 2.
Note: If Ninf-G is compiled with the Globus Toolkit, the compiler must be the same as the one used to compile the Globus Toolkit. The flavor of the Globus Toolkit must be a Pthread flavor.
Target machine | Operating system | Compiler
PC-AT compatible (x86, AMD64) | Linux(*1) | gcc 2.95, gcc 3.0, 3.1, 3.2, 3.3, 3.4(*2)
SPARC | Solaris 9 (SunOS 5.9)(*3) | Sun Compiler, gcc 3.2, 3.3
Apple Mac (PowerPC) | Mac OS X | gcc 4.0.0
IBM Power4 | AIX 5.2 | C for AIX Compiler, Version 6
Cell B.E. | Linux(*4) | gcc 4.x
(*1) We have tested Ninf-G5 on the following distributions.
(*2) There are problems with gcc 2.96, so we recommend you use gcc 2.95.x or gcc 3.0, 3.1, 3.2, 3.3, 3.4.
(*3) GT5 is not available on Solaris 9.
(*4) We have tested Ninf-G5 on the following distributions.
Ninf-G allows the definition of a single computation function (1) or multiple computation functions (2) for a Ninf-G Executable running on a server machine. The execution schemes for these are shown in Fig. 3. In either case, it is possible to execute just one computation function at a time on the Ninf-G Executable. To execute multiple computation functions at the same time, it is necessary to run multiple Ninf-G Executables. This is illustrated in Fig. 4.
In Ninf-G, the second scheme (2) is referred to as "Ninf-G Executable objectification" and the calling of the computation is referred to as a "method call."
Figure 3: Overview of operation
Figure 4: Parallel execution
Ninf-G provides handles for manipulating a Ninf-G Executable. Different handles are used for the two schemes, (1) and (2), described above. As shown in Table 3, two types of handles are provided, function handles and object handles.
Function handle | Used for manipulation of a Ninf-G Executable for which a single function is defined |
Object handle | Used for manipulation of a Ninf-G Executable for which multiple functions are defined |
Ninf-G Executables that run on server machines are started up from Ninf-G Clients, which run on client machines. A Ninf-G Executable is started up by performing the following procedure using the job control method provided by Invoke Server.
When running a Ninf-G Client program, however, there is no particular need for the user to be aware of this mechanism.
For example, if the Invoke Server for Globus Toolkit WS-GRAM is selected for use, the Invoke Server requests the remote WS-GRAM to perform the invocation. The requested remote WS-GRAM invokes the jobmanager, and the jobmanager invokes the Ninf-G Executable.
This process is shown in Fig. 5.
Figure 5: Starting up a Ninf-G Executable
Starting up a Ninf-G Executable requires path information that specifies the location of the Ninf-G Executable on that server machine. Information on the functions that are called by the Ninf-G Executable is also required. That information is collectively referred to as the Ninf-G Executable information. Ninf-G provides the following methods of registering and accessing Ninf-G Executable information.
When running a Ninf-G Client program, however, there is no particular need for the user to be aware of this mechanism.
Figure 6: Get information from NRF file
Figure 7: Get information from Ninf-G Executable
(*) The information search function provided by the Globus Toolkit.
Figure 8: Get information from WS MDS
A Ninf-G Client is a program written by a user for the purpose of controlling the execution of computation. It is obtained by linking a user-written application program to the Ninf-G Client Library.
The Ninf-G Client Library puts together the API used by application programs that run on client machines (Ninf-G Client API).
A Ninf-G Executable is a program that performs, on a remote computer, the computation requested by the user. It is obtained by linking a user-written computation function to stub code and the Ninf-G Executable Library. The stub code is produced by the stub generator according to the interface specifications of the user-defined computation function. The interface specifications are written in the Ninf-G IDL (Interface Description Language) specified by Ninf-G.
The Ninf-G Executable Library puts together the API (Ninf-G Executable API) used by a Ninf-G Executable.
A client machine is a machine that is running a Ninf-G Client.
A server machine is a machine that is running a Ninf-G Executable.
A function handle is a data item whose type is grpc_function_handle_t. The function handle represents a mapping from a function name to an instance of that function on a particular server.
An object handle is a data item whose type is grpc_object_handle_t_np. The object handle represents a mapping from a class name to an instance of that class on a particular server. The instance is called a Ninf-G remote object, and it is able to contain multiple methods.
A remote function is a computational function written by the user, defined as the single computation function of a Ninf-G Executable.
A remote method is a computational function written by the user, defined as one of multiple computation functions of a Ninf-G Executable.
A session extends from the time an RPC is made to the time its execution is completed.
In Ninf-G, a session extends
The GridRPC API is the standard API that systems implementing GridRPC should have. The GridRPC C language API is published as an Open Grid Forum (OGF) recommendation (GFD-R.52).
IDL stands for Interface Description Language. The Ninf-G IDL is a language for writing interfaces for the remote functions and remote methods defined by Ninf-G Executables.
The module name is the identifier for Ninf-G Executables. The user may specify any character string in the Ninf-G IDL.
NRF stands for Ninf-G Remote Information File. This file is written in XML and describes the Ninf-G Executable generated on the specified server.
Ninf-G provides the following functionalities for reducing overhead for initialization of function handles.
A single Globus Toolkit GRAM call usually takes several seconds for GSI authentication and a process invocation via the Globus Toolkit jobmanager. Hundreds of GRAM calls on a large-scale cluster will therefore take from several minutes to tens of minutes. In addition, the many jobmanager processes launched on the front-end node increase its load and add further overhead.
Ninf-G implements a functionality which enables the creation of multiple function handles via a single GRAM call and provides an API for utilizing this functionality. For example, grpc_function_handle_array_default_np() takes three arguments: a pointer to an array of function handles, the number of function handles, and the name of the remote executable. When grpc_function_handle_array_default_np() is invoked, Ninf-G will construct an RSL in which the count attribute is specified as the number of function handles, and pass the RSL to the GRAM. This allows invocation of multiple remote executables, i.e. initialization of multiple function handles, via a single GRAM call.
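A back-of-the-envelope model shows the scale of the saving. The C sketch below is only an estimate under stated assumptions: the helper names are hypothetical, calls are assumed to be issued one after another, and a batched call is assumed to cost about the same as a single ordinary call, which may not hold in practice.

```c
#include <assert.h>

/*
 * Number of GRAM calls needed to start `handles` remote executables when one
 * call can start up to `per_call` of them. per_call == 1 models one call per
 * handle; per_call == handles models a single batched call.
 * (Hypothetical helper for illustration; not a Ninf-G function.)
 */
unsigned long gram_calls_needed(unsigned long handles, unsigned long per_call)
{
    assert(per_call > 0);
    return (handles + per_call - 1) / per_call;   /* ceiling division */
}

/* Startup cost in seconds if the calls are issued one after another. */
unsigned long startup_seconds(unsigned long calls, unsigned long secs_per_call)
{
    return calls * secs_per_call;
}
```

Assuming 3 seconds per call and 200 function handles, one call per handle costs `startup_seconds(gram_calls_needed(200, 1), 3)`, which is 600 seconds, while a single batched call costs 3 seconds under the same model.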
Ninf-G provides the following functionalities for efficient data transfers and elimination of redundant data transfers.
Although the semantics of a remote executable is "stateless," it is desirable to provide a "stateful" remote executable since typical applications repeat computation for large data sets with different parameters. In the case of "stateless" executables, the data must be sent in every remote library call, which would be a severe problem in a Grid environment. Ninf-G provides a "stateful" remote executable as a "Ninf-G remote object." A Ninf-G remote object can hold a "state" and be used to eliminate redundant data transfers between a client and servers. Ninf-G provides API functions such as grpc_object_handle_init_np() and grpc_invoke_np() for utilizing Ninf-G remote objects.
grpc_object_handle_init_np() initializes a Ninf-G remote object and creates an object handle which represents a connection between the client and the Ninf-G remote object. grpc_invoke_np() calls methods of the Ninf-G remote object as described in the Ninf-G IDL.
A Ninf-G remote object is an instance of a class which is defined in an IDL file using the DefClass statement on the server side. Multiple methods, which can be invoked by a client using a client API such as grpc_invoke_np(), can be defined in a class using the DefMethod statement.
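The benefit of keeping state on the server can be estimated with simple arithmetic. The following C sketch compares the bytes sent when a data set accompanies every call with the bytes sent when it is uploaded once and only the parameters change. The function names and the cost model are assumptions for illustration; protocol overhead and result transfers are ignored.

```c
#include <assert.h>

/* Bytes sent when the data set accompanies every call (stateless). */
unsigned long long stateless_bytes(unsigned long long calls,
                                   unsigned long long data_bytes,
                                   unsigned long long param_bytes)
{
    return calls * (data_bytes + param_bytes);
}

/* Bytes sent when the data set is uploaded once and kept on the server. */
unsigned long long stateful_bytes(unsigned long long calls,
                                  unsigned long long data_bytes,
                                  unsigned long long param_bytes)
{
    return data_bytes + calls * param_bytes;
}

/* Transfer volume avoided by holding the data set as remote state. */
unsigned long long bytes_saved(unsigned long long calls,
                               unsigned long long data_bytes,
                               unsigned long long param_bytes)
{
    assert(calls >= 1);   /* with no calls there is nothing to compare */
    return stateless_bytes(calls, data_bytes, param_bytes)
         - stateful_bytes(calls, data_bytes, param_bytes);
}
```

For 100 calls on a 1,000,000-byte data set with 64 bytes of parameters per call, this model gives 100,006,400 bytes for the stateless case and 1,006,400 bytes for the stateful case.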
Ninf-G enables data transfers with compression. A flag which specifies whether to enable or disable data compression, and a data size as the threshold for compressing data can be specified in the client configuration file.
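The interplay of the flag and the threshold can be sketched in C as follows. The `compress_config` structure and `should_compress` function are hypothetical names, not Ninf-G identifiers, and whether a payload exactly at the threshold is compressed is an assumption made here for the sake of the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory view of the two compression settings. */
typedef struct {
    int    enabled;     /* nonzero if compression is turned on */
    size_t threshold;   /* minimum payload size, in bytes, to compress */
} compress_config;

/*
 * Decide whether a payload should be compressed before transfer.
 * Assumption: payloads at or above the threshold are compressed.
 */
int should_compress(const compress_config *cfg, size_t payload_bytes)
{
    assert(cfg != NULL);
    return cfg->enabled && payload_bytes >= cfg->threshold;
}
```

A threshold keeps small messages uncompressed, where the CPU cost of compression is unlikely to be repaid by the reduced transfer size.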
In order to compensate for the heterogeneity and unreliability of a Grid environment, Ninf-G provides the following functionalities:
The GridRPC API specifies that the first argument of a client program must be a "client configuration file" in which information required for running applications is described. In order to compensate for the heterogeneity and unreliability of a Grid environment, Ninf-G provides client configuration formats for detailed description of server attributes such as the Globus jobmanager, and a protocol for data transfers, etc.
If a server machine is fully utilized, requests for initialization of function handles and remote library calls may be stuck in the queue and will not be launched for a long time, and this may cause deadlock of applications. Ninf-G provides a functionality to specify a timeout value for initialization of function handles as well as remote library calls. The timeout values can be specified in the client configuration file.
A remote executable reports a heartbeat message to the client at a pre-specified interval. Ninf-G provides an API function for checking the heartbeat from the remote executable. The interval can be specified in the client configuration file.
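The client-side check can be pictured as follows. This is a minimal sketch, assuming the client treats an executable as unresponsive once a fixed number of intervals have passed with no heartbeat; the function name, the `max_missed` parameter, and that rule are assumptions for illustration and do not describe Ninf-G's actual API or behavior.

```c
#include <assert.h>

/*
 * Return 1 if the remote executable is still considered alive, 0 otherwise.
 * All times are in seconds. `interval` is the configured heartbeat interval;
 * `max_missed` is how many intervals may pass without a heartbeat.
 * (Hypothetical helper; not a Ninf-G function.)
 */
int heartbeat_is_alive(long last_heartbeat, long now,
                       long interval, long max_missed)
{
    assert(interval > 0 && max_missed > 0 && now >= last_heartbeat);
    return (now - last_heartbeat) <= interval * max_missed;
}
```

With a 60-second interval and three tolerated misses, an executable last heard from at t = 100 is still considered alive at t = 280 and is considered unresponsive from t = 281 onward.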
Ninf-G provides a functionality called "client callbacks" by which a remote executable calls a function on the client machine. The client callback can be used for sharing status between the server and the client. For example, the client callback can be used for showing the interim status of computation at the client machine and in interactive processing.
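The idea of reporting interim status through a callback can be shown with an ordinary C function pointer. This is a local analogy only: in Ninf-G the callback crosses the network from the remote executable to the client, and the names `progress_cb`, `compute_with_progress`, and `count_reports` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Signature of a progress callback supplied by the "client" side. */
typedef void (*progress_cb)(long done, long total, void *ctx);

/*
 * Stand-in for a long computation: sums 1..total and reports progress
 * through the callback after every `report_every` steps.
 */
long compute_with_progress(long total, long report_every,
                           progress_cb report, void *ctx)
{
    long i, sum = 0;

    assert(report_every > 0);
    for (i = 1; i <= total; i++) {
        sum += i;                                /* stand-in for real work */
        if (report != NULL && i % report_every == 0)
            report(i, total, ctx);
    }
    return sum;
}

/* Example callback: counts how many progress reports arrived. */
void count_reports(long done, long total, void *ctx)
{
    (void)done;
    (void)total;
    ++*(int *)ctx;
}
```

Calling `compute_with_progress(10, 5, count_reports, &n)` with `n` initialized to 0 invokes the callback after steps 5 and 10, so `n` ends at 2.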
Ninf-G provides a server-side API function named grpc_is_canceled_np() for checking the arrival of cancel requests from the client. If the client calls the grpc_cancel() function, grpc_is_canceled_np() returns 1. In order to implement cancellation of a session, remote executables are required to call grpc_is_canceled_np() at an appropriate interval and return on their own if grpc_is_canceled_np() returns 1.
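The polling pattern described above can be sketched in plain C. So that the example compiles without Ninf-G, the cancel check is passed in as a function pointer; in a real remote executable that role would be played by grpc_is_canceled_np(), whose exact signature is not reproduced here. The names `cancel_check`, `run_until_canceled`, and `cancel_after_polls` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the server-side cancel check; returns 1 when canceled. */
typedef int (*cancel_check)(void *ctx);

/*
 * Run up to total_steps units of work, polling for cancellation every
 * poll_every steps. Returns the number of steps actually completed.
 */
long run_until_canceled(long total_steps, long poll_every,
                        cancel_check is_canceled, void *ctx)
{
    long i;

    assert(poll_every > 0 && is_canceled != NULL);
    for (i = 0; i < total_steps; i++) {
        if (i % poll_every == 0 && is_canceled(ctx))
            break;               /* stop early and return on our own */
        /* one unit of real computation would go here */
    }
    return i;
}

/*
 * Simulated cancel source for the example: reports "canceled" once the
 * counter pointed to by ctx has been polled down to zero.
 */
int cancel_after_polls(void *ctx)
{
    int *remaining = (int *)ctx;

    if (*remaining <= 0)
        return 1;
    --*remaining;
    return 0;
}
```

The polling interval is a trade-off: polling too often adds overhead to the computation, while polling too rarely delays the response to a cancel request.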
Ninf-G provides functionalities which are useful for debugging. Ninf-G enables redirection of stdout and stderr of remote executables to the client machine. Log messages generated by Ninf-G can also be stored on the client machine. Furthermore, Ninf-G enables the launch of "gdb" on the server machine when a remote executable is launched on the server. These functionalities are made available by turning on the flags in the client configuration file.
The GridRPC API and Ninf-G API implemented by Ninf-G5 are compatible with Ninf-G4 except for the following two items.
The acceptable format of the second argument (server_name) of function/object handle initialize functions has been changed. Ninf-G5 does not accept the format of Globus GRAM Resource Manager Contact as a valid second argument though it can be used in Ninf-G4.
An argument (error) has been added to comply with the specification. This change is a bug fix.
The format of a Client configuration file is not compatible between Ninf-G4 and Ninf-G5 though they are very similar.
Due to protocol changes, a Ninf-G4 client cannot communicate with Ninf-G5 executables, and vice versa.
Globus Toolkit is not necessarily required. It is optional.
If the Globus Toolkit is used, globus_core must be installed. The binary installer does not install this module; either use the source installer, which does, or perform the post-processing step after a binary installation (see a.6 Installing GT4 by Binary installer).
Ninf-G users must be capable of submitting jobs using the Globus Toolkit from a client machine on which Ninf-G Client programs will run to server machines on which the Globus gatekeeper is running and Ninf-G Executables will be launched by the Globus jobmanager.
Ninf-G users must be capable of running commands over SSH from a client machine on which Ninf-G Client programs will run to server machines on which an SSH server is running and on which Ninf-G Executables will be launched through SSH.
The server machine which executes Ninf-G Executables must be IP-reachable for either the client machine or the server side gateway machine which is IP-reachable to the client machine and executes a Remote Relay. That is, the server machines or the gateway machine should be capable of establishing a connection to the client machine.
Either the client machine which executes Ninf-G Client or the client side gateway machine which executes a Client Relay must be IP-reachable from the server machine.