- [jsspp10takefusa]
[PDF]
[Slides]
[Abstract]
-
Atsuko Takefusa, Hidemoto Nakada, Tomohiro Kudoh, Yoshio Tanaka.
An Advance Reservation-based Co-Allocation Algorithm for Distributed Computers and Network Bandwidth on QoS-guaranteed Grids.
15th Workshop on Job Scheduling Strategies for Parallel Processing, 2010.
Co-allocation of performance-guaranteed computing and network resources provided by several administrative domains is one of the key issues for constructing a QoS-guaranteed Grid. We propose an advance reservation-based co-allocation algorithm for both computing and network resources on a QoS-guaranteed Grid, modeled as an integer programming (IP) problem. The goal of our algorithm is to create reservation plans satisfying user resource requirements as an on-line service. The algorithm also takes into consideration co-allocation options reflecting user and resource administrator concerns. We evaluate the proposed algorithm with extensive simulation, in terms of both functionality and practicality. The results show that the algorithm enables efficient co-allocation of both computing and network resources provided by multiple domains, and can reflect reservation options for resource administrator concerns as a first step. The calculation times needed for selecting resources using an IP solver are acceptable for an on-line service.
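The paper models co-allocation as an integer program solved by an IP solver. Purely as a rough illustration (not the paper's actual formulation), a minimal co-allocation IP for a single reservation time slot might look as follows; the variables, costs, and capacity constraints here are assumptions made for the sketch.

```latex
% Illustrative sketch only, not the formulation from the paper.
% x_{ij} = 1 iff compute requirement i is placed on site j;
% z_{kl} = 1 iff bandwidth requirement k is mapped onto inter-domain path l.
\begin{align*}
\min \quad & \sum_{i,j} c_{ij}\, x_{ij} + \sum_{k,l} d_{kl}\, z_{kl}
  && \text{(e.g.\ cost or start time)} \\
\text{s.t.} \quad & \sum_{j} x_{ij} = 1 && \forall i
  \quad \text{(every compute request is placed)} \\
 & \sum_{i} n_i\, x_{ij} \le N_j && \forall j
  \quad \text{(free CPUs at site $j$ in the slot)} \\
 & \sum_{l} z_{kl} = 1 && \forall k
  \quad \text{(every bandwidth request is routed)} \\
 & \sum_{k} b_k\, z_{kl} \le B_l && \forall l
  \quad \text{(residual bandwidth of path $l$)} \\
 & x_{ij},\, z_{kl} \in \{0,1\}
\end{align*}
```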
- [hpcasia2009-takefusa]
[PDF]
[Slides]
[Abstract]
-
Atsuko Takefusa, Hidemoto Nakada, Seiya Yanagita, Fumihiro Okazaki, Tomohiro Kudoh, Yoshio Tanaka.
Design of a Domain Authorization-based Hierarchical Distributed Resource Monitoring System in cooperation with Resource Reservation.
Proc. HPC Asia 2009, pp. 77-84, 2009.
Grid and network provisioning technology has enabled the construction of high-quality virtual computing infrastructures spanning several administrative organizations. However, it is still difficult for users to monitor the usage of the diverse, distributed resources managed by multiple domains. We propose an authorization-based hierarchical distributed resource monitoring system called DMS, which gathers information based on resource reservations and filters it according to policies specified by administrators in XACML, a standard authorization model and policy description language. DMS works in cooperation with the GridARS co-allocation framework to retrieve resource reservation information, and adopts web services technologies and an extension of a standard data representation set. To confirm the feasibility of DMS, we describe monitoring strategies for reserved computing and network resources in Collectors, and we have developed a WSRF-based DMS prototype that enables authorization by XACML. Experiments using the prototype system show that (1) even when DMS employs a large number of policies, the overhead of the XACML authorization decision process is negligible, since the WSRF/GSI overhead dominates the total processing time, and (2) parallel information aggregation from multiple domains keeps the retrieval latency acceptable.
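To illustrate the general idea of domain-scoped, attribute-based filtering of monitoring data described above, here is a minimal Python sketch. The data model, class names, and policy representation are hypothetical stand-ins for this sketch only, not DMS's XACML interface or actual code.

```python
# Hypothetical illustration of attribute-based filtering of monitoring data,
# in the spirit of DMS's XACML-based authorization (not the actual DMS API).
from dataclasses import dataclass

@dataclass
class Metric:
    domain: str       # administrative domain that produced the metric
    resource: str     # e.g. "cluster01" or "link:siteA-siteB"
    attribute: str    # e.g. "cpu_load", "bandwidth_usage"
    value: float

@dataclass
class Policy:
    domain: str               # domain whose administrator defined the rule
    subject: str              # user (or reservation owner) the rule applies to
    allowed_attributes: set   # attributes this subject may see

def filter_metrics(metrics, policies, subject):
    """Return only the metrics the given subject is authorized to see."""
    visible = []
    for m in metrics:
        for p in policies:
            if (p.domain == m.domain and p.subject == subject
                    and m.attribute in p.allowed_attributes):
                visible.append(m)
                break
    return visible

if __name__ == "__main__":
    metrics = [Metric("domainA", "cluster01", "cpu_load", 0.42),
               Metric("domainA", "cluster01", "power_usage", 1.3)]
    policies = [Policy("domainA", "alice", {"cpu_load"})]
    print(filter_metrics(metrics, policies, "alice"))  # only cpu_load is visible
```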
- [jsspp2007-takefusa]
[PDF]
[Slides]
[Abstract]
-
Atsuko Takefusa, Hidemoto Nakada, Tomohiro Kudoh, Yoshio Tanaka, Satoshi Sekiguchi.
GridARS: An Advance Reservation-based Grid Co-allocation Framework for Distributed Computing and Network Resources.
Proc. 13th Workshop on Job Scheduling Strategies for Parallel Processing (LNCS 4942), Seattle, pp. 152-168, 2007.
For high-performance parallel computing on actual Grids, one of the important issues is to co-allocate distributed resources managed by various local schedulers with advance reservation. To address this issue, we proposed and developed the GridARS resource co-allocation framework and a general advance reservation protocol, which uses WSRF/GSI and a two-phase commit (2PC) protocol to enable a generic and secure advance reservation process based on distributed transactions, and which provides an interface module for various existing resource schedulers. To confirm the effectiveness of GridARS, we describe the performance of a simultaneous reservation process and a case study of GridARS grid co-allocation over trans-Pacific computing and network resources. Our experiments showed that 1) the GridARS simultaneous 2PC reservation process is scalable and practical, and 2) GridARS can stably co-allocate distributed resources managed by various local schedulers.
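The reservation process described above follows the two-phase commit pattern. The sketch below shows only that generic pattern; the class and method names (ResourceManager, prepare, commit, abort, co_allocate) are hypothetical and do not correspond to GridARS's WSRF interfaces.

```python
# Generic two-phase-commit co-allocation sketch (hypothetical interfaces,
# not GridARS's actual WSRF-based protocol).

class ResourceManager:
    """A local scheduler that can tentatively hold and then commit a slot."""
    def __init__(self, name):
        self.name = name
        self.held = {}

    def prepare(self, rsv_id, start, end, amount):
        # Phase 1: try to hold the requested slot; return False if impossible.
        self.held[rsv_id] = (start, end, amount)
        return True

    def commit(self, rsv_id):
        # Phase 2a: turn the tentative hold into a firm reservation.
        return rsv_id in self.held

    def abort(self, rsv_id):
        # Phase 2b: release the tentative hold.
        self.held.pop(rsv_id, None)

def co_allocate(managers, rsv_id, start, end, amount):
    """Reserve on all managers atomically: commit only if every prepare succeeds."""
    prepared = []
    for rm in managers:
        if rm.prepare(rsv_id, start, end, amount):
            prepared.append(rm)
        else:
            for p in prepared:      # any failure aborts the whole transaction
                p.abort(rsv_id)
            return False
    for rm in prepared:
        rm.commit(rsv_id)
    return True
```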
- [gridnets07steve]
[PDF]
[Slides]
[Abstract]
-
Steven R. Thorpe, Lina Battestilli, Gigi Karmous-Edwards, Andrei Hutanu, Jon MacLaren, Joe Mambretti, John H. Moore, Kamaraju Syam Sundar, Yufeng Xin, Atsuko Takefusa, Michiaki Hayashi, Akira Hirano, Shuichi Okamoto, Tomohiro Kudoh, Takahiro Miyamoto, Yukio Tsukishima, Tomohiro Otani, Hidemoto Nakada, Hideaki Tanaka, Atsushi Taniguchi, Yasunori Sameshima, Masahiko Jinno.
G-lambda and EnLIGHTened: Wrapped In Middleware Co-allocating Compute and Network Resources Across Japan and the US.
Proc. First International Conference on Networks for Grid Applications (GridNets), 8 pages, 2007.
This paper describes innovative architectures and techniques for reserving and coordinating highly distributed resources, a capability required by many large-scale applications. In the fall of 2006, Japan's G-lambda research team and the United States' EnLIGHTened Computing research team used these innovations to achieve the world's first inter-domain coordination of resource managers for in-advance reservation of network bandwidth and compute resources within and between the US and Japan. The compute and network resource managers had different interfaces and were independently developed. Automated interoperability among the resources in both countries was enabled through various Grid middleware components. In this paper, we describe the middleware components, testbeds, results, and lessons learned.
- [gca07nakada]
[Abstract]
-
Hidemoto Nakada, Atsuko Takefusa, Katsuhiko Ookubo, Tomohiro Kudoh, Yoshio Tanaka, Satoshi Sekiguchi.
An Advance Reservation-based Computation Resource Manager for Global Scheduling.
Proc. 3rd International Workshop on Grid Computing and Applications (GCA 2007), pp. 3-14, 2007.
Advance reservation is one possible way to enable resource co-allocation on the Grid. This method requires all the resources to have advance reservation capability, as well as support for a coordination protocol. We employed the two-phase commit protocol, which is common in the distributed transaction area, as the coordination protocol, and implemented an advance reservation manager called PluS. PluS works with existing local queuing managers, such as TORQUE or Grid Engine, and provides users with advance reservation capability. There are two ways to implement this capability: 1) completely replace the scheduling module of the queuing manager, or 2) represent a reservation as a queue and control the queues through an external interface. We designed and implemented a reservation manager both ways and evaluated them. We found that the former has smaller overhead and allows arbitrary scheduling policies, while the latter is much easier to implement and still has acceptable response time.
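The second implementation method, representing a reservation as a queue controlled through an external interface, can be illustrated roughly as follows. The QueueManager calls and data model are hypothetical placeholders for this sketch, not PluS's actual integration with TORQUE or Grid Engine.

```python
# Illustrative "reservation as a queue" sketch (hypothetical queue-manager
# calls; PluS itself drives an existing local queuing system).
import sched, time

class QueueManager:
    """Stand-in for a local queuing system's administrative interface."""
    def create_queue(self, name, nodes, allowed_users):
        print(f"create queue {name} on {nodes} for {allowed_users}")
    def delete_queue(self, name):
        print(f"delete queue {name}")

def schedule_reservation(qm, scheduler, rsv):
    """At the start time, expose the reserved nodes as a dedicated queue;
    at the end time, tear the queue down so the nodes return to the pool."""
    scheduler.enterabs(rsv["start"], 1, qm.create_queue,
                       argument=(rsv["id"], rsv["nodes"], rsv["users"]))
    scheduler.enterabs(rsv["end"], 1, qm.delete_queue, argument=(rsv["id"],))

if __name__ == "__main__":
    s = sched.scheduler(time.time, time.sleep)
    qm = QueueManager()
    now = time.time()
    schedule_reservation(qm, s, {"id": "rsv42", "nodes": ["n01", "n02"],
                                 "users": ["alice"],
                                 "start": now + 1, "end": now + 2})
    s.run()  # runs the create/delete actions at the reserved times
```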
- [ofc2006-hayashi]
-
Michiaki Hayashi, Takahiro Miyamoto, Tomohiro Otani, Hideaki Tanaka, Atsuko Takefusa, Hidemoto Nakada, Tomohiro Kudoh, Naohide Nagatsu, Yasunori Sameshima, Shuichi Okamoto.
Managing and Controlling GMPLS Network Resources for Grid Applications.
Proc. OFC 2006, 2006.
- [cit06nakada]
[Abstract]
-
Hidemoto Nakada, Atsuko Takefusa, Katsuhiko Ookubo, Makoto Kishimoto, Tomohiro Kudoh, Yoshio Tanaka, Satoshi Sekiguchi.
Design and Implementation of a Local Scheduling System with Advance Reservation for Co-allocation on the Grid.
Proc. of the 2006 IEEE International Conference on Computer and Information Technology, 6 pages, 2006.
While advance reservation is an essential capability for co-allocating several resources in Grid environments, it is not obvious how it can co-exist with priority-based First Come First Served scheduling, which is widely used as a local scheduling policy today. To investigate this problem, we 1) developed a scheduling API in Java for TORQUE, a variant of OpenPBS, that enables users to implement their own schedulers and replace the original scheduling module with them, and 2) implemented a prototype scheduler module with advance reservation capability using the API. We also provide an external interface for the reservation capability based on WSRF to enable co-allocation of resources over the Grid. Using this interface with the job submission module from Globus Toolkit 4, users can make reservations for resources and submit jobs over the Grid.
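A core piece of any such advance-reservation scheduler is the admission test that checks whether enough CPUs are free over the requested window. The following Python sketch shows that test in isolation under an assumed in-memory data model; it is not the Java scheduling API or the TORQUE scheduler module described in the paper.

```python
# Minimal admission test for an advance reservation (illustrative data model).

def cpus_free(reservations, total_cpus, start, end):
    """Smallest number of free CPUs at any instant in [start, end)."""
    events = sorted({start, end}
                    | {r["start"] for r in reservations}
                    | {r["end"] for r in reservations})
    free = total_cpus
    for t in (e for e in events if start <= e < end):
        in_use = sum(r["cpus"] for r in reservations
                     if r["start"] <= t < r["end"])
        free = min(free, total_cpus - in_use)
    return free

def admit(reservations, total_cpus, req):
    """Accept the request only if enough CPUs are free for its whole window."""
    if cpus_free(reservations, total_cpus,
                 req["start"], req["end"]) >= req["cpus"]:
        reservations.append(req)
        return True
    return False

if __name__ == "__main__":
    booked = [{"start": 10, "end": 20, "cpus": 6}]
    print(admit(booked, 8, {"start": 15, "end": 25, "cpus": 2}))  # True
    print(admit(booked, 8, {"start": 15, "end": 25, "cpus": 4}))  # False
```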
- [hpdc2003-takefusa]
[PDF]
[Slides]
[Abstract]
-
Atsuko Takefusa, Osamu Tatebe, Satoshi Matsuoka, Yohei Morita.
Performance Analysis of Scheduling and Replication Algorithms on Grid Datafarm Architecture for High Energy Physics Applications.
Proc. the 12th IEEE International Symposium on High Performance Distributed Computing (HPDC-12), pp. 34-43, 2003.
Data Grid is a Grid environment for ubiquitous access and analysis of large-scale data. Because Data Grid is in the early stages of development, the performance of its petabyte-scale models in a realistic data-processing setting has not been well investigated. By enhancing our Bricks Grid simulator to accommodate Data Grid scenarios, we investigate and compare the performance of different Data Grid models. These are categorized mainly as either central or tier models; they employ various scheduling and replication strategies under realistic assumptions of job processing for CERN LHC experiments on the Grid Datafarm system. Our results show that the central model is efficient, but that the tier model, with its greater resources and its speculative class of background replication policies, is quite effective and achieves higher performance, even though each tier is smaller than the central model.
- [hpdc10takefusa]
[PDF]
[Slides]
[Abstract]
-
Atsuko Takefusa, Henri Casanova, Satoshi Matsuoka, Fran Berman.
A Study of Deadline Scheduling for Client-Server Systems on the Computational Grid.
Proceedings of 10th IEEE International Symposium on High Performance Distributed Computing (HPDC-10), pp. 406-415, 2001.
The Computational Grid is a promising platform for the deployment of various high-performance computing applications. A number of projects have addressed the idea of software as a service on the network. These systems usually implement client-server architectures with many servers running on distributed Grid resources and have commonly been referred to as network-enabled servers (NES). An important question is that of scheduling in this multi-client, multi-server scenario. Note that in this context most requests are computationally intensive, as they are generated by high-performance computing applications. The Bricks simulation framework has been developed and extensively used to evaluate scheduling strategies for NES systems. In this paper we first present recent developments and extensions to the Bricks simulation models. We discuss a deadline scheduling strategy that is appropriate for the multi-client, multi-server case, and augment it with "Load Correction" and "Fallback" mechanisms that could improve the performance of the algorithm. We then give Bricks simulation results. The results show that future NES systems should use deadline scheduling with multiple fallbacks, and that users can trade off failure rate against cost by adjusting the level of conservatism of the deadline-scheduling algorithm.
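As a rough sketch of deadline scheduling with fallback (leaving out the Load Correction mechanism), the Python code below picks the cheapest server predicted to meet a deadline and falls back to another server when the observed finish time would miss it. The server model, cost and load fields, and fallback policy are illustrative assumptions, not the paper's exact algorithm.

```python
# Schematic deadline scheduling with fallback (illustrative only).

def pick_server(servers, job_size, deadline, now, conservatism=1.0):
    """Choose the cheapest server whose predicted finish time meets the deadline.
    `conservatism` > 1 inflates the runtime estimate, trading higher cost for a
    lower deadline-miss (failure) rate."""
    best = None
    for name, s in servers.items():
        est_finish = now + s["queue_wait"] + conservatism * job_size / s["flops"]
        if est_finish <= deadline and (best is None
                                       or s["cost"] < servers[best]["cost"]):
            best = name
    return best

def run_with_fallback(servers, job_size, deadline, now, finish_time,
                      max_fallbacks=2):
    """If the chosen server turns out to miss the deadline, fall back:
    drop it from the candidate set and reschedule on the remaining servers."""
    remaining = dict(servers)
    for _ in range(max_fallbacks + 1):
        choice = pick_server(remaining, job_size, deadline, now)
        if choice is None:
            return None                      # no candidate can meet the deadline
        if finish_time(choice) <= deadline:  # observed (not predicted) finish
            return choice
        remaining.pop(choice)                # prediction too optimistic: fall back
    return None
```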
- [hpdc8takefusa]
[PDF]
[Slides]
[Abstract]
-
Atsuko Takefusa, Satoshi Matsuoka, Hidemoto Nakada, Kento Aida, Umpei Nagashima.
Overview of a Performance Evaluation System for Global Computing Scheduling Algorithms.
8th IEEE International Symposium on High Performance Distributed Computing (HPDC8), pp. 97-104, 1999.
While there have been several proposals for high-performance global computing systems, scheduling schemes for such systems have not been well investigated. The reason is the difficulty of evaluation with large-scale benchmarks that yield reproducible results. Our Bricks performance evaluation system allows analysis and comparison of various scheduling schemes in a typical high-performance global computing setting. Bricks can simulate various behaviors of global computing systems, especially the behavior of networks and resource scheduling algorithms. Moreover, Bricks is componentized such that not only can its constituents be replaced to simulate different system algorithms, but existing global computing components can also be incorporated via its foreign interface. To test the validity of the latter characteristic, we incorporated the NWS system, which monitors and forecasts the behavior of global computing systems. Experiments were conducted by running NWS in a real environment versus the simulated environment given the observed parameters of the real environment. We observed that Bricks behaved in the same manner as the real environment, and that NWS also behaved similarly, making closely comparable forecasts under both environments.
- [hpdc7aida]
[Slides]
-
Kento Aida, Atsuko Takefusa, Hidemoto Nakada, Satoshi Matsuoka, Umpei Nagashima.
A Performance Evaluation Model for Effective Job Scheduling in Global Computing Systems.
7th IEEE International Symposium on High Performance Distributed Computing (HPDC7) (poster), pp. 352-353, 1998.
- [sc97takefusa]
[PDF]
[Slides]
[Abstract]
-
Atsuko Takefusa, Satoshi Matsuoka, Hirotaka Ogawa, Hidemoto Nakada, Hiromitsu Takagi, Mitsuhisa Sato, Satoshi Sekiguchi, Umpei Nagashima.
Multi-client LAN/WAN Performance Analysis of Ninf: a High-Performance Global Computing System.
Supercomputing '97, 1997.
The rapid increase in the speed and availability of networks of supercomputers is making high-performance global computing possible, including our Ninf system. However, critical issues regarding system performance characteristics in global computing have been little investigated, especially under multi-client, multi-site WAN settings. In order to investigate the feasibility of Ninf and similar systems, we conducted benchmarks under various LAN and WAN environments, and observed the following results: 1) given sufficient communication bandwidth, Ninf performance quickly overtakes client-local performance; 2) current supercomputers are sufficient platforms for supporting Ninf and similar systems in terms of performance and OS fault resiliency; 3) for a vector-parallel machine (Cray J90), employing an optimized data-parallel library is a better choice than the conventional task-parallel execution employed for non-numerical data servers; 4) computationally intensive tasks such as EP can readily be supported under the current Ninf infrastructure; and 5) for communication-intensive applications such as Linpack, server CPU utilization dominates LAN performance, while communication bandwidth dominates WAN performance; furthermore, aggregate bandwidth could be sustained for multiple clients located at different Internet sites, so distributing tasks to computing servers on different networks would be essential for achieving higher client-observed performance. Our results are not necessarily restricted to the Ninf system but would be applicable to other similar global computing systems.