Abstract

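Wide-area distributed systems known as Grids, which harness computational and information resources distributed across wide-area networks to perform large-scale computation, are attracting attention. In such environments, measuring the performance of each resource in the system is important for purposes such as fault detection and performance prediction. The Grid Performance Working Group, one of the groups within the Global Grid Forum, has defined and proposed a basic architecture for monitoring systems and an XML-based data format, but three aspects of this proposal have not been verified: 1) the scalability of the architecture, 2) the cost of representing data in XML, and 3) the extensibility of the data format. To verify these, we implemented part of the proposed architecture on Ninf, a GridRPC system, and evaluated it. The results confirm that the architecture is sufficiently scalable within a realistic range of settings, that the cost of the XML-based data format is acceptable, and that the data format is sufficiently extensible.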

The Grid allows distributed resources to be coordinated in order to facilitate large-scale computing over wide-area networks. In such an environment, fault detection and performance monitoring, as well as performance prediction, become important features that need to be agreed upon and possibly standardized. The Grid Performance Working Group within the Global Grid Forum has recently proposed and defined a basic architecture for Grid monitoring and XML-based data format definitions, but the proposal has not yet been tested in practice. In particular, technical concerns include 1) scalability of the proposed architecture, 2) the cost of XML representation of instrumentation events, and 3) extensibility and flexibility of the data definition schema. Our experimental implementation of part of the proposed architecture on our Ninf GridRPC system has shown that, within a realistic Grid setting, the architecture seems reasonably scalable, the added cost of data representation is within permissible bounds, and the schema is sufficiently extensible to accommodate the specifics of the Ninf system.