Just as in a conventional desktop system, the operating system (OS) of a cluster lies at the heart of every node. Whether the user is opening files, sending messages, or starting additional processes, the OS is omnipresent. While users may choose different programming paradigms or middleware layers, the OS is almost always the same for all users.
A generic design sketch of an OS is given in Figure 1. It shows the major building blocks of a typical cluster node: the hardware resources at the bottom, a monolithic kernel as the core of the OS, and system and user processes, together with some middleware extensions, running on top of the kernel in user space. Some OSs feature a distinct layer that abstracts the hardware for the kernel; porting such an OS to different hardware requires adapting only this abstraction layer. On top of this layer, the core functions of the kernel execute: the memory manager, the process and thread scheduler, and the device drivers, to name the most important. Their services are in turn offered to user processes through file systems, communication interfaces, and a general programmatic interface that gives access to kernel functionality in a secure and protected manner.
User (application) processes run in user space, along with system processes called daemons. Especially in clusters, user processes often use not only the kernel functions but also additional functionality offered by middleware. This functionality is usually located in user libraries, often supported by additional system services (daemons). Some middleware extensions require extended kernel functionality, which is usually provided by loading special drivers or modules into the kernel. What, then, is the role of the OS in a cluster?
The primary role is the same twofold task as in a desktop system: multiplexing multiple user processes onto a single set of hardware components (resource management) and providing useful abstractions for high-level software (beautification). These abstractions include protection boundaries, process/thread coordination and communication, and device handling. In the remainder of this article, we therefore examine the abstractions provided by current cluster OSs and explore current research issues for clusters.
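To make the "beautification" side concrete, the following minimal sketch shows a single user process on one node using the three services mentioned earlier: opening a file, sending a message, and starting an additional process. It is an illustration only, not taken from any particular cluster OS. The function name `touch_kernel_services` and the file name `demo.txt` are hypothetical, and a local socket pair stands in for real inter-node communication.

```python
import os
import socket
import subprocess
import sys
import tempfile


def touch_kernel_services() -> dict:
    """Use three OS abstractions from user space and collect the results.

    Hypothetical helper for illustration; every call below ends up in the
    kernel through its system call interface.
    """
    results = {}

    # File system abstraction: create, write, and read back a file.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")  # illustrative file name
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
        os.write(fd, b"hello from user space")
        os.close(fd)
        with open(path, "rb") as f:
            results["file"] = f.read()

    # Communication abstraction: a connected local socket pair stands in
    # for the messaging a cluster application would do between nodes.
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(b"ping")
        results["message"] = receiver.recv(4)
    finally:
        sender.close()
        receiver.close()

    # Process abstraction: ask the OS to start an additional process.
    child = subprocess.run(
        [sys.executable, "-c", "print('child ran')"],
        capture_output=True,
        text=True,
        check=True,
    )
    results["child"] = child.stdout.strip()

    return results


if __name__ == "__main__":
    for service, value in touch_kernel_services().items():
        print(service, value)
```

None of these calls touches the hardware directly. The process sees only files, sockets, and child processes, while the kernel decides how disk, memory, and CPU time are shared with every other process on the node (resource management).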