11.2 NC State University Cloud Computing Implementation
A cloud computing system should be designed around a service-oriented architecture (SOA) that can allocate resources on demand in a location- and device-independent way. The system should achieve technical efficiency and scalability through an appropriate level of centralization, including sharing of cloud resources and control functions, and through explicit or implicit self-provisioning of resources and services by users to reduce administration overhead (Dreher et al., 2009; Vouk et al., 2009). One of the principal differences between "traditional" and cloud computing configurations is the level of control delegated to the user. For example, in a traditional environment, control of resource use and management lies primarily with the service delivery site and provider, whereas in a cloud environment this control is for the most part transferred to users through self-provisioning options and appropriately delegated user privileges and access controls. Similarly, other traditionally administrator-controlled functions, such as operating system and environment specifications and mode of access and prioritization, now become explicit user choices. While this can increase management efficiency and reduce provisioning costs, the initial baseline configuration of a cloud computing system is more complex.

At NC State University, VCL is a high-performance, award-winning1 open-source cloud computing technology initially conceived and prototyped in 2004 by NC State's College of Engineering, Office of Information Technology, and Department
1 2007 Computerworld Honors Program Laureate Medal (CHPLM) for Virtual Computing Laboratory (VCL), 2009 CHPLM for NC State University Cloud Computing Services
258 M.A. Vouk et al.

of Computer Science. Since then, VCL development has rapidly progressed in collaboration with industry, higher education, and K-12 partners, to the point that today it is a large-scale, production-proven system emerging as a dominant force in the nascent and potentially huge open-source private-cloud market (Seay et al., 2010; Schaffer et al., 2009; Vouk et al., 2009).
The experience at NC State University has shown that education and research cloud computing implementations need a flexible and versatile environment that provides a range of differentiated services, from Hardware-as-a-Service all the way to Cloud-as-a-Service and Security-as-a-Service. In the context of NC State's VCL we define each of these services as follows:
• Hardware as a Service (HaaS) – On-demand access to specific computational, storage, and networking products and/or equipment configurations, possibly at a particular site.
• Infrastructure as a Service (IaaS) – On-demand access to user-specified hardware capabilities, performance, and services, which may run on a variety of hardware products.
• Platform as a Service (PaaS) – On-demand access to a user-specified combination of hypervisors (virtualization), operating system, and middleware that enables user-required applications and services running on either HaaS and/or IaaS.
• Application as a Service (AaaS) – On-demand access to user-specified application(s) and content. Software as a Service (SaaS) may encompass anything from PaaS through AaaS.
• Higher-level services – A range of capabilities of a cloud to offer a composition of HaaS, IaaS, PaaS, and AaaS within an envelope of particular policies, such as security policies – for example, Security-as-a-Service. Other examples are composites and aggregates of lower-level services, such as "Cloud-as-a-Service" – a service that allows a user to define sub-clouds (clusters of resources) that the user controls in full.
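The layering described above can be sketched as a small model. This is an illustrative sketch only; the names (`ServiceLevel`, `Reservation`, `is_saas`) are hypothetical and not part of the VCL implementation:

```python
from dataclasses import dataclass, field
from enum import IntEnum

class ServiceLevel(IntEnum):
    """Service levels as defined above, ordered from most
    hardware-specific to most abstract; higher-level services
    compose the levels below them."""
    HAAS = 1   # specific equipment, possibly at a particular site
    IAAS = 2   # user-specified capabilities on a variety of hardware
    PAAS = 3   # hypervisors + OS + middleware on HaaS and/or IaaS
    AAAS = 4   # user-specified application(s) and content

@dataclass
class Reservation:
    """A user request against the cloud at a chosen service level."""
    user: str
    level: ServiceLevel
    # For a "Cloud-as-a-Service" composite, a reservation may bundle
    # several lower-level reservations into a user-controlled sub-cloud.
    components: list["Reservation"] = field(default_factory=list)

def is_saas(level: ServiceLevel) -> bool:
    """SaaS may encompass anything from PaaS through AaaS."""
    return ServiceLevel.PAAS <= level <= ServiceLevel.AAAS
```

The ordering of the enum mirrors the text: each level presupposes the capabilities of the levels beneath it, and composites (Cloud-as-a-Service, Security-as-a-Service) are expressed as aggregations of lower-level reservations.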
At some level, all of the above services are available to NC State VCL users, commensurate with the level and type of privileges granted to the user (Vouk et al., 2008; Vouk et al., 2009). If one wishes to construct high-performance computing (HPC) services within VCL with a particular topology, or to have the ability to deliver specific end-to-end quality of service, including application performance, it is essential to grant users both HaaS and IaaS access. We find that a carefully constructed cloud computing implementation offering the basic services listed above can yield good technical performance and increased productivity regardless of whether the cloud computing system serves commercial customers, educational institutions, or research collaborations.
Campus use of VCL has expanded exponentially over the last five years (Fig. 11.2). We now have more than 30,000 users and deliver over 100,000 reservations per semester through over 800 service environments (amounting to about 500,000 CPU-hours annually). In addition, we deliver over 8,500,000 HPC CPU-hours annually. In-state initiatives include individual UNC-System universities (e.g., ECU,
11 Integration of High-Performance Computing into Cloud Computing Services 259
Fig. 11.2 VCL usage
NCCU, UNC-CH, UNCG, WCU – technically, all UNC System campuses that implement Shibboleth authentication have access to VCL), the NC Community College System (production deployments and pilots in 15 colleges: Beaufort County, Brunswick, Cape Fear, Catawba Valley, Central Piedmont, Cleveland, Edgecombe, Fayetteville Tech, Forsyth Tech, Guilford Tech, Nash, Sandhills, Surry, Wake Tech), and several K-12 pilots and STEM initiatives.
In addition to multiple VCL deployments in the State of North Carolina, regional, national, and international interest in VCL has increased dramatically since VCL was accepted as an incubator technology and posted as an open-source implementation through the Apache Software Foundation (Apache VCL, 2010). George Mason University (GMU) has become a VCL leader and innovator for the Virginia VCL Consortium, recently winning the 2009 Virginia Governor's award for technology innovation, and schools such as Southern University Baton Rouge and California State University East Bay are in the process of implementing VCL-based clouds. In addition to numerous deployments within the United States, VCL cloud computing implementations, including HPC configurations, have now been deployed world-wide and are providing a rich mix of experiences working with HPC in a cloud computing environment.
VCL's typical base infrastructure (preferred but not required) is an HPC blade system. This type of system architecture allows HPC capacity to be delivered either as a whole or "sliced and diced" dynamically into smaller units/clusters. This allows the VCL cloud to be appropriately "packaged" to partition research clusters, sub-clouds, and true high-performance computing from other sets of highly individualized requirements, such as single desktop services, groups of "seats" in classrooms, servers, and server farms. These flexible configurations permit hardware infrastructure and platforms to be delivered simultaneously as services to users within the overall VCL cloud computing cyberinfrastructure, with each of these services being capable of customized provisioning of software and applications as a service.
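The "slice and dice" partitioning described above can be illustrated with a minimal greedy sketch. The function name and request format are hypothetical; the production VCL resource manager is considerably more involved:

```python
def partition(blades, requests):
    """Carve a pool of blade identifiers into named sub-clusters,
    e.g. an HPC block, classroom "seats", and individual desktops.

    `requests` maps a sub-cluster name to the number of blades it
    needs. Returns (allocation, leftover pool). Raises if the pool
    cannot satisfy a request.
    """
    pool = list(blades)
    allocation = {}
    for name, count in requests.items():
        if count > len(pool):
            raise ValueError(f"not enough blades for {name!r}")
        allocation[name] = pool[:count]   # claim the next `count` blades
        pool = pool[count:]               # shrink the free pool
    return allocation, pool

# Example: 16 blades split into an 8-blade HPC sub-cluster,
# a 6-seat classroom, and 2 single desktops.
alloc, free = partition(range(16), {"hpc": 8, "classroom": 6, "desktops": 2})
```

Because the allocation is just a mapping over a shared pool, the same mechanism serves both the large HPC partitions and the highly individualized single-seat requests mentioned above.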
Figure 11.2 shows the number of VCL reservations made per day by users over the last five years. This includes reservations made by individual students for Linux or Windows XP desktops, along with some specific applications, but also reservations that researchers may make to construct their own computational sub-clouds or specialized resource aggregations – including high-performing ones. The Figure 11.2 inset shows the number of such concurrent reservations per day. Not included in the data shown in these figures are the reservations that deliver standard queue-based VCL HPC services. We therefore label the plotted reservations as non-HPC (or general-VCL) services, although self-constructed high-performance sub-clouds are still in this category of services.
NC State's VCL currently has about 2,000 blades distributed over three production data centers and two research and evaluation data centers. About one third of the VCL blades are in this non-HPC service delivery mode, some of the remaining blades are in our experimental test-beds and in maintenance mode, and the rest (about 600–800) operate as part of the VCL-HPC ( http://hpc.ncsu.edu ) and are controlled through a number of LSF2 queues. There are three things to note with reference to Fig. 11.2:
1. VCL usage, and by implication that of the NC State Cloud Computing environment, has been growing steadily.
2. Resource capacity (virtual or bare-machine loaded) kept on the non-HPC side at any one time is proportional to the needed concurrency.
3. Non-HPC usage has clear gaps.

The number of VCL reservations tends to decrease during the night, student vacations, and holidays. On the other hand, NC State demand for HPC cycles has also been growing (Fig. 11.3), but it is much less subject to seasonal variations (Fig. 11.4).
An in-depth analysis of the economics of cloud computing (Dreher et al., 2009) shows that one of the key factors is the average utilization level of the resources. Any of our resources that are in VCL-HPC mode are fully utilized (with job backlog queues as long as two to three times the number of jobs running on the clusters). In 2009 (including maintenance down time), VCL-HPC recorded nearly 8.5 million HPC CPU-hours, corresponding to more than 95% utilization, while desktop augmentation and similar non-HPC usage recorded 550,000 CPU-hours (over 220,000 individual reservations), yielding about 10–15% utilization.
2 http://www.platform.com/Products
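The utilization percentages above follow from simple arithmetic on the CPU-hour totals. The CPU-hour figures are the ones quoted in this section; the CPU counts per pool are assumptions chosen for illustration (the text gives only blade counts, not CPU counts), so the results are estimates:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760

def utilization(cpu_hours_delivered, cpu_count):
    """Fraction of a pool's total available CPU-hours actually delivered."""
    return cpu_hours_delivered / (cpu_count * HOURS_PER_YEAR)

# HPC side: ~8.5 million CPU-hours in 2009. Assuming roughly 1,000
# CPUs in the VCL-HPC pool (an assumption, not a figure from the text):
hpc = utilization(8_500_000, 1_000)   # ~0.97, consistent with "95+%"

# Non-HPC side: ~550,000 CPU-hours over an assumed ~600-CPU pool:
non_hpc = utilization(550_000, 600)   # ~0.10, consistent with "10-15%"
```

The two-orders-of-magnitude gap in delivered CPU-hours against pools of comparable size is exactly why the average utilization factor dominates the economics analysis cited above.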
Fig. 11.3 NC State HPC usage over years
Fig. 11.4 NC State HPC usage over March 2008 – February 2009
To satisfy the high demand for HPC, VCL is designed to shift hardware resources from the non-HPC desktop pool to HPC. To balance workloads during times when non-HPC use is lower, such as nights and summer holidays, VCL can automatically move resources into its HPC mode, where they are readily used. When there is again a need for non-HPC use, these resources are automatically moved back into that use pool. As a result, the combined HPC/non-HPC resource usage runs at about a 70% level. This is both a desirable and cost-effective strategy, requiring active collaboration of the underlying (cloud) components.
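The automatic shifting between pools can be sketched as a simple threshold policy. The thresholds and function names here are hypothetical illustrations, not the actual VCL scheduler logic:

```python
def rebalance(non_hpc_idle, non_hpc_total, hpc_backlog,
              idle_threshold=0.5, move_fraction=0.25):
    """Decide how many nodes to move between pools, and in which direction.

    Positive result: move nodes from the non-HPC pool to HPC, whose demand
    is effectively unbounded (the text reports backlogs 2-3x the running
    jobs). Negative result: reclaim nodes for the non-HPC pool when
    interactive demand picks back up. Zero: leave the pools alone.
    """
    idle_ratio = non_hpc_idle / non_hpc_total
    if idle_ratio > idle_threshold and hpc_backlog > 0:
        # Non-HPC demand is low (night, holidays): donate idle capacity.
        return int(non_hpc_idle * move_fraction)
    if idle_ratio < 0.1:
        # Interactive demand is back: pull a small slice back from HPC.
        return -int(non_hpc_total * 0.05)
    return 0
```

Because the HPC queues always have a backlog, any capacity donated during non-HPC lulls is absorbed immediately, which is what lifts the combined utilization toward the ~70% level reported above.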