23.4.2 On Economical Use of the Enterprise Cloud

If advocacy for wider adoption of grid computing in academia has gone dormant, it is necessary to ask whether cloud computing can take its place as a candidate platform for scientific computing. The first question is, “How easy and efficient is it to do science in the cloud?” Once efficiency and economical use of resources come to mind, a second question follows: “How economical is it to do science in the cloud?” We offer our perspective based on our experience with Amazon cloud services.

The term pay-as-you-go can be tempting for an institution that lacks sufficient infrastructure but is eager to benefit from enterprise infrastructure offered at a low unit price. In Langmead et al. (2009), the cost incurred for analyzing human genome data with a 320-CPU cluster in EC2 was $85. According to the authors, the approach is beneficial because it condensed 1000 h of computation into a few hours without requiring the user to own or operate a computer cluster, which would cost much more to purchase. We both agree and disagree with this opinion, and explain our reasons below.

1 In Hadoop terminology, a job is divided into several tasks

23 Feasibility Study and Experience 547

Amazon offers two services that are pertinent to scientific compute tasks: Amazon EC2 and Amazon Elastic MapReduce. Amazon Elastic MapReduce is tailored to applications built on top of the MapReduce model. Researchers and developers can conveniently test and deploy their MapReduce-based applications on the platform and scale the amount of resources used through a web interface alone. However, there are also scientific applications that are not based on the MapReduce model. Deploying such applications in the cloud requires bundling and packaging from the ground up. An image containing the applications and the container OS must be created and referenced every time a new virtual machine instance is launched.

In the absence of a suitable image, the user must manually configure a basic VMI offered by the provider. After creating an instance, the user must download the application binaries, or build the sources and their dependencies when binaries are unavailable or compatibility is an issue. Because the cloud operates on a utility basis, downloading binaries or sources and working remotely to configure a virtual machine counts as use of the provider’s infrastructure, so the user is charged for the data transfer. In the case of Amazon EC2, for example, as of January 2010 a standard Linux large instance in Northern Virginia (7.5 GB memory, 2 virtual cores, 850 GB local storage, 64-bit architecture) cost $0.34/h, with outbound data transfer starting at $0.15/GB and decreasing progressively, and inbound data transfer priced from free to $0.10/GB. A user who runs the instance for one day (24 h) with 1 TB of data in and 1 TB of data out would pay $158.16, consisting of an $8.16 compute instance fee (24 h × $0.34/h) and a $150 data transfer fee (1000 GB out at $0.15/GB, taking inbound transfer as free). A user who is merely configuring a standard virtual machine may not anticipate that this activity is also charged and can contribute a significant amount to the final bill. When a sizable number of virtual machines must be configured, the cost is multiplied, and it grows further when installations fail and the configuration has to be repeated. Consequently, this additional cost can reduce the economic benefit of doing computation in the cloud.
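The arithmetic behind the $158.16 figure, and the way configuration cost multiplies across machines and failed attempts, can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, the rates are the January 2010 figures quoted above, outbound transfer is charged at the first-tier rate throughout (the progressive discount is ignored), and 1 TB is taken as 1000 GB.

```python
# Rates quoted in the text for January 2010 (standard Linux large instance).
HOURLY_RATE_USD = 0.34           # compute, per instance-hour
TRANSFER_OUT_USD_PER_GB = 0.15   # first pricing tier only (assumption: no tier discount)
TRANSFER_IN_USD_PER_GB = 0.0     # assumption: inbound free (the text gives free to $0.10/GB)


def run_cost(hours, gb_in, gb_out, in_rate=TRANSFER_IN_USD_PER_GB):
    """Return (compute, transfer, total) in USD for one instance at flat rates."""
    compute = hours * HOURLY_RATE_USD
    transfer = gb_in * in_rate + gb_out * TRANSFER_OUT_USD_PER_GB
    return round(compute, 2), round(transfer, 2), round(compute + transfer, 2)


def setup_cost(machines, attempts_per_machine, hours_per_attempt,
               gb_in_per_attempt, in_rate):
    """Cost of configuring several machines, counting repeated (failed) attempts."""
    _, _, per_attempt = run_cost(hours_per_attempt, gb_in_per_attempt, 0, in_rate)
    return round(machines * attempts_per_machine * per_attempt, 2)


if __name__ == "__main__":
    # One day, 1 TB in and 1 TB out, inbound transfer free.
    print(run_cost(24, 1000, 1000))        # (8.16, 150.0, 158.16)
    # Hypothetical setup: 10 machines, 2 attempts each, 1 h and 2 GB inbound
    # per attempt, with inbound charged at the upper quoted rate of $0.10/GB.
    print(setup_cost(10, 2, 1, 2, 0.10))   # 10.8
```

The setup figures in the second call are invented for illustration; the point is only that the bill scales with the product of machines and attempts, before any scientific computation has started.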

To put this in a bigger picture, the current offerings of cloud providers are attractive, but the hidden costs may hinder wider use in academia. Current cloud-based applications use either a private cloud or a public cloud. If scaling could be done more dynamically, by integrating virtualized resources in a private cloud with the public cloud on an on-demand basis, cost efficiency could be improved. In addition, there should be a sufficient number of VMIs supporting various scientific applications, along with a mechanism by which the applications bundled within a VMI can be self-described. We present the idea and the current effort toward cloud integration and interoperability in the next section.
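As a rough illustration of why on-demand integration of private and public resources could improve cost efficiency, the sketch below fills private capacity first and sends only the overflow to the public cloud. The function names and workload numbers are hypothetical, the public rate is the $0.34/h figure quoted earlier, and the cost of owning and running the private cloud is deliberately left out, so this is not a full cost model.

```python
PUBLIC_RATE_USD_PER_HOUR = 0.34  # quoted January 2010 rate, large Linux instance


def split_demand(demand_hours, private_capacity_hours):
    """Serve demand from the private cloud first; overflow goes public."""
    private = min(demand_hours, private_capacity_hours)
    public = demand_hours - private
    return private, public


def public_bill(demand_hours, private_capacity_hours,
                rate=PUBLIC_RATE_USD_PER_HOUR):
    """Public-cloud compute charge in USD for the overflow only."""
    _, public = split_demand(demand_hours, private_capacity_hours)
    return round(public * rate, 2)


if __name__ == "__main__":
    # Hypothetical workload of 1000 instance-hours.
    print(public_bill(1000, 0))     # public cloud only: 340.0
    print(public_bill(1000, 600))   # 600 h served privately: 136.0
```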

548 M.F. Simalango and S. Oh