Genetic Programming Cloud Computing

multiplication, are examples of the way in which images can be pre-processed in order to enhance information. Furthermore, different spectral indices are used for improved change detection and spectral enhancement studies. For instance, the ratio of the infrared band to the red band is used for mapping vegetation distribution, green over red for mapping surface water bodies and delineating wetlands, red over infrared for mapping turbid waters, and red over green for mineral mapping (Momm and Easson, 2011a; Khorram et al., 2012).

After pre-processing, satellite images are ready for the image classification process, which converts the original spectral data, which are variable and may show complex relationships across several image bands, into a simple thematic map for end users (Khorram et al., 2012). The classification procedure extracts important and valid information from a multidimensional data set that is otherwise difficult to understand. Each pixel in an image is assigned to one category from a set of categories of interest, such as a set of land cover types. In the proposed system, K-means was the classification algorithm used to automatically cluster image pixels with similar spectral characteristics (Momm and Easson, 2011a). The K-means algorithm was selected for its simple implementation and low computational cost; any other classification algorithm could have been employed instead.

The quantitative measurement of classification accuracy constitutes a post-processing step. In this step, accuracy is calculated by comparing the resulting thematic image with user-provided reference information (Momm and Easson, 2011b). The Kappa coefficient (K) is a common metric used to measure the agreement between thematic maps while accounting for the agreement expected from random chance (Momm and Easson, 2011a).
The Kappa coefficient lies on a scale between -1 and 1, where 1 indicates complete agreement beyond random chance and 0 indicates agreement solely by chance. Kappa values greater than 0.80 represent strong agreement beyond random chance, values between 0.40 and 0.80 represent moderate agreement, and values below 0.40 represent poor agreement (Momm and Easson, 2011a; Gong, 2003). The Kappa statistic is computed as follows. The observed proportional agreement between two images is

$$p_o = \frac{1}{N} \sum_{i=1}^{c} f_{ii} \tag{1}$$

and the expected agreement by chance is

$$p_e = \frac{1}{N^2} \sum_{i=1}^{c} f_{i+} \, f_{+i} \tag{2}$$

where $N$ is the total number of pixels compared, $c$ is the number of categories, $f_{ii}$ is the number of pixels assigned to category $i$ in both images, $f_{i+}$ is the total of the $i$-th row, and $f_{+i}$ is the total of the $i$-th column. The Kappa statistic is

$$K = \frac{p_o - p_e}{1 - p_e} \tag{3}$$
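The chain described above (band ratio, K-means clustering, Kappa against reference data) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the implementation used in the proposed system: the function names (`band_ratio`, `kmeans_1d`, `confusion_matrix`, `kappa`), the one-dimensional K-means with quantile-based initial centroids, and the pixel values are all hypothetical choices made for the example.

```python
def band_ratio(numerator, denominator, eps=1e-9):
    """Pixel-wise ratio of two bands given as flat lists (eps avoids division by zero)."""
    return [n / (d + eps) for n, d in zip(numerator, denominator)]


def kmeans_1d(values, k, iterations=20):
    """Cluster scalar values into k groups and return one label per value.

    Assumption for this sketch: initial centroids are taken at evenly spaced
    positions of the sorted values, which makes the result deterministic and
    starts the centroids in increasing order.
    """
    ordered = sorted(values)
    centroids = [ordered[(2 * c + 1) * len(ordered) // (2 * k)] for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def confusion_matrix(predicted, reference, k):
    """Rows index the reference category, columns the predicted category."""
    matrix = [[0] * k for _ in range(k)]
    for p, r in zip(predicted, reference):
        matrix[r][p] += 1
    return matrix


def kappa(matrix):
    """Kappa from a square confusion matrix (undefined when expected agreement is 1)."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    observed = sum(matrix[i][i] for i in range(k)) / n
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)


# Made-up reflectance values for eight pixels; 1 = vegetation, 0 = other.
infrared = [0.50, 0.55, 0.60, 0.10, 0.12, 0.08, 0.45, 0.09]
red = [0.10, 0.11, 0.12, 0.20, 0.24, 0.16, 0.09, 0.18]
reference = [1, 1, 1, 0, 0, 0, 0, 0]

labels = kmeans_1d(band_ratio(infrared, red), k=2)
print(kappa(confusion_matrix(labels, reference, k=2)))
```

With these made-up values, seven of the eight pixels agree with the reference, the expected agreement by chance is 0.5, and the script prints 0.75, which falls in the moderate-agreement range quoted above. Note that cluster labels produced by K-means are arbitrary in general; the sketch relies on the sorted initialization so that label 1 corresponds to the higher infrared-to-red ratio.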

2.2 Genetic Programming

Genetic programming (GP) is an automated method for generating computer programs that solve specific problems based on the principles of natural selection (Robinson, 2001; Abraham et al., 2006). Genetic programming starts with thousands of randomly created computer programs, of which only the successful individuals are progressively evolved over a series of generations. The fitness function determines which individuals are successful according to how well they solve the problem. New generations are created through mutation and crossover operations. Mutation replaces a function in a solution with another function, while crossover combines two solutions to form two new solutions, or offspring (Robinson, 2001). Table 1 shows the steps of genetic programming (Robinson, 2001; Abraham et al., 2006; Koza, 1992). In the proposed system, solutions are images that are created based on one satellite image.

| Step | Detail |
|---|---|
| Initial Population | A random population of possible solutions is generated. The solutions are randomly generated programs and may not solve the problem. |
| Fitness Ranking | Using the fitness metric, the individual solutions are rated and sorted based on their ability to solve the problem. |
| Selection | The solutions with the highest fitness values are selected to produce a new generation of solutions. |
| Crossover | Parts of selected solutions are replaced with parts of other solutions to form new candidate solutions. |
| Mutation | Some of the more fit programs are selected and modified to generate new solutions. |
| Repetition Until Success | The Fitness Ranking, Selection, Crossover, and Mutation steps are repeated until the solution with the highest fitness value is found. |

Table 1. Genetic programming steps
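The steps in Table 1 can be illustrated with a small, self-contained sketch. It is a toy example under stated assumptions rather than the system described in this paper: candidate programs are arithmetic expression trees over two hypothetical inputs `a` and `b`, fitness is the negative squared error against a made-up target, and all names and parameters (`random_tree`, `evolve`, `pop_size`, `max_size`, the mutation rate, and so on) are chosen only for illustration. Offspring above a size limit are discarded, which is a common practical guard and not something the source prescribes.

```python
import operator
import random

# Illustrative function set (internal nodes) and terminal set (leaves).
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMINALS = ["a", "b"]


def random_tree(rng, depth=3):
    """Grow a random program: either a terminal name or a tuple (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(FUNCTIONS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, env):
    """Evaluate a program for the input values in env, e.g. {"a": 2, "b": 3}."""
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, env), evaluate(right, env))


def fitness(tree, samples):
    """Negative sum of squared errors, so that higher is better and 0 is perfect."""
    return -sum((evaluate(tree, env) - target) ** 2 for env, target in samples)


def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))


def replace_at(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(parts)


def crossover(rng, first, second):
    """Swap one random subtree between two parents, giving two offspring."""
    path1, sub1 = rng.choice(list(subtrees(first)))
    path2, sub2 = rng.choice(list(subtrees(second)))
    return replace_at(first, path1, sub2), replace_at(second, path2, sub1)


def mutate(rng, tree):
    """Replace the function at one random internal node with a random function."""
    nodes = [(p, s) for p, s in subtrees(tree) if isinstance(s, tuple)]
    if not nodes:
        return tree  # a lone terminal has no function to replace
    path, (_, left, right) = rng.choice(nodes)
    return replace_at(tree, path, (rng.choice(list(FUNCTIONS)), left, right))


def evolve(samples, pop_size=100, generations=20, max_size=40, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(pop_size)]  # initial population
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, samples), reverse=True)  # ranking
        if fitness(population[0], samples) == 0:
            break  # a perfect solution was found
        parents = population[: pop_size // 4]  # selection
        children = parents[:2]  # keep the two best unchanged
        while len(children) < pop_size:
            for child in crossover(rng, rng.choice(parents), rng.choice(parents)):
                if sum(1 for _ in subtrees(child)) > max_size:
                    continue  # discard oversized offspring
                if rng.random() < 0.1:
                    child = mutate(rng, child)
                children.append(child)
        population = children[:pop_size]
    return max(population, key=lambda t: fitness(t, samples))


if __name__ == "__main__":
    # Made-up target: a * a + b, sampled on a small integer grid.
    samples = [({"a": a, "b": b}, a * a + b) for a in range(-2, 3) for b in range(-2, 3)]
    best = evolve(samples)
    print(best, fitness(best, samples))
```

Because the two best programs are copied unchanged into each new generation, the best fitness in the population never decreases from one generation to the next. Whether a perfect program is found within the given number of generations depends on the seed and the parameters.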

2.3 Cloud Computing

Cloud computing dynamically allocates computing, storage, and network resources to deliver large numbers of services to end users and to enable them to share access to these resources from anywhere, at any time, through their connected devices (Hwang et al., 2012). Cloud data storage services provide large disk capacity and service interfaces that allow users to place and fetch data. Furthermore, cloud infrastructure provides thousands of computing nodes for any application, which allows programmers to use the power of these machines without dealing with infrastructure management issues such as handling network failures. Providers of cloud computing have developed workflow and data query platforms to support distributed computing and storage applications. The runtime support offered by cloud computing providers includes distributed monitoring services, a distributed task scheduler, distributed locking, and other services (Hwang et al., 2012). One of the popular distributed programming models on cloud computing platforms is MapReduce (Hadoop). This model is commonly employed to process large data sets in distributed mode over the cloud (Apache, 2014). It is mainly used in data analytics, indexing, reputation systems, and data mining.

ISPRS Technical Commission I Symposium, 17–20 November 2014, Denver, Colorado, USA. This contribution has been peer-reviewed. doi:10.5194/isprsarchives-XL-1-27-2014

2.4 Hadoop

Hadoop is a software framework for writing and running user applications on large data sets. It can easily expand to store and process petabytes of data on a thousand or more client machines (Apache, 2014; White, 2012). Some features of Hadoop are:

- Scalable: New nodes can be added to the Hadoop cluster when needed.
- Flexible: It can join multiple data sets in different ways to analyze them easily.
- Fault tolerant: When a node fails, the system replicates data to another node in the cluster and continues processing data.

Hadoop has two major subprojects: MapReduce and the Hadoop Distributed File System (HDFS). MapReduce is a programming