multiplication, are examples of the ways in which images can be pre-processed to enhance information. Furthermore,
different spectral indices are used to improve change detection and spectral enhancement studies. For instance, the ratio of the
infrared band to the red band is used to map vegetation distribution, green over red is used to map surface water
bodies and delineate wetlands, red over infrared is used to map turbid waters, and red over green is
used for mineral mapping (Momm and Easson, 2011a; Khorram et al., 2012). After pre-processing, satellite images are ready for
image classification, a process that converts the original spectral data, which are variable and may show complex relationships
across several image bands, into a simple thematic map for end users (Khorram et al., 2012). The classification procedure
extracts important and valid information from a multidimensional data set that is otherwise difficult to interpret. Each pixel in
an image is assigned to one category from a set of categories of interest, such as a set of land cover types. In the
proposed system, K-means was the classification algorithm used to automatically cluster image pixels with similar spectral
characteristics (Momm and Easson, 2011a). K-means was selected for its simple implementation and
low computational cost; any other classification algorithm could have been employed instead.
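As an illustration of the band-ratio and clustering steps described above, the following minimal Python sketch computes an infrared-over-red ratio and groups pixels with a plain K-means loop. It is not the implementation used in the proposed system: the function names (`band_ratio`, `kmeans`, `nearest`), the sample reflectance values, and the choice of two clusters are assumptions made only for this example.

```python
import random

def band_ratio(numerator, denominator, eps=1e-9):
    """Per-pixel ratio of two bands given as equal-length lists of values."""
    return [n / (d + eps) for n, d in zip(numerator, denominator)]

def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centers[c])))

def kmeans(points, k, max_iter=50, seed=0):
    """Cluster points (tuples of band values) into k groups."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k randomly chosen pixels
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(pt, centers) for pt in points]
        if new_labels == labels:  # assignments no longer change: converged
            break
        labels = new_labels
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(vals) / len(members) for vals in zip(*members))
    return labels, centers

# Hypothetical reflectance values: three vegetated pixels, then three non-vegetated ones.
red = [0.05, 0.06, 0.04, 0.30, 0.32, 0.31]
nir = [0.50, 0.55, 0.48, 0.10, 0.12, 0.11]

ratio = band_ratio(nir, red)                         # infrared over red
labels, centers = kmeans(list(zip(red, nir)), k=2)   # cluster on the raw bands
print(ratio)
print(labels)
```

The cluster indices returned by K-means are arbitrary; assigning a land cover name to each cluster remains a separate interpretation step.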
The quantitative measurement of classification accuracy constitutes a post-processing step. In this step, accuracy is
calculated by comparing the resulting thematic image with user-provided reference information (Momm and Easson, 2011b).
The kappa coefficient (K) is a common metric for measuring the agreement between thematic maps while accounting
for the agreement expected by random chance (Momm and Easson, 2011a). The kappa coefficient lies on a scale between -1
and 1, where 1 indicates complete agreement beyond random chance, 0 indicates agreement solely by chance, and negative
values indicate less agreement than expected by chance. Kappa values greater than 0.80 represent strong agreement beyond
chance, values between 0.40 and 0.80 represent moderate agreement, and values below 0.40 represent poor agreement
(Momm and Easson, 2011a; Gong, 2003). The kappa statistic can be computed as follows.
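The observed proportional agreement between two images is:

$$p_o = \frac{1}{N} \sum_{i=1}^{k} f_{ii} \qquad (1)$$

The expected agreement by chance is:

$$p_e = \frac{1}{N^2} \sum_{i=1}^{k} f_{i+} f_{+i} \qquad (2)$$

where $f_{ii}$ is the count in the $i$-th diagonal cell of the agreement matrix, $f_{i+}$ is the total of the $i$-th row, $f_{+i}$ is the total for the $i$-th column, $k$ is the number of categories, and $N$ is the total number of observations. The kappa statistic is:

$$K = \frac{p_o - p_e}{1 - p_e} \qquad (3)$$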
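The kappa computation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code of the proposed system: the function names `kappa` and `interpret`, the nested-list matrix layout (rows for one map, columns for the other), and the sample counts are hypothetical. The interpretation thresholds are the ones quoted in the text.

```python
def kappa(matrix):
    """Kappa coefficient from a square agreement (confusion) matrix of counts."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    if n == 0:
        raise ValueError("matrix contains no observations")
    # Observed proportional agreement: share of counts on the diagonal.
    p_o = sum(matrix[i][i] for i in range(k)) / n
    # Expected chance agreement: products of row and column totals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)

def interpret(value):
    """Map a kappa value to the agreement levels quoted in the text."""
    if value > 0.80:
        return "strong"
    if value >= 0.40:
        return "moderate"
    return "poor"

# Hypothetical two-class example with 100 reference pixels.
example = [[45, 5],
           [10, 40]]
k_value = kappa(example)
print(round(k_value, 3), interpret(k_value))
```

For this example the observed agreement is 0.85 and the chance agreement is 0.50, so K = 0.35 / 0.50 = 0.70, which falls in the moderate range.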
2.2 Genetic Programming
Genetic programming (GP) is an automated method for generating computer programs that solve specific problems
based on principles of natural selection (Robinson, 2001; Abraham et al., 2006). Genetic programming starts with
thousands of randomly created computer programs, and only the successful individuals are progressively evolved over a
series of generations. The fitness function determines which individuals are successful according to how well
they solve the problem. New generations are created through mutation and crossover operations. Mutation
replaces a function in a solution with another function, while crossover combines two solutions
to form two new solutions, or offspring (Robinson, 2001). Table 1 shows the genetic programming steps
(Robinson, 2001; Abraham et al., 2006; Koza, 1992). In the proposed system, solutions are images that are created based on
one satellite image.

| Step | Detail |
|---|---|
| Initial Population | A random population of possible solutions is generated. The solutions are randomly generated programs and may not solve the problem. |
| Fitness Ranking | Using a fitness metric, the individual solutions are rated and sorted based on their ability to solve the problem. |
| Selection | The solutions with the highest fitness values are selected to generate a new generation of solutions. |
| Crossover | Parts of selected solutions are replaced with parts of other solutions to form new candidate solutions. |
| Mutation | Some of the more fit programs are selected and modified to generate new solutions. |
| Repetition Until Success | Repeat the Fitness Ranking, Selection, Crossover, and Mutation steps until the solution with the highest fitness value is found. |

Table 1. Genetic programming steps
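The steps in Table 1 can be sketched as a small Python loop that evolves band-arithmetic expressions. This is an illustrative sketch only, not the proposed system: the band names (`red`, `nir`), the tree encoding, the squared-error fitness against a known target ratio, the population settings, and the retention of the two best parents (elitism) are all assumptions made for the example. A real system would plug in its own fitness measure.

```python
import operator
import random

# Function set; division is "protected" so a zero denominator cannot raise.
FUNCS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}
TERMINALS = ["red", "nir"]  # hypothetical band names

def random_tree(rng, depth=2):
    """Random expression, either a band name or a tuple (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(FUNCS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, pixel):
    """Apply the expression to one pixel (a dict of band values)."""
    if isinstance(tree, str):
        return pixel[tree]
    op, left, right = tree
    return FUNCS[op](evaluate(left, pixel), evaluate(right, pixel))

def fitness(tree, samples):
    """Negative squared error against the targets; higher is better."""
    return -sum((evaluate(tree, px) - target) ** 2 for px, target in samples)

def mutate(tree, rng):
    """Replace one function (or terminal) somewhere in the tree."""
    if isinstance(tree, str):
        return rng.choice(TERMINALS)
    op, left, right = tree
    roll = rng.random()
    if roll < 0.4:
        return (rng.choice(list(FUNCS)), left, right)
    if roll < 0.7:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))

def crossover(a, b):
    """Combine two parents into two offspring by swapping right subtrees."""
    if isinstance(a, str) or isinstance(b, str):
        return a, b  # a bare terminal has no subtree to swap
    return (a[0], a[1], b[2]), (b[0], b[1], a[2])

def evolve(samples, pop_size=60, generations=30, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(pop_size)]  # initial population
    for _ in range(generations):
        # Fitness ranking: best solutions first.
        population.sort(key=lambda t: fitness(t, samples), reverse=True)
        if fitness(population[0], samples) > -1e-12:
            break  # a solution with (near) zero error was found
        parents = population[: pop_size // 4]  # selection
        children = parents[:2]  # elitism: an assumption, keeps the best so far
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            for child in crossover(a, b):
                if rng.random() < 0.2:
                    child = mutate(child, rng)
                children.append(child)
        population = children[:pop_size]
    return max(population, key=lambda t: fitness(t, samples))

# Hypothetical training pixels; the target is the infrared-over-red ratio.
pixels = [{"red": 0.05, "nir": 0.50}, {"red": 0.30, "nir": 0.10}, {"red": 0.20, "nir": 0.40}]
samples = [(px, px["nir"] / px["red"]) for px in pixels]
best = evolve(samples)
print(best, fitness(best, samples))
```

Because the two best parents are copied unchanged into each new generation, the best fitness in the population never decreases from one generation to the next.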
2.3 Cloud Computing
Cloud computing dynamically allocates computing, storage and network resources to deliver large numbers of services to end
users and enables them to share access to these resources from anywhere, at any time, through their connected devices (Hwang
et al., 2012). Cloud data storage services provide large disk capacity and service interfaces that allow users to store and
fetch data. Furthermore, cloud infrastructure can provide thousands of computing nodes to an application, which allows
programmers to use the power of these machines without dealing with infrastructure management issues such as handling
network failures. Providers of cloud computing have developed workflow and data query platforms to support distributed
computing and storage applications. The runtime support offered by cloud computing providers includes distributed monitoring
services, a distributed task scheduler, distributed locking and other services (Hwang et al., 2012). One of the popular distributed
programming models on cloud computing platforms is MapReduce/Hadoop. This model is commonly employed to
process large data sets in distributed mode over the cloud (Apache, 2014). It is mainly used in data analytics, indexing,
reputation systems, and data mining.
ISPRS Technical Commission I Symposium, 17 – 20 November 2014, Denver, Colorado, USA
This contribution has been peer-reviewed. doi:10.5194/isprsarchives-XL-1-27-2014
2.4 Hadoop
Hadoop is a software framework that allows writing and running user applications on large data sets. It can easily
expand to store and process petabytes of data on a thousand or more client machines (Apache, 2014; White, 2012). Some
features of Hadoop are:
Scalable: New nodes can be added to the Hadoop cluster when needed.
Flexible: It can join multiple data sets in different ways to analyze them easily.
Fault tolerant: When a node fails, the system replicates data to another node in the cluster and continues
processing data.
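The MapReduce programming model that Hadoop implements, introduced in Section 2.3, can be illustrated with a single-process Python sketch. It only mimics the map, shuffle and reduce phases in memory and does not use the Hadoop API; the function names (`map_tile`, `shuffle`, `reduce_counts`, `run_job`) and the example of counting classified pixels per land cover class across image tiles are assumptions made for illustration.

```python
from collections import defaultdict

def map_tile(tile):
    """Map phase: emit a (class_label, 1) pair for every pixel in one tile."""
    for label in tile:
        yield label, 1

def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key, values):
    """Reduce phase: combine the values of one key into a single total."""
    return key, sum(values)

def run_job(tiles):
    """Run the three phases in sequence over a list of tiles."""
    pairs = (pair for tile in tiles for pair in map_tile(tile))
    groups = shuffle(pairs)
    return dict(reduce_counts(key, values) for key, values in groups.items())

# Hypothetical classified tiles; on a cluster each tile could be mapped on a different node.
tiles = [["water", "forest", "forest"], ["urban", "water"], ["forest"]]
counts = run_job(tiles)
print(counts)
```

Because each tile is mapped independently and each key is reduced independently, both phases can be spread over many nodes, which is the property a distributed framework exploits.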
Hadoop has two major subprojects: MapReduce and the Hadoop Distributed File System (HDFS). MapReduce is a programming