association rule mining algorithm to create fingerprints of users and areas.
The remainder of this paper is organized as follows: Section 2 reviews related works on utilizing geo-social
networks to uncover human behavioral patterns. Section 3 explains the concept of association rule mining. Section 4
describes an overall process of data preparing for association rule mining. Section 5 illustrates the experiment that the authors
conducted to extract crowd behavioral patterns and urban characteristics through data gathered from Foursquare for Italy.
Finally, the conclusion of paper is presented Section 6.
2. RELATED WORK
In recent years, many studies have been conducted on geo- social networks to discover users’ characteristics including
social relations, mobility patterns, preferences etc. For instance, in one of the large-scale research studies carried out on geo-
social networks Li Chen, 2009, much data from these networks such as Brightkite was collected and the results were
analyzed in terms
of users’ features, their travel experience, and their relation. In another study, researchers attempted to
characterize the relationship that exists between people’s cyber interests and their mobility properties Trestian et al., 2009. In
a study by Gao et al., 2012, the pattern of user check-ins with respect to social-historical ties was explored. For example, they
found out that users with friends tend to go to similar locations than those without.
In addition, a number of researches were carried out in this field in order to discover the geo-spatial characteristics. For example,
in a study on extremely large anonymized mobility data, the most visited areas by tourists during the day and the typical time
of the visits were identified Reades et al., 2009. In another study, a probabilistic topic model was adopted for extraction of
urban patterns and recurrent behaviors from location-based social networks Ferrari et al., 2011. Also in a research carried
out by Lee et al., 2013, a novel method was proposed to discover urban characteristics by exploiting common behavioral
patterns.
3. ASSOCIATION RULE MINING
Association rule mining is one of the most important and well researched techniques of data mining which aims to extract
interesting correlations, frequent patterns, associations or casual structures among sets of items Han, 2000.
Unlike other data mining functions, association is transaction- based. In transactional data, a collection of items is associated
with each case. The collection can theoretically include all possible members of the collection. However, in actuality, only
a tiny subset of all possible items is present in a given transaction.
The first step in association analysis is the enumeration of item sets. An item set is any combination of two or more items in a
transaction. The maximum number of items in an item set is specified by the user. If the maximum is two, all the item pairs
will be counted. If the maximum is greater than two, all the item pairs, all the item triples, and all the item combinations up to
the specified maximum will be counted. Association rules are calculated from item sets. Frequent item sets are those that
occur with a minimum frequency specified by the user. The minimum frequent item set support is a user-specified
percentage that limits the number of item sets used for association rules. An item set must appear in at least this
percentage of all transactions if it is to be used as a basis for rules.
An association mining problem can be decomposed into two sub-problems:
Finding all combinations of items in a set of
transactions that occur with a specified minimum frequency. These combinations are called frequent
item sets;
Calculating the rules that express the probable concurrence of items within frequent item sets.
One of the most well-known algorithms to solve association rule mining is Apriori algorithm Han Kamber, 2006. The
Apriori algorithm calculates the rules that express probabilistic relationships between items in frequent item sets. For example,
a rule derived from frequent item sets containing A, B, and C might state that if A and B are included in a transaction, then C
is also likely to be included. The IF component of an association rule is known as the antecedent. The THEN
component is known as the consequent. 3.1
Metrics for Association Rules Support
The support of a rule indicates how frequently the items in the rule occur together. Support is the percentagefraction of
transactions that include all the items in the antecedent and consequent to the total number of records in the database.
Confidence
The confidence of a rule is the conditional probability of the consequent given the antecedent. It is a measure of strength of
the association rules. Association rule mining is to find out association rules that
satisfy the predefined minimum support and confidence from a given database. However, there are times when both of these
measures may be high, and yet still produce a rule that is not useful.
A third measure is needed to evaluate the quality of the rule. Lift indicates the strength of a rule over the random co-
occurrence of the antecedent and the consequent, given their individual support. It provides information about the
improvement, the increase in probability of the consequent given the antecedent. Lift is defined as follows:
RuleSupport Support Antecedent
Support Consequent
1 Any rule with Lift of less than 1 does not prove to be a really
useful rule.
4. DATA PREPARATION