Negative and Cyclic Association Rules

Presented by
Saurabh Singh Mathuriya (2013MCS2581)
Under the guidance of
Dr. S. K. Gupta

Department of Computer Science & Engineering

Negative Association Rules
Negative association rules describe relationships between item sets: in contrast to positive association rules, they imply the occurrence of some item sets characterized by the absence of others.

Need of Negative AR:
• Find the sets of items that do not appear together in a transaction.
• Unexpected patterns and exceptional patterns are referred to as exceptions of rules in positive association mining.
• Example: while ‘bird(x) ⇒ flies(x)’ is a well-known fact, an exceptional rule is ‘bird(x), penguin(x) ⇒ ¬flies(x)’.
• This exception shows that unexpected patterns can involve negated terms and can therefore be treated as a special case of negative rules.

Comparison with Positive AR
• Positive association rules consider only items enumerated in transactions, e.g., people buying milk and wheat bread together.
• Negative association rules may consider the same items, but in addition consider negated items (i.e., items absent from transactions), e.g., people buying milk and bread together but not a cold drink.

What is a Negative Association Rule
• If X and Y are item sets, then {X} ⇒ {¬Y} is a negative association rule (NAR).
• It means that item sets X and Y are negatively correlated: in most transactions where X is present, Y is absent.

What is Negative AR contd.
A negative rule A ⇒ ¬B also has a measure of its strength, conf, defined as the ratio supp(A ∪ ¬B)/supp(A).
Support-confidence framework for negative rules:
- A and B are disjoint item sets, that is, A ∩ B = ∅;
- supp(A) ≥ minsup, supp(B) ≥ minsup;
- supp(A ⇒ ¬B) = supp(A ∪ ¬B);
- conf(A ⇒ ¬B) = supp(A ∪ ¬B)/supp(A) ≥ minconf.
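As a quick worked illustration (the numbers are hypothetical): suppose minsup = 20% and minconf = 70%, supp(A) = 0.40, supp(B) = 0.35, and 30% of the transactions contain A but not B, so supp(A ∪ ¬B) = 0.30. Then both support conditions hold and conf(A ⇒ ¬B) = 0.30/0.40 = 0.75 ≥ minconf, so A ⇒ ¬B is accepted under this framework.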


Negative Rule Forms
A ⇒ ¬B
 - Ex.: people of age < 30 (less than 30) would NOT buy a sedan.
¬A ⇒ B
 - Ex.: people of age > 30 (NOT less than 30) would buy a sedan.
¬A ⇒ ¬B
 - Ex.: people of age > 30 (NOT less than 30) would NOT buy a sedan.
(Note that although a rule of the form ¬X ⇒ ¬Y contains negative elements, it is equivalent to a positive association rule of the form Y ⇒ X. Therefore it is not considered a negative association rule.)

Assumption 1: The minimum support is 30% and
minimum confidence is 70%.
Assumption 2: The numeric attribute AGE ranges from
18 to 70 and is quantized into two groups - less than
thirty and over thirty.


The rule that satisfies both the minimum support and minimum confidence criteria is “{age < 30} -> {coupe}”, whose confidence is 75%. A negative association rule also exists: “{age > 30} -> {not purchasing coupe}”, which has a confidence of 83.3%. For the purpose of identifying purchase patterns, the latter clearly has better predictive ability.
The preceding example illustrates that negative association rules are as important as positive ones.

Confidence of Negative AR
• To avoid counting negated item sets directly, we can compute supp(A ∪ ¬B) = supp(A) − supp(A ∪ B), and hence conf(A ⇒ ¬B) = (supp(A) − supp(A ∪ B)) / supp(A).
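A minimal Python sketch of this computation on hypothetical toy transactions (item names and values are made up for illustration):

# Computing the confidence of a negative rule A => ~B from positive supports
# only, using supp(A u ~B) = supp(A) - supp(A u B). Toy data, for illustration.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "cold-drink"},
    {"milk"},
    {"bread", "cold-drink"},
    {"milk", "bread"},
]

def support(itemset, db):
    # Fraction of transactions containing every item of `itemset`.
    return sum(1 for t in db if itemset <= t) / len(db)

A, B = {"milk"}, {"cold-drink"}

supp_A = support(A, transactions)        # supp(A) = 0.8
supp_AB = support(A | B, transactions)   # supp(A u B) = 0.2
supp_A_notB = supp_A - supp_AB           # supp(A u ~B) ≈ 0.6, no counting of ~B needed
conf_A_notB = supp_A_notB / supp_A       # conf(A => ~B) ≈ 0.75

print(supp_A_notB, conf_A_notB)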

Locality of Similarity (LOS)
• We cannot simply mine positive rules with small support and confidence values, because that would produce many uninteresting rules.
• To eliminate unwanted rules and focus on potentially interesting ones, we predict possible interesting negative ARs by incorporating domain knowledge about the data sets.
• For this we use a taxonomy T, which consists of vertices and directed edges. Every vertex is a class: a vertex with in-degree 0 is the most general class, and a vertex with out-degree 0 is a most specific class.
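As an illustration, here is a minimal Python sketch of such a taxonomy as a parent-to-children map, reusing the class names mentioned later in these slides; the helper names are illustrative, not part of any specific library. The siblings of a vertex form its LOS.

# A taxonomy T (hypothetical retail classes) as a parent -> children map;
# vertices sharing the same immediate predecessor form a LOS.
taxonomy = {
    "Product":   ["Computer", "Accessory"],
    "Computer":  ["IBM Aptiva", "Compaq Deskpro", "Notebook"],
    "Accessory": ["Parts"],
}

def parent_of(node, t):
    # Immediate predecessor of `node` in taxonomy `t`, or None for the root.
    for parent, children in t.items():
        if node in children:
            return parent
    return None

def los(node, t):
    # Locality of similarity of `node`: its siblings (including itself).
    p = parent_of(node, t)
    return t.get(p, [node])

print(los("IBM Aptiva", taxonomy))   # ['IBM Aptiva', 'Compaq Deskpro', 'Notebook']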

LOS contd.
• A taxonomy T consists of vertices and directed edges. Each vertex represents a class.
• The semantics of the vertical relationship is that lower-level vertex values are instances of the values of their immediate predecessor vertices, i.e., the is-a relationship. The vertical relationship is used to discover generalized association rules.
• The semantics of the horizontal relationship is that vertices on the same level having the same immediate predecessor (siblings, to borrow rooted-tree terminology) encapsulate similarity among classes.

LOS contd.
• Items belonging to the same LOS tend to participate in similar association rules. This is generally true because members of such groups tend to have similar activity patterns.
• For example, in a retail database, instances are the items involved in transactions and customers are the participants. If no customer has a brand preference, the purchase probability of each item will be evenly distributed over all brands.
• A LOS can be extended to different levels under the same parent node. For instance, it is more reasonable to put ‘IBM Aptiva’, ‘Compaq Deskpro’, ‘Notebook’, and ‘Parts’ into one LOS when viewing the database at a more abstract level.
• Intuitively, siblings are in the same LOS.

LOS Contd.

Discovering Negative Rules
To qualify as a negative rule, a candidate must satisfy two conditions:
• first, there must be a large deviation between the estimated and the actual confidence, measured by the similarity measure (SM);
• second, its support and confidence must exceed the required minima.

Pruning
• In constructing candidate negative rules, an equivalent or similar pair may be generated.
• Another redundancy arises when items come from the same LOS and all the resulting rules are sibling rules.
• The pruning will either keep all the positive rules or keep all the negative ones that have high confidence.
• An example is the pruning between the rules “Female -> BuyHat” and “¬Male -> BuyHat”.

Algorithm
// Finding all positive rules
1.  FreqSet1 = {frequent 1-item sets}
2.  Find all positive rules
// Generate negative rules
3.  Delete from the taxonomy t all items which are not frequent
4.  For all positive rules r
5.      TmpRuleSet = genNegCan(r)
6.      For all rules tr in TmpRuleSet
7.          If SM(tr.conf, t.conf) > minConf
8.              Rule = {Rule, Neg(tr) | Neg(tr).supp > minsup, Neg(tr).conf > minconf};
9.          endif;
10. endfor;
// Pruning
11. Prune those results which have the same meaning
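Below is a hedged, self-contained Python sketch of this loop. The slides do not spell out genNegCan or SM, so here genNegCan is assumed to substitute the consequent of a positive rule with each sibling from its LOS, and SM is assumed to be the relative deviation between the expected and the actual confidence (the expected confidence taken from the original rule, per the LOS uniformity idea). The toy data mirror the earlier age/coupe example.

# Hedged sketch of the negative-rule generation loop; genNegCan, SM and all
# data below are assumptions made for illustration.
MINSUP, MINCONF = 0.3, 0.7

transactions = [
    {"age<30", "coupe"}, {"age<30", "coupe"}, {"age<30", "coupe"},
    {"age<30", "sedan"}, {"age>30", "sedan"}, {"age>30", "sedan"},
    {"age>30", "sedan"}, {"age>30", "sedan"}, {"age>30", "coupe"},
    {"age>30"},
]

taxonomy = {"car": ["coupe", "sedan"]}          # siblings form a LOS

def supp(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def conf(antecedent, consequent):
    return supp(antecedent | consequent) / supp(antecedent)

# A couple of positive rules, e.g. found earlier by a standard positive-rule miner.
positive_rules = [({"age<30"}, {"coupe"}), ({"age>30"}, {"sedan"})]

def gen_neg_can(rule):
    # Candidates A => C for every sibling C of the original (single-item) consequent.
    antecedent, consequent = rule
    item = next(iter(consequent))
    for parent, siblings in taxonomy.items():
        if item in siblings:
            return [(antecedent, {s}) for s in siblings if s != item]
    return []

def sm(expected, actual):
    # Assumed similarity measure: relative deviation of actual from expected confidence.
    return abs(expected - actual) / expected

negative_rules = []
for antecedent, consequent in positive_rules:
    for cand_ant, cand_cons in gen_neg_can((antecedent, consequent)):
        # Under the LOS uniformity assumption, the sibling rule is expected to be
        # roughly as confident as the original positive rule.
        expected = conf(antecedent, consequent)
        actual = conf(cand_ant, cand_cons)
        if sm(expected, actual) > MINCONF:                            # large deviation (slide line 7)
            neg_supp = supp(cand_ant) - supp(cand_ant | cand_cons)    # supp(A u ~C)
            neg_conf = neg_supp / supp(cand_ant)
            if neg_supp >= MINSUP and neg_conf >= MINCONF:            # slide line 8
                negative_rules.append((cand_ant, cand_cons, neg_conf))

for ant, cons, c in negative_rules:
    print(f"{sorted(ant)} => NOT {sorted(cons)}  (conf={c:.2f})")
# prints the earlier example: ['age>30'] => NOT ['coupe']  (conf=0.83)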

Example

Results

Results contd.


Conclusion
• Given the number of positive rules P and the average size of the LOS L, the complexity of the algorithm is O(P × L).
• The complexity does not depend on the number of transactions, since it is assumed that the supports of item sets have already been counted and stored for use in this as well as other mining applications.
• The complexity of discovering positive rules depends not only on the number of transactions, but also on the sizes of the attribute domains and the number of attributes.
• The overall complexity of finding negative AR will therefore be proportional to that of discovering positive rules. Performance is also affected by the choice of minimum support.

Applications
• Limiting output in large databases: negative rules help limit the search space in huge databases by combining the known positive associations with negative rules based on domain knowledge.
• Example: the positive association of buying milk & bread can be combined with NOT buying a bottle of beer.

Limitations
• We cannot simply pick threshold values for support and confidence that are guaranteed to be effective in sifting both positive and negative rules; if they are not chosen appropriately, an impractical volume of negative rules results, which impacts performance.
 - For example, if there are 50,000 items in a store, the number of possible item combinations is 2^50,000, and the majority of them will never appear together even once in the entire database. If the absence of a certain item combination is taken to mean negative association, then we can generate millions of negative association rules; however, most of these rules are likely to be extremely uninteresting.
 - Solution: there is a need to explicitly find only the interesting negative rules, e.g., by incorporating domain knowledge such as the taxonomy described earlier.

Cyclic Association Rules
• Some item sets recur after a certain period of time.
• A cyclic rule has the minimum confidence and support at regular time intervals; it need not hold for the entire transactional database.

Overview
Step 1: The dataset is divided into time segments.
Step 2: Existing methods are used to discover frequent item sets in each segment.
Step 3: Pattern-matching algorithms are then applied to detect cycles in association rules.
Step 4: Techniques called cycle pruning and cycle skipping allow us to significantly reduce the amount of wasted work performed during the data mining process.

Problem Definition
• We denote the i-th time unit, i ≥ 0, by ti. That is, ti corresponds to the time interval [i·t, (i+1)·t), where t is the unit of time.
• We denote the set of transactions executed in ti by D[i].
• The support of an itemset X in D[j] is the fraction of transactions in D[j] that contain the itemset.
• The confidence of a rule X → Y in D[j] is the fraction of transactions in D[j] containing X that also contain Y.
• An association rule X → Y holds in time unit tj if the support of X ∪ Y in D[j] exceeds supmin and the confidence of X → Y in D[j] exceeds confmin.
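A minimal sketch of these per-segment definitions in Python, on hypothetical toy segments and thresholds:

# Per-time-unit support and confidence, i.e. supp and conf restricted to D[i].
SUPMIN, CONFMIN = 0.5, 0.7

# D[i] = list of transactions executed in time unit t_i (toy example)
D = [
    [{"tea", "biscuit"}, {"tea"}],                       # D[0]
    [{"tea", "biscuit"}, {"tea", "biscuit"}, {"milk"}],  # D[1]
    [{"milk"}, {"bread"}],                               # D[2]
]

def supp_in(itemset, segment):
    return sum(1 for t in segment if itemset <= t) / len(segment)

def holds_in(X, Y, segment):
    # Rule X -> Y holds in a segment if supp(X u Y) >= SUPMIN and conf >= CONFMIN.
    s_xy = supp_in(X | Y, segment)
    s_x = supp_in(X, segment)
    return s_xy >= SUPMIN and s_x > 0 and s_xy / s_x >= CONFMIN

X, Y = {"tea"}, {"biscuit"}
print([holds_in(X, Y, seg) for seg in D])   # [False, True, False]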

Problem Definition contd.
• A cycle c is a tuple (l, o) consisting of a length l and an offset o (the first time unit in which the cycle occurs), 0 ≤ o < l. We say that an association rule has a cycle c = (l, o) if the association rule holds in every l-th time unit starting with time unit to (the time unit at offset o).
• For example, if the unit of time is an hour and “Tea => Biscuit” holds during the interval 7AM-8AM every day (i.e., every 24 hours), then “Tea => Biscuit” has the cycle (24, 7).
• A cycle (li, oi) is a multiple of another cycle (lj, oj) if lj divides li and oj = oi mod lj holds.

Problem Definition contd.
• A time unit ti is said to be “part of cycle c”, or to “participate in cycle c”, if o = i mod l holds.
• Example: if the binary sequence 001100010101 represents the association rule X -> Y, then X -> Y holds in D[2], D[3], D[7], D[9], and D[11]. In this sequence, (4, 3) is a cycle.
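A small Python check of this definition on the same sequence (the function name is illustrative):

def has_cycle(holds, l, o):
    # True if the rule holds in every time unit i with i mod l == o.
    return all(holds[i] for i in range(o, len(holds), l))

holds = [c == "1" for c in "001100010101"]   # X -> Y holds in D[2], D[3], D[7], D[9], D[11]
print(has_cycle(holds, 4, 3))   # True: the rule holds in D[3], D[7], D[11]
print(has_cycle(holds, 2, 1))   # False: it does not hold in D[1] or D[5]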

Modifying Existing Algorithms
• The existing algorithms for discovering association rules cannot be applied directly.
• One option is to extend the set of items with time attributes and then generate the rules.

The Sequential Algorithm
Step 1: Finding association rules (in each time segment)
 – Maximal frequent item sets are generated.
 – Association rules are generated from the large item sets.
Step 2: Cycle detection
 – By a pattern-matching algorithm.
The cycle-detection phase has an upper bound of O(r × n × lmax), where
 – r is the number of rules detected,
 – n is the number of segments,
 – lmax is the maximum length of cycles of interest.
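A minimal sketch of the naive cycle-detection pass for a single rule, assuming its hold/not-hold sequence over the n segments is already known (as in the earlier binary-sequence example):

def detect_cycles(holds, lmax):
    # Return all cycles (l, o) of a rule's hold sequence with length <= lmax.
    # This is the O(n * lmax) pattern-matching pass done once per rule.
    cycles = []
    for l in range(1, lmax + 1):
        for o in range(l):
            if all(holds[i] for i in range(o, len(holds), l)):
                cycles.append((l, o))
    return cycles

holds = [c == "1" for c in "001100010101"]
print(detect_cycles(holds, 6))   # [(4, 3), (5, 2), (6, 3)]; short sequences admit spurious long cycles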

The Sequential Algorithm contd.
Cycle pruning, cycle skipping and cycle elimination:
• The major portion of the running time of the sequential algorithm is spent calculating the support of item sets.
• A cycle of the rule X -> Y is a multiple of a cycle of the itemset X ∪ Y.
Cycle skipping:
• If time unit ti is not part of a cycle of an itemset X, then there is no need to calculate the support of X in time segment D[i].

Cycle pruning:
• If an itemset X has a cycle (l, o), then every subset of X also has the cycle (l, o).
Cycle elimination:
• If the support of an itemset X is below the minimum support threshold supmin in time segment D[i], then X cannot have any of the cycles (j, i mod j), lmin ≤ j ≤ lmax.
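A hedged Python sketch of cycle elimination and cycle skipping for a single itemset, under the assumption that candidate cycles start as all (l, o) with lmin ≤ l ≤ lmax and that per-segment supports are defined as earlier; data and thresholds are hypothetical.

# Candidate cycles are eliminated as soon as a segment falls below the support
# threshold; support counting is skipped in segments that participate in no
# surviving candidate cycle.
SUPMIN = 0.5
LMIN, LMAX = 1, 4

def supp_in(itemset, segment):
    return sum(1 for t in segment if itemset <= t) / len(segment) if segment else 0.0

# D[i] = transactions of time unit t_i (toy data)
D = [
    [{"tea", "biscuit"}], [{"milk"}],
    [{"tea", "biscuit"}], [{"milk"}],
    [{"tea", "biscuit"}], [{"bread"}],
]

X = {"tea", "biscuit"}

# Start with every possible cycle (l, o), LMIN <= l <= LMAX, 0 <= o < l.
candidates = {(l, o) for l in range(LMIN, LMAX + 1) for o in range(l)}

for i, segment in enumerate(D):
    # Cycle skipping: only count support if t_i participates in a live candidate.
    if not any(o == i % l for (l, o) in candidates):
        continue
    if supp_in(X, segment) < SUPMIN:
        # Cycle elimination: drop every cycle (j, i mod j) that t_i belongs to.
        candidates -= {(j, i % j) for j in range(LMIN, LMAX + 1)}

print(sorted(candidates))   # surviving candidate cycles of X: [(2, 0), (4, 0), (4, 2)]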
