Question
Consider the transactional database shown in the following table.
Transaction ID Items Bought
T100 Plum, Apple, Peach, Orange, Pear, Banana
T200 Cherry, Apple, Peach, Orange, Pear, Banana
T300 Plum, Mango, Orange, Pear, Kiwi, Strawberry
T400 Plum, Watermelon, Avocado, Orange, Banana
T500 Avocado, Apple, Orange, Lemon, Pear
CONDITION: The minimum support is 60% and minimum confidence is 70%.
Based on the CONDITION above, answer the following five questions.
(1) Find all frequent itemsets using the Apriori algorithm. Show how the algorithm works in a
step-by-step manner.
(2) List all the association rules found by the Apriori algorithm.
(3) Find all frequent itemsets using the FP-tree algorithm. Show the final FP-tree you
constructed. Note that the FP-tree algorithm must have a pre-processing step, which sorts items
in a transaction based on the support values of the items. If several items have the same support
value, the pre-processing step sorts these items based on the alphabetical (lexicographical) order
of the items.
(4) List all the association rules found by the FP-tree algorithm.
(5) In this example, indicate whether the association rules produced by the Apriori algorithm are
the same as those produced by the FP-tree algorithm.
Remove the CONDITION above, and then answer the following ten questions.
(6) What is the maximum number of association rules that can be extracted from this database
(including rules that have zero support)?
(7) What is the maximum size of frequent itemsets that can be extracted (assuming minsup > 0)?
(8) Write an expression for the maximum number of size-3 itemsets that can be derived from this
database (including size-3 itemsets that have zero support).
(9) Write an expression for the maximum number of size-2 itemsets that can be derived from this
database (including size-2 itemsets that have zero support).
(10) Write an expression for the maximum number of size-5 itemsets that can be derived from
this database (including size-5 itemsets that have zero support).
(11) Find an itemset (of size 2 or larger) that has the largest support.
(12) Find an itemset (of any size) that has the largest support.
(13) Find a pair of items, X and Y, such that the rules {X} -> {Y} and {Y} -> {X} have the
same confidence value (if there are no such items X and Y, indicate NONE).
(14) Find a pair of items, X and Y, such that the rule {X} -> {Y} has a larger confidence value
than the rule {Y} -> {X} (if there are no such items X and Y, indicate NONE).
(15) Find a pair of items, X and Y, such that the rule {X} -> {Y} has a smaller confidence
value than the rule {Y} -> {X} (if there are no such items X and Y, indicate NONE).
Explanation / Answer
The Apriori algorithm is used to mine frequent itemsets from transactional databases.
It is based on prior knowledge of the properties of frequent itemsets.
It is a level-wise, iterative search in which the frequent k-itemsets are used to explore the frequent (k+1)-itemsets.
Two important properties of the Apriori algorithm reduce its running time:
1) Downward closure property: every subset of a frequent itemset must itself be frequent.
2) Apriori pruning property: if any subset of an itemset is infrequent, none of its supersets need to be tested (antimonotonicity).
Support: how popular an itemset is, measured as the proportion of transactions in which the itemset appears.
Confidence: how likely item Y is to be purchased when item X is purchased, for a rule written {X -> Y}.
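As a concrete illustration of these two measures on this database, here is a minimal Python sketch; the `transactions` list encodes the five transactions from the question, and the helper names `support` and `confidence` are illustrative choices, not part of the original answer.

```python
# A minimal sketch of the support and confidence measures on this database.
# Helper names are illustrative, not from the original answer.
transactions = [
    {"Plum", "Apple", "Peach", "Orange", "Pear", "Banana"},    # T100
    {"Cherry", "Apple", "Peach", "Orange", "Pear", "Banana"},  # T200
    {"Plum", "Mango", "Orange", "Pear", "Kiwi", "Strawberry"}, # T300
    {"Plum", "Watermelon", "Avocado", "Orange", "Banana"},     # T400
    {"Avocado", "Apple", "Orange", "Lemon", "Pear"},           # T500
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y):
    """Confidence of the rule X -> Y: support(X union Y) / support(X)."""
    return support(set(x) | set(y)) / support(set(x))

print(support({"Plum"}))                          # 0.6 (3 of 5 transactions)
print(confidence({"Orange", "Pear"}, {"Apple"}))  # 0.75
```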
Part 1:
Transaction ID  Items bought
T100  Plum, Apple, Peach, Orange, Pear, Banana
T200  Cherry, Apple, Peach, Orange, Pear, Banana
T300  Plum, Mango, Orange, Pear, Kiwi, Strawberry
T400  Plum, Watermelon, Avocado, Orange, Banana
T500  Avocado, Apple, Orange, Lemon, Pear
Iteration 1: Generate the candidate table C1 (all 1-itemsets) by scanning the database and counting each item:

Itemset  Frequency (no. of transactions)
{Plum}  3
{Apple}  3
{Peach}  2
{Orange}  5
{Pear}  4
{Banana}  3
{Cherry}  1
{Mango}  1
{Kiwi}  1
{Strawberry}  1
{Watermelon}  1
{Avocado}  2
{Lemon}  1
In this problem the minimum support is 60%; with 5 transactions that corresponds to a minimum support count of 0.6 × 5 = 3. For example, Plum appears in 3 of the 5 transactions, so its support is 3/5 = 0.6 and it meets the threshold.
L1: the 1-itemsets that satisfy the minimum support count (a sketch of this counting pass follows the table):

Itemset  Support count
{Plum}  3
{Apple}  3
{Orange}  5
{Pear}  4
{Banana}  3
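A minimal sketch of this first database pass, reusing the `transactions` list from the earlier sketch; the names `c1`, `l1`, and `min_count` are illustrative.

```python
from collections import Counter

# First database scan: count each individual item (C1), then keep the items
# whose count meets the minimum support count of 3 (L1).
min_count = 3  # 60% of 5 transactions
c1 = Counter(item for t in transactions for item in t)
l1 = {item: n for item, n in c1.items() if n >= min_count}
# l1 == {'Plum': 3, 'Apple': 3, 'Orange': 5, 'Pear': 4, 'Banana': 3}
```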
Iteration 2:
Now generate the candidate set C2 (2-itemsets) by combining each itemset in L1 with every other. Scan the database again and count the support of each combination (a sketch of this join-and-count step appears after the L2 table below).
Itemset  Support count
{Plum, Apple}  1
{Plum, Orange}  3
{Plum, Pear}  2
{Plum, Banana}  2
{Apple, Orange}  3
{Apple, Pear}  3
{Apple, Banana}  2
{Orange, Pear}  4
{Orange, Banana}  3
{Pear, Banana}  2
From the above iteration, form L2 by eliminating every itemset that does not satisfy the minimum support count:
Itemset  Support count
{Plum, Orange}  3
{Apple, Orange}  3
{Apple, Pear}  3
{Orange, Pear}  4
{Orange, Banana}  3
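A minimal sketch of the step that produces C2 and L2, continuing from the previous sketches (`transactions`, `l1`, and `min_count` are defined there):

```python
from itertools import combinations

# Join L1 with itself to form the candidate pairs (C2), scan the database to
# count each pair's support, and keep pairs meeting the minimum count (L2).
c2 = [frozenset(pair) for pair in combinations(sorted(l1), 2)]
c2_counts = {cand: sum(cand <= t for t in transactions) for cand in c2}
l2 = {cand: n for cand, n in c2_counts.items() if n >= min_count}
# l2 keeps {Plum,Orange}: 3, {Apple,Orange}: 3, {Apple,Pear}: 3,
#          {Orange,Pear}: 4, {Orange,Banana}: 3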
Iteration 3:
Now generate the candidate set C3 (3-itemsets) by combining the itemsets in L2, then scan the database and count the support of each surviving candidate.
* Eliminate duplicate candidates.
* Apply the pruning property in this step by checking every 2-item subset of each candidate against L2; this reduces the number of candidates whose support must be counted.
Below, NO marks a candidate with at least one 2-item subset missing from L2 (pruned without counting), and YES marks a candidate whose 2-item subsets are all in L2 (a sketch of this join-and-prune step appears after L3).
{Plum, Orange, Apple} (NO)
{Plum, Orange, Pear} (NO)
{Plum, Orange, Banana} (NO)
{Apple, Orange, Pear} (YES)  3
{Apple, Orange, Banana} (NO)
{Orange, Pear, Banana} (NO)
L3:
{Apple, Orange, Pear}  3
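A minimal sketch of the join-and-prune step under the same assumptions; `apriori_gen` is an illustrative helper name, and `l2` and `combinations` come from the sketches above.

```python
# Candidate generation with the pruning property: a size-k candidate survives
# only if every (k-1)-item subset of it is already frequent.
def apriori_gen(prev_frequent, k):
    """Build size-k candidates and prune those with an infrequent subset."""
    items = sorted({i for s in prev_frequent for i in s})
    return [
        cand
        for cand in map(frozenset, combinations(items, k))
        if all(frozenset(sub) in prev_frequent
               for sub in combinations(cand, k - 1))
    ]

c3 = apriori_gen(set(l2), 3)
# Only {Apple, Orange, Pear} survives the prune; a database scan gives count 3.
```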
Iteration 4 (C4):
The only conceivable 4-itemset candidate is {Plum, Orange, Apple, Pear}, but its 3-item subsets other than {Apple, Orange, Pear} do not appear in L3, so it is pruned; since L3 contains only a single itemset, no size-4 candidate can be generated and C4 is empty. The algorithm therefore terminates.
The frequent itemsets found by the algorithm are those in L1, L2, and L3; the largest is {Apple, Orange, Pear}.
Part 2:
To find the association rules, the confidence measure is used:
confidence(X -> Y) = support(X ∪ Y) / support(X)
Generating the association rules from the itemset {Apple, Orange, Pear} found by the Apriori algorithm (rules from the frequent 2-itemsets are derived in the same way):
{Apple, Orange} -> {Pear}: 3/3 = 100%
{Apple, Pear} -> {Orange}: 3/3 = 100%
{Orange, Pear} -> {Apple}: 3/4 = 75%
{Apple} -> {Orange, Pear}: 3/3 = 100%
{Orange} -> {Apple, Pear}: 3/5 = 60% (rejected: below the minimum confidence)
{Pear} -> {Apple, Orange}: 3/4 = 75%
Calculating the confidence for each rule above against the minimum confidence of 70%, the strong rules are (a sketch of this rule-generation step follows the list):
{Apple, Orange} -> {Pear}: 3/3 = 100%
{Apple, Pear} -> {Orange}: 3/3 = 100%
{Orange, Pear} -> {Apple}: 3/4 = 75%
{Apple} -> {Orange, Pear}: 3/3 = 100%
{Pear} -> {Apple, Orange}: 3/4 = 75%
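A minimal sketch of this rule-generation step, reusing `support` and `combinations` from the earlier sketches; the loop enumerates every non-empty proper subset X of the frequent itemset and keeps the rules with confidence at or above 70%.

```python
# Enumerate every rule X -> (S \ X) from the frequent itemset S and keep
# the rules meeting the minimum confidence of 70%.
min_conf = 0.7
s = frozenset({"Apple", "Orange", "Pear"})
for r in range(1, len(s)):
    for x in map(frozenset, combinations(sorted(s), r)):
        y = s - x
        conf = support(x | y) / support(x)
        if conf >= min_conf:
            print(f"{sorted(x)} -> {sorted(y)}: {conf:.0%}")
# Prints the five strong rules; {Orange} -> {Apple, Pear} is dropped at 60%.
```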