Question
Consider the transactional database shown in the following table.
Transaction ID Items Bought
T100 Plum, Apple, Peach, Orange, Pear, Banana
T200 Cherry, Apple, Peach, Orange, Pear, Banana
T300 Plum, Mango, Orange, Pear, Kiwi, Strawberry
T400 Plum, Watermelon, Avocado, Orange, Banana
T500 Avocado, Apple, Orange, Lemon, Pear
CONDITION: The minimum support is 60% and minimum confidence is 70%.
Based on the CONDITION above, answer the following five questions.
(1) Find all frequent itemsets using the Apriori algorithm. Show how the algorithm works in a
step-by-step manner.
(2) List all the association rules found by the Apriori algorithm.
(3) Find all frequent itemsets using the FP-tree algorithm. Show the final FP-tree you
constructed. Note that the FP-tree algorithm must have a pre-processing step, which sorts items
in a transaction based on the support values of the items. If several items have the same support
value, the pre-processing step sorts these items based on the alphabetical (lexicographical) order
of the items.
(4) List all the association rules found by the FP-tree algorithm.
(5) In this example, indicate whether the association rules produced by the Apriori algorithm are
the same as those produced by the FP-tree algorithm.
Remove the CONDITION above, and then answer the following ten questions.
(6) What is the maximum number of association rules that can be extracted from this database
(including rules that have zero support)?
(7) What is the maximum size of frequent itemsets that can be extracted (assuming minsup > 0)?
(8) Write an expression for the maximum number of size-3 itemsets that can be derived from this
database (including size-3 itemsets that have zero support).
(9) Write an expression for the maximum number of size-2 itemsets that can be derived from this
database (including size-2 itemsets that have zero support).
(10) Write an expression for the maximum number of size-5 itemsets that can be derived from
this database (including size-5 itemsets that have zero support).
(11) Find an itemset (of size 2 or larger) that has the largest support.
(12) Find an itemset (of any size) that has the largest support.
(13) Find a pair of items, X and Y, such that the rules {X} -> {Y} and {Y} -> {X} have the
same confidence value (if there are no such items X and Y, indicate NONE).
(14) Find a pair of items, X and Y, such that the rule {X} -> {Y} has a larger confidence value
than the rule {Y} -> {X} (if there are no such items X and Y, indicate NONE).
(15) Find a pair of items, X and Y, such that the rule {X} -> {Y} has a smaller confidence
value than the rule {Y} -> {X} (if there are no such items X and Y, indicate NONE).
Explanation / Answer
The Apriori algorithm is used to mine frequent itemsets from transactional databases.
It is based on prior knowledge of the properties of frequent itemsets.
It is a level-wise, iterative search in which the frequent k-itemsets are used to explore the frequent (k+1)-itemsets.
Two important properties of the Apriori algorithm reduce its running time:
1) Downward closure property: every subset of a frequent itemset must itself be frequent.
2) Apriori pruning property: if any subset of an itemset is infrequent, none of its supersets need to be tested (antimonotonicity).
Support: how popular an itemset is, measured as the proportion of transactions in which the itemset appears.
Confidence: how likely item Y is to be purchased when item X is purchased, for a rule written {X -> Y}.
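As a concrete illustration of these two measures on this database, here is a minimal Python sketch; the `transactions` list encodes the five transactions from the question, and the helper names `support` and `confidence` are illustrative choices, not part of the original answer.

```python
# A minimal sketch of the support and confidence measures on this database.
# Helper names are illustrative, not from the original answer.
transactions = [
    {"Plum", "Apple", "Peach", "Orange", "Pear", "Banana"},    # T100
    {"Cherry", "Apple", "Peach", "Orange", "Pear", "Banana"},  # T200
    {"Plum", "Mango", "Orange", "Pear", "Kiwi", "Strawberry"}, # T300
    {"Plum", "Watermelon", "Avocado", "Orange", "Banana"},     # T400
    {"Avocado", "Apple", "Orange", "Lemon", "Pear"},           # T500
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y):
    """Confidence of the rule X -> Y: support(X union Y) / support(X)."""
    return support(set(x) | set(y)) / support(set(x))

print(support({"Plum"}))                          # 0.6 (3 of 5 transactions)
print(confidence({"Orange", "Pear"}, {"Apple"}))  # 0.75
```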
Part 1:
Transaction ID  Items bought
T100  Plum, Apple, Peach, Orange, Pear, Banana
T200  Cherry, Apple, Peach, Orange, Pear, Banana
T300  Plum, Mango, Orange, Pear, Kiwi, Strawberry
T400  Plum, Watermelon, Avocado, Orange, Banana
T500  Avocado, Apple, Orange, Lemon, Pear
Iteration 1: Generate the candidate table C1 (all 1-itemsets) by scanning the database and counting each item:

Itemset  Frequency (no. of transactions)
{Plum}  3
{Apple}  3
{Peach}  2
{Orange}  5
{Pear}  4
{Banana}  3
{Cherry}  1
{Mango}  1
{Kiwi}  1
{Strawberry}  1
{Watermelon}  1
{Avocado}  2
{Lemon}  1
In this problem the minimum support is 60%; with 5 transactions that corresponds to a minimum support count of 0.6 × 5 = 3. For example, Plum appears in 3 of the 5 transactions, so its support is 3/5 = 0.6 and it meets the threshold.
L1: the 1-itemsets that satisfy the minimum support count (a sketch of this counting pass follows the table):

Itemset  Support count
{Plum}  3
{Apple}  3
{Orange}  5
{Pear}  4
{Banana}  3
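A minimal sketch of this first database pass, reusing the `transactions` list from the earlier sketch; the names `c1`, `l1`, and `min_count` are illustrative.

```python
from collections import Counter

# First database scan: count each individual item (C1), then keep the items
# whose count meets the minimum support count of 3 (L1).
min_count = 3  # 60% of 5 transactions
c1 = Counter(item for t in transactions for item in t)
l1 = {item: n for item, n in c1.items() if n >= min_count}
# l1 == {'Plum': 3, 'Apple': 3, 'Orange': 5, 'Pear': 4, 'Banana': 3}
```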
Iteration 2:
Now generate the candidate set C2 (2-itemsets) by combining each itemset in L1 with every other. Scan the database again and count the support of each combination (a sketch of this join-and-count step appears after the L2 table below).
Itemset  Support count
{Plum, Apple}  1
{Plum, Orange}  3
{Plum, Pear}  2
{Plum, Banana}  2
{Apple, Orange}  3
{Apple, Pear}  3
{Apple, Banana}  2
{Orange, Pear}  4
{Orange, Banana}  3
{Pear, Banana}  2
From the above iteration, form L2 by eliminating every itemset that does not satisfy the minimum support count:
Itemset  Support count
{Plum, Orange}  3
{Apple, Orange}  3
{Apple, Pear}  3
{Orange, Pear}  4
{Orange, Banana}  3
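A minimal sketch of the step that produces C2 and L2, continuing from the previous sketches (`transactions`, `l1`, and `min_count` are defined there):

```python
from itertools import combinations

# Join L1 with itself to form the candidate pairs (C2), scan the database to
# count each pair's support, and keep pairs meeting the minimum count (L2).
c2 = [frozenset(pair) for pair in combinations(sorted(l1), 2)]
c2_counts = {cand: sum(cand <= t for t in transactions) for cand in c2}
l2 = {cand: n for cand, n in c2_counts.items() if n >= min_count}
# l2 keeps {Plum,Orange}: 3, {Apple,Orange}: 3, {Apple,Pear}: 3,
#          {Orange,Pear}: 4, {Orange,Banana}: 3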
Iteration 3:
Now generate the candidate set C3 (3-itemsets) by combining the itemsets in L2, then scan the database and count the support of each surviving candidate.
* Eliminate duplicate candidates.
* Apply the pruning property in this step by checking every 2-item subset of each candidate against L2; this reduces the number of candidates whose support must be counted.
Below, NO marks a candidate with at least one 2-item subset missing from L2 (pruned without counting), and YES marks a candidate whose 2-item subsets are all in L2 (a sketch of this join-and-prune step appears after L3).
{Plum, Orange, Apple} (NO)
{Plum, Orange, Pear} (NO)
{Plum, Orange, Banana} (NO)
{Apple, Orange, Pear} (YES)  3
{Apple, Orange, Banana} (NO)
{Orange, Pear, Banana} (NO)
L3:
{Apple, Orange, Pear}  3
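A minimal sketch of the join-and-prune step under the same assumptions; `apriori_gen` is an illustrative helper name, and `l2` and `combinations` come from the sketches above.

```python
# Candidate generation with the pruning property: a size-k candidate survives
# only if every (k-1)-item subset of it is already frequent.
def apriori_gen(prev_frequent, k):
    """Build size-k candidates and prune those with an infrequent subset."""
    items = sorted({i for s in prev_frequent for i in s})
    return [
        cand
        for cand in map(frozenset, combinations(items, k))
        if all(frozenset(sub) in prev_frequent
               for sub in combinations(cand, k - 1))
    ]

c3 = apriori_gen(set(l2), 3)
# Only {Apple, Orange, Pear} survives the prune; a database scan gives count 3.
```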
Iteration 4 (C4):
The only conceivable 4-itemset candidate is {Plum, Orange, Apple, Pear}, but its 3-item subsets other than {Apple, Orange, Pear} do not appear in L3, so it is pruned; since L3 contains only a single itemset, no size-4 candidate can be generated and C4 is empty. The algorithm therefore terminates.
The frequent itemsets found by the algorithm are those in L1, L2, and L3; the largest is {Apple, Orange, Pear}.
Part 2:
To find the association rules, the confidence measure is used:
confidence(X -> Y) = support(X ∪ Y) / support(X)
Generating the association rules from the itemset {Apple, Orange, Pear} found by the Apriori algorithm (rules from the frequent 2-itemsets are derived in the same way):
{Apple, Orange} -> {Pear}: 3/3 = 100%
{Apple, Pear} -> {Orange}: 3/3 = 100%
{Orange, Pear} -> {Apple}: 3/4 = 75%
{Apple} -> {Orange, Pear}: 3/3 = 100%
{Orange} -> {Apple, Pear}: 3/5 = 60% (rejected: below the minimum confidence)
{Pear} -> {Apple, Orange}: 3/4 = 75%
Calculating the confidence for each rule above against the minimum confidence of 70%, the strong rules are (a sketch of this rule-generation step follows the list):
{Apple, Orange} -> {Pear}: 3/3 = 100%
{Apple, Pear} -> {Orange}: 3/3 = 100%
{Orange, Pear} -> {Apple}: 3/4 = 75%
{Apple} -> {Orange, Pear}: 3/3 = 100%
{Pear} -> {Apple, Orange}: 3/4 = 75%
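A minimal sketch of this rule-generation step, reusing `support` and `combinations` from the earlier sketches; the loop enumerates every non-empty proper subset X of the frequent itemset and keeps the rules with confidence at or above 70%.

```python
# Enumerate every rule X -> (S \ X) from the frequent itemset S and keep
# the rules meeting the minimum confidence of 70%.
min_conf = 0.7
s = frozenset({"Apple", "Orange", "Pear"})
for r in range(1, len(s)):
    for x in map(frozenset, combinations(sorted(s), r)):
        y = s - x
        conf = support(x | y) / support(x)
        if conf >= min_conf:
            print(f"{sorted(x)} -> {sorted(y)}: {conf:.0%}")
# Prints the five strong rules; {Orange} -> {Apple, Pear} is dropped at 60%.
```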