Programming Breaks: Criteria to find the best rules in association rule mining

Sunday, August 11, 2013

Criteria to find the best rules in association rule mining

Criterion 1: Support Level
In association rule mining (ARM), we can rank association rules by their support level. For a combination of two products or items, we measure the fitness of an association rule (Ci, Nj) by its support level in the database, Support(Ci, Nj).

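Support(Ci, Nj) can be measured either as the absolute count ActualCount(Ci, Nj) or as the relative frequency ActualCount(Ci, Nj) / TotalCount. Here ActualCount(Ci, Nj) is the number of records containing both Ci and Nj, and TotalCount is the total number of records.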
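To make the two forms of support concrete, here is a minimal sketch in Python. The function names actual_count and support, the relative flag, and the sample transactions are all made up for illustration; the sketch assumes each record is a set of items.

```python
def actual_count(transactions, *items):
    """Count the transactions that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for t in transactions if wanted <= set(t))


def support(transactions, ci, nj, relative=True):
    """Support of the rule (ci, nj), as a fraction or as a raw count."""
    count = actual_count(transactions, ci, nj)
    if not relative:
        return count
    # Relative support: ActualCount(Ci, Nj) / TotalCount
    return count / len(transactions)


# Hypothetical sample data: five shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

absolute = support(transactions, "bread", "milk", relative=False)
fraction = support(transactions, "bread", "milk")
print(absolute, fraction)
```

In this sample, three of the five baskets contain both bread and milk, so the absolute support is 3 and the relative support is 3 / 5.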

Criterion 2: Chi-Square
However, we know from statistics that a high support level may arise by chance. We therefore want to check that the high support level, Support(Ci, Nj), of an association rule reflects a true phenomenon rather than a random coincidence. To this end, we can measure ChiSquare(Ci, Nj). The details of the ChiSquare(Ci, Nj) calculation can be found in the following blog post:

http://czcodezone.blogspot.sg/2013/08/three-approaches-to-measure-how.html

The higher ChiSquare(Ci, Nj) is, the less likely it is that Support(Ci, Nj) arose by chance. By combining Support(Ci, Nj) and ChiSquare(Ci, Nj), we can find the best rules by selecting the association rules for which both Support(Ci, Nj) and ChiSquare(Ci, Nj) are high.
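As a rough illustration, the sketch below computes a standard Pearson chi-square statistic on the 2x2 table of "contains Ci" versus "contains Nj". This is an assumption about what ChiSquare(Ci, Nj) means; the calculation in the linked post may differ in its details. The function name chi_square and its parameters are hypothetical, and no continuity correction is applied.

```python
def chi_square(count_both, count_ci, count_nj, total):
    """Pearson chi-square for the 2x2 table of Ci presence vs Nj presence.

    Assumed inputs: count_both = ActualCount(Ci, Nj),
    count_ci = ActualCount(Ci), count_nj = ActualCount(Nj),
    total = TotalCount.
    """
    # Rows: Ci present / Ci absent. Columns: Nj present / Nj absent.
    observed = [
        [count_both, count_ci - count_both],
        [count_nj - count_both, total - count_ci - count_nj + count_both],
    ]
    row_totals = [count_ci, total - count_ci]
    col_totals = [count_nj, total - count_nj]

    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            # Count expected in this cell if Ci and Nj were independent.
            expected = row_totals[r] * col_totals[c] / total
            if expected > 0:
                chi2 += (observed[r][c] - expected) ** 2 / expected
    return chi2


# Ci and Nj co-occur exactly as often as independence predicts.
independent = chi_square(count_both=20, count_ci=50, count_nj=40, total=100)

# Ci and Nj always appear together.
dependent = chi_square(count_both=50, count_ci=50, count_nj=50, total=100)
print(independent, dependent)
```

The first call gives a statistic of zero because the observed counts match the counts expected under independence, while the second gives a large value because the two items never appear apart.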

Criterion 3: Lift
Another criterion is lift, which can be measured by:

Lift(Ci, Nj) = [ActualCount(Ci, Nj) / ActualCount(Ci)] / [ActualCount(Nj) / TotalCount]

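Lift tells us how much better the rule does than guessing: it compares how often Nj occurs among the records containing Ci with how often Nj occurs overall. A lift of 1 means the rule does no better than guessing. The higher Lift(Ci, Nj), the better the rule (Ci, Nj).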
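The lift formula can be sketched directly from the counts. The function name lift and its parameter names are hypothetical, and the numbers in the example are invented.

```python
def lift(count_both, count_ci, count_nj, total):
    """Lift(Ci, Nj) computed from raw counts.

    Assumed inputs: count_both = ActualCount(Ci, Nj),
    count_ci = ActualCount(Ci), count_nj = ActualCount(Nj),
    total = TotalCount.
    """
    if count_ci == 0 or count_nj == 0 or total == 0:
        raise ValueError("lift is undefined when a denominator count is zero")
    # How often Nj occurs among the records that contain Ci.
    rate_given_ci = count_both / count_ci
    # How often Nj occurs overall, i.e. the success rate of plain guessing.
    rate_overall = count_nj / total
    return rate_given_ci / rate_overall


# Hypothetical counts: Nj is in 40 of 100 records overall,
# but in 30 of the 50 records that contain Ci.
value = lift(count_both=30, count_ci=50, count_nj=40, total=100)
print(value)
```

Here Nj shows up in 60% of the records containing Ci against 40% overall, so the lift is about 1.5, meaning the rule does roughly one and a half times better than guessing.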

Criterion 4: Confidence
Confidence is measured by:

Confidence(Ci, Nj) = ActualCount(Ci, Nj) / ActualCount(Ci)

It measures how often the rule (Ci, Nj) holds given that Ci is true. The higher Confidence(Ci, Nj), the better the rule.
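A minimal sketch of the confidence formula follows. The function name confidence and the counts are made up for illustration. The example also shows that confidence depends on the direction of the rule, because the denominator is the count of the left-hand item only.

```python
def confidence(count_both, count_antecedent):
    """Confidence of a rule: ActualCount(Ci, Nj) / ActualCount(Ci)."""
    if count_antecedent == 0:
        raise ValueError("confidence is undefined when Ci never occurs")
    return count_both / count_antecedent


# Hypothetical counts: Ci occurs 40 times, Nj occurs 60 times,
# and the two occur together 30 times.
count_both, count_ci, count_nj = 30, 40, 60

ci_to_nj = confidence(count_both, count_ci)  # rule (Ci, Nj)
nj_to_ci = confidence(count_both, count_nj)  # reversed rule (Nj, Ci)
print(ci_to_nj, nj_to_ci)
```

With these counts the rule (Ci, Nj) has a confidence of 0.75, while the reversed rule (Nj, Ci) has a confidence of 0.5.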
