"If a client bought product A1 and product A4, then he also bought product A6 in 74% of the cases. This rule applies to 20% of the studied cases."
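To make the two percentages in that rule concrete, here is a minimal sketch of how they are usually computed. It assumes the common definitions: the support of a rule is the fraction of transactions containing every item in the rule (the "20%" figure), and the confidence is that support divided by the support of the left-hand side (the "74%" figure). The transactions and the helper names `support` and `confidence` are made up for illustration; they are not the article's dataset or code.

```python
# Toy transactions, invented for illustration only.
transactions = [
    {"A1", "A4", "A6"},
    {"A1", "A4", "A6"},
    {"A1", "A4"},
    {"A2", "A6"},
    {"A3"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


rule_support = support({"A1", "A4", "A6"}, transactions)
rule_confidence = confidence({"A1", "A4"}, {"A6"}, transactions)
print(f"support    = {rule_support:.0%}")     # 2 of 5 transactions -> 40%
print(f"confidence = {rule_confidence:.0%}")  # 2 of the 3 with A1 and A4 -> 67%
```

On this toy data the rule "A1 and A4 imply A6" holds in 2 of the 3 transactions that contain A1 and A4, and it covers 2 of the 5 transactions overall.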
The source is documented, so you can easily follow the whole rule generation process.
Let's see it in action:
So let's get started. First I will create a transaction database, and then generate the list of frequent itemsets and the association rules.
I will use a minimum confidence of 70%. So we have the following rules:
But wait: before you start using it on your own dataset, I have to give you some warnings. Finding all the different combinations of items can be very time-consuming and expensive in terms of computer processing, so you need more intelligent approaches to find frequent itemsets in a reasonable amount of time. Apriori is one approach that tries to reduce the number of sets that are checked against the dataset. It relies on the fact that if an itemset is infrequent, all of its supersets are infrequent too, so they never need to be counted. Once the frequent itemsets are found, we can combine the support and confidence measures to generate association rules.
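As a rough illustration of that idea, here is a compact sketch of Apriori followed by rule generation. It is not the article's source code: the function names `apriori` and `generate_rules`, the toy transactions, and the 50% minimum support are all assumptions chosen for the example. Only the 70% minimum confidence comes from the text above.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        # One full pass over the dataset to count every candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    # Level 1: frequent single items.
    level = keep_frequent({frozenset([i]) for t in transactions for i in t})
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = keep_frequent(candidates)
        frequent.update(level)
        k += 1
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for each strong rule."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for left in combinations(itemset, r):
                left = frozenset(left)
                # Every subset of a frequent itemset is frequent,
                # so its support is always available here.
                conf = sup / frequent[left]
                if conf >= min_confidence:
                    rules.append((left, itemset - left, conf))
    return rules


# Toy data, invented for illustration.
dataset = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
frequent = apriori(dataset, min_support=0.5)
for left, right, conf in generate_rules(frequent, min_confidence=0.7):
    print(sorted(left), "->", sorted(right), f"confidence={conf:.2f}")
```

Note that `keep_frequent` walks the whole dataset once per itemset length, and the prune step is what keeps the number of candidates counted in each pass small.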
The main problem with the Apriori algorithm is that it has to scan the whole dataset each time we increase the length of the frequent itemsets. With a huge dataset, this can considerably slow down the search for frequent itemsets. One alternative that addresses this issue is the FP-Growth algorithm: it only needs to go over the dataset twice, which can lead to better performance.
I hope you enjoyed this article, the first of 2013!