Mining Methods
The most famous story about association rule mining is the "beer and diapers" anecdote: analysts reportedly discovered that customers who buy diapers also tend to buy beer. This classic example illustrates how many interesting association rules can hide in everyday transaction data.
Association rules help to predict the occurrence of one item based on the occurrences of other items in a set of transactions.
Examples of association rules
- People who buy bread will also buy milk; represented as { bread → milk }
- People who buy milk will also buy eggs; represented as { milk → eggs }
- People who buy bread will also buy jam; represented as { bread → jam }
Association rules discover relationships between two or more attributes. A rule mainly takes the form: if antecedent, then consequent. For example, suppose a supermarket serves 200 customers on a Friday evening. Out of the 200 customers, 100 bought chicken, and out of the 100 customers who bought chicken, 50 also bought onions. The association rule would be: if customers buy chicken, then they buy onions too, with a support of 50/200 = 25% and a confidence of 50/100 = 50%.
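The arithmetic behind those two numbers is simple enough to show directly. Here is a minimal Python sketch using the counts from the example above (the variable names are purely illustrative):

```python
# Counts taken straight from the supermarket example in the text.
total_transactions = 200
chicken_count = 100           # transactions containing chicken
chicken_and_onion_count = 50  # transactions containing both chicken and onions

# Support of {chicken -> onions}: fraction of ALL transactions
# that contain both the antecedent and the consequent.
support = chicken_and_onion_count / total_transactions  # 50/200 = 0.25

# Confidence of {chicken -> onions}: fraction of transactions
# containing the antecedent that also contain the consequent.
confidence = chicken_and_onion_count / chicken_count    # 50/100 = 0.50

print(f"support    = {support:.0%}")    # 25%
print(f"confidence = {confidence:.0%}") # 50%
```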
Association rule mining is a technique for identifying interesting relationships between items. It involves two steps (both are sketched in the code below):
- Find all the frequent itemsets.
- Generate association rules from those frequent itemsets.
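As a concrete illustration, here is a brute-force sketch of both steps on a tiny, made-up basket dataset. The transactions, thresholds, and helper names are all hypothetical, and a real miner would replace the exhaustive enumeration with Apriori or FP-Growth:

```python
from itertools import combinations

# Hypothetical basket data for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "jam"},
    {"milk", "eggs"},
    {"bread", "jam"},
]
min_support = 0.5     # itemset must appear in at least half the baskets
min_confidence = 0.6  # rule must hold at least 60% of the time

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Step 1: find all frequent itemsets (brute force is fine at this scale).
items = sorted({i for t in transactions for i in t})
frequent = [
    frozenset(c)
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
    if support(frozenset(c)) >= min_support
]

# Step 2: from each frequent itemset, generate rules antecedent -> consequent
# and keep the ones whose confidence meets the threshold.
for itemset in frequent:
    for k in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, k)):
            consequent = itemset - antecedent
            conf = support(itemset) / support(antecedent)
            if conf >= min_confidence:
                print(f"{set(antecedent)} -> {set(consequent)} "
                      f"(support={support(itemset):.2f}, confidence={conf:.2f})")
```

On this toy data the sketch recovers rules like { bread → milk } and { bread → jam } from the examples above.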
There are several algorithms for association rule mining, or frequent itemset mining; the two best known are:
- Apriori algorithm
- FP-Growth algorithm
Apriori algorithm
The Apriori algorithm is a classic and powerful tool in data mining used to discover frequent itemsets and generate association rules. Imagine a grocery store database with customer transactions. Apriori can help you find out which items frequently appear together, revealing valuable insights like:
- Customers buying bread often buy butter and milk too. (Frequent itemset)
- 70% of people who purchase diapers also buy baby wipes. (Association rule)
How the Apriori algorithm works (a runnable sketch follows the list):
- Bottom-up Approach: Starts with finding frequent single items, then combines them to find frequent pairs, triplets, and so on.
- Apriori Property: If a smaller itemset isn't frequent, none of its supersets can be frequent either. This "prunes" the search space for efficiency.
- Support and Confidence: Two key measures used to define how often an itemset appears and how strong the association between items is.
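Here is a minimal, unoptimized Python sketch of this level-wise procedure, assuming transactions are sets of items (all names are illustrative, not a reference implementation):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise Apriori sketch: join, prune, count, repeat."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {i for t in transactions for i in t}
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support]
    frequent = list(level)

    k = 2
    while level:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): a candidate survives only if every
        # (k-1)-subset of it is already known to be frequent.
        prev = set(level)
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: keep candidates meeting the minimum support.
        level = [c for c in candidates if support(c) >= min_support]
        frequent.extend(level)
        k += 1
    return frequent

# Hypothetical data: with min_support=0.5, {bread, butter, milk} is pruned
# because its subset {butter, milk} is not frequent.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "diapers", "wipes"},
    {"bread", "milk"},
]
for itemset in apriori(transactions, min_support=0.5):
    print(set(itemset))
```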
Limitations of the Apriori algorithm
- Can be computationally expensive for large datasets.
- Sensitive to minimum support and confidence thresholds.
FP-Growth algorithm
FP-Growth stands for Frequent Pattern Growth, and it's a smarter sibling of the Apriori algorithm for mining frequent itemsets. Instead of brute force, it uses a clever strategy that avoids generating and testing huge numbers of candidate sets, making it much faster and more memory-efficient.
Here are its secret weapons:
- Frequent Pattern Tree (FP-Tree): This special data structure efficiently stores the frequent itemsets and their relationships. Think of it as a compressed and organized representation of your grocery store database.
- Pattern Fragment Growth: Instead of building candidate sets, FP-Growth "grows" longer frequent patterns from shorter ones (fragments), extending them one item at a time using the tree to find which extensions are frequent. This avoids the costly generation and scanning of redundant candidates.
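To make the FP-Tree idea concrete, here is a minimal construction sketch. The class and function names are illustrative, and a real implementation would add the recursive mining step that extracts frequent patterns from the tree:

```python
from collections import Counter, defaultdict

class FPNode:
    """One node of the FP-tree: an item, a count, and child links."""
    def __init__(self, item, parent=None):
        self.item, self.count, self.parent = item, 1, parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    """Sketch of FP-tree construction: two passes over the data."""
    # Pass 1: count item frequencies and drop infrequent items.
    counts = Counter(i for t in transactions for i in t)
    frequent = {i for i, c in counts.items() if c >= min_count}

    root = FPNode(None)
    header = defaultdict(list)  # item -> every tree node holding that item
    # Pass 2: insert each transaction with items sorted by descending
    # frequency, so transactions sharing frequent prefixes share branches.
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in frequent),
                           key=lambda i: (-counts[i], i)):
            if item in node.children:
                node.children[item].count += 1
            else:
                node.children[item] = FPNode(item, parent=node)
                header[item].append(node.children[item])
            node = node.children[item]
    return root, header

# Hypothetical baskets; with min_count=2 only bread and milk survive.
transactions = [
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"bread", "jam"},
    {"milk", "eggs"},
]
root, header = build_fp_tree(transactions, min_count=2)
# The header table gives direct access to every occurrence of an item;
# FP-Growth walks these links (and parent pointers) to collect the
# conditional patterns from which longer patterns are grown.
for item, nodes in header.items():
    print(item, [f"{n.item}:{n.count}" for n in nodes])
```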
Advantages of FP-Growth over Apriori
- Faster for large datasets: No more candidate explosions, just targeted pattern growth.
- Less memory required: The compact FP-Tree minimizes memory usage.
- More versatile: Can mine conditional frequent patterns directly from the tree's header links, without repeatedly rescanning the database.
When to Choose FP-Growth
- If you're dealing with large datasets and want faster results.
- If memory limitations are a concern.
- If you need to mine conditional frequent patterns.
Remember: Both Apriori and FP-Growth have their strengths and weaknesses. Choosing the right tool depends on your specific data and needs.