Indiana University Bloomington

Luddy School of Informatics, Computing, and Engineering

Technical Report TR587:
Mining Frequent Itemsets Over Arbitrary Time Intervals in Data Streams

Chris Giannella, Jiawei Han, Edward Robertson and Chao Liu
(Nov 2003), 37 pages pages
Abstract:
Mining frequent itemsets over a stream of transactions presents difficult new challenges over traditional mining in static transaction databases. Stream transactions can only be looked at once and streams have a much richer frequent itemset structure due to their inherent temporal nature. We examine a novel data structure, an FPstream, for maintaining information about itemset frequency histories. At any time, requests for itemsets frequent over user-defined time intervals can be serviced by scanning the maintained FPstream producing an approximate answer with error guaranteed to be no worse than a user-specified frequency and temporal threshold. We develop an algorithm for constructing and updating an FPstream structure and present experiments illustrating the time and space required for maintenance.

Available as: