Wednesday, October 19, 2011

EFFICIENT PERIODICITY MINING IN TIME SERIES DATABASES USING SUFFIX TREES


EFFICIENT PERIODICITY MINING IN TIME SERIES DATABASES USING SUFFIX TREES

ABSTRACT

Periodic pattern mining or periodicity detection has a number of applications, such as prediction, forecasting, detection of unusual activities, etc. The problem is not trivial because the data to be analyzed are mostly noisy and different periodicity types (namely symbol, sequence, and segment) are to be investigated. Accordingly, we argue that there is a need for a comprehensive
Approach capable of analyzing the whole time series or in a subsection of it to effectively handle different types of noise (to a certain degree) and at the same time is able to detect different types of periodic patterns; combining these under one umbrella is by itself a challenge. In this paper, we present an algorithm which can detect symbol, sequence (partial), and segment (full cycle) periodicity in time series. The algorithm uses suffix tree as the underlying data structure; this allows us to design the algorithm such that its worst case complexity is Oðk: n2Þ, where k is the maximum length of periodic pattern and n is the length of the analyzed portion (whole or subsection) of the time series. The algorithm is noise resilient; it has been successfully demonstrated to work with replacement,
Insertion, deletion, or a mixture of these types of noise. We have tested the proposed algorithm on both synthetic and real data from different domains, including protein sequences. The conducted comparative study demonstrates the applicability and effectiveness of the proposed algorithm; it is generally more time-efficient and noise-resilient than existing algorithms.


Existing system

Three types of periodic patterns can be detected in a time series
 1) Symbol periodicity,
 2) Sequence periodicity or partial periodic patterns,
 3) Segment or full-cycle periodicity.

The problem is not trivial and none of the approaches described in the literature is comprehensive enough to handle all the related aspects. In other words, we argue the need to develop a noise-resilient algorithm that could tackle the problem well by being capable of:
1) Identifying the three different types of periodic patterns,
2) Handling asynchronous periodicity by locating periodic patterns that may drift from their expected positions up to an allowable limit,
3) Investigating periodic patterns in the whole time series as well as in a subsection of the time series. Realizing the significance of this problem, we     incrementally developed an algorithm
that covers all of these aspects.

Proposed System
          The algorithm is noise-resilient and works with replacement, insertion, deletion, or any mixture of these types of noise .It can also detect periodicity within a subsection of a time series and applies various redundant period pruning techniques to output a small number of useful periods by removing most of the redundant periods. The algorithm looks for all periods starting from all positions which have confidence greater than or equal to the user-provided
Periodicity threshold. The worst time complexity of the algorithm is Oðk: n2Þ. Finally, the different aspects of the algorithm, its applicability, and effectiveness have been demonstrated using both real and synthetic data.
          Our algorithm is linear distance- based, where we take the difference between any two successive occurrence vector elements leading to another vector called the difference vector. It is important to understand that we actually do not keep any such vector in the memory but this is considered only for the sake of explanation. The algorithm requires only a single scan of the data in
Order to construct the suffix tree; and the suffix tree is traversed only once to find all periodic patterns in the time series. An additional traversal of the suffix tree might be performed to sort the edges by the size of their occurrence vectors, but this is not among the necessary requirements of the algorithm.

No comments:

Post a Comment