COLLABORATIVE FILTERING WITH PERSONALIZED SKYLINES
Abstract
Collaborative filtering (CF) systems exploit previous ratings and similarity in user behavior to recommend the top-k objects/records that are potentially most interesting to the user, assuming a single score per object. However, in various applications, a record (e.g., a hotel) may be rated on several attributes (value, service, etc.), in which case simply returning the ones with the highest overall scores fails to capture the individual attribute characteristics and to accommodate different selection criteria. In order to enhance the flexibility of CF, we propose Collaborative Filtering Skyline (CFS), a general framework that combines the advantages of CF with those of the skyline operator. CFS generates a personalized skyline for each user based on the scores of other users with similar behavior. The personalized skyline includes objects that are good on certain aspects, and eliminates the ones that are not interesting on any attribute combination. Although the integration of skylines and CF has several attractive properties, it also involves rather expensive computations. We address this challenge through a comprehensive set of algorithms and optimizations that reduce the cost of generating personalized skylines. In addition to exact skyline processing, we develop an approximate method that provides error guarantees. Finally, we propose the top-k personalized skyline, where the user specifies the required output cardinality.
Existing System
The existence of multiple attributes induces the need to distinguish the concepts of scoring patterns and selection criteria. For instance, if two users have visited the same set of hotels and have given identical scores on all dimensions, their scoring patterns are indistinguishable. On the other hand, they may have different selection criteria; e.g., service may be very important to business traveler u_m, whereas u_n is more interested in good value when selecting a hotel for her/his vacation. A typical CF system cannot differentiate between the two users and, based on their identical scoring patterns, would likely make the same recommendations to both. To overcome this problem, the system could ask each user for an explicit preference function that weighs all attributes according to her/his choice criteria and produces a single score per hotel. Such a function would set apart u_m and u_n, but would also incur information loss due to the replacement of individual ratings (on each dimension) with a single value. For instance, two (overall) scores (by two distinct users) for a hotel may be the same, even though the ratings on every attribute are rather different. Furthermore, in practice, casual users may not have a clear idea about the relative importance of the various attributes. Even if they do, it may be difficult to express it using a mathematical formula. Finally, their selection criteria may change over time depending on the purpose of the travel.
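The information loss mentioned above can be illustrated with a small sketch. The two attributes (value, service), the rating scale, and the equal weights are assumptions made up for this example; they are not taken from the original work.

```python
def weighted_score(ratings, weights):
    """Collapse per-attribute ratings into one overall score (a linear preference function)."""
    return sum(r * w for r, w in zip(ratings, weights))

# Hypothetical ratings of the same hotel on (value, service); higher is better.
rating_a = (5, 1)   # great value, poor service
rating_b = (1, 5)   # poor value, great service

# Assumed preference function: both attributes weighted equally.
weights = (0.5, 0.5)

score_a = weighted_score(rating_a, weights)
score_b = weighted_score(rating_b, weights)

# Both ratings collapse to the same overall score although they
# differ on every attribute, so the single value hides the difference.
print(score_a, score_b, score_a == score_b)
```

Once only the overall score is stored, the two opinions can no longer be told apart, which is the loss that keeping per-attribute ratings avoids.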
Skyline Processing
We assume records with d attributes, each taking values from a totally ordered domain. Accordingly, a record can be represented as a point in the d-dimensional space (in the sequel, we use the terms record, point, and object interchangeably). The skyline contains the best points according to any function that is monotonic on each attribute. Conversely, for each skyline record r, there is such a function that would assign it the highest score. These attractive properties of skylines have led to their application in various domains, including multi-objective optimization, maximum vectors, and the contour problem. The skyline operator was later introduced to the database literature, together with two disk-based algorithms for large data sets. The first, called D&C (for divide and conquer), divides the data set into partitions that fit in memory, computes the partial skyline in every partition, and generates the final skyline by merging the partial ones. The second algorithm, called BNL, applies the concept of block-nested loops. A subsequent method improves BNL by sorting the data. None of these methods, nor their variants, uses any indexing and, usually, they have to scan the entire data set before reporting any skyline point. Another set of algorithms utilizes conventional or multidimensional indexes to speed up query processing and progressively report skyline points. Such methods include Bitmap. In addition to conventional databases, skyline processing has been studied in other scenarios. For instance, Morse et al. use spatial access methods to maintain the skyline in streams with explicit deletions. Efficient skyline maintenance has also been studied. In distributed environments, several methods query independent subsystems, each in charge of a specific attribute, and compute the skylines using the partial results. In the data mining context, identify the combinations
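The skyline computation described in this section can be sketched in a few lines. The sketch assumes that higher values are better on every attribute and uses the standard dominance test (a point dominates another if it is at least as good on every attribute and strictly better on at least one). It follows the block-nested-loops idea, but keeps the whole candidate window in memory, so it is a simplification and not the disk-based BNL algorithm itself. The function names and the sample hotel data are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def dominates(p: Point, q: Point) -> bool:
    """True if p is at least as good as q everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def skyline_nested_loops(points: Sequence[Point]) -> List[Point]:
    """In-memory nested-loops skyline: keep a window of non-dominated points."""
    window: List[Point] = []
    for p in points:
        # Discard p if some point already in the window dominates it.
        if any(dominates(w, p) for w in window):
            continue
        # Otherwise p evicts every window point that it dominates.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window

# Hypothetical hotels rated on (value, service).
hotels = [(4, 2), (3, 3), (2, 4), (2, 2), (1, 1), (4, 1)]
print(skyline_nested_loops(hotels))  # [(4, 2), (3, 3), (2, 4)]
```

Each of the three surviving hotels is the top choice under some monotonic preference function (for example, weighting value heavily favors (4, 2)), whereas (2, 2), (1, 1), and (4, 1) are never preferable to the hotels that dominate them.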