Wednesday, October 19, 2011

Traceback of DDoS Attacks Using Entropy Variations


Abstract:
          Distributed Denial-of-Service (DDoS) attacks are a critical threat to the Internet. However, the memoryless feature of the Internet routing mechanisms makes it extremely hard to trace back to the source of these attacks. As a result, there is no effective and efficient method to deal with this issue so far. In this paper, we propose a novel traceback method for DDoS attacks that is based on entropy variations between normal and DDoS attack traffic, which is fundamentally different from commonly used packet marking techniques. In comparison to the existing DDoS traceback methods, the proposed strategy possesses a number of advantages—it is memory nonintensive, efficiently scalable, robust against packet pollution, and independent of attack traffic patterns. The results of extensive experimental and simulation studies are presented to demonstrate the effectiveness and efficiency of the proposed method. Our experiments show that accurate traceback is possible within 20 seconds (approximately) in a large-scale attack network with thousands of zombies.
Existing system:

          The existing system uses traffic sampled under non-attack conditions to build and maintain caches of the valid source addresses transiting network routers. Under attack conditions, route anomalies are detected by determining which routers have carried traffic from unknown source addresses, in order to construct the attack graph.


Proposed system:
          The proposed system uses an entropy-variation technique to trace back DDoS attacks. The results of extensive experimental and simulation studies are presented to demonstrate the effectiveness and efficiency of the proposed method. Our experiments show that accurate traceback is possible within 20 seconds (approximately) in a large-scale attack network with thousands of zombies.
The proposed strategy possesses a number of advantages:
·        It is memory nonintensive.
·        It is efficiently scalable.
·        It is robust against packet pollution.
·        It is independent of attack traffic patterns.
Algorithms:
Two algorithms are used: the local flow monitoring algorithm and the IP traceback algorithm.
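The idea behind local flow monitoring can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each router counts packets per flow in a time window, computes the Shannon entropy of that distribution, and compares it with the mean and standard deviation observed under normal traffic. The function names, the threshold factor k, and the sample flows are all hypothetical.

```python
import math
from collections import Counter

def flow_entropy(packet_counts):
    """Shannon entropy (in bits) of the packet distribution over flows."""
    total = sum(packet_counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in packet_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy

def is_anomalous(current, normal_mean, normal_std, k=3.0):
    """Flag a window whose entropy deviates from normal by more than k std devs.

    k is a hypothetical tuning parameter, not a value from the paper.
    """
    return abs(current - normal_mean) > k * normal_std

def suspect_upstream(packet_counts):
    """Return the upstream router of the dominant flow (a simplification)."""
    (upstream, _destination), _count = max(
        packet_counts.items(), key=lambda item: item[1]
    )
    return upstream

# Hypothetical flows at one router, keyed by (upstream router, destination).
normal = Counter({("R1", "victim"): 100, ("R2", "victim"): 100,
                  ("R3", "victim"): 100, ("R4", "victim"): 100})
attack = Counter({("R1", "victim"): 100, ("R2", "victim"): 5000,
                  ("R3", "victim"): 100, ("R4", "victim"): 100})

if is_anomalous(flow_entropy(attack), normal_mean=2.0, normal_std=0.1):
    print("entropy drop detected, continue traceback at", suspect_upstream(attack))
```

When one flow dominates, the entropy drops sharply, and a traceback step would forward the request to the upstream router that contributes that flow. Only a few per-flow counters are kept per router, which is consistent with the low memory use claimed for the method.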

Technologies used:
Software requirements:
Front end: Java

Data Leakage Detection



ABSTRACT:

A data distributor has given sensitive data to a set of supposedly trusted agents (third parties). Some of the data is leaked and found in an unauthorized place (e.g., on the web or somebody’s laptop). The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means. We propose data allocation strategies (across the agents) that improve the probability of identifying leakages. These methods do not rely on alterations of the released data (e.g., watermarks). In some cases we can also inject “realistic but fake” data records to further improve our chances of detecting leakage and identifying the guilty party.

EXISTING SYSTEM:

Traditionally, leakage detection is handled by watermarking, e.g., a unique code is embedded in each distributed copy. If that copy is later discovered in the hands of an unauthorized party, the leaker can be identified. Watermarks can be very useful in some cases, but they involve some modification of the original data. Furthermore, watermarks can sometimes be destroyed if the data recipient is malicious. For example, a hospital may give patient records to researchers who will devise new treatments. Similarly, a company may have partnerships with other companies that require sharing customer data. Another enterprise may outsource its data processing, so data must be given to various other companies. We call the owner of the data the distributor and the supposedly trusted third parties the agents.


PROPOSED SYSTEM:

Our goal is to detect when the distributor’s sensitive data has been leaked by agents, and if possible to identify the agent that leaked the data. Perturbation is a very useful technique where the data is modified and made “less sensitive” before being handed to agents. Rather than altering the released data, we develop unobtrusive techniques for detecting leakage of a set of objects or records.

In this section we develop a model for assessing the “guilt” of agents. We also present algorithms for distributing objects to agents in a way that improves our chances of identifying a leaker. Finally, we consider the option of adding “fake” objects to the distributed set. Such objects do not correspond to real entities but appear realistic to the agents. In a sense, the fake objects act as a type of watermark for the entire set, without modifying any individual members. If it turns out that an agent was given one or more fake objects that were leaked, then the distributor can be more confident that the agent was guilty.
          
             

Problem Setup and Notation:

A distributor owns a set T = {t1, …, tm} of valuable data objects. The distributor wants to share some of the objects with a set of agents U1, U2, …, Un, but does not wish the objects to be leaked to other third parties. The objects in T could be of any type and size, e.g., they could be tuples in a relation, or relations in a database. An agent Ui receives a subset of objects, determined either by a sample request or an explicit request:

1. Sample request
2. Explicit request

Guilt Model Analysis:

To see how our model parameters interact and to check whether the interactions match our intuition, in this section we study two simple scenarios: the impact of the probability p, and the impact of the overlap between Ri and S. In each scenario we have a target that has obtained all the distributor’s objects, i.e., T = S.
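The two scenarios can be explored with a small sketch. The summary above does not give the guilt formula, so the code assumes one common formulation: each leaked object was either guessed independently by the target with probability p, or leaked by one of the agents holding it, with all holders equally likely. The function name and the sample allocation are hypothetical.

```python
def guilt_probability(agent, leaked, allocations, p):
    """Estimate the probability that `agent` leaked at least one object.

    leaked:      set S of objects found at the target
    allocations: dict mapping each agent to its set of objects R_i
    p:           assumed probability that the target guessed an object itself
    """
    prob_innocent = 1.0
    for obj in leaked & allocations[agent]:
        # Number of agents that received this object.
        holders = sum(1 for objs in allocations.values() if obj in objs)
        # Chance that this particular agent did not leak this object.
        prob_innocent *= 1.0 - (1.0 - p) / holders
    return 1.0 - prob_innocent

# Hypothetical setup: the target holds every object, i.e. T = S.
allocations = {"U1": {"t1", "t2"}, "U2": {"t1"}}
leaked = {"t1", "t2"}

for p in (0.0, 0.5, 1.0):
    scores = {a: guilt_probability(a, leaked, allocations, p) for a in allocations}
    print(p, scores)
```

Under these assumptions the estimate falls as p grows, since guessed objects implicate no one, and it rises with the overlap between an agent's set Ri and the leaked set S, particularly for objects that few other agents hold.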

Algorithms:
           
1. Evaluation of Explicit Data Request Algorithms

In the first place, the goal of these experiments was to see whether fake objects in the distributed data sets yield significant improvement in our chances of detecting a guilty agent. In the second place, we wanted to evaluate our e-optimal algorithm relative to a random allocation.
                     
2. Evaluation of Sample Data Request Algorithms

With sample data requests agents are not interested in particular objects. Hence, object sharing is not explicitly defined by their requests. The distributor is “forced” to allocate certain objects to multiple agents only if the number of requested objects exceeds the number of objects in set T. The more data objects the agents request in total, the more recipients on average an object has; and the more objects are shared among different agents, the more difficult it is to detect a guilty agent.
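The effect of the total request size on sharing can be illustrated with a simple greedy allocation. This is only a sketch and not one of the algorithms evaluated in the paper: it assumes each agent asks for a number of objects no larger than |T| and is given the objects that currently have the fewest recipients. The function name and inputs are hypothetical.

```python
def allocate_samples(objects, requests):
    """Greedy sample allocation that keeps object sharing as low as possible.

    objects:  list of object identifiers (the set T)
    requests: dict mapping each agent to the number of objects it asks for
    Returns (allocation per agent, number of recipients per object).
    """
    recipients = {obj: 0 for obj in objects}
    allocation = {}
    for agent, wanted in requests.items():
        # Prefer objects handed out least often; sorted() is stable,
        # so ties keep the original order of `objects`.
        chosen = sorted(objects, key=lambda obj: recipients[obj])[:wanted]
        for obj in chosen:
            recipients[obj] += 1
        allocation[agent] = set(chosen)
    return allocation, recipients

objects = ["t1", "t2", "t3", "t4"]

# Total demand equals |T|: no object needs to be shared.
small_alloc, small_counts = allocate_samples(objects, {"U1": 2, "U2": 2})

# Total demand exceeds |T|: some objects must go to more than one agent.
large_alloc, large_counts = allocate_samples(objects, {"U1": 2, "U2": 2, "U3": 2})

print(max(small_counts.values()), max(large_counts.values()))
```

In the first call every object has a single recipient, so a leaked object points to exactly one agent. In the second call sharing is unavoidable, which is the situation in which detecting the guilty agent becomes harder.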


           
Hardware Required:

·        System: Pentium IV 2.4 GHz
·        Hard Disk: 40 GB
·        Floppy Drive: 1.44 MB
·        Monitor: 15 VGA colour
·        Mouse: Logitech
·        Keyboard: 110 keys enhanced
·        RAM: 256 MB

 

Software Required:

·        O/S: Windows XP
·        Language: ASP.NET, C#
·        Database: SQL Server 2005



                              

Classification Using Streaming Random Forests


Abstract:
          We consider the problem of data stream classification, where the data arrive in a conceptually infinite stream, and the opportunity to examine each record is brief. We introduce a stream classification algorithm that is online, running in amortized O(1) time, able to handle intermittent arrival of labeled records, and able to adjust its parameters to respond to changing class boundaries (“concept drift”) in the data stream. In addition, when blocks of labeled data are short, the algorithm is able to judge internally whether the quality of models updated from them is good enough for deployment on unlabeled records, or whether further labeled records are required. Unlike most proposed stream-classification algorithms, multiple target classes can be handled. Experimental results on real and synthetic data show that accuracy is comparable to a conventional classification algorithm that sees all of the data at once and is able to make multiple passes over it.
Existing System:
          An incremental classification algorithm that uses a multi-resolution data representation to find adaptive nearest neighbors of a test point.
Proposed system:
          A stream classification algorithm that is online, running in amortized O(1) time, able to handle intermittent arrival of labeled records, and able to adjust its parameters to respond to changing class boundaries (“concept drift”) in the data stream.
The Standard Random Forests Algorithm:
          The Random Forests algorithm is an ensemble classification technique developed by Breiman. As with any tree ensemble classifier, it grows a number of binary decision trees and predicts the class of each new record using the plurality of the class predictions from the set of trees. However, it differs from standard ensemble techniques in the way in which records are selected to grow each tree, and the way in which attributes are selected at each internal node. Suppose that a data set contains n records, each with m attributes. Each tree is grown by:
·        Choosing a subset of the n records, at random with replacement, to form a training set.
·        For each internal node, selecting a subset of M (M ≪ m) randomly chosen attributes, and deciding which attribute and split point to use by applying the standard Gini index algorithm to only the selected attributes. A typical value of M (suggested by Breiman) is log2 m + 1.
·        Leaving the tree unpruned.
          The random selection of records with replacement leaves some records (about a third of them) that are never used in the building of this tree. These records can be used to estimate the error rate of each tree, even as it is being built. This process is repeated with a fresh subset of the records to produce a user-specified number of trees. The choice of attribute and split point at each internal node of each tree is made in a doubly contextualized way: it depends on the other records that were chosen to build this tree, and on the other attributes that were chosen to build this internal node. This property makes overlearning impossible and the predictions of the ensemble extremely robust.
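The tree-growing steps above can be sketched in a few helper functions. This is a minimal illustration of the standard (batch) procedure, not the streaming algorithm, and it makes some assumptions: attributes are numeric, splits are of the form "value <= threshold", and M is taken as floor(log2 m) + 1. All function names are hypothetical.

```python
import math
import random

def bootstrap_indices(n, rng):
    """Draw n record indices with replacement; also return the unused ones."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag

def attributes_per_node(m):
    """Number of attributes tried at each node (assumed floor(log2 m) + 1)."""
    return int(math.log2(m)) + 1

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(rows, labels, candidate_attrs):
    """Return (attribute, threshold, weighted Gini) of the best split, or None."""
    best = None
    for attr in candidate_attrs:
        for threshold in sorted({row[attr] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[attr] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[attr] > threshold]
            if not left or not right:
                continue  # a split must send records to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[2]:
                best = (attr, threshold, score)
    return best

rng = random.Random(0)
in_bag, out_of_bag = bootstrap_indices(1000, rng)
print("out-of-bag fraction:", len(out_of_bag) / 1000)

# Pick the random attribute subset for one node of a tree over m = 8 attributes.
candidates = rng.sample(range(8), attributes_per_node(8))
print("attributes tried at this node:", candidates)
```

The out-of-bag fraction comes out at roughly a third, which matches the description above, and those records are the ones available for estimating the error rate of the tree.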

A Dual Framework and Algorithms for Targeted Online Data Delivery


Abstract:
          A variety of emerging online data delivery applications challenge existing techniques for data delivery to human users, applications, or middleware that are accessing data from multiple autonomous servers. In this paper, we develop a framework for formalizing and comparing pull-based solutions and present dual optimization approaches. The first approach, most commonly used nowadays, maximizes user utility under the strict setting of meeting a priori constraints on the usage of system resources. We present an alternative and more flexible approach that maximizes user utility by satisfying all users. It does this while minimizing the usage of system resources. We discuss the benefits of this latter approach and develop an adaptive monitoring solution, Satisfy User Profiles (SUP). Through formal analysis, we identify sufficient optimality conditions for SUP. Using real (RSS feeds) and synthetic traces, we empirically analyze the behavior of SUP under varying conditions. Our experiments show that SUP can achieve a high degree of satisfaction of user utility when its estimations closely match the real event stream, and that it has the potential to save a significant amount of system resources. We further show that SUP can exploit feedback to improve user utility with only a moderate increase in resource utilization.
Existing system:
          Existing techniques deliver data to human users, applications, or middleware that access data from multiple autonomous servers.
Disadvantage of Existing System:
o   Maximizes user utility only under the strict setting of meeting a priori constraints on the usage of system resources.
Proposed system:
          A framework for formalizing and comparing pull-based solutions, together with dual optimization approaches.

Advantages of proposed System:
o   Maximizes user utility by satisfying all users.
o   Improves user utility with only a moderate increase in resource utilization.

Algorithms:

The SUP algorithm: