On the data flow masquerading problem
Le, Anh Ngoc
The intense debate surrounding network neutrality gives rise to a very natural technical question: "How can we design network applications that evade an ISP's discriminatory practices?" For example, if an Internet service provider (ISP) discriminates against voice-over-IP traffic, how do we design voice-over-IP applications that still work effectively? Posed this way, the question is too general to be tractable. At the very least, we need to formally define the scope of an ISP's discriminatory practices, the scope of an application's capability, and what we mean by "work effectively." The question is thus a family of intriguing technical problems. Very briefly, this dissertation formulates and addresses several problems in the family, where the ISP uses any statistical machine learning (SML) packet classifier for discrimination, the application has access to a connection with bounded bandwidth provided by the ISP, and the term "work effectively" refers to two types of quality of service (QoS): total throughput and worst-case instantaneous throughput. We chose to limit the ISP's capability to SML-based classifiers for three reasons: (1) other content-based classifiers are either easier to evade or impractical when the content is encrypted, (2) SML-based classifiers alone are already quite effective for traffic discrimination (as demonstrated in other published studies and re-confirmed in this dissertation), and (3) evading an SML engine is a central question in the emerging area of the security of machine learning.
Our main contributions are as follows: (1) we perform extensive experiments confirming that SML-based classifiers can be very accurate for traffic discrimination, even under stringent practical constraints; (2) we precisely formulate the above packet classifier evasion problem, which we name the DATA-FLOW MASQUERADING problem; (3) we prove several NP-hardness results for the problems of maximizing quality of service while evading SML-based classifiers; and (4) we present several efficient algorithms showing that one can in principle evade SML-based packet classifiers while maintaining a high level of QoS. In effect, our results suggest an end-to-end (E2E) solution to the traffic discrimination problem and show the potential limitations and (in)security of SML-based packet filters.