Trustworthy Machine Learning
We study security, privacy, transparency, and fairness in the machine learning pipeline.
Publications
DeResistor: Toward Detection-Resistant Probing for Evasion of Internet Censorship
The arms race between Internet freedom advocates and censors has catalyzed the emergence of sophisticated blocking techniques and directed significant research emphasis toward the development of automated censorship measurement and evasion tools. In this paper, however, we discover that probing censorship middleboxes using current evasion tools can be easily detected by censors, necessitating detection-resilient probing tools. We first develop a real-time detection approach that uses Machine Learning (ML) to detect flow-level packet-manipulation traces and an algorithm for IP-level detection based on Threshold Random Walk (TRW). We then take the first steps toward detection-resilient censorship evasion by presenting DeResistor, a system that facilitates detection-resilient probing for packet-manipulation-based censorship evasion. DeResistor aims to defuse the detection logic employed by censors by performing detection-guided pausing of censorship evasion attempts and interleaving them with normal user-driven network activity. We evaluate our techniques by leveraging Geneva, a state-of-the-art evasion strategy generator, validating them against 11 simulated censors supplied by Geneva as well as against real-world censors (i.e., China’s Great Firewall (GFW) and India). From an adversarial perspective, our proposed real-time detection method can quickly detect clients that attempt to probe censorship middleboxes with manipulated packets after inspecting only two probing flows. From a defense perspective, DeResistor is effective at shielding Geneva training from detection while enabling it to narrow the search space and produce less detectable traffic. Importantly, censorship evasion strategies generated using DeResistor attain a high success rate from different vantage points against the GFW (up to 98%) and 100% in India. Finally, we discuss detection countermeasures and the extensibility of our approach to other censor-probing tools.
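To make the IP-level detection side concrete, below is a minimal sketch of a Threshold Random Walk (TRW)-style sequential test driven by per-flow verdicts from a flow-level detector. It is illustrative only: the thresholds, per-flow probabilities, and the TRWDetector class are assumptions made for this sketch, not parameters taken from the paper.

```python
import math

# Assumed parameters: detection/false-positive targets and the probabilities that a
# single flow looks "manipulated" for probers vs. benign hosts (illustrative values).
ALPHA, BETA = 0.01, 0.99          # desired false-positive rate / detection rate
P_BENIGN, P_PROBER = 0.05, 0.80   # Pr[flow flagged | benign], Pr[flow flagged | prober]

UPPER = math.log(BETA / ALPHA)              # flag the IP as a prober above this score
LOWER = math.log((1 - BETA) / (1 - ALPHA))  # clear the IP as benign below this score


class TRWDetector:
    """Sequential (TRW-style) per-IP detector driven by flow-level verdicts."""

    def __init__(self):
        self.scores = {}  # source IP -> running log-likelihood ratio

    def observe(self, src_ip, flow_is_manipulated):
        """Update the score for src_ip; return 'prober', 'benign', or 'pending'."""
        s = self.scores.get(src_ip, 0.0)
        if flow_is_manipulated:
            s += math.log(P_PROBER / P_BENIGN)
        else:
            s += math.log((1 - P_PROBER) / (1 - P_BENIGN))
        self.scores[src_ip] = s
        if s >= UPPER:
            return "prober"
        if s <= LOWER:
            return "benign"
        return "pending"
```

With the assumed probabilities, two flagged flows are enough to cross the upper threshold, which mirrors the two-flow detection figure reported above.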
MIAShield: Defending Membership Inference Attacks via Preemptive Exclusion of Members
In membership inference attacks (MIAs), an adversary observes the predictions of a model to determine whether a sample is part of the model’s training data. Existing MIA defenses conceal the presence of a target sample through strong regularization, knowledge distillation, confidence masking, or differential privacy. We propose MIAShield, a new MIA defense based on preemptive exclusion of member samples instead of masking the presence of a member. MIAShield departs from prior defenses in that it weakens the strong membership signal that stems from the presence of a target sample by preemptively excluding it at prediction time, without compromising model utility. To that end, we design and evaluate a suite of preemptive exclusion oracles leveraging model confidence, exact/approximate sample signatures, and learning-based exclusion of member data points. To be practical, MIAShield splits the training data into disjoint subsets and trains a model on each subset to build an ensemble. The disjointedness of the subsets ensures that a target sample belongs to only one subset, which isolates the sample and facilitates the preemptive exclusion goal. We evaluate MIAShield on three benchmark image classification datasets. We show that MIAShield effectively mitigates membership inference (near random guess) for a wide range of MIAs, achieves a far better privacy-utility trade-off compared with state-of-the-art defenses, and remains resilient in the face of adaptive attacks.
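The ensemble-with-exclusion idea can be illustrated with a short sketch, assuming an exact-signature exclusion oracle and scikit-learn-style models; the class and helper names are hypothetical and this is not the paper's implementation.

```python
import hashlib
import numpy as np


def signature(x):
    """Exact sample signature: a hash of the raw input bytes (one possible exclusion oracle)."""
    return hashlib.sha256(np.ascontiguousarray(x).tobytes()).hexdigest()


class PreemptiveExclusionEnsemble:
    """Ensemble whose prediction path drops the one model trained on the queried sample."""

    def __init__(self, model_factory, n_models=5):
        self.model_factory = model_factory
        self.n_models = n_models
        self.models, self.signatures = [], []

    def fit(self, X, y):
        # Disjoint split: every training sample belongs to exactly one subset / one model.
        parts = np.array_split(np.random.permutation(len(X)), self.n_models)
        for idx in parts:
            model = self.model_factory()
            model.fit(X[idx], y[idx])
            self.models.append(model)
            self.signatures.append({signature(x) for x in X[idx]})

    def predict_proba(self, x):
        sig = signature(x)
        # Preemptive exclusion: aggregate only models that never saw this exact sample.
        active = [m for m, sigs in zip(self.models, self.signatures) if sig not in sigs]
        return np.mean([m.predict_proba(x[None, :])[0] for m in active], axis=0)
```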
Adversarial Detection of Censorship Measurements
The arms race between Internet freedom technologists and censoring regimes has catalyzed the deployment of more sophisticated censoring techniques and directed significant research emphasis toward the development of automated tools for censorship measurement and evasion. Geneva is one of the recent advances in this area: by training a genetic algorithm inside a censored region, it can automatically find novel packet-manipulation-based censorship evasion strategies. In this paper, we explore the resilience of Geneva in the face of censors that actively detect and react to Geneva’s measurements. Specifically, we develop machine learning (ML)-based classifiers and leverage a popular hypothesis-testing algorithm that can be deployed at the censor to detect Geneva clients within two to seven flows, i.e., far before Geneva finds any working evasion strategy. We further use public packet-capture traces to show that Geneva flows can be easily distinguished from normal flows and other malicious flows (e.g., network forensics, malware). Finally, we discuss potential research directions to mitigate Geneva’s detection.
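As an illustration of the flow-level ML classifier idea, the sketch below trains a random forest on synthetic stand-in features; the features and data are placeholders for intuition only, not the traces or feature set used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for labeled flow features (real features might include counts of
# retransmissions, out-of-order segments, unusual TCP flag combinations, or TTL
# anomalies extracted per flow); label 1 marks a Geneva-like probing flow.
rng = np.random.default_rng(0)
X_benign = rng.normal(0.0, 1.0, size=(1000, 8))
X_probe = rng.normal(1.5, 1.0, size=(1000, 8))
X = np.vstack([X_benign, X_probe])
y = np.array([0] * 1000 + [1] * 1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```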
DP-UTIL: Comprehensive Utility Analysis of Differential Privacy in Machine Learning
Differential Privacy (DP) has emerged as a rigorous formalism for reasoning about quantifiable privacy leakage. In machine learning (ML), DP has been employed to limit inference/disclosure of training examples. Prior work has leveraged DP across the ML pipeline, albeit in isolation, often focusing on mechanisms such as gradient perturbation. In this paper, we present DP-UTIL, a holistic utility analysis framework of DP across the ML pipeline, with a focus on input perturbation, objective perturbation, gradient perturbation, output perturbation, and prediction perturbation. Given an ML task on privacy-sensitive data, DP-UTIL enables an ML privacy practitioner to perform a holistic comparative analysis of the impact of DP at these five perturbation spots, measured in terms of model utility loss, privacy leakage, and the number of truly revealed training samples. We evaluate DP-UTIL on classification tasks over vision, medical, and financial datasets, using two representative learning algorithms (logistic regression and deep neural networks) against membership inference as a case study attack. One of the highlights of our results is that prediction perturbation consistently achieves the lowest utility loss on all models across all datasets. In logistic regression models, objective perturbation results in the lowest privacy leakage compared to other perturbation techniques; for deep neural networks, gradient perturbation results in the lowest privacy leakage. Moreover, our results on truly revealed records suggest that as privacy leakage increases, a differentially private model reveals more member samples. Overall, our findings suggest that to make an informed decision about which perturbation mechanism to use, an ML privacy practitioner needs to examine the dynamics between optimization techniques (convex vs. non-convex), perturbation mechanisms, the number of classes, and the privacy budget.
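As one concrete example of a perturbation spot, the sketch below shows prediction perturbation via Laplace noise added to the released probability vector; the function name and parameter values are illustrative assumptions, not DP-UTIL's exact mechanism.

```python
import numpy as np


def perturb_predictions(probs, epsilon, sensitivity=1.0, rng=None):
    """Prediction perturbation: add Laplace noise with scale sensitivity/epsilon to the
    model's probability vector, then clip and renormalize before releasing it."""
    rng = rng or np.random.default_rng()
    noisy = probs + rng.laplace(0.0, sensitivity / epsilon, size=probs.shape)
    noisy = np.clip(noisy, 1e-12, None)
    return noisy / noisy.sum(axis=-1, keepdims=True)


# Example: a confident 3-class prediction becomes less revealing as epsilon shrinks.
print(perturb_predictions(np.array([0.90, 0.07, 0.03]), epsilon=0.5))
```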
EG-Booster: Explanation-Guided Booster of ML Evasion Attacks
The widespread usage of machine learning (ML) in a myriad of domains has raised questions about its trustworthiness in security-critical environments. Part of the quest for trustworthy ML is robustness evaluation of ML models against test-time adversarial examples. In line with the trustworthy ML goal, a useful input that can potentially aid robustness evaluation is feature-based explanations of model predictions. In this paper, we present a novel approach called EG-Booster that leverages techniques from explainable ML to guide adversarial example crafting for improved robustness evaluation of ML models before deploying them in security-critical settings. The key insight in EG-Booster is the use of feature-based explanations of model predictions to guide adversarial example crafting: adding consequential perturbations likely to result in model evasion and avoiding non-consequential ones unlikely to contribute to evasion. EG-Booster is agnostic to model architecture and threat model, and supports diverse distance metrics used previously in the literature. We evaluate EG-Booster using the image classification benchmark datasets MNIST and CIFAR10. Our findings suggest that EG-Booster significantly improves the evasion rate of state-of-the-art attacks while requiring fewer perturbations. Through extensive experiments that cover four white-box and three black-box attacks, we demonstrate the effectiveness of EG-Booster against two undefended neural networks trained on MNIST and CIFAR10, and an adversarially trained ResNet model trained on CIFAR10. Furthermore, we introduce a stability assessment metric and evaluate the reliability of our explanation-based approach by observing the similarity between the model’s classification outputs across multiple runs of EG-Booster. Our results across 10 different runs suggest that EG-Booster’s output is stable across distinct runs. Combined with state-of-the-art attacks, we hope EG-Booster will be used as a benchmark in future work that evaluates the robustness of ML models to evasion attacks.
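A minimal sketch of the explanation-guided idea, assuming per-feature attributions for the true class are available (e.g., from an explainer such as SHAP): keep only the perturbations on consequential features and revert the rest. The function is a simplified illustration, not EG-Booster's full algorithm.

```python
import numpy as np


def explanation_guided_filter(x, x_adv, attributions):
    """Keep perturbations only on consequential features and revert the rest.

    x            : original input (1-D feature vector)
    x_adv        : candidate adversarial input from some baseline attack
    attributions : per-feature attribution scores for the true class; a positive score
                   means the feature currently supports the true prediction, so
                   perturbing it is likely to be consequential for evasion.
    """
    delta = x_adv - x
    consequential = attributions > 0        # assumed sign convention for this sketch
    filtered = np.where(consequential, delta, 0.0)
    return x + filtered
```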
Morphence: Moving Target Defense Against Adversarial Examples
Robustness of machine learning models to adversarial examples remains an open topic of research. Attacks often succeed by repeatedly probing a fixed target model with adversarial examples purposely crafted to fool it. In this paper, we introduce Morphence, an approach that shifts the defense landscape by making a model a moving target against adversarial examples. By regularly moving the decision function of a model, Morphence makes it significantly more challenging for repeated or correlated attacks to succeed. Morphence deploys a pool of models generated from a base model in a manner that introduces sufficient randomness when it responds to prediction queries. To ensure that repeated or correlated attacks fail, the deployed pool of models automatically expires after a query budget is reached, and the model pool is seamlessly replaced by a new pool generated in advance. We evaluate Morphence on two benchmark image classification datasets (MNIST and CIFAR10) against five reference attacks (2 white-box and 3 black-box). In all cases, Morphence consistently outperforms adversarial training, a thus-far effective defense, even in the face of strong white-box attacks, while preserving accuracy on clean data.
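A minimal sketch of the moving-target serving loop: answer each query with a model drawn from a pool and swap the whole pool once a query budget is exhausted. Model selection here is uniformly random for brevity, and make_pool is an assumed callable; Morphence's actual pool generation and scheduling may differ.

```python
import random


class MovingTargetPool:
    """Sketch of a moving-target prediction service over a rotating pool of models."""

    def __init__(self, make_pool, query_budget=1000):
        self.make_pool = make_pool        # callable returning a fresh list of models
        self.query_budget = query_budget
        self.pool = make_pool()
        self.next_pool = make_pool()      # generated in advance so rotation is seamless
        self.queries = 0

    def predict(self, x):
        if self.queries >= self.query_budget:
            # Expire the current pool and promote the pre-generated one.
            self.pool, self.next_pool = self.next_pool, self.make_pool()
            self.queries = 0
        self.queries += 1
        model = random.choice(self.pool)  # per-query randomness over the pool
        return model.predict(x)
```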
Making Machine Learning Trustworthy
Machine learning (ML) has advanced dramatically during the past decade and continues to achieve impressive human-level performance on nontrivial tasks in image, speech, and text recognition. It is increasingly powering many high-stakes application domains such as autonomous vehicles, self-mission-fulfilling drones, intrusion detection, medical image classification, and financial predictions. However, ML must make several advances before it can be deployed with confidence in domains where it directly affects humans at training and operation time, in which case security, privacy, safety, and fairness are all essential considerations.
Explanation-Guided Diagnosis of Machine Learning Evasion Attacks
Machine Learning (ML) models are susceptible to evasion attacks. Evasion accuracy is typically assessed using the aggregate evasion rate, and it is an open question whether the aggregate evasion rate enables feature-level diagnosis of the effect of adversarial perturbations on evasive predictions. In this paper, we introduce a novel framework that harnesses explainable ML methods to guide high-fidelity assessment of ML evasion attacks. Our framework enables explanation-guided correlation analysis between pre-evasion perturbations and post-evasion explanations. Towards systematic assessment of ML evasion attacks, we propose and evaluate a novel suite of model-agnostic metrics for sample-level and dataset-level correlation analysis. Using malware and image classifiers, we conduct comprehensive evaluations across diverse model architectures and complementary feature representations. Our explanation-guided correlation analysis reveals correlation gaps between evasive adversarial samples and the corresponding perturbations performed on them. Using a case study on explanation-guided evasion, we show the broader utility of our methodology for assessing the robustness of ML models.
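One plausible instantiation of a sample-level correlation metric is sketched below: compare the top-k most-perturbed features with the top-k features of the post-evasion explanation and report their overlap. This is an assumption-laden illustration, not the paper's exact metric suite.

```python
import numpy as np


def perturbation_explanation_overlap(x, x_adv, attributions, k=20):
    """Sample-level correlation sketch: Jaccard overlap between the top-k most-perturbed
    features (pre-evasion perturbation) and the top-k features of the post-evasion
    explanation (`attributions`). A low overlap indicates a correlation gap."""
    perturbed = set(np.argsort(-np.abs(x_adv - x))[:k].tolist())
    explained = set(np.argsort(-np.abs(attributions))[:k].tolist())
    union = perturbed | explained
    return len(perturbed & explained) / len(union) if union else 0.0
```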
PRICURE: Privacy-Preserving Collaborative Inference in a Multi-Party Setting
When multiple parties that deal with private data aim for a collaborative prediction task such as medical image classification, they are often constrained by data protection regulations and lack of trust among collaborating parties. If done in a privacy-preserving manner, predictive analytics can benefit from the collective prediction capability of multiple parties holding complementary datasets on the same machine learning task. This paper presents PRICURE, a system that combines complementary strengths of secure multi-party computation (SMPC) and differential privacy (DP) to enable privacy-preserving collaborative prediction among multiple model owners. SMPC enables secret-sharing of private models and client inputs with non-colluding secure servers to compute predictions without leaking model parameters and inputs. DP masks true prediction results via noisy aggregation so as to deter a semi-honest client who may mount membership inference attacks. We evaluate PRICURE on neural networks across four datasets including benchmark medical image classification datasets. Our results suggest PRICURE guarantees privacy for tens of model owners and clients with acceptable accuracy loss. We also show that DP reduces membership inference attack exposure without hurting accuracy.
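The differential-privacy side of such a pipeline can be illustrated with a short sketch: perturb the per-class vote counts aggregated from the model owners with Laplace noise and release only the winning class to the client. This omits the SMPC secret-sharing step entirely and uses hypothetical names; it is not PRICURE's implementation.

```python
import numpy as np


def noisy_vote_aggregation(vote_counts, epsilon, rng=None):
    """Noisy aggregation of model-owner votes: add Laplace noise with scale 1/epsilon
    (sensitivity 1, since one owner changes one class count by at most 1) and release
    only the arg-max class, masking the true per-class counts from the client."""
    rng = rng or np.random.default_rng()
    noisy = vote_counts + rng.laplace(0.0, 1.0 / epsilon, size=vote_counts.shape)
    return int(np.argmax(noisy))


# Example: 10 model owners vote over 3 classes; a small epsilon adds more masking noise.
print(noisy_vote_aggregation(np.array([7.0, 2.0, 1.0]), epsilon=0.5))
```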
Best-Effort Adversarial Approximation of Black-Box Malware Classifiers
An adversary who aims to steal a black-box model repeatedly queries the model via a prediction API to learn a function that approximates its decision boundary. Adversarial approximation is non-trivial because of the enormous combinations of model architectures, parameters, and features to explore. In this context, the adversary resorts to a best-effort strategy that yields the closest approximation. This paper explores best-effort adversarial approximation of a black-box malware classifier in the most challenging setting, where the adversary’s knowledge is limited to a prediction label for a given input. Beginning with a limited input set for the black-box classifier, we leverage feature representation mapping and cross-domain transferability to approximate a black-box malware classifier by locally training a substitute. Our approach approximates the target model using different feature types for the target and the substitute model, while also using non-overlapping data for training the target, training the substitute, and comparing the two. We evaluate the effectiveness of our approach against two black-box classifiers trained on Windows Portable Executables (PEs). Against a Convolutional Neural Network (CNN) trained on raw byte sequences of PEs, our approach achieves a 92% accurate substitute (trained on pixel representations of PEs) and nearly 90% prediction agreement between the target and the substitute model. Against a 97.8% accurate gradient boosted decision tree trained on static PE features, our 91% accurate substitute agrees with the black-box on 90% of predictions, suggesting the strength of our purely black-box approximation.
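The label-only approximation loop can be sketched as follows, where query_blackbox, to_substitute_features, and substitute_model are hypothetical stand-ins for the target's prediction API, the feature-representation mapping (e.g., bytes to pixels), and the locally trained model; it is a sketch of the general workflow, not the paper's exact pipeline.

```python
import numpy as np


def train_substitute(query_blackbox, local_inputs, to_substitute_features, substitute_model):
    """Label locally available inputs via the target's prediction API (label-only access),
    map them to the substitute's own feature representation, and fit the substitute."""
    labels = np.array([query_blackbox(x) for x in local_inputs])
    features = np.stack([to_substitute_features(x) for x in local_inputs])
    substitute_model.fit(features, labels)
    return substitute_model


def agreement(substitute_model, query_blackbox, test_inputs, to_substitute_features):
    """Fraction of held-out inputs on which the substitute matches the black-box label."""
    preds = substitute_model.predict(
        np.stack([to_substitute_features(x) for x in test_inputs]))
    targets = np.array([query_blackbox(x) for x in test_inputs])
    return float((preds == targets).mean())
```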