conference

Fairify: Fairness Verification of Neural Networks

We proposed Fairify, an approach that makes individual fairness verification tractable for developers. The key idea is that many neurons in a neural network (NN) always remain inactive when only a smaller part of the input domain is considered. Fairify therefore leverages white-box access to the models in production and applies formal-analysis-based pruning.

Farhad Hossain, Hridesh Rajan
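
The pruning idea in the Fairify summary above can be illustrated with a minimal sketch. It assumes simple interval arithmetic over a single ReLU layer, which is an illustrative stand-in and not necessarily the analysis Fairify itself performs. The layer weights, biases, and input ranges are hypothetical values chosen for the example.

```python
from typing import List, Tuple


def preactivation_bounds(
    weights: List[List[float]],
    biases: List[float],
    lower: List[float],
    upper: List[float],
) -> List[Tuple[float, float]]:
    """Interval bounds on w.x + b per neuron, given lower[i] <= x[i] <= upper[i]."""
    bounds = []
    for w_row, b in zip(weights, biases):
        lo = hi = b
        for w, l, u in zip(w_row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                # A negative weight flips which end of the interval is extreme.
                lo += w * u
                hi += w * l
        bounds.append((lo, hi))
    return bounds


def always_inactive(
    weights: List[List[float]],
    biases: List[float],
    lower: List[float],
    upper: List[float],
) -> List[int]:
    """Indices of ReLU neurons whose pre-activation can never exceed 0 on the box."""
    bounds = preactivation_bounds(weights, biases, lower, upper)
    return [i for i, (_, hi) in enumerate(bounds) if hi <= 0]


# Hypothetical layer with 2 inputs and 3 neurons (not taken from the paper).
WEIGHTS = [[1.0, -2.0], [-1.0, -1.0], [0.5, 0.5]]
BIASES = [0.0, -0.5, -3.0]

# Full input domain [0, 4] x [0, 4] versus a smaller partition [0, 1] x [0, 1].
full = always_inactive(WEIGHTS, BIASES, [0.0, 0.0], [4.0, 4.0])
sub = always_inactive(WEIGHTS, BIASES, [0.0, 0.0], [1.0, 1.0])
print(full, sub)
```

In this toy layer, one neuron is provably inactive on the full domain, and a second one becomes provably inactive once the domain is restricted, so a verifier working on the smaller partition could drop both.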

Towards Understanding Fairness and its Composition in Ensemble Machine Learning

We comprehensively study popular real-world ensembles: bagging, boosting, stacking and voting. We have developed a benchmark of 168 ensemble models collected from Kaggle on four popular fairness datasets. We use existing fairness metrics to understand the composition of fairness. Our results show that ensembles can be designed to be fairer without using mitigation techniques. We also identify the interplay between fairness composition and data characteristics to guide fair ensemble design.

Usman Gohar, Farhad Hossain, Hridesh Rajan

Fix Fairness, Don't Ruin Accuracy: Performance Aware Fairness Repair using AutoML

Our approach includes two key innovations: a novel optimization function and a fairness-aware search space. By improving the default optimization function of AutoML and incorporating fairness objectives, we are able to mitigate bias with little to no loss of accuracy. Additionally, we propose a fairness-aware search space pruning method for AutoML to reduce computational cost and repair time.

Giang Nguyen, Farhad Hossain, Hridesh Rajan

23 Shades of Self-Admitted Technical Debt: An Empirical Study on Machine Learning Software

We provided a comprehensive taxonomy of self-admitted technical debts (SATDs) in machine learning software. Our study analyzes how ML SATD types are organized, their frequencies within stages of ML software, the differences between ML SATDs in applications and tools, and the effort of ML SATD removals. Our findings suggest implications for ML developers and researchers seeking to create maintainable ML systems.

David OBrien, Farhad Hossain, Sayem Imtiaz, Rabe Abdalkareem, Emad Shihab, Hridesh Rajan

The Art and Practice of Data Science Pipelines: A Comprehensive Study of Data Science Pipelines In Theory, In-The-Small, and In-The-Large

This work attempts to inform the terminology and practice for designing data science (DS) pipelines. Our investigation suggests that the DS pipeline is a well-used software architecture but is often built in an ad hoc manner. We demonstrated the importance of standardization and an analysis framework for DS pipelines, following traditional software engineering research on software architecture and design patterns. We also contributed three representations of DS pipelines that capture the essence of our subjects in theory, in-the-small, and in-the-large, which would facilitate building new DS systems.

Farhad Hossain, Mohammad Wardat, Hridesh Rajan

Fair Preprocessing: Towards Understanding Compositional Fairness of Data Transformers in Machine Learning Pipeline

We introduced the causal method of fairness to reason about the fairness impact of data preprocessing stages in ML pipelines. We leveraged existing metrics to define the fairness measures of the stages. Then we conducted a detailed fairness evaluation of the preprocessing stages in 37 pipelines collected from three different sources.

Farhad Hossain, Hridesh Rajan

Do the Machine Learning Models on a Crowd Sourced Platform Exhibit Bias? An Empirical Study on Model Fairness

We have focused on the empirical evaluation of fairness and mitigations on real-world machine learning models. We have created a benchmark of 40 top-rated models from Kaggle used for 5 different tasks, and then evaluated their fairness using a comprehensive set of fairness metrics. Then, we have applied 7 mitigation techniques to these models and analyzed the fairness, mitigation results, and impacts on performance.

Farhad Hossain, Hridesh Rajan

Boa Meets Python: A Boa Dataset of Data Science Software in Python Language

The popularity of the Python programming language has surged in recent years due to its increasing usage in Data Science. The availability of Python repositories on GitHub presents an opportunity for mining software repository research, e.g., suggesting best practices in developing Data Science applications, identifying bug patterns, recommending code enhancements, etc. To enable this research, we have created a new dataset that includes 1,558 mature GitHub projects that develop Python software for Data Science tasks.

Farhad Hossain, Md Johirul Islam, Yijia Huang, Hridesh Rajan

A Secure Data Security Infrastructure for Small Organization in Cloud Computing

This paper examines security concerns in cloud environments for small businesses, addressing their shortcomings and proposing solutions. Measured security features have been implemented by developing a secure data encryption, exchange, and decryption infrastructure, resulting in a data security model.

Manan B T Noor, Farhad Hossain

Applying Ant Colony Optimization in Software testing to Generate Prioritized Optimal Path and Test Data

An ant colony optimization (ACO) based algorithm is proposed that generates a set of optimal paths and prioritizes them. Additionally, the approach generates a test data sequence within the domain to use as inputs for the generated paths. The proposed approach guarantees full software coverage with minimum redundancy.

Farhad Hossain, M Shamim Kaiser, Shamin Al Mamun
