Privacy-preserving distributed machine learning with federated learning
Abstract
Edge computing and distributed machine learning have advanced to a level that can revolutionize a particular organization. Distributed devices such as the Internet of Things (IoT) often produce large amounts of data, eventually resulting in big data that can be vital in uncovering hidden patterns and other insights in numerous fields such as healthcare, banking, and policing. Data related to areas such as healthcare and banking can contain potentially sensitive information that can become public if it is not appropriately sanitized. Federated learning (FedML) is a recently developed distributed machine learning (DML) approach that tries to preserve privacy by bringing the training of an ML model to the data owners' premises. However, the literature shows that attack methods such as membership inference can exploit the vulnerabilities of ML models, as well as of the coordinating servers, to retrieve private data. Hence, FedML needs additional measures to guarantee data privacy. Furthermore, big data often requires more resources than are available in a standard computer. This paper addresses these issues by proposing a distributed perturbation algorithm named DISTPAB for the privacy preservation of horizontally partitioned data. DISTPAB alleviates computational bottlenecks by distributing the task of privacy preservation, exploiting the resource asymmetry of a distributed environment, which can contain resource-constrained devices as well as high-performance computers. Experiments show that DISTPAB provides high accuracy, efficiency, scalability, and attack resistance. Further experiments on privacy-preserving FedML show that DISTPAB is an excellent solution for stopping privacy leaks in DML while preserving high data utility.
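The core idea of client-side perturbation of horizontally partitioned data can be illustrated with a minimal sketch. This is not the DISTPAB algorithm itself (which the abstract does not specify); it only shows the general pattern in which each client holds a horizontal partition (same attributes, different records) and applies a local perturbation before any data or model update leaves the device. All names and the additive-noise perturbation are illustrative assumptions.

```python
import random

def perturb_partition(rows, noise_scale=0.1, seed=None):
    """Perturb one client's horizontal data partition locally.

    Illustrative only: here we add bounded uniform noise to each
    attribute; a real scheme such as DISTPAB uses a more involved
    perturbation, and this sketch is not its implementation.
    """
    rng = random.Random(seed)
    return [[x + rng.uniform(-noise_scale, noise_scale) for x in row]
            for row in rows]

# Each client (e.g., an IoT device) holds a horizontal partition of
# the dataset and sanitizes it on-device, so raw values never leave
# the resource-constrained edge node. (Client names are hypothetical.)
clients = {
    "device_a": [[1.0, 2.0], [3.0, 4.0]],
    "device_b": [[5.0, 6.0]],
}
perturbed = {cid: perturb_partition(rows, seed=0)
             for cid, rows in clients.items()}
```

Only the perturbed partitions would then participate in federated training, which is what lets such an approach add a privacy layer on top of FedML without centralizing the raw data.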