Swarm Learning for decentralized and confidential clinical machine learning

Jhoshitha N A
4 min read · May 27, 2021

Swarm Intelligence (SI) algorithms have proved to be a comprehensive approach to solving complex optimization problems by simulating the emergent behavior of biological swarms. Data science is also attracting growing attention, and it requires fast management and analysis of massive datasets.

AI with swarm intelligence learns to detect cancer, lung diseases and COVID-19

Using “Swarm Learning”, an international research team has trained AI algorithms to detect blood cancer, lung diseases and COVID-19 in data stored in a decentralized fashion. Swarm Learning could thus significantly promote and accelerate collaboration and information exchange in research, especially in the field of medicine. To illustrate the feasibility of using Swarm Learning to develop disease classifiers using distributed data, four use cases of heterogeneous diseases (COVID-19, tuberculosis, leukaemia and lung pathologies) were taken. With more than 16,400 blood transcriptomes derived from 127 clinical studies with non-uniform distributions of cases and controls and substantial study biases, as well as more than 95,000 chest X-ray images, we show that Swarm Learning classifiers outperform those developed at individual sites. In addition, Swarm Learning completely fulfils local confidentiality regulations by design. We believe that this approach will notably accelerate the introduction of precision medicine.

SWARM LEARNING IN IDENTIFICATION OF COVID-19

We addressed whether Swarm Learning (SL) could be used to detect individuals with COVID-19. Although COVID-19 is usually detected by using PCR-based assays to detect viral RNA, assessing the specific host response in addition to disease prediction might be beneficial in situations in which the pathogen is unknown, specific pathogen tests are not yet possible, or existing tests might produce false negative results. Blood transcriptomics can also contribute to the understanding of the host’s immune response.

In a first proof-of-principle study, we simulated an outbreak situation with evenly distributed cases and controls at training and test nodes; this showed very high statistical performance parameters for SL and all nodes. Lowering the prevalence at test nodes reduced performance, but F1 scores deteriorated only when we reduced prevalence further (1:44 ratio); even under these conditions, SL performed best. When we reduced cases at training nodes, all performance measures remained very high at the test node for SL and individual nodes. When we tested outbreak scenarios with very few cases at test nodes and varying prevalence at the independent test node, nodes 2 and 3 showed decreased performance; SL outperformed these nodes and was equivalent to the central model. The model showed no sign of overfitting and comparable results were obtained when we increased the number of training nodes.
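To see why the F1 score is sensitive to prevalence at the test node, it can help to work through the arithmetic. The sketch below is only an illustration: the sensitivity and specificity of 0.95 are assumed values and the 1:5 ratio is an arbitrary intermediate point, neither taken from the study. Only the 1:44 ratio comes from the text above. The function name `f1_at_ratio` is hypothetical.

```python
def f1_at_ratio(sensitivity, specificity, cases, controls):
    """Expected precision and F1 for a classifier with fixed sensitivity
    and specificity, evaluated on a test set with the given numbers of
    cases and controls."""
    tp = sensitivity * cases                # true positives
    fn = cases - tp                         # missed cases
    fp = (1.0 - specificity) * controls     # controls flagged as cases
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0
    return precision, f1


# Assumed classifier quality (NOT from the study): 95% sensitivity and specificity.
for cases, controls in [(1, 1), (1, 5), (1, 44)]:
    precision, f1 = f1_at_ratio(0.95, 0.95, cases, controls)
    print(f"{cases}:{controls}  precision={precision:.2f}  F1={f1:.2f}")
```

Because the number of false positives grows with the number of controls while the number of true positives stays fixed, precision (and with it F1) drops as prevalence falls, even though the classifier itself is unchanged.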

We recruited further medical centres in Europe that differed in controls and distributions of age, sex, and disease severity, which yielded eight individual centre-specific sub-datasets (E1–E8).

In the first setting, centres E1–E6 teamed up and joined the Swarm network with 80% of their local data; 20% of each centre’s dataset was distributed to a test node and the model was also tested on two external datasets, one with convalescent COVID-19 cases (E7) and one of granulocyte-enriched COVID-19 samples (E8). SL outperformed all nodes in terms of area under the curve (AUC) for the prediction of the global test datasets. When looking at performance on testing samples split by centre of origin, it became clear that individual centre nodes could not have predicted samples from other centres. By contrast, SL predicted samples from these nodes successfully. This was similarly true when we reduced the scenario, using E1, E2, and E3 as training nodes and E4 as an independent test node.
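The evaluation setup described above (80% of each centre's data kept for training, 20% pooled at a test node, performance reported as AUC) can be sketched roughly as follows. This is a minimal illustration, not the study's pipeline: the function names `split_by_centre` and `auc`, the random seed, and the toy sample IDs are all assumptions made for the example.

```python
import random


def split_by_centre(samples_by_centre, train_fraction=0.8, seed=0):
    """Keep train_fraction of each centre's samples local for training and
    pool the remainder into one test set, tagged with the centre of origin."""
    rng = random.Random(seed)
    train_by_centre, pooled_test = {}, []
    for centre, samples in samples_by_centre.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        train_by_centre[centre] = shuffled[:cut]
        pooled_test.extend((centre, s) for s in shuffled[cut:])
    return train_by_centre, pooled_test


def auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen case scores
    higher than a randomly chosen control (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three centres with ten sample IDs each.
toy = {c: [f"{c}-s{i}" for i in range(10)] for c in ("E1", "E2", "E3")}
train, test = split_by_centre(toy)
print({c: len(s) for c, s in train.items()}, len(test))
```

Keeping the centre label on each pooled test sample makes it possible to report AUC both on the whole test set and split by centre of origin, which is the comparison the paragraph above describes.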

In addition, SL can cope with biases such as sex distribution, age or co-infection bias, and it outperformed individual nodes when distinguishing mild from severe COVID-19. Collectively, we provide evidence that blood transcriptomes from COVID-19 patients represent a promising feature space for applying SL.
