The World as a Data Community

2021/01/27

A team of computer scientists headed by Professor Carsten Binnig is developing a trustable platform for data sharing. The aim is to open up completely new possibilities for cooperation in the area of big data and AI.

When patients go to see the doctor, they don’t necessarily want to share the data collected by the doctor with any third parties. If a patient is suffering from a serious illness, however, they might be happy to share their data for medical research purposes, but this is not easy to do due to data protection regulations. The conflict between data protection and data sharing does not only affect the medical sector: shared data could help to make production processes in the industrial sector more efficient, cheaper, quicker and more environmentally friendly. Data could help politicians make better decisions, and it is also essential for uncovering credit card fraud and money laundering in the financial sector. However, data protection regulations make data sharing a daunting task, not to speak of the lack of a technical infrastructure that would make it possible to share data in compliance with these regulations.

And there is a huge amount of data available: it is anticipated that the amount of data produced worldwide will increase to 175 zettabytes by 2025. Although this data could be useful for the larger community, much of it is lost because companies keep it for themselves. The EU wants to make data sharing easier, but the technologies required do not yet exist. This is the problem that the researchers headed by Professor Carsten Binnig from the Data Management Lab at TU Darmstadt are working on. They are developing TrustDBle (pronounced “trusta-ble”) – a new platform that will enable trustworthy and uncomplicated data sharing. The research is being conducted as part of the National Research Center for Applied Cybersecurity ATHENE, an alliance between TU Darmstadt, the Darmstadt University of Applied Sciences and the Fraunhofer Institutes SIT and IGD. It is the largest alliance between research institutes in the area of cybersecurity in Europe.

There is a huge need for technology such as TrustDBle. “We are currently experiencing a paradigm shift in the economy”, says Binnig. Up to now, companies have focussed on using their data just for themselves. In the automotive sector, for example, manufacturers and suppliers have developed their own information systems. Although the individual information systems are often connected via interfaces, this allows only very limited data sharing and requires a huge effort to connect the individual systems. Today, more or less every company still manages only its own data in silos and does not yet leverage the potential of sharing data between companies.

“I am convinced that data sharing will have huge benefits for the economy and society.”

“However, industry recently realised that it is beneficial to share data across organizational boundaries”, says Binnig. “It allows them to optimise business processes and make them more transparent, as well as to better push forward many applications in the area of AI where there is simply a lack of data, or even to make such applications possible at all.” But why is it so difficult to share data? “There is a whole series of challenges when sharing data”, says the researcher from TU Darmstadt. “On the one hand, there is a multitude of laws, such as the EU General Data Protection Regulation, that stipulate how the data has to be handled. In addition, there are internal company policies that also regulate which data can be shared with whom and where.”

A good example is a hospital: although the hospital is permitted to collect data about patients, the volumes of data are very small overall and insufficient for training AI models. AI usually requires lots of example data to reliably learn patterns. For instance, in order to detect skin cancer in images, the AI ideally has to be provided with hundreds of thousands or preferably millions of images of skin cancer and also of healthy skin – it is only in this way that the AI will reliably be able to differentiate between the two on its own.

Read more: hoch3 FORSCHEN Science Quarterly, 4/2020

Current publications on the topic

  • El-Hindi, Muhammad; Karrer, Simon; Doci, Gloria; Binnig, Carsten: TrustDBle: Towards Trustable Shared Databases. In: FAB 2020.
  • El-Hindi, Muhammad; Heyden, Martin; Binnig, Carsten; Ramamurthy, Ravi; Arasu, Arvind; Kossmann, Donald: BlockchainDB – Towards a Shared Database on Blockchains. In: SIGMOD 2019.
  • El-Hindi, Muhammad; Binnig, Carsten; Arasu, Arvind; Kossmann, Donald; Ramamurthy, Ravi: BlockchainDB – A Shared Database on Blockchains. In: PVLDB 2019.

Background

Computer science professors Carsten Binnig and Sebastian Faust lead the ATHENE mission TRUDATA. The aim is to develop new technologies that will enable the trustable, reliable and autonomous sharing of data, which is relevant for many different applications in the health, manufacturing and financial sectors.