PRE-PRINT “Elgar Encyclopedia of Law and Data Science”

Elgar Encyclopedia of Law and Data Science

Edited by Giovanni Comandé, Professor of Law, Sant’Anna School of Advanced Studies (www.santannapisa.it) and Coordinator, LIDER-Lab (www.lider-lab.it)

Publication Date: February 2022


SELECTION for 4 MARIE CURIE EARLY-STAGE RESEARCHERS (ESR) positions at Scuola Superiore Sant’Anna, Italy, funded in the framework of the “Legality Attentive Data Scientists (LeADS) Project” (Grant Agreement n. 956562) – RANKING LIST

ranking list_LeADS

Admission to the Programme
In case of equal score the youngest candidate prevails.

Solving the conflicts between data owners and data exploiters through a spectrum of quasi-property models

The European Union keeps moving forward with its plans for a regulatory framework to guide the development of the data economy and foster data-driven innovations for further economic and societal growth.[1] The use of and access to data play a key role in this context, and different actors can have different priorities. In particular, individuals and companies both have an interest in enjoying a degree of control over the information used to fuel these data-driven innovations: individuals because the use of data related to them might affect them, and companies – and other controllers – because they might wish to generate economic and societal development by processing data.

 

This reopens and further develops a question to which no single uniform answer has been found yet: what exactly is data, to whom does it belong, and what legal relationship exists between the subject and the data? The answers to these questions are extremely relevant, in particular where the data economy has moved as far as using personal data as consideration for digital services.[2] Seeking answers from a legal perspective can be troublesome as there are different regulations, even in the European context, that provide different and, in some cases, contradictory solutions.

 

The question is particularly timely as the European Commission is expected to publish a proposal for a Data Act soon. While the exact content of the Data Act is still unknown, this piece of legislation is intended to address a considerable number of issues surrounding the data economy and the possibility of data ownership.

 

Currently, from a legal point of view, there are different notions of what data means exactly. Often, we tend to defer to the General Data Protection Regulation (GDPR) and the realm of data protection regulations to answer this where data is associated with an individual and known as ‘personal data’.[3] We can also find ‘non-personal’ data, which is not related to a natural person, as in the case of the Free Flow Regulation.[4] It does not stop there: in upcoming legislation, such as the Data Governance Act, we can also find wider, general notions of data.[5] As such, in different situations, we might be confronted with a particular regulatory framework that deals with a specific set of situations. Consequently, a comprehensive and systematic view is necessary to tackle this first question in a holistic manner.

 

On the questions of whom data belongs to – if it belongs as such to anybody – and what legal bond exists between them and the data, the literature has discussed different approaches and has tried for quite some time to find a balance between the interests of the involved stakeholders. When it comes to companies, the database sui generis right, trade secrets, or copyright were seen as potential solutions.[6] On the other hand, ‘ownership’ of data by individuals, while a tempting solution discussed in the legal literature,[7] is complicated by the fact that personal data is also safeguarded as a fundamental right.[8] In this sense, it was pointed out that people would not own personal data but rather control access to it via the notice and consent scheme and/or the general data protection framework, including the exercise of associated data rights, even on a collective basis.[9]

 

This latter scenario, a more active data rights exercise approach, is finding an echo in recent technological developments, such as decentralized identity management systems.[10] Until now, companies have acted as data controllers and overseen every single activity related to data processing, from the collection of personal data, through its usage and possible sharing, to its destruction. Decentralized identity management systems, such as self-sovereign identities or personal data stores, allow for further control by data subjects themselves rather than having to file a request before a data controller and wait for an answer.[11] In this sense, data controllers do not select which data they are going to collect but rather have to accommodate the data that individuals create and make available for use.

 

This difference in the existing approaches to answering our initial question shows that there might be tensions between the involved stakeholders, as their rights over the same object are different and, in some cases, express contradictory concerns. It is also unclear how transfers of rights between the involved parties should operate. To achieve a balance between the different positions, the adoption of a quasi-property model has been suggested, with a different grounding in a particular right depending on the scholar.[12] Through it, it would be possible to adopt a practical and hands-on solution to the issue of data ownership and, consequently, bridge the different positions mentioned above. Exploring whether this approach is compatible with the GDPR shall be one of the main challenges for the LeADS project.

 

As mentioned, the European regulatory framework is currently in flux and is attempting to tackle future economic developments sustainably in the long run. Different proposals are currently under discussion at different levels; they deal with the uneasy question of what exactly (personal) data is from a legal point of view and try to answer, in a unified manner, the question of ‘what is data ownership?’. This question forms one of the main research crossroads for the LeADS project, alongside the other research questions at its core.

 

Authors: Prof. dr. Paul de Hert, Prof. dr. Gloria González Fuster, Andrés Chomczyk Penedo

 

[1] ‘Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions: A European Strategy for Data’ (European Commission 2020) COM(2020) 66 final.

[2] Carrie Gates and Peter Matthews, ‘Data Is the New Currency’, Proceedings of the 2014 New Security Paradigms Workshop (Association for Computing Machinery 2014) <https://doi.org/10.1145/2683467.2683477> accessed 1 April 2021.

[3] Art. 4(1) GDPR: ‘(…) any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person (…)’.

[4] Art. 3(1) Free Flow Regulation: ‘(…) means data other than personal data as defined in point (1) of Article 4 of Regulation (EU) 2016/679; (…)’.

[5] Art. 2(1) DGA: ‘(…) means any digital representation of acts, facts or information and any compilation of such acts, facts or information, including in the form of sound, visual or audiovisual recording; (…)’.

[6] Gianclaudio Malgieri, ‘“Ownership” of Customer (Big) Data in the European Union: Quasi-Property as Comparative Solution?’ (Social Science Research Network 2016) SSRN Scholarly Paper ID 2916079 <https://papers.ssrn.com/abstract=2916079> accessed 15 July 2021.

[7] Ignacio Cofone, ‘Beyond Data Ownership’ (Social Science Research Network 2020) SSRN Scholarly Paper ID 3564480 <https://papers.ssrn.com/abstract=3564480> accessed 1 April 2021; Václav Janeček, ‘Ownership of Personal Data in the Internet of Things’ (2018) 34 Computer Law & Security Review 1039; Patrik Hummel, Matthias Braun and Peter Dabrock, ‘Own Data? Ethical Reflections on Data Ownership’ [2020] Philosophy & Technology <http://link.springer.com/10.1007/s13347-020-00404-9> accessed 1 April 2021.

[8] Gloria González Fuster, The Emergence of Personal Data Protection as a Fundamental Right of the EU (Springer Science & Business 2014).

[9] Nestor Duch-Brown, Bertin Martens and Frank Mueller-Langer, ‘The Economics of Ownership, Access and Trade in Digital Data’ (European Commission, Joint Research Centre 2017) JRC Digital Economy Working Paper 2017–01 <https://www.ssrn.com/abstract=2914144> accessed 1 April 2021; Tommaso Fia, ‘An Alternative to Data Ownership: Managing Access to Non-Personal Data through the Commons’ [2020] Global Jurist <https://www.degruyter.com/document/doi/10.1515/gj-2020-0034/html> accessed 1 April 2021.

[10] Christopher Allen, ‘The Path to Self-Sovereign Identity’ (Life With Alacrity, 25 April 2016) <http://www.lifewithalacrity.com/2016/04/the-path-to-self-soverereign-identity.html> accessed 27 June 2019.

[11] Andrés Chomczyk Penedo, ‘Self-Sovereign Identity Systems and European Data Protection Regulations: An Analysis of Roles and Responsibilities’ (Gesellschaft für Informatik 2021) <https://dl.gi.de/bitstream/handle/20.500.12116/36505/proceedings-08.pdf?sequence=1&isAllowed=y>.

[12] Malgieri (n 6).

Public-private data sharing from “dataveillance” to “data relevance”

Data sharing has become a common practice between public and private entities all over the world. The reasons for this are broad and varied, ranging from making more data available for data-rich scientific research to allowing law enforcement agencies to pursue criminal activities with greater precision. While data collection remains a fundamental activity, as it enables an ever-growing amount of data to exist, the sharing of data and its subsequent repurposing can enable further major economic and social value. A single data controller can collect only a little information compared with the data that can be made available by several third parties.

Regulators have taken notice of this and are planning accordingly to reap the supposed benefits of the data economy by further enabling and pushing for the sharing of data. In this sense, the recent European Strategy for Data puts this practice at its core, envisaging an environment of trusted data-driven innovations fueled by data sharing between digital platforms, governments, and individuals alike.[1]

In the field of law enforcement, the amount of data available also caught the attention of competent authorities a long time ago, as it allows for ‘smarter’ crime prevention, yet at the expense of more privacy-invasive practices.[2] In this respect, the increasing amount of available data is highly interesting for the deployment of a forever-expanding surveillance apparatus by public authorities.[3] This has led to the emergence of what has been described as ‘dataveillance’[4] and its considerable expansion in the last decades, rooting itself in our society to become a troublesome practice.[5] In this context, the private sector makes available, either voluntarily or not, a considerable portion of its data to law enforcement agencies,[6] with limitations.[7]

Data sharing can also involve access to public sector generated data by private businesses. In this respect, the open data movement has been pushing in this direction for years and, in certain cases, triggering legislation that reduces the obstacles to making such data available for re-use by, for example, companies. While it is possible to find certain regulations that either foster or mandate such data sharing practices, all of them must be subject to generally applicable rules, such as the General Data Protection Regulation (GDPR).

As mentioned above, regulators intend to foster data sharing between private and public sectors. As the recent European Strategy for Data points out, certain kinds of information, such as that generated within smart cities, can provide an interesting field where public-private data sharing would be beneficial to society and individuals.[8] For example, data generated by the financial services industry provides a considerable amount of information, both in quantity and quality.[9] Nevertheless, a single payment can provide a sensitive insight into an individual’s life, from health data (for example, from recurring pharmacy expenses) to religious information (as in the case of monthly contributions to a religious organization). This could be overcome by sharing certain information about payments in an aggregated manner, for example merely their time and date, which could help in understanding citizens’ movements in a city and in planning the city’s policies accordingly for citizens’ benefit.[10]
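As a rough illustration of the aggregation idea above, the following Python sketch reduces payment records to counts per hour, dropping the payer, the merchant and the amount before anything is shared. The record fields (`payer`, `merchant`, `amount`, `timestamp`) and the sample values are hypothetical; they are not taken from any real dataset or from the cited round table report.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw payment records held by a financial institution.
payments = [
    {"payer": "p1", "merchant": "pharmacy", "amount": 12.50, "timestamp": "2021-09-06T08:15:00"},
    {"payer": "p2", "merchant": "bakery", "amount": 3.20, "timestamp": "2021-09-06T08:40:00"},
    {"payer": "p1", "merchant": "parish", "amount": 20.00, "timestamp": "2021-09-06T17:05:00"},
]


def hourly_counts(records):
    """Return the number of payments per hour, keeping only date and hour."""
    counts = Counter()
    for record in records:
        moment = datetime.fromisoformat(record["timestamp"])
        # Truncate to the hour; payer, merchant and amount are never copied.
        counts[moment.strftime("%Y-%m-%d %H:00")] += 1
    return dict(counts)


shared_view = hourly_counts(payments)
print(shared_view)
```

Only the aggregated view would leave the institution in this sketch; which granularity is appropriate in practice is a separate question that the code does not settle.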

But how can we prevent these public-private data-sharing activities from contributing to more ‘dataveillance’? While the GDPR covers a significant amount of data processing activities, we also need to involve other relevant pieces of legislation that contemplate public authorities, particularly law enforcement agencies, such as the Law Enforcement Directive. While the obligations and rights within the relevant legal frameworks differ, it is possible to highlight that most of these activities should be conducted following some common principles.

Among these we point out that only accurate and relevant data should be used for a particular and specific purpose. In this respect, we can ask when the data are relevant enough for the intended purposes; in other words, we need to question when we have “good enough data” [11] for the intended public-private data sharing. By doing so, we can assess whether compliance with these rules has been reached. Through this, we can effectively implement the principles of data accuracy and minimization, alongside other applicable and relevant principles.

Understanding how these rules are effectively applied to, and guide, these public-private data sharing practices is crucial as regulators seek to foster them. For example, the European Union is currently working on a proposal for a Data Governance Act, which introduces data sharing services,[12] as well as data altruism.[13] Both of these categories, with their particularities, seek to foster data-sharing activities between private and public entities alike. Data protection watchdogs have raised their concerns regarding the current wording and extent of this proposal.[14] Among these concerns, the lack of clear integration between the proposal and, in particular, the GDPR was highlighted as a troublesome issue.

Public-private data sharing activities are not likely to stop. On the contrary, the current data strategy for the European Union is to further expand the sharing of data in an automated manner using APIs, such as in the case of open finance.[15] The question that remains open on this front is whether these new data governance schemes can make us move from a dataveillance perspective towards a data relevance scenario. Within this context, we intend to explore this broad question at the different crossroads where this topic arises in the LeADS project and seek ideas to tackle the matter in an interdisciplinary manner.

Authors: Prof. dr. Paul de Hert, Prof. dr. Gloria González Fuster, Andrés Chomczyk Penedo

[1] ‘Citizens will trust and embrace data-driven innovations only if they are confident that any personal data sharing in the EU will be subject to full compliance with the EU’s strict data protection rules’ (see ‘Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions: A European Strategy for Data’ (European Commission 2020) COM(2020) 66 final).

[2] David Wright and others, ‘Sorting out Smart Surveillance’ (2010) 26 Computer Law & Security Review 343.

[3] Margaret Hu, ‘Small Data Surveillance v. Big Data Cybersurveillance’ (2015) 42 Pepperdine Law Review 773.

[4] Roger Clarke, ‘Information Technology and Dataveillance’ (1988) 31 Communications of the ACM 498.

[5] Roger Clarke and Graham Greenleaf, ‘Dataveillance Regulation: A Research Framework’ (2017) 25 Journal of Law, Information and Science 104.

[6] David Lyon, Surveillance After Snowden (John Wiley & Sons 2015).

[7] ‘Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions: A European Strategy for Data’ (n 1).

[8] ibid.

[9] V Ferrari, ‘Crosshatching Privacy: Financial Intermediaries’ Data Practices Between Law Enforcement and Data Economy’ (2020) 6 European Data Protection Law Review 522.

[10] Ine van Zeeland and Ruben D’Hauwers, ‘Open Banking Data in Smart Cities’ (VUB Chair Data Protection on the Ground – VUB Smart Cities Chair – imec-SMIT-VUB 2021) Round table report <https://smit.vub.ac.be/wp-content/uploads/2021/02/Report-roundtable-Open-Banking-Smart-Cities_def.pdf> accessed 6 September 2021.

[11] Angela Daly, Monique Mann and S Kate Devitt, Good Data (Institute of Network Cultures 2019).

[12] According to the current wording of the proposal, under this service, we can include: (i) intermediate between data holders and data users for the exchange of data through different means; (ii) intermediate between data subjects and data users for the exchange of data through different means for the purpose of exercising data rights provided for in the GDPR, mainly right to portability; and (iii) provide data cooperatives services, i.e. negotiate on behalf of data subjects and certain data holders terms and conditions for the processing of personal data.

[13] According to the current wording of the proposal, under this term, we are referring to “(…) the consent by data subjects to process personal data pertaining to them, or permissions of other data holders to allow the use of their non-personal data without seeking a reward, for purposes of general interest, such as scientific research purposes or improving public services”.

[14] ‘Joint Opinion 03/2021 on the Proposal for a Regulation of the European Parliament and of the Council on European Data Governance (Data Governance Act)’ (European Data Protection Board – European Data Protection Supervisor 2021) Joint Opinion 03/2021 <https://edpb.europa.eu/sites/edpb/files/files/file1/edpb-edps_joint_opinion_dga_en.pdf> accessed 25 March 2021.

[15] ‘Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions on a Digital Finance Strategy for the EU’ (European Commission 2020) Communication from the Commission (2020) 591 <https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:52020DC0591&from=EN> accessed 1 December 2020.

SELECTION for 4 MARIE CURIE EARLY-STAGE RESEARCHERS (ESR) positions at Scuola Superiore Sant’Anna, Italy, funded in the framework of the “Legality Attentive Data Scientists (LeADS) Project” (Grant Agreement n. 956562) – List of candidates admitted at the interview

  1. ABOLHASSANI MARYAM
  2. BEYSÜLEN ANGIN BERFU
  3. BRIFA PINELOPI MARIA
  4. CASALUCE ROBERTO
  5. CREPAX TOMMASO
  6. GAUR MITISHA
  7. LIYEW CHALACHEW MULUKEN
  8. POE ROBERT LEE
  9. SATKA ZENEPE
  10. SPERA FRANCESCO
  11. ULLAH ZAHID
  12. YANG QIFAN

IMPORTANT INFORMATION FOR THE CANDIDATES

The interviews are scheduled on September 15th at 4 p.m. CET online in the following public meeting room: https://sssup.webex.com/meet/g.comande

 

Data Privacy in the Financial and Industrial Sectors

Nowadays, one of the most important issues for enterprises across the financial services industry is privacy and data protection. Records, and financial records in particular, are considered sensitive by most consumers, and the respective regulators promote good data handling practices as customer profiling increases in order to identify potential opportunities and support risk management analysis. To this extent, the management of data privacy and data protection is of great importance throughout the customer cycle. For example, there are several use cases in the finance sector that involve sharing of data across different organizations (e.g., sharing of customer data for customer protection or faster KYC, sharing of businesses’ data for improved credit risk assessment, sharing of customer insurance data for faster claims management and more).

To facilitate such cases, several EU funded projects have already discussed the need to reconsider data usage and regulation in order to unlock the value of data while fostering consumer trust and protecting fundamental rights. Permissioned blockchain infrastructure is utilized in order to provide privacy control, auditability, secure data sharing, as well as faster operations. The core of the blockchain infrastructure is enhanced in two directions: (i) integration of tokenization features and relevant cryptography, as a means of enabling assets trading (e.g., personal data trading) through the platform; and (ii) utilization of Multi-Party Computation (MPC) and Linear Secret Sharing (LSS) algorithms in order to enable querying of encrypted data as a means of offering higher data privacy guarantees. Based on these enhancements the project will enable the implementation of disruptive business models for personalization, such as personal data markets.
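To make the idea of computing on protected data more concrete, here is a minimal Python sketch of additive secret sharing, one simple linear secret sharing scheme. It is not the algorithm used by the projects mentioned above, which the text does not describe; the modulus, the three-party setting, the function names and the sample values are all assumptions chosen for illustration.

```python
import secrets

# Illustrative modulus (a Mersenne prime); an assumption, not a recommendation.
PRIME = 2**61 - 1


def share(value, n_parties=3):
    """Split `value` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine all shares; any strict subset looks uniformly random."""
    return sum(shares) % PRIME


def add_shares(a_shares, b_shares):
    """Each party adds the two shares it holds, without seeing the inputs."""
    return [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]


# Two hypothetical confidential figures held by different organizations.
exposure_a = share(52_000)
exposure_b = share(48_000)

# The parties compute shares of the sum locally, then reveal only the result.
total = reconstruct(add_shares(exposure_a, exposure_b))
print(total)
```

Because the scheme is linear, sums (and multiplications by public constants) can be evaluated share by share; richer queries require additional protocol steps that this sketch leaves out.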

LeADS builds upon those results and steps forward by setting a more ambitious goal: to experiment, in partnership with businesses and regulators, on a way to pursue not only the lawfulness of data mining and AI development, but both the amplest protection for fundamental rights and, simultaneously, the largest possible data exploitation in the digital economy. Using coexisting characteristics of data-driven financial services, LeADS helps to define: Trust; Involving; Empowering; Sharing. Participation in several of the mentioned projects (e.g. XAI, SoBigData++) and/or close scientific connections with the research teams (e.g. CompuLaw) by several consortium members ensures close collaboration with the named projects. Data science and AI development hold great potential but also entail great risks in terms of privacy and industrial data protection. Even considering legal novelties such as the Digital Single Market strategy, the GDPR, the Network and Information Security (NIS) directive, the e-privacy directive and the new e-privacy regulation, legal answers are often regarded as inadequate compromises, where individual interests are not really protected. As far as the academic treatment of the subject is concerned, there are several challenges that still need to be addressed in data-driven financial services and that LeADS could meet regarding the empowerment of individuals (users, clients, stakeholders, etc.) in their data processing, through “data protection rights” or “by design” technologies, such as blockchain as described above.

The approach of the LeADS Early-Stage Researchers will be developed along two lines: 1) The study of digital innovation and business models (e.g. multisided markets, freemium) dependent on the collection and use of data in the financial sector. It will also: a) link this analysis to the exploration of online behaviour and reactions of users to different types of recommendations (i.e. personalized recommendations by financial/industrial applications) that generate additional data as well as large network effects; b) assess (efficiency, impact studies) the many specific privacy regulations that apply to online platforms, business models, and behaviours. 2) The proposal of a user-centric data valorisation scheme. By analysing user-centric patterns, the project aims to: a) identify alternative schemes to data concentration, to place the user at the heart of control and economic valorisation of “his” data, whether personal or not (VRM platforms, personal cloud, private open data); b) assess the economic impact of these new schemes, their efficiency, and the legal dimension at stake in terms of liability and respect of privacy. The project will also suggest new models allowing the user to obtain results regarding the explainability of the algorithms that are being utilized by financial organizations to provide the aforementioned personalized recommendations for their offerings. LeADS research will overcome contrasting views that consider privacy as either a fundamental right or a commodity. It will enable clear distinctions between notions of privacy that relate to data as an asset and those which relate to personal information affecting fundamental rights.

Against this background, LeADS’ innovative theoretical model, based on new concepts such as “Un-anonymity” and “Data Privaticity”, will be assessed within several legal domains (e.g. consumer sales and financial services, information society contracts, etc.) and in tight connection with actual business practices and models and the software they use. Finally, due to the increasing potential of Artificial Intelligence information processing, a fully renewed approach to data protection and data exploitation is introduced by LeADS by building a new paradigm for information and privacy as a framework that will empower individuals’ awareness in the data economy, wherein data is constantly gathered and processed without awareness, and the potential for discrimination is hidden in the design of the algorithms used. Thus, LeADS will set the theoretical framework and the practical implementation template of financial smart models for co-processing and joint-controlling information, thereby answering the specific need to clarify and operationalize these newly introduced notions in the GDPR.

Technical and legal aspects of privacy-preserving services: the case of health data

Nowadays, the potential usefulness as well as the value of health data are broadly recognized. They may transform traditional medicine into clinical science intertwined with data research, driving innovation and producing value from the perspective of the key stakeholders of the health care ecosystem: not only patients but also health care providers and the life insurance sector.

Yet health data do not appear out of thin air; they are not a product that can be viewed in isolation. They are:

  • the personal data related to the physical or mental health of a natural person, including the provision of health care services, which reveal information about his or her health status (data concerning health),
  • the personal data relating to the inherited or acquired genetic characteristics of a natural person which give unique information about the physiology or the health of that natural person and which result, in particular, from an analysis of a biological sample from the natural person in question (genetic data),
  • the personal data resulting from specific technical processing relating to the physical, physiological or behavioural characteristics of a natural person, which allow or confirm the unique identification of that natural person, such as facial images or dactyloscopic data (biometric data).

Thus, the individual cannot be deprived of the right to decide about their processing as the health issues are at the very centre of the privacy protection sphere.

It becomes clear that balancing the interests of the private individual whose privacy is protected, interests of other private and public actors, and general common interests is highly problematic. Naturally, processing of the health data cannot be unrestricted: optimally, the legal framework should facilitate unlocking the value of health data for European citizens and businesses and empower users in the management of their own health data without undermining the very essence of the right to privacy.

Currently, the processing of health data falls under the complex legal regime of the GDPR. This, however, poses a serious challenge for data processors on the one hand and, on the other, gives rise to numerous legal questions. What are the grounds for processing such data in this highly differentiated context? How should medical data be protected on both the regulatory and the technological level? How can we harness the newest technology to increase data safety? How can anonymization and/or privacy-preserving data management techniques using efficient cryptography (e.g. homomorphic encryption, secure multi-party computation) contribute to reaching higher protection levels without becoming a hurdle or an impediment for legitimate data processing? Can blockchain technologies be used for health information exchange? Should the creation of technological infrastructure be coupled with establishing proper key management schemes?

The task is twofold. First, on the regulatory level, general policy guidelines for legislators, independent agencies, and businesses on data sharing platforms are necessary, together with an analysis of the policy and market implications of providing privacy-preserving services. Second, practical recommendations are needed: specific postulates should be formulated on how data protection techniques can be applied in the health domain, in order to contribute to achieving the abovementioned aims.

Author: dr. Katarzyna Południak-Gierz, Jagiellonian University

WATCH AGAIN THE WEBINAR: SoBigData++ and LeADS joint Awareness Panel. Legal Materials as Big Data: (algo)Rithms to Support Legal Interpretation. A Dialogue with Data Scientists.


6th of July 2021


Is blockchain THE reliability solution for big data?

Blockchains have sparked great enthusiasm in the data science community, which believes this technology will be THE solution to data authenticity, data privacy protection, data quality guarantee, smooth data access and real-time analysis [1], [2]. Data being considered as the new digital oil, data science and blockchain seem to be the perfect match [3]. Indeed, data science allows people and organizations to extract valuable knowledge from huge volumes of structured or unstructured data, while blockchain would provide security and reliability for the manipulated data. But does it sound too good to be true?

 

Blockchain is a way to implement a decentralized repository (a.k.a. Distributed Ledger Technology) managed by a group of participants, without the need for them to trust each other. Blockchain groups data records into blocks that are cryptographically signed and chained by back-linking each block to its predecessor. Blockchain was initially proposed for cryptocurrency (e.g., Bitcoin). This first generation of blockchain applications is called Blockchain 1.0. Later, smart contracts were introduced, paving the way to decentralized applications referred to as Blockchain 2.0. Today, Blockchain 3.0 explores a wider spectrum of target applications like e-health, smart cities, identity management, etc. [4].
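The back-linking described above can be sketched in a few lines of Python. This is a deliberately simplified illustration: it shows only the hash link between consecutive blocks and leaves out signatures, consensus and networking. The block layout (`index`, `records`, `prev_hash`) and the sample records are assumptions made for the example and do not follow any particular blockchain's format.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 over a canonical JSON encoding of the block."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, records):
    """Append a block that stores the hash of its predecessor."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "records": records, "prev_hash": prev_hash})
    return chain


def is_valid(chain):
    """Check every back-link; a change to a block with a successor breaks it."""
    for previous, current in zip(chain, chain[1:]):
        if current["prev_hash"] != block_hash(previous):
            return False
    return True


chain = []
append_block(chain, ["genesis"])
append_block(chain, ["alice pays bob 5"])
append_block(chain, ["bob pays carol 2"])
print(is_valid(chain))
```

Altering the records of an earlier block changes its hash, so the `prev_hash` stored in the next block no longer matches. In this sketch the most recent block has no successor to protect it; real systems rely on consensus and further blocks for that.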

 

Big data is one of the possible Blockchain 3.0 applications. Deepa et al. [5] recently published a survey on the use of blockchain technology for big data, which shows that projects try to apply blockchain-based solutions at different steps of big data processing. This includes big data acquisition (data collection, data transmission and data sharing [6]), big data storage (by securing decentralized file systems or by detecting malicious updates in databases [7]) or big data analytics (for machine learning model sharing, decentralized intelligence and trusted decision-making of machine learning [8]).

 

Although blockchain technology appears to be a good candidate for securing big data, it is not flawless [9] [10] [11], and security threats and vulnerabilities have been identified at each layer of the blockchain stack model [12]. First of all, blockchains depend on the underlying network services, and attacks on routing protocols or on DNS can harm a blockchain network. At the consensus layer, which is the core component that directly dictates the behavior and the performance of the blockchain, the situation is also complex [13]. The classic Proof of Work protocol is far from being a panacea and makes little sense from an environmental point of view [14]. In addition, most miners gather in mining pools to increase their processing capability and thus their chance of adding a new block to the blockchain. At the time of writing, the blockchain.com website estimates that six bitcoin mining pools (F2Pool, AntPool, Poolin, ViaBTC, Huobi.pool and SlushPool) represent 63% of the hash rate [15]. If they colluded, they could launch a 51% attack and destabilize the whole bitcoin network [13]. Consequently, more and more consensus algorithms are being studied, proposed and extended, such as proof of stake, proof of authority, proof of activity, RBFT, YAC, etc. However, an ideal consensus algorithm is still missing, as almost all algorithms have significant disadvantages in one way or another with respect to their security and performance, as concluded in [13]. The Replicated State Machine layer, which is responsible for the interpretation and execution of transactions, can be vulnerable too. Blockchain technology does not guarantee the reliability of the data, only the integrity of the blocks. For instance, Karapapas et al. [16] showed how to make ransomware available using Ethereum smart contracts. Confidentiality of data is also not always built into the blockchain.
Finally, blockchain is implemented as software running on computers, and thus attackers can exploit security holes and misconfigurations. For example, white hat hackers found more than 40 bugs in blockchain and cryptocurrency platforms during a one-month bug bounty session in 2019; four of them were buffer overflows that made it possible to inject arbitrary code [17].
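To make the Proof of Work and 51% attack discussion above more concrete, here is a toy sketch. It is not Bitcoin's actual mining procedure (which double-hashes a binary block header and compares it against a numeric target). The `mine`, `verify` and `catch_up_probability` names are hypothetical, and the last function uses the simplified gambler's-ruin expression from the original Bitcoin white paper, assuming an attacker who holds a fixed share `q` of the hash rate.

```python
import hashlib


def _digest(block_data, nonce):
    """SHA-256 of the block data combined with a candidate nonce."""
    return hashlib.sha256(f"{block_data}|{nonce}".encode("utf-8")).hexdigest()


def mine(block_data, difficulty):
    """Brute-force a nonce whose digest starts with `difficulty` zero hex digits."""
    target_prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = _digest(block_data, nonce)
        if digest.startswith(target_prefix):
            return nonce, digest
        nonce += 1


def verify(block_data, nonce, difficulty):
    """Verification costs a single hash, whereas mining costs many."""
    return _digest(block_data, nonce).startswith("0" * difficulty)


def catch_up_probability(q, z):
    """Probability that an attacker with hash-rate share q ever catches up
    from z blocks behind (simplified gambler's-ruin model)."""
    p = 1.0 - q
    if q >= p:
        return 1.0  # a majority attacker eventually catches up
    return (q / p) ** z


# Hypothetical block content; difficulty 4 needs about 16**4 attempts on average
nonce, digest = mine("block 42: example records", difficulty=4)
```

Under this model, colluding pools holding 63% of the hash rate would catch up with certainty (`catch_up_probability(0.63, 6)` returns `1.0`), whereas a 10% attacker's chance of recovering from six blocks behind is negligible. The brute-force loop in `mine` also hints at why Proof of Work is so energy-hungry: each extra hex digit of difficulty multiplies the expected work by 16.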

 

To conclude, blockchain technology offers promising features for big data. However, one should acknowledge the current technical limitations of the technology. Legal aspects are another consideration: the European Parliamentary Research Service observed many points of tension between blockchains and the GDPR [18]. Once all these issues have been addressed, then yes, blockchain will be a serious candidate for being the reliability solution for big data.

 

By Romain Laborde

 

References

[1]       “Why Data Scientists Are Falling in Love with Blockchain Tech,” Techopedia.com. https://www.techopedia.com/why-data-scientists-are-falling-in-love-with-blockchain-technology/2/33356 (accessed Apr. 21, 2021).

[2]       I. Rallo, “Six use cases in Blockchain Analysis,” Data Science Central, Mar. 15, 2021. https://www.datasciencecentral.com/profiles/blogs/six-use-cases-in-blockchain-analysis (accessed Apr. 21, 2021).

[3]       “What Makes Blockchain and Data Science a Perfect Combination.” https://www.rubiscape.io/blog/focus-on-data-diversity-to-make-your-ai-initiatives-successful-0 (accessed Apr. 21, 2021).

[4]       D. Di Francesco Maesa and P. Mori, “Blockchain 3.0: applications survey,” Journal of Parallel and Distributed Computing, vol. 138, pp. 99–114, Apr. 2020, doi: 10.1016/j.jpdc.2019.12.019.

[5]       N. Deepa et al., “A survey on blockchain for big data: Approaches, opportunities, and future directions,” arXiv preprint arXiv:2009.00858, 2020.

[6]       N. Tariq et al., “The Security of Big Data in Fog-Enabled IoT Applications Including Blockchain: A Survey,” Sensors, vol. 19, no. 8, Art. no. 8, Jan. 2019, doi: 10.3390/s19081788.

[7]       N. Zahed Benisi, M. Aminian, and B. Javadi, “Blockchain-based decentralized storage networks: A survey,” Journal of Network and Computer Applications, vol. 162, p. 102656, Jul. 2020, doi: 10.1016/j.jnca.2020.102656.

[8]       Y. Liu, F. R. Yu, X. Li, H. Ji, and V. C. M. Leung, “Blockchain and Machine Learning for Communications and Networking Systems,” IEEE Communications Surveys & Tutorials, vol. 22, no. 2, pp. 1392–1431, Secondquarter 2020, doi: 10.1109/COMST.2020.2975911.

[9]       X. Li, P. Jiang, T. Chen, X. Luo, and Q. Wen, “A survey on the security of blockchain systems,” Future Generation Computer Systems, vol. 107, pp. 841–853, 2020.

[10]     M. Saad et al., “Exploring the attack surface of blockchain: A comprehensive survey,” IEEE Communications Surveys & Tutorials, vol. 22, no. 3, pp. 1977–2008, 2020.

[11]     Y. Wen, F. Lu, Y. Liu, and X. Huang, “Attacks and countermeasures on blockchains: A survey from layering perspective,” Computer Networks, vol. 191, p. 107978, 2021.

[12]     I. Homoliak, S. Venugopalan, D. Reijsbergen, Q. Hum, R. Schumi, and P. Szalachowski, “The Security Reference Architecture for Blockchains: Toward a Standardized Model for Studying Vulnerabilities, Threats, and Defenses,” IEEE Communications Surveys & Tutorials, vol. 23, no. 1, pp. 341–390, 2020.

[13]     M. Sadek Ferdous, M. Jabed Morshed Chowdhury, M. A. Hoque, and A. Colman, “Blockchain Consensus Algorithms: A Survey,” arXiv e-prints, p. arXiv-2001, 2020.

[14]     CNN Business, “Bitcoin mining in China could soon generate as much carbon emissions as some European countries, study finds,” CNN. https://www.cnn.com/2021/04/09/business/bitcoin-mining-emissions/index.html (accessed Apr. 21, 2021).

[15]     “pools,” Blockchain.com. https://www.blockchain.com/charts/pools (accessed May 03, 2021).

[16]     C. Karapapas, I. Pittaras, N. Fotiou, and G. C. Polyzos, “Ransomware as a Service using Smart Contracts and IPFS,” in 2020 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), 2020, pp. 1–5.

[17]     Mix, “Security researchers found over 40 bugs in blockchain platforms in 30 days,” TNW | Hardfork, Mar. 14, 2019. https://thenextweb.com/news/blockchain-cryptocurrency-vulnerability-bug (accessed Apr. 28, 2021).

[18]     M. Finck, “Blockchain and the General Data Protection Regulation: Can distributed ledgers be squared with European data protection law?,” European Parliamentary Research Service, PE 634.445, Jul. 2019. [Online]. Available: https://www.europarl.europa.eu/RegData/etudes/STUD/2019/634445/EPRS_STU(2019)634445_EN.pdf.

Rights of the Internet of Everything (Last-JD-RIoE) – First Annual Conference

Wednesday and Thursday, 21-22 July, Online

This event, which takes place in the framework of the LAST-JD-RIoE Project, funded by the European Union’s Horizon 2020 research and innovation programme under Marie Skłodowska-Curie ITN EJD grant agreement No 814177, gathers world authorities on different aspects of the Internet of Everything to promote scientific discussion, exchange research ideas and foster business opportunities.

For further info and Program

Registration