data dignity
What is data dignity?
Data dignity, also known as data as labor, is a theory positing that people should be compensated for the data they have created. The term was coined in 2018 by Jaron Lanier and E. Glen Weyl.
Data dignity would enable data creators to have a say in when, how and where their data is used and to receive payment in exchange for that data. The theory of data dignity focuses on the ethical and responsible procurement and use of data and considers data a form of property.
History of data dignity
Lanier and Weyl first introduced the concept of data dignity in a Harvard Business Review essay titled "A Blueprint for a Better Digital Society." At its inception, the concept centered on the exchanges that occur on free services, such as social media, where organizations collect, analyze and sell user data.
Tech companies monetize captured data by monitoring people's actions online and using personalized algorithms to steer customer behavior with targeted advertisements, a phenomenon known as surveillance capitalism. This surveillance strips users of their privacy and data rights and enables big tech companies to profit off unknowing users.
Data dignity argues that this relationship should instead be a transparent, consensual exchange between users and companies, with acknowledgment provided through attribution or compensation. In their essay, Lanier and Weyl call for an open market that would foster an open society and create an equal playing field for organizations and users.
Data dignity in the era of generative AI
Generative AI models such as ChatGPT have exacerbated the problems Lanier and Weyl highlighted. These models are trained on data sources across the internet, from Facebook posts and works of art to a novel's plot or a developer's code. They then produce output in response to users' queries that draws on the original training data.
Currently, generative AI output typically transforms the underlying training data enough to sidestep clear copyright infringement, and AI developers are not required to compensate the data's original creators. With few regulations or laws yet in place and no clear copyright guidelines, organizations are largely deciding how to incorporate generative AI into their products on a case-by-case basis.
Consequently, creators and customers have limited protections over how their data is used -- although without their data, these models wouldn't be possible. Implementing data dignity would empower creators, rather than tech companies, by acknowledging and potentially paying people for their data contributions.
Benefits of data dignity
As a proposed solution to the inequalities of the data economy, implementing data dignity would provide several benefits for both enterprises and individual users:
- Greater data control. Under data dignity, people would be considered the owners of the information they create and would play a bigger role in deciding how it is used. They would have complete control over their contributions, and nothing would be collected or used without their consent.
- Higher-quality data and models. Because organizations would have to be more deliberate about the data they include and purchase, data quality would improve, enabling more trustworthy and accurate model responses. This could also help further expand and refine models to produce better output. In addition, paying people for their skills and expertise could attract higher-quality contributions, making models more sophisticated and benefiting creators and companies alike.
- Transparency. Instating data dignity would require organizations and companies to be transparent about the sources of their data and how that data is used. Ethical data collection policies would require clear consent and would hold organizations accountable for their actions. When people know how their data is being used, there is likely to be more trust between enterprises and users. In addition, a greater understanding of the technology would help eliminate black box AI and make the technology more accessible to users.
- Data protection and privacy. Data dignity emphasizes handling data in a responsible and respectful manner and would give data owners complete control over their personal information. This could include rigorous data security and retention policies and requirements that organizations only acquire necessary data.
- Return of dignity. Many people fear that their skills are losing value in the market as AI technology becomes more advanced and replaces jobs. Implementing data dignity could position AI as a form of social collaboration, rather than an uncontrollable force that inflicts more harm than good. Creating a paying market for data would return dignity to creators.
Drawbacks of data dignity
Although data dignity promotes the ethical use of data and creates an economy of data transparency, it does not come without its downsides, including the following:
- Economic impacts. Requiring organizations to pay for the data they collect could limit their ability to generate revenue through data-driven services or advertising, potentially hindering economic growth. The movement could also face pushback from organizations, which might not want to pay for something they have historically considered free.
- Regulation complications. Regulating the digital economy is difficult, as the tech sector moves quickly. Implementing data dignity could require organizations to invest in new technology, training or legal assistance to conform to new standards. For multinational companies, adhering to global laws could also prove difficult, as regulations could differ significantly from region to region. Legal frameworks comparable to Europe's General Data Protection Regulation would have to be put in place to close loopholes and protect users.
- Restricted innovation. Enforcing stricter regulations and privacy requirements could hinder smaller startups or organizations, which might not be able to comply or afford the costs of purchasing data. This could minimize the size and scope of innovation and growth, including in the area of AI development.
Data dignity vs. data governance vs. data integrity
Although data dignity, data governance and data integrity all concern ethical and responsible data management, each addresses a different aspect of data treatment.
Data governance is the process of managing data in enterprise systems and covers the availability, usability, integrity and security of data. A data governance plan focuses on providing consistent, trustworthy and high-quality data across an enterprise.
Without an effective data governance program in place, data silos can build up, causing inconsistencies that can negatively impact organizations. Data governance programs often involve building a governance team to create uniform standards and policies to avoid the misuse of personal customer data or sensitive information across an organization.
Data integrity focuses on data accuracy and reliability. It is a broad concept that involves keeping data complete, consistent and safe through its entire lifecycle. This encompasses the people, processes, rules and tools that help avoid data errors, corruption or unauthorized changes.
All three concepts focus on the proper use and handling of data, and both data governance and data integrity uphold key aspects of data dignity. But data governance focuses on a broader organizational framework for data management, whereas data dignity focuses on the ethical treatment of personal data from an individual perspective. Likewise, data integrity focuses on ensuring that data is accurate and uncorrupted, regardless of whether that data is personal, and does not address ethical questions such as compensating creators.
What would data dignity implementation look like?
If data dignity were to become a reality, data available on the internet would be traceable to its creators, who would receive acknowledgment or payment for its use. Enterprises would pay users for their data, and users would pay organizations for services built on others' data, creating a market for data and returning dignity to creators.
Addressing the relationship between tech platforms and individual users would require collective action to instate privacy and data standards and policies. This would have to be a collaborative effort, as a systemic change of this size would likely face a great deal of pushback.
Lanier wrote that, to tackle this issue, data dignity implementation should start small and grow to include groups known as mediators of individual data (MIDs) that advocate for data owners. These groups would seek recognition and compensation from organizations on individuals' behalf, in an attempt to close the growing gap between the two. Collective bargaining power is crucial for users to obtain value in a digital landscape, according to Lanier, and MIDs can help data creators gain recognition.
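Lanier does not prescribe a technical mechanism for MIDs, but the core idea -- a mediator pools consented contributions and splits any negotiated payment proportionally among creators -- can be sketched in a few lines of code. The class name, the record counts and the proportional split below are all hypothetical illustrations, not a proposed standard:

```python
from collections import defaultdict

class DataMediator:
    """Toy model of a mediator of individual data (MID): tracks which
    creators contributed to a pooled dataset and splits a payment
    received from an organization in proportion to contributions."""

    def __init__(self):
        # creator name -> number of consented records contributed
        self.contributions = defaultdict(int)

    def register(self, creator: str, num_records: int) -> None:
        """Record a creator's consented contribution to the pool."""
        self.contributions[creator] += num_records

    def distribute(self, payment: float) -> dict:
        """Split a negotiated payment proportionally among creators."""
        total = sum(self.contributions.values())
        return {creator: payment * n / total
                for creator, n in self.contributions.items()}

mid = DataMediator()
mid.register("alice", 300)   # e.g., 300 labeled images
mid.register("bob", 100)     # e.g., 100 text samples
payouts = mid.distribute(200.0)
print(payouts)  # {'alice': 150.0, 'bob': 50.0}
```

A real MID would of course need far more than this -- consent records, provenance tracking and a way to value qualitatively different contributions -- but the sketch shows the collective-bargaining shape of the idea: individuals pool their data, and the mediator negotiates and divides the proceeds.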
Because there are no clear guidelines for implementing and practicing data dignity, getting started will involve trial, error and invention. Lanier has argued that a people-centered approach is the best way for users to gain dignity and value in the digital economy, but this does not account for the billions of contributions already embedded in existing AI models.