The integration of artificial intelligence (AI) into software products has exploded. According to research from Gartner, before 2023, approximately 2,300 AI products had been developed. In 2023 alone, more than double that amount came to market. This growth is unsurprising, considering that 92% of businesses surveyed by Gartner said they were investing in AI tools in 2024. Many businesses recognize that a failure to leverage the benefits of AI could leave them at a significant competitive disadvantage compared to their AI-adopting peers.
This rapid expansion has brought copyright challenges to the forefront of AI integration. Behind the scenes—and sometimes in the headlines—software companies are grappling with two critical issues: using customer data to train AI models and protecting users from potential infringement and privacy claims arising from AI-generated content. These challenges are forcing companies to revisit and revise their terms of service agreements, giving rise to various legal and business implications.
The AI Data Dilemma
Let’s begin by addressing the “why” behind companies pushing to capture more user data to train AI systems. Simply put, the demand for high-quality, diverse data to train AI has skyrocketed. Initially, companies relied heavily on publicly available data from the Internet to feed their AI models. However, this resource is not limitless.
This is why many software companies are trying to leverage non-personally identifying data created by their users—data that is often more current, relevant, and tailored to their specific applications than what is publicly available elsewhere. This shift towards leveraging user data has, for many companies, become a competitive necessity in the race to develop more sophisticated and accurate AI systems.
However, this pivot is not without its challenges. It raises significant copyright concerns and pushes the boundaries of what users might expect when they agree to a company’s terms of service—terms that few people thoroughly read. As we’ll explore, companies must navigate these challenges carefully, balancing their need for training data with the necessity of maintaining user trust and complying with privacy and copyright laws.
Potential Copyright Issues in AI Training
As recently reported by The New York Times, tech giants like Adobe, Google, Snap, and Meta have updated their terms of service to explicitly allow the use of user-generated content in training their AI models. While this approach aims to solve the data scarcity problem, it is not without controversy.
For example, Adobe faced backlash in 2023 from its user base of creative professionals after updating its terms of service. Users expressed concern that, as a result of the changes, Adobe was trying to claim rights to their intellectual property for AI training purposes. This outcry forced Adobe to quickly clarify its intentions and promise a rewrite of the terms in clearer language, which resulted in an explicit statement that Adobe “will not use your Local or Cloud Content to train generative AI.”
However, that is not the approach being taken by many other companies. While Adobe chose to explicitly exclude user content from AI training, other tech giants have moved in the opposite direction. For example, per The New York Times, Meta “announced last September that the new version of its large language model was trained on user data that its previous iteration had not been trained on.”
These contrasting approaches highlight an important question in the AI era: Is it legal to use user-generated content for AI training? The answer, like most legal issues, is not cut and dried. The fair use doctrine in U.S. copyright law may offer some protection for companies training AI models, but its application in this context is evolving and largely untested. Proponents argue that AI training is transformative and does not create derivative works, potentially qualifying as fair use. Others assert that this use of customer data is primarily commercial in nature, weighing against fair use. Moreover, if AI outputs closely mimic training data, they could infringe the underlying copyrights. This legal uncertainty puts software companies in a precarious position as they balance innovation with potential liability. Given the potential gains from AI, many companies may be willing to bear the copyright infringement risks that flow from their use of user data.
Potential Privacy Issues in AI Training
While most companies take affirmative steps to comply with privacy laws and regulations as they collect, store, and use customer data internally, special attention must be given to any customer data shared with third-party AI platforms outside the companies' control. If personally identifying information, or any other customer data subject to applicable state and federal privacy laws, is shared with an external AI platform for training or other purposes, appropriate safeguards and disclosures must be in place to ensure compliance.
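As one illustration of what such a technical safeguard might look like, the hypothetical Python sketch below redacts obvious identifiers (email addresses and phone numbers) from free-text user content before it is exported to an external training pipeline. The field names and patterns are illustrative assumptions; a regex-based pass is a starting point, not a substitute for vetted privacy tooling and legal review.

```python
import re

# Hypothetical illustration: redact obvious identifiers from free-text user
# content before it is exported to an external AI training pipeline.
# Patterns and field names are illustrative assumptions, not a compliance
# standard; real programs pair vetted tooling with legal review.

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_pii(text: str) -> str:
    """Replace email addresses and phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)

def prepare_training_record(record: dict) -> dict:
    """Keep only non-identifying fields and redact free text before sharing."""
    return {
        "content": redact_pii(record.get("content", "")),
        "locale": record.get("locale"),  # coarse metadata; drop names and IDs
    }

if __name__ == "__main__":
    sample = {
        "user_id": "u-12345",
        "content": "Reach me at jane@example.com or +1 (555) 010-2939.",
        "locale": "en-US",
    }
    print(prepare_training_record(sample))
    # {'content': 'Reach me at [EMAIL] or [PHONE].', 'locale': 'en-US'}
```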
FTC Warns Against Surreptitious Changes to Terms of Service
As software companies rush to update their terms of service to accommodate AI training needs, they face another legal risk: enforcement action from the Federal Trade Commission (FTC). In a recent blog post titled “AI (and other) Companies: Quietly Changing Your Terms of Service Could Be Unfair or Deceptive,” the FTC issued a warning to the industry.
The FTC acknowledges the data-hungry nature of AI development, calling data “the new oil” that fuels innovation. However, it also recognizes the conflict between companies’ desire to use user data for AI training and their existing commitments to protect user privacy.
The agency explicitly states that it may be unfair or deceptive for a company to adopt more permissive data practices—such as using data for AI training—and only inform consumers through a surreptitious, retroactive amendment to its terms of service or privacy policy. This practice, according to the FTC, risks violating the law.
The FTC’s stance is not new. It cites examples dating back to 2004, demonstrating a long history of challenging deceptive and unfair practices related to privacy policies. The agency emphasizes that even as technology evolves, the principle remains the same: a business cannot unilaterally renege on its privacy commitments after collecting users’ data.
For software companies integrating AI, this warning underscores the importance of transparency and user consent when updating terms of service. According to the FTC, there is “nothing intelligent about obtaining artificial consent.”
Copyright Concerns for Users of AI-Enabled Software
As AI becomes more integrated into software products, a new copyright challenge has emerged: the potential for users to face copyright infringement claims based on AI-generated content. This issue is particularly relevant for products like Microsoft’s GitHub Copilot, an AI-powered coding assistant.
Microsoft has taken a proactive approach to address this concern. In November 2023, the company expanded its Copilot Copyright Commitment to include commercial customers using the Azure OpenAI Service. According to Microsoft, “if a third party sues a commercial customer for copyright infringement for using Microsoft’s Copilots or the output they generate, we will defend the customer and pay the amount of any adverse judgments or settlements that result from the lawsuit, as long as the customer used the guardrails and content filters we have built into our products.”
This move by Microsoft is significant for several reasons:
- It acknowledges the legal uncertainty surrounding AI-generated content and copyright.
- It shifts the potential legal burden from the user to the software provider.
- It may set a precedent for how other companies approach similar issues.
As more companies integrate AI into their software products, we may see a variety of strategies emerge to address these types of copyright concerns.
Best Practices for Software Companies Integrating AI
As the legal landscape around AI, copyright, and privacy continues to evolve, software companies will need to adapt their strategies to mitigate risks and maintain user trust. Some of the steps companies may consider include:
- Transparent Communication: Be clear and upfront about how user data may be used for AI training, particularly in light of privacy regulations and the FTC warning. Avoid quietly amending your terms of service; instead, clearly communicate any changes and their implications to users.
- Obtain Explicit Consent: When possible, seek explicit consent from users before using their data for AI training, such as in the form of an opt-in feature (a minimal sketch of such a consent gate follows this list).
- Regular Legal Reviews: Conduct regular reviews of your terms of service and privacy policies to ensure they accurately reflect your AI practices and comply with current laws and regulations—in the U.S. and beyond.
- Educate Users: Provide clear guidelines and education to users about the potential copyright implications of using AI-generated content. Some companies may choose to follow Microsoft’s lead and offer indemnification for any claims stemming from the use of such content.
- Monitor Regulatory Developments: Stay informed about evolving regulations and court decisions related to AI and copyright.
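To make the opt-in approach above concrete, here is a minimal, hypothetical Python sketch of a consent gate, under the assumption that a user's data becomes eligible for AI training only after an affirmative opt-in under a terms version that disclosed that use. The field names and version scheme are illustrative assumptions, not a description of any particular company's system.

```python
from dataclasses import dataclass

# Hypothetical consent gate: include a user's data in AI training only if
# the user affirmatively opted in under terms that disclosed that use.
# Field names and the "YYYY-MM" version scheme are illustrative assumptions.

AI_TRAINING_TERMS_VERSION = "2025-01"  # first terms version disclosing AI training

@dataclass
class ConsentRecord:
    user_id: str
    ai_training_opt_in: bool     # explicit opt-in; should default to False
    terms_version_accepted: str  # terms version the user actually agreed to

def eligible_for_training(consent: ConsentRecord) -> bool:
    """True only for an explicit opt-in under terms that disclosed AI training."""
    return (
        consent.ai_training_opt_in
        # "YYYY-MM" strings compare correctly in lexicographic order.
        and consent.terms_version_accepted >= AI_TRAINING_TERMS_VERSION
    )

if __name__ == "__main__":
    legacy_user = ConsentRecord("u-1", ai_training_opt_in=False,
                                terms_version_accepted="2023-06")
    opted_in_user = ConsentRecord("u-2", ai_training_opt_in=True,
                                  terms_version_accepted="2025-03")
    print(eligible_for_training(legacy_user))    # False: never opted in
    print(eligible_for_training(opted_in_user))  # True: explicit, current consent
```

The key design choice here is that consent is both opt-in and versioned: data collected under older terms is excluded by default, which avoids the silent, retroactive flip the FTC's warning singles out.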
Being proactive can help software companies better position themselves to leverage AI’s capabilities while guarding against copyright and privacy issues and maintaining user trust. The path ahead may be uncertain, but with careful planning and responsible practices, software companies can navigate these challenges successfully.