Filling the Void: Tracking Industry Solutions to AI Regulatory Challenges
3.11.2024
A tale as old as tech: bleeding-edge technological advancement, regulators seeking to keep pace, and the gray area that sits between the two. Once again, that is where lawyers find themselves on the topic of artificial intelligence.
AI has posed several challenges, many of which are playing out in our courts. Last year, a group of artists filed suit against AI companies for the unauthorized use of their work. Recently, The New York Times filed a lawsuit against Microsoft and OpenAI for copyright infringement. Copyright is one of the most complex issues in the Wild West of AI, and it’s one without any simple answers or solutions.
AI is also disrupting our campuses, requiring educational institutions to come up with ways to combat a steep increase in plagiarism. Further, AI is blamed for bias in the use of facial recognition technology, with potentially harmful repercussions for people already at a disadvantage. But just as people and technology have played a role in creating some of our current challenges, it is also within our collective capability, using the same ingenuity and tools, to forge effective solutions.
Copyright Complexities and Industry Response
One of the legal practice areas most challenged by the increased use of generative AI models is copyright. This is because, in some instances, there are questions as to where the data that powers these models comes from.[1] For example, some models scrape the internet and absorb massive amounts of data, possibly including copyrighted material, which can inadvertently infringe on content contributors’ rights depending on how that material shapes the models’ outputs.[2]
In December 2023, The New York Times joined a growing number of litigants in filing suit against OpenAI alleging copyright infringement.[3] Specifically, The New York Times alleged that OpenAI, creator of ChatGPT, uses its published works to train its AI model and that there have been instances of “blatant regurgitation” of its articles in ChatGPT’s outputs, as opposed to outputs that are truly transformative and would therefore better support OpenAI’s “fair use” arguments.[4]
Though this case is freshly filed, the implications it can have for AI copyright regulations may be significant. It could set precedent and expectations around what constitutes acceptable use of copyrighted materials in generative AI products, what level of documentation and transparency regarding training should be readily available and what rights content contributors may have in this context. This is significant for those who do not have the resources and headline-making capability of The New York Times.
In February 2024, one of the first attempts to reconcile some of these issues came when Google signed a $60 million deal to train its AI models on Reddit users’ posts.[5] This may indicate a future trend in how businesses seek to avoid, or at least limit, liability when building models that ingest large volumes of content from third-party platforms.
Andersen v. Stability AI Ltd.
Social media platforms such as Instagram and X, formerly known as Twitter, raise further questions. They serve as tools for up-and-coming artists to build their brands and gain larger followings by posting their works publicly, and users’ expectations for how those posts will be used matter. Artists may not consent to having their pieces ingested into machine-learning models, yet they have limited recourse when that happens.
Many artists pride themselves on having a unique style. The potential of AI to replicate that style and borrow from their techniques can negatively impact an artist’s bottom line and brand sustainability. To combat this, in January 2023, three artists joined forces to file suit in federal court against popular generative AI platforms for these precise reasons.[6],[7] Unfortunately for the artists, copyright claims cannot be pursued in federal court if a copyright has not been properly registered with the U.S. Copyright Office (USCO), which happened to be the case for many of the works cited in the suit.[8]
Because of that and other defects outlined in Senior U.S. District Judge William H. Orrick’s order, the case was largely dismissed, marking a critical victory for the AI companies named in the complaint.[9] Still, it was not a total loss, as Judge Orrick gave the artists an opportunity to amend their complaints to cure the defects and narrow their scope accordingly.[10] The plaintiffs refiled their complaint in November 2023.[11] One important reminder here for attorneys: urge artist clients to register with the USCO the works they seek to protect.
The USCO also held an open comment period between August and October 2023 for industry stakeholders to weigh in on some of the questions AI has raised about copyright. The questions it posed for comment included:[12]
- “What are your views on the potential benefits and risks of this technology?”
- “Does the increasing use and distribution of AI-generated material raise unique issues for your sector or industry?”
- “Are there any statutory or regulatory approaches that have been adopted or are under consideration in other countries that relate to copyright and AI that should be considered or avoided in the United States?”
- “Is new legislation warranted to address copyright or related issues with generative AI?”[13]
Safeguards and Solutions
As this space continues to develop and we wait for the dust to settle, the question is: what, if anything, can serve as a technical safeguard for content creators in the interim?
As it turns out, academics and various AI developers are making efforts to help solve some of these issues. For starters, while content contributors can opt out of allowing certain developers to use their work, some have challenged the efficacy of this mechanism. Because opting out and requesting removal often requires proof that a model is using your content, exercising the option can prove difficult.
One solution currently being developed at the University of Chicago is Project Nightshade.[14] The project takes an aggressive approach to current AI training practices. Its developers point to existing opt-out mechanisms, stating that they “have been disregarded by model trainers in the past” and “can be easily ignored with zero consequences” because they are “unverifiable and unenforceable.”[15] The team, including lead developers Ben Zhao and Shawn Shan, describes the tool’s functionality this way:
“[I]t is designed as an offense tool to distort feature representations inside generative AI image models. . . . Nightshade is computed as a multi-objective optimization that minimizes visible changes to the original image. While human eyes see a shaded image that is largely unchanged from the original, the AI model sees a dramatically different composition in the image. For example, human eyes might see a shaded image of a cow in a green field largely unchanged, but an AI model might see a large leather purse lying in the grass. Trained on a sufficient number of shaded images that include a cow, a model will become increasingly convinced cows have nice brown leathery handles and smooth side pockets with a zipper, and perhaps a lovely brand logo.”[16]
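For readers curious how such “shading” works mechanically, the sketch below is a purely conceptual illustration in Python and PyTorch, not Nightshade’s actual code: it uses a toy stand-in feature extractor and hypothetical images, and it simply optimizes a small, capped perturbation so the image looks nearly unchanged to a person while mapping to a different concept in the model’s feature space.

```python
# Conceptual sketch only (not Nightshade): push an image's feature-space
# representation toward an unrelated "anchor" image while keeping the
# pixel-space change small, illustrating the multi-objective trade-off.
import torch
import torch.nn as nn

# Toy stand-in feature extractor; a real attack would target actual image models.
extractor = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
extractor.eval()

original = torch.rand(1, 3, 64, 64)   # the artist's image (random placeholder)
anchor = torch.rand(1, 3, 64, 64)     # unrelated concept (e.g., a handbag)
with torch.no_grad():
    target_features = extractor(anchor)

delta = torch.zeros_like(original, requires_grad=True)
optimizer = torch.optim.Adam([delta], lr=0.01)
epsilon, visibility_weight = 0.05, 10.0   # cap and penalty on visible change

for step in range(200):
    shaded = (original + delta).clamp(0, 1)
    feature_loss = nn.functional.mse_loss(extractor(shaded), target_features)
    visibility_loss = delta.abs().mean()      # keep the edit hard to see
    loss = feature_loss + visibility_weight * visibility_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        delta.clamp_(-epsilon, epsilon)       # hard cap on per-pixel change

print(f"max pixel change: {delta.abs().max().item():.3f}")
```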
Distortion of this kind offers content creators some hope of protecting their works. It may be encouraging for them to see these types of tools becoming available, but it would be more reassuring if developers themselves took proactive steps to address these problems. In fact, doing so can be mutually beneficial as rules and regulations form around this technology, because such measures help protect both developers and artists.
As AI developers are frequently summoned before Congress and expected to address general concerns surrounding the safe use and deployment of AI, genuine demonstrations of good faith toward ethical practices can go a long way toward easing those concerns. Whether it’s recognizing artists for their works or identifying deepfakes more effectively, concepts like data provenance, i.e., information about where data came from and how it may have been modified, are vital, and AI content credentials are a great step toward achieving that. Content credentials are embedded metadata used for verification purposes. While digital watermarks have been used in the past to preserve the integrity of content, they are now easy to remove; content credentials, in contrast, are cryptographic and unalterable.[17]
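To illustrate the underlying idea, the hypothetical Python sketch below (using the `cryptography` library) signs a small provenance record bound to an image hash; the actual Content Credentials/C2PA specification defines a much richer manifest format, and the field names here are illustrative only.

```python
# Conceptual sketch of cryptographically signed provenance metadata.
import hashlib
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

signing_key = Ed25519PrivateKey.generate()    # held by the creator or tool vendor
verify_key = signing_key.public_key()         # distributed for verification

image_bytes = b"...raw image bytes..."        # placeholder content
credential = {
    "creator": "Jane Artist",                 # illustrative fields
    "tool": "ExampleEditor 1.0",
    "ai_generated": False,
    "content_sha256": hashlib.sha256(image_bytes).hexdigest(),
}
payload = json.dumps(credential, sort_keys=True).encode()
signature = signing_key.sign(payload)         # binds the metadata to the content

def verify(image: bytes, cred: dict, sig: bytes) -> bool:
    """Recompute the hash and check the signature; any edit to the image
    or the metadata breaks verification."""
    if hashlib.sha256(image).hexdigest() != cred["content_sha256"]:
        return False
    try:
        verify_key.verify(sig, json.dumps(cred, sort_keys=True).encode())
        return True
    except InvalidSignature:
        return False

print(verify(image_bytes, credential, signature))              # True
print(verify(image_bytes + b"tamper", credential, signature))  # False
```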
Attempts to bring solutions like content credentials into the mainstream are being spearheaded by companies like Adobe, a member of the Content Authenticity Initiative and co-founder of the Coalition for Content Provenance and Authenticity, whose members include Intel and Microsoft.[18] Both groups are focused on creating standards for sharing digital content across platforms and websites.[19] The mobile phone industry is undergoing a similar transformation, with brands including Samsung and Motorola rolling out newer devices with content credential capability.[20] Tools like these are worth watching as means of preserving integrity and transparency, and attorneys can work with their clients to identify the appropriate ones.
Pioneers in deploying technical defensive safeguards can play a major role in shaping the future regulations or controls that the industry may be expected to follow. Even if not explicitly prescribed in a regulation, such safeguards can become industry standard, similar to how encryption and multifactor authentication are commonly available to users today.
AI and Plagiarism
OpenAI’s launch of ChatGPT threw the long-running AI discussion into hyperdrive when the application reached 100 million monthly active users only two months after its public launch in November 2022, making it the fastest-growing consumer application in history.[21] Unfortunately, as users began to experiment with its capabilities, misuse and unintended outcomes accompanied that exploration. Namely, students became aware that they could have AI write unique outputs in response to unique prompts; they no longer had to read books to do book reports, or really do much of anything, to produce a multi-page essay, solve a science problem or recall a historically significant moment – and teachers began to catch on. Education is an industry dependent on self-governance, which tends to come in the form of academic handbooks and similar policies. Like the legal environment, these handbooks most likely have not addressed AI directly. Also like the legal environment, schools could technically point to existing, broad rules, and administrators could likely defer to customary practice, which prohibits plagiarism and any other conduct that goes against the spirit of academic honesty and integrity and could reasonably be deemed cheating.
Still, the issue is not in clarifying the wrongness of using AI in these circumstances; the issue is detecting it. Just as the law can be difficult to apply to significant advances in technology, academia’s self-governance model, through the use of now-outdated plagiarism trackers, can present similar challenges. Enter Edward Tian, who, while completing his senior year at Princeton University, launched GPTZero at around the same time that ChatGPT was breaking user acquisition records in January 2023.[22] With this new technology, the fight against advanced plagiarism was now purportedly balanced, as GPTZero’s purpose is to detect AI-generated content, although it has been criticized for producing false positives. Regardless, in October 2023, the American Federation of Teachers signed a deal with GPTZero to assist teachers in identifying possible plagiarism.[23]
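Detectors in this space have been reported to rely in part on statistical signals such as perplexity, i.e., how predictable a passage is to a language model. The sketch below is not GPTZero’s actual method and uses a made-up threshold; it simply shows how such a signal can be computed with an off-the-shelf GPT-2 model.

```python
# Illustrative sketch: score how "machine-predictable" a passage is by
# computing its perplexity under GPT-2. Real detectors combine several
# signals; the threshold below is invented for illustration.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels=input_ids makes the model return average next-token loss.
        outputs = model(**inputs, labels=inputs["input_ids"])
    return torch.exp(outputs.loss).item()

essay = "The causes of the French Revolution were complex and varied..."
score = perplexity(essay)
# Lower perplexity suggests more predictable, possibly machine-generated text;
# a false positive here is exactly the risk critics of such tools point to.
print(f"perplexity: {score:.1f}", "-> flag for review" if score < 30 else "-> likely human")
```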
Facial Recognition Technology
The Black Lives Matter movement has highlighted important discussions about the use of facial recognition technology. Concerns have been raised about potential biases and the need for responsible use, as well as law enforcement tracking of protesters at rallies.[24] These discussions are vital as they guide us toward more equitable and transparent applications of AI technologies. In a report published by the National Institute of Standards and Technology, studies demonstrated that algorithms falsely identified Black and Asian faces 10 to 100 times more often than white faces.[25]
Several facial recognition technology developers have since ceased development and distribution of this innovation.[26] While different algorithms may produce different results, and technical enhancements are rapid in this space, struggles with the technology persist to this day. In December 2023, the Federal Trade Commission announced that popular national drugstore chain Rite Aid would be prohibited from using facial recognition technology for surveillance purposes for five years, citing Rite Aid’s “reckless use” of the technology that “left its customers facing humiliation and other harms.”[27] Among the transgressions listed in the FTC complaint, Rite Aid failed to:
- Consider and mitigate potential risks to consumers from misidentifying them, including heightened risks to certain consumers because of their race or gender;
- Test, assess, measure, document or inquire about the accuracy of its facial recognition technology before deploying it;
- Prevent the use of low-quality images in connection with its facial recognition technology, increasing the likelihood of false-positive match alerts;
- Regularly monitor or test the accuracy of the technology after it was deployed; and
- Adequately train employees tasked with operating facial recognition technology in its stores and flag that the technology could generate false positives.[28]
It did not help that Rite Aid had also violated a 2010 FTC order by failing to adequately implement a comprehensive information security program.[29] In light of these circumstances, there has been a boom over the years in anti-facial-recognition fashion and art, including masks, LED visors and even knit sweaters designed to confuse recognition software.[30] While it may not be feasible to suggest that clients and developers invest in these fashion accessories, the FTC’s Rite Aid order does outline helpful guidelines and protocols for the proper and safer use of facial recognition technology.
General Best Practices
A simplified overview of where we find ourselves today: AI is a fast-developing technology with a strikingly steep adoption curve, which can present new risks. To help address those risks, new tools and markets are emerging. As regulations surrounding AI continue to evolve, and regardless of the final shape they take, those involved can be guided by some basic principles that serve both to insulate companies from potential liability and to protect content creators.
First, blind trust in autonomous technologies without any human oversight is imprudent. Some lawyers who attempted to rely fully on AI found out the hard way, via sanctions or even job termination, that some AI tools can “hallucinate” (i.e., produce plausible but incorrect output based on unintended patterns they recognize) when generating case law.[31] GPTZero has experienced issues with false positives; in one example, it claimed that the U.S. Constitution was drafted with the help of AI.[32] Therefore, if you or your client see areas of your business where there is full automation without any oversight, especially where sensitive data is involved, be aware of the risks.
Second, honest approaches to AI self-governance, in lieu of fully fleshed-out regulations, should lean on existing principles of ethical data stewardship. Organizations collecting and processing potentially sensitive (or otherwise regulated) data should already be implementing meaningful forms of transparency, consent and security, so the emergence of AI should not present any surprises there.
This is critical both for developers of the technology and for those seeking to procure it. Developers should clarify how their models operate, what data they ingest and how they ingest it, and they should ensure that any potentially sensitive data is secured through appropriate encryption protocols.
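As a minimal illustration of that last point, the hypothetical snippet below uses Python’s `cryptography` library to encrypt a record at rest before it enters a data pipeline; real deployments would also need key management, access controls and encryption in transit.

```python
# Minimal illustration of encrypting potentially sensitive data at rest
# before it enters a training or analytics pipeline.
from cryptography.fernet import Fernet

key = Fernet.generate_key()      # in practice, stored in a key management service
cipher = Fernet(key)

record = b'{"user_id": "hypothetical-123", "prompt": "draft my contract ..."}'
encrypted = cipher.encrypt(record)       # what actually lands on disk
decrypted = cipher.decrypt(encrypted)    # only holders of the key can read it

assert decrypted == record
print(encrypted[:40], b"...")
```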
One way tech companies like IBM, Meta and Microsoft have already begun to proactively address ethical AI is by pledging to voluntary commitments outlined by the White House.[33] In addition to this gesture of good faith, which involves committing to practices that touch upon safety, security, trust and five other pillars,[34] a number of these companies bolster those commitments through published resources outlining best practices for responsible AI self-governance.[35] Attorneys may want to note these commitments and advise their AI developer clients to consider making similar guarantees to their customers (and to have the internal processes to make good on them). At a minimum, attorneys should ensure, whether representing content creators or AI developers, that the platforms’ terms of use are continually updated and speak to whether works may be used for model training. Being mindful of the FTC’s position on this process is also critical, as the commission recently published a blog post making it clear to AI developers that “quietly changing your terms of service could be unfair or deceptive,” which could result in possible enforcement actions.[36] Thus, merely making passive changes to policies without clear and explicit notice to users can result in liability.
As a last note, grace goes a long way. It is easy to vilify developers for making mistakes as they innovate and grow, but there is a learning curve for stakeholders industrywide. Not every outcome is foreseeable, but if we continue to take steps toward embracing this technology and employing ethical practices, the future for AI offers some exciting possibilities.
Matthew Lowe is a senior data privacy and AI attorney at IBM. He is a fellow of information privacy with the International Association of Privacy Professionals and is a lecturer at the University of Massachusetts Amherst, where he teaches courses in data privacy, cyber law, and AI ethics. He also serves on NYSBA’s Committee on Technology and the Legal Profession.
[1] Melissa Heikkilä, OpenAI’s Hunger for Data Is Coming Back To Bite It, MIT Technology Review, Apr. 19, 2023, https://www.technologyreview.com/2023/04/19/1071789/openais-hunger-for-data-is-coming-back-to-bite-it/.
[2] Id.
[3] Tom Chavez, OpenAI & The New York Times: A Wake-Up Call for Ethical Data Practices, Forbes, Jan. 31, 2024, https://www.forbes.com/sites/tomchavez/2024/01/31/openai–the-new-york-times-a-wake-up-call-for-ethical-data-practices/?sh=3788b9de2348.
[4] Id.
[5] Anna Tong et al., Exclusive: Reddit in AI Content Licensing Deal With Google, Reuters, Feb. 21, 2024, https://www.reuters.com/technology/reddit-ai-content-licensing-deal-with-google-sources-say-2024-02-22/.
[6] Andersen v. Stability AI Ltd., No. 3:23-cv-00201 (N.D. Cal. Oct. 30, 2023).
[7] James Vincent, AI Art Tools Stable Diffusion and Midjourney Targeted With Copyright Lawsuit, The Verge, Jan. 16, 2023, https://www.theverge.com/2023/1/16/23557098/generative-ai-art-copyright-legal-lawsuit-stable-diffusion-midjourney-deviantart.
[8] Carl Franzen, Midjourney, Stability AI and DeviantArt Win a Victory in Copyright Case by Artists – but the Fight Continues, VentureBeat, Oct. 30, 2023, https://venturebeat.com/ai/midjourney-stability-ai-and-deviantart-win-a-victory-in-copyright-case-by-artists-but-the-fight-continues/.
[9] Id.
[10] Id.
[11] Andersen v. Stability AI Ltd., No. 3:23-cv-00201 (N.D. Cal. Oct. 30, 2023).
[12] Artificial Intelligence and Copyright, U.S. Copyright Office, https://public-inspection.federalregister.gov/2023-18624.pdf.
[13] Emilia David, US Copyright Office Wants To Hear What People Think About AI and Copyright, The Verge, Aug. 29, 2023, https://www.theverge.com/2023/8/29/23851126/us-copyright-office-ai-public-comments.
[14] What Is Nightshade?, The University of Chicago, https://nightshade.cs.uchicago.edu/whatis.html.
[15] Id.
[16] Id.
[17] Rashi Shrivastava, Content Credentials That Label AI-Generated Images Are Coming To Mobile Phones and Cameras, Forbes, Oct. 27, 2023, https://www.forbes.com/sites/rashishrivastava/2023/10/27/content-credentials-that-label-ai-generated-images-are-coming-to-mobile-phones-and-cameras/?sh=3fb0f24b208c.
[18] Content Credentials, Adobe, Sep. 13, 2023, https://helpx.adobe.com/ca/creative-cloud/help/content-credentials.html.
[19] Id.
[20] Supra note 17.
[21] Krystal Hu, ChatGPT Sets Record for Fastest-Growing User Base – Analyst Note, Reuters, Feb. 2, 2023, https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/.
[22] Emma Bowman, A College Student Created an App That Can Tell Whether AI Wrote an Essay, NPR, Jan. 9, 2023, https://www.npr.org/2023/01/09/1147549845/gptzero-ai-chatgpt-edward-tian-plagiarism.
[23] Megan Cerullo, American Federation of Teachers Partners With AI Identification Platform, GPTZero, CBS News, Oct. 17, 2023, https://www.cbsnews.com/news/gptzero-ai-detector/.
[24] Matthew Lowe, All Eyes on U.S.: Regulating the Use & Development of Facial Recognition Technology, 48 Rutgers Computer & Tech. L.J. 1 (2021).
[25] Natasha Singer & Cade Metz, Many Facial-Recognition Systems Are Biased, Says U.S. Study, N.Y. Times, Dec. 19, 2019, https://www.nytimes.com/2019/12/19/technology/facial-recognition-bias.html.
[26] Supra note 3.
[27] Rite Aid Banned from Using AI Facial Recognition After FTC Says Retailer Deployed Technology without Reasonable Safeguards, FTC, Dec. 19, 2023, https://www.ftc.gov/news-events/news/press-releases/2023/12/rite-aid-banned-using-ai-facial-recognition-after-ftc-says-retailer-deployed-technology-without.
[28] Id.
[29] Id.
[30] Thomas Germain, 10 Pieces of Fashion You Can Wear to Confuse Facial Recognition, Gizmodo, Feb. 25, 2023, https://gizmodo.com/anti-facial-recognition-fashion-what-to-wear-mask-1850093479.
[31] Pranshu Verma and Will Oremus, These Lawyers Used ChatGPT To Save Time. They Got Fired and Fined, Wash. Post, Nov. 16, 2023, https://www.washingtonpost.com/technology/2023/11/16/chatgpt-lawyer-fired-ai/.
[32] Benj Edwards, Why AI Detectors Think the US Constitution Was Written by AI, Ars Technica, Jul. 14, 2023, https://arstechnica.com/information-technology/2023/07/why-ai-detectors-think-the-us-constitution-was-written-by-ai/.
[33] Diane Bartz and Trevor Hunnicutt, Adobe, Others Join Voluntary US Scheme to Manage AI Risk, Reuters, Sept. 12, 2023, https://www.reuters.com/sustainability/adobe-others-join-white-houses-voluntary-commitments-ai-2023-09-12/.
[34] White House, Voluntary Commitments, https://www.whitehouse.gov/wp-content/uploads/2023/09/Voluntary-AI-Commitments-September-2023.pdf.
[35] AI Ethics, IBM, https://www.ibm.com/impact/ai-ethics.
[36] FTC, AI (and Other) Companies: Quietly Changing Your Terms of Service Could Be Unfair or Deceptive, Feb. 13, 2024, https://www.ftc.gov/policy/advocacy-research/tech-at-ftc/2024/02/ai-other-companies-quietly-changing-your-terms-service-could-be-unfair-or-deceptive.