The Oakland Press Blogs: The Law Blogger: The Grey Lady Sues Open AI and Microsoft for Copyright Violations

The old sues the new; our latest "clash of titans" takes the form of one of the most significant lawsuits filed this year. Full disclosure: this post was not generated or assisted by any AI tool; it is the original work product of the author, Attorney Timothy P. Flynn.

Last week, the New York Times sued Open AI and its affiliated companies, along with Microsoft, alleging copyright infringement. The Times alleges that Open AI, in developing its proprietary machine learning neural network, scraped millions of copyrighted Times articles and other protected content from the Internet.

Further, the Times alleges that, despite efforts to negotiate a resolution with the defendant companies, the paper is now owed money damages for the use of its content. This claim rests on the Times' allegation that it has made a significant investment in its news platform over more than 170 years. The complaint was filed in the United States District Court in Manhattan.

Core Allegations in the NYT Complaint

The first paragraph of the Times' complaint fittingly reads like a piece of finely-tuned journalism:

Independent journalism is vital to our democracy. It is also increasingly rare and valuable. For more than 170 years, The Times has given the world deeply reported, expert, independent journalism. Times journalists go where the story is, often at great risk and cost, to inform the public about important and pressing issues. They bear witness to conflict and disasters, provide accountability for the use of power, and illuminate truths that would otherwise go unseen. Their essential work is made possible through the efforts of a large and expensive organization that provides legal, security, and operational support, as well as editors who ensure their journalism meets the highest standards of accuracy and fairness. This work has always been important. But within a damaged information ecosystem that is awash in unreliable content, The Times’s journalism provides a service that has grown even more valuable to the public by supplying trustworthy information, news analysis, and commentary.

These are the self-proclaimed attributes the Times asserts it brings to world-class professional journalism:

Investigative reporting;
Breaking news reporting;
Beat reporting;
Reviews and analysis;
Commentary and opinion;
10.1 million digital and print subscribers worldwide;
250 articles published every day; and
an unparalleled archive of content.

At certain points, the complaint editorializes that the cost of the world-class journalism the Times brings to the news-consuming public has drastically increased due to generative AI products that flood "today's information ecosystem". The complaint notes the hundreds of newspapers that have gone out of business as a direct result of the Internet and laments the open floodgates of "misinformation".

To protect its work product, the Times alleges that it has copyrighted every edition of its newspaper for over 100 years, that it has deployed a paywall, and that it maintains strict licensing agreements.

Here are the counts of the complaint which lay out the legal theories of liability for the Defendant companies:

Count I - Copyright Infringement

Count II - Vicarious Copyright Infringement

Counts III and IV - Contributory Copyright Infringement

Count V - Violation of the Digital Millennium Copyright Act

Count VI - Unfair Competition by Misappropriation

Count VII - Trademark Dilution

In its prayer for relief, the Times seeks statutory and compensatory damages; disgorgement; an injunction against ChatGPT; destruction of all ChatGPT models that use NYT content in violation of its copyrights; and, of course, attorney fees.

The NYT is bringing the house in this suit, all within the context of artificial intelligence, artificial general intelligence, and machine learning.

Machine Learning Basics

The term "Artificial Intelligence" is one of the most grotesque misnomers of all time. Tech industry professionals eschew the term in favor of the more accurate "machine learning".

Good old-fashioned AI was a complex system of math-based rules. Then, sometime around the turn of the century, neural networks (computer systems designed to function more like a human brain) began to develop along with high-capacity supercomputers, giving birth to the new era of AI or, more appropriately, machine learning.

The idea behind machine learning is that language, entered through prompts, is broken down into its basic component parts (words and characters), each of which is assigned a numeric value. With massive computing capacity behind it, the machine then uses probability to determine an accurate, or humanly appropriate, output in response to a given prompt. During the design and training process, various outputs are then ranked. Thus, through a series of prompts, the computer learns to provide a better, more responsive, higher quality output.
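For readers who want to see the idea in miniature, here is a toy sketch in Python. It is only an illustration under loud assumptions: the function names and the one-sentence sample text are made up for this post, and counting which word follows which is a drastic simplification, not how ChatGPT or any commercial model is actually built. It shows the two steps described above: words are assigned numeric values, and probability is used to pick a likely next word. The ranking of outputs is not shown.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase word tokens (a crude stand-in for real tokenizers)."""
    return text.lower().split()


def build_vocab(tokens):
    """Assign each distinct token a numeric ID in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def next_word_probs(tokens):
    """Count which word follows which, then convert the counts to probabilities."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    probs = {}
    for word, followers in counts.items():
        total = sum(followers.values())
        probs[word] = {w: c / total for w, c in followers.items()}
    return probs


def predict_next(probs, word):
    """Return the most probable next word, or None if no follower was ever seen."""
    followers = probs.get(word)
    if not followers:
        return None
    return max(followers, key=followers.get)


# Hypothetical sample text, invented for this illustration.
corpus = "the court heard the case and the court ruled"
tokens = tokenize(corpus)
vocab = build_vocab(tokens)
probs = next_word_probs(tokens)

print(vocab)                       # each word mapped to a number
print(probs["the"])                # probabilities of the words that follow "the"
print(predict_next(probs, "the"))  # the single most likely next word
```

In this sample, "court" follows "the" two times out of three, so the sketch predicts "court". Real systems work with vastly more text and far more sophisticated math, but the basic move of turning words into numbers and choosing the probable continuation is the same in spirit.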

Another feature of modern machine learning is the large language model [LLM]. The machine is trained on a massive amount of language data, and it draws on what it learned from that data to fashion its natural language response to a specific set of prompts. While human users think up the prompts, it's the machine that has the benefit of the massive LLM and the vast stores of data behind it. Think in terms of the Library of Congress combined with every college library on the planet, and then some.

In the NYT copyright infringement lawsuit, the newspaper alleges that when Open AI scraped all manner of language data from the Internet to train its ChatGPT program, it swept up proprietary NYT content in the process. The Times further alleges that ChatGPT favors the NYT "style" of language as it lends itself to a highly ranked quality output. Makes sense when you think about it. If ChatGPT responds to a series of prompts in the manner of a seasoned NYT journalist, the AI user is ahead of the game.

One of the many interesting allegations contained in the NYT complaint is that ChatGPT's first two versions were constructed on open source platforms with detailed specifications made public. Not so with ChatGPT's third and fourth iterations, notes the Times. This is because, according to the newspaper, Open AI purposely concealed the data it copied from the Internet to train its latest computer models.

As an offer of proof set forth in the complaint itself, the NYT compares ChatGPT output with the text of a NYT article; the similarity is unmistakable. Plagiarism, says the Times. In another example, the Times cites a prompt in which the user complained of being "paywalled out" of a specific NYT article and asked ChatGPT to reproduce a portion of the article. The program complied with alacrity, reproducing the copyrighted and paywalled text.

Another very interesting offer of proof and allegation of injury is the Times' assertion that ChatGPT committed what is known in AI parlance as "hallucination". Hallucination occurs when a machine, like a chatbot, generates plausible-sounding output that does not correspond to its source material or to real-world facts; "misinformation", says the Times. The complaint cites an example where the prompt seeks a reproduction of the sixth paragraph of a specific NYT article, referenced by date, title and author. The output, however, contains non-existent quotes and other text not found in the article. This has obvious implications for the Times' journalistic reputation and could lead to a claim for damages.

It will be interesting to see how Open AI and Microsoft respond to these highly specific allegations.

What's happening over at Open AI?

You may have heard about all the drama over at Open AI when it suddenly fired its CEO, Sam Altman, last fall. Open AI originally started out as a non-profit, as noted in the NYT complaint. Its stated mission back in 2015 was to develop AI for the good of humanity, not to maximize profits. The company's board of directors had a distinctly non-tech world look: mostly academics and other non-profit professionals, except for Altman, whose tech credentials are solid.

Despite its stated mission, as the potential for this powerful computing tool came into better focus, Microsoft jumped aboard with billions in venture capital in exchange for a 49% ownership of Open AI's for-profit subsidiary. Microsoft, with its myriad tech professional contacts, supported Altman's installation as CEO.

Last November, however, a giant misunderstanding on the board of directors led to Altman's firing, amid great backlash from Open AI's employees and from Microsoft, its benevolent investor. Some key folks at Microsoft quietly, then not so quietly, reached out to the Open AI board and got Altman reinstated. All seems to be well for the moment. Then, last week, here comes the NYT lawsuit.

What's Next in the Lawsuit?

The Defendant companies now have the option to answer the Times' complaint or, in lieu of an answer, they can file a motion to dismiss pursuant to the Federal Rules of Civil Procedure.

Given the recent board of directors drama, we will stay tuned to what Open AI and Microsoft do next. They need to focus on this lawsuit because if they lose, every content generator, including this 15-year-old blog with its nearly 650 posts, will have their collective hands out for a portion of Open AI's profits generated from our content.

With Manhattan as the venue, this lawsuit will feature a high-tech litigation battle between some of the most sophisticated law firms in the world. The four Big Law firms representing the NYT hail from New York, Washington, DC, Seattle, and Los Angeles.

Post #637

www.clarkstonlegal.com

Labels: artificial intelligence, ChatGPT, copyright, federal court, large language model, lawsuit, machine learning, Microsoft, New York Times, Open AI, Sam Altman
