In a recent Thought Leadership piece on Techcouver, “Don’t Scare AI Companies Away, Canada – They’re Building the Future”, Alistair Vigier, CEO of CasewayAI, a self-described “legal technology company that uses AI to make the law easier”, drew a false dichotomy between the legitimate steps taken by content creators to protect their rights under the law and Canada’s ability to innovate and grow its AI industry.
In doing so, he repeated misinformation about how AI training deals with copyrighted content and raised the false spectre that investment and innovation could flee Canada because of recent lawsuits, one of which targets his company.
He states that a recent lawsuit brought by a consortium of Canadian media companies against OpenAI is based on a misunderstanding of how AI works. In his op-ed, he claims that “AI systems like OpenAI rely on publicly available data to learn and improve. This does not equate to stealing content.” Moreover, he equates the ingestion of content by AI training models with a human being reading a book: “This is how AI works. The AI ‘reads’ as much as it can, gets really ‘smart,’ and then explains what it knows when you ask it a question. Like a human learns from reading the news, so does an AI.”
This is a fundamental misunderstanding of the nature of copyright law, and how AI deals with content used in training. The fact that content may be “publicly available” is irrelevant when it comes to unauthorized copying. A book in a library is “publicly available”, but that confers no right to reproduce the book. Copying selected excerpts may be permissible under Canada’s fair dealing provisions, but we are talking about the holus-bolus copying of full texts, images, musical scores and whatever else the AI vacuum sucks up. As for the analogy with the human brain, a brain absorbs content and learns from it, but it does not make a copy.
Mr. Vigier implies that no reproduction or copying occurs during the AI training process. “OpenAI’s models do not reproduce articles verbatim; they process vast datasets to identify patterns, enabling insights and efficiency.” The AI industry continues to muddy the waters by claiming that when content is “ingested” it is converted to numeric data and is thus not actually copied, but it is well established that in “processing” content to convert it to data, copies are made. Moreover, the New York Times in its separate suit in the US against OpenAI has demonstrated that by typing in leads of articles, it can prompt OpenAI to reproduce verbatim the rest of the article. OpenAI’s response was to claim that the Times had “tricked” the AI. Converting copied content to another form, such as from hard copy text to a digital version, or to a dataset, is still copying. The Copyright Act is explicit that copyright confers “the sole right to produce or reproduce the work or any substantial part thereof in any material form whatever”.
So much for the misinformation. Now for the scare tactics. He suggests that lawsuits will “stifle innovation”, including the one brought against his own company by the Canadian Legal Information Institute (CanLII), which is suing CasewayAI for copyright infringement and violation of its Terms of Use, and is seeking damages plus an injunction barring CasewayAI from using the content in question. He says, “The lawsuits against Caseway and OpenAI message tech companies: you’re not welcome here. If this continues, Canada won’t just lose its AI startups; it will lose the future of job creation.”
As tendentious and self-interested as this statement is, it is also not borne out by the evidence. There are currently more than 30 lawsuits in the US pitting content companies against AI developers in an attempt to clarify the intent and meaning of the law. There are similar lawsuits in the UK, the EU, even in India! Yet, as far as I know, innovation and AI development are proceeding apace in those jurisdictions. Mr. Vigier seems to feel the competition comes from countries where “laws and courts are more innovation-friendly”. Countries like Dubai and the Bahamas.
This is not only patently ridiculous but fearmongering of the worst kind, based on an inaccurate and misinformed understanding of how AI is developed and trained. Moreover, it impugns the legitimate right of rightsholders to seek the protection of the law for their creativity and investment in content. Robust AI development needs to go hand in hand with robust copyright protection for creators, with an appropriate sharing of the spoils of the new wealth created from the creative output of authors, artists, musicians and other rightsholders.
That leads to licensing solutions, but licensing solutions will only work when fair compensation is paid for the taking of proprietary content for use in developing a commercial application. Licensing is at the heart of the media lawsuits against OpenAI in both the US and Canada. When licensing negotiations break down, court action is usually the sequel.
To suggest that enforcement of rights under the law by content owners will hamper the development of AI and job creation in Canada is to stand the world on its head. Assertion of rights by rightsholders, based on undisputed evidence that material is being copied without permission, will facilitate the establishment of licensing regimes that work for both parties. Misinformation and fearmongering do not serve the interests of either the content or the AI industry in Canada.
Hugh Stephens is Editor of the award-winning International Copyright blog and author of the recently published book In Defence of Copyright (Cormorant Books).
Shariq says
It is still not clear whether AI content can be licensed.
Shauna Davé says
It cannot in Canada. It’s impossible to license case law.