LLMs unleashed: Navigating the chaos of online experimentation


Are you ready to bring more awareness to your brand? Consider becoming a sponsor for The AI Impact Tour. Learn more about the opportunities here.

In an audacious move that defies conventional wisdom, generative AI companies have embraced a cutting-edge approach to quality assurance: Releasing large language models (LLMs) directly into the wild, untamed realms of the internet.

Why bother with tedious testing phases when you can harness the collective might of the online community to uncover bugs, glitches and unexpected features? It’s a bold experiment in trial by digital fire, where every user becomes an unwitting participant in the grand beta test of the century.

Strap in, folks, because we’re all on this unpredictable ride together, discovering LLMs’ quirks and peculiarities one prompt at a time. Who needs a safety net when you have the vast expanse of the internet to catch your errors, right? Don’t forget to “agree” to the Terms and Conditions.

Ethics and accuracy are optional

The chaotic race to release or utilize gen AI LLM models seems like handing out fireworks — sure, they dazzle, but there’s no guarantee they won’t be set off indoors! Mistral, for one, recently launched its 7B model under Apache 2.0 licenses; however, in the absence of explicit constraints, there is a concern regarding the potential for misuse. 

VB Event

The AI Impact Tour

Connect with the enterprise AI community at VentureBeat’s AI Impact Tour coming to a city near you!


Learn More

As seen in the example below, minor adjustments of parameters behind the scenes can result in completely different outcomes. 

Biases embedded in algorithms and the data they learn from can perpetuate societal inequalities. CommonCrawl, which uses Apache Nutch based web-crawler, constitutes the bulk of the training data for LLMs: 60% of GPT-3’s training dataset and 67% of LLaMA’s dataset. While highly beneficial for language modeling, it operates without comprehensive quality control measures. Consequently, the onus of selecting quality data squarely falls upon the developer. Recognizing and mitigating these biases are imperative steps toward ethical AI deployment.

Developing ethical software should not be discretionary, but mandatory. 

However, if a developer chooses to stray from ethical guidelines, there are limited safeguards in place. The onus lies not just on developers but also on policymakers and organizations to guarantee the equitable and unbiased application of gen AI. 

In Figure 3, we see another example in which the models, if misused, can have potential impacts that may go far beyond the intended use and raise a key question:

Who is liable?

In the fantastical land of legal jargon where even the punctuation marks seem to have lawyers, the terms of services loosely translate to, “You’re entering the labyrinth of limited liability. Abandon all hope, ye who read this (or don’t).”

The terms of services for gen AI offerings neither guarantee accuracy nor assume liability (Google, OpenAI) and instead rely on user discretion. According to a Pew Research Center report, many users of these services are doing so to learn something new, or for tasks at work and may not be equipped to differentiate between credible and hallucinated content. 

The repercussions of such inaccuracies extend beyond the virtual realm and can significantly impact the real world. For instance, Alphabet shares plummeted after Google’s Bard chatbot incorrectly claimed that the James Webb Space Telescope had captured the world’s first images of a planet outside of our solar system.

The application landscape of these models is continuously evolving, with some of them already driving solutions that involve substantial decision-making. In the event of an error, should the responsibility fall on the provider of the LLMs itself, the entity offering value-added services utilizing these LLMs, or the user for potential lack of discernment?

Picture this: You’re in a car accident. Scenario A: The brakes betray you, and you end up in a melodramatic dance with a lamppost. Scenario B: You, feeling invincible, channel your inner speed demon while DUI and bam! Lamppost tango, part two.

The aftermath? Equally disastrous. But hey, in Scenario A, you can point a finger at the car company and shout, ‘You let me down!’ In Scenario B, though, the only one you can blame is the person in the mirror — and that’s a tough conversation to have. The challenge with LLMs is that brake failure and DUI may happen simultaneously.

Where is ‘no-LLM-index’

The noindex rule, set either with the meta tag or HTTP response header requests the search engines to drop the page from being indexed. Perhaps, a similar option (no-llm-index) should be available for content creators to opt out of LLMs processing. LLMs are not compliant with the requirements under California Consumer Privacy Act of 2019 (“CCPA”) request to delete or GDPR’s right to erasure.

Unlike a database, in which you know exactly what information is stored and what should be deleted when a consumer requests to do so, LLMs operate on a different paradigm. They learn patterns from the data they are trained on, allowing them to generate human-like text.

When it comes to deletion requests, the situation is nuanced. LLMs do not have a structured database where individual pieces of data can be selectively removed. Instead, they generate responses based on the patterns learned during training, making it challenging to pinpoint and delete specific pieces of information.

A pivotal moment in the legal sphere occurred in 2015 when a U.S. appeals court established that Google’s scanning of millions of books for Google Books limited excerpt of copyrighted content constituted “fair use.” The court ruled that scanning of these books is highly transformative, the public display of the text is limited and the display is not a market substitute for the original. 

However, gen AI transcends these boundaries, delving into uncharted territories where legal frameworks struggle to keep pace. Lawsuits have emerged, raising pertinent questions about compensating content creators whose work fuels the algorithms of LLM producers.

OpenAI, Microsoft, Github, and Meta have found themselves entangled in legal wrangling, especially concerning the reproduction of computer code from copyrighted open-source software. 

Content creators on social platforms already monetize their content and the option to opt-out versus monetize the content within the context of LLMs should be the creator’s choice.

Navigating the future

Quality standards vary across industries. I have come to terms with my Amazon Prime Music app crashing once a day. In fact, as reported by AppDynamics, applications experience a 2% crash rate, although it is not clear from the report if it includes all the apps (including Prime Music?) or the ones that are AppDynamics customers and care about failure and still exhibit a 2% crash rate. Even a 2% crash rate in healthcare, public utilities or transportation would be catastrophic.

However, expectations regarding LLMs are still being recalibrated. Unlike app crashes, which are tangible events, determining when AI experiences breakdowns or engages in hallucination is considerably more challenging due to the abstract nature of these occurrences. 

As gen AI continues to push the boundaries of innovation, the intersection of legal, ethical and technological realms beckons comprehensive frameworks. Striking a delicate balance between fostering innovation and preserving fundamental rights is the clarion call for policymakers, technologists and society at large.

China’s National Information Security Standardization Technical Committee has already released a draft document proposing detailed rules on how to determine the issues associated with gen AI. President Biden issued an Execute Order on Safe, Secure and Trustworthy AI, on and the assumption is that other government organizations across the world will follow suit.

In all honesty, once the AI genie is out of the bottle, there’s no turning back. We’ve witnessed similar challenges before — despite the prevalence of fake news on social media, platforms like Facebook and Twitter have managed little more than forming committees in response.

LLMs need a vast amount of training data and the internet just gives that up — for free. Creating such extensive datasets from scratch is practically impossible. However, constraining the training solely to high-quality data, although challenging, is possible, but might raise additional questions around the definition of high-quality and who determines that.

The question that lingers is whether LLM providers will establish committee after committee, pass the baton to the users — or, for a change, actually do something about it.

‘Till then, fasten your seat belt. 

Amit Verma is the head of engineering/AI labs and founding member at Neuron7.


Welcome to the VentureBeat community!

DataDecisionMakers is where experts, including the technical people doing data work, can share data-related insights and innovation.

If you want to read about cutting-edge ideas and up-to-date information, best practices, and the future of data and data tech, join us at DataDecisionMakers.

You might even consider contributing an article of your own!

Read More From DataDecisionMakers


Source link