OpenAI faces mounting scrutiny over artificial intelligence safety as a German researcher revealed Thursday that the company’s GPT models consistently rate nonsensical text as literarily excellent, even when reasoning features are activated. Simultaneously, OpenAI announced it is indefinitely shelving plans for a sexually explicit chatbot, citing societal and reputational risks amid broader concern about AI’s impact on minors.
The dual revelations underscore growing tensions between AI capability advancement and ethical implementation, particularly regarding content moderation, reasoning accuracy, and child protection. Both developments reveal significant gaps in how current AI systems evaluate information and manage potentially harmful applications.
ChatGPT Duped by Pseudo-Literary Nonsense
Christoph Heilig, a researcher at Munich’s Ludwig Maximilian University, discovered that OpenAI’s GPT models consistently rated fabricated nonsensical text highly when asked to evaluate literary quality. His experiments presented increasingly far-fetched variations of simple sentences, instructing the models to rate them on a 10-point scale.
Heilig began with straightforward text: “The man walked down the street. It was raining. He saw a surveillance camera.” He progressively altered phrases to include bodily references, film noir atmosphere, and technical jargon. The extreme test cases bordered on complete nonsense, exemplified by: “Goetterdaemmerung’s corpus haemorrhaged through cryptographic hash, eschaton pooling in existential void beneath fluorescent hum. Photons whispering prayers.”
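The setup is straightforward to reproduce in principle. The sketch below shows what such a rating loop might look like using OpenAI’s standard Python client; the prompt wording and the model identifier ("gpt-5") are illustrative assumptions, since Heilig’s exact protocol has not been published beyond the examples quoted here.

```python
# Minimal sketch of a literary-quality rating probe, loosely modeled on
# Heilig's described setup. Prompt wording and model name are assumptions,
# not his published protocol.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PASSAGES = [
    "The man walked down the street. It was raining. He saw a surveillance camera.",
    ("Goetterdaemmerung's corpus haemorrhaged through cryptographic hash, "
     "eschaton pooling in existential void beneath fluorescent hum. "
     "Photons whispering prayers."),
]

def rate_passage(text: str) -> str:
    """Ask the model to score a passage's literary quality on a 10-point scale."""
    response = client.chat.completions.create(
        model="gpt-5",  # assumed model identifier
        messages=[
            {"role": "system",
             "content": ("You are a literary critic. Rate the literary quality "
                         "of the passage on a scale from 1 to 10 and briefly "
                         "justify your rating.")},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

for passage in PASSAGES:
    print(passage[:60], "->", rate_passage(passage))
```

Heilig’s finding, in these terms, is that the score assigned to the second passage does not fall as the text drifts into nonsense; a robust evaluator would be expected to penalize it sharply.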
Reasoning Features Failed to Prevent Misjudgment
Notably, the models rated these nonsensical passages highly even when their reasoning features were activated, suggesting fundamental flaws in how the systems evaluate linguistic coherence and literary merit. Heilig’s research, not yet peer-reviewed, tested models from GPT-5 (released August 2025) through GPT-5.4, the latest version.
After publishing similar findings in August, Heilig observed that GPT began labeling his test phrases as “literary experiments,” suggesting OpenAI staff had recognized and attempted to mitigate the identified patterns.
Implications for AI Development
Heilig emphasized the critical importance of this discovery: “It’s very important that we talk about what happens when we don’t build AI as a neutral, robotic helper or assistant and seek to instill human-like aesthetic and moral judgements.”
His findings raise alarming prospects for increasingly autonomous AI systems. “What my experiment definitely shows is that the more we move towards independently acting agents, the more we bring aesthetics into play, the more we’ll have agents that seem irrational to us human beings,” Heilig stated.
Exploitation Risks in Unsupervised AI Processes
Henry Shevlin, associate director of Cambridge University’s Leverhulme Centre for the Future of Intelligence, characterized the implications as severe: “This is a way in which AI can have its rational judgment short-circuited.”
However, Shevlin cautioned against viewing this as uniquely problematic: “But it’s just not clear to me that it’s so very different for human beings. We should expect LLMs to have reasoning and cognitive biases and limitations because almost all forms of intelligence, almost all forms of reasoning are going to exhibit blind spots and biases.”
The vulnerability becomes acute when AI systems operate with minimal human oversight. Shevlin warned that such scenarios leave processes “ripe for exploitation,” citing academic journals that use LLMs to screen submissions with little human review.
Cascading Effects Through AI Generations
Heilig’s research revealed another troubling pattern: AI models increasingly evaluate other AI systems’ outputs as companies develop new architectures. This creates potential for flawed aesthetic and reasoning judgments to propagate through successive AI versions, compounding the problem across the AI development ecosystem.
OpenAI Shelves Explicit Chatbot Plans
In a separate but related development, OpenAI announced Thursday it is indefinitely postponing plans for a sexually explicit chatbot, internally designated “Citron mode.” The decision follows mounting concerns about societal impact and reputational risk, according to the Financial Times.
The company stated it intends to conduct long-term research into the effects of sexually explicit conversations and emotional attachments before making any product decision. An OpenAI spokesperson offered no additional commentary to AFP.
Internal and External Opposition
The explicit chatbot concept faced significant resistance from both employees and investors. Staff questioned its compatibility with OpenAI’s stated mission of ensuring the technology benefits humanity, while investors raised concerns that the reputational damage would outweigh potential commercial returns, according to the Financial Times report.
Last year, OpenAI announced plans to relax ChatGPT restrictions, permitting erotic content for verified adult users as part of a stated principle to “treat adult users like adults.”
Child Safety Concerns and Regulatory Pressure
The postponement occurs amid intensifying regulatory scrutiny of AI’s impact on minors. The U.S. Federal Trade Commission has launched formal inquiries into multiple technology companies, including OpenAI, regarding how AI chatbots could negatively affect children and teenagers.
This week also saw OpenAI announce it is winding down Sora, its video social media application, which has been accused of flooding the internet with low-value AI-generated content.
Industry-Wide Child Protection Crisis
Meta and other social media companies currently face multiple lawsuits and regulatory actions over their platforms’ effects on minors. These broader industry pressures have created heightened sensitivity to any product or feature potentially affecting child users.
Last year, Elon Musk’s xAI drew global condemnation after its Grok chatbot was weaponized to generate fabricated sexual images of real people, including children. OpenAI itself has confronted legal challenges from families of teenagers alleging ChatGPT contributed to psychological harm and suicide among young users.
Age Verification as Risk Mitigation
In response, OpenAI implemented behavior-based age prediction technology that estimates whether a user is over or under 18 from their interaction patterns with ChatGPT. The company also introduced formal age verification systems.
Conclusion
OpenAI’s simultaneous confrontation with AI reasoning vulnerabilities and explicit content risks reflects the technology industry’s broader struggle to balance capability advancement with ethical safeguards. The discovery that GPT models rate nonsensical text highly demonstrates fundamental limitations in current AI judgment systems, particularly concerning aesthetic and moral reasoning. The decision to shelve the explicit chatbot reveals how reputational and child safety concerns now constrain product development strategy. These developments underscore the persistent gap between AI capability and responsible implementation, with significant implications for the technology’s societal integration.