[OPR] Schneider: Rethinking Reference and Authorship: On the Philosophical Status of LLM-Generated Verbal Products

On this page you can download the discussion paper that was submitted for publication in the Journal for Media Linguistics. The blogstract summarises the submission in a comprehensible manner. You can comment on the discussion paper and the blogstract below this post. Please use your real name for this purpose. For detailed comments on the discussion paper, please refer to the line numbering of the PDF.


Discussion Paper (PDF)

Blogstract of

Rethinking Reference and Authorship: On the Philosophical Status of LLM-Generated Verbal Products

by Jan Georg Schneider

In this article, the status of LLM-generated verbal products is discussed in principle. While we have traditionally been socialized to assume automatically that there is an intelligent author 'behind' verbal products that can be read as intelligent, we can no longer simply presuppose this close connection in the age of LLMs. In this sense, I call LLM-generated products 'intelligible textures'. As products, these intelligible textures can hardly, if at all, be distinguished from authorized, human-created texts, but the learning and usage processes differ fundamentally, especially with respect to acts of reference. What consequences does this have for our understanding and general conception of the written word, as well as for our notion of authorship and related questions of responsibility for verbal products?

This fundamental question is discussed in the present article using the example of LLM-generated essay evaluations. In March 2025, I ran a test with ChatGPT-4o to see whether it could be useful for grading an essay. My test strategy was to deliberately change a standard essay for the worse and then have it evaluated by the system. Although many other text types could be used here, essay evaluation is particularly suitable because it requires a high degree of judgment as well as reference to other texts, namely those being evaluated, and to their truthfulness.

Theoretically, the central problem of reference is discussed using Austin's speech act theory as well as the concepts of exemplification and denotation developed by Goodman and Elgin, which I consider very applicable in this research context. Following Goodman and Elgin, exemplifying and denoting are the two basic modes of reference acts. When exemplifying, people use something as an example of a 'label' and emphasize certain relevant properties of it: a fabric sample, for instance, can be used as a sample for a type of fabric, whereby individual properties are emphasized as relevant, for example the colour and the softness, but not the price or the date of manufacture. In that way, the sample itself becomes a symbol or sign. A sample is always a sample for someone in a concrete situation, and every individual process of symbol interpretation takes place within the framework of a customary practice. Denoting and exemplifying are mutually dependent, because only through exemplifying in concrete situations do communicative practices arise in which, in turn, denoting takes place, that is, referring with a symbol to something in the world. Since this cultural anchoring is completely absent in machine learning, computers, from a philosophical point of view, do not perform any acts of reference. What does this mean for the status of LLM-based verbal products?

2 Replies to “[OPR] Schneider: Rethinking Reference and Authorship: On the Philosophical Status of LLM-Generated Verbal Products”

  1. Sarah Brommer, October 20, 2025 at 12:38

    Dear Jan,
    thank you for your contribution; I read it with interest.
    First of all: I have confined myself to the content and made no comments on language and style; my English is not good enough for that. If you would like feedback on that as well, someone else would have to take it on.
    On the content: I see no major need for revision; the paper as a whole is nicely rounded and consistent. I have commented on individual points, but my remarks reflect my impressions and thoughts as a reader rather than being corrections. It seems to me that your considerations touch on three problem areas that could, in my view, be worked out even more strongly:
    Point 1: The 'performance' of ChatGPT (as Karina and I call it in our contribution, see the note in the text): You address both the productive and the receptive/evaluative capabilities of AI/ChatGPT. For me, these lie on different levels. (Why) does your case require a look at ChatGPT's text generation when the issue is evaluation? How does the one relate to the other?
    Point 2: AI output, viewed at the surface level: How does ChatGPT classify what it(/he/she) does? Which verbs does ChatGPT use, that is, how does ChatGPT describe its own actions? I do find it intriguing (and ultimately worth reflecting on) that ChatGPT justifies its judgement of a text much as a human would ('The text is consistent because …', 'The argumentation is comprehensible because …'). Viewed at the surface level, one might assume that ChatGPT arrives at its judgement in a similar way to a human (at least ChatGPT presents its judgement as if it had come about in this way).
    Point 3: AI output, viewed with regard to the underlying processes: In fact, ChatGPT arrives at its judgement in a completely different way, as you clearly demonstrate. I think, however, that the contrast with point 2 could be made more explicit. For does the problem not lie above all in the fact that ChatGPT acts as if it were 2, but ultimately functions like 3? In the end, it is above all a problem of transparency, is it not?
    These are just suggestions …
    Best regards, Sarah

  2. Johannes Lenhard, October 21, 2025 at 08:41

    Large language models (LLMs) are sweeping across many types of text generation like a tsunami. Academics, both students and teachers, are especially enthusiastic and at the same time overwhelmed. We are all called upon to reflect on the consequences of using LLMs and, in particular, to consider whether and how the status of seminal concepts could change. In this text, Schneider takes aim at reference and authorship; his work is timely and relevant.
    In the existing literature, two strategies can be distinguished that study LLMs from opposite directions. The top-down approach starts from general principles and then identifies the limitations of LLMs. The bottom-up approach observes the capabilities of LLMs in specific examples and then asks whether and how these capabilities call for adjustments to the underlying concepts.
    Schneider pursues both strategies. The first part (section 2) analyzes a case study on writing and correcting essays with ChatGPT "bottom-up", while the second part takes a top-down approach, starting from a philosophical consideration of reference and judgment, to which AI does not do justice. Schneider's statements on the astonishing, albeit limited, possibilities in the first part and on the problems regarding reference and authorship in the second part are clear and supported by convincing arguments. However, elaborating on the relationship between the bottom-up and top-down parts of the text would be, or so I believe, a step toward clarifying what is meant by the intended "re-thinking" of fundamentals. I have broken down my suggestions for clarification into a small series of three interdependent points, an extra, and a final note.
    1/3 In what sense can LLMs replace?
    The situation resembles a Turing-style imitation game when Schneider asks whether an LLM "can replace" a human teacher's essay rating. He starts out by reporting earlier work with Zweig on a product named "e-rater" for this purpose. This tool uses word statistics of an essay to predict the grade that a teacher would (likely) give. By design, the e-rater cannot give a justification of this grade and thus does not actually rate, because rating would necessarily include the ability to justify. Now, an LLM like ChatGPT can do more: writing and correcting an essay. Schneider attests to an "impressive performance". But still, can it replace?
    This question is seriously ambiguous. Is it the student who must be satisfied, for instance by getting a plausible justification (even if an LLM creates this justification differently from a human teacher)? Or is it the philosopher who requires that any replacement must be near-equivalent?
    If something is achieved as a potential replacement (an essay, its correction, its grade), to what extent does this presume that the product must be created by the same mechanisms? Turing famously wants to be very liberal on this condition. Schneider at some points agrees, but at others seems to insist that differences in mechanism matter. In other words, is "near-equivalent replacement" a mode for bottom-up? Or rather a back door for top-down, in case some principle[!] shows that the replacement cannot be equivalent?
    Schneider is cautious when he assesses the capabilities of LLMs. And I agree that the evaluation of the quality of an essay requires knowledge. However, I wish for a more articulated and possibly sharper conclusion. The last sentence of section 2, for instance, mentions an unreported experiment that leads to a more negative outcome. If this has weight, it should be reported in detail. What was learned from the bottom-up strategy?
    2/3 The strength of exemplification
    The linguistic and philosophical terminology is rich with concepts that illustrate the difference between humans and LLMs. The text makes apt use of some of these, such as context vs. co-text, text vs. texture, rhetorical act, and phatic act à la Austin. A special focus is on exemplification as a crucial part of dealing with reference. I find this an excellent choice, because exemplification does not decide the matter (can LLMs replace?) right from the start. When Schneider remarks that "the word hand can exemplify the word noun", this kind of exemplification seems, in principle, not beyond reach for LLMs. The interesting point is that there are different sorts of exemplification. LLMs are skilled in one sort: they are capable of mass statistical exemplification, as Schneider aptly points out. This is quite different from humans, who exemplify with intentionality, a capacity that LLMs lack.
    Why focus on exemplification? The fruitful question seems to be what distinguishes the two types of exemplification: what can one type achieve that the other cannot? But Schneider does not venture into this question; instead, he notes that LLMs lack intentionality. This is correct, but why then bring in exemplification at all, when the lack of intentionality is the superior point?
    3/3 Intentionality and gaming
    Intentionality is a key concept in phenomenology, which can be considered the most advanced manifestation of subject-philosophical epistemology. Language philosophy brings to the fore a decidedly social and pragmatic component of epistemology. Schneider is quite ecumenical when he supposes that an argumentation via Wittgenstein would run in parallel to the one via intentionality: LLMs fundamentally lack language-game competence. This is an interesting point that is not immediately clear to me. I would like to see a somewhat more elaborated argument here. Wouldn't the notion of "game" invite an exploratory venture into statistics? Like the student playing an essay-justification game with the teacher? Or with an LLM? Maybe a game where the LLM can "near-equivalently replace" the teacher?
    At the very end, Schneider seems to converge on this point when he asks where mass syntactical exemplification is sufficient to compensate. Yes, I'd like to shout: this is the question of part one. Here the re-thinking starts. However, the text ends.
    Extra
    Schneider brings in Kant, who argued that the "free use of judgment" is necessary for any intelligent decision. This is a strong point, but it is independent of recent examples. Isn't this more a reminder that one should not be impressed by the successes of LLMs? As if one were to argue: do not re-think; rather, stick to the fundamental concepts, which remain unaffected by LLMs.
    Finally
    Schneider highlights an issue that I find most important: responsibility. LLM-generated texts seem to lack an author. Consequently, nobody takes responsibility for what is written. Here, the philosophical and the political meet. Who is responsible for a product or action is a question of societal practice. This practice should adapt, soon and urgently, to the use of LLMs. Non-responsibility, I would say, serves sinister interests.
    Recommendation
    This is a rich and timely text that I recommend accepting with minor revisions. Actually, none of the three points I raised strictly needs to be addressed in a revision. It is more that I am curious how the author might reply.
    When writing a reply, I would recommend also addressing the following minor points. In section 3.3, it appears to me that it is the embedding that defines or induces similarity. That point could be articulated more clearly. Also, "the" certain independence should likely be "a" certain independence.
    In section 3.4, the claim about new or rare assertions and about missing taste comes a bit unprepared. Maybe motivate it with an example? Same passage: is the "significant lack of judgement" a call for bottom-up? If so, maybe an example would be instructive. Or is it argued top-down, in the sense that there cannot possibly be judgement, and therefore the lack is significant?
    In section 4, the claim that LLMs are "prone to error" is again a bit unprepared and even seems to run against the findings in section 2. Also in section 4, "Hence … Fuchs is right …": to me, the logic that the text pursues seems to be the reverse: Fuchs is right, hence ChatGPT is no more intelligent … and has no judgment.
    Similarly, or so it seems to me, Schneider's investigation does not so much support the findings of Fuchs; rather, he assumes that Fuchs is right and presents an argument analogous to Fuchs's in the fashion of linguistics.

Leave a Comment

Please use your real name for comments.