[OPR] Lotze/Greilich: Semantic Coherence and Topic Continuity in Interactions with AI

On this page you can download the discussion paper that was submitted for publication in the Journal for Media Linguistics. The blogstract summarises the submission in a comprehensible manner. You can comment on the discussion paper and the blogstract below this post. Please use your real name for this purpose. For detailed comments on the discussion paper please refer to the line numbering of the PDF.

Discussion Paper (PDF)

Blogstract of

Semantic Coherence and Topic Continuity in Interactions with AI

by Netaya Lotze & Anna Greilich

Human-Computer Interaction (HCI) has evolved dramatically with advances in artificial intelligence, particularly with the advent of large language models (LLMs) and generative pre-trained transformers (GPTs). Despite these technological strides, the linguistic quality of AI dialogue systems remains inconsistent, often exhibiting surface-level cohesion without genuine semantic coherence. This discrepancy poses challenges for the usability and reliability of AI systems, especially in conversational contexts where maintaining topic continuity is paramount.

In human-human communication (HHC), interlocutors effortlessly draw on shared knowledge as common ground (Stalnaker 2002) and complex semantic relations to ensure coherent interaction (Brinker/Hagemann 2001). In contrast, AI systems frequently produce responses that appear linguistically cohesive but lack deeper coherence, a phenomenon the first author has previously termed quasi-coherence (Lotze 2016, 2025). Recent AI discourse labels similar system behaviours as hallucinations, highlighting a persistent gap between linguistic form and semantic function. We do not define hallucinations and quasi-coherence as synonymous, but rather as partially overlapping concepts that coincide in certain respects while diverging in their analytical scope — a distinction we elaborate on in detail throughout this article.

We therefore propose three forms of semantic coherence in HCI: (1) user-maintained logical coherence, (2) system-generated illusions of coherence through successful keyword parsing or LLM generation, and (3) quasi-coherence, marked by surface cohesion without meaningful semantic coherence (Lotze 2016, 289).

This article aims to explore semantic coherence (Halliday/Hasan 1976) and topic continuity (Givón 1983) within HCI by applying a multidisciplinary methodological framework (following Lotze 2020). We contrast classic rule- and plan-based chatbot architectures with modern voice user interfaces (VUIs), such as Amazon Alexa, and earlier generative models, exemplified by ChatGPT-3. By investigating these systems through qualitative and quantitative lenses, we seek to elucidate the linguistic mechanisms underlying coherence and its breakdowns in AI dialogue.

Our analysis draws upon three empirical studies from our research group, encompassing written and oral HCI across different interaction scenarios. Through this comprehensive examination, we aim to advance theoretical understanding of semantic continuity in HCI and inform the development of more coherent and user-responsive dialogue systems (following Lotze 2020).

3 Replies to “[OPR] Lotze/Greilich: Semantic Coherence and Topic Continuity in Interactions with AI”

  1. Redaktion, April 28, 2026 at 14:22

    Review for the submission

    Semantic Coherence and Topic Continuity in Interactions with AI

    reviewed by Doris Dippold

    Recommendation: major revisions

    Fit with the journal’s scope

    The present paper fits the scope of the Journal für Medienlinguistik well. It investigates how language and communication change under the influence of media, in this case chatbots. The overall focal point of 'coherence' is an innovative way of looking at user-chatbot dialogue, and the article draws on empirical evidence from a wide range of different chatbot types.

    Originality and innovation

    The originality of this article lies in the application of the concepts discussed, in particular 'quasi-coherence', to different types of human-machine interaction / human-AI interaction. In particular, the paper looks not only at text-based but also at voice-based interactions. I do feel, though, that the paper misses a trick in its conclusion. Given the wide-ranging insights gained through empirical analysis, what are the implications for practice? By this I mean the design of 'special purpose' chatbots (e.g. appointment booking, customer complaints, etc.), or the design of home assistants such as Alexa.

    However, the theoretical conclusions drawn through the empirical analyses are sound and interesting. I enjoyed reading your discussion of the CASA / MASA concept through the eyes of your data and conceptual framework and follow your conclusions that quasi-coherence “challenges the fundamental communicative assumptions that typically govern social interaction”. Given these insights, it would be even more critical to hear about your implications for practice.

    Methodological appropriateness and implementation

    I was somewhat surprised by the methodological grounding of the paper in Conversation Analysis, given that HCI is also defined as an 'emerging sociolinguistic' practice. The sociolinguistic framing suggests a lack of fit with a CA framework, which looks at interactional mechanisms in a local context. The analyses in this paper fit this CA framework well, but I am less sure about the sociolinguistic framing.

    Plausibility and coherence of the argumentation

    The overall line of argument of this article is suitable, but I have some concerns relating to content as well as the coherence of the argument. I will list them here in bullet point form.

    • In the introduction, the authors argue that "LLMs have evolved from so-called 'stochastic parrots' […] into highly responsive, user-friendly and – most notably – context-sensitive dialogue partners". This argument requires some more nuance. Whilst there is no need to follow Bender's argument with Bender's own level of conviction, LLMs still create responses based on statistical probability. The notion of 'context-sensitivity' is also questionable, as research has repeatedly shown that the 'context' LLMs follow is often Western-centric. It is also interesting that, after Figure 3, you explicitly say that generative transformer models have difficulties in distinguishing between statistically probable responses and contextually appropriate ones – this somewhat contradicts your earlier argument.
    • The notion of 'quasi-coherence' is first mentioned before Example 1 but only defined much later (after Figure 1). The article would benefit from featuring this definition earlier.
    • I am not sure what 'HCI' refers to – the term is never defined or introduced. Do the authors mean 'Human-Computer Interaction'? If so, would HMI (Human-Machine Interaction) or HAI (Human-AI Interaction) be a better term? The term needs to be defined at the very least, but should be discussed more critically too.
    • Similarly, the notion of 'context' requires better definition. When, for example, at the end of section 2, you say that your findings can inform the design of more responsive, context-aware conversational agents, do you mean the local micro-context or the social macro-context of the interaction? Probably not the latter, but say so more explicitly.
    • Figure 7 – I don't understand Figure 7. What are these 'socialbots'? Are these the 'newer systems' you refer to in the preceding paragraph? The names in the figure should be explained too – I am not sure whether 'Kim' and 'Mildred' are the names of the bots or the users.
    • Examples 7 to 9: Where do these examples come from? This is not initially explained; only at Table 1 does it become apparent that these data derive from this experiment.
    • In section 8, it is also not clear until quite a few pages in that the focus of the analysis is now on prosodic variation. Amazon Alexa is mentioned earlier on, but I initially expected that the analysis was only lexical. Overall, I found this section quite hard to follow and the link to the overall analytical framework was not entirely clear.
    • Figure 12 – I don't understand the purpose of this example – it distracts from, rather than adds to, your subsequent conclusion that "HCI will inevitably become increasingly anthropomorphic as systems continue to improve". You don't need this example to show this.

    Structure of the contribution and linguistic form

    I have made some comments on structure above, mostly about moving definitions and explanations of process etc. to an earlier place within the text. Generally, the text is well written and engaging.

     

    Recommendation

    Overall, I recommend this paper for publication with major revisions, focusing on

    • A discussion of the implications of these findings for practice (conversation design, etc.)
    • A clear definition of HCI (and a critical discussion of related concepts such as HMI and HAI)
    • A critical discussion of the methodological grounding within CA, which contradicts the sociolinguistic framing. Linked to this, a more critical discussion of what is meant by ‘context’
    • Content clarifications, e.g. Figure 7, origin of examples 7-9
    • A clearer structure of section 8

    I hope that the authors can make these revisions as I am looking forward to seeing this article published.

     

  2. Redaktion, April 28, 2026 at 14:24

    Review for the submission

    Semantic Coherence and Topic Continuity in Interactions with AI

    reviewed by Florina Zülli

    Recommendation: major revisions

    The article examines semantic coherence in human–computer interaction (HCI) and, building on earlier work (Lotze 2016), further develops the concept of quasi-coherence. The article represents a theoretically informed and empirically ambitious contribution that draws on a range of mixed-methods approaches—conversation analysis, corpus analysis, and experimental procedures—to investigate how AI systems, both in written and spoken modalities, produce responses that may be characterized as coherent, quasi-coherent, or incoherent. The manuscript engages productively with key theoretical frameworks, including the CASA paradigm and Searle's speech act theory, though it remains at times on well-trodden ground. The discussion of AI systems' lack of genuine understanding, while solid, offers limited new theoretical impetus. Situating this argument more explicitly within current debates—such as those surrounding alignment or the pragmatics of large language models—would considerably strengthen the manuscript's contribution.

    The overall recommendation is to ACCEPT, subject to the following revisions:

     

    Formal remarks:
    The manuscript contains a few orthographic errors that should be corrected prior to publication. These include missing letters (e.g., the missing t in "yet robust framework capable of addressing he micro-level"), misspellings such as "useres" instead of "users", and the recurring "Nobelprice" instead of "Nobel Prize". The authors are asked to carry out a thorough proofreading pass.

     

    Minor content remarks:
    The article states that both older and contemporary systems are examined and refers to ChatGPT-3 and Amazon Alexa as representatives of the latter category. Although Alexa is continuously updated, it is questionable whether the label "contemporary" is fully warranted: Alexa has been on the market for over a decade, and the data analyzed in the article were collected in 2019. It would have been instructive to examine a more recent voice assistant system — such as Sesame (2025; see https://app.sesame.com/), which shows notable improvements over Alexa precisely with regard to the phenomena investigated in this article.

    • Furthermore, several key concepts are introduced later in the manuscript than would be ideal. In particular, what is meant by "social bots" or "socially oriented bots", and what "the three forms of dialogue coherence" (Lotze 2016) specifically encompass, remain unclear for a considerable portion of the text. Earlier introduction of these terms would improve readability and comprehension.
    • Finally, while the empirical analyses are methodologically varied and largely convincing, the connections between the individual studies could be framed more explicitly, so as to make the overall argumentative arc of the manuscript more transparent.

     

    Major content remark: Hypothesis formulation and interpretation of results

     

    Chapter 8:
    The labelling of the hypotheses as H1 and H2 warrants critical attention. H1, as formulated, expresses a null expectation—namely, the absence of systematic deaccentuation—and thus corresponds more closely to an H0. H2, by contrast, posits a directional effect and therefore constitutes the actual alternative hypothesis (H1). Renumbering the hypotheses would render the hypothesis structure considerably more transparent.

    Beyond this formal concern, the interpretation of the results deserves closer scrutiny. The authors characterize H1 as "partially confirmed", yet the data not only fail to show the expected effects but, in some respects, point in the opposite direction: pitch measurements show a "tendency for rising contours, rather than the HHC-like lowering", and duration measurements for the second stimulus theme indicate that "the main referent was lengthened during utterance" rather than shortened. The authors are asked to address this discrepancy explicitly and to discuss possible explanatory accounts.

    With regard to H2, the authors offer what amounts to a somewhat ambivalent conclusion. The reported findings—higher mean F0 values following coherent system turns, increased intensity after quasi-coherent responses in stimulus theme 1—are inconsistent across parameters and stimulus themes and represent, at best, isolated tendencies that would benefit from more substantive interpretation.

     

    Chapter 9:
    The conclusion that Alexa-directed speech exhibits "distinct patterns of (de)accentuation depending on the preceding reply" appears empirically overstated. The authors themselves concede that "apart from the duration measurements in the first stimulus theme, we did not observe prosodic patterns typically associated with topic continuity, even when the system's response appeared coherent." If only a single acoustic parameter shows a reduction in a single condition, it is difficult to speak of "distinct patterns". This concern is compounded by the fact that the Alexa study is a single-case design (one participant, n = 24 tokens), which substantially limits the generalizability of the findings. A more cautious formulation—to the effect that the data provide initial indications of prosodic variation as a function of coherence condition—would more accurately reflect the actual level of evidence.

    Finally, the claim that quasi-coherent responses provoke "cognitive uncertainty, hesitation, and disorientation" in users is empirically unsubstantiated within the studies presented. The acoustic data from the Alexa study provide no direct evidence for this assertion, and it remains unclear to the reader on what basis this conclusion is drawn. The authors are asked either to provide a more detailed account and, where possible, empirical grounding for this claim, or to reframe it as a hypothesis to be tested in future research.

     

    Overall assessment:
    These remarks do not diminish the manuscript’s value. The studies presented offer a methodologically varied and theoretically well-grounded contribution to the field. The concept of quasi-coherence is developed in a convincing and original manner, and the systematic comparison of written and spoken HCI represents a productive analytical approach. Particularly noteworthy is the manuscript’s careful delineation of quasi-coherence from hallucination—a distinction that is not only analytically precise but also timely, given current debates around generative AI systems. In establishing quasi-coherence as a fourth, analytically distinct category alongside coherence, incoherence, and hallucination, the authors make a contribution that extends well beyond the immediate empirical findings. Furthermore, the data collected are of genuine interest and hold potential; the revisions requested above are intended not to question their value, but to ensure that the conclusions drawn are commensurate with what the data can, at this stage, robustly support.

     

  3. Redaktion, April 28, 2026 at 14:36

    Based on the reviewers’ comments, we ask the authors to revise the manuscript (major revisions).

     

Leave a Comment

Please use your real name for comments.