[OPR] Kohonen-Aho & Vatanen: Participation and co-presence in the virtual world of Second Life

Blogstract of

Participation and co-presence in the virtual world of Second Life. Transitioning from a gathering to an encounter

by Laura Kohonen-Aho & Anna Vatanen

Social situations where people are present with someone in a shared space can be divided into ‘encounters’ and ‘gatherings’ (Goffman 1963). In encounters, participants share a joint orientation (e.g., by having a conversation), whereas in gatherings participants are co-present without a joint focus (e.g., strangers in an elevator). Mondada (2009) and De Stefani and Mondada (2018) have shown how in face-to-face situations a common interactional space is achieved multimodally, transforming silent co-present persons in a gathering into co-participants in an encounter through ‘embodied pre-beginnings.’

In computer-mediated communication, gatherings do not easily arise, since communication technologies are primarily developed for connecting people to have focused encounters – i.e., to talk together – across distances. Virtual worlds (VWs), however, provide for the occurrence of both gatherings and encounters between participants. VWs are designed both to give users a sense of being present (and together with others) in a joint space, and to enable joint interaction via virtual characters (avatars).

Our paper explores how gatherings that occur in a VW turn into encounters. We present close sequential analysis of moments when, after a silent gathering, interaction among the team is gradually resumed, and focus especially on the embodied avatar conduct in this process. The data comprise 12 video-recorded three-person team interactions in the VW of Second Life. All teams follow a similar interaction structure including alternation between teamwork episodes (pre-planned encounters) and individual questionnaire filling episodes (pre-planned gatherings). All participants are faster than expected in completing their questionnaires, which results in unplanned ‘surplus time’ until the initiation of the next collaborative task. We examine the transitions from a gathering into an encounter across 40 episodes.

Our findings show that these transitions are accomplished via two different processes: 1) a gathering turns into an encounter by using verbal means only (i.e., someone ‘just’ starts to talk), and 2) a gathering turns into an encounter through an embodied pre-beginning phase (i.e., someone first moves her/his avatar before anyone talks). We observe that, as in face-to-face situations, participants in VWs much more often use embodied resources than rely on verbal means alone to achieve the transition. However, the embodied practices in a VW have unique characteristics when compared to face-to-face situations. For example, since mutual gaze is not easily available for the participants, avatar movement is often used for establishing joint attention.

We discuss the ways in which participants use embodied pre-beginnings in a VW to display what we call encounter-readiness, instead of displaying potential lack of presence by avatar stillness. Virtually embodied behavior during the ‘surplus time’ signals one’s readiness and availability to move to an encounter. It seems that if avatars are not moved, the team members have very few cues about whether the others are ready to move into an encounter or not, and therefore observing the co-participants’ avatar behavior is used to gain information on their availability for interaction.


De Stefani, Elwys/Mondada, Lorenza (2018). Encounters in Public Space: How Acquainted Versus Unacquainted Persons Establish Social and Spatial Arrangements. Research on Language and Social Interaction, 51(3), 248–270.

Goffman, Erving (1963). Behavior in public places. New York: The Free Press.

Mondada, Lorenza (2009). Emergent focused interactions in public places: A systematic analysis of the multimodal achievement of a common interactional space. Journal of Pragmatics, 41(10), 1977–1997.

  1. Leelo Keevallik, July 28, 2020 at 14:37

    The paper “Participation and co-presence in the virtual world of Second Life” looks at how people in a virtual world (VW) go from gatherings to focused encounters, in Goffman’s terms. It is based on experimental data featuring group assignments from which key scenes have been systematically chosen for the analysis of verbal and embodied practices of achieving a focused encounter. The study is carefully carried out and provides valuable insights into the specificity of interaction in VWs, in particular regarding the use of vocal and bodily resources in launching an interactional encounter – something that takes coordinated effort to happen. Before publishing, I would recommend a revision for this paper along the following lines.
    Transcripts as analysis
    Even though VR interaction takes place in a three-dimensional space, the raw data needs to be transformed into a readable two-dimensional format not only for the sake of the readers but also to establish the appropriate level of analytical rigor. The authors have done a great job on multimodal transcription, following the principles worked out by Lorenza Mondada. For obvious reasons, these principles are always in flux and need to be adjusted to every specific dataset. The authors of this paper have, for example, invented a convention for the body of the avatar (e.g., JaaA). There are, however, other conventions that may not have come across quite systematically. Here are some of my suggestions that would amount to minor fixes but bring the transcripts in line with some recent developments in the Mondada style, which tends to be seen as the gold standard in multimodality research.
    Excerpt 1 line 02
    “does not move but stands still” sounds too wordy for a transcription that should convey information as concisely as possible. (See also in other places.)
    Excerpt 1 line 03
    There is unnecessary repetition in Jaa “sits still, gaze toward his screen –>”
    If a behavior continues across several lines the number of the last line can be added after –>, e.g., (l. 04)
    (Here it is the next line, so this strategy is not necessary but this is something I noticed in several places. This helps to avoid too much repetition and ease readability.)
    Excerpt 1 line 04
    There is double action in Sus: either she is sitting still or stretching her shoulders. Has she been sitting still from line 01? Then that comment should go there.
    I would also consider (as I am sure you have) keeping the transcription lines for one person together, rather than strictly following the timeline. As a reader one would perhaps want to have a “narrative” of one body at a time. You do that in excerpt 2.
    Excerpt 1 line 05
    “places hands” is a momentary action and should therefore not be combined with –>. An option would be to rephrase it with “hands on keyboard”, which would be an extended action. (I noticed the same issue in other transcripts, so I encourage you to think about the exact phrasings of momentary vs. extended actions in all the examples. For instance, line 07 “starts to use” features the same issue, although here the reader can easily infer that “using” will be the temporally extended action.)
    As far as I know, Mondada, and many other researchers along with her, would have the original language in bold rather than the translation. This is not only for the ideological reasons of respecting different languages but also because the utterances were actually performed in Finnish, which constitutes the primary data.
    Excerpt 2 line 02
    “keeps touching the whiteboard” is not a behavior that starts at the moment you have marked. I suggest moving that description to line 01. (Is this something that everybody does while answering the questionnaire?)
    Excerpt 2 line 06 should be in bold
    Excerpt 6 lines 08-09: the word order is not good in English. In addition, consider removing the translation of “so” for “että”, which I think is not idiomatic. I would say “että” is untranslatable in this function.
    In other respects you may have idiosyncratic solutions, such as not marking the position of figures with a # in the timeline, but I can see how those make sense and are easily readable anyway.
    Overall, I suggest consulting https://www.lorenzamondada.net for the latest developments in her transcription system in addition to her journal articles.
    The title is slightly remote from what the paper does. I would consider using the central terms of your argument, such as “encounter”. I realize that you are struggling with two alternative sets of concepts, Goffmanian and those from VW research, but I guess you need to settle on one of them in order not to lose the main thread. You could discuss more why establishing “the experience of psychological involvement” is problematic from the EMCA perspective and thus argue for the benefits of the Goffmanian terms. Of course, you also need to draw the parallels to the terms that may be more familiar to some of the readers of this journal. Even that can be handled in the title, with parentheses or other means.
    One thing that I find striking in comparison to RL findings is that the verbal stream of interaction can be totally disconnected from the avatars who are ignorable. This could hardly ever happen in a crowded street, which could be underlined more in the arguments. Other underdiscussed aspects include the activity differences between having committed to a research experiment within a certain timeframe with a set group of people vs. randomly approaching someone in the street. The accountabilities for participants are quite different, resulting in crucial differences in expectability of interaction, and perhaps verbal interaction in particular. I see your setting as not necessarily making relevant a conversational activity at all, which could render questionable such heavy reliance on Hoey’s (excellent) research that is essentially about verbal action sequences. Did you consider entirely non-verbal openings?
    Having myself worked on (re-)openings, I believe that an opening can only be one in case it is responded to, as otherwise it is treated as self-talk. You do not make that very clear in describing how you went about establishing your collection. Furthermore, you are likewise not actually working with openings but re-openings or re-entries because the participants are stuck there with the joint task across a longer period of time – a nice parallel to my stable setting – and the correct term should thus be re-opening.
    Also, more can be made of the action types that both Szymanski and I worked with in the cited papers. I wonder why you chose to categorize the openings topically (especially in sections 4.2.1-4.2.3) rather than in terms of action (which does more justice to the EMCA method) and what that particular topical classification and counting of respective instances actually contribute to your analysis.
    I am particularly struck by the fact that, in contrast to my stable setting, you apparently cannot build an opening on existing potential joint attention foci as defined by the participants’ bodies, nor on the manner in which the speaker uses their own body (straight/bent, gazing up/down, mumbling/clear speech). You cannot actually know whether there is “joint visual focus” at any moment, as the RL participant may be attending to other matters behind the screen. The RL bodily engagements are neither public nor accountable in interaction in your setting and sometimes I wondered what the relevance of their RL behavior was at every moment. I believe that the paper would benefit from a proper discussion of both what is public and the related accountability, which are central to all human interaction.
    If we are to believe the analysis of excerpt 1 as an instance of a verbal launch of interaction, the lowering of hands from the whiteboard needs to be explained as not constituting an embodied pre-beginning. It is currently puzzling for the reader, even though you later mention briefly something about this hand removal in particular. To me it seems like a clear candidate for a display of readiness to interact after assignment completion.
    I hardly noticed any issues apart from some word order and expression problems that can be fixed by a native proofreader. The paper is very well written. 
    Line 786 breaths > breathes

  2. Redaktion, July 30, 2020 at 13:53

    Review by Sylvaine Tuncer


    Pre-scriptum: One consequence of this review system is that I had access to Leelo Keevallik’s review (whom I greet here!) as mine was ready but not published yet. I decided to read the other review and try to publish my own as unchanged as possible, which I did (fortunately I found no major disagreement!).


    Kohonen-Aho and Vatanen’s paper studies mediated interactions between players of an online game. The topic is original, they have collected a very good set of data, and analyses were rigorously conducted. Taking an EMCA approach, the paper opens new directions to investigate computer-mediated communication and online games, but I also think it would benefit from a more specific grounding in the literature, and from a more detailed description of the data, setting and activity under study.


    Main issues


    1.     A review and discussion of CA literature on pre-beginnings (an extensively studied concept since Schegloff 1979), even brief and non-exhaustive, is lacking. Starting with their constitutive elements and what they commonly do in interaction would make the paper a more informed contribution to the topic. Equally relevant, a short review on resuming interaction in an open state of incipient talk (which the manuscript doesn’t mention at all at this stage) seems in order (Goffman, 1963; Schegloff and Sacks, 1973; Szymanski 1999; Szymanski et al., 2006, and probably more). Szymanski’s work is all the more relevant in that it is about technology-mediated communication, in that case radio communication. While it dates back to the 1990s, I find much to relate to the phenomena under study here. See references below.


    2.     More details on the data and the activities under study are needed, as early as in the introduction and then expanded on in the “Data” section. This would in turn give a clearer view of what is amenable to analysis and what is not, and how much the data gives analysts access to participants’ perspective as they are playing. I would be happy to know more about (a) players’ resources in the game, as well as their access to each other’s actions; (b) how data was collected (too superficial in the current manuscript); and (c) how much the data capture players’ resources (connecting (a) and (b)). Here are a few questions that came to my mind. They are connected, so one element could answer several questions, and perhaps not all the questions need to be answered separately:

            What does the computer interface look like?

            What are players’ resources to control their own avatars?

            What indications, signs or traces of other players and their actions does each player have?

            Can they see other avatars only when they are in their field of vision?

            Do participants feature to each other in any other way than through their avatar?

            The recording set-up: were participants located in different places? (I think this is vaguely mentioned quite late in the paper)


    3.     The next point concerns the different perspectives involved in these “fractured ecologies” (Luff et al. 2003). It seems to me that the analyses take an all-inclusive perspective in terms of access to participants’ actions: that of analysts who have recordings of the embodied conduct of all participants in their distributed physical spaces. The transcripts as well as the analyses below each excerpt include not only the activities taking place in the virtual world, but also “each team member’s gaze direction, facial expressions, body movements and hand movements on the keyboard”. But from what I understand, players do not have access to each other’s physical space (and the analyses do not show that they orient in any way to each other’s embodied conduct). So, I wonder how much of this should be included in the analyses (although it’s available in the data). Perhaps focusing on what happens in the virtual world would work better, especially if the focus of the paper, as the introduction states, and the criterion for the classification of extracts in two different processes, is embodied conduct in the game. I think this issue is the most challenging one to resolve, yet I would like the authors to give it serious consideration as it might shed different lights on the phenomena under study.


    Minor points

    As much as I agree that the Jefferson + Mondada transcription method is the most heuristic for analysis, and extensively use it myself, I find the transcripts very hard work for the reader, because of the type of data and phenomenon. Even I gave up on getting to grips with each of the excerpts. That’s why I would suggest the use of graphic transcripts (see Laurier 2014; and for an example of use: Laurier, Brown and McGregor 2016). They do not require extensive work, and more importantly (an issue too often neglected), this would make the paper accessible, and enjoyable, to a much broader audience than CA-specialists only. Or, if the authors choose to keep this type of transcripts (which I really don’t recommend), they should at least lighten them as much as possible.

    You could mention in the introduction (around line 26) previous work in CSCW on technologies that enable unfocused gatherings, studies of “media spaces”: substituting for copresence in workplaces and offering the possibility for “gatherings” at a distance. Heath and Luff (1992) highlight the difficulties of transitioning from co-presence to interaction, which might be interesting to discuss in relation to your focus.

    The section on face-to-face initiation of encounters (in 2.1) need not be so long and so detailed considering the present study does not build on this area of research.

    I would suggest removing reference to a personal “submitted” paper.


    For further consideration…

    The paper brought to my attention a question I find interesting regarding interactions in virtual worlds: how much of avatars’ embodied conduct is produced and treated as intentionally public and socially meaningful. Integral to Goffman’s approach is the fact that we, humans, perceive from within our bodies and have an immediate control of our bodies (leaving aside what we ‘give off’, as the authors mention). In VWs, on the other hand, the mediated and most probably partial character of participants’ control over their avatars’ embodied conduct makes us wonder how socially meaningful embodied behaviour can be; nor is it perceived through the avatars’ eyes.

    I am not sure how amenable to EMCA research this area of questioning could be, but at the very least I was wondering if/how/when players actually treat the embodied conduct of other players’ avatars as interactionally significant and relevant, as potential ‘displays of intention’ (see, e.g., Smith 2017). In the paper there is a hint at that (paragraph lines 295-316) from the perception side (“the user’s ongoing activities are still far less obvious for others to detect than they are in real life”), but the production side is also intriguing. Would it be possible to include in the paper a sort of description of how easy or difficult it feels to control and direct one’s avatar?



    Berger, Israel, Rowena Viney and John P. Rae (2016). Do continuing states of incipient talk exist? Journal of Pragmatics 91, 29–44.

    Heath, Christian and Paul Luff (1992). Media space and communicative asymmetries: Preliminary observations of video-mediated interactions. Human-Computer Interaction 7, 315-346.

    Laurier, Eric (2014). The graphic transcript: Poaching comic book grammar for inscribing the visual, spatial and temporal aspects of action. Geography Compass 8(4), 235-248.

    Laurier, Eric, Barry Brown and Moira McGregor (2016). Mediated pedestrian mobility: Walking and the map app. Mobilities 11(1), 117-134.

    Luff, Paul, Hideaki Kuzuoka, Jon Hindmarsh, Keiichi Yamazaki, Shinya Oyama and Christian Heath (2003). Fractured ecologies: creating environments for collaboration. Human-Computer Interaction 18(1-2), 51–84.

    Schegloff, Emanuel A. and Harvey Sacks (1973). Opening up closings. Semiotica 8(4), 289-327.

    Smith, Robin J. (2017). Left to their own devices? The practical organization of space, interaction, and communication in and as the work of crossing a shared space intersection. Sociologica 2, 1-32.

    Szymanski, Margaret H. (1999). Re-engaging and dis-engaging talk in activity. Language in Society 28(1), 1-23.

    Szymanski, Margaret H., Erik Vinkhuyzen, Paul M. Aoki and Allison Woodruff (2006). Organizing a remote state of incipient talk: Push-to-talk mobile radio interaction. Language in Society 35, 393-418.


