Update (22.03.2023): The Open Peer Review for this submission has been completed. Based on the Open Peer Review, the article has been approved for publication in the Journal for Media Linguistics and is available at: https://doi.org/10.21248/jfml.2021.30.
On this page you can download the discussion paper that was submitted for publication in the Journal for Media Linguistics. The blogstract summarises the submission in a comprehensible manner. You can comment on the discussion paper and the blogstract below this post. Please use your real name for this purpose. For detailed comments on the discussion paper please refer to the line numbering of the PDF.
This submission is a contribution to the special issue “Co-constructing presence between players and non-players in videogame sessions”.
Blogstract of
Participation and co-presence in the virtual world of Second Life. Transitioning from a gathering to an encounter
by Laura Kohonen-Aho & Anna Vatanen
Social situations where people are present with someone in a shared space can be divided into ‘encounters’ and ‘gatherings’ (Goffman 1963). In encounters, participants share a joint orientation (e.g., by having a conversation), whereas in gatherings participants are co-present without a joint focus (e.g., strangers in an elevator). Mondada (2009) and De Stefani and Mondada (2018) have shown how in face-to-face situations a common interactional space is achieved multimodally, transforming silent co-present persons in a gathering to co-participants in an encounter by way of using ‘embodied pre-beginnings.’
In computer-mediated communication, gatherings do not easily exist, since communication technologies are primarily developed for connecting people to have focused encounters – i.e., to talk together – across distances. Virtual worlds (VWs), however, allow both gatherings and encounters to occur between participants. VWs are designed both to give users a sense of being present (and together with others) in a joint space and to support joint interaction via virtual characters (avatars).
Our paper explores how gatherings that occur in a VW turn into encounters. We present a close sequential analysis of moments when, after a silent gathering, interaction among the team is gradually resumed, focusing especially on embodied avatar conduct in this process. The data comprise 12 video-recorded three-person team interactions in the VW of Second Life. All teams follow a similar interaction structure, alternating between teamwork episodes (pre-planned encounters) and individual questionnaire-filling episodes (pre-planned gatherings). All participants complete their questionnaires faster than expected, which results in unplanned ‘surplus time’ before the next collaborative task begins. We examine the transitions from a gathering into an encounter across 40 episodes.
Our findings show that these transitions are accomplished via two different processes: 1) a gathering turns into an encounter by verbal means only (i.e., someone ‘just’ starts to talk), and 2) a gathering turns into an encounter through an embodied pre-beginning phase (i.e., someone first moves her/his avatar before anyone talks). We observe that, as in face-to-face situations, participants in VWs much more often use embodied resources than rely on verbal means alone to achieve the transition. However, the embodied practices in a VW have unique characteristics compared to face-to-face situations. For example, since mutual gaze is not easily available to the participants, avatar movement is often used to establish joint attention.
We discuss the ways in which participants use embodied pre-beginnings in a VW to display what we call encounter-readiness, instead of displaying potential lack of presence by avatar stillness. Virtually embodied behavior during the ‘surplus time’ signals one’s readiness and availability to move to an encounter. It seems that if avatars are not moved, the team members have very few cues about whether the others are ready to move into an encounter or not, and therefore observing the co-participants’ avatar behavior is used to gain information on their availability for interaction.
References
De Stefani, Elwys/Mondada, Lorenza (2018). Encounters in Public Space: How Acquainted Versus Unacquainted Persons Establish Social and Spatial Arrangements. Research on Language and Social Interaction, 51(3), 248–270.
Goffman, Erving (1963). Behavior in public places. New York: The Free Press.
Mondada, Lorenza (2009). Emergent focused interactions in public places: A systematic analysis of the multimodal achievement of a common interactional space. Journal of Pragmatics, 41(10), 1977–1997.

Review by Leelo Keevallik
The paper “Participation and co-presence in the virtual world of Second Life” looks at how people in a virtual world (VW) move from gatherings to focused encounters, in Goffman’s terms. It is based on experimental data featuring group assignments from which key scenes have been systematically chosen for the analysis of verbal and embodied practices of achieving a focused encounter. The study is carefully carried out and provides valuable insights into the specificity of interaction in VWs, in particular regarding the use of vocal and bodily resources in launching an interactional encounter – something that takes coordinated effort to happen. Before publication, I would recommend a revision of this paper along the following lines.
Transcripts as analysis
Even though VR interaction takes place in a three-dimensional space the raw data needs to be transformed into a readable two-dimensional format not only for the sake of the readers but also to establish the appropriate level of analytical rigor. The authors have done a great job on multimodal transcription, following the principles worked out by Lorenza Mondada. For obvious reasons, these principles are always in flux and need to be adjusted to every specific dataset. The authors of this paper have, for example, invented a convention for the body of the avatar (e.g., JaaA). There are, however, other conventions that may not have come across quite systematically. Here are some of my suggestions that would amount to minor fixes but bring the transcripts in line with some recent developments in Mondada style that tends to be seen as the gold standard in multimodality research.
Excerpt 1 line 02
“does not move but stands still” sounds too wordy for a transcription that should convey information as concisely as possible. (See also in other places.)
Excerpt 1 line 03
There is unnecessary repetition in Jaa “sits still, gaze toward his screen –>”
If a behavior continues across several lines the number of the last line can be added after –>, e.g., (l. 04)
(Here it is the next line, so this strategy is not necessary but this is something I noticed in several places. This helps to avoid too much repetition and ease readability.)
Excerpt 1 line 04
There is double action in Sus: either she is sitting still or stretching her shoulders. Has she been sitting still from line 01? Then that comment should go there.
I would also consider (as I am sure you have) keeping the transcription lines for one person together, rather than strictly following the timeline. As a reader one would perhaps want to have a “narrative” of one body at a time. You do that in excerpt 2.
Excerpt 1 line 05
“places hands” is a momentary action and should therefore not be combined with –>. An option would be to rephrase it with “hands on keyboard”, which would be an extended action. (I noticed the same issue in other transcripts, so I encourage you to think about the exact phrasings of momentary vs. extended actions in all the examples. For instance, line 07 “starts to use” features the same issue, although here the reader can easily infer that “using” will be the temporally extended action.)
As far as I know, Mondada, and many other researchers along with her, would put the original language in bold rather than the translation. This is not only for the ideological reasons of respecting different languages but also because the utterances were actually performed in Finnish, which constitutes the primary data.
Excerpt 2 line 02
“keeps touching the whiteboard” is not a behavior that starts at the moment you have marked. I suggest moving that description to line 01. (Is this something that everybody does while answering the questionnaire?)
Excerpt 2 line 06 should be in bold.
Excerpt 6 lines 08-09: the word order is not idiomatic in English. In addition, consider removing “so” as the translation of “että”, which I think is not idiomatic. I would say “että” is untranslatable in this function.
In other respects you may have idiosyncratic solutions, such as not marking the position of figures with a # in the timeline, but I can see how those make sense and are easily readable anyway.
Overall, I suggest consulting https://www.lorenzamondada.net for the latest developments in her transcription system in addition to her journal articles.
Argument
The title is slightly remote from what the paper does. I would consider using the central terms of your argument, such as “encounter”. I realize that you are struggling with two alternative sets of concepts, Goffmanian and those from VW research, but I guess you need to settle on one of them in order not to lose the main thread. You could discuss more why establishing “the experience of psychological involvement” is problematic from the EMCA perspective and thus argue for the benefits of the Goffmanian terms. Of course, you also need to draw the parallels to the terms that may be more familiar to some of the readers of this journal. Even that can be handled in the title, with parentheses or other means.
One thing that I find striking in comparison to RL findings is that the verbal stream of interaction can be totally disconnected from the avatars who are ignorable. This could hardly ever happen in a crowded street, which could be underlined more in the arguments. Other underdiscussed aspects include the activity differences between having committed to a research experiment within a certain timeframe with a set group of people vs. randomly approaching someone in the street. The accountabilities for participants are quite different, resulting in crucial differences in expectability of interaction, and perhaps verbal interaction in particular. I see your setting as not necessarily making relevant a conversational activity at all, which could render questionable such heavy reliance on Hoey’s (excellent) research that is essentially about verbal action sequences. Did you consider entirely non-verbal openings?
Having myself worked on (re-)openings, I believe that an opening can only be one if it is responded to, as otherwise it is treated as self-talk. You do not make that very clear in describing how you went about establishing your collection. Furthermore, you are likewise not actually working with openings but with re-openings or re-entries, because the participants are stuck there with the joint task across a longer period of time – a nice parallel to my stable setting – and the correct term should thus be re-opening.
Also, more can be made of the action types that both Szymanski and I worked with in the cited papers. I wonder why you chose to categorize the openings topically (especially in sections 4.2.1-4.2.3) rather than in terms of action (which does more justice to the EMCA method), and what that particular topical classification and the counting of respective instances actually contribute to your analysis.
I am particularly struck by the fact that, in contrast to my stable setting, you apparently cannot build an opening on existing potential foci of joint attention as defined by the participants’ bodies, nor on the manner in which the speaker uses their own body (straight/bent, gazing up/down, mumbling/clear speech). You cannot actually know whether there is “joint visual focus” at any moment, as the RL participant may be attending to other matters behind the screen. The RL bodily engagements are neither public nor accountable in interaction in your setting, and I sometimes wondered what the relevance of their RL behavior was at every moment. I believe that the paper would benefit from a proper discussion of both what is public and the related accountability, which are central to all human interaction.
If we are to believe the analysis of excerpt 1 as an instance of a verbal launch of interaction, the lowering of hands from the whiteboard needs to be explained as not constituting an embodied pre-beginning. It is currently puzzling for the reader, even though you later mention briefly something about this hand removal in particular. To me it seems like a clear candidate for a display of readiness to interact after assignment completion.
Formalia
I hardly noticed any issues apart from some word order and expression problems that can be fixed by a native proofreader. The paper is very well written.
Line 786 breaths > breathes
Review by Sylvaine Tuncer
–
Pre-scriptum: One consequence of this review system is that I had access to Leelo Keevallik’s review (whom I greet here!) as mine was ready but not published yet. I decided to read the other review and try to publish my own as unchanged as possible, which I did (fortunately I found no major disagreement!).
Kohonen-Aho and Vatanen’s paper studies mediated interactions between players of an online game. The topic is original, they have collected a very good set of data, and analyses were rigorously conducted. Taking an EMCA approach, the paper opens new directions to investigate computer-mediated communication and online games, but I also think it would benefit from a more specific grounding in the literature, and from a more detailed description of the data, setting and activity under study.
Main issues
1. A review and discussion of CA literature on pre-beginnings (an extensively studied concept since Schegloff 1979), even brief and non-exhaustive, is lacking. Starting with their constitutive elements and what they commonly do in interaction would make the paper a more informed contribution to the topic. Equally relevant would be a short review of work on resuming interaction in an open state of incipient talk, which the manuscript does not mention at all at this stage (Goffman 1963; Schegloff and Sacks 1973; Szymanski 1999; Szymanski et al. 2006, and probably more). Szymanski’s work is all the more relevant as it concerns technology-mediated communication, in that case radio communication. While it dates back to the 1990s, I find much in it that relates to the phenomena under study here. See references below.
2. More details on the data and the activities under study are needed, as early as the introduction and then expanded on in the “Data” section. This would in turn give a clearer view of what is amenable to analysis and what is not, and how far the data give analysts access to participants’ perspectives as they are playing. I would be happy to know more about (a) players’ resources in the game, as well as their access to each other’s actions; (b) how the data were collected (too superficial in the current manuscript); and (c) how much the data capture players’ resources (connecting (a) and (b)). Here are a few questions that came to my mind. They are connected, so one element could answer several questions, and perhaps not all the questions need to be answered separately:
– What does the computer interface look like?
– What are players’ resources to control their own avatars?
– What indications, signs or traces of other players and their actions does each player have?
– Can they see other avatars only when they are in their field of vision?
– Do participants appear to each other in any way other than through their avatars?
– The recording set-up: were participants located in different places? (I think this is vaguely mentioned quite late in the paper)
3. The next point concerns the different perspectives involved in these “fractured ecologies” (Luff et al. 2003). It seems to me that the analyses take an all-inclusive perspective in terms of access to participants’ actions: that of analysts who have recordings of the embodied conduct of all participants in their distributed physical spaces. The transcripts as well as the analyses below each excerpt include not only the activities taking place in the virtual world, but also “each team member’s gaze direction, facial expressions, body movements and hand movements on the keyboard”. But from what I understand, players do not have access to each other’s physical space (and the analyses do not show that they orient in any way to each other’s embodied conduct). So I wonder how much of this should be included in the analyses (although it is available in the data). Perhaps focusing on what happens in the virtual world would work better, especially if embodied conduct in the game is the focus of the paper, as the introduction states, and the criterion for classifying the extracts into two different processes. I think this issue is the most challenging one to resolve, yet I would like the authors to give it serious consideration, as it might shed a different light on the phenomena under study.
Minor points
As much as I agree that the Jefferson + Mondada transcription method is the most heuristic for analysis, and use it extensively myself, I find the transcripts very hard work for the reader because of the type of data and phenomenon. Even I gave up on getting to grips with each of the excerpts. That is why I would suggest the use of graphic transcripts (see Laurier 2014; and for an example of use: Laurier, Brown and McGregor 2016). They do not require extensive work and, more importantly (an issue too often neglected), they would make the paper accessible, and enjoyable, to a much broader audience than CA specialists only. Or, if the authors choose to keep this type of transcript (which I really don’t recommend), they should at least lighten them as much as possible.
You could mention in the introduction (around line 26) previous work in CSCW on technologies that enable unfocused gatherings, namely studies of “media spaces”, which substitute for copresence in workplaces and offer the possibility of “gatherings” at a distance. Heath and Luff (1992), who highlight the difficulties of transitioning from co-presence to interaction, might be interesting to discuss in relation to your focus.
The section on face-to-face initiation of encounters (in 2.1) need not be so long and so detailed considering the present study does not build on this area of research.
I would suggest removing reference to a personal “submitted” paper.
For further consideration…
The paper brought to my attention a question I find interesting regarding interactions in virtual worlds: how much of avatars’ embodied conduct is produced and treated as intentionally public and socially meaningful. Integral to Goffman’s approach is the fact that we humans perceive from within our bodies and have immediate control of our bodies (leaving aside what we ‘give off’, as the authors mention). In VWs, on the other hand, the mediated and most probably partial character of participants’ control over their avatars’ embodied conduct makes one wonder how socially meaningful embodied behaviour can be; nor is it perceived from the avatars’ eyes.
I am not sure how amenable to EMCA research this area of questioning could be, but at the very least I was wondering if/how/when players actually treat the embodied conduct of other players’ avatars as interactionally significant and relevant, as potential ‘displays of intention’ (see, e.g., Smith 2017). In the paper there is a hint of that (paragraph at lines 295-316) from the perception side (“the user’s ongoing activities are still far less obvious for others to detect than they are in real life”), but the production side is also intriguing. Would it be possible to include in the paper a description of how easy or difficult it feels to control and direct one’s avatar?
References
Berger, Israel, Rowena Viney and John P. Rae (2016). Do continuing states of incipient talk exist? Journal of Pragmatics 91, 29–44.
Heath, Christian and Paul Luff (1992). Media space and communicative asymmetries: Preliminary observations of video-mediated interactions. Human-Computer Interaction 7, 315–346.
Laurier, Eric (2014). The graphic transcript: Poaching comic book grammar for inscribing the visual, spatial and temporal aspects of action. Geography Compass 8(4), 235–248.
Laurier, Eric, Barry Brown and Moira McGregor (2016). Mediated pedestrian mobility: Walking and the map app. Mobilities 11(1), 117–134.
Luff, Paul, Hideaki Kuzuoka, Jon Hindmarsh, Keiichi Yamazaki, Shinya Oyama and Christian Heath (2003). Fractured ecologies: creating environments for collaboration. Human-Computer Interaction 18(1-2), 51–84.
Schegloff, Emanuel A. and Harvey Sacks (1973). Opening up closings. Semiotica 8(4), 289–327.
Smith, Robin J. (2017). Left to their own devices? The practical organization of space, interaction, and communication in and as the work of crossing a shared space intersection. Sociologica 2, 1–32.
Szymanski, Margaret H. (1999). Re-engaging and dis-engaging talk in activity. Language in Society 28(1), 1–23.
Szymanski, Margaret H., Erik Vinkhuyzen, Paul M. Aoki and Allison Woodruff (2006). Organizing a remote state of incipient talk: Push-to-talk mobile radio interaction. Language in Society 35, 393–418.
We would like to thank Professor Leelo Keevallik for her insightful comments on how to improve our paper, especially on the categorization of our findings according to social actions in Section 4, and on our transcripts. We are happy to have this opportunity to revise our paper and have done our best to improve it by addressing all the comments and revisiting the corresponding parts of the text. Below, we have copied all comments from the review and added our responses under each comment.
COMMENT:
Transcripts as analysis
Even though VR interaction takes place in a three-dimensional space the raw data needs to be transformed into a readable two-dimensional format not only for the sake of the readers but also to establish the appropriate level of analytical rigor. The authors have done a great job on multimodal transcription, following the principles worked out by Lorenza Mondada. For obvious reasons, these principles are always in flux and need to be adjusted to every specific dataset. The authors of this paper have, for example, invented a convention for the body of the avatar (e.g., JaaA). There are, however, other conventions that may not have come across quite systematically. Here are some of my suggestions that would amount to minor fixes but bring the transcripts in line with some recent developments in Mondada style that tends to be seen as the gold standard in multimodality research.
ANSWER:
Thank you for evaluating our transcripts in such detail!
COMMENT:
Excerpt 1 line 02
“does not move but stands still” sounds too wordy for a transcription that should convey information as concisely as possible. (See also in other places.)
ANSWER:
We agree that “does not move but stands still” and many other descriptions of embodied actions in our transcripts were indeed too wordy. We have now shortened them and paid attention to the accuracy of our expression.
COMMENT:
Excerpt 1 line 03
There is unnecessary repetition in Jaa “sits still, gaze toward his screen –>”
If a behavior continues across several lines the number of the last line can be added after –>, e.g., (l. 04)
(Here it is the next line, so this strategy is not necessary but this is something I noticed in several places. This helps to avoid too much repetition and ease readability.)
ANSWER:
Thank you for noticing this repetition, it is now corrected. We have also added the line where the behavior ends after the arrow when behavior continues across several lines.
We have also removed unnecessary transcripts of RL videos. We explain this choice in Section 3.3. The decision to reduce the amount of RL video transcripts relates to the comment of our second reviewer, Dr. Sylvaine Tuncer: the actions in RL videos are witnessable only to the analysts, not the participants themselves, and as such they only provide additional information for the analysts.
COMMENT:
Excerpt 1 line 04
There is double action in Sus: either she is sitting still or stretching her shoulders. Has she been sitting still from line 01? Then that comment should go there.
ANSWER:
This is very true. We have now paid attention to the accuracy of describing embodied actions. These specific RL actions by Sus have now been removed altogether because, after all, they do not provide relevant information for the participants (or the readers).
COMMENT: I would also consider (as I am sure you have) keeping the transcription lines for one person together, rather than strictly following the timeline. As a reader one would perhaps want to have a “narrative” of one body at a time. You do that in excerpt 2.
ANSWER:
It is true that the transcripts are more readable this way. We have now grouped actions per participant in all transcripts, which is also what Mondada recommends.
COMMENT:
Excerpt 1 line 05
“places hands” is a momentary action and should therefore not be combined with –>. An option would be to rephrase it with “hands on keyboard”, which would be an extended action. (I noticed the same issue in other transcripts, so I encourage you to think about the exact phrasings of momentary vs. extended actions in all the examples. For instance, line 07 “starts to use” features the same issue, although here the reader can easily infer that “using” will be the temporally extended action.)
ANSWER:
This is an excellent point. We have now paid close attention to the wording of momentary vs. extended actions in the transcripts.
COMMENT:
As far as I know, Mondada, and many other researchers along with her, would put the original language in bold rather than the translation. This is not only for the ideological reasons of respecting different languages but also because the utterances were actually performed in Finnish, which constitutes the primary data.
ANSWER:
Very true. We have now put the original language in bold.
COMMENT:
Excerpt 2 line 02
“keeps touching the whiteboard” is not a behavior that starts at the moment you have marked. I suggest moving that description to line 01. (Is this something that everybody does while answering the questionnaire?)
ANSWER:
Good observation. We have now marked in line 01 in each excerpt that the avatars are pointing their arms towards the whiteboard already from the beginning of the questionnaires.
COMMENT:
Excerpt 2 line 06 should be in bold.
ANSWER:
Thank you for noticing, we corrected this.
COMMENT:
Excerpt 6 lines 08-09: the word order is not idiomatic in English. In addition, consider removing “so” as the translation of “että”, which I think is not idiomatic. I would say “että” is untranslatable in this function.
ANSWER:
Thank you for noticing. We have corrected the English translation in Excerpt 6 (now lines 10-11).
COMMENT:
In other respects you may have idiosyncratic solutions, such as not marking the position of figures with a # in the timeline, but I can see how those make sense and are easily readable anyway.
ANSWER:
Great, we are happy to hear this! All figures in relation to the data excerpts are now included in graphic transcripts, which we decided to add in addition to the Mondada style transcripts. The exact place of the figures in relation to the transcript is now quite approximate, but we hope that this way our data examples are more readable than in the discussion paper.
COMMENT:
Overall, I suggest consulting https://www.lorenzamondada.net for the latest developments in her transcription system in addition to her journal articles.
ANSWER:
We visited the website to ensure that our transcript conventions are in line with Mondada’s latest developments. As pointed out at the beginning of this review, we also had to make our own solutions for transcribing (such as Name and Name-A to separate the participants and their avatars). In addition, we have preserved our choice to transcribe the lines that are not in focus more roughly, especially when it comes to the exact timing of the embodied behavior. The target lines are more precisely transcribed.
COMMENT:
Argument
The title is slightly remote from what the paper does. I would consider using the central terms of your argument, such as “encounter”. I realize that you are struggling with two alternative sets of concepts, Goffmanian and those from VW research, but I guess you need to settle on one of them in order not to lose the main thread. You could discuss more why establishing “the experience of psychological involvement” is problematic from the EMCA perspective and thus argue for the benefits of the Goffmanian terms. Of course, you also need to draw the parallels to the terms that may be more familiar to some of the readers of this journal. Even that can be handled in the title, with parentheses or other means.
ANSWER:
Thank you, we now realize that the title did not sufficiently reflect our main phenomenon. We have modified the title in line with this suggestion: “(Re-)Opening an encounter in the virtual world of Second Life: On types of joint presence in avatar interaction.”
We think that we found a satisfactory way of including also the concept of presence in the title, as “joint presence” relates to both copresence (gathering) and social presence (encounter).
We also added a clarification of why we are mainly using Goffmanian terms (encounter and gathering) instead of copresence and social presence. This can be found in Section 1.
COMMENT:
One thing that I find striking in comparison to RL findings is that the verbal stream of interaction can be totally disconnected from the avatars who are ignorable. This could hardly ever happen in a crowded street, which could be underlined more in the arguments. Other underdiscussed aspects include the activity differences between having committed to a research experiment within a certain timeframe with a set group of people vs. randomly approaching someone in the street. The accountabilities for participants are quite different, resulting in crucial differences in expectability of interaction, and perhaps verbal interaction in particular. I see your setting as not necessarily making relevant a conversational activity at all, which could render questionable such heavy reliance on Hoey’s (excellent) research that is essentially about verbal action sequences. Did you consider entirely non-verbal openings?
ANSWER:
Thank you for pointing out these issues, which are indeed highly relevant for the analysis. We have, among other issues, added a discussion of non-verbal openings (which, based on participant orientation, do not exist in the data) in Section 3.3. We also now discuss some issues regarding the accountability of behavior in Sections 1 and 6.
COMMENT:
Having myself worked on (re-)openings, I believe that an opening can only be one if it is responded to, as otherwise it is treated as self-talk. You do not make that very clear in describing how you went about establishing your collection. Furthermore, you are likewise not actually working with openings but with re-openings or re-entries, because the participants are stuck there with the joint task across a longer period of time – a nice parallel to my stable setting – and the correct term should thus be re-opening.
ANSWER:
We agree on this, and have changed the term we use to “re-opening”. We also added a note on the turns not being self-talk (they always receive a response) to Sections 3.3 and 4, and discuss this further in Section 6.
COMMENT:
Also, more can be made of the action types that both Szymanski and I worked with in the cited papers. I wonder why you chose to categorize the openings topically (especially in sections 4.2.1-4.2.3) rather than in terms of action (which does more justice to the EMCA method), and what that particular topical classification and the counting of respective instances actually contribute to your analysis.
ANSWER:
Thank you; re-categorizing the findings in line with this comment proved to be a good choice. We have revised the findings in Section 4, which are now categorized according to actions (noticings and information requests), and we have also added more references to Szymanski’s work. We still discuss the topical classification and who (re-)opens the encounter (the participant who moves vs. another participant).
We changed one data example to balance the number of noticings and information requests in relation to avatar movement and the virtual space. We removed the previous Excerpt 5 of the discussion paper and added the current Excerpt 2.
COMMENT: I am particularly struck by the fact that, in contrast to my stable setting, you apparently cannot build an opening on existing potential foci of joint attention as defined by the participants’ bodies, nor on the manner in which the speaker uses their own body (straight/bent, gazing up/down, mumbling/clear speech). You cannot actually know whether there is “joint visual focus” at any moment, as the RL participant may be attending to other matters behind the screen. The RL bodily engagements are neither public nor accountable in interaction in your setting, and I sometimes wondered what the relevance of their RL behavior was at any given moment. I believe that the paper would benefit from a proper discussion of both what is public and the related accountability, which are central to all human interaction.
ANSWER:
Yes, this is very true, and, as described above, we now discuss the lack of accountability and the difficulty of achieving joint foci of attention in virtually embodied interaction in Sections 3.3 and 6. We also reduced the amount of RL video transcription, because the RL conduct is indeed not public for the participants in this setting. Thus, embodied RL actions are not relevant for the participants; they are mostly just a tool for the analysts.
COMMENT: If we are to believe the analysis of Excerpt 1 as an instance of a verbal launch of interaction, the lowering of hands from the whiteboard needs to be explained as not constituting an embodied pre-beginning. It is currently puzzling for the reader, even though you later briefly mention this hand removal in particular. To me it seems like a clear candidate for a display of readiness to interact after completing the assignment.
ANSWER:
This is a really good point. We now explain already in relation to Excerpt 1 why lowering an avatar’s arm from the whiteboard does not constitute an embodied action that could be relied on as an embodied pre-beginning. The first reason is that the verbal (re-)openings are produced only after avatars are moved in a more visible manner (walking, jumping, running); in general, the participants do not show orientation to the arm movement. There is only one example of arm lowering working as an embodied pre-beginning (Excerpt 2), and in this case it is something that the team member doing it orients to and uses as a resource for (re-)opening the encounter (it is not something that other team members use as a cue of that team member’s readiness for an encounter). The second reason is that in many cases (e.g., Excerpt 6) the avatar does not lower its arm from the whiteboard when the questionnaire is finished, but only after a relatively long delay. Thus, it seems that the participants may not even notice the arm being lowered (or not being lowered) after the questionnaires are finished.
COMMENT:
Formalia
I hardly noticed any issues apart from some word order and expression problems that can be fixed by a native-speaker proofreader. The paper is very well written. Line 786: breaths > breathes.
ANSWER:
Thank you; we are happy to hear this, and the typo has now been corrected. We have also checked the paper for other linguistic issues.
We are very thankful to Dr. Sylvaine Tuncer for her constructive and helpful comments on improving our paper. Revising the paper according to her recommendations gave us many insights, especially in relation to the data and how it should be presented in the analysis and transcripts, as well as through her literature recommendations. We have done our best to take all her comments into account, especially those regarding the presentation of our data and the revision of the transcripts into a more readable form. Below, we have copied all comments from the review and added our responses under each comment.
COMMENT:
Main issues
A review and discussion of CA literature on pre-beginnings (an extensively studied concept since Schegloff 1979), even if brief and non-exhaustive, is lacking. Starting with their constitutive elements and what they commonly do in interaction would make the paper a more informed contribution to the topic. Equally relevant, a short review of the literature on resuming interaction in an open state of incipient talk seems in order (Goffman, 1963; Schegloff and Sacks, 1973; Szymanski 1999; Szymanski et al., 2006, and probably more); the manuscript does not mention this at all at this stage. Szymanski’s work is all the more relevant in that it concerns technology-mediated communication, in that case radio communication. While it dates back to the 1990s, I find much in it that relates to the phenomena under study here. See references below.
ANSWER:
Thank you for pointing this out! These bodies of literature are indeed relevant as a background for our study. We have now incorporated them in Section 2.1.
COMMENT:
More details on the data and the activities under study are needed, as early as the introduction, and then expanded on in the “Data” section. This would in turn give a clearer view of what is amenable to analysis and what is not, and of how much access the data give analysts to participants’ perspectives as they are playing. I would be happy to know more about (a) players’ resources in the game, as well as their access to each other’s actions; (b) how the data were collected (too superficial in the current manuscript); and (c) how much the data capture players’ resources (connecting (a) and (b)). Here are a few questions that came to my mind. They are connected, so one element could answer several questions, and perhaps not every question needs to be answered separately:
– What does the computer interface look like?
– What are players’ resources to control their own avatars?
– What indications, signs or traces of other players and their actions does each player have?
– Can they see other avatars only when they are in their field of vision?
– Do participants feature to each other in any other way than through their avatar?
– The recording set-up: were participants located in different places? (I think this is vaguely mentioned quite late in the paper)
ANSWER:
Thank you for asking for these clarifications. We now answer these questions briefly in Section 1 and in more detail in Sections 3.1 and 3.2.
COMMENT:
The next point concerns the different perspectives involved in these “fractured ecologies” (Luff et al. 2003). It seems to me that the analyses take an all-inclusive perspective in terms of access to participants’ actions: that of analysts who have recordings of the embodied conduct of all participants in their distributed physical spaces. The transcripts, as well as the analyses below each excerpt, include not only the activities taking place in the virtual world, but also “each team member’s gaze direction, facial expressions, body movements and hand movements on the keyboard”. But from what I understand, players do not have access to each other’s physical space (and the analyses do not show that they orient in any way to each other’s embodied conduct). So I wonder how much of this should be included in the analyses (although it is available in the data). Perhaps focusing on what happens in the virtual world would work better, especially if the focus of the paper, as the introduction states, and the criterion for classifying the extracts into two different processes, is embodied conduct in the game. I think this issue is the most challenging one to resolve, yet I would like the authors to give it serious consideration, as it might shed a different light on the phenomena under study.
ANSWER:
This comment made us realize how our transcripts, which also included actions in the RL videos, can indeed be misleading in terms of what the participants can actually orient to. Even though RL videos are a necessary tool for the analysts (e.g., for seeing when a questionnaire is completed), showing them in the transcript may give the impression that we, as analysts, assume that the participants have access to them as well. We certainly do not assume the participants to have access to the embodied RL actions and do not want to give this impression to readers either. Thus, we have removed most of the RL actions from the transcripts. We kept only certain RL actions: those that inform readers about the moments when questionnaires are completed, and some actions that occur in RL during silences when not much seems to happen in the virtual world. We now explain in Section 3.3 that RL video transcripts are relevant only for the analysts.
COMMENT:
Minor points
As much as I agree that the Jefferson + Mondada transcription method is the most heuristic for analysis, and extensively use it myself, I find the transcripts very hard work for the reader, because of the type of data and phenomenon. Even I gave up on getting to grips with each of the excerpts. That is why I would suggest the use of graphic transcripts (see Laurier 2014; and for an example of use: Laurier, Brown and McGregor 2016). They do not require extensive work, and, more importantly (an issue too often neglected), they would make the paper accessible and enjoyable to a much broader audience than CA specialists only. Or, if the authors choose to keep this type of transcript (which I really do not recommend), they should at least lighten them as much as possible.
ANSWER:
It is true that, even though we aimed for readable transcripts, they were rather difficult for readers to comprehend. In line with this comment, we have now simplified our transcripts by removing the majority of RL video transcriptions, by sharpening our expression, and by grouping embodied actions by participant. In addition, we have included both a Mondada-style transcript and a graphic transcript of each data excerpt (e.g., Excerpts 1a and 1b). We hope that reading our data excerpts is now more enjoyable, and that the transcripts are more accessible to non-CA specialists, too 🙂
COMMENT:
You could mention in the introduction (around line 26) previous work in CSCW on technologies that enable unfocused gatherings, i.e., studies of “media spaces”, which substitute for co-presence in workplaces and offer the possibility of “gatherings” at a distance. Heath and Luff (1992), who highlight the difficulties of transitioning from co-presence to interaction, might be interesting to discuss in relation to your focus.
ANSWER:
Thank you for reminding us about Heath and Luff’s article (1992). We have now incorporated it in the text, in Sections 1, 2.2 and 5.
COMMENT:
The section on face-to-face initiation of encounters (in 2.1) need not be so long and so detailed, considering that the present study does not build on this area of research.
ANSWER:
We have now slightly shortened this section. Even though our article is not about face-to-face encounters, we also build on these studies to identify what is special and noteworthy in the VW interactions we study.
COMMENT:
I would suggest removing reference to a personal “submitted” paper.
ANSWER:
This article has just been accepted for publication. We have updated the reference.
COMMENT:
For further consideration…
The paper brought to my attention a question I find interesting regarding interactions in virtual worlds: how much of an avatar’s embodied conduct is produced and treated as intentionally public and socially meaningful. Integral to Goffman’s approach is the fact that we humans perceive from within our bodies and have immediate control of our bodies (leaving aside what we ‘give off’, as the authors mention). In VWs, on the other hand, the mediated and most probably partial character of participants’ control over their avatars’ embodied conduct makes one wonder how socially meaningful embodied behaviour can be; moreover, it is not perceived from the avatars’ eyes either.
I am not sure how amenable this line of questioning would be to EMCA research, but at the very least I wondered if/how/when players actually treat the embodied conduct of other players’ avatars as interactionally significant and relevant, as potential ‘displays of intention’ (see, e.g., Smith 2017). The paper hints at this (paragraph at lines 295–316) from the perception side (“the user’s ongoing activities are still far less obvious for others to detect than they are in real life”), but the production side is also intriguing. Would it be possible to include in the paper a description of how easy or difficult it feels to control and direct one’s avatar?
ANSWER:
We find this question of intention very intriguing. In our data, there are examples of unintentional avatar behavior that leads to re-opening an encounter (e.g., Excerpt 4). We now mention intentionality and propose future research about this phenomenon briefly in Section 6.