Licoppe Verdier and Dumoulin 2013: Courtroom Interaction as a Multimedia Event
Electronic Journal of Communication

Volume 23 Numbers 1 & 2, 2013

Courtroom Interaction as a Multimedia Event:
The Work of Producing Relevant Videoconference Frames in French Pre-Trial Hearings

Christian Licoppe and Maud Verdier
Institut Mines/Telecom Paristech
Paris, France
Laurence Dumoulin
ENS Cachan
Cachan, France

Abstract: In this article, the authors discuss the uses of camera motions and video communication in a courtroom setting. Recent evolution in the technology of video communication systems has made the issue of camera motion more central. Further, the use of videoconference systems has become almost pervasive in French courtrooms today. The uses of videoconferencing in judicial settings are studied from an interactional perspective, using Conversation Analysis and Situated Action perspectives to understand the common sense interactional concerns members orient to regarding their handling of the camera and to demonstrate how the sense making and practical procedures they rely upon are tightly articulated with the sequential organization of courtroom conversation.

The different uses of videoconference systems and media spaces began to be extensively studied in the 1990s. The focus of early research was on understanding how video links could support distributed team collaboration and informal professional meetings (Fish et al., 1992; Gaver, 1992; Dourish et al., 1996) and on the kind of interaction problems that were raised by video communication, such as the turn-by-turn organization of talk and its coordination with gaze and body orientation (Heath & Luff, 1992; Relieu, 2007), the “frailty of the interaction frame” (De Fornel, 1994) and the difficulties raised by the accomplishment of pointing and other embodied performative actions in “fractured ecologies” (Luff et al., 2003). In most of these studies the users were “mobile” in front of the cameras, but the cameras were either fixed or not moved. Where cameras were mobile, the focus was on their motion during video communication as a resource to show task-relevant features of the users’ environments and to constitute “video as data” (Mondada et al., 1996), with the orientation of the camera providing access to a shared field of interaction (Whittaker, 2003; Mondada, 2007).

Recent evolutions in video communication systems have made the issue of camera motion more central. In everyday settings, it is possible to re-orient the camera with little effort because of the mobility and portability of video devices (e.g., laptops and webcams in Skype interactions, mobile phones in mobile video calls), thus making the question of what to show at any one time relevant in itself (Morel & Licoppe, 2009). In professional systems and telepresence rooms, the devices are much less easily moved, but the camera can usually be oriented within a rather wide angle. Also, a discrete set of camera orientations can be pre-programmed on the remote control, which is particularly interesting in multi-party settings such as the one we will be studying here. So the subject of camera motions, and more specifically the relevance of camera motions with respect to the ongoing interaction, has become a topic in itself.

The object of this article is to discuss the uses of camera motions and video communication in a courtroom setting. Our ethnographic research project is related to the historical fact that videoconference systems have become almost pervasive in French courtrooms today. In 2007 the Ministry of Justice, under pressure to demonstrate its ability to cut down on running costs, initiated a program to equip every court and prison in France with this technology. The Ministry strongly encouraged judges to use the equipment to cut down, among other things, on mobility costs in certain types of hearings. In these cases, the prisoner may “attend” the proceedings from the videoconference room in his or her prison, which cuts down on the costs related to his or her being taken to court for a day under police surveillance (Dumoulin & Licoppe, 2011). Videoconference technology was thus used in France as a tool for managerial reform, in line with the more general trend pushing towards the rationalization of management in judicial administrations worldwide, with information and communication technologies being a key resource to achieve this (Fabri & Contini, 2001). In practice, only some judges in France have implemented videoconferencing in the workings of the hearings they preside over, with many others remaining reluctant to do so.

The uses of videoconferencing in judicial settings have not yet been studied from an interactional perspective and we focus here on an analysis of the way the camera is moved in the courtroom to accomplish relevant social actions. The participants in our study have not received any training on how to carry out this kind of camera work and there is little if any institutional “stock of interactional knowledge” (Peräkylä & Vehviläinen, 2003) they can draw upon to guide them and help them understand the relevance of such camera work to courtroom interaction. We will use a Conversation Analysis and Situated Action perspective to understand the common sense interactional concerns members orient to regarding their handling of the camera (such as “showing the current speaker on screen”) and to demonstrate how the sensemaking and practical procedures they rely upon are tightly articulated with the sequential organization of courtroom conversations.

The presence of the video link and the possibility to move the camera have implications with respect to the kind of work that has to be done in the courtroom. The introduction into the courtroom of screens and live recording systems for high profile trials has been shown to have bearings on the work done by judges at a cognitive and organizational level (Lanzara & Patriotta, 2001). Having remote participants and videoconference screens also introduces additional tasks and competence requirements for the legal professionals involved. This puts some strain on their usual routines for managing courtroom proceedings, as we will try to show in detail. They have to become “videoconference literate,” that is, to behave as producers, filmmakers and editors of sorts, who must articulate the video images they produce of the unfolding courtroom interaction on a moment to moment basis. When remote participants attend through a video link, the judicial hearing becomes a multimedia event that has to be monitored, staged and produced, and that must ceaselessly be oriented to as such.

The Research Setting

The research was conducted in a regional Court of Appeal (“Chambre de l’Instruction”) in France. The Chambre holds mostly pre-trial hearings in which defendants who are in custody appeal against the decision to be remanded in custody during the inquiry, the preliminary investigation often still being in progress. The Chambre decides whether or not to release the defendant. The defendant appears before a court composed of the president of the court, two other judges who assist him and the prosecutor. The three judges decide on the sentence. There is no jury. The Court of Appeal of Rennes has jurisdiction over about one sixth of France, so the defendants are often incarcerated in prisons that are at a distance of three hundred kilometers from the Court. The government has pressed the regional courts to cut down on prison transportation costs and to use videoconferencing as much as possible. Pre-trial hearings, which are short, functional and do not judge the facts of the case, have been targeted as a field of application for this technology, but it has spread very unevenly because of the magistrates’ reluctance to engage with it. However, the practice has caught on in the particular courtroom we studied, because the presiding judge there is favorable to technological innovations and has himself developed the skills for handling the remote control of the camera during hearings. On a typical Thursday morning at the Chambre de l’Instruction, 3 to 5 defendants might appear from their prison through a video link, and a similar number be present in the dock. The videoconference cases are usually debated first in order to avoid planning troubles due to an accumulation of delays.

When videoconferencing is being used, the courtroom becomes spatially distributed (Figure 1). The videoconference system in the courtroom is composed of a large plasma screen placed on the left side of the bench with a camera on top of it.

In the courtroom, the videoconference camera is placed just above the computer screen.
Figure 1. The Position of the Screen and Camera in the Courtroom (from left to right: the usher, standing, the clerk, one of the two deputy judges and the presiding judge).

When the video connection is working, as in the picture below, the screen is split in two, with the image from the prison on the left, and the control image of the courtroom, which is also the one that the defendant is watching in prison, on the right (Figure 2). The president moves the camera according to what is going on, sometimes providing a broad view of the bench (Figure 2a) and sometimes focusing on a participant, such as the prosecutor when she is making an accusatory statement (Figure 2b).

a) Prison view (left) and courtroom view (right): the defendant and his counsel watch and listen to the prosecutor who is speaking in the courtroom.
b) Prison view (left) and courtroom view (right): the defendant and his counsel watch and listen to the prosecutor who is speaking in the courtroom.
Figure 2. The Videoconference Screen in Action.

Most of the time, the president in charge of this Chambre de l’Instruction chairs the debates and handles the remote control at the same time (this will be the case in extract 1 below). When he is not chairing and the deputies who replace him do not want to manage the video, this task is taken up by the usher (see the section “Embodied patterns in the usher’s orientation towards producing a proper video frame” for an example). The camera is mobile, but only enough to record part of the room at a time. For instance, it is unable to show all of the public attending the hearing.

We were able to observe and video-record the public proceedings of this courtroom over a year and to constitute in this way a video corpus of about sixty cases.

Camera Motions as a Resource for Courtroom Interaction: Referring to and Showing a Third Party

As the camera can be moved (though its rotation is not wide enough to show the whole room) and zoomed, it constitutes a resource for the president to show various features of the courtroom and particularly various persons attending the hearing. It has thus become routine for him to mention the presence of the defendant’s counsel at the start of the hearing and to show the counsel on screen at the same time, so that the latter may become visible to his client for the first time[1]. How are such sequences accomplished, and what does their organization tell us about the use of videoconference technology in the courtroom?

Courtroom Interaction as Multimedia Performance: Staging the Appearance of a Third Party

The following extract provides an example in which the president uses the video to make the counsel visible.

Extract 1

P is the presiding judge, D is the defendant (in the videoconference room of the prison), and C is his counsel. Names have been changed for anonymity purposes and we use the transcription conventions of Conversation Analysis (Jefferson, 2004), with two supplementary types of marker. Star signs are used to mark camera motions and to place them with respect to the conversation: one star (*) signals the beginning of a camera motion and two stars (**) signal the moment the motion stops. Pound signs (£) are used to signal significant changes in the participants’ embodied conduct.

Figure 3. Control image of the court before the judge begins to move the camera towards the counsel (Line 5). This is the image of the court that is also visible from prison. It appears in the courtroom as the right half of a large screen (the other half of the screen shows the image from the prison).

Figure 4. Images on screen at the end of the second part of the panning movement to the right (end of line 8), which makes the lawyer appear on screen for the first time.

Figure 5. Images on screen after the multiple frame adjustments in lines 11-16. A larger and closer view of the standing counsel can be seen.

The greeting in line 1 was preceded by the audio and video connections and various preliminary exchanges - some involving the warden - during which the defendant was already present. The greeting itself is produced with the object of moving the interaction forward with respect to this particular type of institutional meeting. The president then prefaces a camera motion by announcing that he will be showing someone. The announcement is immediately followed by a rapid camera rotation towards the defendant’s counsel, and conversationally by hesitation markers, a possessive with a lengthening of its end (which may also count as a hesitation marker), a slight pause, and then finally an explicit nominal reference to the counsel. Such a temporal organization of the turn suggests that the orientation towards reframing the image and showing the lawyer - a kind of typified and routine action - comes first, before the president is able to muster the particulars of the situation (that is, here, the name of the lawyer). The production of the name of the lawyer is delayed further by a relational reference “your excellent counsel Maître Petit,” which emphasizes the relationship between the counsel and his client. Interestingly, the camera motion is done in two separate moves, the first at the beginning of the turn constructional unit and the second near its end. One consequence of this is that the lawyer appears on screen at the precise moment that his name is uttered. This emphasizes the way the actual visual appearance of the lawyer constitutes a multimodal accomplishment, a mundane multimedia performance which interweaves the design of turns-at-talk and camera motions.

We have seen that the panning motion of the camera towards the counsel was achieved in two steps. The image actually freezes in an intermediate position in which the counsel is not yet visible, in the middle of line 8, at a moment in which the flow of the turn also breaks down (lengthened discourse markers and pause). After this, the president makes several successive small corrections to the counsel’s video shot (Lines 13-23). The start and end of these video frame adjustments also coincide with pauses, breaks, hesitations and repeats in the developing turn of the president (for instance in lines 15 and 17). The production of a proper turn-at-talk is a multimodal accomplishment that is sensitive to the changing contingencies of the situation in which it unfolds. Here, a specific contingency is the fact that the president is engaged in another stream of action, that is, producing an appropriate video shot, while he utters his turn-at-talk. Some of the hesitations and pauses seem to reflect the fact that he also wants to initiate or end a camera motion at the same time. Conversely, the placement of some of these camera moves suggests it is easier or more convenient for the president to initiate them when the immediate constraints on producing a relevant turn-at-talk are in part relaxed, that is, during pauses and hesitations. Talking while moving the camera is more than just the juxtaposition of two separate courses of action, and their temporal articulation on a fine time scale testifies to the kind of strain that this particular form of multi-activity generates. It is characteristic of the situation in which the judge both presides over the courtroom hearing and produces/edits it as a multimedia performance, as he is handling the remote control.

The kind of multimedia event the judge is orienting towards is the appearance of the lawyer on screen at the moment in which he is referring to him. Appearances are a specific kind of mediated event that can be accomplished by many means (Licoppe, 2012) and which count as a recognizable type of social action with sequential import (Licoppe, 2010). The recognizable appearance of someone in a given situation is a noticeable event produced as such, and it offers a slot for conditionally relevant responses. In situations in which a person appears on screen through a video link, the most relevant response is the production of a greeting, usually initiated by the party the person appears to. For instance, in an early domestic videoconference experiment in which the video connection was only made after the audio connection, openings involved two pairs of greetings, one after the audio connection, when the participants had started to talk but could not yet see one another, and another when they became mutually visible, even though they had already greeted one another once (De Fornel, 1994). Greeting the person who appears seems to be a proper sequential response to his or her appearance on screen. The situation here is slightly different for two reasons: a) the appearance is announced and initiated by another person than the one who becomes visible; b) this is an institutional setting in which the turn allocation system is constrained, which makes it difficult for the defendant to “simply” self-select and greet his lawyer. In the courtroom, when the judge turns the camera towards other parties, either the counsel or family members, and so forth, he also refers to them verbally, and that verbal reference is like part of their appearance, just as their image is. What usually happens then is that these persons produce token verbal or nonverbal gestures (hand waves, nods, and so forth) to greet the remote participants.
So greetings still appear relevant in this setting for video appearances, but it is the person who is being referred to and who visually appears who initiates them. Their minimal character seems to orient to the fact that they constitute a side sequence embedded in the judge’s current management of the courtroom interaction.

In the present case, the defendant only produces a repeated nod at line 10. As it is placed after the judge has announced that he is showing the lawyer, it can be interpreted as a token of acknowledgement. At the same time, as it is placed at the moment the counsel appears on screen, it could also be interpreted as a token greeting. This shows how two articulated orders of sequentiality are potentially relevant in the production of actions in such video environments, that is, the sequential organization of “talk-in-interaction” and that of “video-in-interaction.” The lawyer does not provide any recognizable response. He was looking at the judge during the defendant’s brief nod and did not have access to a microphone then, which might explain why he does not produce any recognizable form of greeting. It is the judge himself who produces a kind of indirect greeting after a short moment has elapsed (line 13): “il vous salue (he greets you) …” which takes the unusual form of the report of an event which has actually not occurred. It displays the production of a greeting as noticeably absent and points towards its proper sequential position (just after the video appearance, while the judge mentions that the lawyer has traveled all the way to the courtroom). This confirms our analysis of the relevance of a greeting by the person who is thus presented on screen in this setting. It also shows how the judge is attuned to the sequential implications of the mundane and minute multimedia performances that he is “staging.”

Managing the On Screen Appearance of a Participant

The judge also does some interactional work to shape the potential reception of the images he is producing. For instance, he mentions that the counsel’s greeting he is “reporting” is “performed” “under our gaze” (Line 13), an ambiguous reference that could at this point mean the court professionals as well as the whole attendance in the courtroom. In all cases, the referent group indicated by the use of the first person possessive lies outside of the video image shown at that time. It is the fact that these people are visually unavailable which makes it relevant to state explicitly that the counsel’s “greeting” is a public gesture performed in front of an attendance. The judge’s statement makes salient a central interactional property of videoconference settings, that is, that much of one participant’s context is visually unavailable to the other and vice versa, thus breaking routine expectations about the reciprocity of perspectives (Heath & Luff, 1992). The judge’s utterance acts as a possible reminder of this fundamental source of interactional asymmetries in videoconference settings. It provides an interpretive framework for how to read this particular video moment and more generally a template for how the defendant should read the future video images of the hearing that will be made available to him: he should read everything that will happen on screen as performed in front of a partly invisible audience.

Moreover, the judge indicates that he considers the video images should be interpreted with the idea that the defendant is “present” in the courtroom, as he makes clear in lines 13-15 (“since of course (.) er you have understood it you are in the courtroom”). The instructions on how to read the videoconference images and the establishment of the defendant as (tele-)present in the courtroom are posited as mutually constitutive. It is because he is actively attending a hearing that the defendant should read the images the judge is producing as public performances on screen accomplished in front of an often invisible audience. It is when the defendant is able to read the images in this way that he is able to experience being somehow “present” in the courtroom, which would not be the case if he thought that he was just “watching” his counsel on a TV screen.

So in this very short sequence and very few words, the judge uses the video and the possibility to move the camera as a resource to orchestrate the visual appearance of the counsel. He displays an orientation towards a) the sequential import of such a multimedia performance (that is, it projects a greeting as a relevant next); b) designing the reception of the image he produces in a way which makes instructionally salient a proper interpretation of the images and the idea of the remote defendant being “present” there in the courtroom. The judge therefore acts here as a producer, filmmaker and editor all in one: he orchestrates the continuous production of video sequences and works to frame their reception in a way that is relevant to the production of proper courtroom participation frames and interactional sequences. All this is done through an artful articulation of video and conversational resources.

The Work of Providing a Proper Image On Screen during Distributed Courtroom Interaction

The sequence we will be examining next occurred on another day in the same court. It is a kind of deviant case with respect to the spatial organization of participants, for it was the only time in our observations in which the defendant appeared in the courtroom and her counsels appeared from a remote site at their request and as authorized by the judge. It is an example of how the availability of videoconference technology in the courtroom has become a resource to manage various mobility-related practicalities and to produce configurations of spatial distribution in the courtroom which were not really foreseen by the legislation. The situation was made even more unusual by the fact that the presiding judge was not present (he could not sit at the hearing because he had been associated with the case at an earlier stage) and one of his deputies was in charge of the proceedings. Unlike him, she was reluctant to handle the remote control, and she had delegated the task of managing the video frames to the court usher.

During her long summary of the case, the president had several exchanges with the defendant to clarify various points. As it was impossible to produce a wide enough view of the court to show both the judge and the defendant at the same time, the usher was faced with a dilemma: either to leave the camera fixed and focused on one of the participants (but then which one? The presiding judge asking the questions or the defendant answering them?) or to try to move the camera in a way that would be somehow relevant to the unfolding interaction (but what does relevant mean here?). The usher chose the latter, apparently on his own initiative, thus taking an active part in the production and constant editing of the video frame. We will analyze the way in which the usher managed the production of appropriate video frames for the remote counsels in part of this long sequence (it lasted about half an hour) of questions and answers between the judge and the defendant.

Embodied Patterns in the Usher’s Orientation towards Producing a Proper Video Frame

In this section we will focus more closely on the usher’s embodied behavior in extract 2, during a moment in between two camera moves (see next section below). During this part of the sequence (lines 1-22 below), the image on screen is that of the court (Figure 6a) and the usher is leaning near the screen with the middle and lower segments of his body oriented in the direction of the judge (Figure 6b). In this particular ecology, this is the only orientation which allows him to rapidly switch his gaze orientation between the judge, the defendant, the clerk (who sits at the end of the table) and the screen by just turning his head (Figure 6b).

a) The images shown on screen during extract 2. b) The middle and lower body orientation of the usher at the same time (Lines 1-22). His involvement with the ongoing talk will be displayed by shifts of the head.
Figure 6. Images relating to extract 2.

Extract 2

P is the presiding judge, D is the defendant (who is in the courtroom), U is the usher, Cam refers to the camera recording what goes on in the courtroom.

Figure 7. The usher turns his head to look at the defendant after her repair initiation (Line 9).

Figure 8. The usher turns to the judge and back twice, during the turn in which the judge justifies her initial claim (Lines 12 and 14) in response to the defendant’s potential challenge (Line 9).

a) The usher turns his head towards the judge when she self-selects to pursue (Line 16). b) He turns his head back towards the defendant when she begins to speak in overlap.
Figure 9. The Videoconference Screen in Action.

Figure 10. The usher orients his whole body towards the screen at a juncture in which it is visible that the defendant has won the turn (Lines 20 and 22).

In the part during which his lower body and torso are oriented towards the judge, he turns his head to look at the defendant (Figure 7) just after her turn designed as a repair initiation (Line 9) which can be heard as a challenge and is treated so by the judge. His head motion is produced as the judge utters a marked agreement, which in effect reasserts her initial claim and thus can be heard as a counterclaim (Line 10). His head then switches to and fro twice between the judge and the defendant (Figure 8) when the judge begins to elaborate a justification in support of her counterclaim, finishing on the defendant as the judge comes to a projectable end of her turn construction unit. He then turns back to look at the judge (Figure 9) when she self-selects to go on (Line 16) and back again to the defendant when she starts to justify her challenge in overlap (Line 18). Soon after this, the usher orients his whole body towards the screen (Line 20, Figure 10) at a point where it has become perceptible that: a) the judge has relinquished the turn, as evidenced by her lowering her voice as the overlap extends and her stopping in mid-sentence (Line 16); b) the talk of the defendant who has just won the turn projects some continuation, since she begins her probable justification with a subordinate clause (Line 18). This reorientation of the usher’s whole body can be seen as a preliminary to his orienting the camera towards the defendant (Line 23 and discussion in the next section below).

The direction of these head moves towards previous or current speakers, taken together with the talk underway, suggests that the usher engages in a particular form of multi-sensorial inspection of the situation at moments in which it is difficult to anticipate what the talk projects and in which the participants whom he is gazing at are plausible candidates for the next turn and thus may become current speakers on whom he may have to focus the camera. This is a more “active” and engaged kind of scrutiny than his usual aural monitoring (which does not seem to involve looking), and he is probably using all available interactional resources to find relevant cues for moving the camera during the ongoing interaction. His gaze thus switches between the participants during the judge’s justification of her initial claim (Lines 12-14). In such an argumentative environment it is likely that the defendant will provide a justification of her own at the first sequential opportunity (which she does, in line 20, though in overlap with the judge’s talk). He also moves back to the defendant when she begins to speak in overlap and orients towards the screen to change the image and get her on screen when it becomes clear that she has won the turn and that she is engaged in a justificatory turn, the content and design of which is potentially relevant to her counsels, and which sequentially does not project a foreseeable end. The fact that his gaze alternates in a way that is so adjusted to the sequential organization of “talk-in-interaction” displays his special attentiveness to who the next speaker might be and to what kind of turn she might produce. It also shows how such an attentiveness is oriented to the issue of producing relevant video images and how it is embodied, for the usher behaves as if he has to look as well as listen.

The gazing practices of the usher therefore make clear that there are two possible ways for him to react to the proceedings in the courtroom. The first is simply to monitor what is going on aurally, which does not require him to gaze at any particular participant. This mode of moderate engagement is particularly relevant when he is not about to have to take action and corresponds to the way he behaves in the courtroom most of the time. However, at “critical” moments[2] in which a potential camera motion becomes a salient option, he engages his attention on what is going on in a more focused way, by simultaneously listening to and gazing at a relevant participant, relevant in the sense of the participant being a possible candidate for an upcoming turn. These gazes therefore occur at moments when the usher may soon have to accomplish camera-related actions and the current talk-in-interaction does not clearly project relevant transition points or next speakers. So he has to perform a kind of “active listening” (Hutchby, 2005), in this case “actively listening-and-gazing,” in order to catch conversational cues that he can treat as opportunities to trigger camera motions. This in turn makes visible and public his current understanding of the sequential organization of the courtroom interaction underway.

His “active listening-and-gazing” and the camera motions he initiates are tightly coupled. One could say he is scanning the sequential organization of the conversation for “affordances” (Gibson, 1979; Hutchby, 2001) which make immediately relevant the production of changes in the camera orientation. Even if the sensemaking procedures he relies on are common sense for everyday conversationalists (probably supplemented by a specialized stock of inferences regarding the institutional organization of courtroom proceedings with which he is quite familiar, thanks to his work), the fact that this particular mode of engagement is marked as special by the presence of oriented gazes is significant. He is also perceiving the talk with respect to the demands related to the moment to moment provision of proper screen views. He is acting and “listening” here also as any producer and editor of multimedia performances would do, a stance which requires being able simultaneously to watch the scene and to visualize how it might look on screen, so as to maintain a proper articulation of both at all times. Building on Charles Goodwin’s argument about “professional vision” (1994), one could say he accomplishes and makes visible a kind of multimedia-oriented professional way of attending to the contingencies of courtroom interaction in the presence of videoconference technologies.

Entangled Sequential Orders: Talk and Video Frame Production

We will now analyze the continuation of this sequence and more particularly focus on: a) how the usher’s camera work displays an understanding of the talk-in-interaction, and b) a normative orientation towards putting the current speaker on screen. The irony of the whole sequence is that such an orientation puts him in an interactional quandary in this very special configuration of the distributed courtroom.

Extract 2 (continued):

Figure 11. The screen as it figures in line 22. During the entire sequence, the usher only modifies the right half of the split screen, figuring the court.

a) b) c) d)
Figure 12. At the beginning of the defendant’s justificatory turn (Line 21), the image is on the judge as in a). The usher then moves the camera slightly towards the right in the direction of the defendant until the end of her turn, reaching the intermediate frame figured in b). Then when the president self-selects (Line 24), after another small motion to the right, as in c), he brings the camera back towards the president as she proceeds with her turn, returning to the original camera position (Line 28) as in d), without having put the defendant on screen at all.

Figure 13. The usher makes a swift and ample camera motion to put the defendant on screen as she answers (Lines 33 to 36).

Figure 14. The usher makes the reverse sweeping camera motion to put the judge on screen as she looks at the file (Lines 41-43).

A simple visual inspection of the video sequence immediately shows that the camera alternates, or tries to alternate, between two fixed and pre-determined positions, one focused on the judge and the other on the defendant. For instance, the usher begins to move the camera towards the defendant just after the start of her turn at line 33. This camera motion takes some time and it is only achieved at the moment the defendant produces a probably final turn construction unit (TCU), that is, her “no” at line 36 (Figure 13). The judge then starts to speak and utters what will actually unfold as a rather extended turn, starting at line 38. The camera only moves back towards the judge when she is well into her turn (at line 41) and stays on her thereafter, making it visible that the proper camera position is one that shows the judge while she speaks. This provides a first piece of evidence for some orderliness in the production of a proper video frame: the usher is trying to focus the camera on the current speaker. He orients towards a specific common sense interactional concern, that is, to “show the face of the current speaker.” The efforts of the usher to conform to that normative orientation are made more remarkable by the fact that the president is openly willing to go on with just the audio when the camera is not focussed on her, thus making it clear that the video frame is not that important to her. However, the usher’s conduct can be better understood if one takes into account the fact that such a concern is at the heart of the organization of video-in-interaction in everyday uses of videoconference communication in which the camera is mobile, as in many instances of Skype or mobile communications (Morel & Licoppe, 2012).

Producing a proper video frame is therefore grafted on the issue of who the current speaker is and when the next speaker is going to start, that is, with sequential concerns proper to the organization of talk-in-interaction. From its inception, Conversation Analysis has concerned itself with treating who will speak and when as an endogenous feature of the orderly management of conversation (Sacks, Schegloff & Jefferson, 1974; Schegloff, 2007). Producing a proper video frame can be a tricky issue, particularly in multi-party settings with remote participants, for its management relies on the scrutiny of what the ongoing talk projects in sequential terms. It is therefore built on the same common sense resources as the sequential organization of conversation.

Conversely, the way the camera is managed to produce a proper video frame makes visible the way the sequential organization of the ongoing talk is understood. In our case, what the usher does and the camera motions he initiates display his understanding, to the courtroom attendance and analyst alike, of who the current speaker is, for how long the speaker might talk and who the next speaker might be, based on inferences built on the sequential organization of turn-taking. For example, what sense does it make for the usher to move the camera towards the defendant at the very start of her turn in line 33, when she has not said enough to give a clue about what she will say? It happens that such a turn is uttered in an argumentative environment: the preceding turn of the judge (Lines 21-32) questioned her own perception of herself when she has been drinking. Such a counterclaim offers a slot for an account (Antaki, 1994; Coulter, 1990). So the judge’s preceding challenge conditionally projects a rather extended response by the defendant as relevant, for the latter will need to account for her conduct and possible inconsistencies. Therefore it makes sense in sequential terms for the usher to begin to move the camera at the very start of her turn. However, it happens that the defendant concludes her turn just after the camera has come upon her. When the judge starts her own turn, the usher does not move the camera immediately. Again, the moment at which he does so makes sense in sequential terms. It is difficult at the start of her turn to anticipate her reaction, since as the president she has several options in the management of the dispute (Antaki, 1994). However, since she begins her turn with a preface stating that she will move on to a summary of another element of the file, which projects a rather extended turn, it makes (sequential) sense for the usher to begin to move the camera when the judge has uttered enough of her preface for it to be recognizable as such. And this is exactly what he does.

The camera motions make visible and public the practical reasoning of the usher with respect to the sequential organization of the current conversation, as well as the fact that the latter is the main resource he can build on to produce a proper video frame. Talk-in-interaction, however, unfolds in a non-deterministic way and projected occasions for transition may not materialize or be seized upon (Sacks et al., 1974), so that camera motions are often out-of-synch with respect to turn transitions. During the utterance of the defendant at lines 33-36, the camera comes upon the defendant only at what proves to be the end of her turn, so that for most of it she was never on screen. The camera is on her at the start of the president’s next turn, and it is only because it is extended that the usher has the time to get the camera back on the judge while she is speaking. In line 21, the camera starts to move towards the defendant (which makes conversational sense since an account is also expected there and she has started with a clause that projects further developments), but when she finishes the camera is not yet on her. At that point it is the judge who produces a turn, which projects further developments since it is framed as a counterclaim (Line 24), so it makes (sequential) sense for the usher to move the camera back towards the judge before even getting the defendant on screen.

The cases in which the camera motions are initiated at the precise accomplishment of the turn transition appear as felicitous and relatively rare exceptions (we only had one clear instance in this sequence, which lasts half an hour). It is easy to understand why, for it can only happen when the talk-in-interaction, which unfolds independently of the camera motion (at least in this setting), happens to have transition points that coincide with the end of a camera motion. Since it means the camera motion has to be started before the next speaker begins, this can only happen when unforeseen contingencies happen to allow it. It appears easier to get speakers on screen at the moment of their utterance, or at least for large chunks of their utterances, when the turns-at-talk get longer or the turn transitions are more constrained. The out-of-synchness of the video frame with respect to the ongoing talk-in-interaction is almost unavoidable because of the constitutive properties of the turn transition system in conversation. However, synchronization may occur more frequently when the courtroom interaction becomes highly institutional (Drew & Heritage, 1992; Heritage & Clayman, 2010), as for instance when the president formally gives the floor to the prosecutor and the counsels for their “requêtes” and “plaidoiries.” The degree of out-of-synchness of the video image with respect to the talk appears to work as an inverse index of the predictability and institutional character of the sequential organization of the talk underway.

The camera work done by the usher to provide a “proper” image displays the ceaseless entanglement and articulation of the sequential order of video-in-interaction with that of talk-in-interaction, as well as that of the normative concerns that are constitutive of such sequential orders (e.g., one speaker at a time and minimization of silence in the case of conversation, and showing the current speaker on screen in the case of videoconferencing). Both are mutually elaborative. The efforts made by the usher to orient to both orders in the face of the ceaseless contingencies of courtroom interaction show that distributed courtroom interaction must be considered as a multimedia performance that is accomplished on a moment-to-moment basis, and the proper management of which may become particularly tricky in multi-party videoconference settings where more than three participants become relevant at the same time.

The presence of videoconference technologies introduces new tasks into the courtroom, because the hearing also has to be produced and edited as a multimedia performance, which may prove tricky when more than two participants become relevant to be shown on the same screen at the same time, as is the case here. The usher, having received no training for learning how to accomplish these new tasks in these particular situations, is therefore no more skilled at it than any other person present in the courtroom. These particular tasks introduce a specific cognitive and interpretive burden, which the gazing patterns and actions of the usher make visible. One can even surmise that what he endeavours to accomplish here could not have been done by the judge herself, for she would then have had to simultaneously produce the questions in the question-answer sequence and scrutinize its sequential organization, which as conversationalists we usually do unreflectively and without thinking (Coulter, 2005), while at the same time inspecting the ongoing talk for its video-related affordances.


We have studied here the kind of work which is accomplished to provide relevant video images on screen during distributed courtroom hearings where some parties participate from a remote site by videoconference. We have shown how this kind of camera work is made possible by and relies on the fact that in most videoconference systems today the camera is mobile and can be oriented within a relatively wide angle. We have also shown that camera motions were required here because of a normative orientation towards having participants relevant to the ongoing interaction on screen, and especially current speakers or soon-to-be current speakers. In our first case, the lawyer was shown on screen at the moment in which his multimodal appearance made him relevant and opened a slot for him to produce a greeting. In our second example, the camera work of the usher displayed a constant effort to conform to the concern of showing the relevant speaker within a problematic configuration in which two participants were talking and only one could be shown at a time on the screen. Such a normative orientation is more general and also appears in Skype or mobile video communications. It is particularly significant here, not only because it is an institutional setting, but because courtroom hearings are multi-party interactions during the course of which (much) more than two parties can become ratified speakers. As we have seen in our two examples, this may occur within the same bit of interaction, and such occasions put strain on the camera person who must accomplish constant adjustments of the video frame. To be able to do this, the camera person monitors the ongoing talk to initiate relevant camera moves. This is because of the orientation towards the common sense interactional concern, which requires one to “show the current speaker on screen.” The sequential organization of the video frames is tightly grafted to the sequential organization of talk-in-interaction, the purpose of which is precisely to provide resources for the orderly production of turns-at-talk. The camera motions can therefore be treated as displays of the current understanding of the sequential organization of the interaction underway by the camera person. Articulating talk-in-interaction and video-in-interaction introduces additional cognitive and interactional burdens for that person, whether such an articulation is a resource for the production of a multimodal event, such as getting a counsel who was there, but not yet “fully there,” to “appear” verbally and on screen, or the site of a specific tension as in our second example, where showing two parties on screen was simultaneously relevant, but impossible to do with a single view. When it was the judge who was handling the camera while he talked, the strain of having to subtly adjust the conversation and video frames to stage the “appearance” of the counsel in the videoconference situation was reflected in the details of the way the judge produced his ongoing utterance. In our second example, it was displayed in the “active listening” of the usher, who was focusing aurally and visually on who was a likely candidate to be the next speaker, scrutinizing what was going on for affordances relative to camera motions.

This leads whoever is wielding the camera in the courtroom, be it the judge or the usher, to act all at once as a producer, filmmaker and film editor on a moment-to-moment basis with respect to the ongoing hearing. While the detailed management of the screen goes largely unnoticed in the courtroom, the difficulty of managing this new participatory role at the same time as their usual functions of producing and regulating courtroom interaction has made many judges wary of handling the remote control themselves, even if they are legally accountable for the way the hearing proceeds, to a point where some have become reluctant to rely on videoconferencing for their hearings. Moreover, no training has been arranged to learn how to accomplish such delicate tasks, nor have any guidelines been given to manage screen views, which is striking in such a sensitive institutional setting as a judicial hearing, where equity in terms of access and participation rights and resources is crucial. It is all the more striking in an economic context such as the present one, in which the Ministry of Justice is exerting increasing pressure on the courts to economize on the costs of moving defendants from prisons to courts. Since distributed judicial hearings that rely on videoconference technologies are and necessarily have to be staged and produced as multimedia events, and since there is very little professional stock of interactional knowledge available to legal professionals about the interactional management of such events, the type of research we have done here based on Conversation Analysis is particularly useful for identifying the relevant phenomena and their organization. Moreover, the developments we have observed in such “distributed courtrooms” are consonant with larger trends that suggest that we will increasingly be inhabiting a world of cameras and screens, the images of which will be brought to bear reflexively on all kinds of social occasions.


[1] There is a legal obligation for the president to facilitate interactions between lawyers and their incarcerated clients. For instance, if the counsel wants to confer with her client in prison and the counsel is in the courtroom, she may ask the president that everybody leave so that she may confer privately with the defendant. This happens relatively rarely, only twice in about thirty hearings in our setting. However, showing the counsel at the start of the hearing when the latter has not asked for it is not part of these legal obligations.

[2] They are also made “critical” by his gazing. The special character of these sequential occasions and the relevance of his gazing at a participant are mutually constitutive.


