ACM - Computers in Entertainment

Improving Gameplay in First Person 3-D Games using Multiple Displays

By Anderson Maciel, Fernando Roman, Luciana Nedel


Games are everywhere and, with the sharp improvement of graphics in recent years, a new challenge is to create better interfaces that amplify the sensory experience of game players. In this context, the present work proposes a desktop-based CAVE system using a variable number of displays with dynamic angles between them. Our hypothesis is that such a system improves the players' peripheral vision in 3-D first person games, which should in turn benefit player performance. In our implementation we used augmented reality and computer vision techniques to calibrate the monitors. Graphics libraries (ARToolKit and OpenGL) have been used to detect and calibrate monitors within the same 3-D space and to calculate the angles between them. The open source game AssaultCube has been modified to support 12 monitors and different camera angles. The game is not only entertaining, but has also served as a use case for user tests, with a configuration of three monitors. Tests have shown that the desktop CAVE system allows for performance improvement, since players make significantly fewer look-around movements with the mouse while keeping the average numbers of kills and deaths favorable in relation to a conventional single-monitor setup.



Video games and other digital games are becoming more and more present in people's homes and lives. Rich sources of visual content such as TV advertising, YouTube videos, and 3-D movie theaters have made the games audience pickier, forcing the game industry to offer similar visual quality in games. As a consequence, the entertainment industry is investing huge amounts of money in games that nowadays offer, besides impressive graphics, very dynamic virtual environments (VE) full of life and adventure. People want to be part of these worlds, and video games are a reachable window to these places. However, for the experience within such worlds to be satisfactory in video games, the visualization interface must be at least as good as the games themselves. Current technology offers TV sets and monitors with very high resolutions. Devices with vertical resolutions of 1080p or Full HD (Full High Definition), which correspond to 1920 by 1080 pixels, are very common today at sizes of over 50 inches.

All these advances contributed to increase the feeling of immersion and presence of a player inside the games' virtual environments [1]. However, immersion and presence are concepts borrowed from virtual reality (VR), a domain in which they are the ultimate goal and specific devices are available to provide them. Devices such as head-mounted displays (HMD) and cave automatic virtual environments (CAVE) [2] could also be used in games to improve the feeling of immersion and presence for players. Both are complex systems that require expensive equipment and large physical spaces, which are acceptable in industrial applications or exhibitions but unviable for the average player and even for the hardcore player.

In this context, the present work proposes the implementation of a display system for desktop games based on a set of conventional monitors, building up a kind of desktop CAVE. Since a standard CAVE configuration with 90-degree angles between screens might be too claustrophobic on a desktop, the system also allows arbitrary screen angles with automatic camera calibration. In this work we also propose an evaluation of our desktop CAVE system in the context of a first person shooter (FPS) game. An open source game, AssaultCube, has been chosen and adapted to take full advantage of the desktop-CAVE system. We also aim to evaluate how the proposed system can improve immersion, presence, and gameplay in genres such as FPS or racing games, where peripheral vision seems to play an important role. We perform tests in which a number of aspects are observed, such as shooting accuracy, number of deaths, number of kills, look-around motion, etc. We hypothesize an improvement in player performance based on the fact that the field of view is increased, which amplifies peripheral vision, allowing players to perceive the presence of enemies more quickly than they do through one single monitor, or even through several monitors aligned in the same plane. Moreover, by increasing immersion, such an approach should also provide a more pleasant gameplay.

The system is implemented using conventional LCD monitors to build up a CAVE-like multi-display, where a conventional setup considers one frontal and two lateral screens, as shown in Figure 1. The varying angles between monitors, as well as the number of monitors, define which regions of the VE will be shown. A calibration system based on a webcam and computer vision techniques is used to automatically calculate screen angles. Finally, user experiments compare players' performance using the three-monitor CAVE with their performance using a traditional single flat-screen setup.

Figure 1. Desktop-CAVE with three monitors in an open angular configuration.

In Section 2 we present the main VR concepts used in the work as well as related works in games. Section 3 presents the system design, introducing the qualitative parameters that guided the project of the desktop CAVE in contrast with other visualization systems. Section 4 presents the system implementation, including necessary hardware, software libraries and engines used and also how virtual cameras are calibrated from monitor angles. In sections 5 and 6 we describe and discuss user tests planning and execution as well as an analysis of the results. Conclusions and future works are addressed in Section 7.



The development of computer graphics started a few decades ago and has evolved ever since. In the games world, such evolution brought realism to the virtual environments, making the virtual world more and more realistic to users and players. This level of realism is only possible due to more faithful graphics and advanced physics-based simulation. However, besides graphics and physics, new interfaces for visualization and interaction have increased the feeling of immersion and presence in games and other 3-D virtual environments today [3, 4]. A brief review of the evolution of games and interfaces is presented in this section to contextualize the reader before we describe our desktop-CAVE interface in sections 3 and 4. 

2.1 Games  

The games market moves billions of dollars and has already overtaken the movie industry. Both PC games and console-based video games attract more and more people and companies interested in a share of this profitable market.

The first video game ever was created less than 50 years ago by a group of students at MIT. Spacewar! ran on a DEC PDP-1 in 1961. In the game, two human players fought against each other, each controlling a spaceship that could fire missiles. The game was later distributed with DEC computers and was the first ever to be played by people not directly involved in the project.

The 1970s were the golden age of arcade games. In 1971, Nolan Bushnell and Ted Dabney created Computer Space, of which 1,500 arcade machines were sold. In 1972 the game Pong became the first popular video game. It was released by Atari, which sold 19,000 machines. The same year, the Magnavox Odyssey represented the first generation of game consoles. After Space Invaders (Taito, 1978), Atari's Asteroids (1979), and Pac-Man (1980), the second-generation consoles were already using ROM-based cartridges.

The 1980s were a time for games to diversify. Many new genres appeared, such as adventure, fighting, labyrinth, platform, racing, RPG, and so on. The machines also evolved. In that decade, computers such as the Commodore VIC-20, Commodore 64, and Apple II took the place of the second-generation consoles. Then came the third-generation consoles with the 8-bit Nintendo NES, an immediate success due to the game Super Mario Bros. Others, like Sega's Master System with the game Sonic, also appeared at that time. It was the beginning of famous series such as The Legend of Zelda and Final Fantasy.

In the '90s, the fourth-generation, 16-bit consoles determined the victory of the consoles over the arcade. It was also then that the introduction of 3-D graphics into video games caused an important revolution. First Person Shooter (FPS) and Real-Time Strategy (RTS) games became very popular. Many famous games appeared at that time, such as SimCity, Ultima Online, Mortal Kombat, and the precursors of the FPS: Wolfenstein 3-D (1992) by id Software, which used techniques such as texture mapping, and Doom (1993), also by id Software. Then came the fifth-generation consoles, e.g., the Sony PlayStation and Nintendo 64, which revolutionized the graphics again. With their huge computing power they offered much better graphics and innovations for FPS. Mission-based games with rich stories and small in-game movies appeared.

At the end of the 1990s, the sixth-generation consoles, such as the Dreamcast, included support for online gaming. In 2000, Sony's PlayStation 2 became the best seller of all time. Microsoft's Xbox and Nintendo's GameCube also appeared, increasing the number of available games to hundreds. With the spread of the Internet, online play gave a new impulse to many games, allowing users to play with friends and players from around the world.

The latest game consoles—Xbox 360, PlayStation 3, and Nintendo Wii—with very high general processing power and very high graphics processing power due to dedicated chips, allowed game designers to create games with incredibly realistic and dynamic graphics and with richer stories in which every little action of the player influences the game sequence. Initiatives to move the player out of the desktop to a more natural contact with the game environment are becoming more and more common. Some are directly associated with body motion [5]. Tangible and physical interactions [6] have also been introduced in the market, first with the Nintendo Wiimote™, Nunchuk™, and MotionPlus™, and more recently with the PlayStation Move™ and the Kinect™ for the Xbox™ console. Experiments are also being made with mobile devices to adapt well-known game styles such as FPS to smartphones and similar devices [7].

In addition to these recent innovations, display technology is also evolving quickly and is potentially improving the gameplay and immersion experience. 50-inch 3-D LED TVs with full high definition are a reality in many homes nowadays and are seriously influencing the development of new games that explore the use of 3-D stereo. Also, advanced GPUs support the use of arrays of displays, which are being fully exploited by the game industry. By enlarging the field of view, players can achieve better performance while experiencing more fun.

2.2 The CAVE Automatic Virtual Environment

Sutherland created the Head-Mounted Display (HMD) in 1966. It is a device worn as a helmet, with a small screen adapted to each of the eyes. With this device he introduced the idea of a window to the virtual world, increasing the immersion of the user in a virtual environment [8]. However, the use of an HMD involves complex equipment for rendering, tracking, and interaction. It is also cumbersome, with wires connecting the HMD, gloves, and other peripherals.

Motivated by the limitations of the HMD, the first CAVE (Cave Automatic Virtual Environment) was presented at SIGGRAPH 1992 by the Electronic Visualization Lab of the University of Illinois at Chicago [2]. A basic CAVE is a rectangular room where three of the walls and the floor are screens upon which high definition images are displayed. A user inside the room is surrounded by and can interact with the virtual environment.

A CAVE is actually a virtual reality interface that promotes immersion, providing a feeling of presence for users as they are surrounded by the virtual world. The CAVE design eliminates many problems of other environments, such as interaction limited to one user and low-resolution images. Another interesting aspect is that objects can be seen from both sides and from different perspectives depending on the user's posture and position.

In a CAVE, each wall is a rear-projection screen and the user can wear polarized or shutter-glasses with stereoscopic projected images. For stereoscopy it is required that two images are rendered for each frame, one for the left and one for the right eye. Shutter-glasses, for example, are synchronized with the projection frequency to send the images for the respective eye, blocking the other one. This produces the illusion of a three-dimensional image, i.e., the objects seem to be floating in the air and can be seen from different points of view. 

Some conceptual ideas are linked to the CAVE environment. The willing suspension of disbelief is one of them [2]. The term has long been used in literature and cinema to define the will of the audience to believe in the images as if they were real, replacing the surrounding reality. The audience enters a state in which they agree to suspend judgment of truth in exchange for entertainment.

Another concept is the perspective centered on the user. This idea is based on the camera position along an axis extended orthogonally from the center of the screen. The centered perspective simulates the view from the viewer's position. To maintain it, a sensor must constantly send tracking information about the viewer's position and orientation to the simulation.

CAVEs are used today in several areas and are present in many universities. Engineers use CAVEs to improve product development [9]. For example, to design a car part, a CAVE is used to visualize the part in 3-D and provides a more accurate mental image of the final part before it is actually produced, improving the quality and reducing the cost of the design. Another example is to design a new car model and see it from the driver's or the passenger's perspective with everything in place before manufacturing any physical element. 

Simplified CAVE systems have been proposed. The HybridDesk (Figure 2) aimed at creating a workspace aggregating elements of traditional WIMP interfaces (Windows, Icons, Menus, and Pointer) with other resources (hardware and software) to allow interactive 3-D tasks [10].

Figure 2.  A schematic of the HybridDesk [10].


This section describes how we designed the desktop-CAVE interface, a simplified CAVE system. We first overview the system conception and basic mechanisms, then we present a qualitative comparison of the desktop-CAVE concept with alternative visualization systems to guide the implementation of the system that is presented in Section 4.

3.1 Overview

The system is built upon a set of identical LCD monitors (Figure 1) and a desktop PC with two graphics cards. Two cards are necessary because each of the cards used allows connecting only two monitors at a time. After the monitors are connected, the operating system is configured to extend the workspace over all monitors, creating a single wide desktop area. Then, the game is configured to a higher screen resolution so that the game display spreads across the three monitors.

A final and most important task is monitor calibration. Calibration is accomplished using a webcam placed in front of the monitors so that it can see them all. Meanwhile, each monitor displays a different fiducial marker. Computer vision techniques are used to calculate the angles between the monitors by comparing how the markers appear in the video captured by the webcam [11]. These angles are then saved and used in the game to define the position and orientation of three virtual cameras, one for each monitor, which are finally used to render the game view to the player (see more details in Section 4.4).

3.2 Display Quality Parameters

3.2.1 Immersion parameters

These parameters measure the level of visual simulation provided by a virtual reality interface, also known as suspension of disbelief [2]. Some parameters involved in creating a willing suspension of disbelief are described below. 

Field of view. The field of view (FoV) is the maximum angle a viewer can see without moving their head. Considering a screen W inches wide and letting D be the distance from the viewer to the screen, the FoV can be obtained from the expression:

FoV = 2 tan⁻¹(W / (2D))          (1)

Comparing some of the main visualization interfaces—LCD monitors, head-mounted displays (HMD), CAVEs, and three interconnected monitors—we see distinct scenarios. One LCD monitor provides a variable but limited field of view depending on its size and the viewer distance. For instance, a 19-inch monitor seen at a distance of 18 inches provides a FoV of nearly 45º. With HMDs, as the screens are mounted at a constant distance from each of the eyes, the FoV is constant, and angles between 100º and 140º are common. The FoV in a CAVE depends on the viewer position, but as the screens surround the viewer, in the ideal case the interface offers a FoV of 360º. However, if shutter glasses are used to provide stereoscopy, the glasses' frame will eventually limit the FoV perceived by the user.

A system with three monitors, as we propose, provides an increment of the field of view in comparison with a single monitor, which depends on the monitor angles. In comparison with a traditional CAVE, the three-monitor setup presents a narrower FoV. However, if a greater number of monitors is used, the FoV can approach that of a CAVE, with a potentially higher resolution.

Panorama. Immersion is also related to the capability of an interface to allow the virtual environment to surround the user. This idea is called panorama and differs from the FoV as the user's head motion is taken into account.

With any monitor-like interface, the idea of panorama is not directly applicable because a monitor is rather a fixed window to the virtual world and does not surround the user at all. With an HMD, panorama is a strong feature, as everything the user sees is the simulated world: whichever way they turn their head or move their eyes, they will see some portion of the virtual environment. This also happens within a CAVE, as users are surrounded by the projections. In CAVEs that are not completely closed by four walls—notice that the typical setup uses three walls—the panorama is interrupted when the user turns towards the empty side. Nevertheless, one advantage of the CAVE over the HMD is that no user tracking is needed to ensure a good level of panorama.

With the three-monitor setup, panorama will be higher than with one monitor, as a user moving their head to the sides will actually see different parts of the VE, which are shown on the side monitors. However, due to the size of the screens and because the monitors are fixed, HMDs and CAVEs still provide a higher level of panorama.

Intrusion. Intrusion is related to how much the user's senses are restricted while using a given interface. The HMD is the most intrusive because it isolates the user from seeing, and often from hearing, the real world. A user stepping forward will not know what they are going to step on. Special HMDs for augmented reality can render the real environment mixed with virtual elements, which reduces the intrusion of this interface. Even so, the intrusion persists, as the HMD reduces the FoV, causing the user to see a narrow area of the real world. Monitors and CAVEs are less intrusive, as they allow the user to move freely, always maintaining knowledge of what are the real and what are the virtual elements of the environment.

The desktop-CAVE with three monitors is not more intrusive than a single monitor. The fact that the user knows where the monitors are placed allows them to clearly separate what is virtual and what is real, interacting with both worlds with a very low probability of confusion.

3.2.2 Visualization parameters

These parameters measure how effective a visualization interface is. Such measure is important because an interface must offer a satisfying visual experience. Some of these parameters are described below.

Visual acuity. The quality of a display is often measured by its resolution, i.e., the number of pixels or points composing the image. The quality of a virtual reality interface is better measured using a combination of resolution and field of view. This measure is called visual acuity of a display.

A typical method to calculate visual acuity uses the resolution and the distance from the viewer to the center of the screen. A screen with H pixels of horizontal resolution and W inches of width presents a pixel pitch of P = W/H inches per pixel. Considering a distance D, the angle subtended by one pixel on the retina is given by Eq. 2 and is measured in minutes of arc. Thus, the visual acuity, the fraction of a pixel occupying one minute on the retina, is given by the inverse of this angle [2] [5]:

1 / tan⁻¹(P / D)                (2)

Another metric used to measure vision is the Snellen fraction, 20/X. This fraction indicates that an observer situated at 20 feet can see the same as an observer with normal vision can see at X feet with the naked eye. For example, a person with 20/40 vision placed at 20 feet from a scene can see the same details a person with normal vision could see at 40 feet from the same scene. Such a person has considerably poorer vision than most people.

If we consider, for example, a 19-inch monitor with 1280 x 1024 pixels of resolution and a distance of around 18 inches from the screen, we obtain a visual acuity of 20/45. Compared with the international standards for issuing driver licenses, this is an unsatisfactory visual acuity for driving light vehicles (category B); the minimum recommended acuity is 20/30. With a CAVE, considering that each projector has a horizontal resolution of 1280 pixels for a 7-foot screen, the visual acuity is around 20/110. If it is possible to increase the resolution of the projectors, the visual acuity will increase proportionally. Current HMDs, in turn, often worsen the visual acuity drastically, to near 20/425, due to the screen size and the very low resolution. In terms of acuity, a CAVE is not as bad as an HMD, but it is also not as good as a monitor.

With three similar 19-inch monitors, the visual acuity is equal to that of a single monitor, as the size of the screen increases together with the total number of pixels.

Look-around. This parameter represents the possibility for a viewer to move around an object and see it from different perspectives [2].

Visualizations with this property can be used in many applications. It makes it possible to model a new product in 3-D and inspect it from different angles well before the physical product is manufactured or built.

The look-around property does not work with common LCD monitors. When one moves to the side, they begin to see a smaller area of the screen; moving even further, the screen can no longer be seen. With an HMD this does not happen. Anytime the user moves to the side, or looks around, the system recalculates their position in relation to the virtual world and displays the right portion of the VE to the user. Look-around is also fully supported within a CAVE. As they move, users displace their field of view to different areas of the projections, causing them to see elements of the VE from a perspective spatially associated with their real position. Wearing shutter-glasses for stereoscopic vision, the feeling is even more intense: closer objects will be seen as if they were floating in the air in front of the user, who is able to move and see them from any side.

When using three monitors configured as a desktop-CAVE, it is possible for a user to look around and see different parts of the VE. However, due to the size of the monitors, the freedom to move is restricted, and looking around is only possible within a limited range.

Collaboration. Visualization interfaces can also be classified according to their potential to allow collaboration. It means that they are more collaborative if they allow more than one user to see and/or interact with a VE at the same time.

This is possible with a monitor-like interface, but the perspective will be that of only one of the users. The same happens with a CAVE and a desktop-CAVE, but in these cases the surrounding screens and the use of shutter-glasses allow individual user perspectives to be perceived. With HMDs, collaboration is only possible if every user wears an individual HMD and the system is able to render every individual perspective of the same VE in real time.



This section describes the implementation of the desktop-CAVE interface.

4.1 Third party software

We begin by presenting the existing software frameworks, toolkits and libraries chosen to integrate our implementation.

ARToolKit is a very popular toolkit for fast development of augmented reality (AR) applications. It is widely used in part because it is open source, inviting the users to run, study and modify the available examples at will.

The toolkit is implemented in C and C++, and offers support to the development of AR applications with low computational cost. It implements computer vision algorithms, which are essentially used for optical tracking. It is able to estimate in real time the position and orientation of markers in relation to the video capture device, usually a webcam. In AR applications this is used to place and orient virtual elements inside a real scene captured by the camera, building a mixed reality scenario.

For AR applications, the use of the toolkit can be summarized in the following steps:

1)    set video parameters; initialize camera; read marker files

2)    capture a video frame

3)    detect and identify markers on the video frame

4)    calculate the relative transformation between marker and camera reference frames

5)    render virtual object on the marker reference frame

One limitation of the ARToolKit is the range of distances between the camera and the marker. If they are too far apart, the marker can be too small to be identified; if they are too close, parts of the marker may fall outside the video frame, making it impossible to identify. The size of the marker can be adjusted to minimize this problem.

In the present work we use the tracking capabilities of ARToolKit to calibrate the relative angles between monitors in the desktop-CAVE configuration (see more details in Section 4.4).

AssaultCube—a screenshot of the game is shown in Figure 3a—is an open source first person shooter game based on the Cube engine and game. It debuted in 2004 under the name ActionCube, launched by a member of the Cube community. The official release date was in November 2006, and in May 2007 the name was changed to AssaultCube to avoid ambiguity with the name of another game, Action Quake. The game is a more realistic version of the original Cube game, keeping the simplicity and speed of the original.

In the game, characters, including players, are divided in two factions: the Cubers Liberations Army (CLA) and the Rabid Viper Special Forces (RVSF). A player chooses which of the factions they want to join before starting the game. The game can be played online with and against human players or with virtual enemies (bots). There are 12 different game modes: Capture the Flag, Keep the Flag, Team Keep the Flag, Deathmatch, Team Deathmatch, One Shot One Kill, Team One Shot One Kill, Last Swiss Standing, Survivor, Team Survivor, Pistol Frenzy, and Hunt the Flag. Among them, only the modes Deathmatch, Team Deathmatch, and One Shot One Kill are available to play against bots.

4.2 Hardware

Three LCD monitors have been used in this work—all having the same size, 19 inches, and the same resolution, 1280x1024 pixels.

A desktop PC with two graphics cards (GPUs) has also been used. The GPUs are both from Nvidia. One is a GeForce 9600 and the other is a GeForce 8600. Two identical GPUs are recommended for safe compatibility, but this is not a hard requirement.

For monitor angle calibration, an ordinary 1.3-megapixel webcam has been used. This resolution is recommended, as lower-resolution cameras may have difficulties detecting the markers appropriately.

Figure 3.  A scene from the game AssaultCube as shown in a single display (a) and in three viewports with a continuous image (b).

4.3 What changes in the game code?

The original game source code has been modified to support the rendering of three independent viewports (one for each monitor) and to allow angle calibration. This is done by first replicating the virtual camera twice to render one frame for each of the viewports. This causes the three monitors to render the same view. Then two of the cameras are rotated laterally to visualize the sides of the scene in such a way that a continuous horizontal field of view is provided, stitching the three viewports together at the vertical edges of neighboring monitors (Figure 3b).

Minimaps, life gauges and ammunition information are displayed only on the central/frontal screen, which has the focus.

Source code is presented in Algorithm 1. It shows how each of the viewports is defined, how the cameras are rotated, and how a frame is rendered by calling the function gl_drawframe.

As the angles between monitors can vary, one last step is necessary to obtain a consistent visualization for every angle. If the angle between them is equal to the field of view, we have the ideal case in which the default camera projection parameters are correct. However, as the angles may be changed, we also had to modify the function setperspective of the Cube engine. Such changes aim at configuring the view frustum according to the angle calibration (Section 4.4). As shown in Figure 4, the frustum defines the visualization volume within which the objects that will be rendered are placed.

With our method we use three cameras, and each of them has its own frustum and field of view. Thus, to obtain the widest possible visualization, the angles between monitors should be equal to the field of view of each camera, see Figure 5. When the angles between monitors are different from the FoV, the viewer sees only part of the original total FoV, see Figure 6.

Algorithm 1. Definition of the three viewports


/* Left viewport: rotate the camera to the left by the FoV angle */
glViewport(0, 0, screen->w/3, screen->h);
viewportNum = 1;
player1->yaw = player1->yaw - dynfov();
computeraytable(camera1->o.x, camera1->o.y, dynfov());
gl_drawframe(screen->w, screen->h,
             fps<lowfps ? fps/lowfps : (fps>highfps ? fps/highfps : 1.0f),
             fps, viewportNum);
if(frames>4) SDL_GL_SwapBuffers();
player1->yaw = player1->yaw + dynfov();   /* restore the original yaw */

/* Right viewport: rotate the camera to the right by the FoV angle */
glViewport(2*screen->w/3, 0, screen->w/3, screen->h);
viewportNum = 3;
player1->yaw = player1->yaw + dynfov();
computeraytable(camera1->o.x, camera1->o.y, dynfov());
gl_drawframe(screen->w, screen->h,
             fps<lowfps ? fps/lowfps : (fps>highfps ? fps/highfps : 1.0f),
             fps, viewportNum);
if(frames>4) SDL_GL_SwapBuffers();
player1->yaw = player1->yaw - dynfov();   /* restore the original yaw */

/* Central viewport: the camera keeps its original yaw */
glViewport(screen->w/3, 0, screen->w/3, screen->h);
viewportNum = 2;
computeraytable(camera1->o.x, camera1->o.y, dynfov());
gl_drawframe(screen->w, screen->h,
             fps<lowfps ? fps/lowfps : (fps>highfps ? fps/highfps : 1.0f),
             fps, viewportNum);
if(frames>4) SDL_GL_SwapBuffers();

The precalculated monitor angles (see Section 4.4) are read from a text file and stored as global variables in the game code. In the function setperspective it is necessary to check if the angles between monitors are the same as the FoV angles. If they are, the original frustum parameters are used with the function glFrustum. Otherwise, a correction must be made to obtain a coherent visualization.

Such correction is made by increasing the frustum parameter left of the leftmost viewport and decreasing the frustum parameter right of the rightmost viewport. To calculate the amount of increase or decrease, we first compute how much one degree represents in relation to the size of the viewport. For example, if the angle between monitors is 30º, the frustum left is equal to -50, the frustum right is equal to 50, and the FoV angle is 60º, each degree corresponds to 100/60 = 1.667. This value can be used to calculate the frustum size for a different angle: we multiply the angle between monitors (30º) by the value corresponding to one degree (1.667) and obtain 50, which is the new size of the visible frustum for this monitor. Finally, to obtain the frustum left for the left viewport, we subtract the new size from the frustum right. In the example, the new left will be 0.

If this example represented the right viewport, we would instead calculate the frustum right, which is the left plus the new size, i.e., -50 + 50 = 0. As this narrower visualization area occupies the whole screen, the final image being displayed looks stretched: some objects appear horizontally larger on these viewports, although a user placed in the central position will not notice it.

Figure 4. Frustum definition.

Figure 5. Angle between monitors equal to the FoV angle (60º).

Figure 6. Angle between monitors (30º) different from the FoV angle (60º).


4.4 CAVE angles calibration

To calibrate the angles between monitors, we created an application based on ARToolKit. A webcam films the monitors while they display fiducial markers on their screens (Figure 7). Using computer vision algorithms from ARToolKit, the application identifies the markers and calculates the angle between the webcam and each marker. From these, the relative angles between the markers, and consequently between the monitors, are obtained.

Fiducial markers are images containing visual features that are easy to extract. They are often black-and-white square figures containing identifiable symbols. The full marker must be visible for it to be successfully tracked by a camera and identified by the vision algorithms. Examples of markers are shown on the three screens in Figure 7.

Figure 7.  Schematic of the monitors being tracked by the webcam.


Figure 8. Relation between marker and camera coordinates [12].


Figure 9. Transformation matrix.

The tracking procedure processes the image, extracts features such as vertices for detection or identification, and estimates marker position and orientation. Pattern recognition is performed by identifying the four vertices of square regions in the video image, which is first converted to a binary (black-and-white) image. The symbol inside the square is compared to the templates registered by the user or developer [9]. Whenever the information in the extracted square matches one of the registered markers, the system identifies the marker and determines its pose relative to the camera.

Marker pose (position and orientation) is determined by relating marker and camera coordinates as in Figure 8.

A 4x4 homogeneous transformation matrix T relating marker and camera coordinates is obtained from ARToolKit for every video frame (Figure 9). Multiplying T by a 3-D point (Xm, Ym, Zm) in marker coordinates yields the corresponding point (Xc, Yc, Zc) in camera coordinates.

This matrix allows extracting the angle between a marker and the camera. As one marker is shown on each monitor, once the camera-marker angle has been stored for every marker/monitor, calculating the angles between the three monitors is straightforward.

4.5 A general approach beyond the CAVE

The CAVE-desktop implementation targets the availability, common today, of three monitors per desktop. However, other display configurations can also benefit from our approach. In Figure 10 we show a display wall of 12 LCD monitors. The angular relations between monitors can be obtained by computer vision with a webcam and the markers displayed on the screens as in Figure 10a. The only extension necessary for this generalization is to add a different marker for each additional screen. As the monitors' positions are determined in relation to the camera, their relative angular information can be calculated as explained in Section 4.4.

After the relative angles are calculated, additional viewports have to be produced. This is done by defining the number of virtual cameras to be used and calculating the frustum for each camera. Figure 10b shows an example with 3 viewports where the central viewport extends to the area of the 6 center monitors, while the left and right columns of 3 monitors are assigned to the corresponding lateral viewports. In this example, the visible area of the central frustum is twice as large as in the previously described CAVE-desktop setup. Figure 11 presents a comparative view of a single plane and viewport (Figure 11a), 4 viewports mapped on 3 planes with 2 of the viewports on the central plane (Figure 11b), and a player in action (Figure 11c), where a sharp increase of the total view can be detected.



5. Evaluation
The proposed system has been assessed through user experiments. With the tests we aimed to address the following question: Does a desktop-CAVE help in improving gameplay and immersion for players of first person shooter games? We assume that the increase of peripheral vision obtained with the desktop-CAVE should improve performance.

5.1 Tests and subjects

Most of the tests were performed during an Open Day event at our University. Random players—most of them teenagers—from inside and outside the academic community were invited to participate. Twenty-five people took the test; most have some experience with digital games, and more than half have at least 8 years of experience (Figure 12). According to their own judgment, their skill levels in FPS games vary: 32% consider themselves weak; 24% average; 24% fairly good; and 20% very good. None believe they are among the best players around (Figure 13).

Figure 10. A display wall of 12 LCD monitors. The angular relations between monitors can be obtained by computer vision with a webcam and fiducial markers (a). In (b), a game view with this setup. Observe that the central viewport extends to the area of the 6 center monitors, while the left and right columns of 3 monitors are assigned to the 2 corresponding lateral viewports.

Figure 11. Comparative view of a single plane and viewport (a), 4 viewports mapped on 3 planes with 2 of the viewports on the central plane (b), and a player in action (c), where a sharp increase of the total view can be detected. Notice the stairway on the left side where any approaching enemy can be easily spotted.

Even though most people have experience with digital games, very few have experience with other types of 3-D applications (Figure 14), such as CAD tools and authoring tools like Blender and 3D Studio Max.

The test itself consists of playing the first person shooter game AssaultCube. Each player had one minute to get used to the controls and the pace of the game. Immediately after, two two-minute tests were performed: one using only one display and the other using the desktop-CAVE approach with three displays. These setups were chosen because they are the most common for domestic use. The picture in Figure 1 was taken during the tests and shows the setup used.

The order of the tests (one or three monitors) was randomized to prevent the learning curve from interfering with the results. Two maps of the game were chosen, acgothic and accomplex, and they were also randomly assigned.

5.2 Procedure and variables

Before the experiment started, users were invited to fill in a self-characterization questionnaire. A short text explaining the test, introducing the game, and presenting instructions to play was given to all volunteers. The tasks were then accomplished, and the volunteers answered a post-experiment survey. The post-test survey should reveal whether the players subjectively felt more immersed in the VE of the game using the desktop-CAVE instead of a single display.

During the execution of the tests, the following dependent variables were collected in a log file:

1)    Shooting precision is the number of hits divided by the total number of shots;

2)    Number of deaths is the number of times the player died, shot by enemies, during the 2 minutes test;

3)    Number of kills is the number of enemies killed by the player;

4)    Number of injuries is the number of times the player was hit by a shot of an opponent;

5)    Horizontal movement represents the total horizontal angular motion executed by a player during the test. This motion is applied by moving the mouse to either side and causes the player's vision to turn to the corresponding side, as if they were turning their head. The angles turned at every move-and-stop of the mouse, in degrees, were summed up to compose this parameter.



6. Results
This section is divided into two parts. In the first, the results of the tests with users are presented; in the second, we show other desktop-CAVE configurations we tested.

6.1 About the user tests

The influence of the desktop-CAVE on user performance has been evaluated taking into account five variables: precision, number of deaths, number of kills, number of injuries, and horizontal look-around motion. All of them gave better averages when the player was using the desktop-CAVE with three monitors than with the conventional one-monitor display. However, better averages are not always enough to prove the hypotheses, as random elements can be more influential than an independent variable, especially with a small number of samples. We therefore performed Student's t-tests to evaluate the statistical significance of the higher averages.

Regarding precision, with one single monitor we measured an average of 21.75% of shots hitting an enemy. With three monitors the average was 24.36%. Even with the higher average precision for the desktop-CAVE, the t-test with α = 5% resulted in an F value of 1.04, which corresponds to a probability of 68.8% that our hypothesis holds. This is not enough to provide statistical significance, as at least 95% is usually required to overcome random effects.

The average number of deaths was 4.6 with one single monitor. With the three-monitor CAVE, the average was 3.36, meaning that players died less with the proposed interface. However, the t-test with α = 5% resulted in an F of 3.25, which corresponds to a probability of 92.24% that the hypothesis holds. This is a good probability; even so, the required threshold of 95% was not reached, and we cannot state positively that the desktop-CAVE helps the player die less than with an ordinary monitor.

The average number of kills with one monitor was 3.04 per test, while with three monitors it was 3.76 kills per test. Analogously to the previous parameters, the t-test with α = 5% resulted in an F value of 1.03, revealing a probability of only 68.52% that these averages resulted from the type of visualization interface tested. Thus, despite the better average, we cannot affirm that this difference is caused by the interface.

The average number of times the player was hit by an opponent's shot with one monitor was 21.88 injuries per test, and with three monitors it was 17.40. Again, the desktop-CAVE obtained a considerably better average. However, the t-test with α = 5% resulted in F = 1.51, which indicates a probability of around 77.5% that the hypothesis holds. Again, there is not enough statistical significance to prove the hypothesis despite the good results.

Concerning the horizontal motion, with one monitor the average angle was 5676º per test, and with three monitors it was 3732º per test. The t-test revealed that this result is statistically significant, with a probability of at least 99.99% that the hypothesis holds; the t-test with α = 5% resulted in F = 28.88. This result supports the hypothesis that the system with three monitors improves the player's peripheral vision, as they see enemies approaching from the sides and only have to turn to shoot them or run away.

As a consequence, the players also turn quickly enough to get fewer injuries, die less, and have more time to aim, obtaining a better hit rate. This statement is supported by the better averages in all parameters tested. In conclusion, we can say that the desktop-CAVE effectively contributes to improving performance in this game.

Moreover, several volunteers stated that the three-monitor system caused discomfort or a strange feeling in the beginning, but as they played they felt better; a few minutes are thus required for adaptation. Further evidence is that the players commented that they did not look at the lateral monitors at first, but once an enemy appeared there they noticed it, and after some time they started to look for enemies on the sides too.

The post-experiment questionnaire revealed that 92% of the players preferred the three-monitor setup. The exact same percentage of testers believed that they played better using the desktop-CAVE, even when this was not supported by the logged data.

Figure 12. Subjective level of experience in games of the volunteer users.

Figure 13. Subjective skill level in FPS games of the volunteer users.

Figure 14. Subjective level of experience in other 3-D applications than games.


6.2 Other configurations

As mentioned in Section 4.5, even though our CAVE-desktop approach was developed focusing on the use of three monitors per desktop, other configurations are also possible, depending only on the graphics boards used and, of course, the availability of displays.

The CAVE-desktop approach proposed in this work was tested with at least three configurations: using three monitors in an open angular configuration (as in Figure 1); with 12 monitors arranged in a 4x3 matrix layout (Figure 10); and with 8 monitors arranged in a semi-circular layout.

Figure 15 illustrates a more general case study in which eight viewports are displayed on eight monitors arranged on four different planes. This can be obtained by rotating the virtual camera to the orientation orthogonal to each of the planes. Remember that the angular relations between the planes have been defined using the markers. The frusta definitions follow the algorithm presented in Section 4.3.

General implementations of the approach allow adapting the player's field of view to the challenge of each game. Observe in Figure 16 that the player sees the corridors in front of them, the wall to the left side, the waste container to the right, and the stairway to the far right. At this position, the player can be very confident that they are not in danger, as enemies arriving from anywhere can be easily spotted in either direct or peripheral view, allowing the player to react quickly and avoid injury.



7. Conclusion
We presented the design and development of a desktop-CAVE interface for visualization in 3-D games. The interface is built originally using three LCD monitors as screens to compose a simplified CAVE, but can also be extended to other configurations and layouts. We modified an open source first person shooter game to analyze the influence of such interface in game performance. Different aspects as shooting precision, number of kills, injuries and deaths were studied along an experiment with 25 randomly selected users. We observed that players get more involved by the game, presenting better performance when using the desktop-CAVE in comparison with a conventional one monitor interface. We also noted that subjectively the volunteers felt more immersed in the game environment, which allowed a more satisfying experience.

The performance improvement has been attested especially by the measurement of the horizontal angular motion. Users do not actually have to look around all the time, because the CAVE provides a considerable increase in peripheral vision compared with a single monitor. They identify enemies approaching from the sides more quickly and can react promptly. Even if some of the measured parameters do not present a large difference when comparing the averages, as a whole they consistently corroborate the validity of the original hypothesis.

Improvements can be made to the system. One is to compute, besides relative angles, the relative monitor positions. This would allow placing the monitors anywhere around the player, not constraining them to share a common edge as in our current implementation. Another improvement would be to calibrate dynamically during the game, but this would require tracking the monitors and the player's face all the time. This could be done using a number of simple strategies, for example fixing a webcam on the player's head and attaching markers to the monitors. The player would be able to move freely, and any time they looked at a monitor the webcam would identify the marker and calculate the appropriate virtual camera position to render the VE on that screen.

Regarding the user tests, we believe that with a larger number of volunteer players, or by grouping them according to similar profiles, the random effects in the tests would vanish and high statistical significance would be obtained for all parameters. This should happen because most of the analyzed parameters are strongly associated with how quickly players acquire skills for a game they do not know.

Finally, we would also like to run user tests considering different layouts—changing the number of monitors and the angles between them—and other genres of games. Since each game has its own specificities and strategies, we believe that with more tests we can propose a set of layout recommendations adapted to each game genre.

Figure 15. Schematic of the general approach. 8 viewports are displayed in 8 monitors arranged on 4 different planes. This can be obtained by rotating the virtual camera to the orientation orthogonal to each of the planes. Such generalization allows adapting the player’s field of view to the challenge of each game. Observe that the player sees the corridors in front of them (b), the wall to the left side (a), the waste container to the right (c), and the stairway to the far right (d).

Figure 16. Four snapshots of a player using the semi-circular desktop-CAVE layout using four monitors. The second display, from left to right, is the focus of the game, while the other three show peripheral information.



Acknowledgments
Thanks are due to all the volunteers who kindly tested the system and had fun doing so.  This work was supported by CNPq-Brazil under the projects 309092/2008-6, 483947/2010-5, 483814/2010-5 and 302679/2009-0.



References
[1] FAGERHOLT, E. AND LORENTZON, M. 2009. Beyond the HUD - user interfaces for increased player immersion in FPS games. M.S. thesis, Chalmers University of Technology. 

[2] CRUZ-NEIRA, C., SANDIN, D. J., DEFANTI, T. A., KENYON, R. V., AND HART, J. C. 1992. The cave: audio visual experience automatic virtual environment. Communications of the ACM 35, 64–72. 

[3] KIRNER, C. AND KIRNER, T. G. 2007. Virtual reality and augmented reality applied to simulation visualization. Simulation and Modeling: Current Technologies and Applications 1, 391–419. 

[4] PRABHAT, FORSBERG, A., KATZOURIN, M., WHARTON, K., AND SLATER, M. 2008. A comparative study of desktop, fishtank, and CAVE systems for the exploration of volume rendered confocal data sets. IEEE Transactions on Visualization and Computer Graphics 14, 3, 551–563. 

[5] SILVA, M. G. AND BOWMAN, D. A. 2009. Body-based interaction for desktop games. In CHI ’09: Proceedings of the 27th International Conference Extended Abstracts on Human Factors in Computing Systems. ACM, New York, NY, USA, 4249–4254. 

[6] ZHOU, Z., TEDJOKUSUMO, J., WINKLER, S., AND NI, B. 2007. User studies of a multiplayer first person shooting game with tangible and physical interaction. In ICVR’07: Proceedings of the Second International Conference on Virtual reality. Springer-Verlag, Berlin, Heidelberg, 738–747. 

[7] WEI, C., MARSDEN, G., AND GAIN, J. 2008. Novel interface for first person shooting games on PDAs. In OZCHI ’08: Proceedings of the 20th Australasian Conference on Computer-Human Interaction. ACM, New York, NY, USA, 113–121. 

[8] CAKMAKCI, O. AND ROLAND, J. 2006. Head-worn displays: a review. Journal of Display Technology 2, 199–216.

[9] BUXTON, W., FITZMAURICE, G., BALAKRISHNAN, R., AND KURTENBACH, G. 2000. Large displays in automotive design. IEEE Computer Graphics and Applications 20, 4, 68–75. 

[10] CARVALHO, F., RAPOSO, A., GATTASS, M., AND TREVISAN, D. 2010. Um sistema híbrido semi-imersivo de baixo custo para interações 2D-3D [A low-cost semi-immersive hybrid system for 2D-3D interactions]. XII Symposium on Virtual and Augmented Reality 1, 153–162.

[11] CLAUS, D. AND FITZGIBBON, A. W. 2005. Reliable automatic calibration of a marker-based position tracking system. In Application of Computer Vision. IEEE Computer Society, Breckenridge, CO, USA, 300–305. 

[12] KATO, H. ARToolKit Documentation, 2010. Retrieved September 15, 2010 from:
