Our Work “MetaCast: A Self-Driven Metaverse Announcer Architecture Based on Quality of Experience Evaluation Model” Is Accepted by ACM MM ’23


Recently, our work “MetaCast: A Self-Driven Metaverse Announcer Architecture Based on Quality of Experience Evaluation Model” has been accepted for publication at the 31st ACM International Conference on Multimedia (MM ’23).


Metaverse development is currently proceeding in several directions: gaming-based platforms (Roblox, Minecraft, Fortnite), blockchain-based platforms (Decentraland and The Sandbox), and extended-reality-based platforms such as Meta Horizon Worlds. The common idea behind these projects is to provide a virtual world in which users interact. As the metaverse attracts more users, the need for an announcer system emerges. The metaverse is too large for users to cover with their own avatars, so an announcer that broadcasts events in real time can keep them informed about what is happening across the expansive virtual world. However, most existing video game observers serve esports competitions and are driven by human directors. As a parallel universe to the real world, the metaverse should be accessible to anyone at any time, and so should its observer. This is a challenge, because human-driven observers cannot sustain 24-hour coverage of a practical metaverse. Some researchers have developed virtual cinematography systems for specific needs, such as plot-based storytelling, cinematic sequencing, and complex crowd tracking. Nonetheless, a general framework that fits the wide variety of situations occurring in the metaverse is still lacking.


In this work, we propose a three-stage metaverse announcer architecture consisting of the Event Manager, the Prose Storyboard Language (PSL) Interpreter, and the Camera Controller. The Event Manager captures newsworthy events from the metaverse, such as crucial user actions or crowd gatherings. Once the targets to be announced are determined, the PSL Interpreter chooses several shot specifications for each event and translates them into corresponding camera positions. Finally, the Camera Controller handles path planning, including the transition time and the shape of the camera's path curve.
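The three stages can be pictured as a simple pipeline. The Python sketch below is illustrative only: the `Event` and `Shot` types, the precomputed importance score, the shot-distance table, the camera-placement rule, and the smoothstep transition curve are all hypothetical assumptions, not MetaCast's actual implementation.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass
class Event:
    """A candidate event (hypothetical fields, for illustration only)."""
    name: str
    position: Vec3      # world coordinates of the event
    importance: float   # newsworthiness score (assumed to be precomputed)


@dataclass
class Shot:
    spec: str           # shot size, e.g. "LS" (long shot), "MS", "CU"
    camera_pos: Vec3


# Stage 1, Event Manager: keep events at or above the importance threshold,
# most important first.
def event_manager(events: list[Event], threshold: float) -> list[Event]:
    picked = [e for e in events if e.importance >= threshold]
    return sorted(picked, key=lambda e: e.importance, reverse=True)


# Stage 2, PSL Interpreter: translate shot specs into camera positions.
# Hypothetical rule: tighter shots put the camera closer to the target.
SHOT_DISTANCE = {"LS": 15.0, "MS": 5.0, "CU": 2.0}


def psl_interpreter(event: Event, specs=("LS", "MS", "CU")) -> list[Shot]:
    x, y, z = event.position
    shots = []
    for spec in specs:
        d = SHOT_DISTANCE[spec]
        # Camera sits d units back along x and d/2 units up (an assumption).
        shots.append(Shot(spec, (x - d, y + d / 2, z)))
    return shots


# Stage 3, Camera Controller: interpolate between two camera positions
# over the transition time, eased with a smoothstep curve.
def smoothstep(t: float) -> float:
    return t * t * (3 - 2 * t)


def camera_controller(start: Vec3, end: Vec3, transition_time: float,
                      fps: int = 30) -> list[Vec3]:
    frames = max(1, round(transition_time * fps))
    path = []
    for i in range(frames + 1):
        s = smoothstep(i / frames)
        path.append(tuple(a + (b - a) * s for a, b in zip(start, end)))
    return path


# Usage: one newsworthy event, moving from a long shot to a medium shot.
events = [
    Event("crowd gathering", (10.0, 0.0, 5.0), importance=0.9),
    Event("idle avatar", (0.0, 0.0, 0.0), importance=0.1),
]
targets = event_manager(events, threshold=0.5)
shots = psl_interpreter(targets[0])
path = camera_controller(shots[0].camera_pos, shots[1].camera_pos,
                         transition_time=2.0)
print(len(path), path[0], path[-1])
```

Smoothstep is one common easing curve for camera moves (slow start, slow stop); any curve the Camera Controller selects could replace it without changing the pipeline's shape.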

Moreover, this paper identifies and analyzes the factors affecting users’ experience of a metaverse announcer, chiefly two subjective factors: the accuracy of catching events and the perceived video quality. For quantitative evaluation, we build a quality of experience (QoE) model, the Metaverse Announcer User Experience (MAUE) model, which comprises four objective factors: the transition time between shots, the frequency of view switching, the importance threshold for events, and image composition.

To determine suitable parameters for the MAUE model, we prepare video segments with different settings of the influencing factors and conduct user studies. Participants rate their satisfaction with each segment on a 5-point scale, and the ratings are aggregated into a Mean Opinion Score (MOS). Based on the experimental data, we derive a set of announcer system settings that align with most users’ perceptions.
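The MOS for a setting is the arithmetic mean of the participants’ 1-to-5 ratings. A minimal Python sketch of that aggregation, assuming hypothetical ratings (not the study’s data) and a normal-approximation 95% confidence interval:

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings: list[int], z: float = 1.96) -> tuple[float, float]:
    """Return the MOS and the half-width of an approximate 95% CI."""
    if not ratings or any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be a non-empty list of 1-5 scores")
    mos = mean(ratings)
    if len(ratings) < 2:
        return mos, float("nan")  # spread is undefined for a single rating
    return mos, z * stdev(ratings) / math.sqrt(len(ratings))


# Hypothetical ratings for two factor settings (NOT the study's data).
ratings_by_setting = {
    "setting A": [4, 5, 4, 3, 4],
    "setting B": [3, 3, 2, 4, 3],
}
scores = {name: mos_with_ci(r) for name, r in ratings_by_setting.items()}
best = max(scores, key=lambda name: scores[name][0])
for name, (mos, half) in scores.items():
    print(f"{name}: MOS = {mos:.2f} ± {half:.2f}")
print("preferred:", best)
```

With only a handful of raters per setting, replacing `z = 1.96` with a Student t quantile gives a more honest interval.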


Our findings show that a 2:5 ratio of transition time to shot duration yields the highest user satisfaction. Most users prefer an equal balance between local and global events. The importance threshold for events should adjust dynamically with the number of online users. We also provide a table that ranks shot specifications for image composition.
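Two of these findings translate directly into arithmetic and scheduling. For example, under a 2:5 ratio a 5 s shot pairs with a 2 s transition, and a 10 s shot with a 4 s transition. The hedged Python sketch below assumes that the ratio compares the transition time with the duration of the shot it leads into, and uses a hypothetical alternating policy to give local and global events an equal share; neither function is taken from the paper.

```python
from itertools import zip_longest

# Reported ratio of transition time to shot duration (2:5).
TRANSITION_TO_SHOT = 2 / 5


def transition_time(shot_duration: float) -> float:
    """Transition length (seconds) for a shot of the given duration."""
    return shot_duration * TRANSITION_TO_SHOT


def balanced_schedule(local_events: list[str],
                      global_events: list[str]) -> list[tuple[str, str]]:
    """Hypothetical policy: alternate local and global events so each kind
    gets an equal share while both queues are non-empty; leftovers follow."""
    schedule = []
    for loc, glob in zip_longest(local_events, global_events):
        if loc is not None:
            schedule.append(("local", loc))
        if glob is not None:
            schedule.append(("global", glob))
    return schedule


print(transition_time(5.0))   # 5 s shot -> 2.0 s transition
print(balanced_schedule(["duel", "trade"], ["festival"]))
```

Strict alternation is the simplest way to realize a 1:1 mix; a weighted round-robin would let the balance shift if a different ratio were preferred.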