Quantization at the Edge: Evaluating Inference Performance and Quality for SLM Driven Conversational Agents in Virtual Worlds

Nisiotis, Louis (ORCID: 0000-0002-8018-1352) and Markov, Nikita (2026) Quantization at the Edge: Evaluating Inference Performance and Quality for SLM Driven Conversational Agents in Virtual Worlds. N/A. (Submitted)

Full text not available from this repository.

Official URL: https://doi.org/10.36227/techrxiv.177273703.328884...

Abstract

Quantised small language models (SLMs) deployed at the edge are increasingly important for enabling high-performance AI agents in interactive virtual worlds, where low-latency operation is key to a seamless user experience. This paper investigates deploying quantised SLMs at the edge so that they meet performance requirements while maintaining high response quality, in support of real-time conversational agent interactions in an immersive virtual world. We developed a RAG-based back-end AI system to support virtual agent conversational capabilities in a virtual world prototype, and experimented with two SLM families and three quantisation levels (8-bit, 4-bit, and 3-bit), evaluating system performance through end-to-end latency, time-to-first-token, throughput, and memory headroom. We also assessed response quality via expert evaluations of accuracy, relevance, safety, and human alignment against non-quantised baselines. Results show that moderate quantisation enables performant, practical edge deployment with good response quality. This demonstrates that edge-deployed quantised SLMs can support real-time narrative interactions and remain practically useful for virtual-world agents, providing evidence for deploying high-performance conversational AI in immersive environments.
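The four system metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' measurement code: the function names, the choice to compute throughput over the decode phase only, and the weight-size approximation (parameters times bits per weight, ignoring quantisation scales and other metadata) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LatencyMetrics:
    ttft_s: float        # time-to-first-token, in seconds
    end_to_end_s: float  # request sent until last token received, in seconds
    tokens_per_s: float  # decode-phase throughput


def summarise_latency(request_sent_s: float,
                      token_times_s: Sequence[float]) -> LatencyMetrics:
    """Derive latency metrics from one request's token arrival timestamps.

    Assumption: timestamps come from a single monotonic clock
    (for example time.perf_counter()) and are in arrival order.
    """
    if not token_times_s:
        raise ValueError("at least one token timestamp is required")
    ttft = token_times_s[0] - request_sent_s
    end_to_end = token_times_s[-1] - request_sent_s
    # Assumed definition: tokens generated after the first one,
    # divided by the time spent generating them.
    decode_window = token_times_s[-1] - token_times_s[0]
    tokens_after_first = len(token_times_s) - 1
    tokens_per_s = tokens_after_first / decode_window if decode_window > 0 else 0.0
    return LatencyMetrics(ttft, end_to_end, tokens_per_s)


def memory_headroom_gib(device_gib: float, n_params: int,
                        bits_per_weight: int, overhead_gib: float = 0.0) -> float:
    """Rough headroom estimate: device memory minus weights and runtime overhead.

    Assumption: weight storage is n_params * bits_per_weight / 8 bytes;
    real quantised formats add per-block scales, so this underestimates.
    """
    weights_gib = n_params * bits_per_weight / 8 / 2**30
    return device_gib - weights_gib - overhead_gib
```

With hypothetical timestamps, `summarise_latency(0.0, [0.5, 1.0, 1.5, 2.0, 2.5])` gives a time-to-first-token of 0.5 s, an end-to-end latency of 2.5 s, and a decode throughput of 2.0 tokens per second. Under the same approximation, halving the bits per weight halves the estimated weight footprint, which is the trade-off the quantisation levels in the abstract explore.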

