Blog: At Sellers’ Service: Alibaba Automates Responses for Online Buyer Reviews
This article is part of the Academic Alibaba series and is taken from the paper entitled “Review Response Generation in E-Commerce Platforms with External Product Information” by Lujun Zhao, Kaisong Song, Changlong Sun, Qi Zhang, Xuanjing Huang, and Xiaozhong Liu. The full paper can be read here.
When small businesses boom, certain growing pains are to be expected. Still, for online merchants, rapid growth involves at least one customer service challenge unique to the e-commerce era: with the remarkable volume of business they can singlehandedly generate, sellers also receive a volume of customer reviews an entire team would struggle to keep up with.
Product reviews are essential to sellers’ visibility among competitors, but they also pose a risk to their reputations. To retain customers and keep negative opinions from influencing others, sellers need to address complaints promptly and in persuasive detail; otherwise, success in sales can become the source of its own undoing. While sales can scale indefinitely, delivering a well-thought-out response to every incoming review eventually becomes impossible, leading many sellers to fall back on formulaic responses that risk alienating buyers with their insincerity.
Now, in a novel application of deep neural networks only beginning to gain attention, researchers at Alibaba have developed a response-writing assistant that adapts previous work with natural language generation (NLG) to the unique demands of review responses. By identifying details from both product descriptions and buyers’ reviews of them, the model has proved able to automate the creation of responses that acknowledge specific issues and product features relating to buyers’ concerns.
Double the Attention: Building on Seq2Seq Methods
In text generation, sequence-to-sequence (Seq2Seq) deep neural networks have emerged as the dominant technology for applications like machine translation, dialogue generation, and text summarization. Using an encoder and a decoder as basic components, plus an attention mechanism that selectively focuses on key parts of sentences, these models read and encode a source sentence into a hidden vector and then output a target sentence based on it. Standard Seq2Seq models fall short of the demands of review response generation, however: because they draw only on the review text, they tend to produce generic (and often trite) apologies or expressions of commitment to service.
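The core of the attention step described above can be sketched in a few lines. This is a minimal, illustrative dot-product attention in NumPy, not the paper's actual architecture: the decoder's current hidden state scores each encoded source token, and the scores become weights over the encoder states, yielding a context vector the decoder conditions on when emitting the next word.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(decoder_state, encoder_states):
    """Weight each encoder state by its relevance to the current
    decoder state, then return the weighted average (context vector)."""
    scores = encoder_states @ decoder_state   # one score per source token
    weights = softmax(scores)                 # attention distribution, sums to 1
    return weights @ encoder_states, weights

# Toy example: 4 encoded review tokens, hidden size 3 (random stand-ins
# for states a trained encoder/decoder would produce).
rng = np.random.default_rng(0)
enc = rng.normal(size=(4, 3))
dec = rng.normal(size=(3,))
ctx, w = attention_context(dec, enc)
```

The context vector has the same dimensionality as a single encoder state, so it can be fed into the decoder at every step; the weights shift as the decoder state changes, which is what lets the model "focus" on different source words while generating.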
In the proposed model, researchers have advanced a stronger Seq2Seq approach that also reads and applies product information in its responses, using a gated multi-source attention mechanism and a copy mechanism. With two attention mechanisms, one for review information and one for product information, the model first obtains a review context vector and then uses it to calculate a product context vector. In this manner, it essentially consults information about the product after reading what the buyer has to say about it, more closely mimicking what a live person would do.
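The two-stage ordering, review first, product second, can be sketched as follows. This is an assumption-laden simplification: in particular, combining the decoder state and review context by simple addition stands in for whatever learned projection the paper actually uses, and the `attend` helper is a generic dot-product attention rather than the authors' exact formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, states):
    """Generic dot-product attention over a set of encoded states."""
    weights = softmax(states @ query)
    return weights @ states

def dual_context(decoder_state, review_states, product_states):
    # Stage 1: attend over the buyer's review with the decoder state.
    c_review = attend(decoder_state, review_states)
    # Stage 2: query the product description with the decoder state
    # enriched by the review context -- "look up the product after
    # reading the complaint". (Summation here is a stand-in for a
    # learned combination.)
    c_product = attend(decoder_state + c_review, product_states)
    return c_review, c_product

# Toy example: hidden size 3, 5 review tokens, 4 product-description tokens.
rng = np.random.default_rng(0)
dec = rng.normal(size=(3,))
c_r, c_p = dual_context(dec, rng.normal(size=(5, 3)), rng.normal(size=(4, 3)))
```

The point of the ordering is that the product attention is conditioned on what the review said, so the model retrieves the product details that are relevant to this buyer's complaint rather than attending to the description in isolation.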
Following the above process, the model obtains a final context vector using a gated multimodal unit (GMU) that learns fusion transformations from multiple sources of information. Further, by incorporating a copy mechanism, the decoder can directly copy sequences derived from review and product information into its final output.
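A gated multimodal unit blends the two context vectors with a learned, per-dimension gate rather than simply concatenating them. The sketch below follows the standard GMU formulation (tanh transforms of each source, a sigmoid gate computed from both); the weight matrices are random placeholders where a trained model would have learned parameters, and this is an illustration of the fusion idea, not the paper's exact equations.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4  # hidden size (toy value)
# Random stand-ins for learned GMU parameters:
W_r = rng.normal(size=(d, d))       # transforms the review context
W_p = rng.normal(size=(d, d))       # transforms the product context
W_z = rng.normal(size=(d, 2 * d))   # computes the gate from both

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gmu(c_review, c_product):
    """Gated multimodal unit: per-dimension gated blend of two sources."""
    h_r = np.tanh(W_r @ c_review)
    h_p = np.tanh(W_p @ c_product)
    z = sigmoid(W_z @ np.concatenate([c_review, c_product]))  # gate in (0, 1)
    return z * h_r + (1 - z) * h_p  # z decides how much each source contributes

fused = gmu(rng.normal(size=d), rng.normal(size=d))
```

Because the gate is computed from both inputs, the unit can lean on the review for one dimension of the final context and on the product description for another, which is what "learns fusion transformations from multiple sources" amounts to in practice.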
Reviewing the Responder: Testing and Results
To evaluate the proposed model, researchers tested it against four baseline models using four automated metrics and one subjective human metric (for which five volunteers rated the models’ responses). Test data came from Alibaba’s Taobao e-commerce platform: 100,000 review/description/response triplets drawn from the clothing category.
As well as generally improving on competing models in all of the automated metrics, the proposed model excelled in the Distinct metric, which measures diversity in response content, supporting the researchers’ claim that product information is crucial to generating original responses. Further, in human evaluations, the model finished second only to actual people who were asked to respond to reviews, indicating it approximates their subjective judgment more closely than any of the baseline models tested.
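The Distinct metric mentioned above is typically computed as the ratio of unique n-grams to total n-grams across all generated responses, so a system that repeats the same formulaic reply scores low. A minimal sketch, assuming that standard Distinct-n definition (the blog itself does not spell out the formula):

```python
def distinct_n(responses, n):
    """Distinct-n: unique n-grams divided by total n-grams across
    all generated responses. Higher means more diverse output."""
    total, unique = 0, set()
    for response in responses:
        tokens = response.split()
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0

# A formulaic responder repeats itself, so its Distinct-1 score is low:
generic = ["thank you for your feedback"] * 3
score = distinct_n(generic, 1)  # 5 unique unigrams out of 15 total
```

This is why the metric rewards the product-aware model: weaving product-specific details into each response raises the share of n-grams that appear only once.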