Old Stats, New Tricks: How PCIC Builds on Decades of Recommendation Research

Table of Links
- Abstract and 1 Introduction
- Literature Review
- Model
- Experiments
- Deployment Journey
- Future Directions and References
2 LITERATURE REVIEW
One of the earliest reported works on Buy It Again recommendations came from Bhagat et al. [2] in 2018, based on Amazon shoppers’ data. In this work, the authors model the repeat consumption pattern of products using a modified Poisson-Gamma (mPG) model. The mPG model builds on a simpler PG model, which assumes that repeat purchases of an item at the customer level follow a Poisson process with a Gamma prior on the purchase rate 𝜆. The authors also provide two simple, customer-agnostic item-level models, viz. Repeat Customer Probability (RCP) and Aggregated Time Distribution (ATD), which serve as baselines for the mPG model in their experiments. Earlier, in 2016, Dey et al. [6] reported work aimed at capturing repeat purchase behavior over longer time horizons, e.g. several weeks to months. They used a PG model as the base for capturing repeat purchases and then a Dirichlet model to predict the purchase probabilities of items within a category.
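For concreteness, the sketch below shows the standard Gamma-Poisson conjugate computation behind such a PG model: the probability that a customer repurchases an item within the next window, given their observed purchase count. The function name and prior hyper-parameters (`alpha`, `beta`) are illustrative assumptions, not values from [2].

```python
import numpy as np

def repeat_purchase_prob(k_purchases, t_obs, t_next, alpha=1.0, beta=1.0):
    """Probability of at least one repeat purchase in the next `t_next` days.

    Assumes purchases follow a Poisson process with rate lambda and a
    Gamma(alpha, beta) prior on lambda (rate parameterization). By conjugacy
    the posterior is Gamma(alpha + k, beta + t_obs), and the posterior
    predictive count over a window of length t_next is Negative Binomial,
    so the zero-purchase probability has a closed form.
    """
    post_alpha = alpha + k_purchases
    post_beta = beta + t_obs
    p_zero = (post_beta / (post_beta + t_next)) ** post_alpha
    return 1.0 - p_zero

# Toy example: a customer bought an item 3 times over the last 90 days;
# probability they buy it again within the next 30 days under this prior.
print(repeat_purchase_prob(k_purchases=3, t_obs=90.0, t_next=30.0))
```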
Apart from the above work, we have also explored other related research on repeat purchases. While such works are not numerous, some notable research on customer purchase modeling has been done historically (starting in the 1960s), from which one can draw inspiration for modeling customer purchase events using statistical distributional assumptions. Once the mathematical expressions for the unknown distributional parameters are rigorously derived, their estimates can be computed from data using simple math libraries or custom user-defined functions. Such works include the Negative Binomial Distribution (NBD) models discussed by Ehrenberg [1] and Grahn [9], and the Erlang-2-Gamma model discussed by Chatfield and Goodhardt [3]. Later, Fader and Hardie proposed interesting alternative versions of the NBD model, viz. Pareto-NBD and Beta-Geometric NBD [?][7]. These approaches, owing to their strong foundations, may have influenced many later works based on statistical distributions (e.g. [2]), but they were mostly applied to popular marketing problems (often referred to as Marketing Science), such as predicting a customer’s shopping probability over the next n days as a proxy for attrition risk, predicting expected customer basket size, and predicting customer lifetime value. These problems relate to a customer’s journey in a generic way, and the solutions are often used to choose the right audience for whom retention policies need to be devised. When a guest’s category/item behavior comes into the picture (such as similar items, buy it again, etc.), we should not be limited to such approaches. Rather, using them as signals and applying additional layers of learning with some supervision (where possible) is intuitively a positive step to take.
A large body of literature on recommender systems is available, with the ability to recommend products matching a customer’s or user’s personal taste. One of the older notable works is the GroupLens project [16] by Konstan et al. on Usenet news data in the late 90s, which used a user-based kNN (userKNN) approach to collaborative filtering to recommend personalized articles. Later, another notable approach we came across in the NBR domain is the Factorized Personalized Markov Chain (FPMC) [19] by Rendle et al. in 2010. This work combines two popular approaches to solve the NBR problem: Matrix Factorization (MF), which captures a user’s taste by factorizing the observed user-item matrix, and Markov Chains (MC), which capture a user’s sequential behavior using transition graphs to predict the next action. Other similar works include one by He et al. on sequential recommendation algorithms [11] in 2016 and another [10] in 2018 that builds on the approach of [19]. Another approach, using temporal dynamics in recommender algorithms, was taken by Koren [17] in 2009 and is worth mentioning in this context. Our work certainly holds that temporal signals are important, but we take a different approach: rather than integrating them directly into state-of-the-art recommender algorithms (viz. MF or MC) as done by [19], [11], [10], or [17], we model them as a separate signal and apply supervised learning on top of it to address our problem.
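As a rough illustration of the FPMC scoring idea (not the training procedure of [19]), the sketch below combines an MF term for long-term taste with an MC term for the transition from the previous basket, using random, untrained factor matrices; the variable names and dimensions are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 100, 500, 16

# Hypothetical embedding tables; in FPMC these would be learned, e.g. with
# BPR-style pairwise updates, rather than drawn at random.
V_ui = rng.normal(scale=0.1, size=(n_users, dim))   # user factors (taste)
V_iu = rng.normal(scale=0.1, size=(n_items, dim))   # item factors (taste)
V_il = rng.normal(scale=0.1, size=(n_items, dim))   # candidate-item transition factors
V_li = rng.normal(scale=0.1, size=(n_items, dim))   # previous-basket item factors

def fpmc_score(user, candidate, prev_basket):
    """MF term (long-term taste) + MC term (transition from the last basket)."""
    mf = V_ui[user] @ V_iu[candidate]
    mc = np.mean([V_il[candidate] @ V_li[j] for j in prev_basket])
    return mf + mc

# Rank all items for user 7 given their previous basket {3, 42}.
scores = np.array([fpmc_score(7, i, [3, 42]) for i in range(n_items)])
top10 = np.argsort(-scores)[:10]
```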
More recently, with the popularity of neural network based applications, many parallel and subsequent works have used Recurrent Neural Networks (RNNs), LSTMs, or Transformers to more effectively capture the repeat purchase pattern. A recent work by Hu et al. called Sets2Sets [12] uses an encoder that maps the set of elements from each previous time step onto a vector, while the decoder uses a set-based attention mechanism to decode the set of elements at each subsequent time step from those vectors. This approach outperforms several state-of-the-art methods. Another work by Hu et al. called TIFUKNN [13], from 2020, proposes a simpler method that outperforms even the RNN-based approaches for NBR. It claims that personalized item frequency (PIF) provides critical signals for NBR, but that existing methods, including RNNs, fail to capture it. Their solution is an item-frequency-based kNN method. It is worth noting that we also implement inter-category product ranking where item frequency is a key signal, but our implementation depends on features derived from a guest’s own purchases, while TIFUKNN depends on insights from similar guests via k-Nearest Neighbors. Another RNN approach, DREAM [21], was developed by Yu et al. in 2016: the input layer consists of multiple basket representations, a pooling operation over the items in each basket yields a basket representation, a dynamic representation of the customer is obtained in the hidden layer, and the output layer produces the customer’s scores over all items. The approach of Ying et al., called SHAN [20], consists of two attention stages called sequential hierarchical attention layers: the first layer captures the customer’s long-term behavior, and the second composes the long- and short-term behaviors. Finally, we consider the approach of Ren et al. called RepeatNet [18], developed in 2019. It captures repeat consumption by incorporating a repeat-explore mechanism into an RNN, consisting of an encoder and two decoders that learn the recommendation probability for each item in the two modes, viz. repeat and explore.
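To illustrate the PIF idea, a minimal sketch of an item-frequency-based kNN scorer is given below; the time-decay scheme, parameter names, and blending weight are simplifying assumptions and do not reproduce TIFUKNN [13] exactly.

```python
import numpy as np

def pif_vector(baskets, n_items, decay=0.9):
    """Time-decayed personalized item frequency: recent baskets weigh more."""
    v = np.zeros(n_items)
    for age, basket in enumerate(reversed(baskets)):   # age 0 = most recent basket
        for item in basket:
            v[item] += decay ** age
    return v

def knn_pif_scores(user_vecs, target, k=5, alpha=0.7):
    """Blend the target user's PIF vector with the mean of their k nearest neighbors."""
    dists = np.linalg.norm(user_vecs - user_vecs[target], axis=1)
    neighbors = np.argsort(dists)[1:k + 1]              # skip the user itself
    return alpha * user_vecs[target] + (1 - alpha) * user_vecs[neighbors].mean(axis=0)

# Toy data: 3 users, 6 items; each user's history is a list of baskets (oldest first).
histories = [[[0, 1], [1, 2]], [[1, 2], [2, 3]], [[4, 5], [5]]]
user_vecs = np.stack([pif_vector(h, n_items=6) for h in histories])
print(np.argsort(-knn_pif_scores(user_vecs, target=0))[:3])   # top-3 items for user 0
```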
There has also been work on a hazard-based approach by Kapoor et al. [15] in 2014 to predict a customer’s return time. They proposed a framework to evaluate the factors that influence customer return to web services, using Cox’s proportional hazards model [5], which can incorporate several covariates. Compared to baseline regression and classification methods, the hazard-based model performs better at predicting user return time and categorizing users by their predicted return time. Building on this work, they also created a semi-Markov model [14] that predicts when users will return to familiar items. The model takes into account latent psychological factors such as sensitization and boredom that arise when the same items are repeatedly consumed.
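As an illustration of the hazard-based idea, the following sketch fits Cox’s proportional hazards model to toy customer return-time data using the open-source lifelines package; the covariates, column names, and data are hypothetical, and [15] does not prescribe this tooling.

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical per-customer records: days until return (duration), whether a
# return was observed (event), and two illustrative covariates.
df = pd.DataFrame({
    "days_to_return": [5, 12, 30, 7, 60, 14],
    "returned":       [1, 1, 0, 1, 0, 1],    # 0 = censored (no return observed yet)
    "n_past_orders":  [8, 3, 1, 10, 1, 4],
    "used_promo":     [1, 0, 0, 1, 0, 1],
})

# A small ridge penalty keeps the fit stable on this tiny toy dataset.
cph = CoxPHFitter(penalizer=0.1)
cph.fit(df, duration_col="days_to_return", event_col="returned")
cph.print_summary()                              # covariate effects on the return hazard
expected_return = cph.predict_expectation(df)    # expected time to return per customer
```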
While we took learnings from the existing research in the NBR domain, to the best of our knowledge our approach is unique, and it showed promising results when compared against many of the above solutions as baselines. Our approach captures the importance of sequence models by treating time series as a feature. It also embraces the success of the hazard-based approach and treats it as an integral component of the solution. In addition, it accounts for PIF to generate recommendations from the category level down to the item level, which has been a concern for traditional RNNs. On top of this, it can capture complex (non-linear) relationships among all the signals through a simple fully connected (FC) neural network.
Authors:
(1) Amit Pande, Data Sciences, Target Corporation, Brooklyn Park, Minnesota, USA ([email protected]);
(2) Kunal Ghosh, Data Sciences, Target Corporation, Brooklyn Park, Minnesota, USA ([email protected]);
(3) Rankyung Park, Data Sciences, Target Corporation, Brooklyn Park, Minnesota, USA ([email protected]).