So you want to build a recommendation engine? Whether it is for suggesting videos or developing the latest dating app, recommenders have become a powerful tool for many organizations. Here are a few tips that give a high-level look at how to approach the problem and how to use your available data in building a useful and robust solution.
Tip 1: Begin with a user-less model
The most powerful recommendation engines use massive amounts of user data to find the best matches. But don't start there. An ideal model begins with just the item or items being matched. This allows you to first focus on the matching potential of the item attributes, and it is beneficial for systems that will continuously have new items entering them. A grocer can't use historical data to recommend a newly released product. Before you leverage user feedback, you also need a model that works well enough that customers want to use it.
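As a rough sketch of what a user-less model could look like, the example below ranks items purely by how many attributes they share, using Jaccard similarity. The catalog, the attribute tags, and the function names are all made up for illustration; a real system would likely use richer attributes and a different similarity measure.

```python
# Hypothetical catalog: each item is described only by a set of attribute tags.
CATALOG = {
    "crunchy peanut butter": {"spread", "peanut", "nut", "pantry"},
    "almond butter": {"spread", "almond", "nut", "pantry"},
    "strawberry jam": {"spread", "fruit", "pantry"},
    "orange juice": {"drink", "fruit", "chilled"},
}


def jaccard(a: set, b: set) -> float:
    """Share of attributes two items have in common, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(item_attrs: set, catalog: dict, top_n: int = 2) -> list:
    """Rank catalog items by attribute overlap with the given item."""
    scored = [(name, jaccard(item_attrs, attrs)) for name, attrs in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# A newly released product has no purchase history, but it does have attributes.
new_item = {"spread", "cashew", "nut", "pantry"}
print(recommend(new_item, CATALOG))
```

Because nothing here depends on user history, the same function works on the first day a product is listed.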
Tip 2: Use multiple measures
On what basis should Facebook recommend friends to you? The simple answer is to find another user who is "closest" in the network analysis sense, but this can easily be improved by incorporating additional measures. Maybe Facebook finds matches by also including the geographic proximity of two users, whether they are associated with the same school or workplace, or the similarity of the topics they post about. A recommender that includes multiple measures is more robust to overfitting, more flexible as the data evolves, and more likely to produce good results than one that relies on a single dimension.
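To make the idea concrete, here is a small sketch that computes several independent measures for a pair of users. The `User` fields, the sample people, and the choice of measures are assumptions for illustration only, not a description of how Facebook actually works.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical user profile with just enough fields for this sketch."""
    name: str
    friends: set = field(default_factory=set)
    city: str = ""
    school: str = ""
    topics: set = field(default_factory=set)


def raw_measures(a: User, b: User) -> dict:
    """Compute several independent signals for one pair of users."""
    all_topics = a.topics | b.topics
    return {
        "mutual_friends": len(a.friends & b.friends),
        "same_city": a.city != "" and a.city == b.city,
        "same_school": a.school != "" and a.school == b.school,
        "topic_overlap": len(a.topics & b.topics) / len(all_topics) if all_topics else 0.0,
    }


ana = User("ana", {"ben", "cara", "dev"}, "Denver", "State U", {"hiking", "jazz"})
eli = User("eli", {"cara", "dev", "fay"}, "Denver", "Tech U", {"hiking", "chess", "jazz"})
print(raw_measures(ana, eli))
```

Note that the raw measures come out in different units (a count, two booleans, and a ratio), so they cannot be compared or combined directly yet.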
Tip 3: Standardize your measures
When including multiple measures, it is important to scale them to a standard metric. This allows you to evaluate the different models both individually and against one another. I suggest giving each measure a shared output of a 0% to 100% match so it is also easy to interpret the results. You can then combine the model outputs into a single ensemble model in which no one measure has undue influence on the overall recommendation.
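One simple way to get every measure onto a 0% to 100% scale is min-max scaling across the candidates being compared. The sketch below assumes that approach; the function name, the `higher_is_better` flag, and the sample numbers are hypothetical, and other scaling methods (ranks, z-scores, fixed caps) may suit your data better.

```python
def to_percent(values, higher_is_better=True):
    """Min-max scale raw values across candidates to a 0-100 match score."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: if every candidate is identical on this measure,
        # treat them all as a neutral 50% match.
        return [50.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) * 100 for v in values]
    if not higher_is_better:
        # For measures like distance, a smaller raw value is a better match.
        scaled = [100 - s for s in scaled]
    return scaled


mutual_friends = [0, 3, 12]        # more is better
distance_km = [2.0, 10.0, 50.0]    # less is better

print(to_percent(mutual_friends))
print(to_percent(distance_km, higher_is_better=False))
```

After scaling, a count of mutual friends and a distance in kilometers both read as a percentage match, with the best candidate on each measure at 100.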
Tip 4: Combine your measures
Once you have multiple, scaled models, you are ready to combine them into a single recommendation engine! This can be done by taking a simple average of the model scores or by assigning a system of weights. The weighted average has a clear benefit: it gives greater emphasis to the measures that are known to be more important. For example, if Netflix knows that users strongly prefer movies in their native language, then it can give a higher weight to a measure of the movie language. Conversely, if Netflix knows that users only slightly prefer movies in the same genre as their most recently watched movie, then the recommender can give a small but positive weight to that measure.
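A weighted average over scaled scores takes only a few lines. In this sketch the measure names, scores, and weights are invented for illustration; in practice the weights would come from domain knowledge or testing.

```python
def combine(scores: dict, weights: dict) -> float:
    """Weighted average of 0-100 measure scores; the result is also 0-100."""
    total_weight = sum(weights[m] for m in scores)
    return sum(scores[m] * weights[m] for m in scores) / total_weight


# Hypothetical scaled scores for one candidate movie.
scores = {"language_match": 100.0, "genre_match": 40.0}

equal_weights = {"language_match": 1, "genre_match": 1}
tuned_weights = {"language_match": 4, "genre_match": 1}  # language matters more

print(combine(scores, equal_weights))  # simple average
print(combine(scores, tuned_weights))  # weighted average
```

Dividing by the total weight keeps the combined score on the same 0 to 100 scale no matter how the weights are chosen.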
Tip 5: Incorporate user feedback
Bring your recommendation engine to the next level by incorporating user feedback. One way to do this is by adding a measure of how other users behaved in the past. This is called "collaborative filtering": if previous users made a particular match, then that match should be suggested to future users. For example, a product on Amazon lists similar products that are "frequently bought together."
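A very basic form of this idea is counting which items show up in the same order. The sketch below uses that co-occurrence count as a stand-in for collaborative filtering; the order data is made up, and this is not a claim about how Amazon computes its lists.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order history: each order is the set of items bought together.
ORDERS = [
    {"tent", "sleeping bag", "lantern"},
    {"tent", "sleeping bag"},
    {"tent", "lantern"},
    {"sleeping bag", "pillow"},
    {"tent", "sleeping bag", "pillow"},
]


def co_purchase_counts(orders) -> dict:
    """For each item, count how often every other item shared an order with it."""
    counts = {}
    for basket in orders:
        for a, b in combinations(sorted(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def bought_together(item, orders, top_n=2) -> list:
    """Return the items most often purchased alongside the given item."""
    counts = co_purchase_counts(orders)
    return [other for other, _ in counts.get(item, Counter()).most_common(top_n)]


print(bought_together("tent", ORDERS))
```

The resulting counts can be scaled and added to the ensemble as one more measure alongside the item-based ones.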
A second way to leverage user feedback is by using it to evaluate your model. If you are unsure how to weight two different measures, then you can launch two versions and conduct A/B testing to see which produces better results against a defined objective. The more users who interact with the recommender, the more opportunities you have for incremental improvements, unlocking economies of scale that fine-tune your model over time.
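If the defined objective is something like click-through rate, one common way to compare two versions is a two-proportion z-test. The sketch below assumes that test and uses invented traffic numbers; your own objective may call for a different statistic.

```python
import math


def two_proportion_z(clicks_a, users_a, clicks_b, users_b):
    """Two-sided z-test for a difference in click-through rate between A and B."""
    rate_a = clicks_a / users_a
    rate_b = clicks_b / users_b
    pooled = (clicks_a + clicks_b) / (users_a + users_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: version A uses one set of weights, version B another.
z, p_value = two_proportion_z(clicks_a=120, users_a=1000, clicks_b=150, users_b=1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A positive z means version B had the higher rate; the p-value indicates how surprising that gap would be if the two versions actually performed the same.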
Tip 6: Assume customers are different
By dividing customers into subgroups, you can tailor a recommendation system to the specific preferences of each subgroup. This customer segmentation is common practice in marketing and can be done through observed characteristics (e.g. Google Ads recommending hiking shoes to someone in Colorado) or observed behavior (e.g. Google Ads recommending hiking shoes to someone searching for national parks). Although differentiation increases the complexity of a model, it can lead to substantial improvements in its performance.
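One lightweight way to act on segmentation is to keep a separate set of weights per segment. In the sketch below the segment rule, the field names, and the weights are all hypothetical; real segments would normally come from your own customer analysis.

```python
# Hypothetical per-segment weights for three scaled (0-100) measures.
SEGMENT_WEIGHTS = {
    "outdoor": {"category_match": 6, "price_match": 1, "popularity": 3},
    "default": {"category_match": 1, "price_match": 1, "popularity": 1},
}


def assign_segment(user: dict) -> str:
    """Toy rule using an observed characteristic and an observed behavior."""
    lives_near_trails = user.get("state") == "Colorado"
    searched_parks = any("national park" in q for q in user.get("recent_searches", []))
    return "outdoor" if lives_near_trails or searched_parks else "default"


def segment_score(item_scores: dict, user: dict) -> float:
    """Weighted average of measure scores using the user's segment weights."""
    weights = SEGMENT_WEIGHTS[assign_segment(user)]
    total = sum(weights.values())
    return sum(item_scores[m] * w for m, w in weights.items()) / total


hiking_shoes = {"category_match": 90, "price_match": 40, "popularity": 50}
print(segment_score(hiking_shoes, {"state": "Colorado"}))
print(segment_score(hiking_shoes, {"state": "Ohio", "recent_searches": ["pizza"]}))
```

The same item scores higher for the outdoor segment because that segment puts more weight on the category match.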
If you are interested in exploring this area more, I suggest checking out our recent whitepaper that outlines how to build a recommender for out-of-stock products (aka "the peanut butter problem"). It dives into an interesting business use case while providing an overview of the data science and modern data platform used to solve the problem. Also check out this solution brief about how we at BlueGranite helped a nonprofit build a matching service for adults with disabilities and employers looking to hire them. Happy developing!