Under the Hood: Building the App Center recommendation engine

October 3, 2012 at 11:05am

As more apps on Facebook Platform have launched over the years, the types of apps available have become more diverse, making it crucial that people see the most relevant and highest quality apps in channels like news feed and App Center. 


While news feed has always functioned as a recommendation engine, the App Center is the latest way for people to discover apps, and it is becoming an increasingly prominent channel for developers to distribute their apps. On average, 220 million people visit the App Center each month, and those visitors are 40% more likely to return the next day.


We built the App Center to give the growing audience of app users a central place on Facebook to browse apps. However, given the multitude of apps that use Facebook, recommending the right apps to the right people is a tough challenge. We needed to build a system that could handle large-scale data and traffic, respond quickly, and incorporate user feedback in real time.


The goal is for curation of the App Center to be driven by quality and personalization, instead of editorialization. Just as with news feed, personalization in App Center will improve over time as people and their friends engage with more apps. 


Building a recommendation engine 

To solve this problem efficiently, we built a recommendation engine directly into App Center so that, just as with news feed, each person has a personalized experience. The recommendation engine powers the App Center and learns people’s preferences in order to serve them app recommendations that are timely, socially relevant, and unique to them. This allows a more diverse set of apps to become discoverable, particularly those in harder-to-find or up-and-coming categories.


The system follows an aggregator-leaf architecture, very similar to that of a search engine. Because we have a lot of data, we partition the objects into multiple subsets (shards), and each leaf node is responsible for only one subset. The aggregator acts as a central controller: it receives the recommendation request from the frontend web server and distributes it to the leaf nodes. Each leaf node then finds the best candidates among the objects stored on its local machine and returns them to the aggregator. The aggregator then performs a final merge and returns the best results to the client.
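
The scatter-gather flow described above can be sketched in a few lines of Python. This is an illustration only, not the production system: the class names `LeafNode` and `Aggregator`, the thread pool standing in for network calls, and the precomputed scores are all assumptions made for the example, and it assumes at least one leaf.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


class LeafNode:
    """Hypothetical leaf: owns one shard of {object_id: score}."""

    def __init__(self, shard):
        self.shard = dict(shard)

    def top_k(self, k):
        # Return the k best local candidates as (score, object_id) pairs.
        return heapq.nlargest(k, ((s, o) for o, s in self.shard.items()))


class Aggregator:
    """Hypothetical aggregator: fans a request out and merges the results."""

    def __init__(self, leaves):
        self.leaves = leaves

    def recommend(self, k):
        # Query every leaf in parallel (threads stand in for remote calls).
        with ThreadPoolExecutor(max_workers=len(self.leaves)) as pool:
            partials = list(pool.map(lambda leaf: leaf.top_k(k), self.leaves))
        # Final merge: keep the k best candidates across all shards.
        merged = heapq.nlargest(k, (pair for part in partials for pair in part))
        return [obj for _, obj in merged]


leaves = [LeafNode({"a": 0.9, "b": 0.2}), LeafNode({"c": 0.7, "d": 0.95})]
print(Aggregator(leaves).recommend(2))  # ['d', 'a']
```

Because each leaf returns only its own top k, the aggregator merges at most k candidates per shard, no matter how many objects each shard holds.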


After that, the frontend collects user feedback, which is then integrated into the app recommendation engine. We scale this system in two ways: The first is to increase the number of shards so that we can handle more data. The second way is to have multiple replicas so that we can handle more traffic. Using replicas also adds redundancy to the system, which allows us to tolerate the failure of some machines. 
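
As a rough sketch of the two scaling axes, the snippet below maps an object to a shard with a stable hash and then picks any healthy replica of that shard. The function names, the CRC32 hash, and the random replica choice are assumptions for illustration; the post does not say how the real system routes requests.

```python
import random
import zlib


def shard_for(object_id: str, num_shards: int) -> int:
    # A stable hash, so every process maps an object to the same shard.
    # Adding shards (a larger num_shards) lets the system hold more data.
    return zlib.crc32(object_id.encode("utf-8")) % num_shards


def pick_replica(replicas, is_healthy, rng=random):
    # Any healthy replica can serve the shard. More replicas means more
    # traffic capacity and tolerance for failed machines.
    healthy = [r for r in replicas if is_healthy(r)]
    if not healthy:
        raise RuntimeError("no healthy replica for this shard")
    return rng.choice(healthy)


shard = shard_for("app:12345", num_shards=8)
replica = pick_replica(["r0", "r1", "r2"], is_healthy=lambda r: r != "r1")
```

Note that simple modulo hashing moves most objects when `num_shards` changes. A real deployment would likely need a resharding strategy, which is outside the scope of this sketch.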


Determining high quality 

Growth in the App Center is tied to quality, and we determine that quality based on user ratings and positive/negative user signals for an app over time. 


To measure quality accurately, we developed a system that surveys a random sample of users, asking them to rate an app shortly after we detect that they have used it. Then, when we compute the average rating for an app, we include a confidence adjustment based on the number of ratings the app has received.
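
The post does not give the exact confidence adjustment. One common way to do this, shown here purely as an assumption, is a Bayesian (damped) average that pulls apps with few ratings toward a prior mean. The names `prior_mean` and `prior_weight` are hypothetical parameters, and their values are arbitrary.

```python
def adjusted_rating(ratings, prior_mean=3.0, prior_weight=10):
    """Damped average: behaves as if `prior_weight` extra ratings of
    `prior_mean` had been cast. An assumed formula, not the one used
    in App Center."""
    n = len(ratings)
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + n)


few = adjusted_rating([5, 5])        # two 5-star ratings stay close to the prior
many = adjusted_rating([5] * 200)    # 200 5-star ratings approach 5.0
```

With this kind of adjustment, an app cannot reach the top of the ranking on a handful of perfect ratings alone.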


We found that the number of daily active users (i.e. the average number of users who used the app in a day) was a good measure of an app’s size, while the number of monthly active users could be inflated by spikes of activity during the month. So we settled on a formula for app quality that is primarily a function of the app’s average rating and its average number of daily active users.
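
To see why a spike distorts monthly active users more than average daily active users, consider the toy calculation below. The data is made up and the helper names are hypothetical: 100 regular users visit every day, and 1,000 extra people show up on a single day.

```python
def average_dau(daily_users):
    # Mean number of distinct users per day (assumes at least one day).
    return sum(len(day) for day in daily_users) / len(daily_users)


def mau(daily_users):
    # Distinct users seen at any point in the period.
    return len(set().union(*daily_users))


regulars = set(range(100))
days = [regulars] * 30
# One-day spike: 1,000 additional one-time visitors on day 15.
days[14] = regulars | set(range(100, 1100))

print(average_dau(days))  # about 133.3, up from 100
print(mau(days))          # 1100, up from 100
```

In this example the spike multiplies the monthly figure by eleven, while the daily average rises by only about a third.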


Algorithmic elements

From the algorithmic point of view, the App Center recommendation system has three major elements: candidate selection, scoring and ranking, and real-time updates. 


The key to candidate selection is efficiency and high recall. We use several heuristics to choose promising candidates, the first being the selection of popular items based on a user’s demographic information. The second heuristic we use is the selection of social items, because we believe that people are generally interested in their friends’ activities. The third heuristic is to select items related to objects liked or interacted with by the user in the past.
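
The three heuristics can be combined as a simple union of candidate sources. The sketch below is a guess at the shape of such a step, not the actual implementation: the dictionary layouts, the field names, and the `limit` parameter are all assumptions.

```python
def select_candidates(user, popular_by_demo, friend_activity, related_items,
                      limit=100):
    """Union of three cheap, high-recall sources, de-duplicated in order."""
    sources = (
        # 1. Items popular within the user's demographic group.
        popular_by_demo.get(user["demographic"], []),
        # 2. Social items: apps the user's friends have been using.
        [app for f in user["friends"] for app in friend_activity.get(f, [])],
        # 3. Items related to objects the user liked or interacted with.
        [app for obj in user["liked"] for app in related_items.get(obj, [])],
    )
    seen, candidates = set(), []
    for source in sources:
        for app in source:
            if app not in seen:
                seen.add(app)
                candidates.append(app)
    return candidates[:limit]


user = {"demographic": "us-18-24", "friends": ["bob"], "liked": ["chess"]}
print(select_candidates(
    user,
    popular_by_demo={"us-18-24": ["quiz", "music"]},
    friend_activity={"bob": ["music", "runner"]},
    related_items={"chess": ["checkers"]},
))  # ['quiz', 'music', 'runner', 'checkers']
```

This stage only needs to be cheap and inclusive. Precision is the job of the scoring step that follows.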


Once we obtain a set of candidates, we fetch their features from local storage and calculate ranking scores for them. A good scoring function should be able to capture high-order interactions among three types of features.


The first type is explicit features that we can obtain directly, such as demographic information about the user. The second type is dynamic features, such as the number of likes and impressions for an object. The third type, learned latent features, is more interesting. These features are learned from the user-object interaction history and can capture user preference and object flavor.


The underlying principle of learning latent features is low-rank matrix approximation. The basic problem is to estimate the missing entries of the user-object response matrix. The idea is to approximate the response matrix with the product of two low-rank matrices, U and O. Each row of matrix U is the latent representation of a user and captures that user’s intrinsic taste. Each column of matrix O is the latent representation of an object and reflects the flavor of that object. The dot product of these two vectors is the predicted response of the user to the object.
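
A minimal, small-scale sketch of this idea is shown below. It fits the two factor matrices by stochastic gradient descent on the observed entries only. The choice of SGD, the hyperparameters, and the function names are assumptions for illustration, since the post does not describe the actual learning algorithm. For convenience the code stores each object's latent vector as a row of `O`, i.e. the transpose of the matrix O described above.

```python
import random


def predict(U, O, u, o):
    # Predicted response: dot product of user and object latent vectors.
    return sum(a * b for a, b in zip(U[u], O[o]))


def factorize(observed, num_users, num_objects, rank=2,
              lr=0.05, reg=0.01, epochs=500, seed=0):
    """observed maps (user_index, object_index) -> response value."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(num_users)]
    O = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(num_objects)]
    for _ in range(epochs):
        for (u, o), r in observed.items():
            err = r - predict(U, O, u, o)
            for k in range(rank):
                uk, ok = U[u][k], O[o][k]
                # Gradient step on squared error with L2 regularization.
                U[u][k] += lr * (err * ok - reg * uk)
                O[o][k] += lr * (err * uk - reg * ok)
    return U, O


# Toy response matrix: 1.0 = positive response, 0.0 = negative, rest unknown.
observed = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 1.0, (1, 2): 1.0, (2, 1): 1.0}
U, O = factorize(observed, num_users=3, num_objects=3)
guess = predict(U, O, 0, 2)  # estimate for an entry that was never observed
```

A batch loop like this is only practical for small matrices. At the scale described in the next paragraph, the same objective has to be optimized with distributed and incremental methods.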


Remember, we have more than 950 million users and even more objects. Our matrix is huge, and the major challenge is how to learn the latent features efficiently. We developed algorithms to compute the latent traits given the huge amount of historical data and update them in real time as new user feedback comes in.


This ability to make real-time updates as new objects and events come in is one of the most important parts of recommending the best apps for people. When feedback comes in, we need to do several things. One is to update the index so that new objects are available for candidate selection. New actions from each user are added to the index in real time so that friends’ activities are immediately available for recommendation. We also update the user history so that we can make recommendations based on the user’s latest activities. The dynamic features are updated so that the current counts for shares, likes, and impressions can be used accurately for scoring. The latent features are also updated in real time, so that the system can learn user taste and object flavor from the latest activities.
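
The update steps listed above could be handled by a single feedback handler along the following lines. This is a speculative, in-memory sketch: the class `OnlineState`, its fields, the constant initial vectors, and the single gradient step per event are assumptions, not a description of the real system.

```python
from collections import defaultdict


class OnlineState:
    """Hypothetical in-memory state touched by every feedback event."""

    def __init__(self, rank=2, lr=0.05):
        self.rank, self.lr = rank, lr
        self.index = set()                                    # known objects
        self.history = defaultdict(list)                      # user -> objects
        self.counts = defaultdict(lambda: defaultdict(int))   # object -> action -> n
        self.user_vec = defaultdict(lambda: [0.1] * rank)     # latent features
        self.object_vec = defaultdict(lambda: [0.1] * rank)

    def predict(self, user, obj):
        return sum(a * b for a, b in zip(self.user_vec[user], self.object_vec[obj]))

    def on_feedback(self, user, obj, action, response):
        self.index.add(obj)               # object becomes selectable
        self.history[user].append(obj)    # latest activity for this user
        self.counts[obj][action] += 1     # dynamic feature for scoring
        # One incremental gradient step on the latent vectors.
        err = response - self.predict(user, obj)
        u, o = self.user_vec[user], self.object_vec[obj]
        for k in range(self.rank):
            uk, ok = u[k], o[k]
            u[k] += self.lr * err * ok
            o[k] += self.lr * err * uk


state = OnlineState()
state.on_feedback("alice", "app:chess", action="like", response=1.0)
```

Each event touches only the affected user and object, which is what keeps the per-event cost small enough to apply as feedback arrives.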


The App Center has been available to people worldwide since August 1, 2012, and we will continue to make updates, such as the recently launched My Apps page, as we build a personalized App Center and app recommendation service for each person on Facebook. 


Wei Xu, Xin Liu, TR Vishwanath, and the Open Graph engineering team all worked together to integrate the recommendation engine and App Center.