Link Shim - Protecting the People who Use Facebook from Malicious URLs
As a member of the Site Integrity Team at Facebook, my primary goal is protecting users from spammy or malicious content. The Site Integrity Team has built a bunch of powerful tools over the years to help us in this fight. One of our most important tools is a system we call "the link shim," which has been around since 2008. This is how the link shim works: every time a link is clicked on the site, the link shim checks that URL against our own internal list of malicious links, along with the lists of numerous external partners including McAfee, Google, Web of Trust, and Websense. If we detect that a URL is malicious, we display an interstitial page before the browser actually requests the suspicious page. The link shim serves a number of purposes, including:
We want to make sure we don't send users to a web site that we know (or suspect) is spammy or malicious. Being able to run a check at click time (i.e. when a user clicks on a link) enables us to have more sophisticated classification than what we have at display time (i.e. when the link is displayed). In addition to our own internal list and integration with external blacklists, we use advanced machine learning classifiers to check the authenticity of the sender along with a slew of other inputs.
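As a rough illustration of a click-time check, the sketch below looks up a clicked URL's host in an internal list and several partner lists and decides whether to show the interstitial. The list contents, the function names, and the host-only matching are assumptions made for the example, not a description of the real system, which also uses classifiers and other inputs.

```python
from urllib.parse import urlsplit

# Hypothetical blocklists: the real internal and partner lists are not public.
INTERNAL_BLOCKLIST = {"malware.example"}
PARTNER_BLOCKLISTS = [
    {"phish.example"},
    {"spam.example", "malware.example"},
]


def is_flagged(url: str) -> bool:
    """Return True if the URL's host appears on any blocklist."""
    host = (urlsplit(url).hostname or "").lower()
    if host in INTERNAL_BLOCKLIST:
        return True
    return any(host in blocklist for blocklist in PARTNER_BLOCKLISTS)


def handle_click(url: str) -> tuple[str, str]:
    """Decide at click time: show the interstitial or let the click through."""
    if is_flagged(url):
        return ("interstitial", url)
    return ("redirect", url)
```

Because the decision is made when the link is clicked, a host added to a list after the link was posted is still caught on the next click.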
The link shim also allows us to protect users who consume content via email. If we relied on display-time filtering or other means alone, we would not be able to retroactively block any malicious URLs that had been sent over email. To help defend against this threat, all links to non-facebook.com URLs in email are rewritten to first go through the link shim. By building our system to use click-time checks, users clicking links in their notification emails will still be prevented from seeing malicious or spammy content.
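The email rewriting step could look something like the following sketch. It assumes the shim lives at facebook.com/l.php, takes the target in a query parameter named "u" (an assumed name), and that links appear as double-quoted href attributes. The user-specific hash is left out here to keep the example short.

```python
import re
from urllib.parse import quote, urlsplit

# Assumed endpoint and parameter name, for illustration only.
SHIM_ENDPOINT = "https://www.facebook.com/l.php"
HREF_RE = re.compile(r'href="([^"]+)"')


def is_facebook_host(host: str) -> bool:
    """True for facebook.com and its subdomains."""
    host = host.lower()
    return host == "facebook.com" or host.endswith(".facebook.com")


def shim_url(target: str) -> str:
    """Wrap a target URL so the click goes through the shim first."""
    return f"{SHIM_ENDPOINT}?u={quote(target, safe='')}"


def rewrite_email_links(html: str) -> str:
    """Rewrite every non-facebook.com href in an email body."""
    def replace(match: re.Match) -> str:
        url = match.group(1)
        host = urlsplit(url).hostname or ""
        if not host or is_facebook_host(host):
            return match.group(0)  # leave relative and facebook.com links alone
        return f'href="{shim_url(url)}"'

    return HREF_RE.sub(replace, html)
```

Since the email only ever contains the shim URL, the safety decision is deferred until the recipient actually clicks.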
Protect Privacy and Identity
Sometimes, the URLs on Facebook themselves contain private information. For example, your Timeline may have a vanity URL - mine is "https://www.facebook.com/mkjones". Without the link shim, when you click on a link posted to your Timeline, your browser would send that URL in the referrer to the third-party site, revealing whose profile you were looking at when you clicked the link.
Currently, it's not feasible to randomize our URLs such that no information is leaked if a third party sees them. Instead, we use the link shim's address as the referrer rather than your Timeline URL (or the URL of whatever page you were on) to protect your information. With the link shim as the referrer, it's easier to ensure that the referring URL contains no personally identifying information.
Enable More Accurate External Analytics
The most common way that website owners understand how people find their site is by looking at the referrer header, which tells them how a visitor arrived at their site. However, there's a caveat to how this works - when people are on an HTTPS page and click a link to an HTTP page, the browser doesn't send a referrer header.
With an ever-increasing number of our users using Facebook over HTTPS, this becomes a problem. A significant percentage of clicks on Facebook will be incorrectly recorded by the destination site as being of unknown origin, when in fact they came from Facebook. However, we can fix this with the link shim if we always serve it over HTTP. As with the privacy protection above, routing the click through the link shim means the destination sees an anonymous Facebook referrer rather than no referrer at all.
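The referrer behavior described above can be modeled in a few lines. This sketch assumes the simple rule from the text (no referrer when going from HTTPS to HTTP) and treats the shim as an ordinary page the browser navigates away from. The shim address and the destination URL are placeholders.

```python
from typing import Optional
from urllib.parse import urlsplit


def referrer_sent(from_url: str, to_url: str) -> Optional[str]:
    """Referrer the destination sees under the HTTPS-to-HTTP rule."""
    downgrade = (
        urlsplit(from_url).scheme == "https"
        and urlsplit(to_url).scheme == "http"
    )
    return None if downgrade else from_url


TIMELINE = "https://www.facebook.com/mkjones"
SHIM = "http://www.facebook.com/l.php"  # assumed address, served over HTTP
DEST = "http://example.org/article"     # placeholder destination

# Direct click from an HTTPS page: the destination sees no referrer.
direct = referrer_sent(TIMELINE, DEST)

# Click routed through the HTTP shim: the destination sees the shim's
# address, which identifies Facebook but not the Timeline being viewed.
via_shim = referrer_sent(SHIM, DEST)
```

In this model the shim hop fixes both problems at once: the destination learns the visit came from Facebook, and it never sees the Timeline URL.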
How does it work?
So, how does the link shim actually work? It's an endpoint accessible at facebook.com/l.php or facebook.com/l/ that takes two parameters: (1) the redirect URL, and (2) a user-specific hash. Everything would work just fine without this hash if we simply redirected to the specified URL, assuming it was safe. However, by not including the user-specific hash we'd create a security hole called an "open redirector". Endpoints that redirect to arbitrary URLs can easily be exploited by malicious actors - if someone sees a facebook.com URL, they are likely to trust it without regard to the redirect URL itself.
To avoid being an open redirector, we generate a hash for each link shim URL that's user specific. Then, when the person loads the interstitial link shim page, we check that the hash is valid for her. If it is, we allow her to access the site requested - but if not, we show a warning page instead.
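One common way to build such a check is a keyed hash (HMAC) over the user's identity. The sketch below assumes that approach, and it also binds the hash to the target URL, which is an additional assumption. The secret, the function names, and the return values are all hypothetical, and the post does not say how the real hash is computed.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real deployment would load this securely.
SERVER_SECRET = b"example-secret-not-real"


def make_hash(user_id: str, url: str) -> str:
    """Hash that is only valid for this user and this target URL."""
    message = f"{user_id}|{url}".encode()
    return hmac.new(SERVER_SECRET, message, hashlib.sha256).hexdigest()


def handle_shim_request(current_user_id: str, url: str, supplied_hash: str) -> str:
    """Redirect only if the hash was generated for the current user."""
    expected = make_hash(current_user_id, url)
    # Constant-time comparison avoids leaking how much of the hash matched.
    if hmac.compare_digest(expected.encode(), supplied_hash.encode()):
        return "redirect"
    return "warning"
```

A link crafted by an attacker and sent to someone else carries a hash that does not verify for the recipient, so the recipient sees the warning page rather than being silently redirected.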
Additional Privacy Protections
To prevent external parties from identifying which pages a given person accessed on their site based on link shim hashes, we also randomize this hash parameter - so one person has many different valid hashes at any given time, and is likely to get a unique hash for each click they make. This way, we ensure that not only is it impossible to determine who they are, but the external site is unlikely to be able to determine if they're even the same person who clicked on another link an hour ago.
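One way to get many valid hashes per person is to mix a fresh random nonce into each one. The following is a minimal sketch under that assumption. The token layout ("nonce.mac"), the secret, and the function names are invented for illustration and are not taken from the real system.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret, as in any keyed-hash scheme.
SERVER_SECRET = b"example-secret-not-real"


def _mac(user_id: str, nonce: str) -> str:
    message = f"{user_id}|{nonce}".encode()
    return hmac.new(SERVER_SECRET, message, hashlib.sha256).hexdigest()


def make_token(user_id: str) -> str:
    """Issue a fresh token; each call yields a different valid value."""
    nonce = secrets.token_hex(8)  # 64 random bits, new for every link
    return f"{nonce}.{_mac(user_id, nonce)}"


def token_valid_for(user_id: str, token: str) -> bool:
    """Check a token against the current user without storing any state."""
    nonce, separator, mac = token.partition(".")
    if not separator:
        return False
    return hmac.compare_digest(_mac(user_id, nonce).encode(), mac.encode())
```

Two tokens issued to the same person share no visible structure, so a site that sees both cannot tell from the tokens alone that they belong to the same user.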
Hopefully you've never encountered the link shim - but if you ever do, this should give you an idea of what's going on.
Credit goes to Chris Putnam, Jordan Moncharmont, Clément Genzmer, Wanhong Xu, and countless others for first conceiving of the idea for the link shim, and bringing its advanced capabilities to fruition over the years.
Matt Jones, member of the Facebook Site Integrity Team, is nuking spam with a little help from his friends.