Phishing Protection vs Privacy in Firefox and Chrome

A decade ago, when anti-phishing mechanisms started to be added to web browsers, I remember disabling these features immediately, thinking sarcastically: "Great, now every site I visit will be sent to a third party for validation. Thank you, but no thank you!" I thought that the privacy trade-off was not worth the price and that I was better off simply being careful about the sites I visit and the links I click.

You could say that I was arrogant, but in those days phishing attacks were not very sophisticated and were fairly easy to detect. In particular, they were usually not targeted. I was living in France at the time, and since most phishing campaigns were aimed at Americans, it was obvious to me that the US Internal Revenue Service did not really owe me any money, and that I did not need to reset my Bank of America password since I did not have an account there!

However, things have changed since then. Phishing attacks have become cleverer, leveraging the troves of information stolen from Yahoo!, LinkedIn, Dropbox and many other widely used online services. Spear phishing attacks against high-value targets are becoming frequent. And I now live in the US, which puts me squarely in the scammers' target audience. Wonderful!

So I've decided to see what I could do about it. My first idea was to revisit the anti-phishing options in Firefox. How does it work? Here is what this Firefox support page has to say:

There are two times when Firefox will communicate with Mozilla’s partners while using Phishing and Malware Protection for sites. The first is during the regular updates to the lists of reported phishing and malware sites. No information about you or the sites you visit is communicated during list updates. The second is in the event that you encounter a reported phishing or malware site. Before blocking the site, Firefox will request a double-check to ensure that the reported site has not been removed from the list since your last update. This request does not include the address of the visited site, it only contains partial information derived from the address.

Using a locally downloaded list is a great start, but it leaves a few open questions. Surely Firefox cannot refresh the entire list of phishing and malware URLs regularly: there must be millions of entries in that list by now, and it is probably growing at an exponential rate. Is that why they need to "double-check" online, to query a more up-to-date or comprehensive list? But then are we back to square one, where the "partners" get the URLs of at least some of the sites you visit? And who are those partners anyway?

The first clue to the answer to these questions is at the end of Firefox's page: "The Google Privacy Policy explains how Google handles collected data." That seems to indicate that Google is somehow involved in this. And this is confirmed further down the page, where it says: "To request removal from the list of reported phishing sites, use this form provided by Google." Now we are getting somewhere. Firefox is using the same mechanism as Chrome, known as "Google Safe Browsing".

The service's API documentation explains how it works pretty well, but an even better summary can be found in this Chromium Blog post. I am going to try to summarize it even further. When anti-phishing is enabled, the browser downloads from Google a list of 32-bit hashes of normalized URLs for known phishing sites. 32 bits is not enough to prevent collisions with the URLs of legitimate sites, but it is enough for the list to be downloaded regularly without requiring a lot of bandwidth or storage. It is also enough to avoid the need for an online check for the vast majority of pages that you visit.
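To make that local step concrete, here is a minimal Python sketch of the idea: hash a normalized URL with SHA-256, keep only the first 4 bytes as the 32-bit prefix, and look it up in a locally stored set of prefixes. The normalization and the prefix set below are simplified, hypothetical stand-ins, not the actual Safe Browsing canonicalization rules or data.

```python
import hashlib
from urllib.parse import urlparse

def url_prefix(url: str) -> bytes:
    """Return a 4-byte (32-bit) SHA-256 prefix of a crudely normalized URL.

    This only lowercases the host and keeps the path; the real Safe Browsing
    canonicalization is far more involved and checks several host/path
    combinations per URL.
    """
    parts = urlparse(url)
    normalized = parts.netloc.lower() + (parts.path or "/")
    return hashlib.sha256(normalized.encode("utf-8")).digest()[:4]

# Hypothetical local list of 32-bit prefixes, as downloaded from the server.
local_prefixes = {url_prefix("http://phishing.example/login")}

# The check is purely local: no network traffic unless a prefix matches.
print(url_prefix("http://phishing.example/login") in local_prefixes)  # True
print(url_prefix("https://example.org/") in local_prefixes)           # False (almost certainly)
```

Truncating to 32 bits is exactly the trade-off described above: the list stays small enough to download and store regularly, at the cost of occasional false matches.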

However, when a visited page's 32-bit hash has a match in the list, your browser cannot be certain that the page is one of the URLs flagged by Google, because of the hash collision risk explained above. So the browser sends the 32-bit hash to the Safe Browsing server, which returns the 256-bit hashes of the bad URLs sharing that prefix. If the 256-bit hash of the URL that you were about to visit matches one of the 256-bit hashes received, the browser can pop up a warning confidently, since the risk of collision with 256 bits is dramatically reduced, not to say nonexistent.

Of course, you have given some information to Google, 32 bits of information to be precise, but what does that amount to? 32 bits represent about 4.3 billion possible values. It is hard to know for certain how many pages are in Google's index, but this site claims that there were around 30 trillion indexed pages in 2014 and that the number was almost doubling every two years back then. So now that we are starting 2017, that gives a rough estimate of more than 10,000 pages per 32-bit hash. Of course, not all these pages are equally likely to be the one that you are visiting, and Google could probably significantly reduce the number of candidates by using the rest of the information it knows about you (location, language, etc.). But my point is that they make a fairly good attempt to anonymize your private data in this instance.

In the end, it is up to you to decide whether you want to trust this browser feature or not. But given that the alternative is being exposed to phishing attacks, I personally chose to re-enable this feature to defend myself against a risk that I know exists, rather than to prevent a privacy violation that might exist.
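For the curious, here is a rough Python sketch of the two-step lookup described above. The server is mocked as a local function and the flagged URL is hypothetical; in the real protocol only the 32-bit prefix ever leaves the browser.

```python
import hashlib

def full_hash(normalized_url: str) -> bytes:
    """256-bit SHA-256 hash of an already normalized URL."""
    return hashlib.sha256(normalized_url.encode("utf-8")).digest()

# Hypothetical flagged URL; in reality Google holds the full 256-bit hashes.
FLAGGED = {full_hash("phishing.example/login")}
local_prefixes = {h[:4] for h in FLAGGED}   # what the browser downloads

def server_lookup(prefix: bytes) -> set:
    """Mock of the Safe Browsing server: given a 32-bit prefix, return the
    full 256-bit hashes of flagged URLs that share that prefix."""
    return {h for h in FLAGGED if h.startswith(prefix)}

def is_flagged(normalized_url: str) -> bool:
    h = full_hash(normalized_url)
    if h[:4] not in local_prefixes:
        return False                      # vast majority of pages: purely local
    return h in server_lookup(h[:4])      # prefix hit: confirm with full hashes

print(is_flagged("phishing.example/login"))  # True  -> show the warning
print(is_flagged("example.org/"))            # False -> no prefix match, no lookup

# Back-of-the-envelope anonymity estimate from the post: ~60 trillion indexed
# pages (30 trillion in 2014, roughly doubling every two years) spread over
# 2**32 possible prefixes.
print(60e12 / 2**32)  # ~14,000 pages per 32-bit prefix
```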

Short URL for this post: https://lepl.us/r