Since the Snapchat exploits, we have been getting questions from concerned people wondering what we are doing to protect our system from this kind of data leakage.
In the early days of instant messaging, users found each other through word of mouth or other out-of-band communication. Instead of searching for your friends, you asked them for their username or, in some cases, a long unique identifier (yes, ICQ, we are talking about you), which you then had to type into your IM client in order to talk with them. Tedious as it is, this system is obviously still in use by most IMs and by other internet services like email and web pages (remember when we didn’t have search engines?).
You have likely already done the manual labor of finding your friends at least once, by using social networks like Facebook or, in the case of mobile phones, by adding them to the phone’s address book. Let’s be honest here: today most of us don’t have the patience to do all that work over and over again each time we add a new social application. If we can’t get it working within a few minutes, we move on to the next thing.
That is exactly why contact discovery is so common and quite necessary.
The basic principle of contact discovery is that an application extracts identifiers (such as phone numbers and email addresses) from a contact list and uploads them to a server, which matches them against its list of known users. Implementations differ in how they upload the data: some send everything in plain text, while others use hashes or, better yet, hashes sent over an encrypted connection. For every matching user, the server then replies with an identifier that the application can use to contact that particular user. Depending on the implementation, this reply is a potential problem.
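A minimal sketch of the hashed upload-and-match flow described above might look like the following. The normalization rules, the choice of SHA-256 and all function names are illustrative assumptions, not a description of any particular service, and the encrypted connection (for example TLS) is left out entirely.

```python
import hashlib

def normalize(identifier: str) -> str:
    # Hypothetical normalization so that "Foo@Bar" and "foo@bar" hash the same way:
    # emails are lowercased, phone numbers are reduced to their digits.
    identifier = identifier.strip()
    if "@" in identifier:
        return identifier.lower()
    return "".join(ch for ch in identifier if ch.isdigit())

def hash_identifier(identifier: str) -> str:
    # SHA-256 is an example choice; it obfuscates the identifier but does not make it secret.
    return hashlib.sha256(normalize(identifier).encode("utf-8")).hexdigest()

def prepare_upload(contacts: list[str]) -> list[str]:
    # Client side: hash every identifier from the contact list before uploading.
    return [hash_identifier(c) for c in contacts]

def match_contacts(uploaded: list[str], known: dict[str, str]) -> dict[str, str]:
    # Server side: `known` maps hashed identifiers of registered users to their user IDs.
    # Only uploaded hashes that match a registered user get a reply.
    return {h: known[h] for h in uploaded if h in known}

known_users = {hash_identifier("foo@bar"): "abcd"}
upload = prepare_upload(["Foo@Bar", "123-456"])
replies = match_contacts(upload, known_users)
print(list(replies.values()))  # ['abcd']
```

Note that the server still replies with the user ID itself, which is exactly the kind of response the rest of this post is concerned with.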
When building user search, the easiest implementation looks something like this:
- The client sends the phone number 123456 to the server, which matches it to user abcd and returns abcd to the client.
- The client then sends the email address foo@bar, which again matches user abcd, so the server responds with abcd.
- The client now knows that abcd = 123456 and foo@bar.
A malicious client could repeatedly and systematically ask for new matches until it had assembled a huge database of users and their information, which is what happened to Snapchat.
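To make the attack concrete, here is a toy sketch of a naive lookup that returns the same user ID for every identifier belonging to a user, and of a client that simply walks through candidate identifiers. The table contents and function names are made up for illustration; a real attack would iterate over far larger ranges of phone numbers.

```python
# Hypothetical server-side table: every identifier of a user maps to the same user ID.
USERS = {
    "123456": "abcd",
    "foo@bar": "abcd",
    "654321": "efgh",
}

def naive_lookup(identifier: str):
    # Returns the stable user ID, or None when there is no match.
    return USERS.get(identifier)

def enumerate_users(candidates: list[str]) -> dict[str, list[str]]:
    # A malicious client queries every candidate and groups the hits by user ID,
    # which links all identifiers of the same person together.
    database: dict[str, list[str]] = {}
    for identifier in candidates:
        user_id = naive_lookup(identifier)
        if user_id is not None:
            database.setdefault(user_id, []).append(identifier)
    return database

# Systematically try a range of phone numbers plus a few guessed identifiers.
candidates = [str(n) for n in range(123450, 123460)] + ["654321", "foo@bar", "nobody@example.com"]
leaked = enumerate_users(candidates)
print(leaked)  # {'abcd': ['123456', 'foo@bar'], 'efgh': ['654321']}
```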
A small change to the server’s response avoids this problem completely: if the server returns a unique response for each identifier, the client cannot know which user a given identifier belongs to. This solution wouldn’t work for a service with public profiles, like Facebook, but it works quite well for an instant messenger, where a client only needs to be able to send a message to a given address.
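One way to sketch the per-identifier response is a server that hands out a separate opaque address for each matched identifier and resolves that address back to the user only when a message arrives. The class, its methods and the use of random tokens are hypothetical choices for illustration, not the actual implementation behind this post.

```python
import secrets
from typing import Optional

class AddressBook:
    """Hypothetical server that returns a separate opaque address for every identifier."""

    def __init__(self, users: dict[str, str]):
        self._users = users                       # identifier -> internal user ID, never sent to clients
        self._address_for: dict[str, str] = {}    # identifier -> opaque address
        self._user_for: dict[str, str] = {}       # opaque address -> internal user ID, for routing
        self.inboxes: dict[str, list[str]] = {}   # internal user ID -> delivered messages (server side)

    def lookup(self, identifier: str) -> Optional[str]:
        user_id = self._users.get(identifier)
        if user_id is None:
            return None
        if identifier not in self._address_for:
            # A fresh random address per identifier, remembered so repeat lookups stay stable.
            address = secrets.token_hex(16)
            self._address_for[identifier] = address
            self._user_for[address] = user_id
        return self._address_for[identifier]

    def send(self, address: str, message: str) -> None:
        # Clients only ever address opaque addresses; the server resolves the user.
        # Raises KeyError for an address it never handed out.
        user_id = self._user_for[address]
        self.inboxes.setdefault(user_id, []).append(message)

server = AddressBook({"123456": "abcd", "foo@bar": "abcd"})
a = server.lookup("123456")
b = server.lookup("foo@bar")
print(a == b)  # False: nothing in the replies links the two identifiers
server.send(a, "hi")
server.send(b, "hello")
print(server.inboxes["abcd"])  # ['hi', 'hello']
```

Random addresses carry no information about the user; they do have to be stored, but the server needs the reverse mapping for routing messages anyway.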
Another potential problem lies in the data that the client sends to the server during contact discovery. Even if the client hashes (obfuscates) the data, the server still gets access to information that some people might not want it to have. Although we couldn’t connect an email address to a specific user unless that user had uploaded their email, we could build a huge database of email addresses, which in itself is valuable to certain companies (read: spammers). In a perfect world we could do contact discovery without sharing any information with the server, but unfortunately we don’t know of any practical way to do this.
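To illustrate why hashing only obfuscates the uploaded data, here is a minimal sketch, using made-up addresses and plain SHA-256 as an assumed hash, of how whoever holds the uploaded hashes can test candidate addresses against them. For phone numbers the space of candidates is small enough that trying every number is often feasible.

```python
import hashlib

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# What the server receives: hashes of addresses from some client's contact list (hypothetical data).
uploaded = {sha256_hex("alice@example.com"), sha256_hex("bob@example.org")}

# Candidate addresses the server operator, or anyone holding the uploads, can test.
candidates = ["alice@example.com", "carol@example.net", "bob@example.org"]

# Hash each guess and keep the ones that appear in the upload.
recovered = [c for c in candidates if sha256_hex(c) in uploaded]
print(recovered)  # ['alice@example.com', 'bob@example.org']
```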