What’s Wrong with Google’s Enterprise Search Security? (Part 1)
I just stumbled onto a post from December on the Google enterprise search blog arguing that of the two primary methods for implementing document-level search security, ACL indexing (early binding) and search-time result-by-result checking (late binding), only the latter is truly secure. It sounds to me like a prime example of a vendor trying hard to make a virtue out of a serious product limitation (Google search appliances currently do not support the early-binding method):
“While we agree with Mark [reference to this article] on some of the benefits with using early-binding security filtering, there are certain limitations that make it impractical (if not impossible) to use for most deployments today”.
Contrary to what the Google spokesperson says in the quote above, there is actually broad consensus in the industry that early binding is in most cases the more practical of the two methods, and that a combination of both methods should be used for deployments with the most demanding requirements.
There are multiple reasons for this:
1. In many instances late binding, which requires a network request for every top result considered, is far too inefficient to be practical. When indexing emails or SharePoint team sites, most people have permission to access only a very small portion of the content. With the late-binding method, in order to return the results that the end user has permission to see (let’s say 10 results), the search application needs to consider and check many more results (possibly hundreds) before it finds 10 that the user is allowed to see. So, in this example, you would have hundreds of network requests for each query, adding tremendous load to the network infrastructure and creating very high latencies, in some cases up to minutes.
The search administrator is then confronted with a dilemma: either set a high timeout and allow searches to run for minutes, or set a low timeout and take the risk of returning partial and inconsistent results. This problem is compounded when using search results clustering or results de-duplication, as both techniques require a higher number of results to be retrieved.
2. The inefficiency described above actually makes the late-binding approach highly insecure. Because a query that matches many restricted documents takes noticeably longer to answer, a malicious user with a stopwatch can craft probing queries and infer that matching content exists just by checking the response time. Sensitive information can be obtained this way, for example by using queries like: “Firing John Doe”, “acquiring company X”, “chemical compound Y”, etc.
3. In addition, late binding often requires passing around the user credentials. In the standard implementation of this method, the user is required to pass their username and password to the search application, which in turn will use them to check each result. This is not only cumbersome (because of the need to re-enter credentials instead of leveraging integrated authentication) but also runs counter to the security policies of many IT departments.
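To make the cost in point 1 concrete, here is a minimal sketch of a late-binding result loop. Everything in it is an assumption for illustration: the function name `late_binding_search`, the `can_read` callback, and the toy corpus in which the user can read one matching document in 25. In a real deployment each call to `can_read` would be a network round trip to the source system.

```python
def late_binding_search(ranked_doc_ids, user, can_read, page_size=10):
    """Walk the ranked hits and check each one until a full page is readable.

    Returns the visible hits and the number of permission checks performed.
    """
    visible, checks = [], 0
    for doc_id in ranked_doc_ids:
        checks += 1  # stands in for one network request to the content source
        if can_read(user, doc_id):
            visible.append(doc_id)
            if len(visible) == page_size:
                break
    return visible, checks


# Hypothetical corpus: 1,000 matching documents, ranked 0..999.
ranked = list(range(1000))
# Assumption: the user may read only every 25th document.
readable = set(range(0, 1000, 25))

results, checks = late_binding_search(
    ranked, "alice", lambda user, doc_id: doc_id in readable
)
print(len(results), checks)  # prints "10 226"
```

In this toy setup, filling a single page of 10 results takes 226 permission checks. The count also depends on how many restricted documents match the query, which is the timing difference that point 2 describes.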
In contrast, early binding can provide very good security, especially when user groups are gathered at search time. A standard implementation takes the username and query at search time and uses a directory service such as Active Directory to identify which security groups the user belongs to. In this case there will be no latency in terms of user permissions (but still some latency in terms of document ACLs). User permissions are often the most important part of the authorization process. For example, if an employee is fired, their access will be revoked immediately. Changes in the document ACLs are rarely time-sensitive, because these documents were already accessible in the recent past under the old ACLs.
If these changes in document ACLs do happen to be critical, then the best approach is to combine the two security methods. Very few vendors (Vivisimo is one of them) support this most accurate solution. The advantage of using early binding in combination with late binding is that only a few documents will have to be thrown away (only those whose security has changed in the last X minutes or hours), which guarantees both a consistent response time and a truly secure deployment. The ultimate solution is actually to do early binding with a very low synchronization time (in the second or minute range).
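A minimal sketch of that flow, under stated assumptions: the index stores each document's allowed principals at crawl time, and a plain dictionary named `directory` stands in for a live group lookup against a directory service. The names `IndexedDoc`, `resolve_principals`, and `early_binding_search` are hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDoc:
    doc_id: str
    text: str
    allowed: frozenset  # users or groups captured when the document was indexed


def resolve_principals(user, directory):
    """Look up the user's groups at search time; unknown users get nothing."""
    if user not in directory:
        return frozenset()
    return frozenset({user}) | frozenset(directory[user])


def early_binding_search(index, query, user, directory):
    """Filter hits against the indexed ACLs using freshly resolved principals."""
    principals = resolve_principals(user, directory)
    return [
        doc.doc_id
        for doc in index
        if query.lower() in doc.text.lower() and doc.allowed & principals
    ]


index = [
    IndexedDoc("d1", "Acquisition plan", frozenset({"executives"})),
    IndexedDoc("d2", "Acquisition FAQ", frozenset({"all-staff"})),
]
directory = {"alice": {"all-staff", "executives"}, "bob": {"all-staff"}}

print(early_binding_search(index, "acquisition", "alice", directory))  # ['d1', 'd2']
print(early_binding_search(index, "acquisition", "bob", directory))    # ['d2']

# Removing a user from the directory takes effect on the very next query,
# without reindexing any document.
del directory["alice"]
print(early_binding_search(index, "acquisition", "alice", directory))  # []
```

The filter never contacts the content source at query time, so the response time does not depend on how many restricted documents match.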
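One way to picture the combination is the sketch below. It assumes the search engine keeps a set of documents whose ACLs changed since the last synchronization (`changed_since_sync`) and a `can_read_live` callback that asks the source system directly. Both names, and the function `hybrid_search`, are hypothetical illustrations rather than any vendor's implementation.

```python
def hybrid_search(candidates, user, principals, changed_since_sync, can_read_live):
    """Early-binding filter first, live re-check only for recently changed ACLs.

    candidates: ranked list of (doc_id, allowed_principals) pairs from the index.
    Returns the visible doc ids and the number of live checks performed.
    """
    visible, live_checks = [], 0
    for doc_id, allowed in candidates:
        if not (allowed & principals):
            continue  # rejected from the indexed ACL, no network request needed
        if doc_id in changed_since_sync:
            live_checks += 1  # only stale entries cost a round trip
            if not can_read_live(user, doc_id):
                continue  # access was revoked after the last sync
        visible.append(doc_id)
    return visible, live_checks


candidates = [
    ("a", {"hr"}),
    ("b", {"eng"}),
    ("c", {"hr"}),
    ("d", {"hr", "eng"}),
]
# Assumption: only document "c" changed since the last sync, and the live
# check reports that the user's access to it has been revoked.
visible, live_checks = hybrid_search(
    candidates,
    user="alice",
    principals={"hr"},
    changed_since_sync={"c"},
    can_read_live=lambda user, doc_id: False,
)
print(visible, live_checks)  # prints "['a', 'd'] 1"
```

A design note on this sketch: a document whose ACL was recently widened to include the user stays hidden until the next synchronization, which errs on the safe side.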
If you want to read more about security, you can download a technical white paper on it here. (Yes the marketing people make you register to download it, but unless you ask them to contact you, they won’t!)
Stay tuned for part two…