<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.3.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: What&#8217;s Wrong with Google&#8217;s Enterprise Search Security? (Part 1)</title>
	<link>http://searchdoneright.com/2007/02/whats-wrong-with-googles-enterprise-search-security-part-1/</link>
	<description>Enterprise Search Expertise, Brought To You By Vivísimo</description>
	<pubDate>Fri, 04 Jul 2008 12:50:32 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.3</generator>
		<item>
		<title>By: Search Done Right &#187; Blog Archive &#187; Enterprise Searching To Surpass Web Searching?</title>
		<link>http://searchdoneright.com/2007/02/whats-wrong-with-googles-enterprise-search-security-part-1/#comment-22526</link>
		<dc:creator>Search Done Right &#187; Blog Archive &#187; Enterprise Searching To Surpass Web Searching?</dc:creator>
		<pubDate>Tue, 19 Feb 2008 20:48:17 +0000</pubDate>
		<guid>http://searchdoneright.com/2007/02/whats-wrong-with-googles-enterprise-search-security-part-1/#comment-22526</guid>
		<description>[...] this is changing. When administrators can rely on error-free operations in terms of security, there is no reason, except for lack of budget, ambition, or imagination, to withhold the most [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] this is changing. When administrators can rely on error-free operations in terms of security, there is no reason, except for lack of budget, ambition, or imagination, to withhold the most [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Search Done Right &#187; Blog Archive &#187; How to Evaluate a Clustering Search Engine</title>
		<link>http://searchdoneright.com/2007/02/whats-wrong-with-googles-enterprise-search-security-part-1/#comment-60</link>
		<dc:creator>Search Done Right &#187; Blog Archive &#187; How to Evaluate a Clustering Search Engine</dc:creator>
		<pubDate>Tue, 13 Mar 2007 20:50:03 +0000</pubDate>
		<guid>http://searchdoneright.com/2007/02/whats-wrong-with-googles-enterprise-search-security-part-1/#comment-60</guid>
		<description>[...] clustered within an acceptable response time. If user authentication is an issue (note discussion here), then the response time should include the time for the search engine to verify that the user can [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] clustered within an acceptable response time. If user authentication is an issue (note discussion here), then the response time should include the time for the search engine to verify that the user can [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shad</title>
		<link>http://searchdoneright.com/2007/02/whats-wrong-with-googles-enterprise-search-security-part-1/#comment-10</link>
		<dc:creator>Shad</dc:creator>
		<pubDate>Tue, 13 Feb 2007 22:23:46 +0000</pubDate>
		<guid>http://searchdoneright.com/2007/02/whats-wrong-with-googles-enterprise-search-security-part-1/#comment-10</guid>
		<description>re: your timing attack on late binding.

It only tells the user if content matching the query is available.
This is only a problem if you have a case where very untrusted users
can query against a higher security search index or a search index
with multiple security levels.

It would be very hard to build up a picture of the content of a single
document using this technique.  You don't have an identifier to work
off of.  You cannot be sure that you are accessing the same document.

Examples where this would be useful anyway:
  You suspect the government of developing mind-reading dachshunds.
  If you searched for mind-reading-dachshunds and it took longer than
  a garbage query, then you would have a good case that they exist.

This timing attack would be quite interesting to try out.  The search
engine can skew results by actually padding out low result searches by
doing another dummy search for something more common.

To maintain a higher level of security, if you have low-security
users, they should not be allowed to query a higher-security search
index.

However....

If you have mixed security level content and are using a single index
for it, then you are already saying it's okay for the user to know
that content exists.  But a low security level user isn't allowed to
see the whole thing.  Content-for-pay would be a good example of this;
you have links describing pay-content, but you can't view it unless
you pay.

An interesting version of this would be where each result is actually
an AJAX call to fill in the details.  By using the actual content
management system's ability to check the user's credentials (AJAX is
client side) you prevent the search engine from having to have extra
logic to manage credentials (and yet another thing to audit).  You reveal that
a document exists, and even reveal its location.  Ideal for a
content-for-pay system.  You advertise without giving away the
information.</description>
		<content:encoded><![CDATA[<p>re: your timing attack on late binding.</p>
<p>It only tells the user if content matching the query is available.<br />
This is only a problem if you have a case where very untrusted users<br />
can query against a higher security search index or a search index<br />
with multiple security levels.</p>
<p>It would be very hard to build up a picture of the content of a single<br />
document using this technique.  You don&#8217;t have an identifier to work<br />
off of.  You cannot be sure that you are accessing the same document.</p>
<p>Examples where this would be useful anyway:<br />
  You suspect the government of developing mind-reading dachshunds.<br />
  If you searched for mind-reading-dachshunds and it took longer than<br />
  a garbage query, then you would have a good case that they exist.</p>
<p>This timing attack would be quite interesting to try out.  The search<br />
engine can skew results by actually padding out low result searches by<br />
doing another dummy search for something more common.</p>
<p>To maintain a higher level of security, if you have low-security<br />
users, they should not be allowed to query a higher-security search<br />
index.</p>
<p>However&#8230;.</p>
<p>If you have mixed security level content and are using a single index<br />
for it, then you are already saying it&#8217;s okay for the user to know<br />
that content exists.  But a low security level user isn&#8217;t allowed to<br />
see the whole thing.  Content-for-pay would be a good example of this;<br />
you have links describing pay-content, but you can&#8217;t view it unless<br />
you pay.</p>
<p>An interesting version of this would be where each result is actually<br />
an AJAX call to fill in the details.  By using the actual content<br />
management system&#8217;s ability to check the user&#8217;s credentials (AJAX is<br />
client side) you prevent the search engine from having to have extra<br />
logic to manage credentials (and yet another thing to audit).  You reveal that<br />
a document exists, and even reveal its location.  Ideal for a<br />
content-for-pay system.  You advertise without giving away the<br />
information.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jerome Pesenti</title>
		<link>http://searchdoneright.com/2007/02/whats-wrong-with-googles-enterprise-search-security-part-1/#comment-9</link>
		<dc:creator>Jerome Pesenti</dc:creator>
		<pubDate>Tue, 13 Feb 2007 19:10:35 +0000</pubDate>
		<guid>http://searchdoneright.com/2007/02/whats-wrong-with-googles-enterprise-search-security-part-1/#comment-9</guid>
		<description>1/ "Late binding" is checking if each search result can be accessed with certain credentials - so whatever cache you get is only valid for a given URL &lt;strong&gt;and&lt;/strong&gt; a given user which will be very limited. "Late binding" is often implemented by doing a "HEAD" request to the actual page passing the username &#38; password of the end-user, i.e., it doesn't involve any ACLs (that's actually one point of Google's post: it can be deployed without having to figure out the underlying ACLs).

2/ My post is a response to Google's post claiming that "early binding" is not secure (because of latencies in the ACL indexing). My point is that in general it's more secure than "late binding". I gather from your first point that we actually somewhat agree here that "caching the ACL" is often not a big deal.

I disagree with you with regard to the performance of "late binding" (which, again, is not really an ACL check). It can be (and often is) &lt;strong&gt;very&lt;/strong&gt; expensive. In the case of SharePoint &#38; Email search for example, it often requires doing hundreds of network requests &lt;strong&gt;per&lt;/strong&gt; query (see Mark Bennett's explanation in the &lt;a title="original post" href="http://www.ideaeng.com/pub/entsrch/v3n5/article01.html" rel="nofollow"&gt;original post&lt;/a&gt;).

3/ Re-entering user credentials is not a strict requirement if you have implemented SSO across all your collections, but it's very common for many security infrastructures (for example for people using Windows Integrated Authentication, see &lt;a href="http://groups.google.com/group/Google-Search-Appliance/browse_thread/thread/b9e65f77f341617c/c153f465222d6076" rel="nofollow"&gt;this post&lt;/a&gt; on the search appliance user group).</description>
		<content:encoded><![CDATA[<p>1/ &#8220;Late binding&#8221; is checking if each search result can be accessed with certain credentials - so whatever cache you get is only valid for a given URL <strong>and</strong> a given user which will be very limited. &#8220;Late binding&#8221; is often implemented by doing a &#8220;HEAD&#8221; request to the actual page passing the username &amp; password of the end-user, i.e., it doesn&#8217;t involve any ACLs (that&#8217;s actually one point of Google&#8217;s post: it can be deployed without having to figure out the underlying ACLs).</p>
<p>2/ My post is a response to Google&#8217;s post claiming that &#8220;early binding&#8221; is not secure (because of latencies in the ACL indexing). My point is that in general it&#8217;s more secure than &#8220;late binding&#8221;. I gather from your first point that we actually somewhat agree here that &#8220;caching the ACL&#8221; is often not a big deal.</p>
<p>I disagree with you with regard to the performance of &#8220;late binding&#8221; (which, again, is not really an ACL check). It can be (and often is) <strong>very</strong> expensive. In the case of SharePoint &amp; Email search for example, it often requires doing hundreds of network requests <strong>per</strong> query (see Mark Bennett&#8217;s explanation in the <a title="original post" href="http://www.ideaeng.com/pub/entsrch/v3n5/article01.html" rel="nofollow">original post</a>).</p>
<p>3/ Re-entering user credentials is not a strict requirement if you have implemented SSO across all your collections, but it&#8217;s very common for many security infrastructures (for example for people using Windows Integrated Authentication, see <a href="http://groups.google.com/group/Google-Search-Appliance/browse_thread/thread/b9e65f77f341617c/c153f465222d6076" rel="nofollow">this post</a> on the search appliance user group).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jan</title>
		<link>http://searchdoneright.com/2007/02/whats-wrong-with-googles-enterprise-search-security-part-1/#comment-6</link>
		<dc:creator>Jan</dc:creator>
		<pubDate>Sun, 11 Feb 2007 21:59:35 +0000</pubDate>
		<guid>http://searchdoneright.com/2007/02/whats-wrong-with-googles-enterprise-search-security-part-1/#comment-6</guid>
		<description>Hogwash.

1) The answer to performance issues for late binding is ACL caching - and no, it's different from early binding.  With early binding I have no choice as to the validity period of the ACL info - with ACL caching I have ultimate control and can determine on a per-content-set basis how sensitive the ACLs should be.  For HR info I may well require a per-query check; on my published white papers I may be happy with the ACLs sticking around for days at a time.

2) Highly insecure???  Like anything, I suppose late binding could be implemented in a way that leaks information, but come on - most companies struggle with getting the ACLs right on their most sensitive content, never mind this.  Besides, the ACL check overhead for most systems is very low - I cannot imagine a user (with a stopwatch no less!) gaining any meaningful information this way.  There are way too many variables here - this point is just scaremongering.

3) I'm not sure where you get the "need to re-enter user credentials" from.  Any reasonable implementation will rely on existing user credentials, be it from the portal or from an SSO framework - since you must have a shared authorisation credentials system to make sense of the ACLs in the first place, why would you not use it??</description>
		<content:encoded><![CDATA[<p>Hogwash.</p>
<p>1) The answer to performance issues for late binding is ACL caching - and no, it&#8217;s different from early binding.  With early binding I have no choice as to the validity period of the ACL info - with ACL caching I have ultimate control and can determine on a per-content-set basis how sensitive the ACLs should be.  For HR info I may well require a per-query check; on my published white papers I may be happy with the ACLs sticking around for days at a time.</p>
<p>2) Highly insecure???  Like anything, I suppose late binding could be implemented in a way that leaks information, but come on - most companies struggle with getting the ACLs right on their most sensitive content, never mind this.  Besides, the ACL check overhead for most systems is very low - I cannot imagine a user (with a stopwatch no less!) gaining any meaningful information this way.  There are way too many variables here - this point is just scaremongering.</p>
<p>3) I&#8217;m not sure where you get the &#8220;need to re-enter user credentials&#8221; from.  Any reasonable implementation will rely on existing user credentials, be it from the portal or from an SSO framework - since you must have a shared authorisation credentials system to make sense of the ACLs in the first place, why would you not use it??</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: &#187; Posted on CMS Watch - Blogosphere responds to Google&#8217;s appliance upgrade - My Webmaster News Blog</title>
		<link>http://searchdoneright.com/2007/02/whats-wrong-with-googles-enterprise-search-security-part-1/#comment-5</link>
		<dc:creator>&#187; Posted on CMS Watch - Blogosphere responds to Google&#8217;s appliance upgrade - My Webmaster News Blog</dc:creator>
		<pubDate>Fri, 09 Feb 2007 19:17:01 +0000</pubDate>
		<guid>http://searchdoneright.com/2007/02/whats-wrong-with-googles-enterprise-search-security-part-1/#comment-5</guid>
		<description>[...] A competitor asks whether GSA&#8217;s approach to document security remains too trivial (all caveats about casting stones apply)&#8230; [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] A competitor asks whether GSA&#8217;s approach to document security remains too trivial (all caveats about casting stones apply)&#8230; [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jerome Pesenti</title>
		<link>http://searchdoneright.com/2007/02/whats-wrong-with-googles-enterprise-search-security-part-1/#comment-4</link>
		<dc:creator>Jerome Pesenti</dc:creator>
		<pubDate>Fri, 09 Feb 2007 16:55:10 +0000</pubDate>
		<guid>http://searchdoneright.com/2007/02/whats-wrong-with-googles-enterprise-search-security-part-1/#comment-4</guid>
		<description>This is a good point that I am going to address in more detail in Part #2. The short answer is that for the most common repositories, the search vendor should be able to offer connectors handling the security framework(s) properly. Vivisimo Velocity for example provides out-of-the-box connectors to Unix &#38; Windows file systems, Lotus Notes, SharePoint, Documentum, Email servers, Email archives, etc. For each of these systems, the security framework is known and documented and our software takes care of collecting and mapping out the ACLs properly. The fact that they use different frameworks is not an issue. The only part left to the administrator is to figure out which user IDs apply (for example if the Lotus Notes ID is different from the Windows ID), in which case an extra call to a directory service might be required, but that's also true of "late binding" anyway.

For unknown or poorly documented repositories (likely to be crawled rather than connected to through an API), "late binding" might be the only practical solution, but "early binding" should be usable without difficulty with all others (often containing the majority of the enterprise content).</description>
		<content:encoded><![CDATA[<p>This is a good point that I am going to address in more detail in Part #2. The short answer is that for the most common repositories, the search vendor should be able to offer connectors handling the security framework(s) properly. Vivisimo Velocity for example provides out-of-the-box connectors to Unix &amp; Windows file systems, Lotus Notes, SharePoint, Documentum, Email servers, Email archives, etc. For each of these systems, the security framework is known and documented and our software takes care of collecting and mapping out the ACLs properly. The fact that they use different frameworks is not an issue. The only part left to the administrator is to figure out which user IDs apply (for example if the Lotus Notes ID is different from the Windows ID), in which case an extra call to a directory service might be required, but that&#8217;s also true of &#8220;late binding&#8221; anyway.</p>
<p>For unknown or poorly documented repositories (likely to be crawled rather than connected to through an API), &#8220;late binding&#8221; might be the only practical solution, but &#8220;early binding&#8221; should be usable without difficulty with all others (often containing the majority of the enterprise content).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: googfan</title>
		<link>http://searchdoneright.com/2007/02/whats-wrong-with-googles-enterprise-search-security-part-1/#comment-3</link>
		<dc:creator>googfan</dc:creator>
		<pubDate>Thu, 08 Feb 2007 16:22:02 +0000</pubDate>
		<guid>http://searchdoneright.com/2007/02/whats-wrong-with-googles-enterprise-search-security-part-1/#comment-3</guid>
		<description>What you failed to cover was how someone would actually deploy an early binding method.  Early binding puts a massive deployment effort on the part of the consultant or IT department deploying the search system. In our search deployment and content management operation, we often find companies who strive to implement early binding-based deployments, but after weeks of "business analysis" and technical prototyping realize that it's just not technically practical in their heterogeneous environment.  You see, the real issue is that most (if not all) enterprises don't have one nice ACL store that has the access information for all of their various content systems.  If they did, this problem would be a no-brainer.  The work of some security vendors like CA is trending toward this, but they'll be the first to admit that adoption is very slow and rework is high in order to implement centralized or homogeneous distributed policy stores.</description>
		<content:encoded><![CDATA[<p>What you failed to cover was how someone would actually deploy an early binding method.  Early binding puts a massive deployment effort on the part of the consultant or IT department deploying the search system. In our search deployment and content management operation, we often find companies who strive to implement early binding-based deployments, but after weeks of &#8220;business analysis&#8221; and technical prototyping realize that it&#8217;s just not technically practical in their heterogeneous environment.  You see, the real issue is that most (if not all) enterprises don&#8217;t have one nice ACL store that has the access information for all of their various content systems.  If they did, this problem would be a no-brainer.  The work of some security vendors like CA is trending toward this, but they&#8217;ll be the first to admit that adoption is very slow and rework is high in order to implement centralized or homogeneous distributed policy stores.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
