Thursday, February 23, 2012

WSS 3.0 - Search Engine Errors

So I was setting up a SharePoint (WSS 3.0) site for a customer, and started getting the following errors in the event log:




Log Name:      Application
Source:        Windows SharePoint Services 3 Search
Date:          2/23/2012 10:48:05 AM
Event ID:      2436
Task Category: Gatherer
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      xxxx.xxx.local
Description:
The start address cannot be crawled.


Context: Application 'Search index file on the search server', Catalog 'Search'


Details:
The crawler could not communicate with the server. Check that the server is available and that the firewall access is configured correctly.   (0x80041200)


AND



Log Name:      Application
Source:        Windows SharePoint Services 3 Search
Date:          2/23/2012 10:48:05 AM
Event ID:      2424
Task Category: Gatherer
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      xxx-xxx.cds.local
Description:
The update cannot be started because the content sources cannot be accessed. Fix the errors and try the update again.


Context: Application 'Search', Catalog 'index file on the search server Search'





I looked around and thought this was related to DB permissions. Plenty of Google results point that way, but I've done this often enough to know how to set up the permissions on service accounts.

What it turned out to be (and I could not find anyone else who had run into the same thing, which is why I'm writing this post) is that the site was unavailable via HTTP. I had locked the site down to allow only HTTPS, and you can see in the above error messages that the protocol being used is STS3, not STS3S.



If you review the following link, you'll see something like this:

The STS3 Protocol
The STS3 protocol is used for crawling SharePoint content without having to have URL links that lead to every possible content item in the web site. This protocol is used to index WSS 3.0 server farms (which MOSS 2007 is layered on top of). The STS3 protocol achieves this by using the Site Data Web service to determine all the content in your SharePoint site.
Some examples:
sts3://www.yourdomain.com/
sts3://www.yourdomain.com/teamsites/
sts3://portal.yourdomain.com/
The protocol handlers (that ship out-of-the-box with MOSS 2007) translate the content source URL from sts3:// into a web service call. sts3://www.yourdomain.com is actually crawled on the web front end by using the web service http://mysite.yourdomain.com/_vti_bin/sitedata.asmx (the STS3 SOAP call is made using HTTP protocol on TCP Port 56737).
Note that if you are using SSL on your Site host headers, and you do not expose the pages through non-SSL URLs, the STS3 protocol should not be used, and you should change your content source to use the STS3S protocol instead.
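
To make the quoted explanation a bit more concrete, here is a rough Python sketch of the translation it describes. This is not SharePoint's actual protocol handler. The function name sitedata_url and the SCHEME_MAP table are made up for illustration, the sts3s-to-HTTPS mapping is my reading of the note above, and the sketch ignores the port and host name details the excerpt mentions.

```python
from urllib.parse import urlsplit

# Assumed mapping from crawl scheme to transport (illustrative only):
# sts3 goes over plain HTTP, sts3s over HTTPS.
SCHEME_MAP = {"sts3": "http", "sts3s": "https"}


def sitedata_url(start_address: str) -> str:
    """Return the Site Data web service URL a crawl start address would hit.

    Hypothetical helper: it only rewrites the scheme and appends the
    _vti_bin/sitedata.asmx path named in the quoted text.
    """
    parts = urlsplit(start_address)
    scheme = parts.scheme.lower()
    if scheme not in SCHEME_MAP:
        raise ValueError(f"not an sts3/sts3s start address: {start_address!r}")
    # Drop a trailing slash so the service path is not doubled up.
    path = parts.path.rstrip("/")
    return f"{SCHEME_MAP[scheme]}://{parts.netloc}{path}/_vti_bin/sitedata.asmx"
```

With this, sitedata_url("sts3://www.yourdomain.com/") gives a plain http:// URL, which is exactly the request that fails when the site only answers on HTTPS.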



So yeah, of course it's trying to access the site via non-SSL, and is failing.  Not user permissions related AT ALL.


The fix for this was to go into Central Administration > Operations > Alternate Access Mappings and look at my URLs.  Yep, my default zone was http://xxx.local.  I changed it to https://xxx.local.  (I also edited my HOSTS file to point this website's host name at its internal IP, since it would normally resolve to the public IP, which is unroutable from the server itself.)
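
If you want to sanity-check the same two things from the SharePoint server itself (what the host name resolves to, and which ports actually answer), something like the following Python sketch would do it. The helper names resolves_to and port_open are my own, and this is just a quick connectivity probe, not anything SharePoint ships.

```python
import socket


def resolves_to(hostname: str, expected_ip: str) -> bool:
    """True if hostname resolves (HOSTS file included) to expected_ip."""
    try:
        return socket.gethostbyname(hostname) == expected_ip
    except socket.gaierror:
        # Name could not be resolved at all.
        return False


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unroutable, or unresolvable.
        return False
```

On a site locked down like mine, I would expect port_open("xxx.local", 443) to come back True and port_open("xxx.local", 80) to come back False, with resolves_to("xxx.local", ...) matching whatever internal IP you put in the HOSTS file.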


Voila, search errors went away and all is well in the land.  I hope this helps someone else.