It’s All About the Metadata, Baby: Search Engine Optimization On The Intranet
Search engine optimization (SEO) is a hot topic these days. SEO consultants are paid well for their expertise in telling a company how to improve its web search engine ranking. A high rank often means more traffic, which means more business, which means more money.
However, SEO consultants focus primarily on the World Wide Web. After all, it’s only the public that needs to find information on the company’s website, right?
Wrong. Employees need to find information too, and that need is well established. Vivísimo and other vendors entered the enterprise search market years ago, and their growing client lists show that employees need to find data within the company’s intranet sites and storage networks through a unified, easy-to-use interface.
One thing SEO consultants often stress is the use of metadata on a page. Metadata is literally data describing data. A search engine uses it to present a description of the content to the searcher without having to analyze the content itself to extract information such as the author, date of last modification, subject, keywords or a snippet. This can reduce the time it takes to index a page or site. More important, good metadata can improve the relevance of results, because it hands crawlers and indexers exactly the descriptive fields they are built to read. Metadata is thus a key part of search engine optimization.
Web crawlers such as those run by Google and Yahoo are generally not customized to crawl a specific website or document format. These crawlers extract as much information as possible from a document’s metadata, e.g. the <meta> tags in HTML or the information embedded in PDF files. They probably cannot read documents in an obscure format or in a custom XML format developed in-house.
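As a rough illustration of what a generic crawler can read from an HTML page, the sketch below uses Python’s standard `html.parser` module to collect `<meta name="..." content="...">` pairs. It is a minimal sketch, not how Google, Yahoo or Velocity actually parse pages; the sample page and its values are made up.

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        content = attrs.get("content")
        if name and content is not None:
            # Normalize the field name so "Author" and "author" match.
            self.meta[name.lower()] = content


def extract_meta(html):
    parser = MetaTagExtractor()
    parser.feed(html)
    parser.close()
    return parser.meta


page = """<html><head>
<meta name="author" content="Jane Doe">
<meta name="keywords" content="intranet, search, metadata">
<meta name="description" content="Quarterly sales report">
</head><body>...</body></html>"""

print(extract_meta(page))
```

Everything this parser returns comes from the page header; a document with no `<meta>` tags yields an empty dictionary, which is exactly the gap the next paragraphs address.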
This is where enterprise search engines like the Vivísimo Velocity Search Platform excel: behind the corporate firewall. They can be configured to read a wide range of formats. An administrator can even customize the solution to pluck metadata from a dynamic area of a page, so that even pages that lack header metadata can have extended metadata displayed in a search result.
Corporate web designers can assist their administrators by using HTML class or id attributes to mark the key areas of a page where metadata or other information useful to a searcher resides. A well-structured, database-backed content management system (CMS) is also beneficial, primarily because it can be crawled in two ways: through its web front end or directly through its database. Velocity can crawl web sites, but it can crawl databases even faster.
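To make the class-attribute idea concrete, here is a minimal Python sketch that pulls field values out of elements carrying agreed-upon class names. The class names `doc-author`, `doc-updated` and `doc-subject` are hypothetical conventions a CMS template might adopt, not anything Velocity requires, and the parser assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser

# Hypothetical class names a CMS template might emit; not a Velocity convention.
FIELD_CLASSES = {"doc-author", "doc-updated", "doc-subject"}

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "wbr"}


class ClassFieldExtractor(HTMLParser):
    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.fields = {}
        self.stack = []  # one entry per open element: a wanted class name or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(next((c for c in classes if c in self.wanted), None))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Credit text to the innermost open element that carries a wanted class.
        for field in reversed(self.stack):
            if field:
                self.fields[field] = self.fields.get(field, "") + data
                break


def extract_fields(html, wanted=FIELD_CLASSES):
    parser = ClassFieldExtractor(wanted)
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left over from the markup.
    return {k: " ".join(v.split()) for k, v in parser.fields.items()}


page = """<div class="article">
  <h1>Travel policy</h1>
  <p>By <span class="doc-author">Jane Doe</span>,
     updated <span class="doc-updated">2008-03-14</span></p>
  <p class="doc-subject">Expense <em>reimbursement</em> rules</p>
</div>"""

print(extract_fields(page))
```

Text nested inside a marked element (the `<em>` above) is still credited to that element’s field, so designers can style the content freely as long as the wrapping class stays in place.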
A major hurdle for any search engine crawler is JavaScript links. Some CMS solutions, both commercial and homegrown, use JavaScript links for navigation. Velocity and other search solutions can handle JavaScript links, but doing so requires extensive configuration and testing. It is much easier and less time-consuming to use a CMS that produces standard anchor tags with href attributes.
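The difference between the two kinds of navigation shows up clearly in a simple link extractor. This hypothetical Python sketch (standard library only) collects `<a href>` targets the way a basic crawler might; links whose destination exists only inside JavaScript (`javascript:` URLs, or `onclick` handlers on an empty or `#` href) cannot be resolved without executing script, so it merely counts them. The URLs and the `openDoc` function are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Gathers links the way a simple, script-unaware crawler would."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.crawlable = []  # absolute URLs from ordinary <a href> links
        self.js_only = 0     # links whose target is hidden in JavaScript

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = (attrs.get("href") or "").strip()
        hidden_in_script = href.lower().startswith("javascript:") or (
            "onclick" in attrs and (not href or href.startswith("#"))
        )
        if hidden_in_script:
            # The destination exists only inside script code; without running
            # JavaScript the crawler cannot know where this link goes.
            self.js_only += 1
        elif href and not href.startswith("#"):
            # A plain href: resolve it against the page URL and queue it.
            self.crawlable.append(urljoin(self.base_url, href))


page = """
<a href="/policies/travel.html">Travel policy</a>
<a href="javascript:openDoc(42)">Expense form</a>
<a href="#" onclick="openDoc(43); return false;">Org chart</a>
<a href="reports/q1.pdf">Q1 report</a>
"""

collector = LinkCollector("http://intranet.example.com/docs/index.html")
collector.feed(page)
collector.close()
print(collector.crawlable)
print(collector.js_only)
```

Two of the four links are discovered for free; the other two would need the kind of extra configuration described above before a crawler could follow them.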
Another major hurdle for crawlers is Adobe Flash content. While Velocity can index Flash content, it is often at the mercy of the Flash content designer. Extensive use of Flash reduces the crawl-readiness of a site, but fortunately, the presence of metadata on the HTML page used to display the Flash content can offset this inherent precariousness.
Human-friendly URLs, short URLs that are easy to share aloud, and canonical URLs for content reachable at several different addresses are examples of important coding and design practices that any SEO consultant will encourage. Broader content policies can improve the relevance of intranet searches as well: provide metadata for all content, including HTML pages, PDFs, word-processor documents and spreadsheets, and, the most important optimization of all, provide excellent content.
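For the canonical URL case, a crawler that honors `<link rel="canonical" href="...">` can fold several addresses for the same page into a single index entry. The sketch below assumes that behavior; the intranet host and paths are invented, and whether a given enterprise crawler respects `rel="canonical"` depends on the product and its configuration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Records the first <link rel="canonical" href="..."> on a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if "canonical" in rels and attrs.get("href"):
            self.canonical = attrs["href"]


def index_key(fetched_url, html):
    """Return the URL under which this page would be indexed."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    if finder.canonical:
        # The canonical href may be relative; resolve it against the fetched URL.
        return urljoin(fetched_url, finder.canonical)
    return fetched_url


# The same benefits page, reachable at three different addresses.
head = '<head><link rel="canonical" href="/hr/benefits"></head>'
urls = [
    "http://intranet.example.com/hr/benefits",
    "http://intranet.example.com/index.php?page=benefits&dept=hr",
    "http://intranet.example.com/hr/benefits?print=1",
]
indexed = {index_key(u, head) for u in urls}
print(indexed)  # {'http://intranet.example.com/hr/benefits'}
```

Without the canonical link, each of the three addresses would be its own index key, and searchers could see the same page listed three times.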
As these examples show, SEO is not just for the public Internet. Applying general SEO practices to intranet sites matters just as much, yet it is often overlooked; the same heuristics apply behind the firewall.
[...] Vivísimo’s Search Done Right™ blog posted an article I wrote a while ago entitled It’s All About the Metadata, Baby: Search Engine Optimization On The Intranet. [...]