Search Engine Optimisation

A brief article {18/04/04}

Introduction

Search engines are the primary tools Internet users rely on to find sites and pages that match what they want to view on the World Wide Web. In the early days there were directories (www.yahoo.com being the best-known example), which searched a hand-built database of sites. These portals and directories have lost ground over the last few years, although some are still very important (see the later passage on dmoz). They have largely been replaced by true search engines, which use software to add constantly to a database of sites and pages. The best known and most visited of these engines is www.google.com. Over the last few years Google has become perhaps the most important URL on the entire Internet; for millions of net users it is the first port of call in any browsing session.

Whichever kind of site a user visits, the method is roughly the same: the front page contains a search text field. The user enters a word, a phrase or several words that they think relevant, and the site uses its database and some form of text-matching algorithm to return a result set of URLs that may match the query. The results may number in the hundreds, and the first few positions are the ones site owners and developers most want to hold. Any Internet business that wishes to make money from online sales needs to achieve high search engine rankings. This being the current state of play on the net, the question every web designer and web developer must ask is how to achieve good rankings with the search engines and web directories.

How do the engines and directories add sites to their databases?

In the case of directories this is a human-driven process. An employee or a volunteer checks links and adds URLs that fulfil certain criteria to the database. The best example of this approach is www.dmoz.org, "the Open Directory Project". This is a directory that attempts to follow the "Open Source" model of development, with keen volunteers all contributing to a whole. Other examples include www.yahoo.com, www.search.looksmart.com, www.bbc.co.uk and www.hotsheet.com. In the case of search engines proper, software does the job of looking for URLs and sites. A specially written "web crawler" or "spider" checks sites and links automatically, adding and indexing the content it finds to the database. Examples of search engines are www.google.com, www.msn.com (powered by Inktomi), www.altavista.com and www.ask.com. What both types have in common is that they will often have a "submit a site" link where people can add sites for indexing and, more often than not, a paid submission service that promises instant addition to the list and an automatically good ranking in search results. Both kinds also gain new URLs through submissions and through new links on already indexed sites.
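The link-following step of a spider can be sketched roughly as follows. This is only an illustration using Python's standard library, not how any particular engine works; the names LinkCollector and extract_links and the example.com URLs are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the target of every <a href="..."> in one HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs a crawler could queue after reading this page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical page content and address, for illustration only.
page = ('<p><a href="/shoes.html">Shoes</a> '
        '<a href="http://www.example.org/">Elsewhere</a></p>')
found = extract_links(page, "http://www.example.com/index.html")
print(found)
```

A real crawler would fetch each of the URLs found, index the text and repeat the process, which is why a new link on an already indexed site is enough to get a page discovered.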

What are the advantages and disadvantages of Directories over Engines?

Both have plus and minus points. The Internet is vast (3.6 billion websites according to the BBC) and only an automated process can hope to keep up with such a huge and rapidly growing pool of information. This is the engine's strength: the web crawler never sleeps and its database can be as up to date as is possible. On the other hand the web crawler has no criteria by which to judge taste or usefulness, so any junk it encounters will be indexed as easily as valuable content. This is the directory's strength: its human data entry people check each site against preconceived criteria. The directory will never keep up with the Internet's expansion, but in theory it will contain a far higher ratio of "quality" sites. That is great, unless the site you are looking for didn't meet some obscure criterion.

Getting added to Engines and Directories

The first step with any new site is the job of going from portal to portal, manually adding the site's URL to each "to be indexed" list (there are paid services that do this for you, but they are a subject for another time). The first port of call must be dmoz, as all the major search engines index dmoz in their turn, including Google (in fact Google's "new" directory is just dmoz under the Google logo). From there the site must be submitted to all the major engines and directories. Particular attention must be paid to Google (its importance having already been stated) and Inktomi, as so many search engines and portals use these services as their own (for instance, MSN's search is powered by Inktomi).

How do the various services rank the URLs they index?

The basis of ranking within the various services' databases is content, and in particular text content. Web crawlers are similar in facility to screen readers, as they can only "see" ASCII text. Jeffrey Zeldman describes Google as "the blind billionaire": it brings traffic and therefore sales, but it can't see dynamic content or even images. Because web crawlers work like screen readers, it is also worth noting that clean, semantic HTML/XHTML in a page will help spiders parse it correctly. Counts of particular words and phrases are all part of how a page is parsed by a search engine or portal. Other factors are more controversial. Google's actual ranking algorithms are a hot topic of debate in the web development community. It is believed that Google takes into account how many other sites link to you, but exactly what is important to Google is a huge area of research and debate. Google's earliest ranking algorithms are actually available on the net, but it's a safe bet that if they have been made available then Google now uses something completely different. A useful tool for a web developer who wants to understand Google's ranking methods better is the "Google Toolbar", a plug-in for IE that can be set up to show Google's rating of the page currently being viewed.
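To make the "text only" point concrete, here is a small sketch of how a page might look to a spider that counts words. It is an illustration only, assuming nothing about any real engine's parser; the names VisibleText and word_counts and the sample markup are invented for the example.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Keeps the text of a page and ignores scripts, styles and images."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Images contribute no data at all; script and style text is skipped.
        if not self._skip_depth:
            self.chunks.append(data)


def word_counts(html):
    """Count how often each word appears in the text a spider can read."""
    parser = VisibleText()
    parser.feed(html)
    words = re.findall(r"[a-z0-9']+", " ".join(parser.chunks).lower())
    return Counter(words)


# Hypothetical page fragment, for illustration only.
sample = ('<h1>Leather shoes</h1><img src="shoes.jpg">'
          '<script>var shoes = 1;</script>'
          '<p>Hand stitched leather shoes.</p>')
counts = word_counts(sample)
print(counts.most_common(3))
```

The image and the script add nothing to the counts, which is the practical meaning of "the blind billionaire".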

How can better rankings be achieved?

Correct Keywords

To get listed correctly in the search engines, each page of your site that you want listed needs to be optimised to the best of your ability. Every page should, if possible, include your keywords in its meta tags, its image "alt" attributes and the actual copy. Since the keywords that you decide to target will be used throughout the optimisation process, choosing the right ones is essential, and making them as specific as possible is vital. "Shoes" is not a good keyword, but "Italian Leather hand stitched shoes" is. A huge amount of work needs to be done to ensure that good, comprehensive keywords are in place.

Title Tags

The title tag is a very important factor to consider when optimising a web page for the search engines. This is because most engines and directories place a high level of importance on keywords found in title tags. The title tag is also what the search engines usually use as the title of the listing in the search results. Including one or two of the most important keyword phrases in the title tag will help a site rank higher.
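A quick way to check a page against this advice is to read its title and see which target phrases it contains. The sketch below uses Python's standard library; TitleReader, phrases_in_title and the sample page are hypothetical names and data chosen for the example.

```python
from html.parser import HTMLParser


class TitleReader(HTMLParser):
    """Reads the text inside the <title> element of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def phrases_in_title(html, phrases):
    """Return the target phrases that appear in the page title (case-insensitive)."""
    reader = TitleReader()
    reader.feed(html)
    title = reader.title.lower()
    return [phrase for phrase in phrases if phrase.lower() in title]


# Hypothetical page and keyword phrases, for illustration only.
page = ("<html><head><title>Italian leather hand stitched shoes | Example Shop"
        "</title></head><body></body></html>")
matched = phrases_in_title(page, ["hand stitched shoes", "leather boots"])
print(matched)
```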

Optimising Page Copy

The copy on web pages is also very important for achieving better search engine listings. This is where the modern search engine does its real ranking. Meta tags are becoming obsolete as search engines concentrate on content; after all, I can fill my meta tags with keywords that have little to do with my actual content. For best results each page should ideally have at least 200 words of copy on it. There are some cases where this much text is difficult to fit on a page, but the search engines really like it, so copy should be increased wherever possible. This text should include the site's most important keyword phrases, but should remain logical and readable. Effort must be made to ensure that these keywords and phrases match those used elsewhere on the page (i.e. in meta tags, "alt" attributes and title tags).
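The two checks described here (enough copy, and the key phrases present in it) are easy to automate. The following is a minimal sketch that assumes the page copy has already been extracted as plain text; copy_report and the sample text are hypothetical, and the 200-word threshold is simply the figure given above.

```python
import re


def copy_report(copy_text, phrases, min_words=200):
    """Summarise word count and missing keyword phrases for a page's copy."""
    words = re.findall(r"[A-Za-z0-9']+", copy_text)
    lowered = copy_text.lower()
    return {
        "word_count": len(words),
        "long_enough": len(words) >= min_words,
        # Phrases are matched case-insensitively as plain substrings.
        "missing_phrases": [p for p in phrases if p.lower() not in lowered],
    }


# Hypothetical copy: a five-word sentence repeated 50 times.
sample_copy = "Hand stitched Italian leather shoes. " * 50
report = copy_report(sample_copy, ["italian leather", "suede boots"])
print(report)
```

A check like this says nothing about whether the copy is readable, which still has to be judged by a person.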

Optimising Meta Tags

Meta tags were originally created to give search engines important information about your page that they might otherwise have had difficulty determining, for example related keywords or a description of the page itself. Many people believe that good meta tags are all that is needed to achieve good listings in the search engines, which is entirely incorrect. While meta tags are usually part of a well-optimised page, they are not the be-all and end-all of optimising your pages. In the early days of the web people were able to get great listings by optimising just their meta tags, but the increasing competition for good search engine listings eventually led many people to spam the search engines with keyword-stuffed meta tags. The result is that the engines have changed what they look at when they rank a web page. The keywords meta tag should not exceed 1024 characters including spaces, and the description meta tag should not exceed 250 characters including spaces.
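The length limits quoted above can be checked mechanically. This is a small sketch using Python's standard library; MetaReader, oversized_meta and the sample markup are invented for the example, and the limits are the ones given in this article rather than any official figures.

```python
from html.parser import HTMLParser

# Character limits as stated in the article (including spaces).
LIMITS = {"keywords": 1024, "description": 250}


class MetaReader(HTMLParser):
    """Collects the content of the keywords and description meta tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in LIMITS:
            self.meta[name] = attributes.get("content") or ""


def oversized_meta(html):
    """Return {tag name: actual length} for each meta tag over its limit."""
    reader = MetaReader()
    reader.feed(html)
    return {name: len(content)
            for name, content in reader.meta.items()
            if len(content) > LIMITS[name]}


# Hypothetical page head with a 300-character description.
page = ('<head><meta name="keywords" content="shoes, leather shoes">'
        '<meta name="description" content="' + "x" * 300 + '"></head>')
problems = oversized_meta(page)
print(problems)
```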

Optimising Image "alt" Attribute

Images on pages can help site listings too. Each image on a page can carry a keyword phrase or two that relates to the image in its "alt" attribute. This text will also show up for, and help, visitors who have images turned off when they visit your site. This does not work for all engines, but it certainly can't hurt and may well be a factor in Google.
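Finding images that have no "alt" text is a simple audit. The sketch below is one way to do it with Python's standard library; AltAudit, images_missing_alt and the file names are hypothetical. It treats an empty alt attribute the same as a missing one, which is an assumption made for this SEO check.

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Records the src of every <img> whose alt attribute is missing or empty."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = (attributes.get("alt") or "").strip()
        if not alt:
            self.missing.append(attributes.get("src") or "")


def images_missing_alt(html):
    """Return the src values of images that carry no alt text."""
    audit = AltAudit()
    audit.feed(html)
    return audit.missing


# Hypothetical markup, for illustration only.
page = ('<img src="a.jpg" alt="Hand stitched leather shoes">'
        '<img src="b.jpg">'
        '<img src="c.jpg" alt="">')
missing = images_missing_alt(page)
print(missing)
```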

High Traffic

Here's the catch-22 of SEO: if your site gets lots of visitors then its search engine rating will be good, but how do you get those visitors if you aren't ranked well in the first place? It does mean, however, that once things get rolling more traffic can generate better rankings, which in turn generate more traffic, and so on.

What should I avoid?

Frames

Search engines do not index frames-based sites well. In fact it can be a real problem for a site built with frames to get good listings. There are ways round the frames issue, but if possible it's better just to avoid them.

Dynamic URLs

Some search engines do not index dynamic URLs (ones generated by a database-driven CMS that contain characters like *, -, _, & and =). This does not apply to the major players in the search engine world, so it is, at worst, a minor consideration.
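If you want to list which of a site's URLs might be affected, a rough test is enough. The sketch below, as a simplifying assumption, treats a URL as dynamic when it carries a query string (the part after "?"); the function name looks_dynamic and the example addresses are hypothetical.

```python
from urllib.parse import urlsplit


def looks_dynamic(url):
    """Rough heuristic: a URL with a query string is treated as dynamic."""
    return bool(urlsplit(url).query)


# Hypothetical URLs, for illustration only.
urls = [
    "http://www.example.com/product.php?id=42&cat=shoes",
    "http://www.example.com/shoes/italian-leather.html",
]
flags = [looks_dynamic(url) for url in urls]
print(flags)
```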

Flash/Shockwave

Sites built in Flash, Shockwave and other similar technologies are not search engine friendly. They are not composed of ASCII text that can be parsed by a spider, so Flash sites often struggle to achieve good ratings. This makes these technologies a poor choice for corporate sites.

Image Maps

Some web crawlers can actually get stuck in the code used to create image maps. This problem will naturally disappear as crawler technology improves, and good design practices are making the image map rarer. Still, if possible, links are better in plain old HTML/XHTML.

JavaScript for Navigation

Search engines can't follow links that are within JavaScript, so sites will not get spidered unless they have some form of standard HTML/XHTML navigation that can be followed. A possible solution is to create a "site map" page that uses standard HTML to link to every page on the site. A standard HTML link to the site map is then placed on every other page of the site.
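A site map page of this kind is simple enough to generate from a list of pages. The following is a minimal sketch; build_site_map and the page list are hypothetical, and the markup produced is only one plausible layout.

```python
from html import escape


def build_site_map(pages):
    """Build a plain HTML page that links to every (url, title) pair given."""
    items = "\n".join(
        f'    <li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for url, title in pages
    )
    return (
        "<html>\n<head><title>Site map</title></head>\n<body>\n"
        "  <h1>Site map</h1>\n  <ul>\n" + items + "\n  </ul>\n"
        "</body>\n</html>\n"
    )


# Hypothetical list of pages, for illustration only.
site_map = build_site_map([
    ("/shoes.html", "Shoes"),
    ("/boots.html", "Boots & Sandals"),
])
print(site_map)
```

Every link in the output is an ordinary anchor tag, so a spider that ignores JavaScript can still reach each page from it.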

What practices are frowned upon?

Basically anything that is designed to trick the search engines can get a site into trouble. Sites have been, and will be, blacklisted for the optimisation equivalent of "spamming". Keywords must not be repeated in meta tags or repeated ad infinitum in the page's copy. The keywords used in tags must match the actual page's purpose and content. Some people believe that invisible text (text containing extra relevant phrases that is the same colour as the page background, and is therefore spiderable but not visible to a site user) may get a site blacklisted, and my own research backs this up. I have used invisible text on a Flash site to place the site's content in a format that could be indexed, but was shocked to see the site's rating plummet afterwards. I spoke to Google on the subject and they denied that invisible text would alter their ranking, but the drop strongly suggested that this was not quite true. Putting up multiple copies of a site or a page is strongly guarded against, as are multiple submissions of a site or page to the same engines over a short period of time.
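Accidental repetition in a keywords meta tag is easy to spot with a short script. This sketch assumes the keywords are comma-separated and compares them case-insensitively; repeated_keywords and the sample string are hypothetical, and no engine's actual spam rules are implied.

```python
from collections import Counter


def repeated_keywords(keywords_content):
    """Return {keyword: count} for every keyword listed more than once."""
    terms = [term.strip().lower()
             for term in keywords_content.split(",")
             if term.strip()]
    counts = Counter(terms)
    return {term: n for term, n in counts.items() if n > 1}


# Hypothetical content of a keywords meta tag.
repeats = repeated_keywords("shoes, leather shoes, Shoes, shoes, boots")
print(repeats)
```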

Summary

SEO is a huge topic that is developing all the time, and one that all web developers need to be aware of. It is no longer enough to add a few meta tags to a site. Instead SEO should be a primary consideration from the very start of site planning: before fingers hit keyboard, the site's plans should allow for, and indeed promote, SEO. It is also an area of a site that needs updating as regularly as the content itself.