, and
tags, text content, and URL addresses of web pages are used. All the terms from each of the above mentioned tags and URL addresses of the relevant web pages in the training set are taken. After term extraction, stopword removal and Porter’s stemming algorithm [44] are applied. Each stemmed term and its corresponding tag or URL pair forms a feature. For example, a term “program” in a tag, in a