Saturday, November 20, 2010

Readings for Wk #11

Web Search Engines, Pts. 1 & 2 / David Hawking

These articles explain how search engines crawl and index the web. In part 1, Hawking describes how URLs are assigned to specific crawling machines via hashing; if a crawler comes across a URL that is not assigned to it, it forwards that URL to the correct machine. In part 2, indexers scan documents to build lists of the words and phrases they contain, then sort those lists into an index. Like crawling, indexing is divided among many machines to manage the volume of documents being analyzed.
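
To help myself picture both pieces, here is a small Python sketch of the two ideas: hash-based URL assignment and scan-then-sort index construction. The cluster size, hashing on the host name, and the toy documents are my own assumptions for illustration, not details from Hawking's articles.

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlparse

NUM_CRAWLERS = 16  # hypothetical cluster size

def assigned_crawler(url: str) -> int:
    """Hash the host name so every machine agrees on which crawler owns a URL."""
    host = urlparse(url).netloc.lower()
    digest = hashlib.md5(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_CRAWLERS

def build_inverted_index(docs):
    """Scan phase: emit (term, doc_id) pairs. Sort phase: group them by term."""
    postings = [(term, doc_id)
                for doc_id, text in docs.items()
                for term in text.lower().split()]
    postings.sort()  # the huge external sort, in miniature
    index = defaultdict(list)
    for term, doc_id in postings:
        if doc_id not in index[term]:  # one posting per (term, doc) pair
            index[term].append(doc_id)
    return dict(index)

docs = {1: "deep web search", 2: "web crawling at web scale"}
print(assigned_crawler("http://example.com/page.html"))  # some index in 0..15
print(build_inverted_index(docs)["web"])                 # [1, 2]
```

A URL discovered by the "wrong" machine gets forwarded to whichever crawler the hash names, which is how every machine can agree on ownership without asking a central coordinator.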

Current Developments and Future Trends for the OAI Protocol for Metadata Harvesting / Shreeves, Habing, et al.

OAI-PMH was created to facilitate access to online archives through shared metadata standards. These shared standards let users from different organizations, working in different systems, easily share resources. Participating repositories expose their metadata in common formats, at minimum unqualified Dublin Core, encoded as XML and harvested over HTTP. Looking ahead, the authors expect work on making the OAI registry more searchable and on providing richer descriptions of repositories.
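
Out of curiosity about what "harvesting over HTTP" actually looks like, here is a minimal Python sketch of a ListRecords request. The repository URL is made up; the verb, metadataPrefix parameter, and Dublin Core namespace come from the protocol itself.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "http://example.org/oai"  # hypothetical repository endpoint

# OAI-PMH requests are plain HTTP GETs with a "verb" parameter.
query = urllib.parse.urlencode({"verb": "ListRecords",
                                "metadataPrefix": "oai_dc"})

with urllib.request.urlopen(BASE_URL + "?" + query) as response:
    tree = ET.parse(response)

# Responses are XML; each record carries unqualified Dublin Core metadata.
DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"
for title in tree.iter(DC_TITLE):
    print(title.text)
```
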
The Deep Web: Surfacing Hidden Value / Bergman

When performing an internet search, a typical user is only scratching the surface of the web. According to Bergman, searchers are reaching just 0.03% of what is actually available; the "deep web" is the other 99.97%. Much of it sits in company and business intranets, specialized databases, archives, and repositories, and it stays hidden because crawlers discover pages by following links: content that only exists as the response to a typed query never gets crawled. This article is ten years old, and I wonder how much of this has changed with the sophistication of current search tools. I do believe parts of the web remain "hidden," but they're not as inaccessible as they once were.
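
To make that "hidden" part concrete, here is a toy Python sketch (the pages and URLs are invented): a link-following crawler finds everything reachable by hyperlinks, but a page that is only generated in answer to a form query never enters its frontier.

```python
import re

# Toy "web": paths and their raw HTML. The deep page exists, but nothing
# links to it; it would only be generated by submitting the search form.
PAGES = {
    "/index.html": '<a href="/about.html">About</a>'
                   '<form action="/search"><input name="q"></form>',
    "/about.html": '<a href="/index.html">Home</a>',
    "/search?q=rainfall": "<p>Dynamic database result</p>",  # deep web
}

def crawl(start):
    seen, frontier = set(), [start]
    while frontier:
        page = frontier.pop()
        if page in seen or page not in PAGES:
            continue
        seen.add(page)
        # A crawler follows links; it has no way to fill in the form.
        frontier += re.findall(r'href="([^"]+)"', PAGES[page])
    return seen

print(crawl("/index.html"))                           # index and about only
print("/search?q=rainfall" in crawl("/index.html"))   # False: stays hidden
```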

