
Saturday, November 20, 2010

Readings for Wk #11

Web Search Engines, Pts. 1 & 2 / David Hawking

These articles explain how search engines crawl and index the web. In part 1, Hawking describes how crawling machines are assigned to specific URLs via hashing. If a crawler comes across a URL that is not assigned to it, it sends that URL along to the correct crawler. Indexers first scan documents for specific words and phrases and then sort the results into an index. Like crawlers, indexers are also partitioned across machines to manage the volume of documents that will be analyzed.
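
The hash-based assignment Hawking describes is simple enough to sketch. Here's a minimal illustration in Python (my own sketch, not code from the article; the cluster size, queue structure, and function names are all hypothetical): each URL's hostname is hashed to pick the responsible crawler, and a crawler that encounters a URL it doesn't own forwards it to the owner's queue.

    import hashlib
    from urllib.parse import urlparse

    NUM_CRAWLERS = 8  # hypothetical cluster size

    def assigned_crawler(url):
        """Hash the hostname so all URLs on one site map to the same crawler."""
        host = urlparse(url).hostname or ""
        digest = hashlib.md5(host.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % NUM_CRAWLERS

    def handle(url, my_id, forward_queues):
        """Crawl the URL if it's ours; otherwise forward it to its owner."""
        owner = assigned_crawler(url)
        if owner == my_id:
            print("crawling", url)  # the real fetch/parse step would go here
        else:
            forward_queues[owner].append(url)

Hashing on the hostname rather than the full URL keeps every page from one site on the same machine, which also makes it easier to honor per-site politeness rules when fetching.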

Current Developments and Future Trends for the OAI Protocol for Metadata Harvesting / Shreeves, Habing, et al.


OAI-PMH was created to facilitate access to online archives via shared metadata standards. These shared standards allow users from different organizations, or users of different systems, to easily share resources. Participating repositories expose their metadata in shared schemas such as unqualified Dublin Core, serialized in XML. Going forward, OAI-PMH's developers want to make its registry more searchable and to provide better descriptions of the registered repositories.
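
For a concrete sense of the protocol: an OAI-PMH harvester just issues HTTP requests with a verb such as ListRecords and a metadataPrefix such as oai_dc (unqualified Dublin Core), and gets XML back. A minimal sketch in Python, assuming a hypothetical repository base URL:

    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    # Hypothetical endpoint; any OAI-PMH base URL is queried the same way.
    BASE_URL = "https://repository.example.edu/oai"

    def list_records(metadata_prefix="oai_dc"):
        """Fetch one batch of records as Dublin Core and return the XML root."""
        query = urllib.parse.urlencode({
            "verb": "ListRecords",
            "metadataPrefix": metadata_prefix,
        })
        with urllib.request.urlopen(BASE_URL + "?" + query) as resp:
            return ET.parse(resp).getroot()

Large result sets come back in batches; the repository includes a resumptionToken element in the response, which the harvester sends back to retrieve the next batch.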




The Deep Web: Surfacing Hidden Value / Bergman

When performing an internet search, a typical user is only scratching the surface of the web. According to Bergman, most searches reach just 0.03% of what is actually available; the "deep web" is the other 99.97%. Much of this hidden content lives in company/business intranets, specialized databases, archives, & repositories. This article is ten years old, and I wonder how much of this has changed because of the sophistication of current search tools. I do believe parts of the web remain "hidden," but they're not as inaccessible as they once were.


Comments from Wk #11


http://nrampsblog.blogspot.com/2010/11/unit-11-web-search-and-oai-protocol.html?showComment=1290312661547#c4758384399719463169


Saturday, November 13, 2010

Comments from Wk #10

https://lis2060notes.wordpress.com/2010/11/06/reading-nov-15/#comment-20

http://maj66.blogspot.com/2010/11/week-10-readings.html?showComment=1289710758965#c2196093730956100326

Readings for Wk #10

Digital Libraries: Challenges & Influential Work / Mischo

(This article really made me appreciate how far we've come in digital library technology. When I started college in 1991, I had to do a research paper for a sociology class. I spent 3 days researching my subject in two different libraries and then had to make a 15-minute appointment to have access to a specific database. It's amazing what has happened in just 20 years.)
Mischo gives a brief history of digital library projects and why they were developed. Digital libraries were created out of the need to make large amounts of information housed in several different places/systems easily accessible via simpler portals. Like most projects of this scale involving and affecting several fields, the work was primarily funded by the government and launched at a few select university libraries. The most surprising thing to me was that the early stages of these projects were undertaken during the early age of the WWW. Thanks to this group of developers, programmers, engineers, and librarians, anyone can just visit ProQuest, Muse, or Google Scholar to download books, articles, etc. on almost any subject instead of visiting 3 different libraries to use specialized machines or databases.


Dewey Meets Turing: Librarians, Computer Scientists & DLI / Paepcke, Garcia-Molina, Wesley


This article explores the mostly harmonious relationship between librarians and computer scientists in the context of the Digital Library Initiatives. Working together made sense initially because both understood the need to build collections that could be "search[ed], organiz[ed], and brows[ed]." However, with the rise of the Web, both groups had to adjust their thinking about how to implement many of their goals. Computer scientists were naturally drawn to the breakthroughs made possible by the Web (machine learning, links everywhere rather than just local, etc.), while librarians had to grapple with higher prices for online journal content. As the relationship has evolved since the early DLI projects, librarians and computer scientists have been able to learn from each other: computer scientists have collected websites on similar topics into hubs, and librarians can now help those computer scientists manage their scholarly publications online.


Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age / Lynch


Institutional repositories are gaining popularity for several reasons: metadata standards have been implemented, online storage is cheap, serial prices are high, & repositories promote scholarship at their institutions. Lynch points to MIT's DSpace as a model repository that utilized open source software and a corporate partnership (in this case with Hewlett-Packard). While creating their own repositories can lower costs for institutions/libraries (cutting out contracts with outside firms to handle digital storage), Lynch warns them to stay on mission. First, don't use the repository to control or impose ownership over students', faculty's, & researchers' intellectual property; Lynch states that successful repositories "are responsive to the needs ...and advance the interests of campus communities and of scholarship broadly." Second, he says that repositories can't be slowed down or burdened by heavy-handed policies; libraries, faculty, & researchers must cooperate on making policies that don't advance one group's agenda over the others'. Third, institutions must be committed to maintaining & funding the repository after it's established.