
Explanation of Metasearch

Definition and Terminology

Metasearching, in the library setting, “is a process in which a user submits a query to numerous information resources simultaneously. The resources can be heterogeneous in many respects: their location, the format of the information that they offer, the technologies on which they draw, the types of materials that they contain, and more. The user’s query is broadcast to each resource, and results are returned to the user” (Sadeh, 2007, Metasearching and federated searching section, para. 3). Some metasearch engines are quite basic, but others allow extensive customization: restyling the interface to match the library’s existing web pages, choosing which databases are included in a search and under what circumstances, and adjusting relevance ranking, including which fields and databases contribute to a higher ranking. In this way, the librarian can shape the results a user gets, even when he or she cannot be present at the time of the search.
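To make the broadcast-and-merge process concrete, here is a minimal Python sketch. It assumes two hypothetical in-memory resources (`search_catalog` and `search_articles`) in place of real remote databases, and illustrative weights (`FIELD_WEIGHTS`, `SOURCE_WEIGHTS`) standing in for the relevance-ranking settings a librarian might configure. It sends the query to every resource concurrently, pools the results, and sorts them by a weighted score. Real products use their own, more sophisticated ranking; this is not any vendor's algorithm.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Record:
    source: str
    title: str
    subject: str


# Hypothetical stand-ins for remote resources. A real engine would send the
# query over the network and translate each resource's reply into Records.
def search_catalog(query):
    return [Record("catalog", "Metasearch in academic libraries", "library science")]


def search_articles(query):
    return [
        Record("articles", "Federated search tools", "metasearch"),
        Record("articles", "Digital reference services", "library science"),
    ]


# Librarian-configurable weights (illustrative values, not from any product).
FIELD_WEIGHTS = {"title": 2.0, "subject": 1.0}
SOURCE_WEIGHTS = {"catalog": 1.5, "articles": 1.0}


def score(record, terms):
    """Add a field's weight for each query term found in that field
    (simple substring match), then scale by the source's weight."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        text = getattr(record, field).lower()
        total += weight * sum(term in text for term in terms)
    return total * SOURCE_WEIGHTS.get(record.source, 1.0)


def metasearch(query, resources):
    terms = query.lower().split()
    # Broadcast the query to every resource at the same time.
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda search: search(query), resources))
    merged = [record for results in result_lists for record in results]
    # Most relevant first; ties keep the order in which results arrived.
    return sorted(merged, key=lambda r: score(r, terms), reverse=True)


for record in metasearch("metasearch library", [search_catalog, search_articles]):
    print(record.source, "|", record.title)
```

Here the catalog record ranks first because it matches in the heavily weighted title field and the catalog carries a higher source weight. Changing the values in `FIELD_WEIGHTS` and `SOURCE_WEIGHTS` is the sketch's analogue of the librarian's ranking configuration.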

There are many names for searching multiple databases: metasearching, federated searching, cross-database searching, and more. Many authors treat these names interchangeably, but according to Sadeh (2007), metasearching and federated searching, at least, are distinct terms. Metasearch employs “just in time” processing, meaning that no preprocessing of the data occurs. With federated searching, data is preprocessed and indexed for faster searching. Not all authors agree on these definitions, but in the library setting, metasearching, federated searching, and the related terms are generally used to mean the same thing: making all library resources searchable through a single interface.
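Sadeh's distinction between just-in-time processing and pre-indexing can be illustrated with a toy Python sketch over a few hypothetical records held in memory. The first function scans every record when the query arrives; the other two build a simple inverted index ahead of time and then answer queries by lookup. A production federated search system would harvest records from many sources and use a full search engine, so treat this only as an illustration of the idea.

```python
from collections import defaultdict

# Hypothetical sample records: record ID -> text.
RECORDS = {
    1: "federated searching in academic libraries",
    2: "screen scraping and database interfaces",
    3: "openurl resolvers and federated searching",
}


def just_in_time_search(term, records):
    """No preprocessing: every record is scanned at query time."""
    return sorted(rid for rid, text in records.items() if term in text.split())


def build_index(records):
    """Preprocessing step, done once: map each word to the IDs containing it."""
    index = defaultdict(set)
    for rid, text in records.items():
        for word in text.split():
            index[word].add(rid)
    return index


def indexed_search(term, index):
    """Look the term up in the prebuilt index; no records are scanned."""
    return sorted(index.get(term, set()))


INDEX = build_index(RECORDS)
print(just_in_time_search("federated", RECORDS))  # [1, 3]
print(indexed_search("federated", INDEX))         # [1, 3]
```

Both approaches return the same IDs; the difference is when the work happens. The indexed version pays its cost once, up front, which is why pre-indexed searching is faster at query time.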

Implementation

There are many choices available to a librarian looking for a metasearch engine. Some metasearch engines run on a local server and require a dedicated technical team; others are hosted off-site. Most require a contract with a vendor that includes, among other things, an ongoing service agreement to keep the links to the various databases working. As Paula J. Hane (2003) puts it, metasearching “certainly is software, but it’s best consumed as a service. A federated search engine searches databases that update and change an average of 2 to 3 times per year. This means that a system accessing 100 databases is subject to between 200 and 300 updates per year—almost one per day! Subscribing to a federated searching service instead of installing software eliminates the need for libraries to update translators almost daily so they can avoid disruptions in service” (para. 6).

Some databases offer an Application Programming Interface (API) or another method through which a metasearch engine can send queries and receive responses in a stable format. If this interface remains stable, it can greatly reduce the need to adjust the metasearch engine periodically. However, database vendors have historically been reluctant to provide such interfaces, because they want users to search through the database’s native interface. In that case, programmers have to resort to converting the HTML of a database’s search results pages into a machine-readable format (a technique known as screen scraping), which is prone to breaking whenever the vendor changes the layout of those pages.
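Screen scraping, and why it breaks, can be sketched with Python's standard-library `html.parser`. The results-page markup below (`<li class="result">` items containing links) and the `/record/...` paths are entirely hypothetical; the point is that the scraper depends on that exact structure.

```python
from html.parser import HTMLParser


class ResultScraper(HTMLParser):
    """Collect (title, link) pairs from <a> tags inside <li class="result">.

    The expected markup is hypothetical; any change to it breaks the scraper."""

    def __init__(self):
        super().__init__()
        self.in_result = False
        self.current_href = None
        self.current_text = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "result":
            self.in_result = True
        elif tag == "a" and self.in_result:
            self.current_href = attrs.get("href")
            self.current_text = []

    def handle_data(self, data):
        # Only keep text that appears inside a result link.
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            title = "".join(self.current_text).strip()
            self.results.append((title, self.current_href))
            self.current_href = None
        elif tag == "li":
            self.in_result = False


def scrape(html):
    parser = ResultScraper()
    parser.feed(html)
    parser.close()
    return parser.results


PAGE = """
<ul>
  <li class="result"><a href="/record/101">Metasearching in libraries</a></li>
  <li class="result"><a href="/record/102">OpenURL basics</a></li>
</ul>
"""

# After a hypothetical redesign, the vendor wraps results in <div class="hit">.
REDESIGNED = '<div class="hit"><a href="/record/101">Metasearching in libraries</a></div>'

print(scrape(PAGE))
print(scrape(REDESIGNED))
```

`scrape(PAGE)` returns both title/link pairs, while `scrape(REDESIGNED)` returns an empty list: a purely cosmetic change on the vendor's side quietly breaks the translator, which is the kind of maintenance burden Hane describes.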

A new development in metasearching is the ability to pre-index selected content, which speeds up searching. This is already possible for locally owned content, such as the library catalog and any full-text resources the library generates. In addition, some database vendors are beginning to make their metadata available, either negotiated as part of the database purchase or offered as an add-on. Jonathan Rochkind (2007) writes:

Publishers and vendors have not necessarily been eager to let libraries have the metadata, and libraries may not have had the infrastructure in place to give it. But content providers are starting to provide this information to Google and Google Scholar. Everyone wants to be found on Google, and this requires making sure Google has the ability to index your metadata and full text. EBSCOhost Connection and Gale AccessMyLibrary have both put their metadata on the public web for indexing by Google and other spiders. (Why cross-search? section, paras. 4 and 5)

As vendors come around to the idea that providing their metadata will make their content easier to find (and therefore more valuable), pre-indexing may become a more common practice.

An essential component of a metasearch engine is an OpenURL resolver. “This software allows library patrons to move from citations in one database to full-text content in another. These technologies work together to create a pathway that begins with the user entering query terms and ends with the delivery of full-text content” (Lindahl, 2007, p. 219). With a functioning OpenURL resolver in place, a user can begin a search in a metasearch engine and end with the full text. This is a considerable improvement over the old process of citation chasing, especially when a user is first exploring a research topic. The two technologies of metasearching and OpenURL resolution are so intertwined that they are often provided by the same vendor, though they need not be.
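To illustrate how a citation travels from a metasearch result to a link resolver, the Python sketch below encodes a journal-article citation as an OpenURL 1.0 (ANSI/NISO Z39.88-2004) key/encoded-value query string. The resolver address `resolver.example.edu` and the sample citation are hypothetical placeholders; a real library would use its own resolver's base URL, and the resolver, not this code, decides where the full text lives.

```python
from urllib.parse import urlencode

# Hypothetical base URL for a library's link resolver.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def build_openurl(citation, base=RESOLVER_BASE):
    """Encode a journal-article citation as an OpenURL 1.0 KEV query."""
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    # Include only the citation fields that are actually present.
    for key in ("atitle", "jtitle", "issn", "volume", "issue", "spage", "date"):
        if citation.get(key):
            params["rft." + key] = citation[key]
    # urlencode percent-encodes values (e.g. ':' and '/') and turns spaces into '+'.
    return base + "?" + urlencode(params)


# Hypothetical citation, as a metasearch result might supply it.
citation = {
    "atitle": "An example article",
    "jtitle": "Example Journal",
    "volume": "12",
    "spage": "34",
    "date": "2007",
}
link = build_openurl(citation)
print(link)
```

Following such a link hands the citation metadata to the resolver, which checks the library's holdings and directs the user to an appropriate full-text copy.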

