


Question

INFORMATION RETRIEVAL

Suggest what normalized form should be used for these words (including the word itself as a possibility):

a. ‘Cos

b. Shi’ite

c. cont’d

d. Hawai’i

e. O’Rourke

The following pairs of words are stemmed to the same form by the Porter stemmer. Which pairs would you argue should not be conflated? Give a one-sentence justification for each answer.

a. abandon/abandonment

b. absorbency/absorbent

c. marketing/markets

d. university/universe

e. volume/volumes

A more-like-this query occurs when the user can click on a particular document in the result list and tell the search engine to find documents that are similar to this one. Describe which low-level components are used to answer this type of query and the sequence in which they are used.

Document filtering is an application that stores a large number of queries or user profiles and compares these profiles to every incoming document on a feed. Documents that are sufficiently similar to the profile are forwarded to that person via email or some other mechanism. Describe the architecture of a filtering engine and how it may differ from a search engine.

If you can answer any of these, please do.

Answer:

It is better to divide hosts (rather than individual URLs) among the nodes of a distributed crawl system, because a host address usually corresponds directly to the physical location of the machine, while a URL may have nothing to do with it.
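A minimal sketch of host-based partitioning, assuming a fixed number of crawler nodes and a SHA-1 hash of the host name. The helper names `host_of` and `node_for_url` are hypothetical and not taken from any particular crawler. Every URL on the same host maps to the same node.

```python
import hashlib
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the host name of a URL (urlsplit lower-cases it and drops the port)."""
    return urlsplit(url).hostname or ""


def node_for_url(url: str, num_nodes: int) -> int:
    """Hypothetical helper: assign a URL to a crawler node by hashing its host.

    A stable hash (SHA-1) is used instead of Python's built-in hash(), which is
    salted per process for strings and would give different assignments on
    different machines.
    """
    digest = hashlib.sha1(host_of(url).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


if __name__ == "__main__":
    urls = [
        "http://example.com/index.html",
        "http://EXAMPLE.com/about",
        "http://example.org/index.html",
    ]
    for u in urls:
        print(u, "->", node_for_url(u, 4))
```

The first two example URLs differ only in path and letter case of the host, so they land on the same node; the third host may land anywhere.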

From a URL alone we cannot tell where the corresponding physical machine (or cluster) is located, for several reasons:

1. Many generic domains are in use throughout the world, e.g. google.com and google.net, and their names say nothing about where the servers are.

2. Many countries' top-level domain registries allow domains to be sold to residents of other countries. For example, a UK resident can buy google.us in the US zone without intending to serve mostly US users or to host the site in the USA.

3. Even when a domain is bought in one's national zone to host a website for a local audience, it is sometimes better to keep the server abroad for reasons of cost, security, or others.

Thus, if individual URLs are distributed among the nodes, every node ends up contacting servers all over the world, which reduces performance; this is exactly what partitioning by host avoids.
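To illustrate this argument, the following hypothetical simulation counts how many distinct nodes would contact each host under the two schemes. It assumes 8 nodes, two example hosts with 100 pages each, and a SHA-1-based hash; `nodes_per_host` is a name made up for this sketch. Under URL partitioning the pages of one host are typically scattered across most nodes, while under host partitioning exactly one node contacts each host.

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlsplit


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash of a string (first 8 bytes of SHA-1)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


def nodes_per_host(urls, num_nodes, by_host):
    """Map each host to the set of crawler nodes that would fetch from it.

    by_host=True  -> partition on the host name (host partitioning)
    by_host=False -> partition on the full URL (URL partitioning)
    """
    contacts = defaultdict(set)
    for url in urls:
        host = urlsplit(url).hostname or ""
        key = host if by_host else url
        contacts[host].add(stable_hash(key) % num_nodes)
    return dict(contacts)


if __name__ == "__main__":
    urls = [f"http://example.com/page{i}" for i in range(100)]
    urls += [f"http://example.org/doc{i}" for i in range(100)]
    for mode in (True, False):
        counts = {h: len(n) for h, n in nodes_per_host(urls, 8, mode).items()}
        print("by_host" if mode else "by_url", counts)
```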

So it is better to partition by host rather than by individual URL.