Question
Information Retrieval
In unit one, we are introduced to the concept of the inverted index as a fundamental technology in information retrieval systems. The inverted index essentially is an index of words known as terms extracted from the document corpus that can be searched to find documents with the content that the user is looking for. Our text also introduces two extensions to the concept of the inverted index, the biword index and the positional index.
For your discussion assignment:
Select either the biword index or positional index
a. Provide a description of the index that you selected. As part of your description make sure that you describe why and how it is different than the inverted index.
b. Describe both where and when the index would be used
c. Describe the advantage the index has over the inverted index
Explanation / Answer
I have selected the positional index.
a. A positional index is built the same way as an inverted index, but instead of storing only the documents that contain a word, it also stores every position at which the word appears in each document, for example in a linked list attached to the document number. The basic inverted index holds only the list of documents containing the word and nothing else about it, so it cannot tell us where the word occurs in a document or how far apart two words are.
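To make the structure concrete, here is a minimal sketch of building a positional index. It assumes whitespace tokenization and lowercasing, and it uses Python dicts and lists in place of linked lists; the function name `build_positional_index` and the sample documents are made up for illustration.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each term to {doc_id: [positions]}; docs is {doc_id: text}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Assumption: tokens are whitespace-separated and case-insensitive.
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return dict(index)

docs = {1: "to be or not to be", 2: "be quick or be gone"}
index = build_positional_index(docs)

print(index["be"])          # {1: [1, 5], 2: [0, 3]}
print(sorted(index["or"]))  # [1, 2]
```

Dropping the position lists and keeping only the document ids (the second print) gives what a plain inverted index would store for the same term.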
b. As the name suggests, the main purpose of a positional index is to record where each word occurs in a document. Given a query, we can treat it as a set of words that must appear within a certain "distance" of one another, and then check that condition against the positions saved in the index. A positional index is therefore a good choice whenever queries contain more than one word, such as phrase queries and proximity queries.
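The "within a certain distance" check can be sketched as follows. This is only an illustration under simple assumptions (whitespace tokens, lowercase, word order matters); `within_k` and the sample documents are hypothetical names and data, not something from the text.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each term to {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return dict(index)

def within_k(index, first, second, k):
    """Doc ids where `second` occurs 1..k positions after `first`."""
    postings_a = index.get(first, {})
    postings_b = index.get(second, {})
    hits = []
    # Only documents containing both terms can match.
    for doc_id in sorted(postings_a.keys() & postings_b.keys()):
        positions_b = set(postings_b[doc_id])
        if any(p + d in positions_b
               for p in postings_a[doc_id]
               for d in range(1, k + 1)):
            hits.append(doc_id)
    return hits

docs = {
    1: "new york university",
    2: "york has a new university",
    3: "a new campus in york",
}
index = build_positional_index(docs)

print(within_k(index, "new", "york", 1))  # [1]
print(within_k(index, "new", "york", 3))  # [1, 3]
```

With k = 1 this behaves like an exact phrase query for "new york". A plain inverted index would return all three documents for the same two words, because it cannot see their positions.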
c. The inverted index beats the positional index only when we know that every query is a single word, because the inverted index takes much less space than the positional index. Apart from that space saving, the inverted index has no advantage. As soon as queries contain more than one word, the positional index gives much more accurate results, because it can confirm that the words really occur next to or near each other. The positional index also has an advantage over the biword index: a biword index handles only two-word phrases directly, so a longer phrase has to be split into overlapping pairs that can match documents not containing the full phrase, whereas a positional index can check a phrase of any length. Construction is also simple: a positional index is essentially a linked list of linked lists.
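The comparison with the biword index can be illustrated with a small sketch. It assumes a biword index that stores each pair of adjacent words, and it uses a direct scan of the text (`contains_phrase`) as a stand-in for the position check a positional index would do; all names and sample documents here are hypothetical.

```python
from collections import defaultdict

def build_biword_index(docs):
    """Map each adjacent word pair to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        terms = text.lower().split()
        for left, right in zip(terms, terms[1:]):
            index[(left, right)].add(doc_id)
    return index

def biword_phrase_candidates(index, phrase):
    """Split the phrase into overlapping biwords and intersect postings."""
    terms = phrase.lower().split()
    result = None
    for pair in zip(terms, terms[1:]):
        found = index.get(pair, set())
        result = found if result is None else result & found
    return sorted(result or [])

def contains_phrase(text, phrase):
    """Exact check: the phrase words occur at consecutive positions."""
    terms, wanted = text.lower().split(), phrase.lower().split()
    return any(terms[i:i + len(wanted)] == wanted
               for i in range(len(terms) - len(wanted) + 1))

docs = {
    1: "stanford university palo alto",
    2: "palo alto hosts stanford university and university palo is a typo",
}
phrase = "stanford university palo alto"
index = build_biword_index(docs)

candidates = biword_phrase_candidates(index, phrase)
exact = [d for d in candidates if contains_phrase(docs[d], phrase)]
print(candidates)  # [1, 2]
print(exact)       # [1]
```

Document 2 contains every biword of the query but never the full phrase, so the biword lookup reports it as a match, while a check on positions rejects it.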