


Question

Consider the following set of documents. Assume that your indexing system handles tokenization and converts all terms to lowercase.

Doc 1: Treatment of methicillin resistant staphylococcus aureus.
Doc 2: MRSA spread amongst wrestlers, treatment and prophylaxis.
Doc 3: Treatment of antibiotic resistant hospital-acquired staph. aureus infection.
Doc 4: MRSA: multidrug-resistant staphylococcus aureus

Which of the following statements are true? Select one or more:

a. Two sets of document vectors are generated for these documents. In one, binary weighting is used as a local weighting metric. In the other, log(type frequency) is used. The results will be the same for any query, regardless of which of these weighting metrics was used.
b. If the inverse document frequency (idf) weighting metric is applied, the term "mrsa" will contribute most to the representation of Document 2.
c. The cosine metric between the vectors for documents 2 and 3 will be 0.
d. Of the terms "aureus", "wrestlers" and "treatment", the term "wrestlers" has the highest inverse document frequency.

Explanation / Answer

b. If the inverse document frequency (idf) weighting metric is applied, the term "mrsa" will contribute most to the representation of Document 2.

d. Of the terms "aureus", "wrestlers" and "treatment", the term "wrestlers" has the highest inverse document frequency.
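
The statements about idf and cosine can be checked numerically. The sketch below is only an illustration under stated assumptions: the tokenizer (lowercase, split on anything that is not a letter or digit) and the formula idf = log(N / df) are choices made here, not something the question specifies, and the helper names are hypothetical. Under these assumptions "wrestlers" appears in one document while "aureus" and "treatment" each appear in three, which is consistent with statement d. Documents 2 and 3 share the term "treatment", so their cosine is greater than 0 (statement c). The same table is worth comparing against statement b: "mrsa" appears in two documents, so with this formula its idf is lower than that of terms found only in Document 2.

```python
import math
import re

docs = {
    1: "Treatment of methicillin resistant staphylococcus aureus.",
    2: "MRSA spread amongst wrestlers, treatment and prophylaxis.",
    3: "Treatment of antibiotic resistant hospital-acquired staph. aureus infection.",
    4: "MRSA: multidrug-resistant staphylococcus aureus",
}

def tokenize(text):
    # Assumption: lowercase, then keep runs of letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
vocab = sorted({t for toks in tokens.values() for t in toks})
n_docs = len(docs)

# Document frequency: number of documents that contain the term.
df = {t: sum(t in toks for toks in tokens.values()) for t in vocab}
# Assumption: plain idf = log(N / df), natural logarithm.
idf = {t: math.log(n_docs / df[t]) for t in vocab}

def binary_vector(doc_id):
    # Binary local weighting: 1 if the term occurs in the document, else 0.
    present = set(tokens[doc_id])
    return [1.0 if t in present else 0.0 for t in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

for term in ("aureus", "wrestlers", "treatment", "mrsa"):
    print(f"{term:10s} df={df[term]} idf={idf[term]:.3f}")

print("idf of Document 2 terms:",
      {t: round(idf[t], 3) for t in sorted(set(tokens[2]))})
print("cosine(doc 2, doc 3) =", round(cosine(binary_vector(2), binary_vector(3)), 3))
```

Other idf variants (for example smoothed or base-2 logarithms) change the numbers but not the ordering of the terms, since all of them decrease as document frequency grows.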