Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

InspiredbytheexampleofthatgreatCornellian, VladimirNabokov, some of your friends

ID: 3786787 • Letter: I

Question

InspiredbytheexampleofthatgreatCornellian, VladimirNabokov, some of your friends have become amateur lepidopterists (they study butterflies). Often when they return from a trip with specimens of butterflies, it is very difficult for them to tell how many distinct species they’ve caught—thanks to the fact that many species look very similar to one another. One day they return with n butterflies, and they believe that each belongs to one of two different species, which we’ll call A and B for purposes of this discussion. They’d like to divide the n specimens into two groups—those that belong to A and those that belong to B—but it’s very hard for them to directly label any one specimen. So they decide to adopt the following approach.

For each pair of specimens i and j, they study them carefully side by side. If they’re confident enough in their judgment, then they label the pair (i,j) either “same” (meaning they believe them both to come from thesamespecies)or“different”(meaningtheybelievethemtocomefrom different species). They also have the option of rendering no judgment on a given pair, in which case we’ll call the pair ambiguous. Sonowtheyhavethecollectionofn specimens, aswellasacollection of m judgments (either “same” or “different”) for the pairs that were not declared to be ambiguous. They’d like to know if this data is consistent with the idea that each butterfly is from one of species A or B. So more concretely, we’ll declare the m judgments to be consistent if it is possible to label each specimen either A or B in such a way that for each pair (i,j) labeled “same,” it is the case that i and j have the same label; and for each pair (i,j) labeled“different,” itisthecasethati andj havedifferentlabels. They’re in the middle of tediously working out whether their judgments are consistent, when one of them realizes that you probably have an algorithm that would answer this question right away. Give an algorithm with running time O(m+n) that determines whether the m judgments are consistent.

Explanation / Answer

Algorithm. Each node will be given a label, which is initialized as a blank label, and will be implicitly marked visited or unvisited by its label being blank (unvisited) or A(visited) or B (visited). We'll maintain a queue Q of nodes to process.

For each node v, if v is labeled, skip it. Otherwise, label it A, put it in Q, and execute the following sub-procedure while Q is non-empty: { Dequeue the next node (call it u) from Q. {For each "same" edge (u; x), if x has the opposite label of u report INCONSISTENT, if x had no label then give it the same label as u and put x into Q, and if x already had the same label do nothing more. {For each "dierent" edge (u; y), if y has the same label as u report INCONSISTENT, if y had no label then give it the opposite label from u and put x into Q, and if y already had the opposite label do nothing more. If this procedure completes (exhausts every node) without reporting INCONSISTENT, report CONSISTENT.

Algorithm. Each node will be given a label, which is initialized as a blank label, and will be implicitly marked visited or unvisited by its label being blank (unvisited) or A(visited) or B (visited). We'll maintain a queue Q of nodes to process.

For each node v, if v is labeled, skip it. Otherwise, label it A, put it in Q, and execute the following sub-procedure while Q is non-empty: { Dequeue the next node (call it u) from Q. {For each "same" edge (u; x), if x has the opposite label of u report INCONSISTENT, if x had no label then give it the same label as u and put x into Q, and if x already had the same label do nothing more. {For each "dierent" edge (u; y), if y has the same label as u report INCONSISTENT, if y had no label then give it the opposite label from u and put x into Q, and if y already had the opposite label do nothing more. If this procedure completes (exhausts every node) without reporting INCONSISTENT, report CONSISTENT.