Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

I\'m trying to compare two graphs using hash value ( i.e, at the time of compari

ID: 652735 • Letter: I

Question

I'm trying to compare two graphs using hash value ( i.e, at the time of comparison, try to avoid traversing the graph ) Is there a way to make a function such that the hash values compared can also lead to determining at which height the graphs differ? The comparisons between two graphs are to be made by comparing children at a certain level. One way to compare the graphs is have a final hash value for the root node and compare them, but that wouldn't directly reflect at which level the graphs differ, since their immediate children might be the same ( or any other case ).

Explanation / Answer

A cobbled together solution:

You could probably cobble something together where each level of the graph affected a different portion of the hash value. What you'd essentially be doing here is creating a separate hash value for each level and appending them in sequence.

e.g. For a 32-bit hash and graphs with 8 levels, each level could hash to 4 bits: 1st level => bits 1-4; 2nd level => bits 5-8; etc.

If the number of levels in your graphs isn't fixed/known, then you'll have to wrap this around. This wouldn't tell you exactly which level was different, but it would narrow down your options.

e.g. For a 32-bit hash and graphs with an unknown number of levels, each level could still hash to 4 bits: bits 1-4 would be affected by the 1st, 9th, 17th, ... layers; bits 5-8 would be affected by the 2nd, 10th, 18th, ... layers; and so on.

Why it's probably a bad idea:

The hash function I've described is a rubbish hashing function. I expect it would produce a lot of clashes, and poor distribution.
Using hash values to detect differences is only quicker if there is a difference. hash(A) != hash(B) tells you that A and B are different, but hash(A) == hash(B) does not tell you that they are the same. In the second case, you'll have to do a full comparison to be sure.
Some other possible approaches:

Take another approach. If the comparisons are too slow, run a profiler and see if there are other ways you can speed up the comparison: try using a different data structure for storing your graphs, etc.
Try a compromise:
Create a separate hash value for each level as well as one for the entire graph. Comparing hash values for each level is still going to be loads quicker than comparing every single node.
Instead of using a hash value to speed up your comparisons, create two objects with metadata about the graphs (e.g. # of levels, # of nodes on each level, etc.) and compare those.