Question

I have problem that is reducible to the following:

From a collection of stacks, find all items whose "keys" are on all stacks.

My current solution to this problem is to just pop things off as quickly as possible, store the items in the language's set type, compute the intersection every once in a while (the stacks are refilling constantly), use that set, and repeat.
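For concreteness, here is a minimal Python sketch of that loop; in-memory deques stand in for the real network queues, and the names drain_step and current_intersection are placeholders for illustration only.

```python
from collections import deque

NUM_QUEUES = 10

# In-memory stand-ins for the real network queues; the real pop is an
# expensive network consume (at most ~4k per second).
queues = [deque(["a", "b", "c"]) for _ in range(NUM_QUEUES)]

# One set of keys per queue, filled as items are popped off.
seen = [set() for _ in range(NUM_QUEUES)]

def drain_step():
    """Pop one item from each queue (if it has any) and record its key."""
    for i, q in enumerate(queues):
        if q:
            item = q.popleft()
            seen[i].add(item)          # here each item is its own key

def current_intersection():
    """Keys that have been seen on every queue so far."""
    return set.intersection(*seen)

drain_step()
print(current_intersection())          # {'a'}
```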

The issue here is that the equivalent of the pop operation is expensive (I can do this at most 4k times a second), and it kills the chance of multithreading (as pops are destructive).

Is there a way of doing this such that I can multithread the intersection part? I cannot practically use shared memory or requeue items. The threads can communicate with each other via pipes/sockets if need be, but such communication should be kept to a minimum (I'm not really interested in starting up a whole separate client/server thing just for this part of my application).

As an idea of what I am dealing with, there can be up to 15 million items on the queues at any one time. By the time it's "over", all items will have a match.

Ideas?

Clarification: These are message stacks (well, queues really, but it's the same thing) over the network. A pop is a consume operation over the network, and a push is a publish; that's why it's so slow. Since the issue is one of latency rather than bandwidth, multithreading can potentially help; I am not CPU bound. I have ~10 stacks and up to 10 million items at a time. I have no control over the implementation of the stacks.

Explanation / Answer

Apparently this is a distributed systems problem. You have ~ 10 stacks, each with up to 10 million items at a time, and each stack living on a different machine.

Here is one simple candidate solution. Have a separate "monitor" machine, whose sole purpose is to continually compute the intersection. Any time one of the stacks is modified, the machine storing that stack should send a message to the monitor describing the change. (These diffs can be batched, depending upon your latency requirements.)
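For illustration, here is one possible shape for such a batched diff message, sketched in Python; the field names and the choice of JSON framing are assumptions, not part of the design above.

```python
import json

def make_diff_batch(stack_id, changes):
    """Build one batched diff message for the monitor.

    changes: list of (op, key) pairs with op in {"push", "pop"};
    the keys would be the hashed item fingerprints (see below).
    """
    return json.dumps({
        "stack": stack_id,
        "changes": [{"op": op, "key": key} for op, key in changes],
    }).encode("utf-8")

# Example: stack 3 reports one push and one pop in a single message.
batch = make_diff_batch(3, [("push", "9f2c47a1"), ("pop", "04aab3ce")])
# `batch` would then be written to the socket/pipe connected to the monitor.
```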

Now you have all the data on a single machine, the monitor machine. When running on a single machine, you can completely avoid all of the concurrency issues associated with multithreading or distributed systems -- e.g., the need for synchronization, locks, etc.

Moreover, the amount of data is small enough that it could easily be stored in the available RAM on that monitor machine. (Why? If each entry is large, hash it first using a good hash function and a large enough hash output that you won't see collisions. If each hash value is 128 bits, then storing all 100 million hashed items takes 1.6GB. If you use SHA256 truncated to 128 bits as your hash function, by the birthday paradox, the chances of encountering a hash collision are incredibly small.)
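A minimal sketch of that hashing step, using Python's standard hashlib:

```python
import hashlib

def key_fingerprint(item_bytes):
    """128-bit fingerprint: SHA-256 truncated to 16 bytes.

    At ~100 million distinct items the birthday bound puts the collision
    probability on the order of 2**-75, i.e. negligible.
    """
    return hashlib.sha256(item_bytes).digest()[:16]

fp = key_fingerprint(b"some large message payload")
print(fp.hex())        # 32 hex characters = 16 bytes
```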

Once all of the data is living in RAM on a single machine, the problem becomes much easier. For example, one approach would be to build an index data structure: a hashmap that maps each key to a 10-bit bitmap that indicates which of the stacks it is stored on. You can easily update this hashmap as the individual stacks change.
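A minimal sketch of that index, assuming the keys arrive as the hashed fingerprints from above; the dict-plus-bitmask layout here is only an illustration, with the i-th bit meaning "present on stack i".

```python
NUM_STACKS = 10
ALL_PRESENT = (1 << NUM_STACKS) - 1     # 0b1111111111: present on all ten stacks

# index: 128-bit fingerprint -> bitmap of the stacks currently holding it
index = {}

def apply_change(stack_id, op, fingerprint):
    """Update the index for one push/pop reported by stack `stack_id`."""
    bits = index.get(fingerprint, 0)
    if op == "push":
        bits |= 1 << stack_id
    else:                                # "pop"
        bits &= ~(1 << stack_id)
    if bits:
        index[fingerprint] = bits
    else:
        index.pop(fingerprint, None)     # no stack holds this key any more
    return bits
```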

You can also easily use this hashmap to compute the intersection: just thread a doubly linked list through the entries of the hashmap whose bitmap is equal to 1111111111 (all ten 1's). Any time a bitmap changes from 1111111111 to something else, you remove that entry from the doubly linked list; any time a bitmap changes from something else to 1111111111, you insert it.
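Here is a small self-contained Python sketch of that idea. A plain Python set stands in for the threaded doubly linked list (both give O(1) insert/remove when a bitmap crosses the all-ones boundary); the class and method names are illustrative only.

```python
NUM_STACKS = 10
ALL_PRESENT = (1 << NUM_STACKS) - 1            # 0b1111111111

class IntersectionIndex:
    """Bitmap index plus a live view of the keys present on every stack."""

    def __init__(self):
        self.bitmaps = {}                      # fingerprint -> stack bitmap
        self.in_all = set()                    # fingerprints with an all-ones bitmap

    def apply_change(self, stack_id, op, fingerprint):
        bits = self.bitmaps.get(fingerprint, 0)
        if op == "push":
            bits |= 1 << stack_id
        else:                                  # "pop"
            bits &= ~(1 << stack_id)
        if bits:
            self.bitmaps[fingerprint] = bits
        else:
            self.bitmaps.pop(fingerprint, None)
        # Maintain the live intersection on the all-ones transitions.
        if bits == ALL_PRESENT:
            self.in_all.add(fingerprint)
        else:
            self.in_all.discard(fingerprint)

idx = IntersectionIndex()
for s in range(NUM_STACKS):
    idx.apply_change(s, "push", "9f2c47a1")
print(idx.in_all)                              # {'9f2c47a1'}
idx.apply_change(0, "pop", "9f2c47a1")
print(idx.in_all)                              # set()
```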

The amount of data you need to transfer from other machines to the monitor machine is very small, especially since you only need to send the hashed items to the monitor, not the original items. If each of the 10 machines will do at most 4k push/pops per second (taken from your question), then that's 64 KB of hashed data per second per machine. If we add another 32 bytes or so of overhead (packet headers), that's about 200 KB/s of data out of each machine, or about 2 MB/s of data into the monitor machine -- a very manageable amount. On a 1 Gbps Ethernet link you won't notice this, and you might not notice it even on a 100 Mbps link. You can reduce the amount of traffic by a factor of 2-5x by batching such messages and/or by using a shorter hash.
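A quick back-of-the-envelope check of those figures (the 4k ops/s, 16-byte hash, 32-byte overhead, and 10-machine numbers are taken directly from the text above):

```python
OPS_PER_SEC = 4_000      # max push/pops per second per machine (from the question)
HASH_BYTES  = 16         # 128-bit fingerprint
OVERHEAD    = 32         # rough per-message framing/header overhead
MACHINES    = 10

per_machine = OPS_PER_SEC * (HASH_BYTES + OVERHEAD)   # bytes/s out of each machine
total       = per_machine * MACHINES                  # bytes/s into the monitor

print(per_machine / 1e3, "KB/s per machine")          # ~192 KB/s
print(total / 1e6, "MB/s into the monitor")           # ~1.9 MB/s
```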

This should provide an efficient algorithm to compute the intersection and keep it updated on the fly as each of the stacks changes.