
Question

I am starting to suspect this problem is very hard, now that I cannot find a single piece of relevant literature on the subject, but it's too late to change the class project topic now, so I am hoping for any pointers to a solution. Please pardon the somewhat artificial scenario of this question, but here goes:

Technical version:

Let $\Sigma_{c}$, $\Sigma_{q}$, and $\Sigma_{a}$ be three disjoint finite alphabets (c, q, a stand for content, query, and answer respectively). Let $L_{c}\subseteq\Sigma_{c}^{*}$ and $L_{q}\subseteq\Sigma_{q}^{*}$ be FINITE languages, where $L_{q}$ has the property that for every string in the language, all of its prefixes are in the language too (i.e., $L_{q}$ is prefix-closed). There is an unknown function $f:L_{c}\times L_{q}\rightarrow\Sigma_{a}^{*}$. Consider a mysterious machine that receives a continuous stream of symbols through a channel, one per time step (we assume the symbols are clearly distinguishable). Whenever this machine is fed a string $c\in L_{c}$ (with its symbols in correct temporal order) followed by a string $q\in L_{q}$, it will output (through a different output channel) the value of $f(c,q)$ as a temporal sequence, one symbol at a time. Note that the machine always produces output after every new symbol from $\Sigma_{q}$. Note also that the empty string is in $L_{q}$, which means the machine also outputs something before any symbol from $\Sigma_{q}$ has arrived, but only once it is certain with high probability that the full string in $L_{c}$ has been received.
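
To make the protocol concrete, here is a toy simulation (Python) of what the machine does on a clean, noise-free stream. The alphabets, languages, and $f$ below are placeholders for illustration; the real ones are unknown.

```python
# Toy stand-in for the "mysterious machine" on a clean stream.
# The real f, L_c, and L_q are unknown; these are placeholders.

SIGMA_C = {"a", "b"}           # content alphabet (placeholder)
SIGMA_Q = {"x", "y"}           # query alphabet (placeholder)
L_C = {"ab", "ba"}             # a finite content language (placeholder)
L_Q = {"", "x", "xy"}          # prefix-closed: every prefix of "xy" is present

def f(c, q):
    """Placeholder for the unknown f: L_c x L_q -> Sigma_a^*."""
    return f"ans({c}|{q})"

class ToyMachine:
    def __init__(self):
        self.content = ""      # content symbols received so far
        self.query = ""        # query symbols received so far

    def feed(self, symbol):
        """Feed one symbol; return the machine's output, if any."""
        if symbol in SIGMA_C:
            self.content += symbol
            if self.content in L_C:
                # Full content string received: since the empty string
                # is in L_q, the machine already outputs f(c, "").
                return f(self.content, "")
        elif symbol in SIGMA_Q and self.content in L_C:
            self.query += symbol
            if self.query in L_Q:
                # The machine outputs after every new query symbol.
                return f(self.content, self.query)
        return None

m = ToyMachine()
for s in "abxy":               # content "ab" followed by query "xy"
    out = m.feed(s)
    if out:
        print(out)             # ans(ab|), ans(ab|x), ans(ab|xy)
```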

The objective is to construct a neural network that emulates that mysterious machine, given that we only have access to its input and output channels to use as training data, and we do not know $f$. We also have to assume that the input channel is noisy in the following sense: random noise is inserted into the input channel with high probability, delaying input symbols, and we initially do not know which symbols are noise and which are authentic; also, symbols in the input channel are sometimes lost with low probability. EDIT: Note: we do not know $L_{c}$ nor $L_{q}$; only the mysterious machine knows them. In fact, we do not even know the alphabets $\Sigma_{c}$ and $\Sigma_{q}$, other than the fact that they are disjoint and are subsets of the set of all possible input symbols (an input symbol in neither set is certainly noise, but we initially can't tell which set a given symbol belongs to; note that it is still possible for a symbol from the alphabets to be noise).
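
For concreteness, the noise model as I understand it can be simulated with a small generator like the one below. The noise symbols and the two probabilities are placeholders, not part of the problem statement.

```python
import random

# Simulation of the noisy input channel described above: noise symbols
# are inserted with high probability (delaying authentic symbols), and
# authentic symbols are occasionally lost. All numbers are placeholders.

NOISE = ["#", "%", "?"]        # placeholder noise symbols (noise may also
                               # reuse symbols from the real alphabets)
P_INSERT = 0.6                 # per-step insertion probability ("high")
P_LOSE = 0.05                  # per-symbol loss probability ("low")

def noisy_channel(clean_stream, rng=random):
    """Yield the stream an observer of the input channel would see."""
    for symbol in clean_stream:
        while rng.random() < P_INSERT:      # a run of inserted noise
            yield rng.choice(NOISE)
        if rng.random() >= P_LOSE:          # symbol survives the channel
            yield symbol

print("".join(noisy_channel("abxy")))       # e.g. "#a%?bx##y" (random)
```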

(Why a neural network: besides the noise problem, also because that's what I wrote in my class project proposal.)

(Layman version: consider Sherlock Holmes sitting in his chair, bored. Dr. Watson gives a short description of the client. Once he's done, Sherlock Holmes states a conclusion about the client. Dr. Watson is astonished and asks more questions, and Sherlock Holmes replies. The conclusion must obviously be based on the description alone, and each subsequent answer has to answer the question being asked, taking into account the context, which consists of the questions already asked (for example, the same "How did you know?" following "Age?" demands a different answer than when following "Height?"). Now you want to make a neural network that simulates Sherlock Holmes, having all the recordings of those sessions. Dr. Watson, however, tends to insert long descriptions that are rather irrelevant, making long statements before finally getting around to asking a question, and sometimes accidentally omits crucial information, but otherwise describes people in a rather fixed order of details. The neural network must be able to deal with that. Of course, this is just a layman's description; the actual situation is much less complex.)

I have looked through various literature and cannot find anything relevant. Conversion to the spatial domain is useless, because the high amount of noise produces very long input sequences. I have looked into LSTM to deal with the memory problem over arbitrarily long time lags, but I cannot for the life of me figure out how the network is supposed to be trained when there are arbitrarily long noise insertions everywhere, or the possibility of missing symbols (every method I found seems to force a fixed time lag between input and output, and missing symbols immediately wreck any method based on predicting the next item in the sequence). Also, is it too much to ask for a network that isn't too hard to code? Integrate-and-fire neurons are even worse than LSTM in terms of coding difficulty.
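
For concreteness, the kind of next-item prediction setup I mean looks like the PyTorch sketch below (all sizes are made up). The loss hard-wires a one-step lag between input and target, so a single lost symbol misaligns every later target, which is exactly the failure mode described above.

```python
import torch
import torch.nn as nn

# Minimal next-symbol predictor (toy sizes, made up for illustration).
# The loss assumes a rigid alignment: the target at step t is the input
# at step t+1, so one lost symbol shifts every later target by one step.

V, T, H = 8, 20, 16                    # vocab size, stream length, hidden size

embed = nn.Embedding(V, H)
lstm = nn.LSTM(H, H, batch_first=True)
head = nn.Linear(H, V)
loss_fn = nn.CrossEntropyLoss()

stream = torch.randint(0, V, (1, T))   # fake symbol stream
inputs, targets = stream[:, :-1], stream[:, 1:]   # fixed one-step lag

h, _ = lstm(embed(inputs))
logits = head(h)                       # (1, T-1, V)
loss = loss_fn(logits.reshape(-1, V), targets.reshape(-1))
loss.backward()
print(loss.item())
```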

Explanation / Answer

I unfortunately know very little about neural networks. The closest thing your project reminds me of is speech recognition, and I would look at that literature. I am thinking of the first stage of speech recognition, when the sound stream is transformed into a word lattice (or a word stream, if you keep only the most likely path in the lattice). But all I know about this is based on Hidden Markov Models and the Viterbi algorithm [1]. I have not looked at the field in a long time, and I have no idea how it would translate to neural networks, but I would suggest you look at that literature, for example by searching the web for neural networks and speech recognition.
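
For reference, the Viterbi decoding step itself is short enough to sketch. Given HMM parameters (the numbers below are made-up toy values), it recovers the most likely hidden state path for an observation sequence:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state path for a discrete HMM.

    obs: observation indices; pi: initial probs (S,);
    A: transition probs (S, S); B: emission probs (S, O).
    """
    S, T = len(pi), len(obs)
    delta = np.zeros((T, S))           # best log-score per state and time
    psi = np.zeros((T, S), dtype=int)  # backpointers
    delta[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        scores = delta[t - 1][:, None] + np.log(A)   # (from, to)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):      # follow backpointers
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Toy 2-state example; all numbers are made up for illustration.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 0, 1, 1], pi, A, B))   # [0, 0, 1, 1]
```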

I doubt you can code anything serious in 2 days. I would not even try, but I do not know what kind of programming is expected. But maybe a good description of what should be done, with appropriate references would be enough.

You should simplify your question, if you find out that your requirements are too strong, particularly on noise. At first, you should limit yourself to very simple kinds of noise. Problems are seldom solved the first time in full complexity. You first solve simple cases, then try to see where you could do more. For one thing, do you know how to do it without any noise? What are the limitations? Then you can start adding simple noise, and see what changes.
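
To make the noise-free starting point concrete: since $L_{c}$ and $L_{q}$ are finite, the clean problem reduces to memorizing the observed (content, query prefix) pairs, as in the sketch below (all names are placeholders). Such a baseline cannot generalize to unseen pairs, which already tells you what remains to be done before any noise enters.

```python
# Noise-free baseline: with finite L_c and L_q, emulating the machine on
# previously seen inputs is pure memorization. This assumes the recordings
# can be segmented at output events (itself an assumption).

table = {}

def record(content, query_prefix, answer):
    """Store one observed input/output triple from the recordings."""
    table[(content, query_prefix)] = answer

def emulate(content, query_prefix):
    """Replay the machine's answer, or None if this pair was never seen."""
    return table.get((content, query_prefix))

record("ab", "", "ans(ab|)")       # placeholder recordings
record("ab", "x", "ans(ab|x)")

print(emulate("ab", "x"))          # ans(ab|x)
print(emulate("ba", ""))           # None: a lookup table cannot generalize
```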

Your two inputs, content and query, do not seem to have much reason to be distinguished; or do you have a strong reason to distinguish them? I would think that at some point your system must enter a state in which it starts answering on the output tape.

[1] Bahl, L. R., Jelinek, F., & Mercer, R. L. (1983). A maximum likelihood approach to continuous speech recognition. IEEE Trans. Pattern Anal. Machine Intell., PAMI-5, 179-190.

These authors actually published several papers on the subject for noisy input, including insertion, deletion, and substitution of symbols. There are surely others, and this work is quite old. I am not sure the referenced paper is actually about learning, but the same people worked on learning too, such as parameter identification for Hidden Markov Models.
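
If it helps as a pointer on the learning side: the standard procedure for identifying the parameters of a Hidden Markov Model from observations alone is Baum-Welch (forward-backward EM). Below is a compact toy sketch for a discrete HMM; a real implementation would need numerical scaling, which is omitted here.

```python
import numpy as np

def baum_welch(obs, S, O, iters=20, seed=0):
    """Fit discrete-HMM parameters (pi, A, B) to one observation sequence
    via EM. Toy version: no scaling, so only suitable for short sequences."""
    rng = np.random.default_rng(seed)
    pi = rng.dirichlet(np.ones(S))            # random valid initial probs
    A = rng.dirichlet(np.ones(S), size=S)     # random valid transitions
    B = rng.dirichlet(np.ones(O), size=S)     # random valid emissions
    T = len(obs)
    for _ in range(iters):
        # E-step: forward (alpha) and backward (beta) passes.
        alpha = np.zeros((T, S))
        beta = np.zeros((T, S))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        gamma = alpha * beta                  # state posteriors
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = (alpha[:-1, :, None] * A[None] *
              (B[:, obs[1:]].T * beta[1:])[:, None, :])
        xi /= xi.sum(axis=(1, 2), keepdims=True)   # transition posteriors
        # M-step: re-estimate parameters from expected counts.
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for o in range(O):
            B[:, o] = gamma[obs == o].sum(axis=0) / gamma.sum(axis=0)
    return pi, A, B

obs = np.array([0, 1, 0, 0, 1, 1, 0, 1])      # toy observation sequence
pi, A, B = baum_welch(obs, S=2, O=2)
print(np.round(B, 2))
```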