Wenyue Hua & Adam Jardine
Video:
Link to the paper: here
Abstract: This paper studies the learning of two functions given positive samples of their composition, motivated by an empirical problem in natural language phonology. Empirically relevant conditions under which this is possible are identified and a provably correct algorithm is given that can semi-strongly identify the two functions in polynomial time and data. In order to clearly illustrate the learning problem and related concepts, we focus on a simple subset of input strictly 2-local functions. But we further argue that the general learning procedure we propose can be extended to more general classes of functions.
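To make the learning problem concrete, here is a minimal sketch in Python (a toy example invented for illustration; the rule and function names are not from the paper): a “morphology” function f and an input strictly 2-local “phonology” function g, where the learner only ever observes an input together with the output of the composition g(f(x)).

```python
# Toy sketch of the learning setting (invented example, not the paper's):
# a "morphology" function f and an input strictly 2-local "phonology"
# function g. The learner only sees pairs (x, g(f(x))), never f or g alone.

def f(morphemes):
    # Toy morphology: concatenate a list of morphemes into one string.
    return "".join(morphemes)

def g(s):
    # Toy input strictly 2-local map: each output symbol depends only on the
    # current input symbol and the immediately preceding one. Made-up rule:
    # the plural suffix 'z' devoices to 's' after a voiceless consonant.
    out, prev = [], None
    for ch in s:
        out.append("s" if ch == "z" and prev in ("p", "t", "k") else ch)
        prev = ch
    return "".join(out)

# Positive samples of the composition -- the only data the learner sees.
for x in (["kat", "z"], ["dog", "z"]):
    print(x, "->", g(f(x)))
# ['kat', 'z'] -> kats
# ['dog', 'z'] -> dogz
```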
Questions and comments can be posted in the comment section below.
This is an interesting paper, and I have a comment and some questions. The comment is that, as far as I can understand, the “learning” part (by which I mean specifically the part that goes from finite data to a representation of the infinite function) is basically handled by OSTIA, and then there is a “decomposition” part which decomposes the learned FST into two other FSTs. So my questions are:
1. Is there a sense in which the decomposition part also requires going back to the finite data to check something, or can it be done solely from the output of the OSTIA ignoring the data?
2. The future directions indicate ways to generalize the result, but they keep the requirement that f be simplex. Is that really necessary? How much work is being done by the simplex property? It seems to me that most of the work here is done because f and g are both deterministic functions and thus have canonical (onward, deterministic) finite-state transducers. That means we know the transitions of the learned composed function have to come from certain source transitions in those two machines (see the sketch after this list). Then it becomes a matter of solving a ‘series of equations’ so that the (fewer) transitions in the machines for f and g correctly give rise to the transitions in the learned FST.
3. Small comment: You mention somewhere that the children have access to the string of morphemes… But do they really? (Not a criticism; I think you have started from a fair place, but kids don’t really have the string of morphemes, right?)
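To illustrate what I mean in question 2, here is a rough sketch (my own formulation, not anything from the paper): every transition of the composed machine is determined by a transition of f’s machine followed by a run of g’s machine on that transition’s output, and decomposition amounts to inverting this correspondence.

```python
# Sketch of on-the-fly composition of two deterministic transducers
# (my own formulation, not the paper's algorithm). A transducer is
# (initial_state, transitions), where
# transitions[(state, input_symbol)] = (output_string, next_state).
# Final-output functions are omitted and both machines are assumed complete.

def compose(t_f, t_g):
    (q0_f, d_f), (q0_g, d_g) = t_f, t_g
    d = {}
    frontier, seen = [(q0_f, q0_g)], {(q0_f, q0_g)}
    while frontier:
        qf, qg = frontier.pop()
        for (state, a), (w, qf2) in d_f.items():
            if state != qf:
                continue
            # Run g's machine on f's output string w, collecting g's output.
            out, qg2 = "", qg
            for b in w:
                v, qg2 = d_g[(qg2, b)]
                out += v
            # The composed transition on input a is fully determined by one
            # transition of f's machine plus one run of g's machine; the
            # decomposition problem is to invert this map given only d.
            d[((qf, qg), a)] = (out, (qf2, qg2))
            if (qf2, qg2) not in seen:
                seen.add((qf2, qg2))
                frontier.append((qf2, qg2))
    return (q0_f, q0_g), d
```

(If I understand correctly, the composed machine built this way need not be in canonical onward form, while OSTIA returns the canonical transducer for the same function, so the inversion presumably has to work up to that normalization.)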
Great questions! I’ll post brief answers here but would love to talk more about these in the Q&A tomorrow.
1. The decomposition part can be done solely from the output of OSTIA. So you’re absolutely right that the pure “learning” part is handled by OSTIA.
2. It’s hopefully not strictly necessary, but the simplex property not only allows for a fixed set of tail functions to look for (here, two), it also keeps the ‘opacity’ to a minimum. That is, it lets the learner know how to deduce the input that has been obscured in the output by the application of a function. It is by no means the only way to do it, but it seems like a good place (and an empirically relevant one) to start. We are definitely interested in other ideas for how to generalize the result, though!
3. Fair comment! Maybe it would have been more accurate to say “Assuming children have some idea about what morphemes occur in a word and in what order…” Whether that assumption is valid is an empirical question: when children start learning their phonology, what do they know about the morphological makeup of words? I would imagine the answer is “some,” but I don’t actually know.
Really interesting work! I had a question about the definition of semi-strong learning. Which target function is it guaranteed to learn? The morphology one, or the phonology one?
Thanks! The learner is guaranteed to learn the phonology function exactly. It furthermore learns a hypothesis for the morphology function such that its composition with the phonology function is equal to the composition of the target functions. Hope that makes it clearer.
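In symbols (notation assumed here, not necessarily the paper’s), writing f for the target morphology function, g for the target phonology function, and composition so that g applies to the output of f, the guarantee described above is roughly:

```latex
% Semi-strong guarantee, paraphrased from the answer above
% (notation assumed, not necessarily the paper's):
% the learner outputs hypotheses \hat{f} and \hat{g} such that
\hat{g} = g
\qquad \text{and} \qquad
\hat{g} \circ \hat{f} = g \circ f
% i.e., \hat{f} need not equal f, only agree with it under composition.
```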
Really interesting paper! The idea of learning a composition is a very relevant one, and another domain in which this problem occurs is software engineering. For example, in the paper “Gray-Box Learning of Serial Compositions of Mealy Machines” by Andreas Abel and Jan Reineke, they learn the composition of two Mealy machines (a specific type of deterministic transducer). It is a different application domain and they make very different assumptions. I just wanted to mention this to give broader context to your work and show its relevance elsewhere.