Phillip Burness & Kevin McMullin
Link to the paper: here
Abstract: Work on the learnability of tier-based string-to-string functions has so far been limited to those that operate over a single tier. Such functions are, however, generally incapable of modelling multiple simultaneous processes. It is relatively easy to define a class of multi-tiered functions that can handle more than one process at a time, but doing so has negative consequences for learnability. Namely, we conjecture that it is difficult, if not impossible, to efficiently learn any arbitrary set of tiers from positive data alone. We thus describe the strongly target-specified subclass of multi-tiered functions, whose tiers can be efficiently identified from positive data. In these functions, each input element is associated with a single tier that on its own can fully determine what the element is mapped to. When the tiers act independently in this way, we can learn them in isolation from each other. A transducer representation of the target function can then be constructed using the discovered tiers.
Questions and comments can be posted in the comment section below.
Thanks for the interesting paper! I’m a little worried about the complexity bounds for finding the tier. Do you have a sense of whether those can be improved upon?
A related question is how this algorithm compares to OSTIA on the kinds of data and processes you are considering. OSTIA runs in cubic time in the size of the data and can learn any subsequential function. The functions you are considering are subsequential, so what is the benefit of finding a tier in O(n^4) or O(n^5) time in the size of the data? The ISL and OSL learning algorithms of Chandlee et al. had quadratic-time guarantees, so that was a reason to use them instead of OSTIA if one had a priori knowledge that the target function belonged to those classes.
I also wonder to what extent the ideas on tier identification for formal languages (not functions) in Lambert (this conference) could help facilitate tier identification in your work.
Finally, another interesting comparison to make, in addition to OMTSL_2 versus OSTIA, would be to take a look at SP functions (which I think you also have work on).
Thank you for these great questions!
The main goal for this research was not necessarily to beat OSTIA at learning the relevant functions, but to see whether their tiers could be identified efficiently at all. As you rightfully point out, though, the increase from O(n^3) to O(n^5) when compared to OSTIA is disappointingly large. The bulk of the time complexity comes from (1) calculating relevant contributions and (2) extracting information about the prefix function. Kevin and I implement both of these processes by effectively doing nested reads of the sample, but it may be possible to obviate this problematic nesting by using an onward Prefix Tree Transducer (PTT). I suspect that passing the sample once through its onward PTT could (with the right heuristics) extract the contributions necessary to determine the safe/unsafe status of a hypothesized tier element. If this turns out to be the case, the time complexity of tier discovery would reduce to the time complexity of building the onward PTT, which is O(n^2) in the size of the sample.
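To make the PTT idea concrete, here is a minimal Python sketch of the onward construction, assuming the sample is a list of (input, output) string pairs. The representation and names (build_onward_ptt, lcp) are my own illustration rather than the algorithm from the paper: states are input prefixes, and onwardness comes from pushing the longest common prefix of each state’s outgoing and final outputs toward the root in a single post-order pass.

```python
def lcp(strings):
    """Longest common prefix of a collection of strings."""
    strings = list(strings)
    if not strings:
        return ""
    lo, hi = min(strings), max(strings)
    i = 0
    while i < len(lo) and i < len(hi) and lo[i] == hi[i]:
        i += 1
    return lo[:i]

def build_onward_ptt(sample):
    """Build an onward prefix tree transducer from (input, output) pairs.

    States are input prefixes. trans[q][a] = [out, q'] means that reading
    symbol a in state q emits the string out and moves to state q';
    state_out[q] is the string emitted if the input ends at q (None if the
    sample never ends there). Onwardness pushes every output as close to
    the root as the sample allows.
    """
    trans = {"": {}}          # state -> {symbol: [output, next state]}
    state_out = {"": None}    # state -> final output, or None

    # 1. Plain prefix tree: one state per input prefix, outputs at leaves.
    for x, y in sample:
        q = ""
        for a in x:
            nxt = q + a
            if nxt not in trans:
                trans[nxt] = {}
                state_out[nxt] = None
                trans[q][a] = ["", nxt]
            q = nxt
        state_out[q] = y

    # 2. Onward pass: post-order, factor the longest common prefix of a
    #    state's outgoing and final outputs onto the edge entering it.
    def push(q):
        for a, (out, nxt) in list(trans[q].items()):
            trans[q][a][0] = out + push(nxt)
        outs = [out for out, _ in trans[q].values()]
        if state_out[q] is not None:
            outs.append(state_out[q])
        common = lcp(outs) if q != "" else ""   # nothing enters the root
        if common:
            for a in trans[q]:
                trans[q][a][0] = trans[q][a][0][len(common):]
            if state_out[q] is not None:
                state_out[q] = state_out[q][len(common):]
        return common

    push("")
    return trans, state_out

# Toy demo: after onwarding, reading "a" immediately emits the shared "x".
trans, state_out = build_onward_ptt([("ab", "xy"), ("ac", "xz")])
assert trans[""]["a"][0] == "x"
```

On a toy sample like this, the shared output material is committed as early as the sample permits, which is exactly the kind of information I suspect the right heuristics could read contributions off of in a single pass.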
While I don’t yet know what Lambert’s augmented subsequences would look like in the function case, adapting them to functions is certainly a promising direction. In particular, I think doing so could help with learning the tiers of (M)TSL_k functions for k > 2, which evade the existing algorithms.
Kevin and I are indeed working on SP functions, and the SP_2 functions can handle most of the same patterns as the strongly target-specified MTSL_2 functions, provided there are no “blocking” effects. Learning an SP function requires only the transducer-building step, and modifications of the ISL/OSL algorithms can achieve this while maintaining O(n^2) time complexity, so by virtue of not requiring a tier-discovery step, SP learning is more straightforward than (M)TSL learning. The issue for learning SP functions is that their transducers are exponential in the size of the alphabet, which makes the amount of necessary data impractical in most cases, since the ISL/OSL algorithms require an amount of data quadratic in the size of the target transducer. We haven’t yet found a good way of factoring these transducers into a group of smaller machines (as is done in some work on SP languages and their acceptors), but finding the right such method could make SP function learning more practical than MTSL function learning when the pattern belongs to both classes.
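To illustrate the blow-up concretely, here is a small Python sketch, again with names and a toy process of my own invention, of the canonical transducer for an SP_2 function. Since each symbol’s output may depend on the set of symbols that precede it anywhere in the input, the states are the subsets of the alphabet, so the machine has 2^|Σ| states.

```python
from itertools import chain, combinations

def sp2_transducer(alphabet, rewrite):
    """Canonical subsequential transducer for an SP_2 function.

    rewrite(a, seen) gives the output of symbol a when the set of symbols
    preceding it anywhere in the input is seen. States track exactly that
    set, so there are 2^|alphabet| of them -- the blow-up discussed above.
    """
    def powerset(xs):
        xs = list(xs)
        return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

    states = [frozenset(q) for q in powerset(alphabet)]
    # delta[(state, symbol)] = (output string, next state)
    delta = {(q, a): (rewrite(a, q), q | {a}) for q in states for a in alphabet}
    return frozenset(), delta

def run(transducer, word):
    """Transduce an input string, concatenating the emitted outputs."""
    q, delta = transducer
    out = []
    for a in word:
        y, q = delta[(q, a)]
        out.append(y)
    return "".join(out)

# Hypothetical long-distance process: "s" surfaces as "sh" whenever an "h"
# occurs anywhere earlier in the word (unbounded, with no blocking).
t = sp2_transducer("shao", lambda a, seen: "sh" if a == "s" and "h" in seen else a)
assert run(t, "hasa") == "hasha"
assert run(t, "sasa") == "sasa"
```

Even this four-symbol toy alphabet yields 16 states; factoring the machine would presumably mean replacing the single seen-set with several small machines, each tracking only the symbols relevant to one subsequence.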
Thanks. I would love to chat with you about how to factor SP functions!