Beyond Chomsky normal form: Extending Strong learning algorithms for PCFGs

Alexander Clark

Video:

Link to the paper: here

Abstract: We extend a recent consistent strong learning algorithm for a subclass of probabilistic context-free grammars in Chomsky normal form, [Clark and Fijalkow, 2020], to a much larger class of grammars by weakening the normal form constraints to allow for a richer class of derivation structures that are not necessarily binary branching, including among other possibilities unary branching trees which introduce some technical difficulties. By modifying the structural conditions appropriately we obtain
an algorithm which is computationally efficient, and a consistent estimator for the class of grammars defined.

Any question or comment can be made using the comment section of the page below.

3 Comments

My question is likely a bit naive due to my lack of knowledge in linguistic: How compatible is the anchoring restriction with the current understanding of natural languages? Your examples are quite convincing, but intuitively we may imagine that some abstract categories exist that do not admit a corresponding terminal.
On the same line of idea, is the local non ambiguity compatible with commonly share understanding of natural languages?

Alexsays:

Posted on 23 August 2021 at 3 h 48 min

Yes I think anchoring is too strong in general. I think one problem is that some categories done have any yields of length 1. Local unambiguity seems ok though ( I think)

I have the same kind of question as Rémi that I would like to complete from the formal language point of view: do you have any intuitions about the class of languages that can be represented, for instance in relation to substitutable languages (forgetting weights)?

Comments are closed.