Learning DFAs by Evolving Short Sequences of Merges

Kristian Guillaumier and John Abela

Video:

Link to the paper: here

Abstract: The grammatical inference community has been studying evolutionary methods for DFA learning for almost three decades. These methods typically operate by learning a representation of the target DFA either as a partitioning the states of a prefix tree acceptor or as an encoding of its transition matrix. In this paper, we present an alternative approach for learning random DFAs over binary alphabets from sparse training data. We first conducted several experiments on thousands of problem instances to study their behaviour and to better understand the conditions under which state merging algorithms succeed or fail. Motivated by these observations, we implemented an evolutionary algorithm in which the chromosomes encode short sequences of merges selected from a subset of high state reduction merges. The fitness of a chromosome is then measured by extending it using the EDSM heuristic and the size of the final hypothesis is used to score the entire sequence. To improve runtime performance, we use a method that can reliably estimate the fitness of a sequence of merges without extending it completely. We use the state-of-the-art EDSM algorithm as a baseline to compare our results to and observe that we can find low-error hypotheses or the exact target DFAs with a considerably higher likelihood.

Any question or comment can be made using the comment section of the page below.

2 Comments

You observed it, but do you think that you could also prove that there is always a n+1 state merging path from APTA to target DFA? What would you guess to be then the required conditions (structural completeness obviously, and maybe some characteristic set properties to ensure uniqueness…) ?

Hello Francois,

You are absolutely right. Empirical observation is not a proof. We have observed that if we colour the states of the target a DFA and then colour also the states of the APTA using the colours from the target DFA then we can run EDSM, or any other meta heuristic, and be able to determine when, and perhaps why, EDSM works and, perhaps more importantly, when and why it fails.

When each state in the APTA is coloured and fully labeled (accepting or non-accepting) then EDSM always (observed over 10’s of thousands of experiments) finds the target DFA in n+1 merges.

Therefore, proving that there is always a n+1 state merging path from the APTS to the DFA is equivalent to showing that EDSM will always find the target DFA from a coloured, and fully labelled, APTA. Of course, it is assumed that the training set is structurally complete.

We do not have a proof yet but we have been busy with other things. Kris and I had a long chat today and we will be looking into finding a proof for this result in the next weeks.

Regards,

John

Comments are closed.