NFA = Regular Expressions, part 1
other posts

theory

An equivalence between a class of languages and a class of machines.

This post depends on earlier posts about NFAs and Regular Expressions. It will not make sense without them.

# From Regular Expression to NFA

The following process creates an NFA that accepts any string from the language defined by a given regular expression. I’ll describe it informally, since the formal version is kind of long.

Work backwards from the end of the regular expression. We start with just a single accepting state in the NFA. We’ll keep track of which nodes are “‍live‍”; initially this is the single accepting state.

• When we read an alphabet character such as `a` we’ll add a new node which has an edge labeled `a` pointing to each live node, and then make it the only live node.
• When we read a `?` we’ll add what we normally would for the character before it but keep all of the current live nodes as well as the new ones.
• When we read a `)` we’ll create an entire NFA for the stuff inside the parentheses, except we’ll use our current live nodes instead of the single accepting state.
• When we read a `|` we add any current live nodes to a set of start nodes and reset the live nodes to it’s initial state.
• When we read a star `*` we’ll add what we normally would for the character before it but we’ll make it’s final nodes also have edges to it’s first nodes and we’ll keep all of the current live nodes as well as the new ones.

When we’re all done we add the live nodes to a set of starting nodes.

That’s it. All that’s left is to show an example. Let’s use `a(b*c|d)ef?g`, which we’ll read backwards: `g`, `f?`, `e`, `d)`, `|`, `c`, `b*`, `(` (which takes two steps in the image), `a`, and a final step to finish.

There are many other ways to create an NFA from a regular expression; many texts present the “‍gadget‍” technique instead where each symbol has some set of nodes connected with “‍epsilon transitions‍”, or edges that don’t require any input to follow. I’ve always found the technique I’ve presented much simpler. Either way, the purpose remains clear: given a regular expression we can create an NFA that recognizes its language. And, since we talked recently about how to turn an NFA into a DFA, we can create a DFA too.

In part 2 we’ll see that we can go the other direction as well.