A finite set of states Q 5 {q A subset of Q called the final states. An alphabet S 5{x A transition function d associating each pair of state and letter with a state:

cepted. To get into state z, it is necessary for the string to have the letter b in it. As soon as a b is encountered in the input string, the FA jumps immediately to state z no matter what state it was in before. Once in state z, it is impossible to leave. When the input string runs out, the FA will still be in state z, leading to acceptance of the string. The FA above will accept all strings that have the letter b in them and no other strings. Therefore, the langua ge associated with or accepted by this FA is the one defined by the regular expression a 1 bba 1 b The list of transition rules can grow very long. It is much simpler to summarize them in a table format. Each row of the table is the name of one of the states in the FA, and each col- umn of the table is a letter of the input alphabet. The entries inside the table are the new states that the FA moves into — the transition states. The transition table for the FA we have described is a b Start x y z y x z Final z z z We have also indicated along the left side which states are start and final states. This table has all the information necessary to define an FA. Instead of the lengthy description of the meaning of motion between states caused by input letters, FAs could simply and equivalently have been defined as static transition tables. Any table of the form a b x • • y • • z • • in which the dots are filled with the letters x, y, and z in any fashion, and which specifies the start state and the final states, will be an FA. Similarly, every three-state FA corresponds to such a table. Even though it is no more than a table of symbols, we consider an FA to be a machine, that is, we understand that this FA has dynamic capabilities. It moves. It processes input. Something goes from state to state as the input is read in and executed. We may imagine that the state we are in at any given time is lit up and the others are dark. An FA running on an input string then looks like a pinball machine in operation. We may make the definition of FAs even more mat h e m at i c a l ly ab s t ract with no gre ater precision and decreased understanding by replacing the transition table with a total function whose input is a pair of state and alphabet letter and whose output is a sin- gle stat e. This function is called the transition function, u s u a l ly denoted d l owe rc a s e G reek delta for reasons lost to computer historians. The ab s t ract definition of an FA is then:

1. A finite set of states Q 5 {q

q 1 q 2 . . .} of which q is the start state.

2. A subset of Q called the final states.

3. An alphabet S 5{x

1 x 2 x 3 . . .}. 12479r CohenWiley Yet Another Method for Defining Languages 55

4. A transition function d associating each pair of state and letter with a state:

dq i , x j 5 x k We shall never refer to this transition function again in this volume. From the table format, it is hard to see the moving parts. There is a pictorial representa- tion of an FA that gives us more of a feel for the motion. We begin by representing each state by a small circle drawn on a sheet of paper. From each state, we draw arrows showing to which other states the different letters of the input alphabet will lead us. We label these ar- rows with the corresponding alphabet letters. If a certain letter makes a state go back to itself, we indicate this by an arrow that re- turns to the same circle — this arrow is called a loop. We can indicate the start state by label - ing it with the word “start” or by a minus sign, and the final states by labeling them with the word “final” or plus signs. Notice that some states are neither 2 nor 1. The machine we have already defined by the transition list and the transition table can be depicted by the transition diagram 12479r CohenWiley CHAPTER 5 Finite Automata 56 Sometimes, a start state is indicated by an arrow and a final state by drawing a box or an- other circle around its circle. The minus and plus signs, when employed, are drawn inside or outside the state circles. This machine can also be depicted as Every input string can be interpreted as traversing a path beginning at the start state and moving among the states perhaps visiting the same state many times and finally settling in some particular rest state. If it is a final state, then the path has ended in success. The letters of the input string dictate the directions of travel. They are the directions and the fuel needed for motion. When we are out of letters, we must stop. Let us look at this machine again and at the paths generated by the input strings aaaabba and bbaabbbb. When we depict an FA as circles and arrows, we say that we have drawn a directed graph. Graph theory is an exciting subject in its own right, but for our purposes there is no real need to understand directed graphs in any deeper sense than as a collection of circles and arrows. We borrow from graph theory the name directed edge, or simply edge, for the arrow between states. An edge comes from one state and leads to another or the same, if it is a loop. Every state has as many outgoing edges as there are letters in the alphabet. It is possible for a state to have no incoming edges or to have many. There are machines for which it is not necessary to give the states specific names. For example, the FA we have been dealing with so far can be represented simply as 12479r CohenWiley Yet Another Method for Defining Languages 57 Even though we do not have names for the states, we can still determine whether a par- ticular input string is accepted by this machine. We start at the minus sign and proceed along the indicated edges until we are out of input letters. If we are then at a plus sign, we accept the word; if not, we reject it as not being part of the language of the machine. Let us consider some more simple examples of FAs. EXAMPLE 12479r CohenWiley CHAPTER 5 Finite Automata 58 In the picture above, we have drawn one edge from the state on the right back into itself and given this loop the two labels a and b, separated by a comma, meaning that this is the path traveled if either letter is read. We save ourselves from drawing a second loop edge. We could have used the same convention to eliminate the need for two edges running from the minus state to the plus state. We could have replaced these with one edge with the label a, b, but we did not. At first glance, it looks as if this machine accepts everything. The first letter of the input takes us to the right-hand state and, once there, we are trapped forever. When the input string runs out, there we are in the correct final state. This description, however, omits the possibility that the input is the null string L. If the input string is the null string, we are left in the left-hand state, and we never get to the final state. There is a small problem about understanding how it is possible for L ever to be an input string to an FA, because a string, by definition, is executed run by reading its letters one at a time. By convention, we shall say that L starts in the start state and then ends right there on all FAs. The language accepted by this machine is the set of all strings except L. This has the regular expression definitions a 1 ba 1 b 5 a 1 b 1 ■ EXAMPLE One of the many FAs that accepts all words is H e re, the sign 6 means that the same state is both a start and a final stat e. Because there is o n ly one state and no matter wh at happens we must stay there, the language for this machine is a 1 b ■ Similarly, there are FAs that accept no language. These are of two types: FAs that have no final states, such as and FAs in which the circles that represent the final states cannot be reached from the start state. This may be either because the picture is in two separate components as with 12479r CohenWiley FA S and Their Languages 59 in this case , we say that the graph is disconnected, or for a reason such as that shown be- low: We consider these examples again in Chapter 11. FA S AND THEIR LANGUAGES It is possible to look at the world of FAs in two ways. We could start with the machine and try to analyze it to see what language it accepts, or we could start with a desired language in our mind and try to construct an FA that would act as a language-recognizer or language- definer. Needless to say, in real life we seldom discover an FA falling out of a cereal box or etched onto a mummy’s sarcophagus; it is usually our desire to construct an FA from scratch for the precise purpose of acting as a language-recognizer for a specific language for whic h we were looking for a practical algorithmic definition. When a language is defined by a regular expression, it is easy to produce some arbitrary words that are in the language by making a set of choices for the meaning of the pluses and stars, but it is harder to recognize whether a given string of letters is or is not in the language defined by the expression. The situation with an FA is just the opposite. If we are given a specific string, we can decide by an algorithmic procedure whether or not it is in the lan- guage defined by the machine — just run it and see if the path it determines ends in a final state. On the other hand, given a language defined by an FA, it is not so easy to write down a bunch of words that we know in advance the machine will accept. Therefore, we must practice studying FA from two different angles: Given a language, can we build a machine for it, and given a machine, can we deduce its language? EXAMPLE Let us build a machine that accepts the language of all words over the alphabet {a b} with an even number of letters. We can start our considerations with a human algorithm for iden- tifying all these words. One method is to run our finger across the string from left to right and count the number of letters as we go. When we reach the end of the string, we examine the total and we know right away whether the string is in the language or not. This may be the way a mathematician would approach the problem, but it is not how a computer scientist would solve it. Because we are not interested in what the exact length of the string is, this number r epresents extraneous information gathered at the cost of needlessly many calcula- tions. A good programmer would employ instead what is called a Boolean flag; let us call it E for even. If the number of letters read so far is indeed even, then E should have the value TRUE. If the number of letters read is not even, then E should have the value FALSE. Ini- tially, we set E equal to TRUE, and every time we read a letter, we reverse the value of E un- til we have exhausted the input string. When the input letters have run out, we check the value of E. If it is TRUE, then the input string is in the language; if false, it is not. The program looks something like this: set E 5 TRUE while not out of data do read an input letter E becomes notE if E 5 TRUE, accept the input string else reject the string Because the computer employs only one storage location in the processing of this program and that location can contain only one of two different values, the finite automaton for this language should require only two states: State 1 E is TRUE; this is the start state and the accept or final state. State 2 E is FALSE. Every time an input letter is read, whether it is an a or a b, the state of the FA changes. This machine is pictured below: 12479r CohenWiley long CHAPTER 5 Finite Automata 60 ■ EXAMPLE Suppose we want to build a finite automaton that accepts all the words in the language a a 1 b that is, all the strings that begin with the letter a. We start at state x and, if the first letter read is a b, we go to a dead-end state y. A “dead-end state” is an informal way of describing a state that no string can leave once it has entered. If the first letter is an a, we go to the dead- end state z, where z is a final state. The machine looks like this: The same language may be accepted by a four-state machine, as below: 12479r CohenWiley long FA S and Their Languages 61 Only the word a ends in the first 1 state. All other words starting with an a reach and finish in the second 1 state where they are accepted. This idea can be carried further to a five-state FA as below: ■ The examples ab ove are FAs that have more than one final stat e. From them, we can see that there is not a unique machine for a given language. We may then ask the ques- t i o n , “Is there always at least one FA that accepts each possible language? More pre- c i s e ly, if L is some language, is there necessari ly a machine of this type that accepts ex a c t ly the inputs in L, while fo rsaking all others ? ” We shall see short ly that this question is re l ated to the question, “Can all languages be rep resented by regular ex p re s s i o n s ? ” We shall prove, in Chapter 7, t h at eve ry language that can be accepted by an FA can be defined by a regular ex p ression and, c o nve rs e ly, eve ry language that can be defined by a regular ex p ression can be accepted by some FA. Howeve r, we shall see that there a re languages that are neither defi n able by a regular ex p ression nor accepted by an FA. Remember, for a language to be t h e l a n g u age accepted by an FA means not only that all the wo rds in the language run to final stat e s , but also that no strings not in the l a n g u age do. Let us consider some more examples of FAs. EXAMPLE Let us contemplate the possibility of building an FA that accepts all words containing a triple letter, either aaa or bbb, and only those words. The machine must have a start state. From the start state, it must have a path of three edges, with no loop, to accept the word aaa. Therefore, we begin our machine with 12479r CohenWiley CHAPTER 5 Finite Automata 62 For similar re a s o n s , we can deduce that there must be a path for bbb, t h at has no loop, and uses entire ly diffe rent states. If the b- p ath shared any of the same states as the a- p at h , we could mix a’s and b’s and mistake n ly get to 1 a ny way. We need only two a dditional states because the paths could share the same final state without a pro bl e m , as b e l ow : If we are moving anywhere along the a-path and we read a b before the third a, we jump to the b-path in progress and vice versa. The whole FA then looks like this: We can understand the language and functioning of this FA because we have seen how it was built. If we had started with the final picture and tried to interpret its meaning, we would be sailing uncharted waters. ■ EXAMPLE Consider the FA pictured below: 12479r CohenWiley FA S and Their Languages 63 Before we begin to examine what language this machine accepts, let us trace the paths asso- ciated with some specific input strings. Let us input the string ababa. We begin at the start state 1. The first letter is an a, so it takes us to state 2. From there the next letter, b, takes us to state 3. The next letter, a, then takes us back to state 2. The fourth letter is a b and that takes us to state 3 again. The last letter is an a that returns us to state 2 where we end. State 2 is not a final state no 1, so this word is not accepted. Let us trace the wo rd b abbb. As alway s , we start in state 1. The first letter, b, t a kes us to s t ate 3. An a then takes us to state 2. The third letter, b, t a kes us back to state 3. Now another b t a kes us to state 4. Once in state 4, we cannot get out no matter wh at the rest of the string is. Once in state 4, we must stay in state 4, and because that is the final stat e, the string is accep t e d. There are two ways to get to state 4 in this FA. One is from state 2, and the other is from state 3. The only way to get to state 2 is by reading the input letter a while in either state 1 or state 3. So when we are in state 2, we know we have just read an a. If we read another a immediately, we go straight to state 4. It is a similar situation with state 3. To get to state 3, we need to read a b. Once in state 3, if we read another b immediately, we go to state 4; oth- erwise, we go to state 2. Whenever we encounter the substring aa in an input string, the fi rst a must take us to state 4 or 2. Either way, the next a takes us to state 4. The situation with bb is analogous. If we are in any of the four states 1, 2, 3, or 4 and we read two a’s, we end up in state 4. If we are in any state and read two b’s, we end up in state 4. State 4, once entered, cannot be left. To end in state 4, we must read a double letter. In summary, the words accepted by this machine are exactly those strings that have a double letter in them. This language, as we have seen, can also be defined by the regular ex- pression a 1 baa 1 bba 1 b The four states in this machine can be characterized by the purposes they serve: State 1 Start here but do not get too comfortable; you are going to leave immediately. State 2 We have just read an a that was not preceded by an a and we are looking for a second a as the next input. State 3 We have just read a b that was not preceded by a b and we are looking for a second b as the next input. State 4 We have already discovered the existence of a double letter in the input string and we are going to wait out the rest of the input sequence and then announce acceptance when it is all over. In this characterization, if we read a b while in state 2, we go to state 3, hoping for another b, whereas if we read an a in state 3, we go to state 2, hoping for a baby a. ■ EXAMPLE Let us consider the FA pictured below: 12479r CohenWiley CHAPTER 5 Finite Automata 64 This machine will accept all words with b as the third letter and reject all other words. States 1 and 2 are only waiting states eating up the first two letters of input. Then comes the deci- sion at state 3. A word that has fewer than three letters cannot qualify, and its path ends in one of the first three states, none of which is designated 1. Once we get to state 3, only the low road leads to acceptance. Some regular expressions that define this language are aab 1 abb 1 bab 1 bbba 1 b and a 1 ba 1 bba 1 b 5a 1 b 2 b a 1 b Notice that this last formula is not, strictly speaking, a regular expression, because it uses the symbol 2, which is not included in the kit. ■ EXAMPLE Let us consider a very specialized FA, one that accepts only the word baa: Starting at the start state, anything but the sequence baa will drop down into the collecting bucket at the bottom, never to be seen again. Even the word baabb will fail. It will reach the final state marked with a 1, but then the next letter will suicide over the edge. The language accepted by this FA is L 5 {baa} ■ EXAMPLE The FA below accepts exactly the two strings baa and ab: 12479r CohenWiley FA S and Their Languages 65 Big machine, small language. ■ EXAMPLE Let us take a trickier example. Consider the FA shown below: What is the language accepted by this machine? We start at state 1, and if we are read- ing a word starting with an a, we go straight to the final state 3. We can stay at state 3 as long as we continue to read only a’s. Therefore, all words of the form aa are accepted by this machine. What if we began with some a’s that take us to state 3 but then we read a b? This then transports us to state 2. To get back to the final state, we must pro- ceed to state 4 and then to state 3. These trips require two more b’s to be read as input. No- tice that in states 2, 3, and 4 all a’s that are read are ignored. Only b’s cause a change of state. Recapitulating what we know: If an input string begins with an a and then has some b’s, it must have 3 b’s to return us to state 3, or 6 b’s to make the trip state 2, state 4, state 3 twice, or 9 b’s, or 12 b’s and so on. In other words, an input string starting with an a and having a total number of b’s divisible by 3 will be accepted. If it starts with an a and has a total number of b’s not divisible by 3, then the input is rejected because its path through the machine ends at state 2 or 4. What happens to an input string that begins with a b? It finds itself in state 2 and needs two more b’s to get to state 3 these b’s can be separated by any number of a’s. Once in state 3, it needs no more b’s, or three more b’s, or six more b’s, and so on. All in all, an input string, whether beginning with an a or a b, must have a total number of b’s divisible by 3 to be accepted. It is also clear that any string meeting this requirement will reach the final state. The language accepted by this machine can be defined by the regular expression a abababaa 1 abababa The only purpose for the last factor is to guarantee that L is not a possibility because it is not accepted by the machine. If we did not mind L being included in the language, we could have used this simpler FA: 12479r CohenWiley long CHAPTER 5 Finite Automata 66 The regular expression a 1 babab 1 also defines the original non- L language, whereas the regular expression abababa defines the language of the second machine. ■ EXAMPLE The following FA accepts only the word L: Notice that the left state is both a start and a final state. All words other than L go to the right state and stay there. ■ EXAMPLE Consider the following FA: 12479r CohenWiley long FA S and Their Languages 67 No matter which state we are in, when we read an a, we go to the right-hand state, and when we read a b, we go to the left-hand state. Any input string that ends in the 1 state must end in the letter a, and any string ending in a must end in 1. Therefore, the language ac- cepted by this machine is a 1 ba ■ EXAMPLE The language in the example above does not include L. If we add L, we get the language of all words that do not end in b. This is accepted by the FA below: ■ EXAMPLE Consider the following FA: The only letter that causes motion between the states is a; b’s leave the machine in the same state. We start at 2. If we read a first a, we go to 1. A second a takes us back. A third a takes us to 1 again. We end at 1 after the first, third, fifth, seventh, . . . a. The language accepted by this machine is all words with an odd number of a’s, which could also be de- fined by the regular expression b ababab ■ EXAMPLE Consider the following FA: 12479r CohenWiley CHAPTER 5 Finite Automata 68 This machine will accept the language of all words with a double a in them somewhere. We stay in the start state until we read our first a. This moves us to the middle state. If the very next letter is another a, we move to the 1 state, where we must stay and eventually be accepted. If the next letter is a b, however, we go back to 2 to wait for the next a. We can identify the purposes that these states serve in the machine as follows: Start state The previous input letter if there was one was not an a. Middle state We have just read an a that was not preceded by an a. Final state We have already encountered a double a and we are going to sit here un- til the input is exhausted. Clearly, if we are in the start state and we read an a, we go to the middle state, but if we read a b, we stay in the start state. When in the middle state , an a sends us to nirvana, where ultimate acceptance awaits us, whereas a b sends us back to start, hoping for the first a of a double letter. The language accepted by this machine can also be defined by the regular expression a 1 baaa 1 b ■ EXAMPLE The following FA accepts all words that have different first and last letters. If the word be- gins with an a, to be accepted it must end with a b and vice versa. If we start with an a, we take the high road and jump back and forth between the two top states ending on the right at 1 only if the last letter read is a b. If the first letter read is a b, we go south. Here, we get to the 1 on the bottom only when we read a as the last letter. This can be better understood by examining the path through the FA of the input string aabbaabb, as shown below: 12479r CohenWiley EVEN-EVEN Revisited 69 It will be useful for us to consider this FA as having a primitive memory device. For the top two states, no matter how much bouncing we do between them, remember that the first letter read from the input string was an a otherwise, we would never have gotten up here to begin with. For the bottom two states, remember that the first input letter was a b. Lower non 1 state The input started with a b and the last letter we have read from the input string is also a b. Lower 1 state The input started with a b and the last letter read so far is an a. ■ EVEN-EVEN REVISITED EXAMPLE As the next example of an FA in this chapter, let us consider the picture below: To process a string of letters, we start at state 1, which is in the upper left of the picture. Every time we encounter a letter a in the input string, we take an a train. There are four edges labeled a. All the edges marked a go either from one of the upper two states states 1 and 2 to one of the lower two states states 3 and 4, or else from one of the lower two states to one of the upper two states. If we are north and we read an a, we go south. If we are south and we read an a, we go north. The letter a reverses our updown status. What happens to a word that gets accepted and ends up back in state 1? Without know- ing anything else about the string, we can say that it must have had an even number of a’s in it. Every a that took us south was balanced by some a that took us back north. We crossed the Mason – Dixon line an even number of times, one for each a. So, every word in the lan- guage of this FA has an even number of a’s in it. Also, we can say that every input string with an even number of a’s will finish its path in the north state 1 or 2. Th e re is more that we can say about the wo rds that are accepted by this mach i n e. Th e re are four edges labeled b. Eve ry edge labeled b either takes us from one of the two s t ates on the left of the picture states 1 and 3 to one of the two states on the right stat e s 2 and 4, or else takes us from one of the two states on the right to one of the two states on the left. Eve ry b we encounter in the input is an eastwest reve rs e r. If the wo rd start s out in state 1, wh i ch is on the left, and ends up back in state 1 on the left, it must have c rossed the Mississippi an even number of times. Th e re fo re, all the wo rds in the language a c c epted by this FA have an even number of b’s as well as an even number of a’s. We can also say that eve ry input string with an even number of b’s will leave us in the west stat e 1 or 3. These are the only two conditions on the language. All words with an even number of a’s and an even number of b’s must return to state 1. All words that return to state 1 are in EVEN-EVEN. All words that end in state 2 have crossed the Mason – Dixon line an even number of times but have crossed the Mississippi an odd number of times; therefore, they have an even number of a’s and an odd number of b’s. All the words that end in state 3 have an even number of b’s but an odd number of a’s. All words that end in state 4 have an odd number of a’s and an odd number of b’s. So again, we see that all the EVEN-EVEN words must end in state 1 and be accepted. One regular expression for the language EVEN-EVEN was discussed in detail in the previous chapter. ■ Notice how much easier it is to understand the FA than the regular expression. Both methods of defining languages have ad vantages, depending on the desired application. But in a theory course we rarely consider applications except in the following example. EXAMPLE We are programmers hired to write a word processor. As part of this major program, we must build a subroutine that scans any given input string of English letters and spaces and lo- cates the first occurrence of the substring cat whether it is a word standing alone or part of a longer word such as abdicate. We envision the need for four states: State 1 We have not just read a c; this is the start state. State 2 The last letter read was a c. State 3 The last letter read was an a that came after a c. State 4 We have just encountered the substring cat and control of this program must transfer somewhere else. If we are in state 1 and read anything but a c, we stay there. In state 1 if we read a c, we go unconditionally to state 2. 12479r CohenWiley CHAPTER 5 Finite Automata 70 If we are in state 2 and we read an a, we go to state 3. If we read another c, we stay in state 2 because this other c may be the beginning of the substring cat. If we read anything else, we go back to state 1. If we are in state 3 and we read a t, then we go to state 4. If we read any other letter ex- cept c, we have to go back to state 1 and start all over again, but if we read a c, then we go to state 2 because this could be the start of something interesting. The machine looks like this: 12479r CohenWiley Problems 71 The input Boccaccio will go through the sequence of states 1-1-1-2-2-3-2-2-1-1 and the input will not be accepted. The input desiccate will go through the states: 1-1-1-1-1-2-3-4-4 and terminate which in this example is some form of acceptance before reading the final e. ■ PROBLEMS

1. Write out the transition tables for the FAs on pp. 56, 58 both, 63, 64, and 69 that were