Chair of Technical Information Systems
Department of Computer Science
Dresden University of Technology
D-01062 Dresden, Germany
http://www.iai.inf.tu-dresden.de/en/tis/index.html

Log-based State Machine Construction for
Analyzing Internal Logistics of Semiconductor
Equipment

Technical Report
by Volodymyr Vasyutynskyy

Based on:
Röder, A.; Vasyutynskyy, V.; Kabitzsch, K.; Zarbock, T.; Luhn, G.: Log-based State Machine
Construction for Analyzing Internal Logistics of Semiconductor Equipment.
Proc. MASM2005 International Conference on Modeling and Analysis of Semiconductor Manufacturing,
Singapore, October 2005, pp. 54-60.

Technical Report, 2005


André Röder (1), Volodymyr Vasyutynskyy (1), Klaus Kabitzsch (1), Thomas Zarbock (2), Gerhard Luhn (2)

(1) Dresden University of Technology, Faculty of Computer Science
D-01062 Dresden, Germany. E-mail: {aroeder, vv3, kk10}@inf.tu-dresden.de
(2) Infineon Technologies Dresden GmbH & Co. OHG
Königsbrücker Str. 180, D-01099 Dresden, Germany
E-mail: {Gerhard.Luhn,Thomas.Zarbock}@infineon.com

Abstract
For the evaluation of equipment productivity and for diagnosis purposes, state-transition models of equipment are necessary. In most
cases these models are not provided by equipment manufacturers and therefore must be constructed by analysts (process
engineers) based on log files. The construction of such models is a very time-consuming process that is impeded by the unknown
structure of the events in the logs. The approach presented in this paper aims to automate the process of model construction by
combining finite state machine generation with event text analysis. Some heuristics and user interaction are used to increase the
efficiency of the model generation process. The approach was tested on different equipment types at Infineon Dresden.


Section 1 Introduction
Today more than ever, high competitive pressure and a small profit per unit characterize the semiconductor
market. As production equipment is more expensive than in other industries, raising equipment
utilization is very important for manufacturers competing in this market to follow Moore’s Law. In fact, nowadays
equipment productivity is, after miniaturization, the second most significant factor in lowering production costs and
raising efficiency [1], and its potential is still growing.
To support semiconductor manufacturers in measuring equipment utilization, the Semiconductor Equipment and
Materials International organization (SEMI) released several standards defining productivity indicators. One of them
is the SEMI E79 standard [2], which introduces overall equipment efficiency (OEE) and intrinsic equipment efficiency (IEE).
While OEE calculation is already widely automated, there are still some obstacles to the automation of IEE
calculation. The major problem is the measurement of the “value-added in-process time”, which reflects the time
periods used for the processing steps performed by the equipment on one object (e.g., a wafer). These time
periods can be evaluated from the event log messages provided by the equipment. The events contain information about
changes of the inner states of each of the equipment modules, possibly along with the actual time stamp and the
identifier of the processed object. To calculate time periods from the event log, a state-transition model of the
equipment and the proper assignment of events to equipment states are required. Unfortunately, these models are normally
not provided by equipment manufacturers. Therefore, they have to be re-engineered by analysts on the equipment user’s
side. Another SEMI standard used in this context is SEMI E90 [3], which provides information needed for IEE
calculation. Both the events and their specification satisfy this standard only poorly and are subject to constant
change by most equipment suppliers, since SEMI E90 is relatively new. Manual inspections of equipment event logs at
Infineon Dresden showed the need for optimization of equipment-internal logistics (IEE improvement). As the
equipment-internal logistic problems are equipment specific, inspections have to be done for each equipment type.
Since manual log inspection is a very time-consuming process, an automatic construction of state-transition models
is required. Automating this task enables fast identification of logistic problems and provides a detailed basis for
tool-to-tool matching. The resulting state-transition models can also be used for diagnosis purposes [4].
Some recent works ([5], [6], [7]) used log files to generate state-transition models in the form of finite state
machines or Petri nets for applications in various domains such as software path analysis, data mining etc. However,
these works assumed that the event structure is known and did not take into account event attributes that may be
ambiguous. For semiconductor equipment, a standardized event protocol is absent not only between different
manufacturers but also within the same equipment type. As a consequence, event logs of different equipment types
or of different modules of the same equipment differ a lot, even if the tools provide equal functionality. In addition,
not all available events are enabled by default, causing gaps in the logs. Therefore, the analyst needs help in
investigating the event structure.
The method introduced in this paper facilitates the semi-automatic building of equipment state models on the
basis of given event logs for equipment modules, event types and their patterns. The method combines state machine
generation with the analysis of the event text structure with the help of the Levenshtein distance [8]. It also uses some
heuristics and user interaction to increase the analysis efficiency. It allows the user to analyze event logs of
semiconductor equipment in depth, resulting in rapid evaluation of productivity indicators for different types of
equipment. Further applications of the generated models are fast analysis of equipment-internal substrate movement
parameters (usage of equipment modules, process duration etc.), fault detection and classification, tool-to-tool
matching (best module, “golden tool” construction) etc. If test logs of equipment are available, even an evaluation of
the equipment before its purchase is possible. The generated models can also provide data for simulation tools. The
obtained models and performance characteristics can also be helpful for communication with equipment
manufacturers in case of trouble, since more precise situation descriptions can be delivered. The corresponding tool
ModelGenerator implements the approach. The results of the log analysis can be visualized in the tool in multiple
ways, supporting the analysts’ intuition. The algorithm was tested on a set of logs from four equipment types at the
Infineon Factory in Dresden.
The article is organized as follows. Section 2 describes the general algorithm of the model generation. In Section 3
the relevant steps of the algorithm are considered. The achieved results and conclusions are discussed in Sections 3
and 4 respectively.

Section 2 Method Description
Assumptions
First of all, some considerations about equipment construction and event logging have to be made. From the
logistical point of view, equipment consists of stations (modules) that transport, process or store substrates, e.g.,
wafers. Events are generated when certain actions have been performed or logging conditions have been fulfilled. The
event logs are permanently saved in operational databases. The individual events (records) include the following
attributes:
• Time stamp of event logging. The time format may differ for different equipment types. The granularity of the
time stamp should be sufficiently small to define the order of events. This condition is not always fulfilled.
• Module information like module name etc. These names are very often ambiguous. It may then be necessary
to compose lists of equivalents of modules and their names.
• Substrate information. To enable substrate tracking through the equipment, unique substrate names should be
provided. The minimum level of detail requested by the SEMI E90 standard is the substrate name providing
information on the substrate lot and the slot used by the substrate. However, in most cases equipment only
provides substrate names that consist of the name of the station through which the substrate’s transport box
entered the equipment and the slot name in the box. It is important to track the parallel moves of substrates
in the equipment; otherwise complicated and not always robust methods, such as those in [9], are required to
separate such concurrent processes.
• Event parameters that include information about process values like temperature, pressure, etc. As a rule, these
parameters have no influence on the construction of state-transition models, so they can be omitted
during model generation.
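As an illustration of the attributes listed above, the following minimal sketch shows one way an event record might be represented. The class name `LogEvent`, its field names and the helper `order_events` are hypothetical and do not come from the report. Falling back to the position in the log when time stamps are equal is an assumption made here to cope with insufficient time stamp granularity.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class LogEvent:
    """One log record; all field names are illustrative assumptions."""
    timestamp: datetime          # time stamp of event logging
    module: str                  # module (station) name as found in the log
    substrate: Optional[str]     # substrate name, if the event refers to one
    text: str                    # remaining event text, including parameters
    position: int = 0            # line position in the log (tie-breaker)


def order_events(events):
    """Sort by time stamp; equal time stamps keep their order in the log."""
    return sorted(events, key=lambda e: (e.timestamp, e.position))


if __name__ == "__main__":
    t = datetime(2005, 10, 1, 12, 0, 0)
    events = [
        LogEvent(t, "llb[1]", "PodB 22", "Loading Completed", position=2),
        LogEvent(t, "llb[1]", "PodB 22", "Loading Started", position=1),
    ]
    for event in order_events(events):
        print(event.position, event.text)
```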
The following assumptions are made about the event structure:
• The time format of the events has to be known. Usually the time format can easily be read from the log
manually. An automatic recognition of the time format is not provided, as the algorithm would need too much
background information.
• The level of detail of the logs must be sufficient for analysis purposes. Usually not all possible events are
enabled, since this would overload the communication channels and produce an unnecessary data
volume. Nevertheless, some types of events must always be enabled to support the determination of important
characteristics. In some cases, additional events can be switched on to test some modules of the equipment more
precisely.
• The events have to be separated from each other in a log. Usually, each event is placed on a new text line.
• Log files must be long enough to allow the complete coverage of all possible states. This length depends on
scheduling mechanisms. The log must include several substrate-processing cycles.
• The parameters of the events have to support the aggregation of events to event types. Therefore, they have to
provide a unique text pattern for each event type, i.e., some part of each event must give a hint about its event
type. This feature makes it possible to apply text analysis algorithms.
Another issue is the different views on the equipment. In most cases the analysis of equipment is driven by two
questions: “How long did a substrate spend where?” and “Which tasks were performed by the equipment modules
during the processing of a lot?”. As the equipment modules work in parallel, a model of the equipment should be
able to describe such parallel processes. Hence a hierarchical model of the equipment was chosen, which consists of two
types of submodels in the form of finite state machines (FSMs, see for instance [11]):
• Equipment model (substrate move model), representing the equipment from the substrate point of view. It
consists of the equipment modules and describes the internal structure of the equipment. It also describes the
paths on which substrate objects are routed through the equipment. Non-deterministic finite automata
(NFA) are used, where the equipment stations correspond to the states and the moves of the substrate objects
between them to the transitions of the NFA. These moves are identified by so-called move events. This model is
able to describe the parallel wafer moving processes. The moving route of a single wafer is a subset of this
model.
• Station model (module model), representing the order of events in a single station or module of the equipment.
It specifies the inner states and transitions inside one equipment component. Since only one substrate can be
processed in the station at a time, deterministic finite automata (DFA) are sufficient for this case.
The hierarchical model allows the adequate representation of the equipment and simplifies the model
generation, since the event log can be filtered for each submodel before generation. For the construction of
the equipment model only move events are used. If the construction of a station model is intended, the log is
filtered for events associated with this station. Figure 1 shows the equipment model of an AMAT Centura Etching
Tool along with a subordinated station model.

[Figure 1, left: equipment model with the stations LPA, LPB, pst, fia, fib, ori, lla[1], lla[2], llb[1], llb[2],
mfr[1], mfr[2], cha, chb, chc, chd. Right: station model of station llb[1] with the events Loading Started,
Loading Completed, Started Pump, Started Vent, Completed Pump, Completed Vent, Unloading Started,
Unloading Completed.]

Figure 1: The hierarchical model of equipment.

General Algorithm
According to the assumptions made, an algorithm for the semi-automatic construction of equipment models was
developed. The flow chart of model generation is shown in Figure 2.
[Figure 2, flow chart elements: Equipment Log; Event Log Import and Literal Extraction; Literal Dictionary;
Stations and Substrate Definition; Stations and Substrate Assignment; Events; Stations/Substrates Associated
Event History; Construction of Station Models; Station Model; Construction of Substrate Move Model;
Substrate Move Model; Visualisation.]

Figure 2: Flow chart of model generation.



The basic steps of the algorithm are as follows:
• Event log import. It is provided using standard database interfaces. Filtering and converting of log records may
be done in this step.
• Literal extraction. The text of each event is divided into character sequences, named literals, which are defined as
sequences of characters between two delimiters, e.g., blanks. The result is saved in a dictionary, which is
intended to help the analyst to identify the equipment module and substrate object names used in the event log. The
user can assign one or more literals to each equipment module or substrate object, since modules and objects
can be labeled ambiguously in a log.
• Assignment of events to modules and substrates. Based on the assignments of the previous step, the event log is
parsed again. Every log event is assigned to the equipment modules and substrate objects whose literals were
found in the event text.
• Aggregation of event types. A similarity recognition algorithm is used to aggregate event types from the events
of the given event log. The basic idea behind this step is that events of one type are equal in a certain sequence
of literals, called the stem. They may differ in other literals representing actual values provided by the event, e.g.,
wafer number, temperature etc. The similarity of the event to the event type is defined according to a modified
version of the Levenshtein distance [8].
• Construction of the equipment model. To build the equipment model, the algorithm extracts move events, which
always belong to two equipment modules and a substrate object.
• Construction of module models. The temporal order of the assigned event types in the log is analyzed to provide
more detailed information about the internal states of an equipment module. The analyst may disable irrelevant
types to support the algorithm. The result is a complete state-transition model of a module.
• Visualization. The results of log analysis are presented in a variety of visualizations.
The steps are discussed in the next section in detail.
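The module-model step listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the ModelGenerator implementation: each event type is assumed to mark the entry into a state named after it, the input is the already filtered, temporally ordered sequence of event types of one station, and the function name `build_station_model` and the `disabled` parameter are hypothetical.

```python
from collections import defaultdict


def build_station_model(event_types, disabled=frozenset()):
    """Derive transitions from the temporal order of event types.

    Returns {state: {next_state: count}}. Event types listed in
    `disabled` (irrelevant types switched off by the analyst) are skipped.
    """
    transitions = defaultdict(lambda: defaultdict(int))
    previous = None
    for event_type in event_types:
        if event_type in disabled:
            continue
        if previous is not None:
            transitions[previous][event_type] += 1
        previous = event_type
    return {state: dict(nexts) for state, nexts in transitions.items()}


if __name__ == "__main__":
    # Event type names taken from the station model in Figure 1.
    sequence = [
        "Loading Started", "Loading Completed",
        "Started Pump", "Completed Pump",
        "Unloading Started", "Unloading Completed",
        "Loading Started", "Loading Completed",
    ]
    for state, nexts in build_station_model(sequence).items():
        print(state, "->", nexts)
```

Counting how often each transition occurs is an addition of this sketch; it is not required for the model itself.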

Section 3 Model Generation
Preliminary analysis
Event Log Import and Literal Extraction
Before the identification of the event structure can start, the text of each event is divided into character sequences, or
literals, which are substrings separated by delimiters (usually blanks or semicolons). The delimiters have to be
specified by the analyst. The literals are the smallest building blocks of the event text that are taken into account.
Auxiliary characters such as numbers or symbols (commas, hyphens etc.) are ignored, since they do not provide relevant
information about the event type. Only literals that contain at least one character that was not ignored are then saved
in a literal dictionary.
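A minimal sketch of the literal extraction might look as follows. The function names are hypothetical, and the filtering rule is an interpretation of the description above: a token is kept as a literal only if it contains at least one alphabetic character, so tokens consisting solely of digits or symbols are dropped. How the original tool treats digits inside a kept literal is not stated, so they are left untouched here.

```python
import re
from collections import Counter


def extract_literals(event_text, delimiters=" ;"):
    """Split the event text at analyst-defined delimiters and keep
    only tokens that contain at least one alphabetic character."""
    pattern = "[" + re.escape(delimiters) + "]+"
    return [token for token in re.split(pattern, event_text)
            if any(ch.isalpha() for ch in token)]


def build_literal_dictionary(event_texts, delimiters=" ;"):
    """Count how often each literal occurs over all events of a log."""
    dictionary = Counter()
    for text in event_texts:
        dictionary.update(extract_literals(text, delimiters))
    return dictionary


if __name__ == "__main__":
    log = [
        "Started Pump ; 12 - llb[1]",
        "Completed Pump ; 13 - llb[1]",
    ]
    print(build_literal_dictionary(log).most_common())
```

The occurrence counts are an addition of this sketch; they could help the analyst spot frequently used module or substrate names in the dictionary.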

Station and Substrate Definition
The next problem to be solved is the ambiguous labeling of items (modules and substrates). To enable the
construction of an equipment model, all synonymous labels of a station or a substrate have to be associated with their
owner. Since the station and substrate terms in the log are not standardized, a user dialog is necessary to perform this
task. The assignment has to be done only once, when a new equipment type is analyzed or new events have been
switched on for logging in already analyzed equipment. Otherwise, the synonyms are simply mapped to the
stations or substrates on the basis of the available synonym dictionary while reading the log file.
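The mapping of synonyms to their owners can be pictured as a simple dictionary lookup. The sketch below assumes that the analyst's assignments are stored as a mapping from literal to canonical item name; the function name `resolve_items` and the synonym spellings in the example are invented for illustration.

```python
def resolve_items(literals, synonyms):
    """Map the literals of one event to canonical station or substrate
    names, using the analyst-defined synonym dictionary.

    Literals without an entry are ignored; each item is reported once,
    in the order of its first occurrence in the event.
    """
    found = []
    for literal in literals:
        item = synonyms.get(literal)
        if item is not None and item not in found:
            found.append(item)
    return found


if __name__ == "__main__":
    # Hypothetical synonym dictionary: two spellings for the same station.
    synonyms = {"llb[1]": "llb[1]", "LoadLockB1": "llb[1]", "cha": "cha"}
    print(resolve_items(["Moved", "LoadLockB1", "to", "cha"], synonyms))
```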

Equipment Model Construction
The equipment model represents the paths within the equipment on which substrate objects can move between the
equipment stations. The states of the model correspond to the stations. Move events represent the substrate
movements in an event log. If a move event exists that reflects the move of a substrate between two stations, the
station nodes in the model are linked in the appropriate direction. The model-constructing algorithm filters such
events from the event log and tracks the path of every substrate.
To enable the algorithm to distinguish the initial and the following station of a move event, the analyst has to define
one or more entry stations, through which substrates enter the equipment. This manual assignment is needed in this
step because in most cases substrate identifiers are not labeled uniquely. It simplifies the analysis, since
cycles are absent and the model building becomes a straightforward process of sequentially adding the stations
in their temporal order. Therefore, there is no need for special probabilistic algorithms as in [6], which depend
strongly on statistical properties of the log such as event probability. The information about entry stations is also
used to automatically subordinate the stations in the model. The entry stations are assigned to level zero. All stations
directly reachable from the entry stations are assigned to the next higher level and so on. This subordination results
in a more adequate presentation of the equipment model, as shown in Figure 3. The comparison with the equipment
model in Figure 1 shows the correctness of the generated model.
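The construction described above can be sketched as follows. This is an illustrative reading of the text, not the algorithm of ModelGenerator: move events are assumed to be available as tuples (substrate, station, station) in temporal order without a known direction, the first move of a substrate is assumed to start at an entry station, and events that do not fit the tracked position of a substrate are skipped. All function and variable names are hypothetical.

```python
from collections import defaultdict, deque


def build_equipment_model(move_events, entry_stations):
    """Track every substrate and count the directed moves between stations.

    Returns {(source, target): number_of_moves}.
    """
    edges = defaultdict(int)
    position = {}  # substrate -> station where it currently is
    for substrate, a, b in move_events:
        current = position.get(substrate)
        if current is None:
            # The first move of a substrate has to start at an entry station.
            if a in entry_stations:
                current = a
            elif b in entry_stations:
                current = b
            else:
                continue
        if current == a:
            target = b
        elif current == b:
            target = a
        else:
            continue  # event does not fit the tracked path; skipped here
        edges[(current, target)] += 1
        position[substrate] = target
    return dict(edges)


def assign_levels(edges, entry_stations):
    """Entry stations get level 0, directly reachable stations level 1, etc."""
    successors = defaultdict(set)
    for source, target in edges:
        successors[source].add(target)
    levels = {station: 0 for station in entry_stations}
    queue = deque(entry_stations)
    while queue:
        station = queue.popleft()
        for nxt in successors[station]:
            if nxt not in levels:
                levels[nxt] = levels[station] + 1
                queue.append(nxt)
    return levels


if __name__ == "__main__":
    moves = [("w1", "LPA", "fia"), ("w1", "lla[1]", "fia"),
             ("w1", "lla[1]", "cha"), ("w2", "fia", "LPA")]
    model = build_equipment_model(moves, {"LPA"})
    print(model)
    print(assign_levels(model, {"LPA"}))
```

The breadth-first traversal is one straightforward way to realize the level assignment; the report does not state which traversal is actually used.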

The analyst can edit the generated model to adapt the presentation to his or her preferences. The models can be saved
for use in later evaluation, e.g., for comparison with other equipment. In addition, statistical information about the
frequency of the usage of paths between two stations is obtained in this step.

Figure 3: Equipment model generated based on an event log with 2000
records (screenshot of ModelGenerator). Equipment: AMAT Centura
dry etching tool.

Aggregation of Events to Event-Types
The next problem is the unknown structure of the event text, which in most cases is formulated in sentences that can be
read by analysts. The parameter values are spread over the event text without a particular structure. For model
analysis an event type must be defined, which characterizes the basic meaning of a certain group of events. Events of
the same type refer to the same state transition within a station. They may differ in the parameter values or in the
referred substrates.
In the majority of cases the equipment manufacturers do not provide any description of the available event types.
Therefore, the event types have to be aggregated from the set of log event instances. This is achieved by comparing
events and grouping similar ones. The idea behind this comparison is that events of the same type are equal in some
literals, the so-called kernel of the event type. Similar events are grouped by using a kind of prefix tree
that contains the literals of the event text. Each literal position in the event is represented as a branch in the
event type. The literals of similar events are added as leaves to the branches of the event type stem. The
kernel of the event type is represented by the branches that have only one leaf, since
they are supposed to contain the unique attribute of the event type. The literals that contain the
parameter values, e.g., wafer parameters or temperature, correspond to the branches with more
than one leaf, because their values vary. An example of the reconstruction of the event type from a set of
events is shown in Figure 4.

Similar events:
Rcp mgr : Started recipe "P_AV_D11_385_B" on wafer "PodB 22"
Rcp mgr : Started recipe "P_AV_D11_385_B" on wafer "PodB 25"
Rcp mgr : Started recipe "P_AV_WARMUPb" on wafer "PodB 4"
Rcp mgr : Started recipe "P_AV_D11_385_B" on wafer "PodA 1"
Rcp mgr : Started recipe "P_AV_WARMUPb" on wafer "PodB 19"

Event type:
Kernel: Rcp mgr : Started recipe ... on wafer ...
Parameter (first varying branch): "P_AV_D11_385_B", "P_AV_WARMUPb"
Parameter (second varying branch): "PodB 22", "PodB 25", "PodB 4", ...

Figure 4: Reconstruction of the event type from five events.
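The reconstruction in Figure 4 can be reproduced with a small sketch in which an event type is a list of branches and each branch is the set of leaves seen at that literal position. Several points are assumptions of this illustration: quoted strings are treated as single literals (done here with `shlex.split`, which also strips the quotes), all events of one type are assumed to have the same number of literals, and the function names are hypothetical.

```python
import shlex

EVENTS = [
    'Rcp mgr : Started recipe "P_AV_D11_385_B" on wafer "PodB 22"',
    'Rcp mgr : Started recipe "P_AV_D11_385_B" on wafer "PodB 25"',
    'Rcp mgr : Started recipe "P_AV_WARMUPb" on wafer "PodB 4"',
    'Rcp mgr : Started recipe "P_AV_D11_385_B" on wafer "PodA 1"',
    'Rcp mgr : Started recipe "P_AV_WARMUPb" on wafer "PodB 19"',
]


def aggregate_event_type(events):
    """Build one event type: a list of branches, each a set of leaves."""
    event_type = []
    for text in events:
        literals = shlex.split(text)  # quoted strings become one literal
        if not event_type:
            event_type = [set() for _ in literals]
        if len(literals) != len(event_type):
            raise ValueError("this sketch assumes events of equal length")
        for branch, literal in zip(event_type, literals):
            branch.add(literal)
    return event_type


def split_kernel_and_parameters(event_type):
    """Branches with one leaf form the kernel, all others hold parameters."""
    kernel = [next(iter(b)) if len(b) == 1 else None for b in event_type]
    parameters = {i: sorted(b) for i, b in enumerate(event_type) if len(b) > 1}
    return kernel, parameters


if __name__ == "__main__":
    kernel, parameters = split_kernel_and_parameters(aggregate_event_type(EVENTS))
    print(kernel)
    print(parameters)
```

In the kernel list, `None` marks the positions of the parameter branches.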

Distance Function
The manual assignment of events to event types by the analyst is a very time-consuming and error-prone process. To
automate this process, similarity metrics for events and event types have to be defined. The Levenshtein distance ([8],
[10]) is used for this purpose, since it is both simple and general enough to cope with different types of
logs. In its classic form, the Levenshtein distance defines the similarity of two words as the number of elementary
operations on single symbols needed to transform one word into the other. To adapt this similarity metric to
the analysis of literals, the Levenshtein distance was slightly modified in the following way.
The distance between an event e and an event type E is defined as the number of literal modifications required to
transform the event type into the given event. Three elementary transformation operations are distinguished:
• Insert a literal of the event into a branch of the event type: insLeaf.
• Insert an empty literal into a branch of the event type: insNull.
• Insert a new branch into the event type: insBranch.

The distance between the event and the event type is calculated by assigning a cost to each elementary
transformation operation and summing the costs of all operations needed to modify the event type. The path with
minimal cost is calculated using the distance matrix with dimension i × j (see Figure 5), where:
• i = (number of literals in e) + 1
• j = (number of branches in E) + 1
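A possible reading of this distance calculation is sketched below as a dynamic program over a matrix of the stated dimension. Since Figure 5 is not reproduced here, the recurrence is an assumption: aligning a literal with a branch costs nothing if the literal is already a leaf of that branch and the insLeaf cost otherwise, a branch that receives no literal costs insNull, and a literal without a matching branch costs insBranch. The default cost values and the function name `event_distance` are placeholders, not values from the report.

```python
def event_distance(literals, branches, c_leaf=1.0, c_null=1.0, c_branch=2.0):
    """Distance between an event (list of literals) and an event type
    (list of branches, each a set of leaves); see assumptions above."""
    n, m = len(literals), len(branches)
    # Matrix with (number of literals + 1) rows and (number of branches + 1) columns.
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + c_branch      # literal needs a new branch
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + c_null        # branch gets an empty literal
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            leaf_cost = 0.0 if literals[i - 1] in branches[j - 1] else c_leaf
            d[i][j] = min(
                d[i - 1][j - 1] + leaf_cost,  # insLeaf (free if already a leaf)
                d[i][j - 1] + c_null,         # insNull
                d[i - 1][j] + c_branch,       # insBranch
            )
    return d[n][m]


if __name__ == "__main__":
    event_type = [{"Started"}, {"recipe"}, {"P_AV_D11_385_B", "P_AV_WARMUPb"}]
    print(event_distance(["Started", "recipe", "P_AV_WARMUPb"], event_type))
    print(event_distance(["Started", "recipe", "P_NEW"], event_type))
```

With such a function, an event could be compared against every known event type of a station and the result checked against the maximal allowed distance.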
The costs c_i of each operation type are given, as well as the maximal allowed distance D_max from the event to the
event type. The event types of a station are aggregated with the following procedure:
• For each event e of the station and every known event type E, the distance D(e, E) is calculated as
shown in Figure 5.
If D(e, E)