Machine Learning based Optical Proximity Correction Techniques

Research Article Current Issue • Versions 2 Vol 3 (4) : 20030408 2020

Pengpeng Yuan, Taian Fan, Yaobin Feng, Peng Xu, Yayi Wei

DOI: 10.33079/jomm.20030408

： 2020 - 10 - 09

： 2020 - 12 - 15

： 2020 - 12 - 30

4613 90 0

Abstract & Keywords

Abstract: The shrinking of the size of the advanced technological nodes brings up new challenges to the semiconductor manufacturing community. The optical proximity correction (OPC) is invented to reduce the errors of the lithographic process. The conventional OPC techniques rely on the empirical models and optimization methods of iterative type. Both the accuracy and computing speed of the existing OPC techniques need to be improved to fulfill the stringent requirement of the research and design for latest technological nodes. The emergence of machine learning technologies inspires novel OPC algorithms. More accurate forward simulation of the lithographic process and single turn optimization methods are enabled by the machine learning based OPC techniques. We discuss the latest progress made by the OPC community in the process simulation and optimization based on machine learning techniques.

Keywords: optical proximity correction; machine learning; deep learning; lithography

1. Introduction

Optical proximity correction (OPC) becomes critical for the process of current advanced technological nodes. The conventional methods of the optical proximity correction rely on the empirical rules or the combination of the parametric models and traditional optimization methods most likely in the iterative sense. The empirical rules highly depend on the experience of process engineers and works well for the early technological nodes with larger critical dimensions, but the model based methods are required for the sub 100nm technological nodes. Considering the difficulty of the rigorous mathematical simulation of the physical and chemical process involved in the optical lithographic process, simplified models with empirical parameters are usually applied in the actual OPC process^{[1, 2]}. Even the contemporary numerical methods such as Finite Difference Time Domain (FDTD)^[3–5] or Finite Element Method (FEM) et. al. are able to provide more accurate solutions to the optical imaging process, photo-chemical reactions and so on, the formidable cost of computation power hinders the application of such methods to larger scale systems such as full chip level optimization problems. The past efforts in deriving more accurate analytical or semi-numerical models for the forward simulations of relevant physical chemical processes^[6,7] boost the development of more practical and efficient OPC technologies. Even the complexity of the photo-resist reactions slows down the progress of obtainment of more reliable resist model, and so does the etch model, the optical imaging problems can usually be well resolved after simplified assumptions are made to the optical imaging systems. The appearance of the computational tractable models makes the iterative optimization of the mask shapes possible which becomes the corner stone of the current OPC technologies. Combining with the latest setup of the optical lithography machines which enables the variable illumination conditions^{[8, 9]}, source-mask optimization (SMO)^[10–12] also becomes an important part of the OPC workflow. Until the techniques of the insertion of sub-resolution assist features (SRAF) being added to the arsenal of OPC toolkit to enlarge the process window of the optimized mask patterns, the framework of contemporary OPC is settled down. However, the room of the improvement of current OPC workflow remains and the rise of data science as well as machine learning provides huge amount of opportunities for the computational lithography community.

2. Optical Lithographic System

Figure 1. Illustration of an optical lithographic projection system^[13].

The optical microlithography system mainly includes four parts: source, mask/reticle, exposure system and wafer. To avoid the inhomogeneity of illumination on the photomask, Kohler’s method of illumination is applied. The source or the image of the source is placed at the focal plane of the condenser. The photomask/reticle is then illuminated by the parallel beam and the energy distribution on the top plane of the mask is then homogenous in the idealistic situation. Traditional mask is the binary intensity mask. They are formed by the chromium on glass. Different types of fused silicon are applied for varied illumination wavelengths. Phase shift masks are also introduced to improve the image quality. The image of the scattering sources on the photomask is formed on the wafer after the projection optics. The standing wave pattern is formed by the reflection from the photoresist/wafer interface. There are two types of photoresist: positive resist and negative resist. They response differently to the illumination.

3. Machine Learning Based OPC

The machine learning and data driven perspective may change the OPC workflow mainly in three aspects. Firstly, more accurate and fast models become accessible after the introduction of novel tools such as deep neural networks as a good universal approximator which should be beneficial even they are simply embedded into the traditional OPC framework; Secondly, the expensive and time consuming iterative optimization process of prevailing OPC techniques may be replaced by the single run computation of well trained models which directly perform the optimization process including the mask optimization (MO), SMO, SRAF insertion and so on; Third, the whole workflow of the OPC may be modified by the data driven methodology and the changes may not be constrained within the scopes of feature pattern selection and hotspots detection. The novel full chip level solution may be enabled in the future. We shall discuss the recent progress of the machine learning based OPC technologies in the three directions mentioned above separately.

3.1. Machine Learning Based Simulators

The simulation of the lithography mainly contains three parts: Optical model, Resist model and Etch model. The first two process is coarsely shown in Figure 1.

Figure 2. Lithography simulation^[14].

The ideal situation of the forward simulation is the success of the ab initio calculation. For the optical imaging system, this target is more achievable. The realistic imaging system is usually simplified and mathematical abstraction can be done within the framework of optics^[15]. In Figure 2, a typical optical configuration is shown. The aerial image is computed approximately with methods such as Hopkins method et.al. The chemistry involved in the resist model is complicated to compute from the first principle^[16-18]. Sometimes, a simple threshold model is applied for the resist model and the threshold can be either constant or variable^[19]. The etch model is more intractable due to the complicated physical- chemical process and multi-factorial control parameters involved such as plasma nature, chamber configuration et.al. Historically, variable etch bias (VEB) model is applied for the optimization purpose^[20]. However, the approaches mentioned above may not be able to meet the requirement of the OPC techniques developing for the advanced technological nodes and more accurate and rigorous models are necessary while the nodes shrink. The machine learning based resist model and etch model turn out to be effective and becomes good candidates for future OPC application.

The general purpose of the ML based simulators is to obtain a general function approximator with the local geometric features as input and values of the height or threshold at the pixel level as the output. In principle, it can be done with the multilayer neural networks.

Figure 3. ANN-based 3D resist model^[21].

Seongbo Shim et. al. applied the full connected neural network to fit the resist model with the points sampled from the geometry of layout as the input and the resist height at center of the window as the output. The configuration of their model is shown in Figure 3. Youngchang Kim et. al. use the similar method to realize the prediction of etch bias^[20].

Figure 4. CNN architecture^[14].

Since the inventions of new architectures of neural networks emerge, more efficient and suitable approaches are fetched by the OPC community to improve the performance of the forward simulators. Yuki Watanabe et. al. use convolutional neural networks which are widely applied in the computer vision computation to estimate the resist pattern instead^[14]. The architecture of their net is shown in Figure 4. Since sometimes, the rigorous simulation or experimental data are hard to obtain especially for new technical nodes, to obtain the trained model with the required accuracy with fewer data, Yibo Lin et. al. take the advantage of the transfer learning and active learning while they are trying to solve the same problem^[22]. Later, the generative adversarial net is also introduced by the same group for simulating purpose^[23].

3.2. Machine Learning Based Optimizers

Early days, the mask optimization relies on the empirical rules which usually depend on the geometry of the layout patterns. The contours of the layout patterns are decomposed into the edges and corners and the positions of their end points are varied and optimized according to the rules subtracted from the experimental facts^[24]. Even though computationally efficient, the rule based optimization methods are not able to provide the required accuracy and fail to fulfill the requests of the advanced technological nodes. A more robust iterative optimization method based on the optical models, photoresist models et. al. is introduced. The basic idea is to change the positions of the end points of the edges and corners mentioned above and the simulated images on the wafer are obtained accordingly. The full optimization cycle is stopped once the target patterns and the simulated images match each other. Different kinds of error functions are applied to provide a quantitative estimation of the deviations between the target patterns and the simulated images. The Edge Displacement Error (EDE)^[25], Edge Placement Error (EPE)^[26] or Pixel-wise Error Summation^[27] et. al. are usually calculated. The optimization process is usually computationally expensive due to the slow convergence of the iterations. While coarser models which are more computationally tractable are applied at the cost of the accuracy, the effort of the reduction of the iterations inspired the early application of machine learning techniques to the OPC regime and it remains as a main purpose of machine learning based OPC packages till nowadays.

The earlier attempt to obtain a better initial guess for the mask optimization process with the linear regression methods done by the researchers at University of California, Berkeley becomes an excellent start point in this track^[28]. Taking advantage of the large dataset of the modified mask patterns after OPC by the commercial EDA packages, the authors estimate the expected fragment movement by the simplest linear statistical model provided an input target layout pattern. It provides a prototype of the basic ideas of the machine learning based OPC technologies within the framework of the supervised learning method. The realization of the workflow still requires the involvement of the advanced commercial EDA packages to generate the labeled data (the correct fragment movement given a specific mask pattern as the original input). As a result, the upper bond of the accuracy of this type of method is constrained by the correctness of the simulators and the efficiencies of the optimizers of the relevant commercial software. And the performance of such method is further compromised by the oversimplification of the mapping from the original input mask pattern to the predicted fragment movements by the application of a linear model. However, the trained linear model serves as a coarse optimizer of the input mask pattern which provides the optimized mask pattern in a single run while it is fed by the input feature vectors representing the original mask patterns. The authors successfully reduce the number of iterations of the traditional OPC workflow by the replacement of the original mask pattern with the statistically learned one as the initial condition of the subsequent optimization flow. Another significant contribution of the authors is that they successfully introduce a representation method for the input mask layout which makes the further calculation they complete computationally feasible. The discrete cosine transform (DCT) is applied to the input mask layout, and first hundreds of the DCT coefficients is collected in the Zig-Zag order (Shown in Figure 5.) as the input feature vectors of the linear model to be trained.

Figure 5. Zig–zag ordering of DCT coefficients^[28].

The feature engineering accomplished this way serves as one of the mainstream techniques in the OPC community before more efficient and universal feature learning techniques fitting the requirement of the end to end learning such as the prevalent convolutional neural network (CNN) techniques are introduced from the deep learning community. The DCT is also applied by other researchers in the OPC community later in different ways including the variant form of the Fourier Transforms^[29–32]. Even after the CNN et. al. deep learning techniques are introduced and the representation learning is realized automatically independent of the input data formulations, DCT is sometimes still used as the pre-processed data type as the neural network inputs^[33].

The trained models after the supervised learning as the optimizer instead of the traditional iterative optimization circle are further improved mainly in two aspects: more complicated and accurate model instead of the linear statistical model are used to approximate the mapping between the input mask pattern and the optimized mask pattern (the optimization can be in the form of either motions of the specific edge fragment or the modified mask patterns as a whole.); Different feature engineering can be done or the representation learning within the scope of the deep learning can be applied to the mask pattern and the dimensional reduction can be realized in varied ways accordingly^[34].

A direct improvement of the representation capability of the linear model has been done by Tetsuaki Matsunawa et. al.^[35] by the application of the generalized linear mixed model instead to include the edge type effect. Considering the universal approximation property of the multilayer neural network, replacing the linear model with the typical multilayer neural network becomes another natural choice and has been done by Rui Luo^[36]. The author considering the estimation of the binary value of the central pixel of the square modified mask pattern by the standard three layer neural network with the original pixel level binary mask pattern as the input instead of estimating the motion of the central fragment. To obtain the whole modified mask pattern, the author has to scan the three layer model over the original mask pattern. The schematics of the NN is shown in Figure 6. Such kind of scanning can be done naturally by the introduction of the convolutional neural networks and the three layer neural network above can actually be treated as the convolutional layer.

Figure 6. The schematics of the NN for OPC^[36].

The contemporary convolutional neural networks (CNN) with varied architectures have been invented and widely applied to different scenes such as image segmentation, object recognition, image classification et. al.^[37]. Basically, it is critical that the actual input of the prevalent CNNs is usually the tensor type data instead of the flatten one used in the Rui’s work, and the convolution layer/Kernel layer with the shared weight parameters slides across the input tensor. The pooling layers are usually applied to further reduce the dimensions of the features learned. After the invention of the training methods of the deep neural networks such as the backpropagation et al.^[38], the CNNs emerges. The critical advantage of the deep CNNs is that they permit the representation learned from the multiple levels of the abstraction which are realized by the stacking of varied convolutional kernels and pooling layers. It avoids the necessity of the designing effort of feature engineering by human wisdom and enables the end to end training of models which can be widely applied. The CNNs are immediately fetched by the OPC community and relevant works have been done recently. Once we constrain our discussion within the mask pattern optimization or source optimization problems, the representation of the image patterns by the latent vectors and their decoding are naturally involved and can be directly linked to the encoder-decoder structures. For example, the convolutional autoencoder is trained to do the Source Mask Optimization by Ying Chen et. al.^[39] to dramatically raise the speed of the optimization process by a factor of 10⁵. Their model output is shown in Figure 7.

Figure 7. Illustrations of (a) a layout clip, (b) a model-based source, and (c) an autoencoder-based source.^[39]

Similarly, the stacking convolutional architectures are also implemented by Haoyu Yang et al.^[40] to form the generator and discriminator of the generative adversarial network (GAN)^[41] when they succeed in realizing the mask optimization with the modified discriminator design. After the GAN converges, the generator can be used to calculate the optimized mask pattern of the original input one within 0.2s which is negligible compared with the traditional OPC methods. The convolutional autoencoders (CAE) are also applied in other regimes such as the insertion of the Sub Resolution Assist Features (SRAF) et. al.^[42]. They can be trained as GAN shown in Figure 8.

Figure 8. An overview of the CGAN functionality^[42].

Basically transformed into a image generation or translation problem^{[43, 44]}, the graphic generation of the modified mask pattern can be done by the mainstream computer vision techniques. Proper modifications made to the design of the specific architectures are necessary. Autoencoders can serve as the models or function approximators of the mapping between the input mask pattern and optimized mask pattern. The training process or the learning of the relevant parameters are finished in the supervise learning mode. In fact, the trained models as the optimizers are not necessarily functioned as the generators of the optimized mask or source patterns. They can also be easily applied as the classifiers for other OPC purposes. We are trying to separate these applications into different categories of OPC techniques although mathematically they are the same in the sense that they eventually act as function approximators providing the appropriate mappings minimizing the designed loss functions. The output can be either mask patterns, source patterns or the labels. We will leave these discussion to the next section where the pattern selection and hotspots detection et. al. are discussed.

3.3. Machine Learning Modified Workflow

As discussed by Peter De Bisschop^[26], the whole OPC workflow strongly depends on the data collection and selection. The main point is: firstly, the feature structures among the billions on the VLSI chip should be selected to build the empirical models unless the physical process is clear enough to be simulated in the first principle way. The latter is rarely the case we confront in the realistic optical lithographic and etching processes. So the establishment and verification of the models as the simulators of the lithographic or etching processes require the data collection and selection even before the machine learning techniques are widely introduced into the OPC regime; secondly, after OPC process, the modified mask patterns or the source conditions should be verified both by the computational method (computational verification) and experimental method (on-wafer verification) before the masks are accepted for the production. As a result, feature pattern selection for the model calibration and on wafer verification et. al. become critical steps. The data sampling problem becomes important for an efficient and robust OPC workflow. The machine learning techniques can solve such kind of problems well. The basic idea is that we should be able to find a proper space defined with correct basis, in which the dimensional reduction of the original data set can be naturally realized. Or, the low dimension manifold in a high dimensional space is discovered and the sampling is done on the manifold only. Both methods can dramatically reduce the required number of sampling points and the cost of the time consuming and expensive computational or experimental verification processes. Dmitry Vengertsev et. al.^[45] define a hybrid space formed by the direct sum of image parameter space and geometric sensitivity space and use a modified K means method to cluster the data within the hybrid space. As a typical unsupervised learning method, data clustering helps the selection of the representative patterns and serves as a kind of dimension reduction process. Instead of the K means method, the singular value decomposition (SVD) which can be treated as a form of the principle component analysis (PCA) can also be applied to the matrix representation of the layout patterns defined in the vector space manually constructed^[46].

We already discuss the importance of the latent feature vector generation under the background of the machine learning based optimizer. It is also the foundation of the pattern selection we just discussed because the dimension reduction we mentioned is actually finished by the learning of a low dimensional representation of the original dataset. Now, the same thing goes with the hotspots detection. We need to identify the layout structures which can not be manufactured with the acceptable EPE et. al. under the current process conditions and carry out finer OPC for them. We are not able to carry out the forward simulation for all the structures on the chip due to the huge computational power that requires, or we just want a better solution^[47]. We are neither satisfied with the traditional pattern match method^{[48, 49]} because it can not predict the hotspot correctly when patterns not included in the library are met. Transforming such problems into the image classification problem^[50] and solving it with the prevailing machine learning techniques then become interesting. The basic idea is we learn the low dimensional feature vectoral representation of the layout patterns and use the classifier to distinguish the pattern with hotspots from the pattern without hotspots within certain region in the latent space formed by the learned feature vectors. You can also use them to do data clustering and realize the pattern feature selection. The effectiveness of such kind of method strongly depends on the generalization capability of the machine learning model. It is not well understood when the learned model generalizes well especially when the deep learning techniques are applied. Even without the theoretical guarantee, these machine learning methods are applied in the hotspots detection widely and they are proven effective by the experimental facts. Matsunawa et. al.^[51] use the human designed feature vectors to do the classification for the hotspots detection with Adaboost method. Taking advantage of the end to end training capability of deep CNNs, Moojoon Shin^[52] et. al. apply different architectures of CNN binary classifier to fulfill the speed and accuracy requirement of hotspots detections. The probability of a pixel being classified as the hotspot is predicted by inputting the image centered at that pixel into the CNN. After scanning the whole layout, the probabilistic distribution of the hotspots at the pixel level is output as the final result. The schematics is shown in Figure 9.

Figure 9. HS detection using sliding window scan and coordinate extraction^[52].

Of course, even CNNs have the advantage in the sense that they automatically include the translational invariance and tend to learn the local information of image while encoding thus dramatically reduce the number of learnable parameters, the general fully connected deep neural network (DNN) can also be applied to carry out the hotspots detection task^[53]. To improve the performance of the DNN hotspots detectors, different variants of DNN have been explored^[33]. For example, inception mechanism is introduced by Ran Chen et. al.^[54]. Haoyu Yang et. al. modifies the CNN architecture and replace all the pooling layers with 3×3 convolution layers^[55].

4. Conclusions

Machine learning techniques especially the deep learning method can dramatically improve the accuracy and computation speed of simulation and optimization process and the full chip level optimization techniques should become available and it will further change the whole workflow of current OPC technology^[56].

Acknowledgments

This work is supported by National Science and Technology Major Project of China (2017ZX02315001-003, 2017ZX02101004-003), National Natural Science Foundation of China (61874002, 61804174), Beijing Natural Fund (4182021).

[1] J. W. Thackeray, “Stochastic exposure kinetics of extreme ultraviolet photoresists: simulation study,” J. Micro/Nanolithography, MEMS, MOEMS 10 (3), 033019 (2011).