Abstract: Scanning electron microscope (SEM) metrology is critical in semiconductor manufacturing for assessing and monitoring patterning process quality. Besides feature width and feature-to-feature space measurements from critical dimension SEM (CDSEM) images, visual inspection of SEM images also offers rich information on patterning quality. However, visual inspection alone leaves considerable room for ambiguity. To narrow this ambiguity and to obtain more statistically quantitative information on patterning quality, contours are often extracted from SEM images. From contours, important information such as the critical dimension and the resist sidewall angle at any location can be estimated. Such geometrical information can be used for optical proximity correction (OPC) model verification, lithography hotspot detection, etc. Classical contour extraction algorithms based on local information have insufficient capability in dealing with noisy and low contrast images. To obtain reliable contours from such images, information beyond the local neighborhood should be exploited as much as possible. In this regard, the deep convolutional neural network (DCNN) has proven its great capability, as manifested in various computer vision tasks. Taking full advantage of this maturing technology, we have designed a DCNN and applied it to the task of extracting contours from noisy and low contrast SEM images. The model is capable of separating the resist top and bottom contours reliably. In addition, the model does not generate false contours, and it can suppress the generation of broken contours when the ambiguous area for contour extraction is small and non-detrimental. With an advanced image alignment algorithm of sub-pixel accuracy, contours from different exposure fields under the same process condition can be superposed to estimate the process variation band. Furthermore, the statistics of edge placement variation induced by stochastic effects can easily be inferred from the extracted contours.
Keywords: SEM images; contour extraction; machine learning (ML); deep convolutional neural network (DCNN); edge placement variation
CD measurement data from SEM images are widely used in OPC model calibration and verification [1-2]. As the complexity of the lithography process continues to increase, OPC model calibration using CD measurement data alone has difficulty producing models that meet the stringent accuracy requirements. As a remedy, OPC model calibration that includes SEM image contours has been proposed in recent years [3-7]. From the information theoretic point of view, SEM image contours have significant advantages in terms of pattern coverage and number of sampling points. As a result, OPC models with better accuracy and better generalization capability are expected. Besides the gradual adoption of SEM image contours in OPC model calibration, more detailed quantitative analysis based on resist top contours and resist bottom contours can greatly facilitate lithography hotspot detection with improved certainty.
In advanced low k1 lithography processes, SEM images are intrinsically low in contrast. The situation is exacerbated when only a few scanning frames can be averaged in order to avoid resist shrinkage from e-beam exposure. This presents enormous challenges for contour extraction from such SEM images. The classical methods utilizing edge detection operators [8], such as Sobel, Prewitt, LoG, etc., cannot extract contours from low contrast SEM images robustly. The Canny algorithm [9], on the other hand, extracts contours with better robustness; however, it lacks the ability to distinguish resist top contours from resist bottom contours. The limitation of the classical contour extraction methods lies in their use of information: all of them use only very local information. To overcome this limitation, new algorithms that make full use of information beyond the local neighborhood should be developed. In this respect, DCNN is considered to be the best choice [10-11], because it has proven capable of exploring spatial information hierarchically on a large spatial scale. Furthermore, SEM images from a lithography process have only three distinct areas, namely the resist top areas, the substrate areas, and the resist edge areas of varying slopes. This simplicity suggests the possibility of employing DCNN to develop contour extraction models that are capable of separating resist top contours from resist bottom contours. Besides its capability in exploring spatial information, DCNN, as a machine learning model, is fully compatible with parallel computation architectures, making full chip SEM image contour extraction feasible. To develop such DCNN based SEM image contour extraction models, we have prepared a set of SEM images together with their resist top contours and resist bottom contours, and used a special DCNN structure to train the contour extraction models.
The special DCNN structure is characterized by a wide receptive field, moderate network depth, and high resolution.
Several contour extraction methods for post-lithography SEM images have been developed in commercially available tools, such as DesignGauge-Analyzer (DG-A, Hitachi) [6, 7], the ASELTA solution [12], the Canny-improved method [5], etc. Setting contour extraction robustness aside, these methods do not offer the capability of separating resist top contours from resist bottom contours. To offer such a capability together with full chip operation capability, we have proposed a method based on a deep learning architecture for SEM contour extraction, inspired by the enormous success achieved so far by deep learning technology in various computer vision areas, such as classification, object detection and semantic segmentation [13-15]. DCNN is superior to the traditional methods because it has a large receptive field. Empirically, a large receptive field is able to achieve a better sparse representation of an image based on the image's prior of nonlocal self-similarity (NSS) [16]. From the information theoretic point of view, the sparser the image representation is, the less information is lost in image restoration [17].
The neural network architecture is shown in Figure 1. The first layer is a reversible down-sampling operation, used to lower the spatial dimensions of the input SEM image. The resulting sub-images are stacked along the channel axis to form the input of the subsequent module, a plain deep convolutional neural network (DCNN). For instance, if the down-sampling factor is 2, the input of the DCNN has size M/2×N/2×4C, where M and N are the height and width of the original SEM image, and C is the number of channels in the original image. Since SEM images are gray scale images, C is equal to 1 in our study.
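The reversible down-sampling described above can be illustrated with a small NumPy sketch. The function names `pixel_unshuffle` and `pixel_shuffle` and the ordering of the sub-images within the channel stack are assumptions made for illustration; the paper does not specify the exact layout.

```python
import numpy as np

def pixel_unshuffle(img, factor=2):
    """Split an (M, N, C) image into an (M/f, N/f, f*f*C) stack without losing pixels."""
    m, n, c = img.shape
    assert m % factor == 0 and n % factor == 0, "M and N must be divisible by factor"
    # Expose the position of each pixel inside its factor x factor block.
    x = img.reshape(m // factor, factor, n // factor, factor, c)
    # Move the two in-block offsets next to the channel axis.
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(m // factor, n // factor, factor * factor * c)

def pixel_shuffle(stack, factor=2):
    """Inverse of pixel_unshuffle: rebuild the (M, N, C) image from the stack."""
    mh, nh, cc = stack.shape
    c = cc // (factor * factor)
    x = stack.reshape(mh, nh, factor, factor, c)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(mh * factor, nh * factor, c)

# Hypothetical 4 x 6 gray scale image (C = 1).
image = np.arange(24, dtype=float).reshape(4, 6, 1)
stack = pixel_unshuffle(image, factor=2)   # shape (2, 3, 4), i.e. M/2 x N/2 x 4C
restored = pixel_shuffle(stack, factor=2)  # shape (4, 6, 1), identical to image
```

Because the operation only rearranges pixels, the round trip is exact, which is why the network output can be restored to the original M×N×C size.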
The DCNN module acts as a non-linear mapping from the gray scale SEM image to the contour image. To learn the mapping function, the DCNN module learns features from the input image with convolutional operators. The first layer consists of a convolution followed by a rectified linear unit (ReLU) activation; the last layer consists of a convolution only; all other layers consist of a convolution followed by batch normalization (BN) [18] and a ReLU activation. Zero padding is used to maintain the feature map size in each intermediate layer. The output of the DCNN module has size M/2×N/2×4C. The final image is constructed from the 4C channels through an up-sampling operation that restores the original image size M×N×C. The total number of layers is 15, the number of channels in each layer is 64, the convolutional kernel size is 3×3, and the image area used for cost function calculation is the central area of each image with size 128×128. The non-central image area is excluded from the cost function calculation because it is corrupted by zero padding and the convolutional operations.
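Restricting the cost function to the central 128×128 area can be sketched as follows. The paper does not state which cost function is used, so the mean squared error here is an assumption, and the helper names `center_crop` and `center_loss` are hypothetical.

```python
import numpy as np

def center_crop(a, size=128):
    """Return the central size x size window of a 2D (or H x W x C) array."""
    m, n = a.shape[:2]
    top = (m - size) // 2
    left = (n - size) // 2
    return a[top:top + size, left:left + size]

def center_loss(pred, target, size=128):
    """Mean squared error (assumed cost) evaluated on the central window only."""
    diff = center_crop(pred, size) - center_crop(target, size)
    return float(np.mean(diff ** 2))

# Hypothetical 256 x 256 prediction and label.
pred = np.zeros((256, 256))
target = np.zeros((256, 256))
target[:10, :] = 1.0                      # mismatch near the border only
border_only = center_loss(pred, target)   # border pixels do not contribute
target[128, 128] = 1.0                    # one mismatching pixel in the center
with_center = center_loss(pred, target)
```

Errors near the image border, where zero padding distorts the feature maps, therefore have no influence on the training signal.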
The down-sampling operation enables the subsequent DCNN to explore wider spatial information with a shallower network, without the use of residual connections [19]. Residual connections are designed to facilitate the training of very deep convolutional neural networks [20]; however, models of good performance can still be achieved using a plain neural network of moderate depth with advanced optimization techniques such as BN and the Adam optimizer [21].
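The effect of the down-sampling on spatial coverage can be checked with simple arithmetic. The sketch below computes the theoretical receptive field of a stack of convolutions, assuming stride 1 and no dilation (neither is stated explicitly in the paper); the function name `receptive_field` is hypothetical.

```python
def receptive_field(num_layers, kernel=3, downsample=1):
    """Theoretical receptive field, in original-image pixels, of stacked convolutions.

    Assumes stride-1, undilated convolutions applied after a reversible
    down-sampling by the given factor.
    """
    rf = 1
    for _ in range(num_layers):
        rf += kernel - 1          # each 3x3 layer adds 2 pixels
    return rf * downsample        # each low-resolution pixel spans `downsample` originals

plain = receptive_field(15)                    # 15 layers on the full-size image
with_down = receptive_field(15, downsample=2)  # same 15 layers after 2x down-sampling
```

Under these assumptions the 15-layer network sees 31 pixels per side without down-sampling and 62 pixels per side with it. A plain network would need about 30 layers to reach the same coverage.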
To train the DCNN for contour extraction, we need to prepare a high quality training dataset. The original SEM images are from the 28 nm technology node, with a pixel size of 1.32 nm. The SEM images are often noisy, and some even show patterning failures of varying degrees of severity, such as scumming, necking and bridging. To obtain the contours from such an SEM image, the image must first go through a denoising process, followed by a contour extraction operation using an in-house refinement tool.
The denoising operation should remove as much noise as possible while retaining as much signal information (detail) as possible. A sophisticated DCNN denoising model has been applied to achieve satisfactory results. As shown in Figure 2, the residual noise map shows no noticeable structure, suggesting that the detailed structural information of the original image is successfully retained in the denoised image.
An in-house contour refinement tool was developed to further process the raw contours from the Canny algorithm. The refinement operations include connecting broken contours when the broken area is small and removing redundant fake contour segments. The refined contours are classified into two classes: the resist top contour class and the resist bottom contour class. Each class dataset is split into three subsets for training, validation and test with the ratio 6:2:2. The total number of images for training is around 6k. The batch size used for training is 16.
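A 6:2:2 split of this kind can be sketched as below. The random shuffling, the seed and the total of 10,000 images are assumptions for illustration (the paper only reports about 6k training images), and `split_indices` is a hypothetical helper.

```python
import numpy as np

def split_indices(n, ratios=(6, 2, 2), seed=0):
    """Shuffle n sample indices and split them into train/validation/test subsets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]   # remainder goes to the test subset
    return train, val, test

# Hypothetical dataset size chosen so that the training subset has 6k images.
train_idx, val_idx, test_idx = split_indices(10_000)
```

The same index split would be applied to the SEM images and to their contour labels so that each image stays paired with its label.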
The bottom and top contour models are trained independently on the prepared datasets. The learning rate is set to 5e-4, and the training error patience is fixed at 20 (if the training error stays unchanged for 20 consecutive epochs, the parameters of each batch are merged to form the model). Other Adam parameters are kept at their default values. Data augmentation such as flipping and rotation is adopted to expand the dataset. Training the bottom and top contour extraction models takes ~10 and ~18 hours on a single GPU, respectively. Although model training takes a considerable amount of time, inference is very fast, at ~50 ms/frame using a single CPU only and ~20 ms/frame using a single GPU.
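The flip and rotation augmentation can be sketched as follows. The paper does not list which flips and rotations were used, so this example assumes the eight combinations of 90° rotations and a horizontal flip; `augment_pair` is a hypothetical helper. The key point is that the SEM image and its contour label must receive the same transform.

```python
import numpy as np

def augment_pair(image, contour):
    """Return 8 (image, contour) pairs: 4 rotations, each with and without a flip."""
    pairs = []
    for k in range(4):
        img_r = np.rot90(image, k)      # rotate by k * 90 degrees
        lab_r = np.rot90(contour, k)    # identical rotation for the label
        pairs.append((img_r, lab_r))
        pairs.append((np.fliplr(img_r), np.fliplr(lab_r)))
    return pairs

# Hypothetical 4 x 4 image; the label is derived from it only to make the pairing checkable.
image = np.arange(16).reshape(4, 4)
contour = image * 2
augmented = augment_pair(image, contour)
```

For patterns without a preferred orientation this expands each training sample eightfold. Whether all eight variants are appropriate for orientation-dependent lithography patterns is a choice left to the practitioner.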
The test dataset includes post-lithography SEM images with necking and bridging of varying degrees; some examples are shown in Figure 3. Because there is no universal standard for defining contours in SEM images, robustness and consistency are the more important properties of a contour extraction algorithm: dimension differences between contours extracted by different algorithms can easily be calibrated against each other as long as both algorithms are internally consistent. From a practical point of view, the assessment of a contour extraction algorithm should focus on the following aspects: i) it does not miss a contour at a visually distinct boundary; ii) it does not produce fake contours in the background or in the foreground; iii) it has the flexibility and capability to repair broken contours if the gap between broken segments is small; iv) it is fast enough to support Fab production needs.
The training and test results of the resist top contour model and the resist bottom contour model are summarized in Table 1. Areas with a relatively large gap between contour segments are identified as “broken” contours, and contours that are not at feature edges are identified as “false” contours. The results indicate that the bottom contour extraction model achieves much better results than the top contour extraction model in training, validation and test, with only a few percent of “broken” and “false” contour areas. By no means does this observation imply that the bottom contour extraction model is trained better or performs better. Rather, the boundary between resist and substrate is usually better defined than the boundary between the resist sidewall and the resist top surface, which is often rounded. The roundness of the resist top boundary creates more ambiguity when the contour extraction model judges whether an image pixel belongs to the contour class or not. The high percentage of “broken” or “false” contours in the resist top contour set is entirely due to blurred boundaries inherent in the original SEM images.
4.1. Top & Bottom Contour Extraction Model Performance
A typical contour extraction result is illustrated in Figure 4. A general two dimensional (2D) pattern with considerable noise and some distortions is shown in Figure 4 a). Figures 4 c) and 4 d) show the top (white) and the bottom (yellow) contour results, respectively. Figure 4 b) shows the top and bottom contours overlaid on the original image, which clearly demonstrates the high fidelity and consistency of the contours relative to the original image structures.
The DCNN-based method outperforms traditional image contour extraction methods by a significantly large margin in two aspects: suppressing redundant contours and “repairing” broken contours within a local area, as shown in Figure 5. Although the Canny algorithm is a landmark in image contour extraction because of its simplicity and efficiency, it often fails to achieve satisfactory results when an image is not in good condition, e.g. when the boundary is not clear or not locally continuous. A detailed comparison of contour extraction performance between the DCNN-based method and the Canny method is shown in Figures 5 a) – 5 d). It should be understood that “broken” and “false” contours can still exist in the extracted contour sets despite the great versatility of the DCNN-based method, as shown in Figure 6. Those “unsuccessful” contour extraction areas are very often indicative of lithography patterning failures, which can be easily identified and marked for further visual inspection.
4.2. Process Variability Bands Calculation
For advanced lithography processes, process windows and variations induced by stochastic effects are of central importance in characterizing the capability of a lithography process. To assess a lithography process in a more comprehensive and stricter fashion, the CD based method is gradually being supplemented with or replaced by the process variation band method. With a robust SEM image contour extraction method at hand, the process variation band can be estimated easily when combined with the enhanced correlation coefficient image alignment algorithm [22]. More importantly, with contours, the statistics of edge placement variation can also be estimated. In our work, we used the center of the contour edge placement variation as the reference, instead of the OPC lithography target edge, to calculate the edge placement variation statistics.
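Using the variation center as the reference can be sketched as follows. The sketch assumes that edge positions have already been sampled along the contour normal at the same sites in every aligned image, and that the "variation center" is the per-site mean edge position. Both are assumptions, and `edge_placement_variation` is a hypothetical helper.

```python
import numpy as np

def edge_placement_variation(edges_nm):
    """Edge placement deviations relative to the per-site variation center.

    edges_nm: array of shape (num_images, num_sites), edge position in nm.
    Returns the deviations, their pooled standard deviation and the 3-sigma value.
    """
    edges_nm = np.asarray(edges_nm, dtype=float)
    center = edges_nm.mean(axis=0)               # assumed reference: mean edge per site
    deviation = edges_nm - center                # no OPC target edge is involved
    sigma = float(np.sqrt(np.mean(deviation ** 2)))
    return deviation, sigma, 3.0 * sigma

# Hypothetical edge positions: 3 aligned images, 2 measurement sites.
edges = [[10.0, 20.0],
         [12.0, 22.0],
         [14.0, 24.0]]
deviation, sigma, three_sigma = edge_placement_variation(edges)
```

Because the reference is taken from the measured contours themselves, the resulting statistics describe the spread of the edges only and are not affected by any systematic offset between the printed pattern and the design target.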
Figure 7. The PV bands of an SEM image with a hole pattern: a) the contours of independent SEM images stacked on a single image (the average of the aligned independent images) to form a band; b) contours of the PV bands as borders (yellow and green lines); c) an enlarged local area (orange box) for a better view of the band; d) band width calculation (red lines) at different locations.
Figure 7 shows the process variability (PV) band of a hole-array pattern. For a given pattern, images are taken from a wafer processed with a single process condition. Before image contour extraction, the images need to go through some pre-processing steps: 1) SEM images taken from different exposure fields are first aligned to a selected anchor image; 2) the aligned images are then denoised with the DCNN based denoising model.
In Figure 7 a), the purple bands indicate the range of the contour variation, and the PV band edges can be found in Figure 7 b) as the green and yellow contours. The local PV band dimension can be calculated by measuring the distance between the outer contour (green) and the inner contour (yellow), as shown in Figures 7 c) and 7 d). Besides the PV band statistics and hole CD statistics, edge placement variation statistics can also be estimated directly. Although these three statistics are related to each other, the edge placement variation statistics characterize the lithography process most fully and in the most detail. The calculated edge placement variation is shown in Figure 8. The red line (the fitted curve of the edge placement variation) indicates a Gaussian distribution, which is consistent with previous reports [6,23], and the 3σ region can be easily estimated for real-time process control such as advanced process control (APC).
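The construction of the PV band from a stack of aligned contours can be sketched with filled binary masks. This is a simplified illustration under stated assumptions: each contour has been filled into a mask, the outer and inner band edges are taken as the union and intersection of the masks, and the width is measured along a single image row rather than along the contour normal. The helper names are hypothetical.

```python
import numpy as np

def pv_band(masks):
    """Band between the outer (union) and inner (intersection) regions of a mask stack."""
    masks = np.asarray(masks, dtype=bool)
    outer = masks.any(axis=0)
    inner = masks.all(axis=0)
    return outer & ~inner, outer, inner

def band_width_along_row(band, row, pixel_nm=1.32):
    """Width in nm of each contiguous run of band pixels along one image row."""
    line = band[row].astype(int)
    steps = np.diff(np.concatenate(([0], line, [0])))
    starts = np.flatnonzero(steps == 1)
    ends = np.flatnonzero(steps == -1)
    return (ends - starts) * pixel_nm

# Two hypothetical hole masks (radii 10 and 12 pixels) centered in a 64 x 64 image.
yy, xx = np.mgrid[:64, :64]
r2 = (yy - 32) ** 2 + (xx - 32) ** 2
masks = [r2 <= 10 ** 2, r2 <= 12 ** 2]
band, outer, inner = pv_band(masks)
widths_px = band_width_along_row(band, row=32, pixel_nm=1.0)
```

Along the central row the band is crossed twice, once on each side of the hole. Measuring along a row matches the true band width only where the edge is vertical; elsewhere the distance would need to be taken along the local contour normal.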
A specially designed DCNN structure is proposed for lithography SEM image contour extraction. The DCNN SEM image contour extraction models outperform the prevalent traditional Canny method by a significantly large margin in terms of robustness. When trained separately, the DCNN contour extraction models are capable of separating resist top contours from resist bottom contours. This capability paves the way for estimating resist sidewall angles at any location. It can also greatly facilitate the identification of “soft defects” (defects on which the judgments of different lithography engineers are highly inconsistent) [24] in a lithography SEM image where contours are broken. Together with a sophisticated image alignment algorithm, the statistics of the process variation band, the critical dimension, the edge placement variation, etc., can all be estimated on a full chip scale.
[1] A. Yen, P. Tzviatkov, A. Wong, et al., “Optical proximity correction for 0.3um i-line lithography”, Microelectronic Engineering, 30(1-4), 141-144 (1996).
[2] A. Yen, M. Burkhardt, J. R. Hzhoefer, et al., “Optical Proximity Correction and its Application to CD Control in High-speed Microprocessor”, Microelectronic Engineering, 41-42, 65-70 (1998).
[3] P. Filitchkin, T. Do, I. Kusunadi, et al., “Contour Quality Assessment for OPC Model Calibration”, Proc. of SPIE, 7272, 72722Q-1-7 (2009).
[4] T. Shibahara, T. Minakawa, M. Oikawa, et al., “A CD-gap-free contour extraction technique for OPC model calibration”, Proc. of SPIE, 7971, 79710O-1-8 (2011).
[5] L. Schneider, V. Farys, E. Serret, and E. Fenouillet-Beranger, “Framework for SEM contour analysis”, Proc. of SPIE, 10145, 1014513-1-14 (2017).
[6] A. Lakcher, B. Le-Gratiet, J. Ducote, et al., “Robust 2D patterns process variability assessment using CD-SEM contour extraction offline metrology”, Proc. of SPIE, 10145, 1014514-1-14 (2017).
[7] F. Wersbuch, J. Schatz, M. Ruhm, “Measuring inter-layer edge placement error with SEM contours”, Proc. of SPIE, 10775, 107750O-1-9 (2018).
[8] R. C. Gonzalez, and R. E. Woods, “Digital Image Processing”, Third Edition, the 9th chapter, 2011.
[9] J. Canny, “A Computational Approach to Edge Detection”, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(6), 679-698 (1986).
[10] W. J. Luo, Y. J. Li, R. Urtasun, and R. Zemel, “Understanding the Effective Receptive Field in Deep Convolutional Neural Networks”, arXiv: 1701.04128v2 (2017).
[11] V. Gal, J. Hamori, T. Roska, et al., “Receptive Field Atlas and Related CNN Models”, International Journal of Bifurcation and Chaos, 14(2), 551-584 (2004).
[12] B. Le-Gratiet, R. Bouyssou, et al., “Contour based metrology: getting more from a SEM image”, Proc. of SPIE, 10959, 109591M-1-9 (2019).
[13] W. Rawat, and Z. H. Wang, “Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review”, Neural Computation, 29, 2352-2449 (2017).
[14] L. C. Jiao, F. Zhang, F. Liu, et al., “A Survey of Deep Learning-based Object Detection”, IEEE Access, 7, 128837-128868 (2019).
[15] A. Garcia-Garcia, S. Orts-Escolano, S. O. Oprea, et al., “A Review on Deep Learning Techniques Applied to Semantic Segmentation”, arXiv: 1704.06857v1 (2017).
[16] Q. Guo, C. M. Zhang, Y. F. Zhang, and H. Liu, “An Efficient SVD-Based Method for Image Denoising”, IEEE Transactions on circuits and systems for video technology, 26(5), 868-880 (2016).
[17] L. Zhang, and W. M. Zuo, “Image Restoration: From Sparse and Low-Rank Priors to Deep Priors”, IEEE Signal Processing Magazine, 34, 172-179 (2017).
[18] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning (2015).
[19] K. Zhang, W. M. Zuo, L. Zhang, et al., “FFDNet: Toward a Fast and Flexible Solution for CNN-Based Image Denoising”, IEEE Transactions on Image Processing, 27(9), 4608-4622 (2018).
[20] K. M. He, X. Y. Zhang, S. Q. Ren, and J. Sun, “Deep Residual Learning for Image Recognition”, in IEEE Conference on Computer Vision & Pattern Recognition (2016).
[21] D. Kingma, and J. Ba, “Adam: A method for stochastic optimization,” in International Conference for Learning Representations (2015).
[22] Georgios D. Evangelidis and Emmanouil Z. Psarakis, “Parametric Image Alignment Using Enhanced Correlation Coefficient Maximization”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(10), 1-9 (2008).
[23] J. D. Biafore, M. D. Smith, S. A. Robertson, and T. Graves, “Mechanistic Simulation of Line-Edge Roughness”, Proc. of SPIE, 6519, 65190Y-1-17 (2007).