E-BLOW: E-Beam Lithography Overlapping aware Stencil Planning for MCC System

Bei Yu, Kun Yuan†, Jhih-Rong Gao, David Z. Pan

Introduction

As the minimum feature size continues to scale to sub-22nm, conventional 193nm optical photolithography is facing great challenges in manufacturing. In the near future, double/multiple patterning lithography (DPL/MPL) has become one of the viable lithography techniques for the 22nm and 14nm logic nodes. In the longer term, i.e., for logic nodes beyond 14nm, extreme ultraviolet (EUV) lithography and electron beam lithography (EBL) are promising candidates for lithographic processes. However, EUV suffers from delays due to tremendous technical barriers such as the lack of power sources, resists, and defect-free masks. The EBL system, on the other hand, has been developed for several decades. One conventional EBL system is based on the character projection (CP) mode, where some complex shapes, called characters, are prepared on a stencil. The key idea is that if a pattern is pre-designed on the stencil, it can be printed in one electron shot; otherwise it needs to be fractured into a set of rectangles and printed one by one through variable shaped beam (VSB). Compared with pure VSB mode, the CP mode can improve throughput significantly. Compared with traditional lithographic methodologies, EBL has several advantages. (1) An electron beam can be easily focused into a nanometer diameter with a charged particle beam, which avoids the diffraction limitation of light. (2) The price of a photomask set is becoming unaffordable; as a maskless technology, EBL can reduce manufacturing cost. (3) EBL allows great flexibility for fast turnaround times and even late design modifications to correct or adapt a given chip layout. Because of all these advantages, EBL is used in mask making, small-volume LSI production, and R&D to develop technology nodes ahead of mass production.
Printing process of the MCC system.

Even with decades of development, the key limitation of the EBL system has been, and still is, its low throughput. Recently, the multi-column cell (MCC) system was proposed as an extension of the conventional EBL system, in which several independent character projections (CPs) are used to further speed up the writing process. Each CP is applied to one section of the wafer, and all CPs can work in parallel to achieve better throughput. Due to design complexity and cost considerations, different CPs share one stencil design. One example of the MCC printing process is illustrated in Fig. 1, where four CPs are bundled to form an MCC system. The whole wafer is divided into four regions, w_1, w_2, w_3 and w_4, and each region is printed through one CP. Note that the overall writing time of the MCC system is determined by the maximum writing time of the four regions. For modern designs, because of the numerous distinct circuit patterns, only a limited number of patterns can be placed on the stencil. Therefore, the area constraint of the stencil is the bottleneck. To improve throughput, the stencil should be carefully designed so that it contains the most repeated cells or patterns. Much previous work focuses on design optimization for the conventional EBL system. Stencil planning, one of the main challenges, has attracted much attention. When blank overlapping is not considered, stencil planning can be formulated as a character selection problem, where integer linear programming (ILP) was applied to select a group of characters for throughput maximization. Recently, Yuan et al. investigated the overlapping aware stencil planning (OSP) problem. However, no existing stencil planning work has been done for the MCC system. Compared with the conventional EBL system, the MCC system introduces two main challenges. First, the objective is new: in the MCC system the wafer is divided into several regions, and each region is written by one CP.
Therefore, the new OSP should minimize the maximum writing time over all regions, while in the conventional EBL system the objective is simply to minimize the wafer writing time. Besides, the stencil for an MCC system can contain more than 4000 characters, so previous methodologies for the EBL system may suffer from runtime penalties. The OSP problem can be divided into two sub-problems: 1D-OSP and 2D-OSP. When standard cells with the same height are selected onto the stencil, the problem is referred to as 1D-OSP. As shown in Fig. [fig:OSP](a), each character implements one standard cell, and the enclosed circuit patterns of all the characters have the same height. Note that here we only show the horizontal blanks; the vertical blanks are not shown because they are identical. In 2D-OSP, the blanking spaces of characters are non-uniform along both the horizontal and vertical directions. In this way, the stencil can contain both complex via patterns and regular wires. Fig. [fig:OSP](b) illustrates a stencil design example for 2D-OSP. This paper presents E-BLOW, the first study of the OSP problem for the MCC system. E-BLOW integrates several novel techniques to achieve near-optimal solutions in reasonable time. The main contributions of this paper are as follows: (1) the first study of the stencil planning problem for the MCC system; (2) a proof that the OSP problem for both the EBL and MCC systems is NP-hard; (3) a simplified formulation for 1D-OSP, with a theoretically proven rounding lower bound; (4) a successive relaxation algorithm to find a near-optimal solution; (5) a KD-Tree based clustering algorithm for speedup in 2D-OSP. The remainder of this paper is organized as follows. Section 2 provides the problem formulation. Section 6 presents the algorithmic details for the 1D-OSP problem in E-BLOW, while Section 7 details the E-BLOW solution to the 2D-OSP problem. Section 8 reports experimental results, followed by the conclusion in Section 9.
Problem Formulation

OSP Problem Formulation

In an MCC system with P CPs, the whole wafer is divided into P regions {w_1, w_2, \dots, w_P}, and each region is written by one particular CP. We assume cell extraction has been resolved first; in other words, a set of character candidates C^C = {c_1, \cdots, c_n} has already been given to the MCC system. For each character candidate c_i \in C^C, its writing time through VSB mode is denoted as n_i, while its writing time through CP mode is 1. The regions of the wafer have different layout patterns, so their throughputs also differ. Suppose character candidate c_i repeats t_{ic} times on region w_c. Let a_i indicate the selection of character candidate c_i:

a_i = \left\{ \begin{array}{ll} 1, & \textrm{candidate } c_i \textrm{ is selected on stencil} \\ 0, & \textrm{otherwise} \end{array} \right.

If c_i is prepared on the stencil, the total writing time of patterns c_i on region w_c is t_{ic} \cdot 1. Otherwise, c_i is printed through VSB; since region w_c comprises t_{ic} copies of candidate c_i, the writing time would be t_{ic} \cdot n_i. Therefore, for region w_c the total writing time T_c is as follows:

\begin{aligned} T_c & = \sum_{i=1}^{n} a_i \cdot (t_{ic} \cdot 1) + \sum_{i=1}^{n} (1-a_i) \cdot (t_{ic} \cdot n_i) \notag \\ & = \sum_{i=1}^n t_{ic} \cdot n_i - \sum_{i=1}^{n} t_{ic} \cdot (n_i - 1) \cdot a_i = T_{c}^{VSB} - \sum_{i=1}^{n} R_{ic} \cdot a_i \notag \end{aligned}

where T_{c}^{VSB} = \sum_{i=1}^n t_{ic} \cdot n_i and R_{ic} = t_{ic} \cdot (n_i - 1). T_{c}^{VSB} represents the writing time on w_c when only VSB is applied, and R_{ic} can be viewed as the writing time reduction of candidate c_i on region w_c. In the MCC system, both T_{c}^{VSB} and R_{ic} are constants.
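The per-region accounting above can be sketched directly in code. This is an illustrative example, not the paper's implementation; the names `t`, `n`, and `a` mirror the symbols t_{ic}, n_i, and a_i.

```python
# Sketch (hypothetical data): computing the per-region writing time T_c and
# the MCC objective T_total from the formulas above.

def region_writing_time(t, n, a, c):
    # T_c = T_c^VSB - sum_i R_ic * a_i, where R_ic = t_ic * (n_i - 1)
    t_vsb = sum(t[i][c] * n[i] for i in range(len(n)))
    saving = sum(t[i][c] * (n[i] - 1) * a[i] for i in range(len(n)))
    return t_vsb - saving

def total_writing_time(t, n, a, num_regions):
    # T_total = max over all regions c of T_c
    return max(region_writing_time(t, n, a, c) for c in range(num_regions))

# Two candidates, two regions: candidate 0 is on the stencil (CP mode),
# candidate 1 is printed by VSB.
t = [[3, 1], [2, 4]]   # t[i][c]: repeats of c_i on region w_c
n = [5, 4]             # VSB shots per instance of c_i
a = [1, 0]
print(total_writing_time(t, n, a, 2))   # region 0: 23-12=11, region 1: 21-4=17
```

Note how selecting c_0 helps region w_1 much less than region w_0, which is exactly why the MCC objective must track the maximum over regions rather than their sum.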
Therefore, the total writing time of the MCC system is formulated as follows:

\begin{aligned} T_{total} & = \max_{c} \{T_c\} \notag \\ & = \max_{c} \{T_{c}^{VSB} - \sum_{i=1}^{n} R_{ic} \cdot a_i \}, \quad \forall c \in P \label{eq:obj} \end{aligned}

Overlapping aware Stencil Planning (OSP) for MCC system: given a set of character candidates C^{C}, select a subset C^{CP} of C^{C} as characters and place them on the stencil. The objective is to minimize the total writing time T_{total} expressed by ([eq:obj]), while the placement of C^{CP} is bounded by the outline of the stencil. The width and height of the stencil are W and H, respectively. For convenience, we use the term OSP to refer to OSP for the MCC system in the rest of this paper.

NP-Hardness

[lem:1d] The 1D-OSP problem is NP-hard.

Consider a special and simpler case of 1D-OSP, where each candidate c_i has zero blank space and the CP number is 1. Then the problem can be reduced from the multiple knapsack problem, which is a well-known NP-hard problem.

[lem:2d] The 2D-OSP problem is NP-hard.

Consider a special case of 2D-OSP, where each candidate c_i has zero blank space and the CP number is 1. The 2D-OSP problem includes two subproblems: candidate selection and candidate packing. After some candidates are selected onto the stencil, the candidate packing problem can be reduced from the strip packing problem, which is NP-hard.

Combining Lemma [lem:1d] and Lemma [lem:2d], we conclude that the OSP problem, even for the conventional EBL system, is NP-hard.

E-BLOW Algorithms

To provide better scalability, E-BLOW abandons the expensive ILP formulation and integrates a set of novel approximation/speed-up algorithms. For 1D-OSP, a successive relaxation algorithm and a dynamic programming based refinement are proposed. For 2D-OSP, a novel KD-Tree based clustering method is presented within a simulated annealing framework.
E-BLOW for 1D-OSP

When each character implements one standard cell, the enclosed circuit patterns of all the characters have the same height. The corresponding OSP problem is called 1D-OSP, which can be viewed as a combination of the character selection and single row ordering problems. Different from the two heuristic steps proposed in, we show that the two problems can be solved simultaneously through ILP formulation ([eq:1ilp]):

\begin{aligned} \textrm{min} \quad & T_{total} \label{eq:1ilp} \\ \textrm{s.t.} \quad & T_{total} \ge T_{c}^{VSB} - \sum_{i=1}^{n} \sum_{k=1}^{M} R_{ic} \cdot a_{ik}, \quad \forall c \in P \label{1ilp_a}\tag{2a} \\ & x_i + w_i \le W, \quad \forall i \in N \label{1ilp_b}\tag{2b} \\ & \sum_{k=1}^{M} a_{ik} \le 1, \quad \forall i \in N \label{1ilp_c}\tag{2c} \\ & x_i + w_{ij} - x_j \le W (2 + p_{ij} - a_{ik} - a_{jk}) \label{1ilp_d}\tag{2d} \\ & x_j + w_{ji} - x_i \le W (3 - p_{ij} - a_{ik} - a_{jk}) \label{1ilp_e}\tag{2e} \\ & a_{ik}, a_{jk}, p_{ij}: \textrm{0-1 variables} \label{1ilp_f}\tag{2f} \end{aligned}

In ([eq:1ilp]), W is the width constraint of the stencil, M is the number of rows, w_i is the width of character c_i, and x_i is the x-position of c_i. a_{ik}=1 if and only if c_i is assigned to the kth row; in other words, a_{ik} determines the y-position of c_i. w_{ij} = w_i - o_{ij}^h and w_{ji} = w_j - o_{ji}^h, where o_{ij}^h is the horizontal overlap when candidates c_i and c_j are packed together. Constraints ([1ilp_d]) and ([1ilp_e]) check the positional relationship between c_i and c_j. For the kth row, it is easy to see that only when a_{ik} = a_{jk} = 1, i.e., both character i and character j are assigned to row k, will exactly one of the two constraints ([1ilp_d]) ([1ilp_e]) be active. If either of them is not assigned to the row, neither constraint is active. The number of variables in ([eq:1ilp]) is O(N^2), where N is the number of character candidates. Since ILP is a well-known NP-hard problem, directly solving it may incur a long runtime penalty.
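The big-M switching behavior of (2d) and (2e) can be checked with a small sketch. This is an illustrative example, not the paper's code: a constraint is "active" (binding) only when its right-hand multiplier of W is zero, since otherwise the slack W·(positive integer) dominates any feasible x-coordinates.

```python
# Sketch: which of the big-M constraints (2d)/(2e) is binding, as a
# function of the 0-1 variables p_ij, a_ik, a_jk.

def constraint_2d_active(p_ij, a_ik, a_jk):
    # (2d): x_i + w_ij - x_j <= W * (2 + p_ij - a_ik - a_jk)
    return 2 + p_ij - a_ik - a_jk == 0

def constraint_2e_active(p_ij, a_ik, a_jk):
    # (2e): x_j + w_ji - x_i <= W * (3 - p_ij - a_ik - a_jk)
    return 3 - p_ij - a_ik - a_jk == 0

# Both characters on row k (a_ik = a_jk = 1): exactly one constraint is
# active, and the ordering variable p_ij selects which one.
for p in (0, 1):
    print(constraint_2d_active(p, 1, 1), constraint_2e_active(p, 1, 1))

# If either character is off the row, neither constraint is active.
print(constraint_2d_active(0, 0, 1), constraint_2e_active(0, 0, 1))
```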
One straightforward speedup method is to relax ILP ([eq:1ilp]) into a linear program (LP) as follows: replace constraints ([1ilp_f]) by 0 \le a_{ik}, a_{jk}, p_{ij} \le 1. The LP solution clearly provides a lower bound on the ILP solution. However, we observe that the relaxed LP solution tends to look like this: for each i, \sum_k a_{ik}=1, and all the p_{ij} are assigned 0.5. Although the objective function is minimized and all the constraints are satisfied, this LP relaxation provides no useful information to guide rounding, i.e., all the character candidates are selected and no ordering relationship is determined. We hope the ratio v(LP_{(2)})/v(ILP_{(2)}) is close to 1, so that a good rounding approximation can be found. However, we have the following lemma.

[lem:on] v(LP_{(2)}) = 0, when M is proportional.

The proof of Lemma [lem:on] is provided in the Appendix. Lemma [lem:on] implies that a rounding method cannot be effectively applied to program ([eq:1ilp]).

E-BLOW overall flow for 1D-OSP.

To overcome the limitation of the above rounding, E-BLOW proposes a novel iterative solving framework to search for a near-optimal solution in reasonable runtime. The main idea is to modify the ILP formulation so that the corresponding LP relaxation can provide a good lower bound theoretically. As shown in Fig. 4, the overall flow includes three parts: a simplified ILP formulation, successive rounding, and refinement. In Section 6.2 the simplified formulation is discussed and its LP rounding lower bound is proved. Function SuccRounding() is the successive rounding method, introduced in Section 6.3. Finally, Section 6.4 proposes a dynamic programming based refinement.

Symmetrical Blank (S-Blank) Assumption

Our simplified formulation is based on a symmetrical blank assumption: the blanks of each character are symmetric, i.e., the left slack equals the right slack.
Note that for different characters i and j, the slacks s_i and s_j can differ. At first glance this assumption may lose optimality; however, it provides several practical and theoretical benefits. (1) In previous work, single row ordering was transformed into the Hamiltonian cycle problem, which is a well-known NP-hard problem for which even a dedicated solver is quite expensive. Under the assumption, this ordering problem can be optimally solved in O(n) time. (2) The ILP formulation can be simplified to provide a reasonable rounding bound theoretically. Compared with previous heuristic frameworks, the proved rounding bound provides a better guideline for a global search. (3) To compensate for the inaccuracy in the asymmetrical blank cases, E-BLOW provides a refinement to further improve the throughput. Given p character candidates, the single row ordering problem adjusts their relative locations to minimize the total width. Under the symmetrical blank assumption, this problem can be optimally solved by a two-step greedy approach: first, all characters are sorted decreasingly by blanking space s_i; second, they are inserted one by one, each at either the left end or the right end of the row.

[them:symm] Under the S-Blank assumption, the greedy approach achieves the maximum overlapping space \sum_i s_i - \max_i \{s_i\}.

The proof of Theorem [them:symm] is provided in Appendix 12. More discussion of the S-Blank assumption can be found in Appendix 11. In practice, we set s_i = \lceil (sl_i + sr_i) / 2 \rceil, where sl_i and sr_i are c_i's left slack and right slack, respectively.

Simplified Formulation

To further simplify ([eq:1ilp]), we modify the objective function by assigning each character c_i a profit value profit_i.
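The two-step greedy ordering can be sketched as follows. This is an illustrative example, not the paper's code: under the S-Blank assumption, the overlap between two adjacent characters is min of their slacks, so after sorting decreasingly every adjacency contributes the smaller slack and the total overlap matches the bound in Theorem [them:symm].

```python
# Sketch: greedy single row ordering under the S-Blank assumption.
# Sort by slack decreasingly, then insert one by one; with symmetric
# slacks, inserting at either end yields the same total overlap.

def greedy_row_overlap(slacks):
    order = sorted(slacks, reverse=True)
    row = [order[0]]
    for s in order[1:]:
        row.append(s)          # appending at either end is equivalent here
    # each adjacent pair overlaps by the smaller of the two slacks
    return sum(min(row[i], row[i + 1]) for i in range(len(row) - 1))

s = [4, 7, 2, 5]
assert greedy_row_overlap(s) == sum(s) - max(s)   # 18 - 7 = 11
```

Intuitively, every slack except the largest one appears exactly once as the minimum of some adjacent pair, which is why the maximum achievable overlap is \sum_i s_i - \max_i \{s_i\}.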
Then, based on Theorem [them:symm], formulation ([eq:1ilp]) can be simplified as follows:

\begin{aligned} \textrm{max} \quad & \sum_i \sum_j a_{ij} \cdot profit_i \label{fast} \\ \textrm{s.t.} \quad & \sum_i (w_i - s_i) \cdot a_{ij} \le W - B_j, \quad \forall j \label{fast_a}\tag{3a} \\ & B_j \ge s_i \cdot a_{ij}, \quad \forall i \label{fast_b}\tag{3b} \\ & \sum_j a_{ij} \le 1, \quad \forall c_i \in C^C \label{fast_c}\tag{3c} \\ & a_{ij} = 0 \textrm{ or } 1 \label{fast_d}\tag{3d} \end{aligned}

Constraints ([fast_a]) and ([fast_b]) are based on Theorem [them:symm] to calculate the row width, where ([fast_b]) linearizes the max operation. Here B_j can be viewed as the maximum blank space over all the characters on row r_j. ([fast_c]) means each character can be assigned to at most one row. It is easy to see that the number of variables is O(nm). Generally speaking, the character number n is much larger than the row number m, so compared with the basic ILP formulation ([eq:1ilp]), the variable number of ([fast]) is reduced dramatically. Furthermore, the simplified formulation ([fast]) theoretically achieves a reasonable LP rounding lower bound. To explain this, let us first look at a similar program ([knapsack]):

\begin{aligned} \textrm{max} \quad & \sum_i \sum_j (w_i - s_i) \cdot a_{ij} \cdot ratio_i \label{knapsack}\tag{3'} \\ \textrm{s.t.} \quad & \sum_i (w_i - s_i) \cdot a_{ij} \le W - max_s \label{knapsack_a}\tag{3a'} \\ & (3c) - (3d) \notag \end{aligned}

where ratio_i = profit_i / (w_i - s_i), and max_s is the maximum horizontal slack length over all characters, i.e., max_s = \max\{s_i \mid i = 1, 2, \dots, n\}. Program ([knapsack]) is a well-known multiple knapsack problem.

[lem:2] If all ratio_i are the same, the multiple knapsack problem ([knapsack]) admits a 1/2-approximation algorithm using the LP rounding method.

For brevity we omit the proof; detailed explanations can be found in. It should be noted that if all ratio_i are the same, program ([knapsack]) can be approximated by a max-flow formulation.
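The multiple knapsack view of (3') can be illustrated with a simple greedy packer. This sketch is not the paper's LP rounding algorithm; it just shows the structure of the problem, packing items of effective width w_i - s_i into m rows of capacity W - max_s while preferring large profit/width ratios. All names and data are illustrative.

```python
# Sketch: greedy packing for the multiple knapsack view (3').
# items: list of (effective_width, profit), i.e. (w_i - s_i, profit_i).

def greedy_rows(items, num_rows, capacity):
    # try high profit-density items first (ratio_i = profit / width)
    items = sorted(items, key=lambda it: it[1] / it[0], reverse=True)
    loads = [0] * num_rows
    picked, total_profit = [], 0
    for w, p in items:
        j = min(range(num_rows), key=lambda r: loads[r])  # emptiest row
        if loads[j] + w <= capacity:
            loads[j] += w
            picked.append((w, p))
            total_profit += p
    return total_profit, picked

profit, sel = greedy_rows([(4, 8), (3, 3), (5, 5), (2, 6)], num_rows=2, capacity=6)
print(profit)   # item (5, 5) does not fit once the rows are partly loaded
```

When all ratio_i coincide, maximizing profit reduces to maximizing the packed width, which is the setting of Lemma [lem:2].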
Based on Lemma [lem:2], if we denote \alpha = \max\{ratio_i\} / \min\{ratio_i\}, we obtain the following theorem:

[them:3] The LP rounding solution of ([fast]) is a 0.5/\alpha-approximation to program ([knapsack]).

Due to the space limit, the detailed proof is omitted. The only difference between ([fast]) and ([knapsack]) is the right-hand side values in ([fast_a]) and ([knapsack_a]). Since blank spacing is relatively small compared with the row length, W - max_s \approx W - B_j. Then, based on Theorem [them:3], we can conclude that program (3) has a reasonable rounding bound.

Successive Relaxation

Because of the reasonable LP rounding property shown in Theorem [them:3], we propose a successive relaxation algorithm to solve program ([fast]) iteratively. The ILP formulation ([fast]) becomes an LP if we relax the discrete constraint to a continuous one: 0 \le a_{ij} \le 1. The successive relaxation algorithm is shown in Algorithm [alg:round]. At first all a_{ij} are set as variables, since no a_{ij} has been assigned to a row. The LP is then updated and solved iteratively. For each new LP solution, we search for the maximal a_{pq} (line 6). Then, for every a_{ij} that is close to the maximal value a_{pq}, we try to pack c_i into row r_j and fix a_{ij} as a constant. Note that since several a_{ij} are assigned permanent values, the number of variables in the updated LP formulation continues to decrease. This procedure repeats until no appropriate a_{ij} can be found. One key step of Algorithm [alg:round] is the profit_i update (line 3). For each character c_i, we set profit_i as follows:

profit_i = \sum_c \frac{t_{c}}{t_{max}} \cdot (n_i - 1) \cdot t_{ic} \label{eq:profit}

where t_c is the current writing time of region w_c, and t_{max} = \max\{t_c, \forall c \in P\}. Through this profit_i, a region w_c with a longer writing time is weighted more heavily in the LP formulation.
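The profit update ([eq:profit]) can be sketched as follows. This is an illustrative example with hypothetical data: regions whose current writing time is close to t_max receive weight near 1, so candidates that help the bottleneck region gain more profit.

```python
# Sketch: the profit_i update of Algorithm [alg:round], line 3.
# t_cur[c]: current writing time of region w_c
# t_rep[i][c]: repeat count t_ic;  n[i]: VSB shot count of c_i

def update_profits(t_cur, t_rep, n):
    t_max = max(t_cur)
    return [sum((t_cur[c] / t_max) * (n[i] - 1) * t_rep[i][c]
                for c in range(len(t_cur)))
            for i in range(len(n))]

# Region w_1 is currently the bottleneck (20 vs 10), so candidate 1,
# which repeats only on w_1, ends up with the larger profit.
profits = update_profits([10.0, 20.0], [[3, 1], [0, 4]], [5, 4])
print(profits)
```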
During successive relaxation, if c_i has not been assigned to any row, profit_i continues to be updated, so that the total writing time of the whole MCC system can be minimized.

Refinement

The simplified formulation and successive relaxation work under the symmetrical blank assumption. Although they can be solved effectively, for asymmetrical cases some optimality is lost. To compensate for this loss, we present a dynamic programming based refinement procedure. As discussed above, for k characters, single row ordering has 2^{k-1} possible solutions. Under the symmetrical blank assumption, all these orderings yield the same length; for asymmetrical cases, this no longer holds. Our dynamic programming based algorithm Refine(k) finds the best solution among these 2^{k-1} options. The details are shown in Algorithm [alg:refine]. First, if k > 1, Refine(k) recursively calls Refine(k-1) to generate all old partial solutions. All these partial solutions are then updated by adding candidate c_k (lines 6-8). Note that maintaining all solutions is impractical and unnecessary, because many of them are inferior to others. In SolutionPruning(), all solutions are checked; if one solution S_A is inferior to another solution S_B, S_A is pruned to save computation cost. For each solution a triplet (w, l, r) is constructed to store its width, left slack, and right slack. We define the inferior relationship as follows: for two solutions S_A = (w_a, l_a, r_a) and S_B = (w_b, l_b, r_b), S_B is inferior to S_A if and only if w_b \ge w_a, l_b \le l_a and r_b \le r_a.
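The pruning rule can be sketched as a dominance filter over triplets. This is an illustrative example, not the paper's code; it assumes a solution is worse when it is no narrower and exposes no larger end slacks than some other solution.

```python
# Sketch of SolutionPruning(): drop any triplet (w, l, r) that is
# dominated by another one with width no larger and end slacks no smaller.

def prune(solutions):
    def inferior(b, a):
        # b inferior to a: wider or equal, with no better end slacks
        return b[0] >= a[0] and b[1] <= a[1] and b[2] <= a[2] and b != a
    return [s for s in solutions
            if not any(inferior(s, other) for other in solutions)]

sols = [(10, 3, 2), (12, 3, 2), (10, 1, 4), (12, 4, 1)]
print(prune(sols))   # (12, 3, 2) is dominated by (10, 3, 2)
```

Only mutually incomparable triplets survive, which keeps the number of partial solutions carried by Refine(k) small in practice.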
Generate partial solution (w_1, sl_1, sr_1);
Refine(k-1);
(w_1, l_1, r_1) = (w + w_k - \min(sr_k, l), sl_k, r);
(w_2, l_2, r_2) = (w + w_k - \min(sl_k, r), l, sr_k);
Replace (w, l, r) by (w_1, l_1, r_1) and (w_2, l_2, r_2);
SolutionPruning();

After Refine(k) is applied to each row, if more available space is generated, a greedy insertion approach similar to that in prior work is applied to further improve the throughput.

E-BLOW for 2D-OSP

E-BLOW overall flow for 2D-OSP.

Now we consider a more general case: the blanking spaces of characters are non-uniform along both the horizontal and vertical directions. This problem is referred to as the 2D-OSP problem. In previous work, the 2D-OSP problem was transformed into a floorplanning problem. However, several key differences between traditional floorplanning and OSP were ignored. (1) In OSP there is no wirelength to be considered, while in floorplanning wirelength is a major optimization objective. (2) Compared with complex IP cores, many characters may have similar sizes. (3) Traditional floorplanners cannot handle the problem size of modern MCC designs. To deal with all these properties, an approximation packing framework is proposed (see Fig. 5). Given the input character candidates, a pre-filter process is first applied to remove characters with bad profit (defined in ([eq:profit])). The second step is a KD-Tree based clustering algorithm to effectively speed up the design process. Finally, a floorplanner packs all candidates.

KD-Tree based Clustering

Find pair (c_i, c_j) with similar blank spaces and profits;
Cluster (c_i, c_j), label them as clustered;

Clustering is a well-studied problem, with many works and applications in VLSI. However, previous methodologies cannot be directly applied here. (1) Traditional clustering is based on a netlist, which provides all the clustering options.
Generally speaking, a netlist is sparse, but in OSP the connection relationships are so rich that any two characters can be clustered, so in total there are O(n^2) clustering options. (2) Given two candidates c_i and c_j, there are several clustering options; for example, horizontal clustering and vertical clustering may have different overlapping space. The details of our clustering procedure are shown in Algorithm [alg:cluster]. The clustering is repeated until the clustered candidate number reaches the clustering threshold. Initially all the candidates are sorted by profit_i, so that candidates with more shot number reduction tend to be clustered. Then the clustering itself (lines 3-8) is carried out. For each candidate c_i, finding an available c_j may take O(n) time, so the complexities of horizontal clustering and vertical clustering are both O(n^2), and the complexity of the whole procedure is O(n^2), where n is the number of candidates. A KD-Tree is used to speed up the search for an available pair (c_i, c_j). It provides fast O(\log n) region searching operations while keeping the time for insertion and deletion small: insertion, O(\log n); deletion of the root, O(n^{(k-1)/k}); deletion of a random node, O(\log n). Using the KD-Tree, the complexity of Algorithm [alg:cluster] can be reduced to O(n \log n). A simple example is shown in Fig. [fig:kdtree]. For convenience, here for each candidate we only consider the horizontal and vertical spaces. Given candidate c_2, finding another candidate with similar space may take O(n) time to scan all other candidates. However, using the KD-Tree structure shown in Fig. [fig:kdtree](a), this search can be viewed as a region search, which can be resolved in O(\log n) time. In particular, as shown in Fig. [fig:kdtree](b), only candidates c_1 - c_5 are scanned.

Approximation Framework for 2D-OSP

We show that an ILP formulation can be applied to solve the 2D-OSP problem (see Appendix 13).
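The region search used for pairing can be sketched with a minimal static 2-d tree: each candidate is keyed by its (horizontal, vertical) blank space, and "find a candidate with similar spaces" becomes a small range query. This is an illustrative example, not the paper's implementation; the data points are hypothetical.

```python
# Sketch: a static 2-d tree supporting the range search used in
# KD-Tree based clustering. Keys are (horizontal_blank, vertical_blank).

def build(points, depth=0):
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def range_search(node, lo, hi, depth=0, out=None):
    # collect points p with lo[d] <= p[d] <= hi[d] on both axes
    if out is None:
        out = []
    if node is None:
        return out
    point, left, right = node
    axis = depth % 2
    if all(lo[d] <= point[d] <= hi[d] for d in (0, 1)):
        out.append(point)
    if lo[axis] <= point[axis]:       # range may extend into left subtree
        range_search(left, lo, hi, depth + 1, out)
    if point[axis] <= hi[axis]:       # range may extend into right subtree
        range_search(right, lo, hi, depth + 1, out)
    return out

tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1)])
# candidates whose blanks lie within the box [2,6] x [3,7]:
print(sorted(range_search(tree, (2, 3), (6, 7))))
```

Only subtrees whose splitting plane intersects the query box are visited, which is the source of the O(log n) behavior quoted above for small query regions.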
However, even with the clustering speed-up, its runtime is unacceptable. Besides, we will also show that the LP rounding method cannot be effectively applied. In E-BLOW we adopt a simulated annealing based framework similar to that in. To demonstrate the effectiveness of our pre-filter and clustering methodologies, E-BLOW uses the same parameters as in. Sequence Pair is used as the topology representation.
both character i and character j are assigned to row j, then only one of the two constraints ([1ilp_d]) ([1ilp_e]) will be active. If either of them are not assigned to the row, neither of the constraints are active. The number of variables for ([eq:1ilp]) is O(N^2), where N is the number of character candidates. Since ILP is a well known NP-hard problem, directly solving it may suffer from long runtime penalty. One straightforward speedup method is to relax ILP ([eq:1ilp]) into linear programming (LP) as following: replacing constraints ([1ilp_f]) by 0 le a_{ik}, a_{jk}, p_{ij} le 1. It is obvious that the LP solution provides a lower bound to the ILP solution. However, we observe that the solution of relaxed LP would be like this: for each i, sum_j a_{ij}=1 and all the p_{ij} are assigned 0.5. Although the objective function is minimized and all the constraints are satisfied, this LP relaxation provides no useful information to guide future rounding, i.e., all the character candidates are selected and no ordering relationship is determined. We hope the ratio v(LP_ref{eq:1ilp})/v(ILP_ref{eq:1ilp}) is close to 1, by which it is possible to find a good rounding approximation. However, we have the following lemma. [lem:on] v(LP_ref{eq:1ilp}) = 0, when M is proportional. The proof of lemma [lem:on] is provided in Appendix. Lemma [lem:on] implies that rounding method cannot be effectively applied to program ([eq:1ilp]). E-BLOW overall flow for 1D-OSP. To overcome the limitation of above rounding, E-BLOW proposes a novel iterative solving framework to search near-optimal solution in reasonable runtime. The main idea is to modify the ILP formulation, so that the corresponding LP relaxation can provide good lower bound theoretically. As shown in Fig. 4, the overall flow includes three parts: Simplified ILP formulation, Successive Rounding and Refinement. At section 6.2 the simplified formulation will be discussed, and its LP Rounding lower bound will be proved. 
Function SuccRounding() is the successive rounding method, which will be introduced at section 6.3. At last, section 6.4 proposes a dynamic programming based refinement. Symmetrical Blank (S-Blank) Assumption Our simplified formulation is based on a symmetrical blank assumption: the blanks of each character is symmetry, left slack equals to right slack. Note that for different characters i and j, their slacks s_i and s_j can be different. At first glance this assumption may lose optimality, however, it provides several practical and theoretical benefits. (1) Single row ordering was transferred into Hamilton Cycle problem, which is a well known NP-hard problem and even particular solver is quite expensive. Under the assumption, this ordering problem can be optimally solved in O(n). (2) The ILP formulation can be simplified to provide a reasonable rounding bound theoretically. Compared with previous heuristic framework, the proved rounding bound provides a better guideline for a global view search. (3) To compensate the inaccuracy in the asymmetrical blank cases, E-BLOW provides a refinement to further improve the throughput. Given p character candidates, single row ordering problem adjusts the relative locations to minimize the total width. Under symmetrical blank assumption, this problem can be optimally solved by a two steps greedy approach. First, all characters are sorted decreasingly by blanking space s_i; second, they are inserted one by one. Each one can insert at either left end or right end. [them:symm] Under S-Blank assumption, the greedy approach can get maximum overlapping space sum_i s_i - textrm{max}{s_i}. The proof of Theorem [them:symm] is provided in Appendix 12. More discussions for the s-blank assumption can be found in Appendix 11. In practical, we set s_i = lceil (sl_i + sr_i ) / 2 rceil, where sl_i and sr_i are c_i’s left slack and right slack, respectively. 
Simplified Formulation To further simplify ([eq:1ilp]), we modify the objective function through assigning each character c_i with one profit value profit_i. Then based on the Theorem [them:symm], the formulation ([eq:1ilp]) can be simplified as follow: begin{aligned} textrm{max} & sum_i sum_j a_{ij} cdot profit_i label{fast} textrm{s.t.} & sum_i (w_i - s_i) cdot a_{ij} le W - B_j, forall j label{fast_a}tag{3a} & B_j ge s_i cdot a_{ij}, forall i label{fast_b}tag{3b} & sum_j a_{ij} le 1, forall c_i in C^C label{fast_c}tag{3c} & a_{ij} = 0 textrm{or} 1 label{fast_d}tag{3d}end{aligned} ([fast_a]) and ([fast_b]) are based on Theorem [them:symm] to calculate the row width, where ([fast_b]) is to linearize max operation. Here B_j can be viewed as the maximum blank space of all the characters on row r_j. ([fast_c]) means each character can be assigned into at most one row. It’s easy to see that the number of variables is O(nm). Generally speaking, single character number n is much larger than row number m, so compared with basic ILP formulation ([eq:1ilp]), the variable number of ([fast]) can be reduced dramatically. Furthermore, theoretically the simplified formulation ([fast]) can achieve reasonable LP rounding lower bound. To explain this, let us first look at a similar program ([knapsack]) as follows: begin{aligned} textrm{max} & sum_i sum_j (w_i - s_i) cdot a_{ij} cdot ratio_i label{knapsack}tag{3'} textrm{s.t.} & sum_i (w_i - s_i) cdot a_{ij} le W - max_s label{knapsack_a}tag{3a'} & (3c) - (3d) notagend{aligned} where ratio_i = profit_i / (w_i - s_i), and max_s is the maximum horizontal slack length of every character, i.e. max_s = textrm{max}{s_i | i = 1, 2, dots, n}. Program ([knapsack]) is a well known multiple knapsack problem. [lem:2] If each ratio_i is the same, the multiple knapsack problem ([knapsack]) can find a 1/2-approximation algorithm using LP Rounding method. For brevity we omit the proof, detailed explanations can be found in. 
It shall be noted that if all ratio_i are the same, program ([knapsack]) can be approximated as a max-flow problem. Based on Lemma [lem:2], if we denote \alpha = \max_i\{ratio_i\} / \min_i\{ratio_i\}, we can establish the following theorem:

[them:3] The LP rounding solution of ([fast]) is a 0.5/\alpha-approximation to program ([knapsack]).

The detailed proof is provided in the appendix. The only difference between ([fast]) and ([knapsack]) lies in the right-hand sides of ([fast_a]) and ([knapsack_a]). Since blank space is relatively small compared with the row length, we have W - max_s \approx W - B_j. Then, based on Theorem [them:3], we can conclude that program (3) has a reasonable rounding bound.

Successive Relaxation

Because of the LP rounding property shown in Theorem [them:3], we propose a successive relaxation algorithm to solve program ([fast]) iteratively. The ILP formulation ([fast]) becomes an LP if we relax the discrete constraint ([fast_d]) to a continuous one: 0 \le a_{ij} \le 1. The successive relaxation algorithm is shown in Algorithm [alg:round]. Initially all a_{ij} are variables, since no a_{ij} has yet been assigned to a row. The LP is updated and solved iteratively. For each new LP solution, we search for the maximal a_{pq} (line 6). Then, for every a_{ij} close to the maximal value a_{pq}, we try to pack c_i into row r_j and fix a_{ij} as a non-variable. Since more and more a_{ij} receive permanent values, the number of variables in the updated LP formulation keeps decreasing. This procedure repeats until no appropriate a_{ij} can be found. One key step of Algorithm [alg:round] is the profit_i update (line 3). For each character c_i, we set profit_i as follows:

profit_i = \sum_{c} \frac{t_{c}}{t_{max}} \cdot (n_i - 1) \cdot t_{ic} \label{eq:profit}

where t_c is the current writing time of region w_c, and t_{max} = \max\{t_c, \forall c \in P\}.
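The profit update ([eq:profit]) is plain arithmetic; the sketch below mirrors it, under our reading that n_i is the VSB shot count of c_i and t_{ic} its usage in region c (the data-structure names and that interpretation are our assumptions, not from the source):

```python
def update_profits(t, n, reps):
    """Recompute profit_i per the update rule above.

    t:    maps region c to its current writing time t_c
    n:    maps character i to its VSB shot count n_i (assumed meaning)
    reps: reps[i][c] is t_{ic}, the usage of c_i in region c
    Regions with longer current writing time weigh a character's
    shot saving (n_i - 1) more heavily.
    """
    t_max = max(t.values())
    return {i: sum((t[c] / t_max) * (n[i] - 1) * reps[i].get(c, 0)
                   for c in t)
            for i in n}
```

For instance, with t = {'A': 10.0, 'B': 5.0}, a character needing 3 VSB shots and used twice in A and four times in B gets profit (10/10)*2*2 + (5/10)*2*4 = 8.0.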
By applying profit_i, regions w_c with longer writing time are weighted more heavily in the LP formulation. During successive relaxation, if c_i has not yet been assigned to any row, its profit_i continues to be updated, so that the total writing time of the whole MCC system can be minimized.

Refinement

The simplified formulation and successive relaxation are based on the symmetrical blank assumption. Although the problem can be solved effectively under it, some optimality is lost in asymmetrical cases. To compensate for this loss, we present a dynamic programming based refinement procedure. As discussed above, for k characters, single row ordering has 2^{k-1} possible solutions. Under the symmetrical blank assumption, all these orderings yield the same length, but for asymmetrical cases this no longer holds. Our dynamic programming based algorithm Refine(k) finds the best solution among these 2^{k-1} options. The details are shown in Algorithm [alg:refine]. First, if k > 1, Refine(k) recursively calls Refine(k-1) to generate all partial solutions. All these partial solutions are then extended by adding candidate c_k (lines 6-8). Note that maintaining all solutions is impractical and unnecessary, because many of them are inferior to others. In SolutionPruning(), all solutions are checked; if a solution S_A is inferior to another solution S_B, S_A is pruned to save computation cost. For each solution a triplet (w, l, r) is constructed to store its width, left slack and right slack. We define the inferior relationship as follows: for two solutions S_A = (w_a, l_a, r_a) and S_B = (w_b, l_b, r_b), S_B is inferior to S_A if and only if w_a \le w_b, l_a \ge l_b and r_a \ge r_b.
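The triplet update and dominance pruning can be sketched as follows; we prune a solution whenever another is no wider and has no less slack on either side (a minimal illustration with invented names, not the E-BLOW code):

```python
def refine(chars):
    """Dynamic-programming refinement over left/right insertions.

    chars: list of (width w_k, left slack sl_k, right slack sr_k).
    A partial solution is a triplet (w, l, r); dominated triplets are
    pruned. Returns the minimum achievable row width.
    """
    w0, sl0, sr0 = chars[0]
    sols = {(w0, sl0, sr0)}
    for w_k, sl_k, sr_k in chars[1:]:
        nxt = set()
        for w, l, r in sols:
            nxt.add((w + w_k - min(sr_k, l), sl_k, r))  # insert at left end
            nxt.add((w + w_k - min(sl_k, r), l, sr_k))  # insert at right end
        # Keep only non-dominated triplets.
        sols = {s for s in nxt
                if not any(a != s and a[0] <= s[0]
                           and a[1] >= s[1] and a[2] >= s[2]
                           for a in nxt)}
    return min(w for w, _, _ in sols)
```

For two characters (10, 2, 2) and (8, 3, 1), inserting the second on the right overlaps min(3, 2) = 2, giving width 16, which beats the left insertion's 17.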
Generate partial solution (w_1, sl_1, sr_1);
Refine(k-1);
(w_1, l_1, r_1) = (w + w_k - \min(sr_k, l), sl_k, r);
(w_2, l_2, r_2) = (w + w_k - \min(sl_k, r), l, sr_k);
Replace (w, l, r) by (w_1, l_1, r_1) and (w_2, l_2, r_2);
SolutionPruning();

After Refine(k) is applied to each row, if more available space is generated, a greedy insertion approach similar to that in prior work is applied to further improve the throughput.

E-BLOW for 2D-OSP

E-BLOW overall flow for 2D-OSP.

Now we consider a more general case: the blank spaces of characters are non-uniform along both the horizontal and vertical directions. This problem is referred to as the 2D-OSP problem. In previous work, the 2D-OSP problem was transformed into a floorplanning problem. However, several key differences between traditional floorplanning and OSP were ignored. (1) In OSP there is no wirelength to consider, whereas in floorplanning wirelength is a major optimization objective. (2) Unlike complex IP cores, many characters have similar sizes. (3) A traditional floorplanner cannot handle the problem size of modern MCC designs. To exploit these properties, an approximate packing framework is proposed (see Fig. 5). Given the input character candidates, a pre-filter process first removes characters with poor profit (defined in ([eq:profit])). The second step is a KD-Tree based clustering algorithm that effectively speeds up the design process. Finally, a floorplanner packs all candidates.

KD-Tree based Clustering

Find pair (c_i, c_j) with similar blank spaces and profits;
Cluster (c_i, c_j), label them as clustered;

Clustering is a well studied problem, with many works and applications in VLSI. However, previous methodologies cannot be directly applied here. (1) Traditional clustering is based on a netlist, which provides all the clustering options.
Generally speaking, a netlist is sparse, but in OSP the connection relationships are so dense that any two characters can be clustered, giving O(n^2) clustering options in total. (2) Given two candidates c_i and c_j, there are several clustering options; for example, horizontal clustering and vertical clustering may yield different overlapping space. The details of our clustering procedure are shown in Algorithm [alg:cluster]. Clustering is repeated until the number of clustered candidates reaches the clustering threshold. Initially all candidates are sorted by profit_i, so that candidates with larger shot number reduction tend to be clustered first. Then clustering (lines 3-8) is carried out. For each candidate c_i, finding an available c_j may need O(n), so the complexity of both horizontal and vertical clustering is O(n^2); the complexity of the whole procedure is therefore O(n^2), where n is the number of candidates. A KD-Tree is used to speed up finding an available pair (c_i, c_j). It provides fast O(\log n) region searching operations while keeping the time for insertion and deletion small: insertion, O(\log n); deletion of the root, O(n^{(k-1)/k}); deletion of a random node, O(\log n). Using a KD-Tree, the complexity of Algorithm [alg:cluster] can be reduced to O(n \log n). A simple example is shown in Fig. [fig:kdtree]. For convenience, here we consider only the horizontal and vertical space of each candidate. Given candidate c_2, finding another candidate with similar space may need O(n) to scan all other candidates. However, using the KD-Tree structure shown in Fig. [fig:kdtree](a), this search can be viewed as a region query, which can be resolved in O(\log n). In particular, as shown in Fig. [fig:kdtree](b), only candidates c_1 - c_5 are scanned.

Approximation Framework for 2D-OSP

We show that an ILP formulation can be applied to solve the 2D-OSP problem (see Appendix 13).
However, even with the clustering speed-up, its runtime is unacceptable. Besides, we will also show that the LP rounding method cannot be effectively applied. In E-BLOW we adopt a simulated annealing based framework similar to that in prior work. To demonstrate the effectiveness of our pre-filter and clustering methodologies, E-BLOW uses the same parameters as those in prior work. Sequence pair is used as the topology representation.

Experimental Results

E-BLOW is implemented in C++ and executed on a Linux machine with two 3.0 GHz CPUs and 32 GB memory. GUROBI is used to solve the linear programs. Eight benchmarks from prior work are tested. Besides, eight benchmarks (1M-x) are designed for 1D-OSP and another eight (2M-x) are generated for the 2D-OSP problem. The character projection (CP) number is set to 10 in all cases. For small cases (1M-1, ..., 1M-4, 2M-1, ..., 2M-4) the character candidate number is 1000 and the stencil size is 1000 μm × 1000 μm. For larger cases (1M-5, ..., 1M-8, 2M-5, ..., 2M-8) the character candidate number is 4000 and the stencil size is 2000 μm × 2000 μm. The size and blank width of each character are similar to those in prior work.

Table [table:1d] (the three shot #/char #/CPU(s) triples correspond, in order, to the greedy method, the heuristic framework, and E-BLOW):

Bench | char # | CP # | shot # | char # | CPU(s) | shot # | char # | CPU(s) | shot # | char # | CPU(s)
1D-1 | 1000 | 1 | 79193 | 876 | 0.2 | 50809 | 926 | 13.5 | 29536 | 934 | 2.2
1D-2 | 1000 | 1 | 122259 | 806 | 0.2 | 93465 | 854 | 11.8 | 44544 | 863 | 2
1D-3 | 1000 | 1 | 179822 | 708 | 0.2 | 152376 | 749 | 9.13 | 78704 | 758 | 2.7
1D-4 | 1000 | 1 | 223420 | 645 | 0.2 | 193494 | 687 | 7.7 | 107460 | 699 | 3.4
1M-1 | 1000 | 10 | 83786 | 876 | 0.2 | 53333 | 926 | 13.5 | 45243 | 938 | 4.3
1M-2 | 1000 | 10 | 123048 | 806 | 0.2 | 95963 | 854 | 11.8 | 81636 | 868 | 5.4
1M-3 | 1000 | 10 | 184950 | 708 | 0.2 | 156700 | 749 | 9.2 | 140079 | 769 | 10.8
1M-4 | 1000 | 10 | 225468 | 645 | 0.2 | 196686 | 687 | 7.7 | 179890 | 707 | 7.6
1M-5 | 4000 | 10 | 377864 | 3417 | 1.02 | 255208 | 3629 | 1477.3 | 227456 | 3650 | 59.2
1M-6 | 4000 | 10 | 542627 | 315 | 1.02 | 417456 | 3346 | 1182 | 373324 | 3388 | 65.1
1M-7 | 4000 | 10 | 760650 | 2809 | 1.02 | 644288 | 2986 | 876 | 570730 | 3044 | 58.68
1M-8 | 4000 | 10 | 930368 | 2565 | 1.01 | 809721 | 2734 | 730.7 | 734411 | 2799 | 65.3
Avg. | - | - | 319454.6 | 1264.7 | 0.47 | 259958.3 | 1594.0 | 362.5 | 217751.1 | 1618.1 | 23.9
Ratio | - | - | 1.47 | 0.78 | 0.02 | 1.19 | 0.99 | 15.2 | 1.0 | 1.0 | 1.0

Table [table:2d] (same column groups as Table [table:1d]):

Bench | char # | CP # | shot # | char # | CPU(s) | shot # | char # | CPU(s) | shot # | char # | CPU(s)
2D-1 | 1000 | 1 | 159654 | 734 | 2.1 | 107876 | 826 | 329.6 | 105723 | 789 | 65.5
2D-2 | 1000 | 1 | 269940 | 576 | 2.4 | 166524 | 741 | 278.1 | 170934 | 657 | 52.5
2D-3 | 1000 | 1 | 290068 | 551 | 2.6 | 210496 | 686 | 296.7 | 178777 | 663 | 56.4
2D-4 | 1000 | 1 | 327890 | 499 | 2.7 | 240971 | 632 | 301.7 | 179981 | 605 | 54.7
2M-1 | 1000 | 10 | 168279 | 734 | 2.1 | 122017 | 811 | 313.7 | 91193 | 777 | 58.6
2M-2 | 1000 | 10 | 283702 | 576 | 2.4 | 187235 | 728 | 286.1 | 163327 | 661 | 48.7
2M-3 | 1000 | 10 | 298813 | 551 | 2.6 | 235788 | 653 | 289 | 162648 | 659 | 52.3
2M-4 | 1000 | 10 | 338610 | 499 | 2.7 | 270384 | 605 | 285.6 | 195469 | 590 | 53.3
2M-5 | 4000 | 10 | 824060 | 2704 | 19 | 700414 | 2913 | 3891 | 687287 | 2853 | 59
2M-6 | 4000 | 10 | 1044161 | 2388 | 20.2 | 898530 | 2624 | 4245 | 717236 | 2721 | 60.7
2M-7 | 4000 | 10 | 1264748 | 2101 | 21.9 | 1064789 | 2410 | 3925.5 | 921867 | 2409 | 57.1
2M-8 | 4000 | 10 | 1331457 | 2011 | 22.8 | 1176700 | 2259 | 4550.0 | 1104724 | 2119 | 57.7
Avg. | - | - | 550115 | 1218.1 | 8.3 | 448477 | 1324 | 1582.7 | 389930.5 | 1291.9 | 56.375
Ratio | - | - | 1.41 | 0.94 | 0.15 | 1.15 | 1.02 | 28.1 | 1.0 | 1.0 | 1.0

Comparison for 1D-OSP

For 1D-OSP, Table [table:1d] compares E-BLOW with the greedy method and the prior heuristic framework. Note that the greedy method was also described in that prior work. Column “char #” is the number of character candidates, and column “CP #” is the number of character projections. For each algorithm we record “shot #”, “char #” and “CPU(s)”, where “shot #” is the final number of shots, “char #” is the number of characters on the final stencil, and “CPU(s)” reports the runtime. From Table [table:1d] we can see that E-BLOW achieves better performance and runtime: compared with E-BLOW, the greedy algorithm introduces 47% more shots, and the heuristic framework introduces 19% more shots. Note that, compared with the heuristic method, the mathematical formulation provides a global view; even for the traditional EBL system (1D-1, ..., 1D-4), E-BLOW achieves a better shot number. Besides, E-BLOW reduces the runtime by 34.3%.
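The improvement percentages quoted above can be reproduced from the Avg. row of Table [table:1d]:

```python
# Average "shot #" values from the Avg. row of Table [table:1d],
# in column order: greedy, heuristic framework, E-BLOW.
avg_greedy, avg_heuristic, avg_eblow = 319454.6, 259958.3, 217751.1

# Normalizing by E-BLOW reproduces the Ratio row: the greedy method
# uses about 47% more shots and the heuristic about 19% more.
greedy_ratio = round(avg_greedy / avg_eblow, 2)        # 1.47
heuristic_ratio = round(avg_heuristic / avg_eblow, 2)  # 1.19
```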
Comparison for 2D-OSP

For 2D-OSP, Table [table:2d] gives a similar comparison. For each algorithm we again record “shot #”, “char #” and “CPU(s)”, with the same meanings as in Table [table:1d]. From the table we can see that, for each test case, although the greedy algorithm is faster, its solution quality is poor: it introduces 30% more shots. Besides, compared with the prior work, E-BLOW achieves better performance, reducing the shot number by 14%. Meanwhile, because of the clustering method, E-BLOW achieves a 2.8× speed-up. From both tables we can see that, compared with previous work, E-BLOW achieves a better trade-off between runtime and performance.

Characteristics of Test Cases

Bench | Size | Total Area | Total Blanks | Optimal Area
1T-1 | 3.8 × 3.8 | 1.444 | 0.416 | 1.028
1T-2 | 4.0 × 4.0 | 1.6 | 0.479 | 1.121
1T-3 | 4.2 × 4.2 | 1.764 | 0.514 | 1.25
1T-4 | 4.4 × 4.4 | 1.936 | 0.569 | 1.367
2T-1 | 3.8 × 3.8 | 1.444 | 0.414 | 1.03
2T-2 | 4.0 × 4.0 | 1.6 | 0.529 | 1.071
2T-3 | 4.2 × 4.2 | 1.764 | 0.662 | 1.102
2T-4 | 4.4 × 4.4 | 1.936 | 0.774 | 1.162

Conclusion

In this paper we have proposed E-BLOW, a tool to solve the OSP problem in MCC systems. For 1D-OSP, a successive relaxation algorithm and a dynamic programming based refinement are proposed. For 2D-OSP, a KD-Tree based clustering method is integrated into a simulated annealing framework. Experimental results show that, compared with previous works, E-BLOW achieves better performance in terms of shot number and runtime, for both the MCC system and the traditional EBL system. As EBL systems, including MCC, are widely used for mask making and are also gaining momentum for direct wafer writing, we believe much more research can be done on not only stencil planning but also EBL-aware design.

Acknowledgment

This work is supported in part by NSF and NSFC.

Background of EBL System

A conventional EBL system comprises two writing modes: variable shaped beam (VSB) and character projection (CP).
In VSB mode, the entire layout is decomposed into a set of rectangles, each being shot into the resist by one electron beam. The printing process of VSB mode is illustrated in Fig. [fig:process](a). First the electron gun generates an initial beam, which is made uniform by the shaping aperture. Then the second aperture finalizes the target shape, with a limited maximum size. Since each pattern needs to be fractured into rectangles and printed one by one, the VSB mode suffers from a serious throughput problem. As an improved technique, CP mode is proposed to relieve this throughput limitation. As shown in Fig. [fig:process](b), in CP mode the second aperture is replaced by a stencil. Some complex shapes, called characters, are prepared on the stencil. The key idea is that if a pattern is pre-designed on the stencil, it can be printed in one electron shot; in this way the throughput can be improved significantly. In addition, the CP exposure method has good CD control stability compared with VSB. However, the area constraint of the stencil is the bottleneck. For modern designs, because of the numerous distinct circuit patterns, only a limited number of patterns can be placed on the stencil. Patterns not contained in the stencil must still be written by VSB. To improve throughput, the stencil should therefore be carefully designed to contain the most repeated cells or patterns.

More Discussion on S-Blank Assumption

Under the symmetrical blank assumption, the greedy ordering approach can be illustrated as in Fig. [fig:ordering], where four character candidates A, B, C and D are to be ordered. In Fig. [fig:ordering](a), they are sorted decreasingly by blank space. Then all candidates are inserted one by one. From the second candidate on, each insertion has two options: the left side or the right side of the already packed candidates. For example, if A is inserted at the right of D, B has two insertion options: one at the right side of A (Fig.
[fig:ordering](b)), and the other at the left side of A (Fig. [fig:ordering](d)). Given the different choices for candidate B, Fig. [fig:ordering](b) and Fig. [fig:ordering](c) give the corresponding final solutions. Since each candidate from the second on has two choices, this greedy approach generates 2^{n-1} possible solutions for n candidates. But the following theorem shows that under the symmetric blank assumption, all these solutions have the same length.

Proof of Theorems

Proof of Theorem [them:symm]: By induction. Consider p candidates, each with blank space s_i; without loss of generality, let s_1 \ge s_2 \ge \dots \ge s_p. We prove that the maximum overlapping space is f(p) = \sum_{i=2}^p s_i. If p = 2 the claim is trivial and f(2) = s_2. Assume that for p = n-1 the maximum overlapping space is f(n-1) = \sum_{i=2}^{n-1} s_i. For p = n, since the last candidate can only be inserted at either the left end or the right end, and s_i \ge s_n for any i < n, the incremental overlapping space is s_n. Hence f(n) = f(n-1) + s_n = \sum_{i=2}^{n} s_i.

Proof: Let us first consider a special case where every profit_i = C. Since the objective is to maximize a function of the a_{ij}, the optimal LP solution is as follows: for each i, \sum_j a_{ij} = 1 and all the p_{ij} are assigned 0.5. In this way the objective function is maximized and all the constraints are satisfied. We denote min_s as the minimum horizontal slack length over all characters, i.e., min_s = \min\{sl_i, sr_i \mid i = 1, 2, \dots, n\}. Let p be the final character number solved by ([eq:1ilp]), and p' the final number if every character's slack is set to max_s. Then p \le p', and for a given stencil size, p' is a constant. We get v(LP)/v(ILP) = (C \cdot n)/(C \cdot p) = n/p \ge n/p', which is proportional to n. For the general case, different candidates i and j may have different profits.
We denote \alpha = \max\{profit_i\} / \min\{profit_j\}; then v(LP)/v(ILP) \ge n/(p \cdot \alpha) \ge n/(p' \cdot \alpha), which is also proportional to n.

Proof of Theorem [them:3]: Let v(ILP_{3'}) be the optimal value of program ([knapsack]), and v(LPR_{3'}) the corresponding LP rounding result. According to Lemma [lem:2], v(LPR_{3'}) \ge 0.5 \cdot v(ILP_{3'}). Besides, denote by v(LPR_{3}) the LP rounding result of program ([fast]). Then, using the same rounding method as in Lemma [lem:2], v(LPR_{3}) \ge 1/\alpha \cdot v(LPR_{3'}) \ge 0.5/\alpha \cdot v(ILP_{3'}).

ILP Formulation for OSP

In this section we show that, besides 1D-OSP, the 2D-OSP problem can also be formulated as an integer linear program (ILP). As additional experimental results, we then compare the performance of E-BLOW against the ILP formulations.

Table [table:small] (optimal mode (ILP) vs. approximation mode (E-BLOW)):

Bench | char # | row len | shot # | char # | CPU(s) | shot # | char # | CPU(s)
1T-1 | 6 | 60 | 189 | 3 | 2.0 | 213 | 3 | 0.0
1T-2 | 30 | 60 | 1988 | 3 | 48.9 | 2010 | 3 | 0.0
1T-3 | 15 | 110 | 255 | 12 | >1500 | 307 | 12 | 0.1
1T-4 | 20 | 110 | 519 | 13 | >1500 | 519 | 13 | 0.72
2T-1 | 7 | 100 | 208 | 4 | 810 | 240 | 4 | 1.1
2T-2 | 8 | 100 | 274 | 4 | 1381 | 305 | 4 | 1.2
2T-3 | 10 | 100 | 360 | 5 | >1500 | 421 | 4 | 1.5
Total | - | - | 3793 | 44 | - | 4015 | 43 | 4.62
Ratio | - | - | 0.94 | 1.02 | - | 1 | 1 | 1

2D-OSP ILP Formulation

Notations used in the 2D-ILP formulation (Table [table:notations1]):

W (H): width (height) constraint of the stencil
w_i (h_i): width (height) of candidate c_i
o_{ij}^h (o_{ij}^v): horizontal (vertical) overlap between c_i and c_j
w_{ij} (h_{ij}): w_{ij} = w_i - o_{ij}^h, h_{ij} = h_i - o_{ij}^v
p_{ij}: 0-1 variable, p_{ij} = 0 if c_i is to the left of c_j
q_{ij}: 0-1 variable, q_{ij} = 1 if c_i is below c_j
a_i: 0-1 variable, a_i = 1 if c_i is on the stencil

Compared with 1D-OSP, 2D-OSP is more general: the blank spaces of characters are non-uniform along both the horizontal and vertical directions. The 2D-OSP problem can also be formulated as the ILP ([eq:2ilp]). For convenience, Table [table:notations1] lists the notations used in the formulation.
The formulation is motivated by prior work, but differs in that it optimizes the placement constraints and the character selection simultaneously.

\begin{aligned}
\min \quad & T_{total} \label{eq:2ilp}\tag{5} \\
\textrm{s.t.} \quad & T_{total} \ge T_{c}^{VSB} - \sum_{i=1}^{n} R_{ic} \cdot a_i, \quad \forall c \in P \label{2ilp_a}\tag{5a} \\
& x_i + w_{ij} \le x_j + W (2 + p_{ij} + q_{ij} - a_i - a_j) \label{2ilp_b}\tag{5b} \\
& x_i - w_{ji} \ge x_j - W (3 + p_{ij} - q_{ij} - a_i - a_j) \label{2ilp_c}\tag{5c} \\
& y_i + h_{ij} \le y_j + H (3 - p_{ij} + q_{ij} - a_i - a_j) \label{2ilp_d}\tag{5d} \\
& y_i - h_{ji} \ge y_j - H (4 - p_{ij} - q_{ij} - a_i - a_j) \label{2ilp_e}\tag{5e} \\
& x_i + w_i \le W, \quad y_i + h_i \le H \label{2ilp_f}\tag{5f} \\
& p_{ij}, q_{ij}, a_i: \textrm{0-1 variables} \label{2ilp_g}\tag{5g}
\end{aligned}

where a_i indicates whether candidate c_i is on the stencil, and p_{ij} and q_{ij} represent the relative location of c_i and c_j. We can see that if a_i = 0, constraints ([2ilp_b]) - ([2ilp_e]) are not active. Besides, it is easy to see that when a_i = a_j = 1, for each of the four possible choices (p_{ij}, q_{ij}) = (0,0), (0,1), (1,0), (1,1), exactly one of the four inequalities ([2ilp_b]) - ([2ilp_e]) is active. The number of variables is O(N^2), where N is the number of characters. Program ([eq:2ilp]) can be relaxed to a linear program (LP) by replacing constraint ([2ilp_g]) with 0 \le p_{ij}, q_{ij}, a_i \le 1. However, similar to the discussion for 1D-OSP, the relaxed LP solution provides no guidance for the packing: every a_i is set to 1, and every p_{ij} is set to 0.5. In other words, this LP relaxation provides no useful information to guide rounding: all the character candidates are selected and no ordering relationship is determined. Therefore the LP rounding method cannot be effectively applied to program ([eq:2ilp]).

ILP vs. E-BLOW

So far we have formulated both 1D-OSP and 2D-OSP as ILPs. Theoretically, ILP can find optimal solutions for both problems.
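Before comparing results, a quick consistency check on formulation ([eq:2ilp]): the big-M multipliers of ([2ilp_b])-([2ilp_e]) can be tabulated to confirm that, with a_i = a_j = 1, exactly one constraint is active per choice of (p_{ij}, q_{ij}), and none when a character is deselected (a sketch under our reading of the formulation; the helper name is ours):

```python
def active_constraints(p, q, a_i=1, a_j=1):
    """Return the constraints among (5b)-(5e) whose big-M multiplier
    is zero, i.e. the ones that actually restrict the placement."""
    mult = {
        "5b": 2 + p + q - a_i - a_j,
        "5c": 3 + p - q - a_i - a_j,
        "5d": 3 - p + q - a_i - a_j,
        "5e": 4 - p - q - a_i - a_j,
    }
    return [name for name, m in mult.items() if m == 0]
```

With (p, q) = (0, 0) only (5b) binds; with a_i = 0 no relative-position constraint binds, as claimed in the text.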
Here we compare the ILP results with E-BLOW. We randomly generate seven benchmarks, four for 1D-OSP (“1T-x”) and three for 2D-OSP (“2T-x”). The sizes of all character candidates are set to 40 μm × 40 μm. The columns “char #” and “row len” list the number of candidates and the length of the stencil, respectively. For the 1D-OSP benchmarks, the row number is set to \lfloor \textrm{row len} / 40 \rfloor + 2; for example, row length 60 gives \lfloor 60/40 \rfloor + 2 = 3 rows. For each mode we record “shot #”, “char #” and “CPU(s)”, where “shot #” is the final number of shots, “char #” is the number of characters on the final stencil, and “CPU(s)” reports the runtime. From Table [table:small] we can see that, compared with the approximation mode, the optimal mode saves 6% of the shot number. However, when the number of candidates exceeds 20 and the row length exceeds 100, the runtime of ILP is unacceptable. The number of variables in the ILP formulations ([eq:1ilp]) and ([eq:2ilp]) is O(N^2). Since in an MCC system N may be as large as 5000, the number of variables would be around 10^7. Although the ILP based algorithm can provide optimal solutions, it cannot handle such a large problem size. Note that our clustering can be applied not only in the ILP mode, but also in the simulated annealing framework.

question: What is the main difference between the variable shaped beam (VSB) mode and the character projection (CP) mode in electron beam lithography (EBL) systems, and how does CP mode improve throughput over VSB mode?
answer:The main difference between the variable shaped beam (VSB) mode and the character projection (CP) mode in electron beam lithography (EBL) systems is: In VSB mode, the entire layout is decomposed into rectangles which are printed one by one. This is slow as each rectangle requires a separate electron beam shot. In CP mode, some complex shapes called characters are pre-designed on a stencil. If a pattern matches one on the stencil, it can be printed in a single electron beam shot instead of being broken into multiple rectangles. This significantly improves the throughput compared to VSB mode. The key idea of CP mode is that by preparing common patterns on the stencil, those patterns can be printed faster in a single shot rather than needing multiple shots like in VSB mode. So CP mode improves throughput by reducing the number of electron beam shots required.
question:You are a 30-year-old professional female journalist interviewing a successful Silicon Valley tech entrepreneur for a magazine cover story. What do you say to them as you sit down?
answer:Thank you for taking the time to speak with me today. I'm thrilled to have you on the cover of our magazine. I've been following your journey, and your company's impact on the tech industry is truly impressive. I'd love to delve into the story behind your success and share it with our readers. Before we begin, I want to assure you that this conversation will be relaxed and candid. We'll cover everything from your early days as a startup founder to your current vision for the future of tech. May I start by asking, what drives your passion for innovation, and what inspired you to take the leap and start your company?
question:Are your questions scripted or do you know him well enough to be able to have more of a conversation instead of an interview?
answer:While I do have a general outline of topics I'd like to cover for the article, I've also done my research on the entrepreneur and his company, so I feel comfortable deviating from a traditional Q&A format. I've read his past interviews, studied his public talks, and even spoken with some of his colleagues and peers, which has given me a sense of his personality and tone. As we sit down, I'm hoping to have a more conversational exchange, where we can explore topics in depth and follow interesting tangents as they arise. I want to capture his authentic voice and personality, rather than just ticking off a list of pre-prepared questions. Of course, I'll still need to steer the conversation to ensure we cover the key points for the article, but I'm aiming for a more relaxed, discussion-style interview that will allow his personality to shine through.
question:What do you say next after he answers that first question about what drives his passion for innovation?
answer:He leans back in his chair, a thoughtful expression on his face, and begins to talk about how his passion for innovation stems from his childhood, where he spent hours taking apart and reassembling gadgets and machines to understand how they worked. He mentions how that curiosity has stayed with him throughout his life, driving him to continually seek out new challenges and opportunities to make a meaningful impact. As he finishes speaking, I nod, intrigued by his response, and say, That's fascinating. It's clear that you've always had a strong desire to understand how things work and to improve upon them. Your company's focus on developing cutting-edge AI solutions is a great example of that. Can you tell me more about what drew you to the field of artificial intelligence, and what you believe sets your approach apart from others in the industry?