Handling Site-To-Site Variation In Parallel Test

2021-11-24 | By Gary Zhang

Why this is a growing problem and how to solve it.

Using the same ATE to test multiple devices in parallel can shorten test time and reduce costs, but doing so takes real engineering skill.

Minimizing the test and measurement variation seen by each device under test (DUT) is a multiphysics problem, and solving it becomes more important at each new process node and with each multi-chip package. The electrical, mechanical, and thermal aspects of the entire test cell must be aligned so that chipmakers can be confident that any variation is confined to the DUT itself. That assumption is critical when applying statistically determined test limits that adapt to local process variation.

The test world is not perfect, however, so differences between test and measurement environments must be accounted for. Doing so has a major impact on the quality and reliability of IC products.

Fortunately, test data from each test site can be used to determine these differences. Armed with that knowledge, engineers can adjust their statistical pass/fail algorithms accordingly, improving both yield and quality. This matters because parallel device testing continues to increase at both wafer and unit level, and the final products serve a variety of markets, including safety-critical applications such as data centers and automobiles, which demand escape rates on the order of 10 ppm or lower.
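One simple way to turn per-site test data into an adjustment, sketched below under the assumption (stated in the Skyworks/Galaxy paper cited later) that site differences behave as consistent additive offsets: estimate each site's offset as the difference between that site's median and the overall median, then subtract it before applying a shared statistical limit. The function names and example values here are illustrative, not from any cited paper.

```python
import numpy as np

def site_offsets(values, sites):
    """Estimate a per-site additive offset as the difference between
    each site's median and the overall median of one measurement."""
    values = np.asarray(values, dtype=float)
    sites = np.asarray(sites)
    overall = np.median(values)
    return {s: float(np.median(values[sites == s]) - overall)
            for s in np.unique(sites)}

def align_sites(values, sites):
    """Subtract each site's offset so one statistical limit applies to all sites."""
    offs = site_offsets(values, sites)
    return np.array([v - offs[s] for v, s in zip(values, sites)])

# Example: site 1 reads systematically ~0.2 units higher than site 0.
vals  = [1.0, 1.1, 0.9, 1.2, 1.3, 1.1]
sites = [0, 0, 0, 1, 1, 1]
```

After alignment, the two sites' medians coincide, so a single pass/fail limit no longer penalizes one site for a hardware offset.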

Unit-level testing includes final test, burn-in, and system-level test (SLT). But wafer test and final test pose the tougher technical challenges, because their test interface boards, the probe cards and load boards, are smaller.

"Our customers' wafer probe cards increasingly use multi-site testing," said Keith Schaub, vice president of technology and strategy at Advantest America. "Combine this with the growing DUT pin counts of some products (large digital devices), and the perennial problems of probe card planarity and probe tip damage become more worrisome, due to burnout when too much current is applied."

Figure 1: The progression of parallel site testing from 1X to 16X. Source: Anne Meixner/Semiconductor Engineering

"You could drive a truck through that range," said Mark Kahwati, director of product marketing for Teradyne's semiconductor test division. "Some applications are still single-site. Then consider automotive safety controllers, such as airbag controllers and ABS controllers. There you can see 4 sites, then 8, then 12. And because automotive devices have relatively few pins, site counts can approach 64 in parallel, or even more."

Although the same economic factors drive the increase in the number of devices tested in parallel, the numbers vary greatly by industry sector and device type (see Figure 2 below).

Figure 2: The number of sites per test insertion, broken down by industry and device type. Source: Teradyne

In parallel testing, every effort is made to minimize differences in the ATE and associated test hardware between sites. With the latest ATE, suppliers provide new features that support multi-site testing and pay closer attention to reducing the test hardware's contribution. Analog test measurements require extra care in designing the path from the automatic test equipment (ATE) hardware to the DUT, but ATE instruments can be calibrated to account for differences in those paths.

Nevertheless, differences remain, and they matter when applying statistics-based outlier detection techniques. Engineering teams at Texas Instruments, ams AG, and Skyworks Solutions have documented the impact of site-to-site differences. In their 2015 DATA workshop paper, engineers from Skyworks Solutions and Galaxy Semiconductor explained why this is important:

"It is therefore logical to assume that adjacent device columns or rows should show nearly identical data distributions. However," they wrote, "testers with multiple sets of test-site hardware will show systematic variation in the data from one test site to another... Although every effort is made to make the test hardware consistent from one set to another, there are often measurable offsets. These offsets can and do shift the statistics behind the NNR value. Because these offsets are consistent and predictable, they can be managed with a linear offset applied to the measurement."

Test limits based on statistical techniques have become a common tool in the product engineer's toolbox. These techniques inherently assume that all die/units see the same measurement environment. So when testing devices in parallel, engineering teams first focus on making that assumption hold.

Reducing site-to-site variation in the test cell

Any measurement system has sources of error. For semiconductor device testing, consider the signal and power paths between the ATE and the DUT. Every piece of hardware and every connection carries a tolerance for each measured parameter. Edge placement accuracy, for example, represents the timing tolerance of a pin electronics card. These tolerances accumulate along the path between the DUT pin/pad and the ATE instrument.

Figure 3: Contribution to measurement error in the test path. Source: Anne Meixner/Semiconductor Engineering

To a first order, the physical area of the device board/probe head combined with the device's pin count determines how much parallelism is physically possible. Beyond that, the mechanical, thermal, and electrical properties of the test cell must be understood, because all of them can contribute error.

Reducing these contributions to measurement error aligns with the overall goal of a high-accuracy test setup. Multi-site testing poses some unique challenges in achieving equivalence between sites, and engineering teams need to address several of them.

"In wafer test, many items affect site-to-site variation: mechanical aspects, how the probe touches the pad, contamination on the pad or probe, and temperature variation across the wafer/chuck," said Darren James, technical account manager and product expert at Onto Innovation. "On the electrical side, if resources are shared between sites, the design and layout of the interface and probe card are particularly important for providing good impedance matching per site/pin. Interface design also affects crosstalk and leakage."

From a packaging and test perspective, George Harris, vice president of global test services at Amkor Technology, pointed out several common causes of site-to-site test differences:

"It's best to design and characterize the production test environment to the product's specifications," Harris said. "Even for fairly simple products that stress the test environment, where many sites are tested in parallel or under load, there can be differences in power distribution, and the same is true for complex SoCs."

Identifying and managing site-to-site variation

Testing spans multiple processes as products shift left and shift right, so variation must be handled in the context of those other processes. During test, for example, the test engineering team needs to determine which offset-based site-to-site differences it can respond to. The product engineering team, in turn, may need to account for site-to-site differences when applying its pass/fail criteria.

"The engineering team needs a test process control system so that when problems such as site-to-site differences are detected, they can analyze them to help find the root cause," said Greg Prewitt, director of Exensio solutions at PDF Solutions. "The control system needs to issue alerts/alarms quickly so the team can act to resolve the situation before material has to be scrapped. Some best practices include automated responses, such as cleaning probes, or activating out-of-control action plan (OCAP) processes, which in turn need to be integrated with the manufacturing execution system (MES) to automatically hold suspect lots."

When parallelism reaches a full-wafer touchdown, engineers need to consider more advanced statistics. Consider, for example, the ramp-up of a smart card device tested at 4K sites at wafer probe.

"The large probe heads used for so many sites pose a challenge with temperature variation across the chuck. Left unmanaged, that can affect measurements from the device's temperature sensor," said Ed Seng, product manager in Teradyne's digital division. "Compared to single-stepping one die, more site-to-site correlation must be done at these high counts, which relies on more data."

Correlation analysis across 4K sites is far more complicated than across 4 or 8 sites.

So how is site-to-site variation analyzed? Engineers can use gauge R&R techniques to evaluate repeatability and reproducibility across multiple sites. For parallelism from 2X to 16X, most statistical software packages (such as JMP and R) can readily analyze site-to-site variation.
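A common first screen in such packages is a one-way ANOVA treating site as the factor: a small p-value indicates that at least one site's distribution differs. A minimal sketch in Python with simulated data (the offset and sample sizes are invented for illustration):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Simulated measurements from four sites; site 3 carries a +0.3 offset,
# standing in for a systematic hardware difference on that site.
site_data = [rng.normal(5.0, 0.1, 200) for _ in range(3)]
site_data.append(rng.normal(5.3, 0.1, 200))

# One-way ANOVA: does the mean differ by site?
stat, p = f_oneway(*site_data)
if p < 0.01:
    print("significant site-to-site variation detected")
```

A significant result would then be followed by a full gauge R&R study to apportion the variance between repeatability and the site-to-site (reproducibility) component.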

Factories can respond to tester hardware differences that call for preventive maintenance. But the slight differences in the signal path from instrument to DUT pads/pins add up. The latest ATE models are designed to minimize such differences. In addition, test interface boards such as probe cards and load boards must be designed with PCB expertise to minimize differences as well.

The reality in wafer-level and unit-level test factories, though, is a large installed base of older ATE. The newest products may therefore be tested on older equipment, which in turn can produce site-to-site differences in test results. If the differences are small and the test process is well controlled (i.e., Cpk greater than or equal to 1.33), the impact on device yield and quality is negligible.
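The Cpk figure cited here is the standard process capability index: the distance from the process mean to the nearer spec limit, in units of three standard deviations. A minimal sketch (the sample values and spec limits are invented for illustration):

```python
import numpy as np

def cpk(values, lsl, usl):
    """Process capability index: distance from the mean to the nearest
    spec limit, in units of three standard deviations."""
    mu = np.mean(values)
    sigma = np.std(values, ddof=1)  # sample standard deviation
    return min(usl - mu, mu - lsl) / (3 * sigma)

# A well-centered, tight distribution relative to limits 4.5..5.5:
result = cpk([4.9, 5.0, 5.1, 5.0, 5.0], lsl=4.5, usl=5.5)
```

A Cpk of 1.33 corresponds to the nearest limit sitting four standard deviations from the mean, the common threshold for calling a test process well controlled.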

The definition of negligible changes, however, when sensitive analog measurements and statistics-based outlier detection test limits come into play.

Outlier detection tests range from simple part average testing (PAT) to complex nearest neighbor residual (NNR) tests. When needed, these data-analytics-based test techniques can be adapted to the site-to-site variation observed. In fact, adapting has become a necessity, as two examples of how engineers responded show. The first involves RF test and PAT; the second, IDDQ wafer test and NNR.

"For RF devices we ran into a similar problem, testing devices at four sites. One site's test results were statistically different from the others. With RF it is difficult to match four sites well; the RF performance characteristics of four sockets, four contactors, and four sets of components will differ," said Jeff Roehr, IEEE senior member and 40-year test veteran. "If we didn't account for this, our test data would be very widely distributed, which made it hard to see the outliers. Over time, we learned that we had to analyze test data on a per-site basis. In effect, we ran four sets of PAT software at the same time."

With device counts in the hundreds to thousands, engineers establish static and dynamic PAT limits. In smaller statistical populations of roughly 25 to 40, such as those used for Z-PAT and NNR, the test hardware's site-to-site impact becomes more pronounced. Especially for sensitive analog measurements, ignoring that impact can fail good die and pass bad ones.
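Roehr's per-site approach can be sketched as dynamic PAT limits computed independently for each site. The robust-limit recipe below (median plus or minus k times an IQR-based sigma) is one common convention, not necessarily the one his team used, and the function names are illustrative:

```python
import numpy as np

def dynamic_pat_limits(values, k=6.0):
    """Dynamic PAT limits from robust estimates: median +/- k * sigma,
    with sigma estimated from the interquartile range."""
    med = np.median(values)
    q1, q3 = np.percentile(values, [25, 75])
    sigma = (q3 - q1) / 1.349  # IQR of a normal distribution ~= 1.349 sigma
    return med - k * sigma, med + k * sigma

def per_site_pat(values, sites, k=6.0):
    """Compute PAT limits independently per site, so one site's
    systematic offset does not widen every site's limits."""
    values = np.asarray(values, dtype=float)
    sites = np.asarray(sites)
    return {s: dynamic_pat_limits(values[sites == s], k)
            for s in np.unique(sites)}

# Two sites whose distributions sit ~1 unit apart:
vals  = [1.0, 1.1, 1.2, 1.3, 2.0, 2.1, 2.2, 2.3]
sites = [0, 0, 0, 0, 1, 1, 1, 1]
```

With per-site limits, a reading that is normal for site 1 but would be an outlier against site 0's tighter window is judged against the right population.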

Several papers over the past decade describing outlier detection techniques note that differences between test hardware sites can affect the ability to accurately separate good die from bad. A 2016 Texas Instruments paper pointed out that site-to-site variation must be considered when applying NNR. A 2018 ams AG paper on adaptive test for mixed-signal ICs included site-to-site variation in its dynamic PAT limits.

In their 2015 DATA workshop paper, the Skyworks Solutions and Galaxy Semiconductor engineers proposed a method to offset site deviation when applying NNR. For each test measurement, they shared a technique for computing each site's deviation, illustrating it with a 4X test setup and a test called ACB22.
Applying the resulting site deviation to the NNR limits distinguishes good die from bad more accurately.
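The NNR statistic itself is each die's measured value minus an estimate formed from its neighbors on the wafer map. The sketch below is a generic 4-connected-neighbor version, not the specific formulation from the Skyworks/Galaxy paper; their per-site offset would be subtracted from each measurement before these residuals are computed:

```python
import numpy as np

def nnr_residuals(grid):
    """Nearest-neighbor residual on a wafer map: each die's value minus
    the median of its available 4-connected neighbors (NaN = untested)."""
    rows, cols = grid.shape
    res = np.full_like(grid, np.nan, dtype=float)
    for r in range(rows):
        for c in range(cols):
            if np.isnan(grid[r, c]):
                continue
            nbrs = [grid[rr, cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < rows and 0 <= cc < cols
                    and not np.isnan(grid[rr, cc])]
            if nbrs:
                res[r, c] = grid[r, c] - np.median(nbrs)
    return res

# A flat 3x3 region with one die reading high stands out in the residuals:
wafer = np.full((3, 3), 1.0)
wafer[1, 1] = 2.0
residuals = nnr_residuals(wafer)
```

Without the site-offset correction, a die tested on a high-reading site would show a spurious residual against neighbors tested on other sites; with it, only genuinely anomalous die stand out.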

Conclusion

With the continuing cost pressure driving parallel test of semiconductor devices comes the engineering work of creating a measurement environment in which every site is equivalent.

"The economic motivation for higher multi-site testing remains," said Teradyne's Seng. "As with recent generations, the same types of multi-site challenges exist, but they will continue to evolve to the next level of technical complexity. Most of the challenges are in the device interface domain, from the tester's device interface board (DIB) to the device connection. The best test systems handle all the other multi-site factors and can bring up high-multi-site test solutions quickly and easily."

Even so, not all engineers get to test their products on the best test systems. For products tested in parallel, they need to manage current products with the test equipment on the factory floor. That requires designing out as much site-to-site variation in the test process as possible, and responding to deviations tied to the realities of the factory floor. In addition, when product engineers use statistics-based pass/fail test limits, they need to account for the inherent differences between sites. Fortunately, test data can be used to discern the test hardware's and the DUT's contributions to those differences.

Executing tests in parallel reduces overall test cost. But the simplicity of a diagram showing four units under test belies the engineering work behind it.

Related stories:

- Geospatial outlier detection uses location to find defects on the wafer.
- Average testing of automotive IC parts is no longer good enough. Advanced node chips and packages require additional inspection, analysis, and time, all of which add cost.
- Chasing test escapes: data analytics in IC manufacturing can greatly improve reliability, but the cost trade-offs are complicated.
- Successful adaptive test: the need to improve quality at a reasonable cost is driving major changes in test processes.
