KEYNOTE PRESENTATION
Prognostics and Health Monitoring of Electronic Systems
Pradeep Lall(1), Ryan Lowe(1), Kai Goebel(2)
(1) Auburn University, Department of Mechanical Engineering, NSF Center for Advanced Vehicle and Extreme Environment Electronics (CAVE3), Auburn, AL 36849
(2) NASA Ames Research Center, Moffett Field, CA 94035
Tele: +1 (334) 844-3424
E-mail: [email protected]
2009a,b,Constable 1992,2001].
Abstract
Structural damage to BGA interconnects incurred during vibration testing has been monitored in the pre-failure space using resistance spectroscopy based state space vectors, rate of change of the state variable, and acceleration of the state variable. The technique is intended for condition monitoring in high reliability applications where the knowledge of impending failure is critical and the risks in terms of loss-of-functionality are too high to bear. Future state of the system has been estimated based on a second order Kalman Filter model and a Bayesian Framework. The measured state variable has been related to the underlying interconnect damage in the form of inelastic strain energy density. Performance of the prognostication health management algorithm during the vibration test has been quantified using performance evaluation metrics. The methodology has been demonstrated on leadfree area-array electronic assemblies subjected to vibration. Model predictions have been correlated with experimental data. The presented approach is applicable to functional systems where corner interconnects in area-array packages may be often redundant. Prognostic metrics including α-λ metric, sample standard deviation, mean square error, mean absolute percentage error, average bias, relative accuracy, and cumulative relative accuracy have been used to assess the performance of the damage proxies. The presented approach enables the estimation of residual life based on level of risk averseness.
1. Introduction
Avionics systems require ultra-high reliability to fulfill critical roles in autonomous aircraft control and navigation, flight path prediction and tracking, and self separation. Complex electrical power systems (EPS), which broadly comprise energy generation, energy storage, power distribution, and power management, have a major impact on the operational availability and reliability of electronic systems. Technology trends in evolution of avionics systems point towards more electric aircraft [Downes 2007] and the prevalent use of power semiconductor devices in future aircraft and space platforms. Advanced health management techniques for electrical power systems and avionics systems are required to meet the safety, reliability, maintainability, and supportability requirements of aeronautics and space systems. Current health management techniques in EPS and avionic systems provide very limited or no visibility into health of power electronics and packaging to predict impending failures [McCann 2005, Marko 1996, Schauz 1996, Shiroishi 1997].

Maintenance has evolved over the years from corrective maintenance to performing time-based preventive maintenance. Future improvements in reduction of system downtime require emphasis on early detection of degradation mechanisms. Incentive for development of prognostics and health management methodologies has been provided by need for reduction in operation and maintenance process costs [Jarrell 2002]. Advances in sensor technology and failure analysis have catalyzed a broadening of application scope for prognostication systems to include large electromechanical systems such as aircraft, helicopters, ships, power plants, and many industrial operations. Current PHM application areas include fatigue crack damage in mechanical structures such as those in aircraft [Munns 2000], surface ships [Baldwin 2002], civil infrastructure [Chang 2003], railway structures [Barke 2005] and power plants [Jarrell 2002].

Health management applications in high reliability electronics primarily focus on damage diagnosis involving built in self test (BIST) to monitor for failure [Steininger 1999, Harris 2002, Hashempour 2004, Suthar 2006]. Damage diagnosis typically focuses on reactive failure detection and provides limited to no insight into the system reliability and residual life. Previously, damage initiation, damage progression, and residual life in the pre-failure space have been correlated with micro structural damage based proxies and with feature vectors based on time, spectral and joint time-frequency characteristics of electronics [Lall 2004a-d, 2005a-b, 2006a-f, 2007a-e, 2008a-f]. Precise resistance measurements based on the resistance spectroscopy method have been used to monitor interconnects for damage and prognosticate failure [Lall 2009a,b, Constable 1992, 2001].

Kalman filtering is a recursive algorithm that estimates the true state of a system based on noisy measurements [Kalman 1960, Zarchan 2000]. Previously, the Kalman Filter has been used for navigation [Bar Shalom 2001], economic forecasting [Solomou 1998], and online system identification [Banyasz 1992]. Typical navigation examples include tracking [Herring 1974],
978-1-4577-0106-1/11/$26.00 ©2011 IEEE
2011 12th. Int. Conf. on Thermal, Mechanical and Multiphysics Simulation and Experiments in Microelectronics and Microsystems, EuroSimE 2011
ground navigation [Bevly 2007], altitude and heading reference [Hayward 1997], auto pilots [Gueler 1989], dynamic positioning [Balchen 1980], GPS/INS/IMU guidance [Kim 2003]. Application domains include GPS, missiles, satellites, aircraft, air traffic control, and ships. The ability of a Kalman filter to smooth noisy data measurements is utilized in gyros, accelerometers, radars, and odometers. Prognostication of failure using Kalman filtering has been demonstrated in steel bands and aircraft power generators [Batzel 2009, Swanson 2000, 2001]. Numerous applications in prognostics also exist for algorithms using more advanced filtering algorithms, known as particle filters. The state of charge of a battery was estimated and remaining useful life was predicted in [Saha 2009a,b]. Use of Kalman Filtering for prognostication of electronic reliability based on the underlying damage mechanics is new.

In this paper, a prognostic and health monitoring capability for electrical components based on changes in resistance has been presented. Failure modeling of BGA interconnects is combined with Kalman filtering for plastic strain state estimation and a Bayesian framework for PHM.

2. Test-Vehicle
Two test boards were used for experimental measurements. Test board A was manufactured in four different configurations. The boards have daisy-chained ceramic packages with 400 I/O each. Each test board has 8 packages on one side of the test board. All packages on a test-assembly have the same interconnect type. Interconnect technologies studied include copper reinforced solder column grid array (CCGA), eutectic tin lead solder (63Sn37Pb), high lead solder (Sn10Pb90) and SAC305 solder (Sn3Ag0.5Cu). Table 1 shows the package parameters for test board A. A representative board with each package numbered U1 - U8 is shown in Figure 1.

Figure 1: Test Board A

Table 1: Package Architectures for Test Board A
(one column per interconnect configuration; the rotated column headings are not legible in the source)
Parameter            Col. 1   Col. 2   Col. 3   Col. 4
Length (mm)          21       21       21       21
Width (mm)           21       21       21       21
Thickness (mm)       2.4      2.4      2.4      2.4
I/O                  400      400      400      400
Pitch (mm)           1        1        1        1
Ball Dia (mm)        0.6      0.6      0.6      0.6
Joint Height (mm)    2        0.6      0.6      0.6

Figure 2: Interconnect array configuration for Test Board B. Panel labels: 27 mm, 676 I/O PBGA; 15 mm, 196 I/O PBGA; 10 mm, 280 I/O Flex BGA; 10 mm, 144 I/O Tape Array; 7 mm, 84 I/O CABGA; 6 mm, 64 I/O Tape Array.

Figure 3: Test Board-B

Test Board-B includes package architectures such as plastic ball-grid arrays, chip-array ball-grid arrays, tape array ball-grid arrays, and flex-substrate ball-grid arrays (Figure 3). The experimental matrix has ball counts in the range of 64 to 676 I/O, pitch sizes are in the range of 0.5 mm to 1 mm, and package sizes are in the range of 6 mm to 27 mm (Table 2). The package parameters of this board are shown in Table 2. Representative samples of test board B are shown in Figure 3.
Table 2: Package Architectures used for Test Board B.
phase dependency has been eliminated by using a second phase-sensitive detector. The signal has been multiplied
Figure 14: Repeatability of phase shift measurement on [...]

Figure 15 shows the confidence value based on phase shift measurements of a package versus number of shock [...] input frequency. In general it has been noticed that the correlation between the degradation in confidence value and increase in resistance is better at higher input frequencies.

Figure 16: Phase shift as a lead indicator of failure (Test Board-B, PBGA676, 127 kHz). Axis: Time [Hours].

Figure 18: Raw resistance data. The data used as an input data vector is shown in the brackets. Axis: Time [Hrs].

The failure criteria for resistance change outlined in JESD22-B103 and IPC-SM-785 for the number, duration, and severity of intermittent events is used as the definition of failure. It should be noted that the smaller step increases of 0.05 Ω during the first 90 minutes of the test
are experimental noise which can be reproduced by motion of the system connections during shock and vibration. Resistance data from two hours after the initiation of the test until failure has been studied for the construction of the feature vector for identification of impending failure.

Figure 19: Zoomed view of resistance data between 2 hrs and failure

A subset of the resistance data has been used since field data will often involve electronic assemblies with accrued damage and not involve pristine assemblies. Figure 19 shows a zoomed view of the input data highlighting the experimental noise between two hours and failure. The experimental noise is due in part to the challenges with overcoming the variance in contact resistance in the presence of transient dynamic motion in shock or steady state vibration. Step changes in the resistance data can be seen at 2.8 and 4.9 hours respectively. However, the distinctive increase of about 25 mΩ during the vibration test is easily discernible even in the presence of experimental noise. Immediately before failure the resistance increase is approximately exponential in nature. The change in resistance is attributed to change in geometry, since the resistivity of the solder interconnect is expected to stay constant. Change in trace geometry is the basis of operation for traditional strain gages and can be explained in a cylindrical conductor by

$R = \rho L / A$,

where $R$ is the resistance of the conductor, $\rho$ is the material property resistivity, $L$ is length and $A$ is the cross sectional area. By logarithmically differentiating both sides, and assuming linear elastic properties, a relation between strain and resistance can be derived as

$dR = R_0 \varepsilon_a (1 + 2\nu)$,

where $dR$ is the change in resistance, $R_0$ is the initial resistance, $\varepsilon_a$ is the elastic axial strain and $\nu$ is the Poisson ratio. Since the material properties and geometry of a solder ball are non-linear, a finite element simulation was used to map the change in resistance of an interconnect to the state of plastic strain that the interconnect was experiencing.

Table 4 shows the dimensional parameters for the undeformed geometry of a typical solder ball based on the manufacturer's data sheet. Previous studies have shown that tensile stress in the out-of-plane z-direction is the primary stress in the solder interconnects during the shock test [Darveaux 2006, Chong 2006]. The solder interconnect deformation during the shock test was simulated using non-linear finite elements by constraining the solder interconnect along the bottom of the joint, and applying a displacement load on the top (Figure 20).

Table 3: Anand's Constants for SAC305
Constant   Value
S0         45.9 MPa
Q/k        7460 1/K
A          5.87e6 1/sec
ξ          2
m          0.0942
h0         9350 MPa
n          0.015
a          1.5
ŝ          58.3 MPa

Table 4: Undeformed geometry of solder ball
Parameter                                     Specification
Solder ball diameter (mm)                     0.63
Solder ball land (mm, board and package)      0.45
Solder ball height after reflow (mm)          0.48

Figure 20: Constraints on solder ball for FEM simulation

Figure 21: Meshed model of solder ball

The simulation was implemented in ANSYS™ Version 12 using Anand's Viscoplasticity and VISCO107 elements. The Anand's constants used for the simulation are shown in Table 3. Resistance of the solder interconnect was computed by converting the VISCO107 elements to SOLID5 elements after intermediate steps in the deformation. A steady state conductance simulation was run using the deformed
geometry after each sub-step. Using the built-in macro command GMATRIX, the conductance of the solder ball in the deformed state could be calculated. The conductance is the inverse of the resistance. The meshed geometry before deformation can be seen in Figure 21, while the deformed geometry can be seen in Figure 22. Deformation was applied to the solder joint at a specified strain rate of 1 sec⁻¹, typical of a shock test. An example of this mapping is shown in Figure 23.
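The linear-elastic relation $dR = R_0 \varepsilon_a (1 + 2\nu)$ used earlier in this section follows from $R = \rho L / A$ in a few steps. The derivation below is a sketch under the stated assumptions (constant resistivity, a uniform circular cross section of radius $r$, small elastic strain); the radius $r$ is a symbol introduced here for illustration and does not appear in the original text.

```latex
\begin{align}
R &= \frac{\rho L}{A}
  &&\Rightarrow\quad \ln R = \ln\rho + \ln L - \ln A \\
\frac{dR}{R} &= \frac{d\rho}{\rho} + \frac{dL}{L} - \frac{dA}{A} \\
\frac{d\rho}{\rho} &= 0, \qquad
\frac{dL}{L} = \varepsilon_a, \qquad
\frac{dA}{A} = 2\,\frac{dr}{r} = -2\nu\,\varepsilon_a
  && (A = \pi r^2,\ dr/r = -\nu\,\varepsilon_a) \\
\frac{dR}{R_0} &= \varepsilon_a + 2\nu\,\varepsilon_a
  &&\Rightarrow\quad dR = R_0\,\varepsilon_a\,(1 + 2\nu)
\end{align}
```

The relation holds only while the joint behaves elastically, which is why the text turns to a non-linear finite element mapping for the plastic regime.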
account for changes in resistance of the entire package. This was achieved by assuming that every interconnect feels the same strain. Therefore the critical resistance is multiplied by the number of I/O in the package, i.e. 676 for the PBGA 676, to obtain the overall critical resistance value (676 × 5×10⁻⁵ Ω = 3.38×10⁻² Ω).
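The package-level threshold above is a simple scaling of the per-interconnect value. As a minimal sketch (the function and argument names are illustrative and not from the paper), the arithmetic can be reproduced as:

```python
def package_critical_resistance(per_joint_ohm: float, io_count: int) -> float:
    """Scale a per-interconnect critical resistance change to a whole package.

    Assumes, as the text does, that every interconnect in the daisy chain
    experiences the same strain, so the individual changes simply add up.
    """
    return per_joint_ohm * io_count


# Values quoted in the text: 5e-5 ohm per joint, 676 I/O (PBGA 676).
critical = package_critical_resistance(5e-5, 676)
print(f"Package critical resistance change: {critical:.3e} ohm")
```

The equal-strain assumption is a simplification; corner joints typically see more strain than interior joints, so this scaling should be read as the paper's stated approximation, not a general rule.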
Figure 22: Deformed and undeformed geometry of solder ball

7. PHM Framework
The strain-resistance relationships have been used to correlate the measured feature vector with the underlying damage state of the system. Feature vectors monitoring system damage have been constructed based on the sensor output (Figure 24). The feature vector is an input into the PHM algorithms.

Figure 24: Flowchart for PHM framework (Feature Vector → PHM Algorithm)
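The feature vector described above feeds the PHM algorithm. As a rough illustration of how such a vector (state variable, its rate of change, and its acceleration) could be estimated from noisy resistance readings, the sketch below runs a three-state Kalman filter with a constant-acceleration model. The transition matrix, the diagonal process noise, the initial covariance, and all names are assumptions made for this example; the paper's own matrices and tuning are not reproduced here.

```python
def matmul(a, b):
    """Multiply two small matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kalman_feature_vectors(measurements, dt, meas_var, process_var):
    """Return one [value, rate, acceleration] estimate per measurement.

    Assumed model: constant-acceleration kinematics, scalar measurement of
    the first state only, and process noise process_var on each diagonal term.
    """
    F = [[1.0, dt, 0.5 * dt * dt],
         [0.0, 1.0, dt],
         [0.0, 0.0, 1.0]]
    Ft = [list(col) for col in zip(*F)]
    x = [float(measurements[0]), 0.0, 0.0]
    # Assumed initial covariance: trust the first reading, not the derivatives.
    P = [[meas_var, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    estimates = [list(x)]
    for z in measurements[1:]:
        # Predict: propagate the state and covariance one step ahead.
        x = [sum(F[i][k] * x[k] for k in range(3)) for i in range(3)]
        P = matmul(matmul(F, P), Ft)
        for i in range(3):
            P[i][i] += process_var
        # Update: only the first state is measured, so H = [1, 0, 0].
        innovation = z - x[0]
        S = P[0][0] + meas_var
        K = [P[i][0] / S for i in range(3)]
        x = [x[i] + K[i] * innovation for i in range(3)]
        P = [[P[i][j] - K[i] * P[0][j] for j in range(3)] for i in range(3)]
        estimates.append(list(x))
    return estimates


# Hypothetical resistance readings (ohm) sampled every 0.1 h.
readings = [3.100, 3.101, 3.101, 3.103, 3.104, 3.106, 3.109]
for value, rate, accel in kalman_feature_vectors(readings, 0.1, 1e-6, 1e-6):
    print(f"R={value:.4f}  dR/dt={rate:.4f}  d2R/dt2={accel:.4f}")
```

The noise variances here are placeholders; in practice they are the tunable parameters whose selection strongly affects how quickly the rate and acceleration estimates respond.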
"E
Figure 45: Variation in the sum of beta calculation for variations in the tunable process noise parameter (horizontal axis: Scale Factor)
[...] decisions. In the Bayesian framework used in this paper [...]

The sensitivity study shows that underestimating the critical value of state variable can severely hurt the performance of the PHM algorithm. A physics-based understanding of the degradation mechanism and its relationship to system performance is critical for implementation of the PHM algorithm. Cumulative beta of the process is less sensitive to process noise and therefore was varied over a number of orders of magnitude. An incorrect selection of either critical threshold for state variable or the process noise will have adverse effect on the performance of the PHM algorithm.

12. Risk-Based Decision Making
The practical result of predicting RUL is that critical decisions about future use and replacement of a component can be justified using statistics. In an ultra high reliability system, a critical decision is whether to replace a component. Assume that it takes 1 hour to order and receive a replacement component from the warehouse. Given the level of mission criticality of the application, the maximum acceptable probability of failure may be allowed at no higher than 1%. The following calculation can be made to determine when to order the replacement part and schedule downtime for maintenance.

$t_{order} = RUL_{prediction} - 2.576\,\sigma_{RUL} - t_{leadtime}$    (21)

where $\sigma_{RUL}$ is the standard deviation of the remaining useful life, and $t_{leadtime}$ is the lead time for receiving the component after placement of the order. This equation implemented on the data for the vibration test is shown in Figure 46. The 2.576 parameter is the critical value of the test statistic for 99% confidence (two-sided, normal approximation).

Figure 46: Time to order replacement component calculation vs time

13. Summary and Conclusions
A framework for prognostication of area-array electronics has been developed based on state-space vectors from resistance spectroscopy measurements, Kalman Filtering and Bayesian PHM framework. The measured state variable has been related to the underlying damage state by correlating the resistance change to the plastic strain accrued in interconnects using non-linear finite element analysis. The strain-resistance relationship has been used to define the critical resistance failure threshold for the component. The Kalman filter was used to estimate the state variable, rate of change of the state variable, and acceleration of the state variable, and to construct a feature vector. The estimated state-space parameters were used to extrapolate the feature vector into the future and predict the time-to-failure at which the feature vector will cross the failure threshold.
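The threshold-crossing extrapolation and the ordering rule of Equation (21) can be sketched together. The code below is a minimal illustration, assuming the state is extrapolated with a constant-acceleration (quadratic) model; the function names, the sample state values, and the RUL standard deviation are hypothetical and chosen only to exercise the formulas.

```python
import math


def time_to_threshold(value, rate, accel, threshold):
    """Earliest t >= 0 with value + rate*t + 0.5*accel*t**2 == threshold.

    Returns None when the extrapolated state never reaches the threshold.
    """
    if value >= threshold:
        return 0.0
    a, b, c = 0.5 * accel, rate, value - threshold
    if abs(a) < 1e-15:
        # Linear extrapolation: only a positive rate ever reaches the threshold.
        return -c / b if b > 0 else None
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None
    roots = [(-b + sign * math.sqrt(disc)) / (2.0 * a) for sign in (1.0, -1.0)]
    positive = [t for t in roots if t > 0]
    return min(positive) if positive else None


def time_to_order(rul_prediction, sigma_rul, lead_time, z=2.576):
    """Equation (21): order time = predicted RUL - z * sigma_RUL - lead time."""
    return rul_prediction - z * sigma_rul - lead_time


# Hypothetical state estimate (ohm, ohm/h, ohm/h^2) and failure threshold (ohm).
rul = time_to_threshold(value=3.105, rate=0.002, accel=0.004, threshold=3.1338)
if rul is not None:
    # sigma_rul = 0.3 h is an assumed spread; the lead time of 1 h is from the text.
    print(f"RUL = {rul:.2f} h, order in {time_to_order(rul, 0.3, 1.0):.2f} h")
```

A non-positive order time means the part should already have been ordered; a less risk-averse application could pass a smaller `z`.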