Equipment Reliability Institute - your reliability newsletter
November 2000
Hello, readers! We hope you will enjoy reading this Fall 2000 issue of
our Reliability Quarterly. Well, it's supposed to be quarterly.
For various reasons we've missed a few issues. We want to introduce
two contributors to this one: Dan Conine of Sheboygan, Wisconsin
and Kirk Gray of Louisville (pronounced lewisville not looieville),
Colorado. Brief author biographies are posted at the ends of the
articles.
Wayne Tustin
*******************************
Doing Good Work
by Dan Conine
In Zen and
the Art of Motorcycle Maintenance, the author talks extensively
about quality, and how to determine its attributes; how to know
quality in the world around us, and what parts of life are considered
quality of life. In recent years, manufacturers around the world
have been actively pursuing quality programs in order to lay claim
to building quality products. These programs come under the guise
of 'zero-defect', 'six-sigma', 'ISO-XXXX', etc. The goal of all
of these programs has been to improve the reliability of products.
This reliability is usually pre-defined as some finite number of
measurements which can be made on the product, thus quantifying
the output of a particular production line as based on the design
intent.
What this means
is: you can make a product that is totally useless crap, as long
as it is consistently useless and crappy. The entire 'quality' myth
is based on the assumption that quality is quantifiable in a finite
number of steps. This may be true, but good work is an infinite
process which cannot end. By the same token, it is infinitely simpler
than a quality program. Good work cannot be completely measured.
It lies in the heart of the lover of that particular product. Not
in the mind of an inspector following a blueprint, but in the soul
of the user.
What makes a
Stradivarius? Not quality, but good work. It cannot be measured
except by the ears of a lover of the music, and only when played
by a lover of that same music. It is action to be lived in, not
looked at. The commonest example of good work in modern times is
that of the software programmers who live in their work for days at a time.
The world tries to quantify and compare one software program to
another by counting bugs, or tallying accounts, but the good work
is done in the wee hours of the night, and it is appreciated by
a dedicated user who brings the code to life, knowing he is performing
feats which may be remembered not by posterity, but by his own soul.
The same relationship
exists in many places, but fewer and farther between than ever in
our history. This relationship between producer and user is the
basis of good work. It can easily be confused with 'Flow', that
feeling by a single person that they are part of the bigger picture,
and fit well with it. This can also be mixed with 'Inspiration',
sometimes felt as 'divine'. But these things fail to reach the level
of good work because they fail the relationship between producer
and user. In good work, the user may even be the same person that
produced the work, but the relationship is still there. The producer
of the work is compelled to embellish it with details: cleaning
all the cracks and crevices, removing sharp edges, streamlining
the code, pulling the weeds. The user cannot use the product without
appreciating the usefulness of the thing: the reliability of a
program, the noiselessness of an automobile, the responsiveness
of a control, the cleanliness of the food.
The failing
of a quality program to achieve good work lies in the inherent design
of any quality program. A quality program is designed to allow the
production of things by replaceable 'units' without the knowledge
and skill required of a good work producer. Quality is, at most,
reduced to 'acceptable by the average user', and at least, to 'tolerable
by some users'. In the case of Microsoft, it is reduced to 'sometimes
useable by disgruntled, enslaved users'.
This reduction
to 'acceptability levels' has permeated our society, not only in
the production of things, but also in education, food, transportation,
and philosophy. We no longer seek out great thinkers who expand
the meaning of words and their uses, but rather, we only accept
works that are 'peer reviewed and published'. We no longer grow
our own vegetables and seek out the wild fruits on fencerows; we
expect any and all of the sweetest, purest, insect-free foods to
be in the supermarket. We no longer buy or make tools that feel
good in our hands; now they must be safety-certified and mass-produced
to fit 3 sigma of the population that is interested in a hobby.
Good work is
not taught in schools. To the contrary: if an engineer has a tendency
toward perfection in a product, they are criticized for delaying
a production schedule and 'over-engineering' something. The goal
is always to make a quick profit in order to continue on in business.
Pursuit of good work in a product is not even thought of as a goal
in itself. Quality is looked at from the profitability/ROI point
of view: How much quality vs the cost of implementing the program?
If you want the best motorcycle, you find a lover of motorcycles
and ask them what should be done. A survey of motorcycle users yields
too many compromises in size, price, and features. A motorcycle lover
who smooths out the sharp edges, fits pieces to exacting tolerance,
and takes care of rattles and leaks knows where to find the problems
that a loving user will appreciate being absent.
The high cost
of this investment is mainly in two areas: ego and time. The inventor
of a thing must put aside the notion that he can know all about
something, and accept criticism. The production line must slow down
enough to allow the producers of the product to get to know each
part. Just as a good doctor knows each patient grows differently,
each part on a machine, or program, or garden must be seen in its
own light. Defects only show up the same way twice if they are designed
that way, and if they are designed to be that way, a quality program
will not look for them.
Good work is
the intentional complication of a task. Quality measurement and
control is the intentional simplification and specialization of
the same task. To produce a good chair, you carefully cut, sand,
and polish it until you are done finding things to sand and polish.
To produce a quality chair, you set up a system which requires as few
cuts as possible, the least amount of sanding, and a finite number
of steps to completion. The quality chair may be used, accepted,
and may even last a finite period of time. The good chair will be
loved, traded, gifted, repaired, and eventually worn into kindling.
Dan Conine wrote "Doing Good Work", earlier in this issue. Dan's firm,
Product Discovery, is marketing
and licensing the patents of its founder, Gregory R. Brotz. See
www.invedyne.com on the Web.
You can e-mail Dan at dan@productdiscovery.com.
*******************************
Electronics Testing into the 21st Century:
Success in Test Is in Capabilities, Not Specifications
by Kirk Gray and Wayne Tustin
Development
of electronics, with increasingly shorter market windows, and the
rapid pace of electronics invasion into almost every appliance and
machine manufactured, are requiring that electronics be reliable
and mature at market introduction. There may not be enough time
in the market to improve a poor design. Your customers are not willing
to take another risk by purchasing another electronic
product from you, and they will probably tell others about your
product's poor quality. Yet, today there are extremely efficient
and cost-effective methods to prevent field failures. These methods
have been around for at least fifteen years. The evidence of their
effectiveness is overwhelming, yet unpublished, precisely because they work so well.
Would you publish methods that reduced your field failures by a
factor of ten, letting all your competitors in on it? Probably not,
and that's why the electronics testing community is still very reluctant
to accept accelerated stress testing and screening as a standard
approach to reliability improvement.
The methods
that we are referring to are called Highly Accelerated Life Test
(HALT) and Highly Accelerated Stress Screening (HASS). These are
methods of testing that take a fundamentally different approach.
The difference is in finding the actual, not specified, operating
and destruct limits and then driving those margins to the fundamental
limit of the technology. The fundamental limit of the technology
is the point at which the product margins cannot be extended without
the use of exotic materials or methods. An example is the melting
point of wire insulation or component packaging. Most electronic
components are heated to well over 175°C for solder reflow
and can withstand temperatures much higher. Yet, most testing occurs
at temperatures around operational specifications based on the end-use
environment. Materials in electronics are very capable of operation
in environments (typically -40°C to +110°C) that are well beyond
most electronic operating specifications (typically 0°C to +55°C).
The component and design weak links that prevent large operating
margins are the same weak links that will have
a very significant effect on field reliability. Only by taking the
products to the limits through increasing steps of stress, investigating
the root cause and understanding the physics of failure, then improving
that weak link, can you make a robust system in the shortest possible
time. A robust system will be capable of withstanding short, effective, intensive,
but safe, environmental screens using combined environments while
being powered and monitored.
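The step-stress idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a HALT procedure: the function name step_stress, the stand-in unit fake_unit, the 52°C failure point, the 10°C step and the 35°C specification are all hypothetical values chosen for the example.

```python
from typing import Callable, Optional


def step_stress(passes_at: Callable[[float], bool],
                start: float, step: float, ceiling: float) -> Optional[float]:
    """Raise the stress level in fixed steps until the unit fails its
    functional test. Returns the highest level that still passed, or
    None if the unit failed at the starting level. The search stops at
    `ceiling` (for example, the chamber limit)."""
    last_pass = None
    level = start
    while level <= ceiling:
        if not passes_at(level):
            break
        last_pass = level
        level += step
    return last_pass


# Hypothetical stand-in for a powered, monitored unit: it stops
# operating somewhere above 52 degC (made-up number).
def fake_unit(temp_c: float) -> bool:
    return temp_c <= 52.0


spec_max_c = 35.0  # hypothetical upper operating specification
upper_limit_c = step_stress(fake_unit, start=20.0, step=10.0, ceiling=120.0)
margin_c = upper_limit_c - spec_max_c
print(f"highest passing level: {upper_limit_c} degC, "
      f"margin over specification: {margin_c} degC")
```

In practice each failure found this way would be investigated for root cause and corrected before stepping further; the loop only finds where the current weak link sits relative to the specification.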
Electronics
was, for many years, fragile. Glass tubes and filaments also had inherent
wear-out modes that gave the electronics a limited life. Early solid
state devices had mechanisms that would also cause failures in time,
such as chemical contamination, metallization defects, and packaging
defects that resulted in corrosion and delamination. A large percentage
of these defects were accelerated by high temperature, giving rise
to the successful use of "burn-in" to weed out "infant mortality".
Statistical
prediction in the 1960s and '70s was accepted because designs
at that time were made of mostly discrete components, and statistical
estimates of the life of a new design had a reasonable correlation
to the actual MTBF. This was most likely due to the relatively
small number of devices, manufacturers, and manufacturing techniques.
Today, hundreds of new electronic components are introduced to the
market every week, and at the same time hundreds are being taken
off the market. It is no longer possible or reasonable to even attempt
statistical estimates of reliability based on a summation of component
reliabilities, even if accurate data on current components were available.
Such data would be virtually impossible to obtain, and useless anyway, because
of the variation in the huge number of applications and end-use conditions.
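For readers who have not seen one, a summation-style prediction usually looks something like the sketch below. It assumes a simple series model with constant failure rates, which is our assumption and not something stated above, and every part type, quantity and failure rate in it is invented. It is shown only to make concrete what kind of estimate is being criticized.

```python
# Parts-count style estimate: add up constant failure rates, then take
# the reciprocal to get a predicted MTBF. All values are hypothetical.
# FIT = failures per 1e9 device-hours.
parts = {
    # part type: (quantity, failure rate per part in FIT)
    "resistor": (100, 1.0),
    "capacitor": (50, 2.0),
    "integrated circuit": (10, 30.0),
}

total_fit = sum(qty * fit for qty, fit in parts.values())
predicted_mtbf_hours = 1e9 / total_fit

print(f"total failure rate: {total_fit} FIT")
print(f"predicted MTBF: {predicted_mtbf_hours:,.0f} hours")
```

The result is only as good as the per-part failure rates fed into it, which is exactly the data the paragraph above says cannot be obtained for today's components.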
Today's components
do not have wear-out modes that fall within most electronics' technologically
useful life. Therefore, the vast majority, if not all, of electronics
failures are due to defects, either in the design or introduced
in manufacturing. The most significant effects on reliability are
caused by unplanned events during manufacture that lower
the operating margins. It can be an EC, a change in machine operators,
or a change in your vendor's manufacturing capabilities that introduces
a decrease in margins and a resulting increase in field failures. Engineers
must quit wasting time and resources trying to statistically calculate
estimates of reliability. The future is unpredictable and in electronics,
predictive reliability cannot be done to any accuracy that would
be beneficial to a designer. Only through discovering the real capabilities
and the root causes of the weak links in the design or manufacturing
process, and improving them, is significant improvement in reliability
realized.
The relation
between field stresses and the inherent strength of a product is illustrated
in Fig. 1. Variation in the manufactured product is generally much smaller
than variation in the end-use environment. Failures occur when the weakest units
are subjected to the highest stresses, as shown in Fig. 2.
Fig. 1
Fig. 2
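The overlap between the stress and strength distributions can be put into numbers with a small Python sketch. It assumes both distributions are normal and independent, which is our simplification for illustration (real field stresses often are not normal), and all means and standard deviations are invented. The function name failure_probability is ours.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def failure_probability(stress_mean: float, stress_sd: float,
                        strength_mean: float, strength_sd: float) -> float:
    """Probability that a randomly chosen unit meets a stress greater
    than its strength, assuming independent normal distributions."""
    margin_mean = strength_mean - stress_mean
    margin_sd = math.hypot(stress_sd, strength_sd)
    return normal_cdf(-margin_mean / margin_sd)


# Hypothetical numbers in degC: a wide end-use stress distribution and
# a narrower manufactured-strength distribution.
p_weak = failure_probability(40.0, 15.0, strength_mean=50.0, strength_sd=5.0)
p_robust = failure_probability(40.0, 15.0, strength_mean=90.0, strength_sd=5.0)

print(f"small margin: P(failure) = {p_weak:.4f}")
print(f"large margin: P(failure) = {p_robust:.6f}")
```

Moving the strength distribution away from the stress distribution shrinks the overlap sharply, which is the argument for driving margins well beyond the specification.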
It is important
to remember that when an electronic product is manufactured in volume,
there is a distribution of its inherent strength around its original
designed strength. The end-use environment is even more uncontrolled
and has much wider distributions in most cases. No matter how you
specify the end-use conditions, your customers will push those limits.
With a robust design, the product can better survive these
extremes. The difference in cost between a robust, well-centered
design and one with design weaknesses is usually very small, if
any. Changing the orientation or location of a component, or using
a more capable component, is very cost-effective in the design phase.
It can even be very cost-effective after the product has been in
production for some time. An example is the case of one AcceleRel
Engineering client that we'll call Company A. Company A had a product
with a 5% annualized field return rate. For reliability testing
they had an elevated-temperature (90°F) burn-in room. The product
was powered, but not monitored, for 24 hours. HALT testing found
that the operating limit, which was also the destruct limit, was as low as
15°C above the design specification (35°C). The limit was caused
by one failing component (two per unit). By replacing the component
with the same type of component with a higher current capability, the
operating and destruct limit was moved to 90°C. With the new operating
margins, a short HASS process, lasting one hour for two units, was
developed. The product was powered and monitored while applying
10 gRMS (200 - 2 kHz) of random multi-axis vibration and four rapid
thermal transitions of -30°C to +70°C at a rate of 60°C/minute
(measured on the product). The field return rate dropped to 0.5%.
To no one's surprise, Company A has made this process company confidential,
and has implemented it across almost all products in production.
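The Company A figures can be worked through as simple arithmetic. The sketch below uses only numbers quoted in the paragraph above; the variable names are ours, and the ramp-time figure covers the four thermal transitions only, since dwell times and the rest of the one-hour screen are not given.

```python
# Margins before and after the component change (degC).
spec_c = 35.0
old_limit_c = spec_c + 15.0      # limit found by HALT: 15 degC above spec
new_limit_c = 90.0               # limit after the component change
old_margin_c = old_limit_c - spec_c
new_margin_c = new_limit_c - spec_c

# Time spent ramping during the HASS thermal transitions.
low_c, high_c = -30.0, 70.0
ramp_rate_c_per_min = 60.0
transitions = 4
ramp_minutes = transitions * (high_c - low_c) / ramp_rate_c_per_min

# Field return rate before and after (percent per year).
return_before_pct = 5.0
return_after_pct = 0.5
improvement_factor = return_before_pct / return_after_pct

print(f"margin over spec: {old_margin_c} degC -> {new_margin_c} degC")
print(f"ramp time for {transitions} transitions: {ramp_minutes:.1f} minutes")
print(f"field returns reduced by a factor of {improvement_factor:.0f}")
```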
Another example
of how easy it can be to increase an operating margin was again
with client Company A. The UUT had an operating limit of 60°C, traced to
the small +15 VDC auxiliary power supply inside an RF power supply.
The limiting component was found to be a small regulating diode
located next to a heat sink, but not touching it. By bending
the component over to make contact with the heat sink, the operating limit
was raised by 30°C to 90°C. Large margin improvements can sometimes be made
by just repositioning a component. This would not have
been found if the operating limit had not been discovered through
step stress. The product had easily passed its operating specifications
when originally designed.
Starting and
continuing a HALT and HASS program is a major commitment of resources
for any company, but the ROI in reduced warranty costs, re-design,
re-work, and lost sales is tremendous. Developing a robust design
using HALT, even if screening cannot be implemented, is an extremely
valuable way to reach design maturity rapidly, and HALT should be
a standard evaluation for all new designs. HASS is a process to
precipitate and detect defects and shifts in margins that end up
being field failures.
It is important
to realize that the concept of testing to find limits,
basing the testing on actual material capabilities, not specifications,
and extending those capabilities to the best possible with current
technology is really very simple. HALT will not find each and every
defect, but it will find more than 95% of those that cause field failures.
Convincing designers that margins should be improved beyond what was originally
specified is a difficult task for those who are not experienced
in this new approach. Once the benefits are demonstrated, companies
readily embrace the HALT and HASS processes. The hard part is taking that first
step of finding the limits and following through with changes to
improve them. Designers and companies that have already discovered
that failures above specifications are relevant to field reliability,
and that spend the time and effort to improve the margins, will be the
most successful at meeting the reliability expectations of customers
for the 21st Century.
Kirk Gray
has over 21 years in the electronics manufacturing industry, including
the last 11 years in the application of HALT and HASS processes.
Mr. Gray began his career in electronics at the semiconductor, thin-film
processing level and continued down the electronics assembly
path to reliability testing. He discovered the speed and other benefits
of HALT and HASS processes for reliability when he started a HALT
and HASS team as ESS Process Engineering Manager at StorageTek in
1989. Kirk is Vice Chairman of the Denver Chapter of the IEEE Reliability
Society, Chairman of the IEEE/CPMT Technical Committee 7 on Reliability,
and Registration Chairman for the annual IEEE/CPMT Workshops on
Accelerated Stress Testing (AST) held in the fall each year. If
you would like to contact him, please send an email to gray@equipment-reliability.com.
Wayne
Tustin's vibration and shock teaching schedule is posted
at http://www.equipment-reliability.com.
Right now he is "studying up" on the vibration and shock measurement
and testing issues pertaining to computer hard drives, preparing
for a private course. Have you heard about intense noise developmental
testing and production screening of printed wiring boards? It's
an alternate method (to electrodynamic shakers and to pneumatic
repetitive-shock machines) for flexing PWBs during HALT, HASS, ESS,
etc. You can send him an email at tustin@equipment-reliability.com
if you want to learn more. Or you can phone him at 805/564-1260.
He would like to hear from you.

Participate at ERI News
You are invited to send news of reliability-oriented events for inclusion
in ERI's newsletter. Please send an email to the webmaster.

Vibration and Shock courses coming up
Wayne Tustin will teach the following short courses in vibration
and shock measurement, analysis, calibration, testing, HALT, ESS
and HASS:
Society of Automotive Engineers
Portland, OR: December 4-6, 2000
Troy, MI: April 18-20, 2001
*******
Applied Technology Institute
College Park, MD: April 9-12, 2001 (get more information from Wayne Tustin)
*******
ERI classes
Huntsville, Alabama: February 20-22, 2001
Hillsboro (Portland), Oregon: March 20-22, 2001
In addition, Wayne will present a super-concentrated 1-day version
at Grand Rapids, MI, March 27, 2001.
Details are available from Vibration Research, phone 616-669-3028,
or send an email to john@vibrationresearch.com

Announcements
For sale: Alpha-M Corporation in Dallas, TX, manufacturer of small electrodynamic
shaker systems and related hardware. Owner Bill Crowley is considering
retiring. Call Bill at 972/406-0424 or FAX him at 972/247-0651.

Contact information
ERI - Equipment Reliability Institute
1520 Santa Rosa Av.
Santa Barbara - CA - 93109
Tel/Fax: (805) 564-1260
Wayne Tustin: tustin@equipment-reliability.com
Webmaster: webmaster@equipment-reliability.com

ERI News is sent in both html and plain text formats. If you have any
problems reading this newsletter, please let us know. Send an email
to the webmaster, reporting your difficulties.
If you do not want to receive ERI's quarterly newsletter, please send
a reply to this message with "remove" as subject.