Equipment Reliability Institute - your reliability newsletter
November, 2000

Hello, readers! We hope you will enjoy reading this Fall 2000 issue of our Reliability Quarterly. Well, it's supposed to be quarterly. For various reasons we've missed a few issues. We want to introduce two contributors to this one: Dan Conine of Sheboygan, Wisconsin, and Kirk Gray of Louisville (pronounced "lewisville", not "looieville"), Colorado. Brief author biographies are posted at the ends of the articles.
Wayne Tustin

*******************************

Doing Good Work
by Dan Conine

In Zen and the Art of Motorcycle Maintenance, the author talks extensively about quality, and how to determine its attributes; how to know quality in the world around us, and what parts of life are considered quality of life. In recent years, manufacturers around the world have been actively pursuing quality programs in order to lay claim to building quality products. These programs come under the guise of 'zero-defect', 'six-sigma', 'ISO-XXXX', etc. The goal of all of these programs has been to improve the reliability of products. This reliability is usually pre-defined as some finite number of measurements which can be made on the product, thus quantifying the output of a particular production line as based on the design intent.

What this means is: you can make a product that is totally useless crap, as long as it is consistently useless and crappy. The entire 'quality' myth is based on the assumption that quality is quantifiable in a finite number of steps. This may be true, but good work is an infinite process which cannot end. By the same token, it is infinitely simpler than a quality program. Good work cannot be completely measured. It lies in the heart of the lover of that particular product. Not in the mind of an inspector following a blueprint, but in the soul of the user.

What makes a Stradivarius? Not quality, but good work. It cannot be measured except by the ears of a lover of the music, and only when played by a lover of that same music. It is action to be lived in, not looked at. The commonest example of good work in modern times is the software programmers who live in their work for days at a time. The world tries to quantify and compare one software program to another by counting bugs, or tallying accounts, but the good work is done in the wee hours of the night, and it is appreciated by a dedicated user who brings the code to life, knowing he is performing feats which may be remembered not by posterity, but by his own soul.

The same relationship exists in many places, but fewer and farther between than ever in our history. This relationship between producer and user is the basis of good work. It can easily be confused with 'Flow', that feeling by a single person that they are part of the bigger picture, and fit well with it. This can also be mixed with 'Inspiration', sometimes felt as 'divine'. But these things fail to reach the level of good work because they fail the relationship between producer and user. In good work, the user may even be the same person that produced the work, but the relationship is still there. The producer of the work is compelled to embellish it with details: cleaning all the cracks and crevices, removing sharp edges, streamlining the code, pulling the weeds. The user cannot use the product without appreciating the usefulness of the thing: the reliability of a program, the noiselessness of an automobile, the responsiveness of a control, the cleanliness of the food.

The failing of a quality program to achieve good work lies in the inherent design of any quality program. A quality program is designed to allow the production of things by replaceable 'units' without the knowledge and skill required of a good work producer. Quality is, at most, reduced to 'acceptable by the average user', and at least, to 'tolerable by some users'. In the case of Microsoft, it is reduced to 'sometimes useable by disgruntled, enslaved users'.

This reduction to 'acceptability levels' has permeated our society, not only in the production of things, but also in education, food, transportation, and philosophy. We no longer seek out great thinkers who expand the meaning of words and their uses, but rather, we only accept works that are 'peer reviewed and published'. We no longer grow our own vegetables and seek out the wild fruits on fencerows; we expect any and all of the sweetest, purest, insect-free foods to be in the supermarket. We no longer buy or make tools that feel good in our hands; now they must be safety-certified and mass-produced to fit 3 sigma of the population that is interested in a hobby.

Good work is not taught in schools. To the contrary: if an engineer has a tendency toward perfection in a product, they are criticized for delaying a production schedule and 'over-engineering' something. The goal is always to make a quick profit in order to continue on in business. Pursuit of good work in a product is not even thought of as a goal in itself. Quality is looked at from the profitability/ROI point of view: How much quality vs the cost of implementing the program? If you want the best motorcycle, you find a lover of motorcycles and ask them what should be done. A survey of motorcycle users yields too many compromises in size, price, and features. A motorcycle lover who smooths out the sharp edges, fits pieces to exacting tolerance, and takes care of rattles and leaks knows where to find the problems that a loving user will appreciate being absent.

The high cost of this investment is mainly in two areas: ego and time. The inventor of a thing must put aside the notion that he can know all about something, and accept criticism. The production line must slow down enough to allow the producers of the product to get to know each part. Just as a good doctor knows each patient grows differently, each part on a machine, or program, or garden must be seen in its own light. Defects only show up the same way twice if they are designed that way, and if they are designed to be that way, a quality program will not look for them.

Good work is the intentional complication of a task. Quality measurement and control is the intentional simplification and specialization of the same task. To produce a good chair, you carefully cut, sand, and polish it until you are done finding things to sand and polish. To produce a quality chair, you set up a system which makes as few cuts as possible, requires the least sanding, and reaches completion in a finite number of steps. The quality chair may be used, accepted, and may even last a finite period of time. The good chair will be loved, traded, gifted, repaired, and eventually worn into kindling.

Dan Conine wrote "Doing Good Work", above. Dan's firm, Product Discovery, is marketing and licensing the patents of its founder, Gregory R. Brotz. See www.invedyne.com on the Web. You can e-mail Dan at dan@productdiscovery.com.

*******************************

Electronics Testing into the 21st Century:
Success in Test Is in Capabilities, Not Specifications

by Kirk Gray and Wayne Tustin

Increasingly short market windows, and the rapid spread of electronics into almost every appliance and machine manufactured, require that electronics be reliable and mature at market introduction. There may not be enough time in the market to improve a poor design. Your customers are not willing to risk purchasing another electronic product from you, and they will probably tell others about your product's poor quality. Yet today there are extremely efficient and cost-effective methods to prevent field failures. These methods have been around for at least fifteen years. The evidence of their effectiveness is overwhelming, yet unpublished, precisely because of that effectiveness. Would you publish methods that reduced your field failures by a factor of ten, letting all your competitors in on it? Probably not, and that's why the electronics testing community is still very reluctant to accept accelerated stress testing and screening as a standard approach to reliability improvement.

The methods that we are referring to are called Highly Accelerated Life Test, or HALT, and Highly Accelerated Stress Screening, or HASS. These are methods of testing that take a fundamentally different approach. The difference is in finding the actual, not specified, operating and destruct limits and then driving those margins to the fundamental limit of the technology. The fundamental limit of the technology is the point at which the product margins cannot be extended without the use of exotic materials or methods. An example is the melting point of wire insulation or component packaging. Most electronic components are heated to well over 175°C for solder reflow and can withstand temperatures much higher. Yet most testing occurs at temperatures around operational specifications based on the end-use environment. Materials in electronics are very capable of operation in environments (typically -40°C to +110°C) that are well beyond most electronic operating specifications (typically 0°C to +55°C). The component and design weak links that limit operating margins are the same ones that will have a very significant effect on field reliability. Only by taking the products to the limits through increasing steps of stress, investigating the root cause and understanding the physics of failure, and then improving that weak link, can you make a robust system in the shortest possible time. A robust system will be capable of withstanding short, effective, intensive, but safe environmental screens using combined environments while being powered and monitored.
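The step-stress search for operating and destruct limits can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the SimulatedUnit class, its limit values, the 10°C step, and the 150°C chamber maximum are all hypothetical, not data from a real product or a prescribed HALT procedure.

```python
from dataclasses import dataclass


@dataclass
class SimulatedUnit:
    """Hypothetical unit under test; both limits are made-up values."""
    operating_limit_c: float  # hottest temperature at which it still functions
    destruct_limit_c: float   # temperature at which it is permanently damaged
    destroyed: bool = False

    def functional_at(self, temp_c):
        """Expose the unit to temp_c and report whether it functions there."""
        if temp_c >= self.destruct_limit_c:
            self.destroyed = True
        return (not self.destroyed) and temp_c <= self.operating_limit_c


def hot_step_stress(unit, start_c, step_c, max_c, ambient_c=25):
    """Step the temperature up and return (operating_limit, destruct_limit).

    operating_limit: last step at which the unit functioned before its
                     first failure (None if it failed at the first step).
    destruct_limit:  first step after which the unit no longer recovered
                     at ambient (None if max_c was reached first).
    """
    operating_limit = None
    destruct_limit = None
    has_failed = False
    temp = start_c
    while temp <= max_c:
        if unit.functional_at(temp) and not has_failed:
            operating_limit = temp
        else:
            has_failed = True
            # Return to ambient to see whether the failure was recoverable.
            if not unit.functional_at(ambient_c):
                destruct_limit = temp
                break
        temp += step_c
    return operating_limit, destruct_limit


unit = SimulatedUnit(operating_limit_c=95, destruct_limit_c=125)
op_limit, destruct = hot_step_stress(unit, start_c=50, step_c=10, max_c=150)
spec_max_c = 55  # upper end of the typical specification range quoted above
print(f"operating limit {op_limit} C, destruct limit {destruct} C, "
      f"operating margin over spec {op_limit - spec_max_c} C")
```

The limits found this way are only as fine as the step size, and the sketch leaves out the part that matters most in practice: stopping at each failure to find the root cause and improve the weak link before stepping further.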

Electronics were, for many years, fragile. Glass tubes and filaments had inherent wear-out modes that gave the electronics a limited life. Early solid state devices had mechanisms that would also cause failures in time, such as chemical contamination, metallization defects, and packaging defects that resulted in corrosion and delamination. A large percentage of these defects were accelerated by high temperature, giving rise to the successful use of "burn-in" to weed out "infant mortality".

Statistical prediction in the 1960s and '70s was accepted because designs at that time were made mostly of discrete components, and statistical estimates of the life of a new design had a reasonable correlation to the actual MTBF. This was most likely due to the small number, relative to today, of devices, manufacturers, and manufacturing techniques. Today, hundreds of new electronic components are introduced to the market every week, and at the same time hundreds are being taken off the market. It is no longer possible or reasonable even to attempt statistical estimates of reliability based on a summation of component reliabilities, even if accurate data on current components were available. Such data would be virtually impossible to obtain, and useless in any case, because of the variation in the huge number of applications and end-use conditions.
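For readers who have not seen it, a prediction "based on a summation of components reliability" usually means adding constant failure rates for a series system and inverting the total to get an MTBF. The sketch below shows that arithmetic only; the part names, quantities, and failure rates are invented for illustration and come from no handbook or real design.

```python
def parts_count_mtbf(parts):
    """Series-system MTBF in hours.

    parts maps a part name to (quantity, failures per million hours).
    Assumes constant, independent failure rates, which is the very
    assumption the article questions for modern electronics.
    """
    total_fpmh = sum(qty * fpmh for qty, fpmh in parts.values())
    return 1e6 / total_fpmh


# Hypothetical bill of materials; every number here is made up.
bill_of_materials = {
    "resistor": (40, 0.002),
    "ceramic capacitor": (25, 0.004),
    "logic IC": (6, 0.05),
    "connector": (2, 0.1),
}

print(f"predicted MTBF: {parts_count_mtbf(bill_of_materials):,.0f} hours")
```

The result can be no better than the failure rates fed into it, which is the weakness the paragraph above describes.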

Today's components do not have wear-out modes that fall within most electronics' technologically useful life. Therefore the vast majority, if not all, of electronics failures are due to defects, either in the design or introduced in manufacturing. The most significant effects on reliability are caused by unplanned events during manufacture that lower the operating margins. It can be an EC, a change in machine operators, or a change in your vendor's manufacturing capabilities that introduces a decrease in margins and a resulting increase in field failures. Engineers must quit wasting time and resources trying to statistically calculate estimates of reliability. The future is unpredictable, and in electronics, predictive reliability cannot be done to any accuracy that would be beneficial to a designer. Only by discovering the real capabilities and the root causes of the weak links in the design or manufacturing process, and improving them, is significant improvement in reliability realized.

The relation between field stresses and the inherent strength of a product is illustrated in Fig. 1. The variation introduced in manufacturing the product is generally much smaller than the variation in the end-use environment. Failures occur when the weakest units are subjected to the highest stresses, as shown in Fig. 2.

Fig. 1


Fig. 2
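The overlap between the stress and strength distributions can be put into numbers if, purely as an assumption for illustration, both are treated as independent normal distributions. The probability that a unit meets a stress above its own strength is then Phi((mean stress - mean strength) / sqrt(sd_stress^2 + sd_strength^2)), where Phi is the standard normal cumulative distribution. The means and standard deviations below are invented and are not read from Fig. 1 or Fig. 2.

```python
import math


def failure_probability(mu_strength, sd_strength, mu_stress, sd_stress):
    """P(stress > strength) for independent, normally distributed values."""
    z = (mu_stress - mu_strength) / math.hypot(sd_strength, sd_stress)
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical end-use stress: mean 40 C with a wide spread (sd 10 C).
# Hypothetical strength: a narrow manufacturing spread (sd 5 C) around
# a marginal design (mean 70 C) versus a robust one (mean 110 C).
p_marginal = failure_probability(70, 5, 40, 10)
p_robust = failure_probability(110, 5, 40, 10)

print(f"marginal design: {p_marginal:.2e}")
print(f"robust design:   {p_robust:.2e}")
```

Under these made-up numbers, moving the strength distribution away from the stress distribution shrinks the overlap by several orders of magnitude, which is the argument for widening margins rather than tightening the specification.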

It is important to remember that when an electronic product is manufactured in volume, there is a distribution of its inherent strength around its original designed strength. The end-use environment is even more uncontrolled and has much wider distributions in most cases. No matter how you specify the end-use conditions, your customers will push those limits. By developing a robust design, the product can better survive these extremes. The difference in cost between a robust, well-centered design and one with design weaknesses is usually very small, if any. Changing the orientation or location of a component, or using a more capable component, is very cost effective in the design phase. It can even be very cost effective after the product has been in production for some time.

An example is the case of one AcceleRel Engineering client that we'll call Company A. Company A had a product with a 5% annualized field return rate. For reliability testing they had an elevated temperature (90°F) burn-in room. The product was powered, but not monitored, for 24 hours. HALT testing found that the operating limit, which was also the destruct limit, was as low as 15°C above the design specification (35°C). The limit was caused by one failing component (two per unit). By replacing the component with the same type of component with higher current capability, the operating and destruct limit was moved to 90°C. With the new operating margins, a short HASS process, lasting one hour for two units, was developed. The product was powered and monitored while applying 10 gRMS (200 - 2 kHz) of random multi-axis vibration and four rapid thermal transitions of -30°C to +70°C at a rate of 60°C/minute (measured on the product). The field return rate dropped to 0.5%. To no one's surprise, Company A has made this process company confidential, and has implemented it across almost all products in production.
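The Company A figures can be checked with simple arithmetic. The sketch below uses only numbers quoted in the example; the variable names are ours, the 35°C value is read as the design specification with the original limit 15°C above it, and dwell times are not given in the text, so only the ramp time of the thermal transitions is computed.

```python
def ramp_minutes(low_c, high_c, rate_c_per_min, transitions):
    """Total time spent ramping, ignoring any dwell at the extremes."""
    return transitions * abs(high_c - low_c) / rate_c_per_min


# Thermal margin above the 35 C design specification, before and after the fix.
spec_c = 35
limit_before_c = spec_c + 15   # "as low as 15 C above the design specification"
limit_after_c = 90
margin_before_c = limit_before_c - spec_c
margin_after_c = limit_after_c - spec_c

# Four transitions between -30 C and +70 C at 60 C/minute.
hass_ramp_min = ramp_minutes(-30, 70, 60, 4)

# Annualized field return rate, in percent, before and after.
return_before_pct, return_after_pct = 5.0, 0.5
reduction_factor = return_before_pct / return_after_pct

print(f"margin over spec: {margin_before_c} C -> {margin_after_c} C")
print(f"ramp time in the HASS profile: {hass_ramp_min:.1f} minutes")
print(f"field returns reduced by a factor of {reduction_factor:.0f}")
```

On this reading the thermal ramps take under seven minutes of the one-hour screen; the example does not say how the rest of the hour is divided.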

Another example of how easy it can be to increase an operating margin also comes from client Company A. The UUT had an operating limit of 60°C, with the limit traced to the small +15 VDC auxiliary power supply inside an RF power supply. The limiting component was found to be a small regulating diode located next to a heat sink, but not touching it. By bending the component over to make contact with the heat sink, the operating limit was raised by 30°C to 90°C. Large margin improvements can sometimes be made just by repositioning a component. This would not have been found if the operating limit had not been discovered through step stress. The product had easily passed its operating specifications when originally designed.

Starting and continuing a HALT and HASS program is a major commitment of resources for any company, but the ROI in reduced warranty costs, re-design, re-work, and lost sales is tremendous. Developing a robust design using HALT, even if screening cannot be implemented, is an extremely valuable tool for reaching design maturity rapidly and should be a standard evaluation for all new designs. HASS is a process to precipitate and detect defects and shifts in margins that would otherwise end up as field failures.

It is important to realize that the concept is really very simple: test to find limits, base the testing on actual material capabilities rather than specifications, and extend those capabilities as far as current technology allows. HALT will not find each and every defect, but it will find greater than 95% of those that cause field failures. Convincing designers who are not experienced in this new approach that margins should be improved beyond what was originally specified is a difficult task. Once the benefits are demonstrated, companies readily embrace the HALT and HASS processes. The hard part is taking that first step of finding the limits and following through with changes to improve them. Designers and companies that have already discovered that failures above specifications are relevant to field reliability, and that spend the time and effort to improve the margins, will be the most successful at meeting the reliability expectations of customers in the 21st century.

Kirk Gray has over 21 years in the electronics manufacturing industry, the last 11 of them in the application of HALT and HASS processes. Mr. Gray began his career in electronics at the semiconductor, thin-film processing level and continued down the electronics assembly path to reliability testing. He discovered the speed and other benefits of HALT and HASS processes for reliability when he started a HALT and HASS team as ESS Process Engineering Manager at StorageTek in 1989. Kirk is Vice Chairman of the Denver Chapter of the IEEE Reliability Society, Chairman of the IEEE/CPMT Technical Committee 7 on Reliability, and Registration Chairman for the annual IEEE/CPMT Workshops on Accelerated Stress Testing (AST) held in the fall each year. If you would like to contact him, please send an email to gray@equipment-reliability.com

Wayne Tustin's vibration and shock teaching schedule is posted at http://www.equipment-reliability.com. Right now he is "studying up" on the vibration and shock measurement and testing issues pertaining to computer hard drives, preparing for a private course. Have you heard about intense noise developmental testing and production screening of printed wiring boards? It's an alternate method (to electrodynamic shakers and to pneumatic repetitive-shock machines) for flexing PWBs during HALT, HASS, ESS, etc. You can send him an email to tustin@equipment-reliability.com if you want to learn more. Or you can phone him at 805/564-1260. He would like to hear from you.


Participate in ERI News

You are invited to send news of reliability-oriented events for inclusion in ERI's newsletter. Please send an email to the webmaster.
   
Vibration and Shock courses coming up
 


Wayne Tustin will teach the following short courses in vibration and shock measurement, analysis, calibration, testing, HALT, ESS and HASS:

Society of Automotive Engineers

Portland, OR
December 4-6, 2000

Troy, MI
April 18-20, 2001

*******

Applied Technology Institute

College Park, MD
April 9-12, 2001 (get more information from Wayne Tustin)

*******

ERI classes

Huntsville, Alabama, February 20-22, 2001

Hillsboro (Portland), Oregon, March 20-22, 2001

In addition, Wayne will present a super-concentrated 1-day version at Grand Rapids, MI,
March 27, 2001.
Details are available from Vibration Research, phone 616-669-3028 or send an email to
john@vibrationresearch.com

   
Announcements
 


For sale: Alpha-M Corporation in Dallas, TX, manufacturer of small electrodynamic shaker systems and related hardware. Owner Bill Crowley is considering retiring. Call Bill at 972/406-0424 or fax him at 972/247-0651.

   
Contact information
 


ERI - Equipment Reliability Institute
1520 Santa Rosa Av.
Santa Barbara - CA - 93109

Tel/Fax: (805) 564-1260

Wayne Tustin tustin@equipment-reliability.com

Webmaster webmaster@equipment-reliability.com

   
 

ERI News is sent in both HTML and plain text formats. If you have any problems reading this newsletter, please let us know. Send an email to the webmaster, reporting your difficulties.

If you do not want to receive ERI's quarterly newsletter, please send a reply to this message with "remove" as subject.