Lessons Learned At JPL
by Wayne Tustin
Newspapers and technical publications gave considerable
coverage to an overly severe March 21, 2000, vibration
test in Room 144 of Building 100 at the Jet Propulsion Laboratory,
Pasadena, California. The overtest caused significant damage
(over $1,000,000) to the High Energy Solar Spectroscopic Imager
(HESSI) satellite built by the University of California at
Berkeley (UCB). A Mishap Investigation Board (MIB) was convened.
The principal source for this article is a prerelease version
of the MIB report, which will eventually appear at http://www.gsfc.nasa.gov.
My intent here is to help readers who are in any
way involved in testing to appreciate the possibly disastrous
consequences of actions taken, or not taken, during seemingly
routine procedures.
Event Sequence
Earlier on March 21, at 13:39 hours, the spacecraft had passed a
nominally sinusoidal 0.25g survey test, Run #2. (I say nominally
because records later examined by the MIB showed significant
waveform distortion.) At 17:50 hours the spacecraft also passed
a random vibration test, Run #9, which used force limiting
and spectrum notching. (Other records examined by the MIB showed
that this test, too, had unusual characteristics.) In retrospect,
both tests had revealed symptoms of trouble, symptoms that
unfortunately went unheeded on March 21.
The MIB Report mentions aborts on Runs 1, 5,
6, 7 and 8. It blames these on various overloads but does not
explain the overloads. I've been told of solar array panels
rattling, with acceleration peaks causing limit channels to clip.
At 18:13 hours (Run #10) the 13:39 sine survey
test was repeated. The next test, Run #11, was to be
a 7.5g open-loop sine burst vibration test.
Plans called for six bursts at -12 dB (1.88g
peak), one burst at -6 dB (3.75g) and, after a review of input and
responses, a single burst at full level, 7.5g peak. Unfortunately,
the first -12 dB (1.88g) check, at 18:43 hours, was much too severe.
The test was aborted manually. Records showed that acceleration
reached 21g for four or more cycles. The solar array panels were damaged.
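The dB figures in the test plan follow the usual amplitude convention, ratio = 10^(dB/20). A minimal Python sketch (the helper names are mine, not the MIB's) reproduces the planned levels and puts the recorded 21g peak in the same terms.

```python
import math


def db_to_ratio(db: float) -> float:
    """Convert an amplitude level in dB (20*log10 convention) to a linear ratio."""
    return 10 ** (db / 20)


def ratio_to_db(ratio: float) -> float:
    """Convert a linear amplitude ratio to dB."""
    return 20 * math.log10(ratio)


FULL_LEVEL_G = 7.5  # planned full-level sine-burst peak, from the test plan

# Planned burst levels relative to full level.
for step_db in (-12, -6, 0):
    print(f"{step_db:>4} dB -> {FULL_LEVEL_G * db_to_ratio(step_db):.2f} g peak")

# Compare the recorded 21 g peak with the intended -12 dB check and with full level.
intended_g = FULL_LEVEL_G * db_to_ratio(-12)
print(f"21 g = {21 / intended_g:.1f} x intended ({ratio_to_db(21 / intended_g):+.1f} dB)")
print(f"21 g = {21 / FULL_LEVEL_G:.1f} x full level ({ratio_to_db(21 / FULL_LEVEL_G):+.1f} dB)")
```

The -12 dB step works out to about 1.88g; -6 dB is very nearly a halving (about 3.76g, which the plan rounds to 3.75g). The 21g peak was roughly 11 times the intended check level (about +21 dB) and 2.8 times even the full 7.5g level.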
What had happened?
The Board prepared a lengthy list of all possible causes for the
mishap. One by one, most of the possible causes were exonerated.
- Endevco 2271A/A20 accelerometers
- Trig-Tek 1273A charge amplifier
- m+p VCP9000 vibration control system
- LDS power amplifier driving the Ling A-249
shaker
- Shaker internals (flexures, bearings and
other possible mechanical and electrical difficulties)
Spacecraft response instrumentation is not discussed here, only
the mechanical input to the spacecraft.
Attention soon focused on the oil-film slip
plate (magnesium alloy). The spacecraft was attached to an adapter
ring, which was connected through 24 Kistler 9251A force sensors
(measuring force input to the spacecraft) to an aluminum fixture
plate on the slip plate. (After the event, several of the mounting
ring-to-slip plate bolts were found to be loose. See log at 18:53
hours.) The slip table, supporting granite block and shaker armature
were found to be misaligned (not parallel). The magnesium slip table
had evidently been rubbing on the granite block for some time, generating
considerable heat. Magnesium had transferred from the underside of the
plate to the granite surface. Considerable evidence from the earlier
tests that day indicated that the resulting stiction (a higher than
normal coefficient of static friction) had been present throughout
the day's testing and was the root cause of the mishap.
This author asks: were records of tests from
March 13-17 and from March 20 reviewed? Yes, and there is some evidence
that the stiction problem had existed (but had not been recognized)
before March 21.
Concerning the pressurized oil-film slip table,
the question was raised: had the oil pressure been turned on? The
general agreement was yes, but there was no documentation and (false
economy) no system interlock.
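An interlock need not be elaborate. The Python sketch below is hypothetical (the class name, the psi unit and the threshold are my assumptions, not JPL's or the MIB's): it refuses to enable the shaker drive unless slip table oil pressure is above a minimum, and it logs every check, which would also answer the documentation question.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SlipTableInterlock:
    """Hypothetical drive-enable interlock for an oil-film slip table.

    min_pressure_psi is a placeholder; a real value would come from the
    slip table manufacturer, not from this sketch.
    """

    min_pressure_psi: float
    log: list = field(default_factory=list)

    def permit_drive(self, oil_pressure_psi: float) -> bool:
        """Return True only if oil pressure is adequate; record every check."""
        ok = oil_pressure_psi >= self.min_pressure_psi
        # Timestamped record, so "was the oil on?" can be answered afterwards.
        self.log.append((datetime.now().isoformat(timespec="seconds"), oil_pressure_psi, ok))
        return ok


interlock = SlipTableInterlock(min_pressure_psi=100.0)  # hypothetical threshold
if not interlock.permit_drive(oil_pressure_psi=0.0):
    print("Drive inhibited: slip table oil pressure below minimum")
```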
MIB found that the granite block itself had
not shifted, but that the shaker body had moved, misaligning the
moving system (shaker armature, flexures, bearings, bullnose and
slip plate) and creating stiction. Two 1-inch-diameter bolts that
had secured the shaker "saddle" to the shaker base were broken, with
their holes misaligned by approximately 0.5 inch. Upon disassembly
of the shaker supporting base, one of the two trunnion support needle
bearings was found to have a broken outer race; some rollers were
loose and others were missing. Replacement parts will be taken from
a surplus A-249 shaker being shipped from Huntsville, Alabama.
When did the shaker base fail?
I've been told that the shaker had been used in the vertical attitude
for "a long time" prior to March 21. During rotation of the shaker
body into the horizontal attitude, the soon-to-fail (or already
failed) trunnion bearing must have emitted loud noises. Why did
no one hear that noise? Careful realignment of slip plate to shaker
followed that rotation.
How did stiction cause the overtest?
MIB reconstructs the events thus: upon initiation of the sine-burst
test, the shaker control computer had developed and stored a drive
signal at a much reduced level. Because of slip plate stiction, that
drive signal was incorrect (much too high): excessive force had been
required to obtain even the required low level of slip plate motion,
so the computer badly overestimated the drive signal that would be
needed for the -12 dB check. Unfortunately, no procedure required
the operator to run the sine-burst test before the spacecraft was
mounted.
When the -12 dB check occurred, that excessive
force not only overcame stiction but created excessive motion. That's
what damaged the spacecraft.
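Why a low-level drive estimate can go so wrong is easy to see with a deliberately crude model. The Python sketch below is my illustration, not the MIB's analysis: it treats the sticking slip table as a dead zone (no motion until the drive exceeds a hypothetical breakaway level, then linear response), and lets a controller that assumes a linear system extrapolate from a low-level self-check. All numbers are made up.

```python
def plant_peak_g(drive_v: float, gain_g_per_v: float, breakaway_v: float) -> float:
    """Toy slip-table model: no motion until the drive overcomes stiction
    (breakaway_v), then peak acceleration grows linearly with drive."""
    return max(0.0, gain_g_per_v * (drive_v - breakaway_v))


# Hypothetical parameters, chosen only for illustration.
GAIN = 10.0       # g per volt once the table is sliding
BREAKAWAY = 0.5   # volts of drive needed just to break stiction
CHECK_V = 0.55    # low-level self-check drive

# 1. Self-check: most of the drive is spent overcoming stiction.
check_g = plant_peak_g(CHECK_V, GAIN, BREAKAWAY)
# 2. Assuming linearity, the controller infers a very low gain.
estimated_gain = check_g / CHECK_V
# 3. It scales the drive up to hit the -12 dB target.
target_g = 1.88
drive_v = target_g / estimated_gain
actual_g = plant_peak_g(drive_v, GAIN, BREAKAWAY)

print(f"self-check: {check_g:.2f} g at {CHECK_V} V -> estimated {estimated_gain:.2f} g/V")
print(f"commanded {drive_v:.2f} V for {target_g} g; model delivers {actual_g:.1f} g")
```

With these made-up numbers the model delivers about 15.7g, roughly eight times the requested 1.88g. The factor itself means nothing; the point is that linear extrapolation from a level where friction dominates overestimates the drive, which is why self-checks at appropriate levels matter.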
Evidence of stiction
MIB found evidence of stiction in acceleration vs. time plots taken
during the earlier 0.25g sine sweep, Run #10. Large-amplitude "glitches"
occur immediately after the zero-velocity points, where the slip plate
momentarily comes to rest and static friction must be overcome again.
This author surmises that the accelerometer signals behind those plots
were available to test personnel in Room 104 during the several sine
sweeps. Subsequent questioning brought assurance that an oscilloscope
was provided and is always turned on. Those charged with watching the
oscilloscope blamed the distortion on "shaker-spacecraft interaction
and electrical noise". It seems to me they should have been instructed
to stop the test if the motion was not sinusoidal.
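An oscilloscope helps only if someone is watching and knows what to look for. As a complement (my suggestion, not the MIB's), software could quantify departure from a sine wave. The Python sketch below computes total harmonic distortion (THD) with plain single-bin DFTs; glitches that recur at every velocity reversal are periodic, so they push energy into the harmonics and raise THD. The 5% stop threshold and the synthetic signals are hypothetical.

```python
import math


def harmonic_amplitude(samples, cycles, harmonic):
    """Peak amplitude of one harmonic, computed with a single-bin DFT.

    Assumes `samples` covers exactly `cycles` whole periods of the fundamental,
    so each harmonic falls exactly on a DFT bin (no leakage).
    """
    n = len(samples)
    k = cycles * harmonic  # DFT bin of this harmonic
    re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(samples))
    im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(samples))
    return 2 * math.hypot(re, im) / n


def thd(samples, cycles, max_harmonic=10):
    """Total harmonic distortion: RMS sum of harmonics 2..max_harmonic over the fundamental."""
    fundamental = harmonic_amplitude(samples, cycles, 1)
    harmonics = [harmonic_amplitude(samples, cycles, h) for h in range(2, max_harmonic + 1)]
    return math.sqrt(sum(a * a for a in harmonics)) / fundamental


# Synthetic records: 4 cycles, 1000 samples (hypothetical, not JPL data).
N, CYCLES = 1000, 4
THD_LIMIT = 0.05  # hypothetical stop threshold
clean = [math.sin(2 * math.pi * CYCLES * i / N) for i in range(N)]
distorted = [s + 0.1 * math.sin(2 * math.pi * 3 * CYCLES * i / N) for i, s in enumerate(clean)]

for name, record in (("clean", clean), ("distorted", distorted)):
    d = thd(record, CYCLES)
    print(f"{name}: THD = {d:.1%} -> {'STOP TEST' if d > THD_LIMIT else 'ok'}")
```

The clean record shows essentially zero THD; the record with a 10% third harmonic shows 10% and would trip the hypothetical limit.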
Observers in Room 100 had noticed a "different"
low-frequency sound during the control computer's equalization process,
prior to the earlier random vibration tests (Runs 3 through 9).
MIB found indications that stiction had affected equalization, particularly
in the low-frequency, large-displacement spectral region. This was
confirmed by reviewing the control accelerometer PSD from the self-check
at 18:43 hours. This author now asks: 1. Had no one said to everyone
present in Shaker Room 100 and Control Room 104 something like
"Listen up, folks. This is an important test upon a very valuable
satellite. If you should observe any anomalies, holler so we can
investigate."? 2. Why did no one shout "Stop the test"?
Schedule requirements
This author asks: why was the sine-burst test initiated in the
face of this evidence that all was not right? One of the investigators
opines that the misalignment was not sudden but rather a gradual degradation
to which the operators had become accustomed. Perhaps. But might
we not ascribe some blame to pressure from above? Tests, coming
late in any program, always seem to commence behind schedule. Page
23 of the MIB Report mentions a "tight schedule". Was anyone
afraid to stop the test?
How long had that test crew been working? 18:13
hours is 6:13 pm; 10 hours, perhaps? Pressure to finish testing so
that everyone could go home to dinner is certainly understandable. Was
another "hurry up" test inflexibly scheduled for that
facility the next morning? One of the investigators told me privately
that testing for 12 hours would be neither unusual nor unsafe, and
that project personnel are used to even longer hours. I've
learned that commercial testing laboratories sometimes work
their people in 18-hour shifts. Accident-investigation psychologists
tell us that judgment deteriorates when people are tired, and that
quite often the people involved in an accident will later
deny having been tired.
Contributing factors
- Misalignment caused the slip table to bind
at low force levels. MIB recommends checks for routinely assessing
the mechanical "health" of the shaker and slip table system.
- Test personnel did not know that quality
data were available before the sine-burst test was initiated. MIB
recommends additional procedure steps to review such data.
- No facility validation test was performed.
MIB recommends simulating tests before the test article arrives.
- The shaker base failure, which was also identified
as the root cause. MIB recommends refurbishing or replacing the
shaker.
- A self-check at too low an amplitude. MIB recommends
self-checks at appropriate levels.
To MIB's five, I would like to suggest a sixth
and a seventh. I'm told there has been considerable personnel
turnover in the JPL test organization, and much experience has been
lost. Possibly the year-2000 staff had received little formal precautionary
training on this specific shaker system.
Much of the shaker equipment at JPL is 30 or
more years old. In generating vibratory force, shaker internals
and externals gradually deteriorate with each test. Anticipate failure.
Treat your shaker at least as well as you treat your automobile.
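"Anticipate failure" can be made routine by trending the kind of "canned" test the MIB recommends. A hypothetical Python sketch: compare today's transfer-function magnitudes from a repeatable bare-table run with a stored baseline and flag any frequency that has moved more than a chosen tolerance. The ±3 dB tolerance and all numbers are placeholders; the made-up data mimic a low-frequency loss, the region where the MIB found stiction affecting equalization.

```python
import math


def deviations_db(baseline, current, tolerance_db=3.0):
    """Flag frequencies where a canned test's response has drifted from its baseline.

    baseline, current: dicts mapping frequency in Hz -> transfer-function
    magnitude (any consistent linear unit, e.g. g per volt of drive).
    Returns {frequency: change in dB} for changes larger than tolerance_db.
    """
    flagged = {}
    for freq, ref in baseline.items():
        change_db = 20 * math.log10(current[freq] / ref)
        if abs(change_db) > tolerance_db:
            flagged[freq] = round(change_db, 1)
    return flagged


# Hypothetical bare-table runs, months apart (made-up numbers).
baseline = {5: 2.0, 10: 2.1, 20: 2.2, 50: 2.0}
today = {5: 0.9, 10: 1.6, 20: 2.1, 50: 2.0}
print(deviations_db(baseline, today))
```

Here only the 5 Hz point is flagged (about -6.9 dB). The roughly 2.4 dB drop at 10 Hz stays inside the tolerance, a reminder that small drifts deserve watching across successive runs, not just single comparisons.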
Observations
MIB offers nine very thoughtful observations, accompanying each
with recommendations.
Lessons learned
The MIB came up with some excellent steps to prevent recurrence
of these events. All dynamics labs should copy these six points
and post them prominently. Here is Section 10 of the prerelease
report. (I have added numbers and slightly changed wordings.)
1. Test facilities must be maintained with
test equipment in good working order. Metrics that assess the
mechanical health of the systems must be developed and tracked.
2. "Canned" tests should be developed (and
used periodically) to provide a trended database of test systems'
responses. Any deviation in any system response should be investigated.
3. Critical control system response data, such
as (a) the transfer function, (b) the inverse transfer function
and (c) the calculated drive voltage, must be evaluated in real time
during testing to ensure that they are reasonable and do not indicate
system maladies.
4. A facility validation test should be done
for each planned test series, before flight or critical hardware
is mounted. It should represent actual test conditions.
5. Run self-checks that provide a representative
response for the forcing range of the planned test. For higher-force
shock tests, shaker systems and test fixtures often do not
respond linearly. Don't assume that test facilities
are always in perfect working order.
6. Define all test requirements (for a particular
test) in the test plan. Provide test operators with adequate data.
Require complete (system) verification testing before testing
critical hardware.

To those six, I would like to add six more to make a dozen:
7. Run self-checks at approximately test intensities.
8. Don't operate shaker systems "open loop".
9. Supplement your shaker system with an independent
"soft shutdown" system protector that derives its signal from
an additional, redundant (safety) accelerometer.
10. Encourage anyone in the lab, whether part
of the test team or observing, to call out "Question" if he or she
suspects something is not right.
11. Display unfiltered accelerometer time histories
on at least one old-fashioned analog oscilloscope. During a sine
vibration test, fully investigate any departure from a sine waveform.
12. Listen to your shaker. It may be trying to
tell you something. If, as at JPL, the operators cannot hear the
shaker from their control room location, place a microphone near
the shaker and a loudspeaker that is always turned "on" in the
control room.
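The independent "soft shutdown" protector suggested above can be sketched in a few lines. This Python class is hypothetical (names, abort level and ramp length are mine): each sample from the redundant safety accelerometer is checked against an abort level, and once tripped the monitor latches and ramps the drive gain to zero over a set number of samples instead of cutting it at once, on the usual reasoning that an abrupt cut can itself cause a transient. To be truly independent, as the text asks, such logic would run apart from the vibration controller it guards.

```python
class SoftShutdownMonitor:
    """Hypothetical protector fed by a redundant safety accelerometer.

    abort_g and ramp_samples are placeholders, not recommendations.
    """

    def __init__(self, abort_g: float, ramp_samples: int):
        self.abort_g = abort_g
        self.ramp_step = 1.0 / ramp_samples  # gain reduction per sample once tripped
        self.gain = 1.0
        self.tripped = False

    def update(self, safety_accel_g: float) -> float:
        """Process one safety-accelerometer sample; return the drive gain (0..1) to apply."""
        if abs(safety_accel_g) > self.abort_g:
            self.tripped = True  # latch: stays tripped until a person resets it
        if self.tripped:
            self.gain = max(0.0, self.gain - self.ramp_step)
        return self.gain


monitor = SoftShutdownMonitor(abort_g=3.0, ramp_samples=4)  # hypothetical settings
gains = [monitor.update(a) for a in (1.0, 2.0, 4.5, 1.0, 1.0, 1.0, 1.0)]
print(gains)
```

In the usage example the 4.5g sample trips the monitor, and the gain then falls in four equal steps to zero and stays there, even though later samples are back under the limit.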
Thanks to several people who commented upon
drafts of this article: Guido Bossaert, Dan Worth, Bob Newton,
Bob Mercado.
