From comp.unix.pc-clone.32bit Mon Apr  5 14:03:24 1993
Xref: utcsri comp.unix.pc-clone.32bit:2466 comp.unix.sys5.r4:2451
Newsgroups: comp.unix.pc-clone.32bit,comp.unix.sys5.r4
Path: utcsri!rpi!uwm.edu!linac!att!att-out!cbnewsj!dwex
From: dwex@cbnewsj.cb.att.com (david.e.wexelblat)
Subject: The Great Upgrade Saga (LONG)
Organization: AT&T
Date: Sun, 4 Apr 1993 13:54:01 GMT
Message-ID: <1993Apr4.135401.13520@cbnewsj.cb.att.com>
Followup-To: comp.unix.pc-clone.32bit
Lines: 375


			The Great Upgrade Saga
				  or
		       How I Hacked My Hardware
		     (and lived to tell about it)

The saga I describe details the trials and tribulations of my (ultimately
successful) hardware and OS upgrades.  I learned a lot from this experience,
and I hope that some of you find this useful.  There are a lot of lessons
here, in how to find the correct hardware, in what to do and what not to
do when trying to get things working, and ultimately, the value of paying
for quality parts and quality support.

I had hoped to make this story, and the qualitative performance analysis
of the results, available a while ago.  But going through the trials and
tribulations described below put me behind schedule on my major project,
XFree86, so I'm just getting to this now.

Prelude - Relevant Background
-----------------------------
I had been running on a Tangent 486/33 EISA box, containing a Mylex MAE 
486/33 motherboard with 16M memory, 128k cache, a Seagate Wren VI (~330M) 
SCSI drive, an Archive 2150S tape, and a few other miscellaneous items.  
This system runs Microport SVR4.0, version 4.1 at the time I began this.  My 
major use for this machine is as the development base for XFree86.  As some
of you may be aware, X11R5 is a disk pig like nobody's business.  My disk
was so full that I had been in a mode for the past 4-5 months of doing a 
full backup, then deleting 30-40M of stuff so I could build the X tree, then
rolling stuff in from the tape when I needed it.  Needless to say, this had
to stop.

At AT&T, there is generally an annual bonus that comes out mid-March.  My
wife and I agreed that I could spend the money this year on upgrading the
machine with more storage capacity (last year's bonus went to pay the IRS;
this year, we bought a house, so they wound up owing US money :->).  My
plans were to get 500M-1G of disk, and a DAT to back it up with.  Microport
had recently come out with their SVR4.0v4.2, which included, among other
things, drivers for Adaptec, BusLogic, and DPT EISA SCSI Host Adapters in
Enhanced mode, so I decided to replace my Adaptec 1542B with an EISA Host
Adapter.

Chapter 1 - The Decisions
-------------------------
Once the size of the bonuses was announced, which was about 6 weeks before
the checks came, I started shopping in earnest.  I talked with the developers
at Microport, and with the technicians at Tangent, and learned that the
Adaptec 174x and the Mylex motherboards don't get along well (turns out that
the Mylex motherboards don't get along with much well - we'll get back to
that).  So I decided to go with a BusLogic 742 or 747 (the Fast-SCSI II 
version of the 742).  I decided that I could afford about 1G of disk, and
a 1.3/2G uncompressed DAT drive.

So I got out my handy-dandy copy of Computer Shopper and started looking.
Turns out that the BusLogic host adapters are hard to find.  About 2 weeks
into this, there was a posting on one of the comp.unix.pc-clone newsgroups
from Ed Whittemore of American Micro Group, where he mentioned having good
success with BusLogic host adapters.  Since I had seen a lot of useful,
informative postings from Ed, and had seen lots of ads from American Micro
Group (i.e. they'd been around for a while), I decided to drop him a note
and see what advice he had.  AMG sells Intel Unix boxes, not Intel boxes
that just happen to run Unix.  This is a very important distinction - these
guys KNOW what goes on with these kinds of complex machines.

Ed and I exchanged mail for a couple of weeks, and together we decided that
a BusLogic 747S, a Quantum 1.05G Fast-SCSI II drive, and a WangDAT DAT drive
would fit my needs, and my budget.  We discussed the Mylex motherboard, and
the problems that many people have had with them.  Since I had reports from
both Microport and Tangent of success with this combination, we decided to
wing it.  About this time, I got in touch with Tangent, and ordered updated
BIOS and EISA Config Utilities, to increase my chances.

Then it was a matter of waiting a month longer for the check to arrive.  I
never spend money I don't have, unlike some of my coworkers :->.  Through
all of this, I was very pleased with my interactions with Ed Whittemore.
He was helpful, and courteous, and AMG's prices were quite competitive.  Now,
AMG is in-state for me, so I was going to be paying a 6% sales tax premium.
But I had a nice collection of warm fuzzies, so I felt that it was a good
investment in customer support.  How wise a decision that was did not become
truly apparent until much later.

Chapter 2 - The Check Arrives!
------------------------------
Finally, in mid-March, my check arrived.  I got back in touch with Ed, to
tell him I could actually put my money where my mouth is.  We worked up
a final quote, and discovered that the Micropolis 1.05G drive was now a 
little cheaper than the Quantum, so we decided to go with that.  And with
an HP DAT rather than the WangDAT (I have a personal bias towards HP hardware,
having run HP systems for several years in college, but it's usually too
expensive for me).  So the deal was done, and the order was shipped.

At this time, the new BIOS and EISA config I had ordered from Tangent STILL
had not arrived, so I called them again.  If any of you were reading these
newsgroups about two years ago, you may recall my trials and tribulations
with Tangent (send me email if you want the gory details - the bottom line
is that I will never deal with them again).

Chapter 3 - The Package Arrives!
--------------------------------
A couple of days later, the box arrives at my office.  I run to my boss'
office to get the property-removal pass signed (so I can get it out of
the building - I had it shipped to work, because UPS has a nasty tendency
to leave things sitting on the doorstep).  I then took off for home.
Since it was already late in the afternoon, I decided to spend the evening
reading the documentation (really!), and get started the next morning.

So, the next morning, I get up bright and early, dump off my final backup
to tape, and dig in.  My strategy at this point was to get the BusLogic
up and running with my existing disk and tape, and the existing OS (since
the 747S is 1542B-compatible).  So I shut things down, pull the Adaptec, 
and drop the BusLogic in.  I boot up under DOS, and run the EISA config
utility.  Which locks my machine up tight.  No Ctrl-Alt-Del, no nothing.
Hit the reset switch.  I call Ed up, and basically decide that the Mylex
demon has struck again.  We discussed new motherboard options, and I
went off to call my wife and discuss the problem.  I decided that I was
fed up to high heaven with Tangent, and had no interest in trying to get
them to help me out.  My wife solved the problem for me by simply telling
me "Just buy it".  So I called Ed back, and ordered a NICE EISA1 486DX/50
motherboard with 256k cache, and room for 128M of RAM (16M SIMMs).  Then
I put things back together, and brought the system back up.

	MORAL #1: Stay away from Mylex EISA motherboards.  They are
	incompatible with a lot of boards.  Including, I understand,
	some of their OWN EISA boards.

	MORAL #2: Stay away from Tangent.  I will say no more in this
	forum.

	MORAL #3: Marry someone who either loves you very much, or 
	knows that you're intending to build them a computer out of
	spare parts (:->).

Footnote 1: The box from Tangent arrived that night, over a month since I
had first requested it.  I put in the new BIOS and ran the new EISA config.
It didn't lock up the machine, but the BusLogic wouldn't boot.

	MORAL #4: See Morals #1 and #2.

Chapter 4 - The New Motherboard
-------------------------------
The new motherboard arrived, and I took off from work again (:->).  Got home,
read the manual (really - I'm odd that way), and set about putting it in the
box.  This is the first time I had ever done a motherboard swap, so I went
very slowly.  My plan here was to rebuild the original system, except with
the new motherboard, then proceed to the other new hardware.  I get everything
up and running under DOS.  Everything seems fine and happy.  Then I try
booting off the hard drive.  Bam!  I'm getting all kinds of errors - on-board
parity error, no ROM basic error, etc, etc.  Hmm, thinks I - I've fried the
SIMMs moving them from the old machine.  I decide to try booting the first
floppy disk from the new Microport OS.  No joy.  The system reboots in the
middle.  "Things are really hosed," I think to myself (which is wrong - we'll
come back to this problem later).

So I decide that I'm dead in the water.  It's 9:00 at night.  I decide to put
the old system back together, to see if I really did fry something.  It goes
back together just fine, and everything is happy.  So I start to think that
maybe the rumors about 50MHz motherboards are true.  It's time for bed.

Bright and early the next morning, I call up Ed (a little aside - by the
end of this day, the guy at AMG who answers the phone knows my voice; he
doesn't bother asking who's calling any more :->).  We decide to go ahead
and try to get the BusLogic host adapter up in 1542 compatibility mode, to
see what happens.  So I rebuild the system with the new motherboard and
the BusLogic installed, run the EISA config, and try booting off the hard
drive.  It starts to boot, then locks up tight.  This time my thought is
that 1542-compatibility isn't.  So Ed and I talk some more, and try to
figure out why the 1542B isn't working.  As we talk about things like
terminator power and bus parity, Ed asks me what DMA rate I've got set.
"8Mb/sec," I tell him.  Oops.  Since the EISA bus runs at 8.33MHz, this
can't work.  So I strap for 5Mb/sec and try again.  Bingo!  Up she comes.

	MORAL #5: There's a REASON that Adaptec put that DMA-rate
	diagnostic in the BIOS.  USE IT!

	MORAL #6: If you've got an EISA motherboard that supports a 1542B
	at 8Mb/sec DMA, they've screwed with the EISA timings, and other
	things won't work (which probably explains why the Mylex is 
	incompatible with so much stuff).

So we decide to stress the machine for a while, with the 1542B and the old
OS.  I start running the Byte benchmark suite.  I'm running XFree86 at this
time, and I notice some really odd things happening (like blts to the wrong
part of the screen, etc).  Then, while scrolling a bunch of stuff, the
machine locks up tight.

Call Ed again.  We talk about how old my Orchid ProDesigner IIS is and that
my ROMs are out of date.  We also discover that the "Speed Booster" switch
is enabled.  Ed doesn't think that could be the culprit, but we try it
anyhow.  Now things look pretty good.  There's still an odd phenomenon with
colormap flashing that I haven't seen before, but I can live with that kind
of hardware problem.  So we decide it's time to put the BusLogic back in and 
see where we can get.

What we get is no video.  None.  Nada.  Blank screen.  So I call Ed again.
Hmm.  He's seen this phenomenon before with SVGA cards and BusLogic SCSI
cards.  But not with the PD IIS.  Perhaps it's because mine is so old.
Anyhow, at this point, I'm pretty much dead in the water.  We discuss
what my options are.  Ed agrees to lend me a new ProDesigner IIS and a
BusLogic 545S (Fast-SCSI II version of the 542).  So I put the system
back together in its original form, and wait for the next package
to arrive.

Footnote 2: It dawns on me that the test I did that caused the colormap
flashing was one I had never done before.  So I tried it with the old
system.  Guess what?  It does it too.  Turns out that some recent changes
I had made to XFree86 had introduced a colormap-sync problem in cases where
the colormap was being pounded.

	MORAL #7: Nobody's software is perfect.  Don't always be so quick
	to blame the hardware.

Chapter 5 - The Final Hardware Delivery
---------------------------------------
Ed and folks at AMG are so perturbed at my inability to get things working
that they decide to put a system together in the same configuration as mine.
They get it up and working fine.  Great.  So at least we know it's possible.
Ed decides to throw another BusLogic 747 into the package he's sending me,
just in case.

The next day (I'm at home anyhow - waiting for some contractors to do some
work on the house), the package arrives.  My first step is to install
the BusLogic 545S, into the old motherboard, to make sure I can get a
Fast-SCSI II setup working for my new disk.  This works fine.  So I install
the new disk and tape drive.  Things are still working fine.  So I decide
to install the new motherboard, with the BusLogic 545S.  Things are still
fine.  So we're making good progress.  I have two boards to try - the
ProDesigner IIS, and the second 747.  Since I don't want to buy a new
video board, I decide to try the 747 first.  Bingo!  Everything is up and
happy.  Turns out that the initial 747 was defective (or so it appears).

At this point, I'm pretty happy.  It's late Friday afternoon, so I decide
that I'll let things stress-test overnight, and do the OS upgrade over
the weekend.  For some reason, I decided to try booting the first OS
install floppy.  It spontaneously reboots in the middle.  Hmm, this is
odd.  I get smart and call up Microport about it.  Turns out that the
new OS version supports ISA, EISA, and MCA in the same kernel, and they
do a couple of tricks to try to figure out what you've got.  Looks like
they're tickling the wrong registers on my new motherboard.  Fortunately,
there's an option you can set in the boot string that will avoid the
check.

	MORAL #8: ALWAYS try booting the first OS install floppy when there's
	someone available at customer support.  Otherwise you may wind up
	sitting on your thumb all weekend.  (I got this one right :->).

I leave the system running the Byte Benchmark Suite overnight, and things
seem to be fine.

Chapter 6 - The New OS
----------------------
Bright and early Saturday morning, I get started on the new OS install.
My brand new disk will be the new root, and I'm doing a virgin install
onto it, so that I can have the old disk intact to copy stuff off of
(rather than having to restore from tape).  Things go well, if slowly.
Microport has a very nice install system set up, and the system detects
my BusLogic board and puts it into enhanced mode automatically.

Diversion 1: The fancy new enhanced drivers are too big to fit on the
boot floppy.  But the system can detect what you've got.  So the first
thing it does after booting up is to reconfigure itself with the new
drivers.  VERY nicely done.  It's basically transparent.

Diversion 2: This is also one of the nice things about the BusLogic board.
With the Adaptec board, you've got to change modes via EISA config.  The
BusLogic does it automagically.  Which means I could just swap SCSI IDs
on my drives to boot the old one and run in 1542 compatibility mode.

After a LONG day of installing, copying, configuring, etc, things are
pretty much done.  I decide to run MY ultimate stress-test - a 'make
World' of the X11R5 source tree.  In the middle of this, the system
locks up, with the drive LED on solid - a sign of a SCSI-bus lockup.
I reboot things, and do the 'make World' again.  It's late, and I
go to bed while it's still running.  Things are up and happy in the
morning.

Diversion 3: The time to do a 'make World' on the new system is now
ONE HOUR AND FORTY MINUTES!  On my old system it took about 3.5hrs.
That's LESS than half the time.  I never expected THAT.  More on
benchmarking will be posted later.
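
The arithmetic behind the "less than half" claim can be checked in a few
lines.  This is only a sketch: the 3.5-hour figure for the old system is
approximate, and the function name is invented for illustration.

```python
def speedup(old_minutes, new_minutes):
    """Return (new time as a fraction of old time, speedup factor)."""
    return new_minutes / old_minutes, old_minutes / new_minutes

old = 3.5 * 60       # about 3.5 hours on the old system (approximate)
new = 1 * 60 + 40    # 1 hour 40 minutes on the new system

fraction, factor = speedup(old, new)
print(f"new/old = {fraction:.2f}, speedup = {factor:.1f}x")
```

With these numbers the new build takes a bit under half the old time, a
speedup of roughly 2.1x.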

I spend Sunday doing more configuring and moving, etc.  I keep a couple
of files handy - one that lists my hardware configuration, and one that
lists basically every change I have made to the OS since installation,
including every single package I install.  I keep hardcopies of these
separate from the machine, printing them out any time there's a significant
change.  This is good advice for anyone, as a way to cope with a catastrophe.
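
For anyone who wants to keep the same kind of change log, here is a
minimal sketch of one way to do it.  The file name, the line format, and
the function name are assumptions made up for this example, not a
description of the files I actually keep.

```python
import datetime
import os
import tempfile

def log_change(path, description, when=None):
    """Append one dated line describing a system change to the log file."""
    when = when or datetime.date.today()
    line = f"{when.isoformat()}  {description}\n"
    with open(path, "a") as log:   # append mode: never rewrite history
        log.write(line)
    return line

if __name__ == "__main__":
    # Hypothetical log location and entry, for demonstration only.
    demo = os.path.join(tempfile.gettempdir(), "os-changes.log")
    print(log_change(demo, "Added SCSI parity jumpers on disk and tape"), end="")
```

The file is opened in append mode so old entries are never touched, and
a plain text file is easy to print for the hardcopy.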

Anyhow, as I'm editing the hardware configuration file to add in all my
new toys, I notice that the descriptions of both my Seagate and Archive
drives indicate that SCSI Parity Checking is OFF.  Now, I know I've checked
those jumpers 20 times, but this certainly could explain the SCSI bus
locking up.  So I shut the system down, and pull the drives out.  Sure
enough, the parity jumpers are out.  I add the jumpers, put the system
back together, and wait to see what happens.  The system hasn't locked up
in a week now, since I fixed that.

	MORAL #9: I don't care if you've checked the jumpers 497 times.
	Check them AGAIN!

Chapter 7 - The Final Problem
-----------------------------
At this point everything is looking really good.  I go to work on Monday,
and try the one thing I haven't tried yet - calling into my machine.  No
joy.  It answers the phone, but no response.  Shit.  So I start playing with
things when I get home.  My modem is on a dumb serial port, using SAS.
I hooked a terminal to the port, and what I see basically looks like the
tty settings are being ignored - no echo, no cr-lf mapping, etc.  And
there are 'unknown ioctl' messages on the console.  Hmm.  Maybe SAS isn't
compatible with the new OS.  So I put the Microport stock serial driver
back in (which is a radically modified version of the USL driver).  Guess
what?  Same problem.  I try using 'getty', 'uugetty', and 'ttymon', with
both drivers.  Still no joy.  To rule out the terminal, I hook it up to
a port on my Equinox smart board.  The terminal is fine.

At this point, I'm assuming a hardware problem, since 6 different software
combinations all fail.  But the port works bidirectionally for dial-out
connections.  And I can hook a mouse to it and it works.  I called Ed
about this one, too, and he figured it sounded like bad hardware.  The
folks at Microport had no good ideas.  So I borrowed a dumb 2S/1P/1G
board from work, and took that home.  Guess what?  Exact same problem.
So it's software.  But what the hell is going on?

The answer came to me in the shower the next morning (really).  I was
thinking "Hmm.  The ioctls should be handled by the line discipline.  I
wonder if it's somehow got the wrong line discipline."  Bing!  The little
light goes on over my head (:->)  "I wonder if it's got a line discipline
at all!".  I jump out of the shower, dry off, and go do 'cat /etc/ap/chan.ap'.
Bingo!  I forgot to add the SAS device.  And Microport had the entry for
their ASY driver wrong.  So 'ldterm' was never being pushed.  Fixing that
solved the problem.

	MORAL #10: Even if several unrelated software components fail,
	it still doesn't mean the hardware is bad.
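
The check that would have caught this up front can be sketched in a few
lines of Python.  This is a minimal sketch, assuming the autopush file
uses a one-entry-per-line layout (major, minor, last minor, then the
modules to push) with '#' comments.  The sample contents, the driver
names in the first column, and the function name are all made up for
illustration; they are not the actual Microport file, and a real file
may use major numbers rather than names.

```python
def drivers_missing_module(text, drivers, module="ldterm"):
    """Return the drivers that have no autopush entry pushing `module`."""
    pushed = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 4:                   # major, minor, lastminor, module(s)
            continue
        major, modules = fields[0], fields[3:]
        pushed.setdefault(major, set()).update(modules)
    return [d for d in drivers if module not in pushed.get(d, set())]

# Hypothetical file contents: one entry lacks ldterm, one driver is absent.
SAMPLE = """\
# major  minor  lastminor  modules
asy      -1     0          ttcompat
eqnx     -1     0          ldterm ttcompat
"""

print(drivers_missing_module(SAMPLE, ["asy", "sas", "eqnx"]))
```

On the sample above this reports 'asy' (entry present but no ldterm) and
'sas' (no entry at all), which are the two kinds of mistake I ran into.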

Conclusions
-----------
It took about two weeks to get all of these upgrades done and stable.  I
had hoped to get it done in a day or two.  But I'm VERY pleased with the
final outcome.  This machine has been solid as a rock for a week now, and
runs like a bat out of hell.

I cannot stress enough how important it is to find a vendor who really
KNOWS what's going on.  Had it not been for Ed Whittemore's continual
support, I would have thrown in the towel fairly early on, and just bought
a cheap drive to slap onto my old 1542B.  But Ed stuck with it, and we
worked out the problems.  I now have a system that performs far better
than I ever expected it would.  It was money well spent.  You can be
QUITE sure that Ed and AMG will be getting ALL of my future business.
It's the little things, and not-so-little things, that make for a great
vendor.  The thing that showed me what a quality shop they are was that
they loaned me ~$1000 in hardware to get this problem resolved once
and for all.  I was a first-time customer, and they really didn't know
me from Adam.  But that one thing has made me a customer for life.

I also can't stress enough the importance of net-connectivity and
activity for vendors.  I got in touch with Ed exclusively because of
his net activity, and my seeing that he knew what he was talking about.
I have since put a couple of friends in touch with Ed, and they have or
will be involved in even more purchases.  So being active on the net
has made a good chunk of money for them.  If you want to get in touch
with Ed, you can contact him at <ed@maxed.amg.com>.

To conclude, this has been an interesting experience.  I've learned a
lot from it, gone through some disappointments, and finally succeeded
in getting a great system together.  I hope that this long saga will
help people out with their own systems.  I wish that more people had
posted such informational diatribes in the past, as it might have
saved me some headaches.


--
David Wexelblat <dwex@mtgzfs3.att.com>  (908) 957-5871
AT&T Bell Laboratories, 200 Laurel Ave - 3F-428, Middletown, NJ  07748

"Love is like oxygen.  You get too much, you get too high.  Not enough and
 you're gonna die."  -- Sweet, Love Is Like Oxygen


