The case of CGS-5 – I solved it. Honest – I did, I did.

Nortel was in a quandary. CGS-5 – the Canadian Government telephone switch in Ottawa – was, as one of my Bermuda friends said, “Up and down like a bride’s nightgown!”
The embarrassing situation – having the Canadian government phone system suddenly stop working, even in the middle of very important Prime Minister communications with world leaders – could not be tolerated.

Everyone, including the top honchos at Bell Canada – the provider of the service to the Canadian government – was frustrated with this tenuous, highly visible situation.
The president of Nortel at the time, Mr. Robert (Bob) Ferchat, was losing sleep at night. He was at his wit’s end, as this problem had great visibility with worldwide customers, especially his growing customer base in the USA – where the DMS product was demolishing the sales of the AT&T #5 Electronic Switching System (ESS).

The “baby bells” – created when the US federal government ordered the breakup of AT&T – found the Nortel product to be a better solution. But this open wound in Ottawa was causing great distress to the Nortel sales team and handing disparaging talking points to the AT&T sales team: this Canadian-designed switching system could not be trusted – especially in high-security locations like state and federal governments.
At this time, I was working for Nortel in Ottawa – managing the Advanced Technologies Education and Training team – responsible for what we called the “top gun” training for Nortel, Bell-Northern Research, and worldwide customer engineers. I had seen the movie “Top Gun” with my two sons when I first moved there and realized this is what our training staff and center would be – where the “best of the best” would be the instructors – where high-quality training would be part of the strategic benefits for the sale of Nortel products. We were successful in adding a training gate requirement to the BNR design plan – and had a few heated meetings with BNR design teams, blocking the sign-off which would allow the product design to move forward. The training modules were a strategic part of the product design.

Nortel had an elite team of product technical specialists in Ottawa, with their own “captive network” which could emulate any “testy” problem in the field. They were known as the FAST team – First Application System Test – also funnily known as the Forget About Sex Tonight group, due to their sometimes long and tense working hours. Their prime responsibility was to validate each new release of software, known as BCS – Batch Change Supplement – number XX. Every customer in the world could be assured the new BCS software update would not break their telephone network.
I chose most of my top gun instructors from this team – while also allowing them to spend time with the FAST team to keep them current on customer situations.

But I digress.

The problem with CGS-5 was interesting to me since in a previous technical life, I was known as a technical liaison for Sperry – Univac – eventually becoming UNISYS – a floor top computer design and manufacturing company. If a major customer anywhere in Canada experienced a critical technical issue for more than 4 hours – this length of time could be catastrophic for a large oil company – I was on a plane to help the local technical team. This was part of the support agreement.
The Technical Liaison team experienced very strange problems that seemed unsolvable.

For example, in one case in Toronto, an $8 million computer system from Sperry Univac (a Univac 1108), owned by a major insurance company, seemed possessed. On some days, between 7 AM and about 9 AM or between 3 PM and about 5 PM, the entire system would drop dead without recovery. Both the customer and Sperry Univac were traumatized.

When we were called in, we acknowledged that a top technical team flown in from the USA had worked on this problem for days, so we discounted a logic design problem as the probable cause. The team from the USA disagreed with us being there, since they would be embarrassed if the problem was solved by the Canadian Technical Liaison team rather than by them – but they were also happy to have another set of eyes on the problem.
Technical people hate being embarrassed and I understood that – I hated it too.

We started looking at non-technical possibilities while keeping one of the best technical guys looking at the technical logic possibilities.
The common thread was time – only in the morning and afternoon – never on a weekend. We brought in the physical environment around the computer room. It was on the main floor, next to eight elevators. The elevators were all busy during the bracketed hours and were never busy on the weekend.

WTF, we thought. Could it be a power distribution problem? Something to do with the power usage between the elevators and the computer network – weird thinking. This did point us to looking into the power supply modules on the computer. We opened them up, inspected every centimeter, and checked the tightness of all screws and the condition of all wires. We looked for burn marks, and we found a metal tool known as a “spring hook” that someone had left in the module during manufacturing and assembly. The spring hook was the only anomaly we found.

The spring hook was lying on the metal frame and the little hook part was on an angle – leaving a tiny gap between it and the terminal screw of the positive power wire, which fed one of the major logic backplanes. If the spring hook touched the terminal, the power supply would short out. We looked at each other and removed the spring hook.

The problem disappeared – the customer was happy – we were happy. During our analysis afterward, we surmised that the heavy use of the elevators sometimes caused a slight vibration on the raised computer floor, which was attached to the elevator shaft side of the building. The person who had left the hook inside the power supply module had caused great consternation to many people on many levels, and cost hundreds of thousands of dollars to the insurance company and Sperry Univac.
I will quickly detail the second experience in the last couple of sentences of this post. My experience as a Technical Liaison Customer Engineer helped me solve the multi-million dollar outage at Bell Canada in Ottawa – the Canadian Government Switch #5 (CGS-5).

I met with the FAST manager – Mr. Norm Peters – and asked him to allow me and my top gun instructors to form a blind and secret team to add some fresh eyes to the problem. I needed it to be secret since I did not know who was doing the deed. He thought my proposal was ridiculous – if his team and the BNR team had not solved this in over a year, who the hell did I think I was to come from nowhere and solve the issue? I told him I had a great deal of experience in solving computer problems – he corrected me and told me the DMS was not a computer – it was a switch. I let that go. He was a telephone switching guy.

He said no.

I should have made an appointment with the President of Nortel, Bob Ferchat – he was desperate to find the solution, but I decided against that since I did not want to embarrass my friends in FAST and in BNR. I had discussed the cause with my top gun team – they knew my approach and proof.
I went to Bramalea and talked with Mr. Craig Belton, and with the director of technical services for Nortel Canada – Mr. Gil Elliott.
I told them about my time working as a technical liaison in North America for a major computer design and manufacturing company, and related a case in Chicago involving a funny problem with one of our customers. The system would fail at unrelated times, and all failures seemed to be intermittent.
They had logic analyzers and scopes in various places trying to capture what was happening – to no avail – every problem seemed to be an intermittent problem.
This case had been related to me by Mr. Phil Brooks, a great friend and associate. While having a coffee break, Phil told me about the case he had just worked on in Chicago. The intermittent fault had no logical remedy. He looked outside the backplane panel and brought in the external environment.

He devised a plan – went out and bought a Polaroid camera – no digital back then – bought some of the exploding flashbulbs required for inside shots – went back and hid behind the tape servers (known as the Univac 8C Servers). It was possible to hide behind the system peripherals back then. He waited.

The computer operator arrived – looked around and thought he was alone – then went to the back-panel doors of the computer frame, opened them and, using his wedding ring, shorted out the pins on the backplane of the panel.
Flash – Phil snapped a picture – the computer system went down.
He caught him and had a picture to prove it.
The computer operator told a tragic tale – not enough money to pay the medical bills for his wife – he logged many hours of overtime during the last six months, due to the problem on the computer. He had cost his company about one hundred thousand dollars in lost time and productivity.

Back to the Nortel CGS-5 case
I told Craig Belton and later Gil – they should look for all technical people having access to the switch and separate them into sets. Each set should have at least one ring-wearer in the set, including Bell Canada and Nortel people, then move each set to a different switch location in Ottawa.

I also told them the second reason could be that someone was shooting gamma rays at the switch, and they could use lead barricades to block this – ridiculous – but I wanted to add a second option, some drama, and what I thought was some levity too. No one laughed. I thought it was funny.

Canadian Government Switch 5 was the only switch on which this problem had occurred.
After about a week, CGS-1 had a catastrophic failure – the only person on the team wearing a ring was a Bell employee. The next day CGS-3 failed – but this time the suspect employee was watched.

Reason for his corporate treason? Same as the guy in Chicago, only greed was the reason.
He was making a ton of money in overtime – this problem cost Nortel millions of dollars and he made some extra money on overtime. Bob Ferchat and BNR thought it must be a grounding problem and a total redesign of the DMS ground plane was done and installed at great cost.

The day after they caught the Bell technician, Gil Elliott walked briskly into my office and closed the door. This was unusual – Gil Elliott never paid a visit to my office – he snapped the local newspaper in front of me.
“Did you see this?” he asked.
“Yes”, I said, “I am glad it’s over.”
Nothing else was said as he asked me for a tour of the facilities.

After about an hour of uncomfortable conversation – he left.
He never brought up the previous meeting, where I told him the remedy to the problem, nor did I.
To me, he should have mentioned it and at least said thank you.

He didn’t.

At a celebration in Bramalea attended by Bob Ferchat, everyone was happy this problem had been solved as it had been. Bob Ferchat privately was very happy it had been a Bell employee, but he didn’t gloat about this.

As I was standing at the back of the room – Craig Belton caught my eye from the stage he was sitting on, one of the heroes in the room – when he saw me, he looked down and away.
Nobody said thanks or gave me any type of mention at all – and that was okay with me.

I think I am a catalyst – enabling people to say a harder thing to portray the truth about a situation. Most people don’t, and instead take credit from others for themselves. Being a catalyst allowed others to prove who they are, without my interference.

Norm Peters was aloof but happy. He was also a hero at the center of attention on the stage. He and his team had solved the issue.
