Tuesday, January 02, 2007

 

Register Corruption

This doesn't happen often, but few people are able to deal with it when it does. Usually, someone discovers that rewriting a piece of code makes the problem go away. Management gives them a pat on the back and everyone is happy until it happens again.

The first time I remember chasing one of these was when I started a new job a few years back. The crypto library would fail every once in a while. The workaround was to reboot or retry the operation. SQA had a showstopper defect filed, but it wouldn't affect development until the code actually went to beta.

It was running VxWorks 4.x on some CPU that's usually reserved for children's toys. I tracked down the offending routine by simply checking for errors that are usually ignored. Some crypto developer either wanted to ensure his own job security or was planning to enter the International Obfuscated C Code Contest (a real contest). To this day, I have very little idea what the code was doing, but the math probably worked out fine. Unfortunately, it fooled the compiler when it tried to optimize. What I found was that a register that is supposed to be preserved across a subroutine call was being clobbered (note: by convention, some registers are not preserved and may be clobbered by a subroutine call). Of course, this register held a value that was not always used, so the problem was not always detected. Debugging in VxWorks without an IDE is like debugging a running process from the GDB command line. The compiler was already the latest version and the community was no longer working on it for this particular CPU, so I had to turn down the optimization level to fix the problem. The performance hit from doing that everywhere was too much, so I modified the build to lower the optimization only for that directory. Simple fix, but it took forever to find the problem.
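Checking for errors that are usually ignored can be as simple as wrapping each call. Here is a minimal sketch, assuming a hypothetical convention where a routine returns 0 on success. CHECK, crypto_step, and do_operation are made-up names for illustration, not anything from the actual library.

```c
#include <stdio.h>

/* Assumed convention (hypothetical): 0 means success, anything else is an error. */
#define CHECK(call)                                              \
    do {                                                         \
        int status_ = (call);                                    \
        if (status_ != 0) {                                      \
            /* Report where the failure first showed up. */      \
            fprintf(stderr, "%s:%d: %s returned %d\n",           \
                    __FILE__, __LINE__, #call, status_);         \
            return status_;                                      \
        }                                                        \
    } while (0)

/* Stand-in for a library routine whose result callers tend to ignore. */
static int crypto_step(int input)
{
    return (input < 0) ? -1 : 0;
}

/* Every call is checked, so the first failing routine is named in the log. */
static int do_operation(int input)
{
    CHECK(crypto_step(input));
    CHECK(crypto_step(input + 1));
    return 0;
}
```

With something like this in place, an intermittent failure at least points at a file, a line, and a call instead of surfacing much later as a vague "operation failed".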

I recently ran into another register corruption issue, but that turned out to be much easier to resolve. Some routine that wrote to H/W would occasionally fail under stress. Fortunately, the failure went away when interrupts were disabled, so I was pretty sure the routine was being preempted. Bumping the priority didn't help, so the culprit was likely an interrupt rather than another task. For performance reasons, disabling interrupts around this call was not acceptable. The routine itself was written in inline assembly, which generally spells trouble. The first check was that all registers used were marked as clobbered in the inline assembly statements (that's the part after the last ':'). Then I checked the MIPS processor reference for the registers, but that just said all the registers are general purpose with nothing special except that reg0 is always zero. Luckily, a co-worker had See MIPS Run, which lists the industry conventions, and reg1 turned out to be the assembler temporary (at), reserved for the assembler to use when it synthesizes instructions. The assembler can clobber it at any time for commands like li, and apparently the exception handler did not protect it, as I believe it should have. I changed the registers and that resolved the issue. I was even able to improve performance by using the t1 and t2 registers, which do not have to be preserved across a subroutine call.
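For anyone who hasn't seen a clobber list, here is a rough sketch of a GCC extended inline assembly statement. This is not the routine from the story: add_words is a made-up helper, and the register numbers assume the common o32 convention where $9 and $10 are t1 and t2. On non-MIPS compilers the sketch falls back to plain C so it still builds.

```c
#include <stdint.h>

/* Hypothetical helper: adds two words using two scratch registers. */
static inline uint32_t add_words(uint32_t a, uint32_t b)
{
    uint32_t sum;
#if defined(__mips__) && defined(__GNUC__)
    __asm__ (
        "move $9, %1\n\t"       /* copy a into $9 (t1 under o32)  */
        "move $10, %2\n\t"      /* copy b into $10 (t2 under o32) */
        "addu %0, $9, $10\n\t"  /* sum = $9 + $10                 */
        : "=r" (sum)            /* after the 1st ':' come outputs */
        : "r" (a), "r" (b)      /* after the 2nd ':' come inputs  */
        : "$9", "$10"           /* after the last ':' come clobbers */
    );
#else
    /* Portable fallback so the sketch compiles on other targets. */
    sum = a + b;
#endif
    return sum;
}
```

The clobber list tells the compiler not to keep anything live in $9 or $10 across the statement. It does nothing about an interrupt handler that fails to save a register, which is why staying away from reg1 mattered here.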
