Nerd Expo: January 2007

Wednesday, January 31, 2007

inline assembly for MIPS

Here's a piece of code to prefetch data. It's useful when the compiler is targeting MIPS II (-mips2). pref is a MIPS IV instruction, so gcc's __builtin_prefetch() may not generate it.

/* Prefetch Header */
#define PREF_LOAD 0 /* intended for read */
#define PREF_STORE 1 /* intended for write */

#define _PREFETCH(v,t,o) \
__asm__ __volatile__ \
( \
".set push\n" \
".set mips4\n" \
"pref %1, %2(%0)\n" \
".set mips2\n" \
".set pop\n" \
:: "r"(v), "i"(t),"i"(o) \
)

.set is a directive to the assembler, not an actual instruction. It changes how the assembler treats the code that follows and doesn't emit any instructions itself. Don't confuse it with synthesized instructions like li, which the assembler expands into one or more real instructions.

.set push and .set pop save and restore the assembler's current settings (ISA level, reorder, at, and so on). They don't save any registers or touch the stack.
.set mips4 and .set mips2 tell the assembler which instruction set level to accept. They don't change the processor's mode of operation.
pref is the assembly instruction to prefetch data. It's a MIPS IV instruction, which is why it's preceded by .set mips4.

"r" is a constraint that says to keep the value of v in a general purpose register.
"i" says the value is an immediate integer constant, known at compile time.

%0, %1, and %2 refer to v, t, and o respectively. The operand number is dictated by the order in which the operands appear after "::".
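If the same source also has to build for other CPUs, one option is to hide the MIPS asm behind a single macro. This is a minimal sketch assuming GCC or Clang; PREFETCH and sum_with_prefetch() are hypothetical names, and the prefetch distance of 8 elements is arbitrary and should be tuned by measurement. The .set mips2 line is left out because .set pop already restores the previous ISA level.

```c
#include <stddef.h>

#define PREF_LOAD  0 /* intended for read */
#define PREF_STORE 1 /* intended for write */

#if defined(__mips__)
/* MIPS: emit pref directly, raising the ISA level to MIPS IV only for
 * this one instruction. .set pop restores the previous level. */
#define PREFETCH(addr, hint, off)            \
    __asm__ __volatile__(                    \
        ".set push\n"                        \
        ".set mips4\n"                       \
        "pref %1, %2(%0)\n"                  \
        ".set pop\n"                         \
        : : "r"(addr), "i"(hint), "i"(off))
#else
/* Other targets: fall back to the GCC builtin. Its second argument
 * (0 = read, 1 = write) lines up with PREF_LOAD/PREF_STORE. */
#define PREFETCH(addr, hint, off) \
    __builtin_prefetch((const char *)(addr) + (off), (hint))
#endif

/* Hypothetical usage: sum an array while prefetching 8 elements ahead. */
long sum_with_prefetch(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 8 < n)
            PREFETCH(&a[i + 8], PREF_LOAD, 0);
        total += a[i];
    }
    return total;
}
```

A prefetch never changes the result, only (possibly) the timing, so the function returns the same sum with or without it.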

Example: Writing a 64-bit value when the compiler is generating 32-bit code.

#include <stdint.h>

void write64(uint32_t addr, uint64_t val)
{
uint32_t high = val >> 32; /* we don't assume endianness */
uint32_t low = val & 0xffffffff;

__asm__ __volatile__
(
".set push\n"
".set noreorder\n"
".set noat\n"
".set mips3\n"
"dsll32 $16, %1, 0 \n" /* low into bits 63..32, clearing bits 31..0 */
"dsll32 $17, %0, 0 \n" /* high into bits 63..32 */
"dsrl32 $16, $16, 0 \n" /* low back down, zero-extended */
"or $17, $17, $16 \n" /* combine the halves */
"sd $17, 0(%2) \n" /* one 64-bit store */
".set pop\n" /* restores ISA level, reorder and at settings */
:: "r" (high), "r" (low), "r" (addr)
: "$16", "$17", "memory"
);
}

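$16 and $17 (s0 and s1) are general purpose registers that a routine must preserve for its caller. The line beginning with ":" is the clobber list: it tells the compiler that the asm overwrites $16 and $17, so the compiler saves them on the stack on entry to write64() and restores them before returning. This has nothing to do with .set push and .set pop, which only save assembler settings.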

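To save time, replace $16 and $17 with $8 and $9. Under the o32 ABI these are temporary registers (t0 and t1) that don't need to be preserved across a subroutine call, so the compiler doesn't have to save and restore them, which saves about 4 instructions (a store and a load for each register). Keep them in the clobber list, though. That's what stops the compiler from keeping its own values in $8 and $9 across the asm.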
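What the asm computes is easier to see in plain C. This sketch, with hypothetical helpers split64() and combine64(), mirrors the intended instruction sequence. The dsll32/dsrl32 pair on low matters because, on a 64-bit MIPS CPU, 32-bit values are kept sign-extended in registers, so a low half like 0x89abcdef would otherwise drag 1s into the upper 32 bits and corrupt the high half when the two are OR-ed together.

```c
#include <stdint.h>

/* Split a 64-bit value into 32-bit halves, as write64() does in C. */
static void split64(uint64_t val, uint32_t *high, uint32_t *low)
{
    *high = (uint32_t)(val >> 32);
    *low  = (uint32_t)(val & 0xffffffffu);
}

/* Recombine the halves the way the asm does:
 *   dsll32 t, low, 0   -> low moved into bits 63..32
 *   dsrl32 t, t, 0     -> moved back down, upper bits now zero
 *   dsll32 u, high, 0  -> high moved into bits 63..32
 *   or     u, u, t     -> final value handed to sd
 */
static uint64_t combine64(uint32_t high, uint32_t low)
{
    uint64_t hi_part = (uint64_t)high << 32;
    uint64_t lo_part = ((uint64_t)low << 32) >> 32; /* zero-extend */
    return hi_part | lo_part;
}
```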

# posted by Mack @ 12:06 PM

sizeof my_what or sizeof (my_what)

sizeof is an operator, like ++ and *, and not a function.

Parentheses are required when the operand is a type name, such as a structure type. They're optional when the operand is an expression, such as a variable.

struct st_s{
int i, j;
} s;

int sz;
sz = sizeof(struct st_s);
sz = sizeof s;
sz = sizeof (s); /* also legal: (s) is just a parenthesized expression, so this is a matter of style */
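A small sketch of the rules above, reusing the st_s structure; the function names are made up for the example. The last function also shows why it matters that sizeof is an operator rather than a function: its operand is not evaluated (except for variable length arrays), so the n++ never runs.

```c
#include <stddef.h>

struct st_s {
    int i, j;
};

/* Parentheses are required around a type name... */
size_t size_of_type(void)
{
    return sizeof(struct st_s);
}

/* ...but optional around an expression. */
size_t size_of_object(void)
{
    struct st_s s;
    return sizeof s;      /* same as sizeof (s) */
}

/* The operand of sizeof is not evaluated, so n is never incremented. */
int sizeof_does_not_evaluate(void)
{
    int n = 0;
    size_t sz = sizeof n++;
    (void)sz;
    return n;             /* still 0 */
}
```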

# posted by Mack @ 12:01 PM

Why bother with C++ explicit cast

C++ explicit casts are static_cast, const_cast, reinterpret_cast, and dynamic_cast.

int i = 5;
float j;

j = (float)i; /* C cast */
j = float(i); /* C++ functional-style cast */
j = static_cast<float>(i); /* C++ explicit cast */

The reasons to use C++ explicit casts are clarity and correctness. They're a safety net, as shown in the following example.

int i = 5;
float *pj;

pj = (float *)i; /* legal */
pj = (float *)(i); /* legal */
pj = static_cast<float *>(i); /* ILLEGAL: compile error */
pj = reinterpret_cast<float *>(i); /* legal */

When you have a bug in your code and suspect it's caused by a cast, you'll generally look at the reinterpret_casts before the static_casts. Since the explicit casts are also easy to search for, this can help reduce the debugging effort too.

# posted by Mack @ 11:37 AM

Tuesday, January 30, 2007

Inlining C routines

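The C99 standard keyword is inline. __inline__ (with the "__") is GCC's alternate spelling, which also works in strict ISO C90 mode (-ansi), where inline is not a keyword.

GCC supports both spellings, and there is no functional difference between them.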

Declaring a routine __inline__ is merely making a suggestion to the compiler. It's not a guarantee that the routine will be inlined. It's a common misunderstanding that inline is functionally equivalent to #define.

GCC normally does not inline a routine when optimization is off.
GCC option -finline-functions asks the compiler to inline any routine it deems a good candidate, even if that routine is not marked inline.
GCC option -Winline warns you when a function declared inline can't be inlined.

Should one use extern, static, or PO(plain old) __inline__?

Use static __inline__ in a source file. The compiler will inline the routine when it wants to, and if it doesn't, it'll be a normal static routine. You can do this in a header file too, and it's a good fix for the case where a bunch of inline routines were defined extern, the compiler didn't inline them, and the linker is now complaining about unresolved symbols. This is probably not the best solution, since every file that doesn't inline the routine gets its own copy, but it's quick and easy.
Don't use extern __inline__, because nothing good comes out of it. With GCC's traditional (gnu89) semantics, no standalone body is emitted for this routine unless you explicitly define one somewhere else. Don't do that either, because any time you duplicate code, you must maintain multiple versions of the same routine, and someone down the line will forget to update one of them. If the compiler does not inline this routine, the linker will complain about unresolved symbols.
I need more research to figure out what the PO __inline__ is good for. I thought I knew, but I ran a test and it failed.
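A sketch of the static __inline__ pattern in a header, assuming GCC or a compiler that accepts the __inline__ spelling. util.h and clamp_int() are hypothetical names. If the compiler declines to inline a call (for example at -O0), each file that includes the header gets its own private copy, so the linker has nothing to complain about.

```c
/* Hypothetical header "util.h" */
#ifndef UTIL_H
#define UTIL_H

/* static: if not inlined, this becomes a private copy in each .c file
 * that includes the header, never an unresolved external symbol. */
static __inline__ int clamp_int(int v, int lo, int hi)
{
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}

#endif /* UTIL_H */
```

Building with -Winline shows which calls the compiler refused to inline.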

Why does inline exist?
Inline is supposed to trade space for performance. Instead of having a routine defined once, you put the body of that routine inside the caller. This saves the time it takes to make a function call. That time is machine dependent and consists of setting up a new frame on the stack, saving and restoring registers, and a couple of branch/jump instructions. For a small routine, the overhead may take longer than the execution of the routine itself.

My suggestion is to not inline anything. Try the -finline-functions and -O2 options with GCC to see if they make your code run any faster. Make sure you instrument your code first to gauge performance before you start to inline anything, so you can see if it really makes a difference. The Linux kernel got a makeover where the developers went through all the code and removed a bunch of inlines. The reason was that inlining can make the code expand, which takes more memory. When the code gets bigger, you increase the likelihood of getting cache misses, which would negate any benefit you get from inlining and probably make things worse. When you really need to improve your system performance, inlining will probably make very little difference relative to all the other things you can do, but it's low-hanging fruit that gets misused a lot.

# posted by Mack @ 10:28 AM

Friday, January 26, 2007

Debugging Corruption

The difficulty here is that the problem is not found until someone uses the corrupted data. The code crashes or produces incorrect data, and the problem gets assigned to the wrong developer.

Questions to ask
1. Check whether your OS has ways for you to check your stack depth, and validate that your stack has not crashed into your heap.
2. Check for overflows of buffers on the stack.
3. Check for any inline assembly or assembly code. Does it properly protect the registers it uses?
4. Does disabling interrupts or bumping the priority affect the problem or make it go away?
5. Does the problem only happen after a subroutine call?
6. Is the stack getting corrupted, or is it a general register?

Here's a strategy for heap corruption:

The first thing to do is put up boundaries, both physical and temporal. You want to catch the problem as soon as possible. The best result is to root-cause the problem. The next best thing is to prove it's not your code and find someone else to hand the problem off to.

To put a physical boundary around your code, just reserve large buffers before and after the data that's getting corrupted. If it's random data, then put these large buffers everywhere. Fill these buffers with a pattern that's human-readable yet looks random. I suggest 0xdeadbeef, because it reads as English words, is unlikely to show up by accident, and is an odd number. An odd number is important because a value this large is usually a pointer, and valid pointers to anything bigger than a byte are almost always even (unless you're running an 8-bit processor).

When the problem occurs, dump the buffers to see if it's corrupted. If the program crashed, you'll need to generate a core dump for post-mortem analysis with GDB.

To put a temporal boundary around your code, write routines that validate that these large buffers have not been corrupted, and place these checks wherever someone enters your module.
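A minimal sketch of the guard-buffer idea, assuming you can wrap the corrupted data in a struct. struct guarded_data, the 64-word guard size and the function names are hypothetical. guarded_init() sets up the physical boundary; calling guarded_validate() at each entry point into the module gives the temporal one.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define GUARD_WORDS   64           /* size of each guard band; arbitrary */
#define GUARD_PATTERN 0xdeadbeefu

/* The data that keeps getting corrupted, with guard bands on both sides. */
struct guarded_data {
    uint32_t guard_before[GUARD_WORDS];
    int      payload[16];          /* stand-in for the real data */
    uint32_t guard_after[GUARD_WORDS];
};

static void guard_fill(uint32_t *g, size_t n)
{
    for (size_t i = 0; i < n; i++)
        g[i] = GUARD_PATTERN;
}

/* Returns the index of the first corrupted word, or -1 if intact. */
static long guard_check(const uint32_t *g, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (g[i] != GUARD_PATTERN)
            return (long)i;
    return -1;
}

void guarded_init(struct guarded_data *d)
{
    guard_fill(d->guard_before, GUARD_WORDS);
    guard_fill(d->guard_after, GUARD_WORDS);
}

/* Call at every entry point into the module. Returns 1 if both guards
 * are intact, 0 (after printing where the damage starts) otherwise. */
int guarded_validate(const struct guarded_data *d, const char *where)
{
    long before = guard_check(d->guard_before, GUARD_WORDS);
    long after  = guard_check(d->guard_after, GUARD_WORDS);

    if (before >= 0 || after >= 0) {
        fprintf(stderr, "%s: guard corrupted (before=%ld, after=%ld)\n",
                where, before, after);
        return 0;
    }
    return 1;
}
```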

The above works well in a single-threaded application that has no exception handlers or ISRs. If you suspect an exception handler, then just put the validation routine in the exception handler. If you suspect another thread, try bumping the priority of the thread your module runs in to be the highest in your process. If that doesn't work and you suspect an ISR, then try disabling interrupts.

Debugging is part logic, part luck, part art, and part intuition. My strategy for debugging is not to find the root cause directly, but to narrow it down. Think of ways to quickly eliminate as many of the likely possibilities as possible. Your source code control is also a valuable tool. If the problem is a regression, find where the problem started to occur and see what check-ins occurred between that point and the previous version of the code that did not have the problem. Managers and SQA love to use that latter method.
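Finding where a regression started is a binary search over revisions. A minimal sketch, assuming revisions can be numbered and that once the bug appears it stays in every later revision; is_bad stands for a hypothetical build-and-test step, and broke_at_137 is a made-up example.

```c
/* Hypothetical hook: build and test revision 'rev', return nonzero if the
 * bug shows up. In practice this is a script or a manual test run. */
typedef int (*is_bad_fn)(int rev);

/* Binary search for the first bad revision. Requires that 'good' is known
 * to pass, 'bad' is known to fail, and every revision after the first bad
 * one also fails. Needs about log2(bad - good) test runs. */
int first_bad_revision(int good, int bad, is_bad_fn is_bad)
{
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        if (is_bad(mid))
            bad = mid;      /* bug already present: look earlier */
        else
            good = mid;     /* still clean: look later */
    }
    return bad;
}

/* Made-up example: pretend the bug was checked in at revision 137. */
static int broke_at_137(int rev)
{
    return rev >= 137;
}
```

With 100 revisions between the good and bad versions, this takes about 7 test runs instead of up to 100.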

# posted by Mack @ 2:42 PM

Friday, January 19, 2007

Slotted Aloha for Bandwidth Requests

This is a link-level protocol for wireless point-to-multipoint configurations. It'll work for both fixed-channel radios and frequency hoppers. In each time period, allocate a time slice for the nodes to request bandwidth. This time slice is contended, in that any node can attempt to send. The base station then announces which nodes get to send data. This reduces the contention time compared to a standard Aloha protocol. It worked pretty well in the 2.4 GHz and 3.5 GHz bands with 32 nodes running heavy traffic.

A node that gets to send should piggyback its request for more bandwidth on its data frame. A node should use an exponential backoff algorithm, like the old 10/100 Ethernets, when it detects that its bandwidth request collided with another node's.
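The backoff can be sketched as the binary exponential backoff classic Ethernet uses: after c consecutive collisions, wait a random number of request periods in [0, 2^min(c,10) - 1]. This is a minimal sketch; the cap of 10, counting the delay in request periods, and the caller-supplied random value are assumptions, not details of the original protocol.

```c
#include <stdint.h>

#define MAX_BACKOFF_EXP 10  /* cap the window at 2^10, as classic Ethernet does */

/* Size of the contention window after 'collisions' consecutive collisions. */
uint32_t backoff_window(unsigned collisions)
{
    unsigned e = collisions < MAX_BACKOFF_EXP ? collisions : MAX_BACKOFF_EXP;
    return 1u << e;
}

/* Request periods to skip before retrying: uniform in [0, window - 1],
 * given a random value 'rnd' from whatever source the node has. */
uint32_t backoff_periods(unsigned collisions, uint32_t rnd)
{
    return rnd % backoff_window(collisions);
}
```

A node would reset its collision count to 0 once the base station grants its request.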

I remember getting near an array of radios and feeling the hair on the back of my hand tremble in awe. Also, climbing those 200' radio towers, which are 1' wide, when the wind is blowing made my knees tremble a bit. Seeing a drunk free-climb a 300' tower to pee off the top was priceless.

I'm almost interested enough to google 3G or i-mode to see how those protocols work.

# posted by Mack @ 7:13 PM

Windows Freeware

1. Openoffice - replaces MS Office and also runs on Linux.

2. Spybot - removes spyware

3. iTunes - a decent way to back up your MP3s, and you may occasionally need to buy something

4. Media Player Classic - replaces Windows Media Player and does not require another application to play commercial DVDs.

5. Deepburner - The only free DVD data burner I found. There's also a paid version.

6. MAME - plays old arcade games by emulating the hardware and running the same ROM that was in the original game.

7. Daphne - plays old laser disc games like Dragon's Lair and Space Ace

8. Virtual Pinball - Get the GnR table

9. The Emulator Zone has many more emulators for old consoles (NES) and computers (Commodore 64). I didn't have an interest in trying any of those. I did try the Apple IIe emulator once to play The Bard's Tale, but you can get the PC version now.

10. Powertab - guitar tabs program with a play button. It's not as popular as Guitar Pro and probably not as good, but it's free.

11. Ultimate Guitar has all the tabs you would want.

12. Audacity - record or edit your music. It only records a single channel. I use my digital 8 track to record and Audacity to edit.

13. Internet Archive has free music and books. It's famous for all the live Grateful Dead concerts, but check out the Derek Trucks band and the 78RPM stuff.

14. DOS Box and VDMSound - These can help you play old DOS games. I needed this in Win2K, but the latest XP seems to run those games fine without this.

15. DVD Decrypter, DVD Shrink, Rip4Me, and FixVTS - This combo will work for any DVD that your PC can read. Some movie DVDs are not recognized by some DVD drives.

16. DVD2AVI will get the audio off of DVDs in case you want to make a CD from a concert you have on DVD. Use Rip4Me to get the VOBs, then load them with DVD2AVI. Audacity can edit the audio. In DVD2AVI, press F5 and ESC to play a bit of the movie. Take note of the audio settings and click Audio->Channel Format->xxx to set the proper format.

17. Auto Gordian Knot - Like DVD2AVI but works well with Dolby. Takes a few hours.

18. Putty - telnet and ssh client

19. Cygwin - UNIX bash shell and it comes with standard UNIX tools

20. VIM - vi-like editor for those who want to do it all without a mouse

# posted by Mack @ 6:43 PM

Free Windows Software Development Tools

I'm planning to learn C++ by writing a simple game. Getting the initial setup right and learning the tools is very important. Here are a few different free configurations for Windows to bootstrap C++ development. I'm biased towards UNIXie tools.

GUI Interface
1. Bloodshed C++ IDE
This includes the editor, compiler tools, and debugger. The compiler tools are GNU tools and it appears to be integrated with CVS, but I haven't tried that feature yet.

Command Line Interface
1. VIM*
This is an updated VI with color. You can download kits to configure VIM for any programming language.
2. Cygwin
This is a UNIX bash shell, so you can use the GNU tools directly from a bash shell on your Windows machine.
3. GNU Tools
a. gcc - this includes the compiler, assembler, and linker to turn your C++ code into an executable.
b. make - automates the build. This is a very tough tool to master, but easy enough to get started with just by copying one of the examples from the documentation.
4. Subversion** or CVS

*vi and Emacs are the de facto UNIX editors. You only need to choose one, as most modern UNIX-based machines will have both. Many UNIX users, such as myself, only know one.

**Subversion is supposed to be an updated version of CVS. I tried to download the source, but it always fails. Maybe the project is running low on funds and can't afford too many downloads.

You may ask yourself why anyone would use the command line interface when nice GUIs are available. Flexibility, stubbornness, and legacy (code and developers). I prefer to use a GUI IDE to manage my personal projects, but toolchain gurus like to hack the Makefiles. On large projects, it's actually easier to use the command line interface because you can write scripts. The scripts can compile multiple projects, check for errors, generate reports, and run a regression test suite. Also, on large embedded projects, there may be multiple CPUs on a single platform to compile code for, each using a different toolchain. I'm not aware of an IDE that handles this situation as gracefully, from a configuration management POV, as its command line counterpart.

A second reason to use the command line is that I have an old setup from a previous job that runs a commercial RTOS. I'm not sure if the hardware still works or if I still remember how to set up the build environment, but it'll be interesting to rewrite the driver in C++ if I'm demented enough.

Thinking in C++ by Bruce Eckel comes in 2 volumes and seems to have good reviews on Amazon. I read a bit of it and have been writing code as I go along. I believe the author made a mistake regarding the use of volatile for multi-threaded programming: volatile doesn't give the atomicity or ordering guarantees that threads need. To my understanding, nothing in the C/C++ standards supports a multi-threaded model the way Ada does. That's handled by the POSIX thread libraries and probably some Microsoft libraries.

http://www.bluedonkey.org/cgi-bin/twiki/bin/view/Books/VxWorksCookbookCPP
This seems interesting for using C++ in an embedded environment. I'll save the link to check out later.

# posted by Mack @ 5:22 PM

Wednesday, January 17, 2007

How to Win Friends and Influence People - summary

I got this from the web, and I like it enough to back it up in case that site goes down. I remember the first rule from an ROTC class I had. It's been a while since I read the book, but it's nice to see the points outlined here.

How to Win Friends and Influence People
This is Dale Carnegie's summary of his book, from 1936

Part One - Fundamental Techniques in Handling People
1. Don't criticize, condemn or complain.
2. Give honest and sincere appreciation.
3. Arouse in the other person an eager want.

Part Two - Six ways to make people like you
1. Become genuinely interested in other people.
2. Smile.
3. Remember that a person's name is to that person the sweetest and most important sound in any language.
4. Be a good listener. Encourage others to talk about themselves.
5. Talk in terms of the other person's interests.
6. Make the other person feel important - and do it sincerely.

Part Three - Win people to your way of thinking
1. The only way to get the best of an argument is to avoid it.
2. Show respect for the other person's opinions. Never say, "You're wrong."
3. If you are wrong, admit it quickly and emphatically.
4. Begin in a friendly way.
5. Get the other person saying "yes, yes" immediately.
6. Let the other person do a great deal of the talking.
7. Let the other person feel that the idea is his or hers.
8. Try honestly to see things from the other person's point of view.
9. Be sympathetic with the other person's ideas and desires.
10. Appeal to the nobler motives.
11. Dramatize your ideas.
12. Throw down a challenge.

Part Four -
Be a Leader: How to Change People Without Giving Offense or Arousing Resentment
A leader's job often includes changing your people's attitudes and behavior. Some suggestions to accomplish this:
1. Begin with praise and honest appreciation.
2. Call attention to people's mistakes indirectly.
3. Talk about your own mistakes before criticizing the other person.
4. Ask questions instead of giving direct orders.
5. Let the other person save face.
6. Praise the slightest improvement and praise every improvement. Be "hearty in your approbation and lavish in your praise."
7. Give the other person a fine reputation to live up to.
8. Use encouragement. Make the fault seem easy to correct.
9. Make the other person happy about doing the thing you suggest.

I think my father has the audio book. I may pop that in the car.

Here's my summary of Think and Grow Rich by Napoleon Hill:
1. Don't give up
2. Be persistent
3. Learn from your failures
It's pretty much one simple message over and over again.

# posted by Mack @ 1:40 PM

Friday, January 12, 2007

Embedded Design for Performance (MIPS)

An embedded system waits for an external event to occur and then
performs some action based on those events without user interaction. The
main considerations are generally performance, footprint, and power.
1. Keep the architecture as simple as possible, but no simpler
Below are architectures listed by their level of
complexity.
A. Polling Loop
The CPU loops and polls all the external inputs. It's feasible to write
all or most of this in assembly. The worst-case latency for servicing
any input is one full pass through the loop. This is power hungry, so
it's no good for anything running on batteries.
B. Interrupt Driven with Single Loop
ISR will handle external events. The loop is left to do data crunching.
Locking becomes an issue here. The latency for any interrupt is the
processing time for any higher (possibly same) priority interrupts plus
any lockouts by the main loop or other ISRs. It's still feasible to
write most or all of this in assembly. The loop can poll or wait for an
event. For power sensitive applications, the loop should wait. This
applies to the more complex architectures, too.
C. Single Thread
Basically the same as B, but you have a stack and an RTOS running. The
RTOS probably supports multiple threads and is just chewing up extra
clock cycles. The benefit of an RTOS is that it should come with tools.
Most code will be written in a higher level language such as C.
D. Multiple Threads
Similar to the single thread model plus scheduling overheads and more
synchronization issues.
E. Multiple Processes
Each process should have its own memory space, so you get an extra TLB
overhead. The scheduling algorithm is probably more complex and thus
adds to the scheduling overhead. Sharing resources between processes
is generally much more expensive.
F. SMP Processors
These are CPUs with multiple cores. The cores share memory and often
some cache. The architecture will most likely be multiple threads or
processes. Synchronization between cores is more expensive.
2. Design around the Cache Flow
To minimize cache misses, design modules and data based on when they'll
be used by the CPU.
2.1 Modularization
Modularization is breaking a large piece of code into smaller manageable
pieces or modules. A module could be a file, library, object, or a
section in a file. They're usually designed around a hardware component
or an abstract object. For better performance, these modules should be
designed around completing a task.
Example of a typical serial driver design:
1. A method for the user to send and receive data.
2. Software queuing code
3. Hardware manipulation routines
Example of a serial driver designed around cache flow:
1. Transmit Flow
a. A method for the user to send data
b. TX queuing code
c. Hardware manipulation routines for transmission
2. Receive Flow
a. Hardware manipulation routines for retrieving data
b. RX queuing code
c. A method for the user to receive data
The latter design will reduce cache misses.
2.2 Cache Friendly Data
Group data that are likely to be used together on the same cache line.
Check your particular CPU to see what its cache line size is.
Example of how to do this for a CPU with an 8-byte cache line (assuming
the struct itself starts on a cache line boundary).
#include <stdint.h>
struct cache_friendly_s {
/* cache line 8 bytes - cnt 0 */
int32_t a;
int32_t b;
/* cache line 8 bytes - cnt 1 */
char c[4];
int32_t d;
};
Notice the comments also have a count. This information is useful for
prefetching, which is covered below.
To help you count properly, make sure each data member is CPU aligned.
If the CPU is 32 bits, allocate int16 members in pairs and chars in
groups of 4. There are two reasons for this. One, it improves
performance if all data is CPU aligned. Two, the compiler will likely
do this for you anyway by adding padding if you don't.
2.3 Don't Wait, Prefetch
Prefetching data is having the CPU load some memory into cache that'll
be needed by some instruction in the future. This is not guaranteed to
improve performance, and will actually hurt performance if the
prefetched data is seldom used. It's a good idea to do this after the
code has been instrumented so you can gauge how effective prefetching
is. You can write inline assembly to prefetch or use __builtin_prefetch()
in gcc. It's also possible to prefetch hardware registers.
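a. Reserve a register to do the prefetch with. Using gcc with a MIPS
core, the compiler option -ffixed-t9 will reserve general purpose
register t9 for the user to manipulate. (In PIC code built with
-mabicalls, t9 holds the address of the called function, so pick
another register there.)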
b. Write inline assembly code to load the hardware register to the
general purpose register.
c. Write inline assembly to access that general purpose register.
The concept to grasp here is that the load instruction (b) will not
block the pipeline. Accessing the register (c) will block if (b)
has not finished loading the hardware register.
Beware:
a. Ask yourself if it's OK to prefetch this register early.
b. Don't make any library calls that are not compiled with the same
-ffixed-xx option.
c. Consider disabling all interrupts between the prefetch and actually
accessing the register, because any files not compiled with the
same -ffixed-xx option may thrash this register, including your
RTOS's scheduler and ISR handlers.
d. Any inline assembly or assembly code will disregard this compiler
option.
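A sketch tying 2.2 and 2.3 together, assuming C11 and GCC or Clang: _Static_assert and offsetof() check at compile time that the cache line counts in the comments are right, and __builtin_prefetch() uses those counts to start loading a later line early. CACHE_LINE, LINE_OF, PREFETCH_LINE and sum_fields() are hypothetical names, and the 8-byte line is just the example's value. With a struct this small the prefetch won't buy anything measurable; it only shows the mechanics.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 8   /* the example's 8-byte line; check your CPU */

struct cache_friendly_s {
    /* cache line 8 bytes - cnt 0 */
    _Alignas(CACHE_LINE) int32_t a;   /* makes the struct start on a line */
    int32_t b;
    /* cache line 8 bytes - cnt 1 */
    char    c[4];
    int32_t d;
};

/* Which cache line, counted from the start of the struct, holds a member. */
#define LINE_OF(type, member) (offsetof(type, member) / CACHE_LINE)

/* Fail the build if the comments above stop matching the real layout. */
_Static_assert(LINE_OF(struct cache_friendly_s, b) == 0, "b belongs in line 0");
_Static_assert(LINE_OF(struct cache_friendly_s, c) == 1, "c belongs in line 1");
_Static_assert(LINE_OF(struct cache_friendly_s, d) == 1, "d belongs in line 1");

/* Prefetch cache line 'cnt' of an object (read, keep in cache). */
#define PREFETCH_LINE(p, cnt) \
    __builtin_prefetch((const char *)(p) + (cnt) * CACHE_LINE, 0, 3)

int32_t sum_fields(const struct cache_friendly_s *s)
{
    PREFETCH_LINE(s, 1);            /* start loading c[] and d early */
    int32_t total = s->a + s->b;    /* work on line 0 meanwhile */
    return total + s->c[0] + s->d;
}
```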

# posted by Mack @ 6:40 PM

Wednesday, January 10, 2007

ISR Efficiency

If the ISR is called often and does little work per call, the entry and exit overhead starts to matter. A big part of this overhead, not readily obvious in C or C++, is setting up the stack: the number of registers the compiler decides to save and restore can be a large part of the total processing.

Check the assembly code to see how many registers the ISR saves. Below is a list of things that can reduce the overhead in C.

1. Try to simplify the code as much as possible
2. Reduce the number of variable accesses, both local and global
3. Don't make subroutine calls - This may cause the compiler to save every register
4. In-line any subroutine calls you have to make.
5. Put subroutines you have to call but can't inline into the same module as the ISR. Some compilers are smart enough to look at the called routines to figure out which registers are used, instead of blindly assuming every register will be used.
6. If an ISR only calls a routine sometimes, then it may be more efficient to put that routine in a different ISR and trigger that IRQ from the original ISR when the call is needed.

Let the disassembled code be your guide. Remember that in an ISR, even scratch pad and temporary registers that are normally not saved in subroutine calls have to be saved.

Another way an ISR can reduce the overhead-to-processing ratio is to increase the processing instead of reducing the overhead. For example, if you're writing a driver that gets data from a piece of hardware and the data usually comes in bursts, consider polling for more data in the ISR. Even if there's no data now, retry a couple of times before exiting the ISR. This depends heavily on the application, but it's another useful tool to have.
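A sketch of the burst-polling idea. hw_rx_ready(), hw_rx_read() and the fake FIFO behind them are hypothetical stand-ins for real register accesses, so the example runs on a host; RX_RETRIES = 3 is an arbitrary starting point. On real hardware each retry might also include a short delay.

```c
#include <stdint.h>

#define RX_RETRIES 3   /* extra empty polls before leaving the ISR */

/* Hypothetical hardware access, simulated with a small array. */
static uint8_t  fake_fifo[32];
static unsigned fake_head, fake_tail;

static void    fake_hw_push(uint8_t b) { fake_fifo[fake_tail++ % 32] = b; }
static int     hw_rx_ready(void)       { return fake_head != fake_tail; }
static uint8_t hw_rx_read(void)        { return fake_fifo[fake_head++ % 32]; }

static uint8_t  rx_buf[256];
static unsigned rx_count;

/* Drain everything that is ready, then poll a few more times in case the
 * rest of the burst is still arriving, so one interrupt entry/exit is
 * spread over many bytes. */
void rx_isr(void)
{
    int retries = RX_RETRIES;

    while (retries > 0) {
        if (hw_rx_ready()) {
            rx_buf[rx_count++ % sizeof rx_buf] = hw_rx_read();
            retries = RX_RETRIES;   /* data arrived: reset the retry budget */
        } else {
            retries--;
        }
    }
}
```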

References:
Embedded Systems Programming (latest issues)

# posted by Mack @ 11:51 AM

Tuesday, January 02, 2007

Cache Considerations when Queuing

Using a LIFO (stack) is more efficient than a FIFO (queue), because whatever was enqueued last is most likely to still be in cache when you dequeue. This works well for a free list.

I just did this and found the average performance for dequeuing from a free list improved about 3 times.

When dequeuing, it may be a good idea to prefetch the next buffer or two on the list. Prefetching still costs cycles even when the memory is already in cache, so this may or may not improve your performance, depending on how many cache misses the dequeue operation is experiencing.
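A sketch of a free list used as a stack, assuming GCC or Clang for __builtin_prefetch(). The struct names and the 256-byte payload are hypothetical; the prefetch of the new top is the optional step described above, so measure before keeping it.

```c
#include <stddef.h>

/* The most recently freed buffer, most likely still in cache, is the
 * next one handed out (LIFO). */
struct buf {
    struct buf   *next;
    unsigned char data[256];       /* payload size is arbitrary */
};

struct free_list {
    struct buf *top;
};

void free_list_push(struct free_list *fl, struct buf *b)
{
    b->next = fl->top;
    fl->top = b;
}

struct buf *free_list_pop(struct free_list *fl)
{
    struct buf *b = fl->top;
    if (b != NULL) {
        fl->top = b->next;
        /* Optional: start loading the new top's header for the next pop.
         * This costs cycles even on a cache hit. */
        if (fl->top != NULL)
            __builtin_prefetch(fl->top, 0, 3);
    }
    return b;
}
```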

# posted by Mack @ 5:36 PM

Register Corruption

This doesn't happen often, but few people are able to deal with it. Generally, someone discovers that if they rewrite a piece of code, the problem goes away. Management gives them a pat on the back and everyone is happy until it happens again.

The first time I remember chasing one of these was when I started a new job a few years back. The crypto library would fail every once in a while. The workaround was to reboot or redo the operation. SQA had a show-stopper defect filed, but it wouldn't affect development until the code actually went to Beta.

It was running VxWorks 4.x on a CPU that's usually reserved for children's toys. Some crypto library routine would fail now and then. I tracked down the offending routine simply by checking for errors that are usually ignored. Some crypto developer wanted to ensure his own job security or enter the International Obfuscated C Code Contest (a real contest). To this day, I have very little idea what it was doing, but the math probably worked out fine. Unfortunately, it fooled the compiler's optimizer. What I found was that some registers that are supposed to be preserved across a subroutine call were being clobbered after one (note: by convention, some registers are not preserved and can be clobbered by a subroutine call). Of course, the register held a value that was not always used, so the problem was not always detected. Debugging in VxWorks without an IDE is like debugging a running process from the GDB command line. The compiler was already the latest version, and the community was no longer working on it for this particular CPU, so I had to turn down the optimization to fix the problem. The performance hit was too much for the whole build, so I modified the build to lower the optimization only for that directory. Simple fix, but it took forever to find the problem.

I recently ran into another register corruption issue, but that turned out to be much easier to resolve. A routine that wrote to H/W would occasionally fail during stress testing. Fortunately, disabling interrupts made it go away, so I was pretty sure it was being preempted. Bumping the priority didn't help, so it was likely an interrupt. For performance reasons, disabling interrupts around this call was not acceptable. The routine itself was written in inline assembly, which generally spells trouble. The first check was that all registers used were marked clobbered in the inline assembly statements (that's the last ':'). Then I checked the MIPS processor reference for the registers, but it just said all the registers are general registers with nothing special, except that register 0 is always zero. Luckily, a co-worker had See MIPS Run, which lists the industry conventions: register 1 ($at) is the assembler temporary, used when synthesizing instructions. The assembler can clobber it any time it expands a synthesized instruction, and apparently the exception handler did not save it, as I believe it should have. I changed the registers and that resolved the issue. I was even able to improve performance by using the t1 and t2 registers, which do not have to be preserved across a subroutine call.

# posted by Mack @ 3:36 PM

Quickly Root Cause Easy to Reproduce Problems

Once you figure out what bad condition is causing the problem, you need to detect it ASAIH (as soon as it happens). To do this, create a macro or routine to check for that condition. Now sprinkle that check all over the place, or use a binary search to narrow down where the condition first becomes true. This is so quick and easy I don't see why more people don't do it. I run into a lot of guys who think this method is some kind of hack and that real programmers need to read and understand the whole code first. They'll usually find 100 things they don't like about the existing code and make a bunch of "good" changes that may or may not solve the problem. That type of code review should be replaced by a group code review once a module has passed its unit testing, but most organizations skip this step.
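A minimal sketch of such a check macro, assuming the bad condition can be written as a C expression. CHECK_BAD, check_hits, struct queue and queue_step() are hypothetical names; the macro prints only the first place the condition is seen, which is usually the one that matters. In a debugger, a breakpoint on the fprintf() (or calling abort() instead) stops you right there.

```c
#include <stdio.h>

static int check_hits;   /* how many times the bad condition was seen */

/* Report the first place 'cond' is found true, with file and line. */
#define CHECK_BAD(cond)                                          \
    do {                                                         \
        if ((cond) && check_hits++ == 0)                         \
            fprintf(stderr, "%s:%d: %s first detected\n",        \
                    __FILE__, __LINE__, #cond);                  \
    } while (0)

/* Hypothetical data that keeps going bad, and a routine that touches it. */
struct queue {
    unsigned count, size;
};

void queue_step(struct queue *q, unsigned add)
{
    CHECK_BAD(q->count > q->size);   /* on entry */
    q->count += add;
    CHECK_BAD(q->count > q->size);   /* on exit: pins it to this routine */
}
```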

If the crash goes away or changes behavior when you put your checks in, then it's a random memory corruption issue or a race condition. Assuming it's not a race condition, see my post on Memory Corruption without MMU for hints to solve this problem. Of course, if the piece of memory is written only once and you have a debugger, you can use the hardware breakpoint feature to catch whoever is corrupting this memory. Also try turning up the warning level on your compiler or run some form of lint. The latter may take some time to configure to run properly.

See future post for race conditions.

# posted by Mack @ 2:59 PM

Kernel Crash w/Stack and Register Dump

Someone just came to me for help with a Linux 2.6 kernel crash. The exception handler caught a bad memory access and dumped a pseudo stack trace with registers before the system rebooted. The pseudo stack dump has the calling routines' names and offsets, but not automatic variables and arguments. The register dump has all the general registers. KGDB is not set up and the tool guys will never set it up.

The first step is to find the code that crashed and disassemble it(1). This gives you the exact instruction(2). I suspect this is caused by a bad pointer reference. With luck, the pointer is in a register(3), or there's enough information in the registers to help pinpoint the problem. The disassembled code generally uses names like a0, s2, and t3 to reference registers. Each CPU type has its own convention for how to map these to the actual registers. Unfortunately, these conventions are generally not part of the CPU's H/W spec, because they are industry conventions.
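Assuming a MIPS target using the o32 ABI, as in the other posts here, the mapping from the disassembler's names to hardware registers is fixed by convention. A small lookup table like this sketch can help when reading a register dump; mips_o32_reg_number() is a hypothetical helper, and the n32/n64 ABIs use different names for $8..$15.

```c
#include <string.h>

/* MIPS o32 ABI software names for hardware registers $0..$31. */
static const char *const mips_o32_reg_names[32] = {
    "zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
    "t0",   "t1", "t2", "t3", "t4", "t5", "t6", "t7",
    "s0",   "s1", "s2", "s3", "s4", "s5", "s6", "s7",
    "t8",   "t9", "k0", "k1", "gp", "sp", "fp", "ra",
};

/* Map an ABI name to its hardware register number, or -1 if unknown. */
int mips_o32_reg_number(const char *name)
{
    for (int i = 0; i < 32; i++)
        if (strcmp(mips_o32_reg_names[i], name) == 0)
            return i;
    if (strcmp(name, "s8") == 0)   /* $30 is also called s8 */
        return 30;
    return -1;
}
```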

After you figure out what pointer is causing the problem, you have to code review or add auto detecting code(4). I like the latter whenever possible.

(1) Disassemble the .o file with objdump's -S option to interleave the source code with the assembly. The source only shows up if the file was compiled with debug info (-g).
(2) The point where the code crashed may not be where the PC points, but it should be close, such as the previous instruction. On a RISC architecture, knowing which pipeline stage the instruction gets executed in tells you how far back to go. Because of the branch delay slot, it may even be the instruction right after the previous branch.
(3) On RISC processors, an address must be in a general register before it can be dereferenced. CISC processors allow referencing pointers from memory, but I believe the address is still loaded into some internal register before the core can access it. That step is not explicit, so you have to look at the CPU specification or ask your vendor.
(4) Auto-detecting code is anything from a simple ASSERT to complex data integrity checking routines. If performance is an issue, make the checks a compile-time option. Many developers don't like to release code with these checks because they can cause a crash when the system would otherwise keep running. Well, I say release the code with the checks, because if it's broken, why run a piece of code when you don't know how it's going to behave? More than likely, sweeping these issues under the rug will cause many unexplained problems in the field, and it'll take much more effort to root-cause those issues. Don't kill yourself trying to convince the old-timers, though; they've worked on HA systems since before you were born, and it's how they've always done it.
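Here's one way to make the checks a compile-time option without tying them to NDEBUG, which is the macro that standard assert() obeys and which release builds commonly define. CHECK(), check_failed(), check_fatal and the NO_CHECKS switch are hypothetical names in this sketch. Checks stay on unless you explicitly build with -DNO_CHECKS.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static unsigned check_fail_count;  /* failures seen so far */
static int check_fatal = 1;        /* 1: stop on failure (default), 0: log only */

static void check_failed(const char *expr, const char *file, int line)
{
    assert(expr != NULL && file != NULL);  /* internal sanity only */
    check_fail_count++;
    fprintf(stderr, "%s:%d: check failed: %s\n", file, line, expr);
    if (check_fatal)
        abort();
}

/* On by default. With -DNO_CHECKS the expression is not evaluated at
   all, so never put side effects inside CHECK(). */
#ifndef NO_CHECKS
#define CHECK(cond)                                      \
    do {                                                 \
        if (!(cond))                                     \
            check_failed(#cond, __FILE__, __LINE__);     \
    } while (0)
#else
#define CHECK(cond) ((void)0)
#endif
```

In the field you'd probably leave check_fatal at 1 and replace abort() with whatever your platform uses to log and reset. Setting it to 0 is mainly handy while debugging.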

# posted by Mack @ 1:52 PM

Tech Books that are Worth Their Weight in Gold

This list obviously depends on what you're doing, so I'll try to put each book in context. Almost all the information is on the web, but I still strongly recommend you buy these books if you get paid to know this stuff.

IPv4 Networking Basics:
1. Internetworking with TCP/IP Vol 1 by Comer (updated version is with Stevens)
2. TCP/IP Illustrated Vol 1: The Protocols by Stevens (I don't own this one, but my previous workplace had one)
Either of these books will give you the basic understanding you need, but you only need one, so save your money and choose. For more details or updates, go get the RFCs.

UNIX Programming including POSIX and Sockets:
The W. Richard Stevens series of three books are the most useful UNIX programming books I'm aware of. It's a shame Richard is not around to do updates for Linux (RIP). People in the industry refer to these as the Stevens books, since no one really remembers their titles. You can make a career out of copying examples from these books, but protect yourself by changing the names of the functions and variables.

Linux Kernel:
Linux Device Drivers by Rubini and Corbet. I have the 2nd edition, but will get the 3rd when I need to do any work with Linux 2.6. This book is not a good reference or reading book. You need to write your own drivers with the book as a guide to learn this stuff, but it's better and easier than how the old-timers learned it, which was randomly picking an existing driver and picking it apart.

CPUs:
See MIPS Run by Sweetman is easy to understand and interesting reading. If you're using any RISC core, get this book. It's better written than most CPU books and has valuable information you don't get from the CPU reference manuals.

Networking Basics:
Interconnections by Perlman is great reading, but work through the exercises at the end of each chapter. If you've ever read an RFC or technical specification for OSPF or other routing protocols and said WTF, then read this book first. She's one of the best tech book writers.

Good Programming Habits C/C++:
1. Kernighan and Pike - The book is named The Practice of Programming, but most people know it by the authors. It's a small and insightful book.
2. Writing Solid Code by Maguire, because it's much shorter than Code Complete. You can apply the teachings of each chapter as soon as you read it, and believe me, they work. However, you'll find many experienced programmers disagree with some information in this book (e.g., Maguire does not like defensive programming), and they'll give you a hard time during code reviews. Stick to the book whenever possible, because it's much better than the "experienced" programmers IMHO.

Security:
1. Applied Cryptography by Schneier is so good it made Schneier famous. After reading this book, you'll understand how to go about designing a secure system and also hacking one. Don't carry this book on an international flight, as it's probably illegal in some countries, seriously. You only need to read the first 100 pages or so unless you're going to implement a particular algorithm, which is highly unlikely for work-related purposes, as most companies will buy the s/w libraries or hardware to do this.

2. Smashing the Stack for Fun and Profit (article) by Aleph One is the basis for probably all computer worms and viruses. He didn't invent the technique, just documented it.

3. Hacking Exposed: I'm not sure if this is still around, and the information is probably outdated, but the concepts behind new hacks will likely be the same. This holds true for any hacking book: learn the concepts.

After reading these books and the article, you'll need to choose whether you want to get paid or become infamous.

Fibre Channel Networking:
1. Fibre Channel: A Comprehensive Introduction by Kembel, in case your manager won't spend the $2K for you to take the class. It's like the T11 documents, but slightly better written, and it comes in a nice hardback binding. You'll still need to reference t11.org, but check the book first.

Kembel's got other books that seem pretty good if you need to dive into loops and such. I read them years ago, but never needed to buy them.

Algorithms:
Whatever algorithms book you got in college is good enough, as they all seem to copy each other. Otherwise, get Knuth if you want detailed analysis of each one. Before you implement any of these algorithms, check out the BSD implementation. It doesn't come with the GPL, so it's easier to integrate with your code. Disclaimer: I'm not a lawyer.

Check out the GNU site for documentation on GNU tools and compilers, and http://www.open-std.org for the C and C++ standards.

# posted by Mack @ 9:51 AM
