Tuesday, January 30, 2007
Inlining C routines
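The C99 standard spells the keyword inline. GCC also accepts __inline__, with the "__", and that spelling keeps working in strict C90 mode (-ansi), where plain inline is not a keyword.
GCC supports the keyword with and without the underscores, and there is no functional difference between them.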
Declaring a routine __inline__ is merely a suggestion to the compiler. It's not a guarantee that the routine will be inlined. It's a common misunderstanding that inline is functionally equivalent to #define.
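One way to see the difference from #define is to pass an argument that has a side effect. The sketch below is only an illustration: calls, next_value, SQUARE_MACRO and square_inline are made-up names, not anything from GCC.

```c
/* Hypothetical names, for illustration only. */
static int calls = 0;

/* Returns 1, 2, 3, ... and counts how many times it has been called. */
static int next_value(void)
{
    calls++;
    return calls;
}

/* Textual substitution: the argument expression appears twice in the result. */
#define SQUARE_MACRO(x) ((x) * (x))

/* A real function: the argument is evaluated once, before the call,
 * whether or not the compiler decides to inline the body. */
static __inline__ int square_inline(int x)
{
    return x * x;
}
```

Starting from calls == 0, SQUARE_MACRO(next_value()) calls next_value() twice and gives 2, while square_inline(next_value()) calls it once and gives 1.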
GCC will not inline a routine when optimization is off.
The GCC option -finline-functions asks the compiler to inline any routine it deems a good candidate for inlining, even if that routine is not marked inline.
The GCC option -Winline will warn you when a function declared inline can't be inlined.
Should one use extern, static, or PO (plain old) __inline__?
Use static __inline__ in a source file. The compiler will inline when it wants to, and when it can't, the routine is just a normal static routine. You can do this in a header file too, and it is a good fix for the case where a bunch of inline routines were defined extern, the compiler didn't inline them, and the linker is now complaining about unresolved symbols. This is probably not the best solution, but it's quick and easy.
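As a rough sketch of the header-file case, assuming a made-up header clamp.h with a made-up routine clamp_int, it could look like this:

```c
/* clamp.h: hypothetical header, included by as many .c files as needed. */
#ifndef CLAMP_H
#define CLAMP_H

/* static gives each including file its own private copy, so if the
 * compiler declines to inline a call, it falls back to that local copy
 * and the linker has nothing unresolved to complain about. */
static __inline__ int clamp_int(int value, int lo, int hi)
{
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}

#endif /* CLAMP_H */
```

Building a file that includes it with something like gcc -O2 -Winline -c file.c lets GCC report any call it could not inline. The cost of this approach is that every source file that ends up with an out-of-line copy carries its own duplicate of the routine.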
Don't use extern __inline__, because nothing good comes of it. With GCC's traditional (GNU C) semantics there's no out-of-line body for the routine unless you explicitly define one somewhere else; C99 defines extern inline differently. Don't write that second definition either, because any time you duplicate code you must maintain multiple versions of it, and someone down the line will forget to update one of them. If the compiler does not inline the routine, the linker will complain about unresolved symbols.
I need to do more research to figure out what PO __inline__ is good for. I thought I knew, but I ran a test and it failed.
Why does inline exist?
Inline is supposed to trade space for performance. Instead of having a routine defined once, you put the body of that routine inside each caller. This saves the time it takes to make a function call. That time is machine dependent and consists of setting up a new frame on the stack, saving and restoring registers, and a couple of branch/jump instructions. For a small routine, the overhead may take longer than the execution of the routine itself.
My suggestion is to not inline anything by hand. Try the -finline-functions and -O2 options with GCC to see if they make your code run any faster. Make sure you instrument your code first to gauge performance before you start to inline anything, so you can see whether it really makes a difference. The Linux kernel got a makeover where the developers went through all the code and removed a bunch of inlines. The reason was that inlining can make the code expand, which takes more memory. When the code gets bigger, you increase the likelihood of cache misses, which can negate any benefit you get from inlining and probably make things worse. When you really need to improve your system performance, inlining will probably make very little difference relative to all the other things you can do, but it is a low-hanging fruit that gets misused a lot.
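For the "instrument first" step, a very crude sketch using clock() from the standard library might look like the following. The names add_one, run_workload and time_workload are made up for this example, and a real measurement would use your own hot path and a proper profiler.

```c
#include <time.h>

/* Hypothetical tiny routine: the kind of thing people are tempted to inline. */
static __inline__ unsigned long add_one(unsigned long x)
{
    return x + 1UL;
}

/* Calls the small routine in a loop. The volatile accumulator keeps the
 * optimizer from collapsing the whole loop into a single constant. */
unsigned long run_workload(unsigned long iterations)
{
    volatile unsigned long total = 0;
    unsigned long i;

    for (i = 0; i < iterations; i++)
        total = add_one(total);
    return total;
}

/* Returns the CPU time in seconds spent in run_workload and stores the
 * workload's result through the result pointer. */
double time_workload(unsigned long iterations, unsigned long *result)
{
    clock_t start = clock();
    *result = run_workload(iterations);
    clock_t end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Build it once with -O0 and once with -O2 -finline-functions, call time_workload with a large iteration count from each build, and compare the numbers before deciding whether marking anything inline is worth it.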