Inline Functions

What’s the deal with inline functions?

When the compiler inline-expands a function call, the function’s code gets inserted into the caller’s code stream (conceptually similar to what happens with a #define macro). This can, depending on a zillion other things, improve performance, because the optimizer can procedurally integrate the called code — optimize the called code into the caller.

There are several ways to designate that a function is inline, some of which involve the inline keyword, others do not. No matter how you designate a function as inline, it is a request that the compiler is allowed to ignore: the compiler might inline-expand some, all, or none of the places where you call a function designated as inline. (Don’t get discouraged if that seems hopelessly vague. The flexibility of the above is actually a huge advantage: it lets the compiler treat large functions differently from small ones, plus it lets the compiler generate code that is easy to debug if you select the right compiler options.)
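Here is a minimal sketch of the common ways a function ends up designated inline. The names (addOne, Counter, square) are hypothetical, and the sketch assumes C++11 or later for the in-class member initializer and constexpr. None of these forms forces the compiler to expand any particular call.

```cpp
// 1. The inline keyword on a non-member function.
inline int addOne(int i) { return i + 1; }

class Counter {
public:
  // 2. A member function defined inside the class body is implicitly inline,
  //    even though the inline keyword is never written.
  int get() const { return count_; }

  void bump();

private:
  int count_ = 0;
};

// 3. The inline keyword on a member function defined outside the class body.
inline void Counter::bump() { ++count_; }

// 4. constexpr functions are implicitly inline (C++11 and later).
constexpr int square(int i) { return i * i; }
```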

What’s a simple example of procedural integration?

Consider the following call to function g():

void f()
{
  int x = /*...*/;
  int y = /*...*/;
  int z = /*...*/;
  // ...code that uses x, y and z...
  g(x, y, z);
  // ...more code that uses x, y and z...
}

Assuming a typical C++ implementation that has registers and a stack, the caller’s registers and the parameters get written to the stack just before the call to g(), then the parameters get read from the stack inside g(), and the saved registers get read back as g() returns to f(). But that’s a lot of unnecessary reading and writing, especially in cases when the compiler is able to use registers for variables x, y and z: each variable could get written twice (once as a saved register and once as a parameter) and read twice (once when used within g() and once when the registers are restored on the return to f()). For reference, here is g():

void g(int x, int y, int z)
{
  // ...code that uses x, y and z...
}

If the compiler inline-expands the call to g(), all those memory operations could vanish. The registers wouldn’t need to get written or read since there wouldn’t be a function call, and the parameters wouldn’t need to get written or read since the optimizer would know they’re already in registers.

Naturally your mileage may vary, and there are a zillion variables that are outside the scope of this particular FAQ, but the above serves as an example of the sorts of things that can happen with procedural integration.
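To make procedural integration concrete, the sketch below gives g() a hypothetical body (the FAQ leaves it as “...code that uses x, y and z...”) and writes out by hand roughly what the optimizer gets to work with once the call is expanded. Real compilers do this on their internal representation, not on source code; the hand-integrated function only illustrates the idea.

```cpp
// Hypothetical body for g(); the original leaves it unspecified.
inline int g(int x, int y, int z)
{
  return x * y + z;
}

// f() as written: it calls g(), so arguments must be passed in.
int fCalls(int x, int y, int z)
{
  return g(x, y, z) + x;
}

// Roughly what f() looks like after g() is inline-expanded: no call,
// no parameter passing, and x, y and z can stay in registers throughout.
int fIntegrated(int x, int y, int z)
{
  return (x * y + z) + x;
}
```

Both functions compute the same result; the difference is only in how much bookkeeping the generated code has to do.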

Do inline functions improve performance?

Yes and no. Sometimes. Maybe.

There are no simple answers. inline functions might make the code faster, they might make it slower. They might make the executable larger, they might make it smaller. They might cause thrashing, they might prevent thrashing. And they might be, and often are, totally irrelevant to speed.

inline functions might make it faster: As shown above, procedural integration might remove a bunch of unnecessary instructions, which might make things run faster.

inline functions might make it slower: Too much inlining might cause code bloat, which might cause “thrashing” on demand-paged virtual-memory systems. In other words, if the executable size is too big, the system might spend most of its time going out to disk to fetch the next chunk of code.

inline functions might make it larger: This is the notion of code bloat, as described above. For example, if a system has 100 inline functions, each of which expands to 100 bytes of executable code and is called in 100 places, that’s an increase of 1MB. Is that 1MB going to cause problems? Who knows, but it is possible that the extra 1MB could cause the system to “thrash,” and that could slow things down.
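The arithmetic in that example can be written out as a compile-time check. The three counts are the example’s hypothetical figures, and the estimate ignores the call sequences that inlining removes, so it is a gross figure rather than a measurement.

```cpp
// Hypothetical figures taken from the code-bloat example.
constexpr long inlineFunctions   = 100;  // number of inline functions
constexpr long bytesPerExpansion = 100;  // size of each expanded body
constexpr long callSites         = 100;  // call sites per function

// Every call site gets its own copy of the body.
constexpr long extraBytes = inlineFunctions * bytesPerExpansion * callSites;

static_assert(extraBytes == 1000000, "roughly 1MB of additional code");
```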

inline functions might make it smaller: The compiler often generates more code to push/pop registers/parameters than it would by inline-expanding the function’s body. This happens with very small functions, and it also happens with large functions when the optimizer is able to remove a lot of redundant code through procedural integration — that is, when the optimizer is able to make the large function small.

inline functions might cause thrashing: Inlining might increase the size of the binary executable, and that might cause thrashing.

inline functions might prevent thrashing: The working set size (number of pages that need to be in memory at once) might go down even if the executable size goes up. When f() calls g(), the code is often on two distinct pages; when the compiler procedurally integrates the code of g() into f(), the code is often on the same page.

inline functions might increase the number of cache misses: Inlining might cause an inner loop to span across multiple lines of the memory cache, and that might cause thrashing of the memory-cache.

inline functions might decrease the number of cache misses: Inlining usually improves locality of reference within the binary code, which might decrease the number of cache lines needed to store the code of an inner loop. This ultimately could cause a CPU-bound application to run faster.

inline functions might be irrelevant to speed: Most systems are not CPU-bound. Most systems are I/O-bound, database-bound or network-bound, meaning the bottleneck in the system’s overall performance is the file system, the database or the network. Unless your “CPU meter” is pegged at 100%, inline functions probably won’t make your system faster. (Even in CPU-bound systems, inline will help only when used within the bottleneck itself, and the bottleneck is typically in only a small percentage of the code.)

There are no simple answers: You have to play with it to see what is best. Do not settle for simplistic answers like, “Never use inline functions” or “Always use inline functions” or “Use inline functions if and only if the function is less than N lines of code.” These one-size-fits-all rules may be easy to write down, but they will produce sub-optimal results.
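Since the answer has to come from experiment, a small timing helper is one way to start. This is a minimal sketch, assuming a monotonic wall-clock measurement via std::chrono::steady_clock is good enough for your purposes; the name timeNanoseconds is hypothetical, and a real profiler will tell you far more.

```cpp
#include <chrono>

// Runs `work` the given number of times and returns the elapsed
// wall-clock time in nanoseconds, measured with a monotonic clock.
template <typename Work>
long long timeNanoseconds(Work&& work, int repetitions)
{
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < repetitions; ++i)
    work();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

To use it, time the actual bottleneck with realistic input in builds that differ only in the inlining decision (or the optimizer settings). Make sure the work produces a result that is used afterwards; otherwise the optimizer may delete the very code you are trying to measure.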

How can inline functions help with the tradeoff of safety vs. speed?

In straight C, you can achieve “encapsulated structs” by putting a void* in a struct, in which case the void* points to the real data that is unknown to users of the struct. Therefore users of the struct don’t know how to interpret the stuff pointed to by the void*, but the access functions cast the void* to the appropriate hidden type. This gives a form of encapsulation.

Unfortunately it forfeits type safety, and also imposes a function call to access even trivial fields of the struct (if you allowed direct access to the struct’s fields, anyone and everyone would be able to get direct access since they would of necessity know how to interpret the stuff pointed to by the void*; this would make it difficult to change the underlying data structure).
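A minimal sketch of the C-style approach, written in C++ so all examples share one language (in C the static_cast would be a plain cast). HiddenData, Handle and the access functions are hypothetical names. In real C code the hidden struct and the access functions would live in a separate .c file, which is exactly why callers cannot have the calls inlined.

```cpp
// The real data; users of Handle are not supposed to know this type.
struct HiddenData {
  int size;
};

// What users see: an opaque pointer they cannot interpret themselves.
struct Handle {
  void* impl;
};

// Access functions: the only code that knows what impl points to.
Handle makeHandle(int size)
{
  return Handle{new HiddenData{size}};
}

int handleSize(Handle h)  // even this trivial read costs a function call
{
  return static_cast<HiddenData*>(h.impl)->size;
}

void destroyHandle(Handle h)
{
  delete static_cast<HiddenData*>(h.impl);
}

// No type safety: Handle{&someUnrelatedObject} compiles just as happily,
// and handleSize() would then misinterpret that object's bytes.
```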

Function call overhead is small, but can add up. C++ classes allow function calls to be expanded inline. This lets you have the safety of encapsulation along with the speed of direct access. Furthermore the parameter types of these inline functions are checked by the compiler, an improvement over C’s #define macros.
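The same idea as a C++ class might look like this minimal sketch; Buffer and its members are hypothetical. The representation is private, but because the accessor is inline, a call to size() can compile down to a direct field read, and the compiler checks that it is only ever called on a Buffer.

```cpp
class Buffer {
public:
  explicit Buffer(int size) : size_(size) {}

  // Defined in the class body, so implicitly inline.
  int size() const { return size_; }

private:
  // Callers cannot touch this directly. Changing the representation later
  // requires recompiling callers, but not rewriting them.
  int size_;
};
```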

Why should I use inline functions instead of plain old #define macros?

Because #define macros are evil in 4 different ways: evil#1, evil#2, evil#3, and evil#4. Sometimes you should use them anyway, but they’re still evil.

Unlike #define macros, inline functions avoid infamous macro errors since inline functions always evaluate every argument exactly once. In other words, invoking an inline function is semantically just like invoking a regular function, only faster:

// A macro that returns the absolute value of i
#define unsafe(i)  \
        ( (i) >= 0 ? (i) : -(i) )

// An inline function that returns the absolute value of i
inline
int safe(int i)
{
  return i >= 0 ? i : -i;
}

int f();

void userCode(int x)
{
  int ans;

  ans = unsafe(x++);   // Error! x is incremented twice
  ans = unsafe(f());   // Danger! f() is called twice

  ans = safe(x++);     // Correct! x is incremented once
  ans = safe(f());     // Correct! f() is called once
}
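To see the difference in behavior, here is a self-contained variant of the example above. The f() here is a hypothetical stand-in that counts how many times it runs, and callsViaMacro/callsViaInline are illustrative helpers.

```cpp
// A macro that returns the absolute value of i
#define unsafe(i)  \
        ( (i) >= 0 ? (i) : -(i) )

// An inline function that returns the absolute value of i
inline int safe(int i)
{
  return i >= 0 ? i : -i;
}

int callCount = 0;

// Hypothetical f(): records each call and returns a negative number.
int f()
{
  ++callCount;
  return -5;
}

// How many times does f() run when its result is passed to each version?
int callsViaMacro()  { callCount = 0; (void)unsafe(f()); return callCount; }
int callsViaInline() { callCount = 0; (void)safe(f());   return callCount; }
```

Here callsViaMacro() returns 2 and callsViaInline() returns 1. The x++ case goes wrong in a similar way: starting from x == 3, unsafe(x++) yields 4 and leaves x at 5, while safe(x++) yields 3 and leaves x at 4.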

Also unlike macros, argument types are checked, and necessary conversions are performed correctly.
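A minimal illustration of the conversion point, using a hypothetical macro and inline function that halve a value. The macro simply pastes its argument into the expression, so an int argument gets integer division; the inline function converts its argument to its declared parameter type first.

```cpp
// Hypothetical macro: no parameter type, so the argument's own type wins.
#define HALF(v)  ( (v) / 2 )

// Hypothetical inline function: the argument is converted to double.
inline double half(double v)
{
  return v / 2;
}
```

With these definitions, HALF(3) evaluates to 1 (integer division), while half(3) evaluates to 1.5.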

Macros are bad for your health; don’t use them unless you have to.

How do you tell the compiler to make a non-member function inline?

When you declare an inline function, it looks just like a normal function:

void f(int i, char c);

But when you define an inline function, you prepend the function’s definition with the keyword inline, and you put the definition into a header file:

inline
void f(int i, char c)
{
  // ...
}

Note: It’s imperative that the function’s definition (the part between the {...}) be placed in a header file, unless the function is used only in a single .cpp file. In particular, if you put the inline function’s definition into a .cpp file and you call it from some other .cpp file, you’ll get an “unresolved external” error from the linker.
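As a sketch of the header-file arrangement, here is a hypothetical header, checked_math.h, with a conventional include guard. Because the function is inline, the same definition may appear in every .cpp file that includes the header without causing a “multiple definition” error from the linker.

```cpp
// ---- checked_math.h (hypothetical header name) ----
#ifndef CHECKED_MATH_H
#define CHECKED_MATH_H

// Declaration: looks just like a normal function.
int clampToByte(int i);

// Definition: marked inline and kept in the header, so every .cpp file
// that calls clampToByte() can see its body.
inline int clampToByte(int i)
{
  return i < 0 ? 0 : (i > 255 ? 255 : i);
}

#endif // CHECKED_MATH_H
```

Each .cpp file that calls clampToByte() would then write #include "checked_math.h".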

How do you tell the compiler to make a member function inline?

The declaration of an inline member function looks just like the declaration of a non-inline member function:

class Fred {
public:
  void f(int i, char c);
};

But when you define an inline member function (the {...} part), you prepend the member function’s definition with the keyword inline, and you (almost always) put the definition into a header file:

inline
void Fred::f(int i, char c)
{
  // ...
}

The reason you (almost always) put the definition (the {...} part) of an inline function in a header file is to avoid “unresolved external” errors from the linker. That error will occur if you put the inline function’s definition in a .cpp file and if that function is called from some other .cpp file.
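A fuller sketch of that layout for a hypothetical fred.h. The bodies of f() and the two accessors are invented for illustration; the point is that the class body declares the member functions and the inline definitions follow it in the same header.

```cpp
// ---- fred.h (hypothetical header name) ----
#ifndef FRED_H
#define FRED_H

class Fred {
public:
  void f(int i, char c);
  int total() const;
  char lastChar() const;

private:
  int total_ = 0;
  char last_ = '\0';
};

// Inline member function definitions: outside the class body,
// but still in the header.
inline void Fred::f(int i, char c)
{
  total_ += i;   // hypothetical body: keep a running total of i
  last_ = c;     // and remember the most recent c
}

inline int Fred::total() const { return total_; }

inline char Fred::lastChar() const { return last_; }

#endif // FRED_H
```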

Is there another way to tell the compiler to make a member function inline?

Yep: define the member function in the class body itself:

class Fred {
public:
  void f(int i, char c)
    {
      // ...
    }
};

This is often more convenient than the alternative of defining your inline functions outside the class body. However, although it is easier on the person who writes the class, it is harder on all the readers since it mixes what a class does (the external behavior) with how it does it (the implementation). Because of this mixture, you should define all your member functions outside the class body if your class is intended to be highly reused and your class’s documentation is the header file itself. This is another application of Spock’s logic: the needs of the many (all the people reusing your class) outweigh the needs of the few (those who maintain your class’s implementation) or the one (the class’s original author).

Of course if you are not writing a highly reused class, or if you are providing documentation of your class’s external behavior outside the header files (e.g., HTML or PDF or whatever), then you should probably define your inline functions inside the class body proper, as that will simplify your development as well as maintenance of the class’s implementation.

This approach is explored further in the next FAQ.

With inline member functions that are defined outside the class, is it best to put the inline keyword next to the declaration within the class body, next to the definition outside the class body, or both?

Definition only.

Here is an example of an inline member function defined outside the class body:

class Foo {
public:
  void method();  // Best practice: Don't put the inline keyword here
  // ...
};

inline void Foo::method()  // Best practice: Put the inline keyword here
{
  // ...
}

Recall that you should define your inline member function outside the class body when your class is intended to be highly reused and your reusers will read your header file to determine what the class does — its observable semantics or external behavior. In that case…

  • The public: part of the class body is where you describe the observable semantics of the class: its public member functions, its friend functions, and any other features of the class to be reused by others. The goal is to keep this public: part pure, draining it of any inkling of anything that is unimportant to reusers. If “it” can’t be observed from the caller’s code, “it” shouldn’t be in the public: part of the class body.
  • The other parts of the class, including the non-public: parts of the class body, the definitions of your member and friend functions, etc., are pure implementation. Try not to describe any observable semantics that were not already described in the class’s public: part. If “it” can be observed from the caller’s code, “it” should be described in the public: part of the class body; “it” might also appear in the non-public: parts of the class, but “it” should be specified, somehow, in the public: part.

From a practical standpoint, this separation makes life easier and safer for your class’s reusers. Say Chuck wants to simply use your reusable class. Because you read this FAQ and used the above separation, Chuck will see, in the public: part of your class, everything he needs to see and nothing he doesn’t need to see. Your class’s public: part will be Chuck’s one-stop-shop for your class’s observable semantics AKA external behavior. By purifying your class’s public: parts, you made Chuck’s life both easier (he needs to look in only one spot) and safer (his pure mind isn’t polluted by implementation minutiae).

Back to inline-ness: the decision of whether a function is or is not inline is an implementation detail that does not change the observable semantics (the “meaning”) of a call. Therefore the inline keyword should not go within the class’s public: (or the protected: or private:) part, so it needs to go next to the function’s definition.

Note: most people use the terms “declaration” and “definition” to differentiate the above two places. For example, they might say, “Should I put the inline keyword next to the declaration or the definition?” In case you’re talking to a language lawyer, it would be more precise to talk about a non-defining declaration and a defining declaration, since definitions are also declarations.