Compiler Dependencies

Where can I download a free C++ compiler?

Check these out (alphabetically by vendor-name):

Also check out these lists:

Where can I get more information on using MFC and Visual C++?

Here are some resources (in no particular order):

How do I display text in the status bar using MFC?

Use the following code snippet:

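CString s = _T("Text");
// Find the main frame's status bar by its standard control ID.
CStatusBar* p =
 (CStatusBar*)AfxGetApp()->m_pMainWnd->GetDescendantWindow(AFX_IDW_STATUS_BAR);
if (p != NULL)              // the frame might not have a status bar
  p->SetPaneText(1, s);     // pane indexes are zero-based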

This works with MFC v1.00, which hopefully means it will work with later versions as well.

How can I decompile an executable program back into C++ source code?

You gotta be kidding, right?

Here are a few of the many reasons this is not even remotely feasible:

  • What makes you think the program was written in C++ to begin with?
  • Even if you are sure it was originally written (at least partially) in C++, which one of the gazillion C++ compilers produced it?
  • Even if you know the compiler, which particular version of the compiler was used?
  • Even if you know the compiler’s manufacturer and version number, what compile-time options were used?
  • Even if you know the compiler’s manufacturer and version number and compile-time options, what third party libraries were linked-in, and what was their version?
  • Even if you know all that stuff, most executables have had their debugging information stripped out, so the resulting decompiled code will be totally unreadable.
  • Even if you know everything about the compiler, manufacturer, version number, compile-time options, third party libraries, and debugging information, the cost of writing a decompiler that works with even one particular compiler and has even a modest success rate at generating code would be significant: on par with writing the compiler itself from scratch.

But the biggest question is not how you can decompile someone’s code, but why do you want to do this? If you’re trying to reverse-engineer someone else’s code, shame on you; go find honest work. If you’re trying to recover from losing your own source, the best suggestion I have is to make better backups next time.

(Don’t bother writing me email saying there are legitimate reasons for decompiling; I didn’t say there weren’t.)

Where can I get information about the C++ compiler from {IBM, Microsoft, Sun, etc.}?

In alphabetical order by vendor name:

[If anyone has other suggestions that should go into this list, please let us know. Thanks.]

What’s the difference between C++ and Visual C++?

C++ is the language itself; Visual C++ is a compiler that tries to implement the language.

How do compilers use “over-allocation” to remember the number of elements in an allocated array?

Recall that when you delete[] an array, the runtime system magically knows how many destructors to run. This FAQ describes a technique used by some C++ compilers to do this (the other common technique is to use an associative array).

If the compiler uses the “over-allocation” technique, the code for p = new Fred[n] looks something like the following. Note that WORDSIZE is an imaginary machine-dependent constant that is at least sizeof(size_t), possibly rounded up for any alignment constraints. On many machines, this constant will have a value of 4 or 8. It is not a real C++ identifier that will be defined for your compiler.

// Original code: Fred* p = new Fred[n];
char* tmp = (char*) operator new[] (WORDSIZE + n * sizeof(Fred));
Fred* p = (Fred*) (tmp + WORDSIZE);
*(size_t*)tmp = n;
size_t i;
try {
  for (i = 0; i < n; ++i)
    new(p + i) Fred();           // Placement new
}
catch (...) {
  while (i-- != 0)
    (p + i)->~Fred();            // Explicit call to the destructor
  operator delete[] ((char*)p - WORDSIZE);
  throw;
}

Then the delete[] p statement becomes:

// Original code: delete[] p;
size_t n = * (size_t*) ((char*)p - WORDSIZE);
while (n-- != 0)
  (p + n)->~Fred();
operator delete[] ((char*)p - WORDSIZE);

Note that the address passed to operator delete[] is not the same as p.

Compared to the associative array technique, this technique is faster, but more sensitive to the problem of programmers saying delete p rather than delete[] p. For example, if you make a programming error by saying delete p where you should have said delete[] p, the address that is passed to operator delete(void*) is not the address of any valid heap allocation. This will probably corrupt the heap. Bang! You’re dead!

How do compilers use an “associative array” to remember the number of elements in an allocated array?

Recall that when you delete[] an array, the runtime system magically knows how many destructors to run. This FAQ describes a technique used by some C++ compilers to do this (the other common technique is to over-allocate).

If the compiler uses the associative array technique, the code for p = new Fred[n] looks something like this (where arrayLengthAssociation is the imaginary name of a hidden, global associative array that maps from void* to size_t):

// Original code: Fred* p = new Fred[n];
Fred* p = (Fred*) operator new[] (n * sizeof(Fred));
size_t i;
try {
  for (i = 0; i < n; ++i)
    new(p + i) Fred();           // Placement new
}
catch (...) {
  while (i-- != 0)
    (p + i)->~Fred();            // Explicit call to the destructor
  operator delete[] (p);
  throw;
}
arrayLengthAssociation.insert(p, n);

Then the delete[] p statement becomes:

// Original code: delete[] p;
size_t n = arrayLengthAssociation.lookup(p);
while (n-- != 0)
  (p + n)->~Fred();
operator delete[] (p);

Cfront uses this technique (it uses an AVL tree to implement the associative array).

Compared to the over-allocation technique, the associative array technique is slower, but less sensitive to the problem of programmers saying delete p rather than delete[] p. For example, if you make a programming error by saying delete p where you should have said delete[] p, only the first Fred in the array gets destructed, but the heap may survive (unless you’ve replaced operator delete[] with something that doesn’t simply call operator delete, or unless the destructors for the other Fred objects were necessary).

If name mangling was standardized, could I link code compiled with compilers from different compiler vendors?

Short answer: Probably not.

In other words, some people would like to see name mangling standards incorporated into the proposed C++ ANSI standards in an attempt to avoid having to purchase different versions of class libraries for different compiler vendors. However, name mangling differences are one of the smallest differences between implementations, even on the same platform.

Here is a partial list of other differences:

  • Number and type of hidden arguments to member functions.
    • is this handled specially?
    • where is the return-by-value pointer passed?
  • Assuming a v-table is used:
    • what is its contents and layout?
    • where/how is the adjustment to this made for multiple and/or virtual inheritance?
  • How are classes laid out, including:
    • location of base classes?
    • handling of virtual base classes?
    • location of v-pointers, if they are used at all?
  • Calling convention for functions, including:
    • where are the actual parameters placed?
    • in what order are the actual parameters passed?
    • how are registers saved?
    • where does the return value go?
    • does caller or callee pop the stack after the call?
    • special rules for passing or returning structs or doubles?
    • special rules for saving registers when calling leaf functions?
  • How is the run-time-type-identification laid out?
  • How does the runtime exception handling system know which local objects need to be destructed during an exception throw?

GNU C++ (g++) produces big executables for tiny programs; Why?

libg++ (the library used by g++) was probably compiled with debug info (-g). On some machines, recompiling libg++ without debugging can save lots of disk space (approximately 1 MB; the downside: you’ll be unable to trace into libg++ calls). Merely running strip on the executable doesn’t reclaim as much as recompiling without -g and then running strip on the resulting a.out files.

Use size a.out to see how big the program code and data segments really are, rather than ls -s a.out which includes the symbol table.

Is there a yacc-able C++ grammar?

The primary yacc grammar you’ll want is from Ed Willink. Ed believes his grammar is fully compliant with the ISO/ANSI C++ standard; however, he doesn’t warrant it: “the grammar has not,” he says, “been used in anger.” You can get the grammar without action routines or the grammar with dummy action routines. You can also get the corresponding lexer. For those who are interested in how he achieves a context-free parser (by pushing all the ambiguities plus a small number of repairs to be done later after parsing is complete), you might want to read chapter 4 of his thesis.

There is also a very old yacc grammar that doesn’t support templates, exceptions, or namespaces; it also deviates from the core language in some subtle ways. You can get that grammar here or here.

What is C++ 1.2? 2.0? 2.1? 3.0?

These are not versions of the language, but rather versions of Cfront, which was the original C++ translator implemented by AT&T. It has become generally accepted to use these version numbers as if they were versions of the language itself.

Very roughly speaking, these are the major features:

  • 2.0 includes multiple/virtual inheritance and pure virtual functions
  • 2.1 includes semi-nested classes and delete[] pointerToArray
  • 3.0 includes fully-nested classes, templates and i++ vs. ++i
  • 4.0 will include exceptions

Is it possible to convert C++ to C?

Depends on what you mean. If you mean, Is it possible to convert C++ to readable and maintainable C-code? then sorry, the answer is No: C++ features don’t directly map to C, plus the generated C code is not intended for humans to follow. If instead you mean, Are there compilers which convert C++ to C for the purpose of compiling onto a platform that doesn’t yet have a C++ compiler? then you’re in luck; keep reading.

A compiler which compiles C++ to C does full syntax and semantic checking on the program, and just happens to use C code as a way of generating object code. Such a compiler is not merely some kind of fancy macro processor. (And please don’t email me claiming these are preprocessors. They are not; they are full compilers.) It is possible to implement all of the features of ISO Standard C++ by translation to C, and except for exception handling, it typically results in object code with efficiency comparable to that of the code generated by a conventional C++ compiler.

Here are some products that perform compilation to C (note: if you know of any other products that do this, please let us know):

  • Comeau Computing offers a compiler based on Edison Design Group’s front end that outputs C code.
  • LLVM is a downloadable compiler that emits C code. See also here and here.
  • Cfront, the original implementation of C++, done by Bjarne Stroustrup and others at AT&T, generates C code. However it has two problems: it’s been difficult to obtain a license since the mid 90s when it started going through a maze of ownership changes, and development ceased at that same time and so it doesn’t get bug fixes and doesn’t support any of the newer language features (e.g., exceptions, namespaces, RTTI, member templates).
  • Contrary to popular myth, as of this writing there is no version of g++ that translates C++ to C. Such a thing seems to be doable, but I am not aware that anyone has actually done it (yet).

Note that you typically need to specify the target platform’s CPU, OS and C compiler so that the generated C code will be specifically targeted for this platform. This means: (a) you probably can’t take the C code generated for platform X and compile it on platform Y; and (b) it’ll be difficult to do the translation yourself — it’ll probably be a lot cheaper/safer with one of these tools.

One more time: do not email me saying these are just preprocessors — they are not — they are compilers.