http://www.vishalchovatiya.com/memory-layout-of-cpp-object/
Highly recommended to use PImpl.
Rule of thumb: do not make any change in a shipped header, EXCEPT (1) adding a non-member function, or (2) adding a NON-VIRTUAL member function.
Extract from https://softwareengineering.stackexchange.com/questions/213259/whats-is-the-point-of-pimpl-pattern-while-we-can-use-interface-for-the-same-pur:
First, PImpl is usually used for non-polymorphic classes. And when a polymorphic class has a PImpl, it usually remains polymorphic: it still implements interfaces, overrides virtual methods from its base class, and so on. So the simpler alternative to PImpl is not an interface, it is a plain class that contains its members directly!
There are three reasons to use PImpl:
Making the binary interface (ABI) independent of the private members. It is possible to update a shared library without recompiling the dependent code, but only as long as the binary interface remains the same. Almost any change in a header, except adding a non-member function or adding a non-virtual member function, changes the ABI. The PImpl idiom moves the definition of the private members into the source file and thus decouples the ABI from their definition. See the FragileBinaryInterfaceProblem.
When a header changes, every source file that includes it has to be recompiled, and C++ compilation is rather slow. By moving the definitions of the private members into the source file, the PImpl idiom shortens the initial build, since fewer dependencies need to be pulled into the header, and shortens rebuilds after modifications even more, since the dependents don't need to be recompiled. (Granted, this last point also applies to an interface plus a factory function with a hidden concrete class.)
For many classes in C++, exception safety is an important property. Often you compose several classes into one, and either an operation that modifies more than one member must leave none of them modified if it throws partway through, or an operation would leave a member in an inconsistent state if it throws while the containing object must remain consistent. In such cases you implement the operation by creating a new instance of the PImpl and swapping it in only when the operation succeeds.
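The first two reasons both come down to moving the private members out of the header. A minimal sketch of the mechanics, using a hypothetical Widget class (the names are invented for illustration; in a real project the two halves would live in widget.h and widget.cpp):

```cpp
#include <cassert>
#include <memory>
#include <string>

// --- "widget.h": no private data members appear here, so changing
// them later does not alter the class's binary layout, and clients
// of this header never see <string> pulled in for the member's sake.
class Widget {
public:
    Widget();
    ~Widget();                        // defined where Impl is complete
    Widget(Widget&&) noexcept;
    Widget& operator=(Widget&&) noexcept;
    void set_name(const std::string& n);
    std::string name() const;
private:
    struct Impl;                      // forward declaration only
    std::unique_ptr<Impl> pimpl_;
};

// --- "widget.cpp": the private members are defined here, invisible
// to client code that only includes the header.
struct Widget::Impl {
    std::string name;
};

Widget::Widget() : pimpl_(std::make_unique<Impl>()) {}
Widget::~Widget() = default;
Widget::Widget(Widget&&) noexcept = default;
Widget& Widget::operator=(Widget&&) noexcept = default;
void Widget::set_name(const std::string& n) { pimpl_->name = n; }
std::string Widget::name() const { return pimpl_->name; }
```

Note that the destructor and move operations are declared in the header but defaulted in the source file: they can only be generated where Impl is a complete type.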
Actually, an interface can also be used purely for implementation hiding, but it has the following disadvantages:
Adding a non-virtual method does not break the ABI, but adding a virtual one does. Interfaces therefore don't allow adding methods at all; PImpl does.
An interface can only be used via pointer/reference, so the user has to take care of proper resource management. Classes using PImpl, on the other hand, are still value types and handle their resources internally.
A hidden implementation can't be inherited from; a class with PImpl can.
And of course an interface won't help with exception safety. You need the indirection inside the class for that.
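The exception-safety point deserves a concrete sketch. Below, a hypothetical Ledger (all names invented for illustration) implements an all-or-nothing batch operation with the commit-via-swap technique described above: the work happens on a copy of the Impl, and the copy is swapped in only once everything has succeeded.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical Ledger whose "apply a batch of deltas" operation must
// either fully succeed or leave the object completely unchanged.
class Ledger {
public:
    Ledger() : pimpl_(std::make_unique<Impl>()) {}

    void apply(const std::vector<int>& deltas) {
        // Work on a copy of the implementation; if anything throws
        // mid-loop, *this has not been touched at all.
        auto scratch = std::make_unique<Impl>(*pimpl_);
        for (int d : deltas) {
            if (scratch->balance + d < 0)
                throw std::runtime_error("would overdraw");
            scratch->balance += d;
        }
        std::swap(pimpl_, scratch);   // commit: swap never throws
    }

    int balance() const { return pimpl_->balance; }

private:
    struct Impl { int balance = 0; };
    std::unique_ptr<Impl> pimpl_;
};
```

Without the extra indirection, a throw after the first few deltas had been applied would leave the object half-updated; the swap gives the strong exception guarantee for free.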
http://wiki.c2.com/?FragileBinaryInterfaceProblem
The FragileBinaryInterfaceProblem (that is especially common in CeePlusPlus) is as follows: Developer A writes a base class, intending it to be derived from. Developer B uses a library from A containing said base class, and writes and uses a derived class. Developer A delivers to B a newer version of the base class which is supposedly backwards-compatible, having only a minor change (such as a new read-only method). Developer B installs the library from A, relinks (or reloads) his application, and it crashes and burns in mysterious ways. B curses C++ (or whichever language), recompiles, and things work fine.
That's the FragileLanguageProblem. CeePlusPlus is known to need to recompile everything at least daily.
The problem is that in languages such as C++, which use techniques such as simple offset calculations for data members and simple VeeTable calculations for virtual functions, even a simple change such as adding a function will cause the derived class code to use incorrect offsets. (There are lots of other implementation details that most C++ implementations require derived class code to know about base classes.) Many places and systems which do distributed or incremental development in C++ have long lists of rules about which transformations to a base class are "safe" (won't require derived classes to be recompiled) and which aren't. When the derived class code is in the hands of a customer who cannot or will not recompile (i.e. an EndUser), the FragileBinaryInterfaceProblem is a "big" problem, as the customer has software that once worked, doesn't work now, and he doesn't know why. (See also DllHell.)
Other languages, such as JavaLanguage and CsharpLanguage, use other techniques for member access (lookup by name, name hashing, etc.) to avoid this problem. The tradeoff can be performance, however DynamicCompilation can completely eliminate any performance penalty.
C++ tends to sacrifice robustness for performance, other languages make the opposite tradeoff. (And the LanguageBigots in both camps will loudly claim that this is why all but their favorite language is unsuitable for any software development whatsoever.)
The FragileBaseClassProblem, on the other hand, is language-independent; it occurs when a base and a derived class both make assumptions about the behavior of the other that are violated when one (or the other) changes; the most damaging case is when a base class provided by a vendor or other entity changes, breaking someone's derived class. This version of the problem is an OO-design problem, not a language implementation problem; recompiling the code won't help.
-- engineer_scotty (ScottJohnson)
A different, but related problem with CeePlusPlus is that there is (for many platforms) no standardized binary interface for the representation of objects, which means that attempting to link to C++ classes in separate modules (or static-link libraries) that are written using different compilers (or perhaps even different versions of the same compiler) will not work. The ComponentObjectModel is one attempt to address this on the MicrosoftWindows platform, and there are others as well. -- MikeSmith
Many Unix platforms, including LinuxOs, have now standardized on a C++ ABI. Of course, on many such platforms 99% of the world uses GCC, so inter-compiler compatibility isn't a major issue--except when dealing with code compiled with older (mainly pre-3.0) versions of GCC.
It's also somewhat inaccurate to describe COM as a solution to the C++ ABI problem. C++ objects are not necessarily COM objects--COM supports C and VisualBasic as well. Many C++ features such as MultipleInheritance are not supported by COM; COM provides introspection features (like IUnknown::QueryInterface) not supported natively by C++. That said, COM was certainly a useful technology for the Windows platform.
I said "an attempt to address", not "a solution for".
It wasn't clear whether the comment below applied to the FragileBaseClassProblem or the FragileBinaryInterfaceProblem. Possibly both.
Oddly enough, these are almost verbatim the arguments presented for the kind of versioning implemented in CsharpLanguage. To the extent it is mentioned in the partial spec, section 1.17 could be lifted verbatim and placed on this page (and on BinaryCompatibility).
-- StevenNewton
http://wiki.c2.com/?VeeTable
Commonly "v-table" or simply "vtable", short for "virtual table" (or maybe even "virtual dispatch table").
Primarily refers to the (hidden) data structure found in most if not all CeePlusPlus implementations, used to implement DynamicDispatch and RunTimeTypeIdentification (C++'s DynamicCast and related features). Many other OO languages have a similar feature.
At the simplest, a vtable is an array of pointers to functions. Any object with virtual methods contains an internal pointer to a vtable. The vtable layout is implementation dependent, and may frequently contain type metadata (to support DynamicCast) and correction factors needed to support MultipleInheritance. Detailed descriptions of how multiple inheritance is (often) implemented in C++ can be found in either ProgrammingLanguagePragmatics or TheDesignAndEvolutionOfCpp.
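The "array of pointers to functions" idea can be made concrete with a hand-rolled approximation in plain structs. This is only a sketch of the mechanism, not what any particular compiler emits, and all the names (Animal, AnimalVTable, and so on) are invented for illustration:

```cpp
#include <cassert>
#include <cstring>

struct Animal;                              // forward declaration

// The "vtable": an array of function pointers, one slot per
// virtual function, each taking the object pointer explicitly
// (the hidden "this" argument).
struct AnimalVTable {
    const char* (*speak)(const Animal*);    // slot 0
    int         (*legs)(const Animal*);     // slot 1
};

// Every "polymorphic" object carries a hidden pointer to the
// vtable of its dynamic type.
struct Animal {
    const AnimalVTable* vtable;
};

// A "derived class" Dog supplies its own table filling both slots.
const char* dog_speak(const Animal*) { return "woof"; }
int dog_legs(const Animal*) { return 4; }
const AnimalVTable dog_vtable = { dog_speak, dog_legs };

// Dynamic dispatch: index into the table through the hidden pointer.
const char* speak(const Animal* a) { return a->vtable->speak(a); }
```

The caller of speak() never names dog_speak; it only knows the slot. That is exactly why the slot numbering matters so much in the example that follows.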
The design of vtables makes DynamicDispatch relatively fast in C++. A virtual function call is simply an indexed lookup into the function pointer array, with corrections applied for MultipleInheritance. Unfortunately, this leads to the FragileBinaryInterfaceProblem. Ignoring the issue of MultipleInheritance, suppose you have a class D which has several virtual functions defined, like this:
class D : public B
{
public:
    virtual ~D();
    virtual void func0(void);
    virtual int func1(int, double);
    virtual bool func2(const char *);
};
And suppose B is defined thusly:
class B
{
private:
    int foo;
    int bar;
public:
    virtual ~B() = 0;
    virtual void func0(void) = 0;
    virtual void func0_5(void *);
    virtual void func1(int, double);
};
What will the vtables look like? Probably something like this:
For B:
0: implementation defined stuff
1: pointer to B::~B (the destructor)
2: pointer to "pure virtual function" code, for func0
3: pointer to B::func0_5
4: pointer to B::func1
For D:
0: implementation defined stuff
1: Pointer to D::~D (the destructor)
2: pointer to D::func0
3: pointer to B::func0_5. (It wasn't overridden in D)
4: pointer to D::func1
5: pointer to D::func2
Note that all the virtual functions newly declared in class D appear in D's vtable after all the virtual functions that were declared in B, whether or not D overrides the latter. This holds regardless of the declaration order in D, though the newly-declared virtual functions do appear in their declaration order.
When client code (outside either class) wants to call a given function, say func1, the compiler "knows" that it's in slot 4. The following code, in pseudo-assembly, is emitted. Assume a pointer to the object (the "this" pointer) is found in r1. Due to how C++ lays out objects, the vtable pointer will likely be in the second machine word after the start of the object in both cases. Assume the numbers in the indirect addressing form num[rx] counts word offsets. Assume that the arguments to func1 are already in the appropriate place.
load r0, 2[r1] ; get the vtable
load r0, 4[r0] ; get 4th entry of vtable
jsr r0
What happens if we add a new function, func0_75, to B? Make it a harmless "test" function, defined so:
virtual bool func0_75 (void) const;
The two new vtables become:
For B:
0: implementation defined stuff
1: pointer to B::~B (the destructor)
2: pointer to "pure virtual function" code, for func0
3: pointer to B::func0_5
4: pointer to B::func0_75
5: pointer to B::func1
For D:
0: implementation defined stuff
1: Pointer to D::~D (the destructor)
2: pointer to D::func0
3: pointer to B::func0_5. (It wasn't overridden in D)
4: pointer to B::func0_75
5: pointer to B::func1
6: pointer to D::func2
When this is done, the correct calling sequence for func1 becomes
load r0, 2[r1] ; get the vtable
load r0, 5[r0] ; get 5th entry of vtable
jsr r0
In other words, any code using either the base class or any of its derived classes must be recompiled; else the program will likely crash. This is especially a problem if the client code and the code for B are in a different UnitOfDeployment. A harmless semantic change (adding a query function) causes all hell to break loose.
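This is why the "safe transformation" rules at the top of these notes forbid adding virtuals to a shipped header but permit adding a non-virtual member function. A non-virtual function is dispatched by name at link time, not through a vtable slot, so it shifts nothing. A minimal sketch (B2 and healthy() are invented names, not the B from the example above):

```cpp
#include <cassert>

// A base class analogous to B above, reduced to one virtual.
class B2 {
public:
    virtual ~B2() = default;
    virtual int func1(int x, double y) { return x + static_cast<int>(y); }

    // The ABI-safe way to add a query later: a NON-virtual member
    // function. It occupies no vtable slot, so every previously
    // compiled caller of func1 still indexes the same slot and
    // keeps working without recompilation.
    bool healthy() const { return true; }
};
```

Had healthy() been declared virtual, it would have claimed a slot and pushed func1 down, exactly as func0_75 did above.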
MultipleInheritance complicates this scheme quite a bit, and I'm too tired to write the dirty details here. See one of the two books above if you're interested in how that works. See also http://www.codeguru.com/cpp/tic/tic0158.shtml