Fog Creek Software
Discussion Board




Basic C API Design Question

Hi all.

I'm an aspiring C developer working on my first substantial C project: a network library.

My question is a rather simple one about design. Some libraries I've seen have functions that dynamically allocate memory for a struct, fill in the struct's members, then return a pointer to the struct; others just create a temporary struct, fill in the struct's members, then return the contents of struct.

My question is, which one is generally better and which one do you prefer? I know it's faster to pass around pointers instead of the actual objects they point to, but isn't any performance gained negated by the calls to malloc? Which is easier to use and more readable?

Also, do you have any examples of *good* C APIs to emulate?

cprogrammer
Friday, February 20, 2004

My bias is towards minimizing the number of public
structs in a C API - structs should be kept internal
to the API, and created/destroyed by creator and
destroyer functions (I won't call them constructors
and destructors as we aren't talking about C++).

Remember that the structs themselves are part of
the API, and the fewer API elements, the better for
maintenance and improvement of the API without
lots of backward compatibility issues.  If you have a
struct that you pass to several API functions, use
opaque (void) pointers.  Use a creator to init your
struct, and a destroyer to destroy it. 

A well-designed API should _never_ have internal
stuff that points at user data - if the user passes
you strings, copy (strdup) them to your internal structs.
Otherwise, the user may free the data or have it
go out of scope (ie, a local variable) and you'll be
toast.
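
A rough sketch of the "copy what the user gives you" rule, assuming a made-up net_conn type and function names (none of these come from a real library). It uses a small hand-written copy helper because strdup() is POSIX rather than ISO C before C23:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical opaque type: callers only ever hold a pointer to it. */
typedef struct net_conn net_conn;

struct net_conn {
    char *host;   /* private copy, owned by the library */
    int   port;
};

/* Portable stand-in for strdup(). */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Creator: returns NULL if any allocation fails. */
net_conn *net_conn_create(const char *host, int port)
{
    net_conn *c;

    assert(host != NULL);
    c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    c->host = copy_string(host);   /* never keep the caller's pointer */
    if (c->host == NULL) {
        free(c);
        return NULL;
    }
    c->port = port;
    return c;
}

/* Accessor: the returned string belongs to the library. */
const char *net_conn_host(const net_conn *c)
{
    assert(c != NULL);
    return c->host;
}

/* Destroyer: releases everything the creator allocated. */
void net_conn_destroy(net_conn *c)
{
    if (c == NULL)
        return;
    free(c->host);
    free(c);
}
```

Because the library holds its own copy, the caller can overwrite or free the original host buffer right after net_conn_create() returns without affecting the connection object.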

x
Friday, February 20, 2004

This reply isn't meant to be offensive, so please don't take offense. You don't seem to have much programming experience, so I'd suggest getting a good learning book. You can look at a good API, but that won't tell you why things are done how they are. A book will often say "use technique X when you want Y".

In response to your specific question, as I understand it:

- You can use pointers without malloc(). Look up the unary & operator. This is very often (read: usually) a good idea to use when you're passing around structs.
- Use dynamic allocation when you don't know how much of whatever you'll need.
- Don't have an API keep references to the user's data, as he might free it without your knowing.

I apologize that I can't recommend an API or book, but it's been quite a while since I've done C development, and I never got a book. Hopefully someone else can recommend one here.

Mike Swieton
Friday, February 20, 2004

x, there's no need to pass around void pointers for your struct. You can have this:

mycode.h:

typedef struct _MyStruct MyStruct;

MyStruct *my_struct_new(void);

void my_struct_destroy(MyStruct *s);

and in the implementation file, mycode.c, define what a
MyStruct is:

struct _MyStruct {
  ...
};

As long as you pass *pointers* to MyStruct around in your API, you don't have to worry about defining it; it's an opaque type. This lets you use C's type-checking.

scruffie
Friday, February 20, 2004

To me, generic pointers (void*) indicate either user data (e.g. for a callback) or a function pointer (e.g. for a callback). I don't like seeing void*s elsewhere in the code; it throws me off. I don't consider it idiomatic. Now, if the library writer wants to typedef a custom type to void*, that's fine, but I don't ever want to riddle my code with void*'s that are actually structs. I feel it's misleading.

Mike Swieton
Friday, February 20, 2004

Example of not-so-good design I just saw:

device_get_id(pBus, MEM_BASE, &id);
device_init_reg(pBus, MEM_BASE, addr, data);
device_start(pBus, MEM_BASE, 1);
...

The bad thing here is that you always write (pBus, MEM_BASE, ...). You should rather have a new pDev struct that includes both the pBus and MEM_BASE parameters.

I think it's easier to say what's not good than what's good.
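
One possible shape for the bundled version, as a sketch only. The bus_t register file, the device_t layout, device_open() and device_read_reg() are all invented here for illustration; the original code's real types are not shown in the post:

```c
#include <assert.h>
#include <stddef.h>

#define BUS_REG_COUNT 16

/* Hypothetical stand-in for whatever pBus pointed at. */
typedef struct bus {
    unsigned regs[BUS_REG_COUNT];   /* pretend register file */
} bus_t;

/* The device handle carries the two values every call used to repeat. */
typedef struct device {
    bus_t   *bus;
    unsigned mem_base;
} device_t;

/* Bind a device to its bus and base address once, up front. */
void device_open(device_t *dev, bus_t *bus, unsigned mem_base)
{
    assert(dev != NULL && bus != NULL);
    dev->bus = bus;
    dev->mem_base = mem_base;
}

/* Write a register relative to the device's base address. */
void device_init_reg(device_t *dev, unsigned addr, unsigned data)
{
    assert(dev != NULL);
    assert(dev->mem_base + addr < BUS_REG_COUNT);
    dev->bus->regs[dev->mem_base + addr] = data;
}

/* Read a register relative to the device's base address. */
unsigned device_read_reg(const device_t *dev, unsigned addr)
{
    assert(dev != NULL);
    assert(dev->mem_base + addr < BUS_REG_COUNT);
    return dev->bus->regs[dev->mem_base + addr];
}
```

After device_open(), every later call takes just the device pointer, so the bus and base address cannot get out of sync between calls.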

droopycom
Friday, February 20, 2004

"My question is a rather simple one about design. Some libraries I've seen have functions that dynamically allocate memory for a struct, fill in the struct's members, then return a pointer to the struct; others just create a temporary struct, fill in the struct's members, then return the contents of struct."

What do you mean by the second part exactly? It seems like you may be misunderstanding the code, or the code you are looking at has errors.  What is creating a temporary struct if not dynamically allocating it?

You can dynamically allocate (which you mentioned), statically allocate (which means the API is not really creating it), or allocate on the stack ("automatically"), but the latter does not create something suitable for returning to the caller.

As for another style issue, I prefer to have the caller allocate when he is responsible for freeing.  Returning pointers to malloc'd memory and asking the caller to free is more prone to create memory leaks.  If you make the caller allocate themselves, they are more likely to remember to free it.  Mallocing and freeing in the same scope is a nice thing to strive for.

The downside of having the caller allocate is that you have to provide them with the struct definition so they know how much to allocate.  Keeping structs opaque is very useful as someone mentioned.

Roose
Friday, February 20, 2004

A third option is to have the caller pass in an empty struct which your API fills in.  This puts the burden of managing the struct's memory upon the caller.  That has pros and cons depending on whether the struct contains pointers to other allocated memory, etc, etc, etc.

We can name techniques all day here, but ultimately, the answer is going to be "it depends". 

Could you make your example more specific?  State a problem and a couple of possible designs and we can critique them.

Eric Lippert
Friday, February 20, 2004

I tend to do lots of "public" API's, and rarely want even
the internal name of a struct to be public.  I'll do something
for readability like

typedef void APIStruct1;

APIStruct1 *APIStruct1Creator(...);

bool
APIDoSomething(APIStruct1 *handle, ...);

but don't like unresolved, but named, structs.  This
enforces a rigorous separation of my private API stuff
from the API user's code.  After lots of experimenting
with different ways of handling this sort of thing over
the years, my bias is toward keeping the internals of
the API as hidden as the language will let me.

The more invisible the internals, the more unlikely the
user will do something to cause a trouble call :)

x
Friday, February 20, 2004

Mike Swieton,

A void * shouldn't be used for function pointers - it
isn't portable, as I discovered to my annoyance on
a recent project.  The Moto HCS12 has 3 byte function
pointers and 2 byte data pointers, and some code we
were porting to it used void * as a generic "unknown
thing of data pointer size", including function pointers.

We had to use generic function pointers for the things
used for callbacks, etc to make the code work.
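
For what it's worth, a sketch of the generic function pointer approach. The slot structure and names are made up; the part that comes from the C standard is that a function pointer can be converted to another function pointer type and back, while converting it to void * is not guaranteed to work:

```c
#include <assert.h>
#include <stddef.h>

/* Generic function pointer type used only for storage. */
typedef void (*generic_fn)(void);

/* The real signature of the callbacks in this example. */
typedef int (*int_callback)(int);

struct callback_slot {
    generic_fn fn;     /* code pointer, kept as a function pointer */
    void      *user;   /* user data stays a data pointer */
};

/* Example callback. */
int twice(int x)
{
    return 2 * x;
}

/* Store the callback in generic form. */
void slot_set(struct callback_slot *slot, int_callback cb, void *user)
{
    assert(slot != NULL);
    slot->fn = (generic_fn)cb;
    slot->user = user;
}

/* Convert back to the real type before calling; calling through the
   wrong function type would be undefined behaviour. */
int slot_call(const struct callback_slot *slot, int arg)
{
    int_callback cb;

    assert(slot != NULL && slot->fn != NULL);
    cb = (int_callback)slot->fn;
    return cb(arg);
}
```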

x
Friday, February 20, 2004

One thing to watch out for in API design if you are returning pointers to structs that the library allocated is how the user is supposed to free them.

Generally it is a bad idea to tell the user to simply call free() on a struct that your library has malloc()'d.

The problem arises when the version of the C runtime your library is using is different to the one the user is using.  This may happen if, for example, your library is in a dynamically linked library with a statically linked C runtime.

The correct design is to always provide a FreeMyStruct(MyStruct*) API that the user should use.

On Windows you can fudge these issues a little if you use (and tell the user to use) an OS level allocator API e.g. CoTaskMemAlloc /  CoTaskMemFree.

Rob Walker
Friday, February 20, 2004

Do NOT pass a pointer (or reference) to the structure directly.  Do NOT expose your actual structure to the user.  Use a void pointer- but hide it as a handle- essentially what the Windows API does (i.e. HWND, HINSTANCE, etc).  Do NOT leave memory allocation up to the user.  Have a Create and Destroy function (or Open/Close) that returns the HANDLE and all of your API functions would take this HANDLE.  For example, you would end up with the following prototypes for your API.

>>>>
MyAPI.h
  typedef void*  HMYAPI;

  // creates a handle and returns
  HRESULT MyApiCreate( int parm1, int parm2, HMYAPI *phMyApi);

  // destroys the handle
  HRESULT MyApiDestroy( HMYAPI hMyApi );

  HRESULT MyApiDoSomething( HMYAPI hMyApi, ... );
  HRESULT MyApiDoSomethingElse( HMYAPI hMyApi, ... );
>>>

This has the following advantages:

  1) You can use this in many different environments with relative ease (in VB and the like the handle is nothing more than a long value). 
  2) You can change the structure in the future and it will not affect anyone using it.
  3) You could also use a class instead of the structure and write a C++ interface (the handle would point to the class instead of a struct).  You would then have a C++ interface to your library and a C/VB interface to your library thru the API.

There are other, more advanced benefits to doing this, depending on your application, since you can hide all kinds of extra data in your handle (the user is none the wiser), like sharing resources, thread safety for shared components, etc...

Mike

MikeG
Friday, February 20, 2004

Some libraries allow the caller to pass a pointer to an uninitialized struct to their foobar_init() function. This allows the caller to embed the struct in a larger struct or allocate the struct using some special memory allocator. In some systems, especially embedded programming, the caller wants to use a different memory allocator than just malloc() and free().
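
A small sketch of that init-in-place style. The foobar fields, the session struct and foobar_record_send() are invented for illustration; the point is only that foobar_init() never allocates, so the caller decides where the storage lives:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical public struct: the caller must see its size to embed it. */
typedef struct foobar {
    int    fd;
    size_t bytes_sent;
} foobar;

/* Initialises storage the caller owns; performs no allocation itself.
   Returns 0 on success, -1 on bad arguments. */
int foobar_init(foobar *fb, int fd)
{
    if (fb == NULL || fd < 0)
        return -1;
    memset(fb, 0, sizeof *fb);
    fb->fd = fd;
    return 0;
}

/* Example operation on an initialised foobar. */
void foobar_record_send(foobar *fb, size_t n)
{
    assert(fb != NULL);
    fb->bytes_sent += n;
}

/* The caller is free to embed the struct in a larger one, and to put
   that larger struct in a static pool, on the stack, or in memory from
   a custom allocator. */
typedef struct session {
    char   name[32];
    foobar link;
} session;
```

The cost, as mentioned earlier in the thread, is that the struct layout becomes part of the public API.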

For some good library design tips in C, check out David Hanson's book "C Interfaces and Implementations : Techniques for Creating Reusable Software".

runtime
Friday, February 20, 2004

Thanks for all of the suggestions.

For Roose, what I meant by the second part was something like this:

thing_t thing = new_thing("one", "two", "three");
...
thing_t new_thing(char *param1, char *param2, char *param3)
{
    thing_t thing;
    thing.param1 = param1;
    thing.param2 = param2;
    thing.param3 = param3;
    return thing;
}

Instead of this:

thing_t *thing = new_thing("one", "two", "three");
...
thing_t *new_thing(char *param1, char *param2, char *param3)
{
    thing_t *thing = (thing_t *) malloc(sizeof(thing_t));
    if (thing == NULL)
        exception("Couldn't allocate memory.");

    thing->param1 = param1;
    thing->param2 = param2;
    thing->param3 = param3;

    return thing;
}

cprogrammer
Friday, February 20, 2004

Your second and third examples will both result in
dangerous situations.  The second one is returning
a pointer to what will be stack garbage and will fail
hard.  The third one is setting up pointers to user-provided
data and is likely to result in hard-to-find and bizarre
errors, since the user will often pass in their own stacks.
You may want to check out a C book to see why this
is bad...

I'm assuming the first is COPYING the  three
arguments - if so, it may be ok, even though returning a
visible structure is not a wise thing in an API.

x
Friday, February 20, 2004

> I'm an aspiring C developer working on my first substantial C project: a network library

I'm not trying to be funny or negative, but how appropriate is it that a novice C developer should be developing a library for their first substantial project?

If you develop a library, many other developers depend on it.

Assuming this is a business-driven exercise, rather than a purely intellectual challenge, I would have thought it more appropriate that the novice sets out by developing applications using other people's libraries.

If you don't know your way round C, how easy is it to judge whether you are providing the appropriate tools for other developers?

S. Tanna
Friday, February 20, 2004

Right, that is what I thought... you should never do this:

thing_t f()
{
  thing_t thing;
  ...
  return thing;
}

This may happen to work in some cases where the caller uses it before calling any other functions.  However you're returning technically garbage data, and any more complicated usage will result in thing getting trashed with random data.  thing does not exist once you have exited the function.

If you are seeing this, then you should fix it right away.

Roose
Friday, February 20, 2004

Roose, you might want to rethink that.

Go
Friday, February 20, 2004

OK - I didn't notice that the func is actually returning
a struct, not a struct pointer.  Returning a struct in
this fashion is technically legal, but generally not good
practice in an API (or anywhere else, IMO) since there's
an implicit structure copy being done when you do the
return - and since structure passing is not always done
very well, compilers may not deal with this perfectly and
can cause bugs.  To have best predictability, known
performance, as well as portability in environments where
you may have limited stack space, structures should be
passed between functions as pointers only (or as
HANDLE's or other things which resolve to pointers).

<soapbox>
One of the nice things about C is that you have exact
control over what's going on at all times, and things
can't happen "under the covers" in the fashion that
they do in C++ or other OO languages.  This is a good
thing IMO in the environments where C is still widely
used such as embedded systems and OS kernels or
internals, where the benefits of abstraction tend to
be less than the benefits of knowing exactly what the
code is doing.  (If you want abstraction, that's what
API's are for :)

For this reason, I don't like structure assignment and
structures as parameters and returns - the mechanisms
for them are too "under the covers" for me to like.
</soapbox>

x
Friday, February 20, 2004

Maybe then it would be practical to have a linked list
of allocated structures and utility function to count them
so at exit he could check if the list is empty or if he
leaves the OS to free used space on exit, at least
he can see if the counter is within expected range.

On the other side, if there's going to be a huge number of
these structures, he may forget the list (speed penalty) but
then he could keep a counter and change it at every
alloc/free (or use the list only in debug mode).
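
The counter variant could look roughly like this. The item type and function names are made up, and this sketch is not thread-safe; a real library would need to protect the counter or make it per-context:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct item {
    int value;
} item;

/* Number of items created but not yet destroyed. */
static long live_items = 0;

/* Allocate an item and count it; returns NULL on allocation failure. */
item *item_create(int value)
{
    item *it = malloc(sizeof *it);
    if (it == NULL)
        return NULL;
    it->value = value;
    live_items++;
    return it;
}

/* Release an item and uncount it; NULL is ignored, like free(). */
void item_destroy(item *it)
{
    if (it == NULL)
        return;
    assert(live_items > 0);
    live_items--;
    free(it);
}

/* Lets the caller check for leaks, for example just before exit. */
long item_live_count(void)
{
    return live_items;
}
```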

Personally, I like doing malloc only at initial set-up or within
some library because that way it's really hard to forget
releasing something.

After that, he needs to make clear if users of that library
would access elements directly (s->elemX = 255)
or through set/get functions (lib_setElemX (255)) and
then stick to it forever.

VPC
Friday, February 20, 2004

Oh whoops am I wrong?  I guess it does return by value.  I never use that, and I never see it used, but I guess you are right.

What's bad is returning a pointer to anything on the stack.

But anyway, I would say it is not idiomatic C to return structs from functions, for efficiency I suppose, but efficiency isn't always the main concern.
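
To make the distinction concrete, here is a tiny sketch using a made-up point type. Returning the struct itself copies it out to the caller, so it is well defined; the broken pattern is returning the address of the local, which is shown only inside a comment:

```c
/* Hypothetical small struct, cheap enough to pass by value. */
typedef struct point {
    int x;
    int y;
} point;

/* Fine: the struct's value is copied to the caller before the local
   variable p goes away. */
point point_make(int x, int y)
{
    point p;
    p.x = x;
    p.y = y;
    return p;
}

/* The broken variant, left as a comment so that nothing here invokes
   undefined behaviour:

   point *point_make_bad(int x, int y)
   {
       point p;
       p.x = x;
       p.y = y;
       return &p;    (dangling: p's lifetime ends at the return)
   }
*/
```

The caller of point_make() gets an independent copy, so later function calls cannot trash it.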

Roose
Friday, February 20, 2004

Passing structures by value has some legitimate uses. A common use might be in an API for handling say 64-bit integers.

For compilers that support 64-bit integers we might have

  typedef unsigned long long U64;

and for compilers that don't have such support:

  typedef struct {
    unsigned long lo;
    unsigned long hi;   
  } U64;

The API would be the same regardless of the underlying implementation, for example:

  U64 Do64BitMath(U64 a, U64 b);

However for larger or more complicated structures passing by value is not a good idea due to the overheads.
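
As an illustration of the struct fallback, here is a sketch of one operation, addition. U64Add is a made-up name, and the code assumes each half holds a 32-bit quantity; the masking keeps that true on platforms where unsigned long is wider than 32 bits:

```c
/* Fallback representation for compilers without a 64-bit integer. */
typedef struct {
    unsigned long lo;   /* low 32 bits */
    unsigned long hi;   /* high 32 bits */
} U64;

#define U64_HALF_MASK 0xFFFFFFFFUL

/* Hypothetical operation: 64-bit addition that wraps on overflow.
   Both arguments and the result are passed by value. */
U64 U64Add(U64 a, U64 b)
{
    U64 r;
    unsigned long carry;

    r.lo = (a.lo + b.lo) & U64_HALF_MASK;
    /* The low half carried if the truncated sum is below an operand. */
    carry = (r.lo < a.lo) ? 1UL : 0UL;
    r.hi = (a.hi + b.hi + carry) & U64_HALF_MASK;
    return r;
}
```

A struct of two words like this is about the size where passing by value stays cheap.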

Go
Friday, February 20, 2004

What I do is construct an opaque type around a typedef and a struct:

typedef struct foo { ... } *Foo;

I then define functions to create and destroy the structures:

int FooNew(Foo *pfoo, ... );
int FooFree(Foo *pfoo);

All the rest of the API dealing with Foo dispense with the pointer designation:

int FooFrob(Foo foo, int blah);
int FooManipulate(Foo foo, char *frob);

So the end user of the library does stuff like:

{
  Foo myfoo;

  if (FooNew(&myfoo) == ERROR) { ... }
  FooFrob(myfoo,34);
  FooManipulate(myfoo,"Hello");
  FooFree(&myfoo);
}

I've used this technique to implement an API with two different implementations.  The first used a Foo that was a typedef'd structure and was actually a pointer; in another implementation of the same API, the Foo was actually an integer (implementation detail: it was an index into an array of structures maintained by the library).  None of the code calling the API had to change.
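
Here is a rough sketch of what the integer-index flavour might look like. This is not the actual library code: the table size, the FOO_OK/FOO_ERROR constants, FooTotal() and what FooFrob() does with its argument are all invented for illustration:

```c
#include <stddef.h>

#define FOO_OK      0
#define FOO_ERROR (-1)
#define FOO_MAX     8

/* In this implementation the public type is just an index. */
typedef int Foo;

struct foo_slot {
    int in_use;
    int total;
};

/* Table of structures maintained privately by the library. */
static struct foo_slot foo_table[FOO_MAX];

/* A handle is valid only if it indexes a slot that is in use. */
static int foo_valid(Foo foo)
{
    return foo >= 0 && foo < FOO_MAX && foo_table[foo].in_use;
}

/* Claim the first free slot and hand its index back to the caller. */
int FooNew(Foo *pfoo)
{
    int i;

    if (pfoo == NULL)
        return FOO_ERROR;
    for (i = 0; i < FOO_MAX; i++) {
        if (!foo_table[i].in_use) {
            foo_table[i].in_use = 1;
            foo_table[i].total = 0;
            *pfoo = i;
            return FOO_OK;
        }
    }
    return FOO_ERROR;   /* table full */
}

/* Example operation: accumulate blah into the slot's total. */
int FooFrob(Foo foo, int blah)
{
    if (!foo_valid(foo))
        return FOO_ERROR;
    foo_table[foo].total += blah;
    return FOO_OK;
}

/* Example accessor: copy the slot's total out to the caller. */
int FooTotal(Foo foo, int *out)
{
    if (!foo_valid(foo) || out == NULL)
        return FOO_ERROR;
    *out = foo_table[foo].total;
    return FOO_OK;
}

/* Release the slot and invalidate the caller's handle. */
int FooFree(Foo *pfoo)
{
    if (pfoo == NULL || !foo_valid(*pfoo))
        return FOO_ERROR;
    foo_table[*pfoo].in_use = 0;
    *pfoo = -1;   /* a stale handle now fails validation */
    return FOO_OK;
}
```

Calling code written against Foo, FooNew(), FooFrob() and FooFree() looks the same whether Foo is a pointer or an index, which is the property described above.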

Then there's the issue of what functions to include in an API.  I tend to favor small functions that conceptually do *one* thing with as few parameters as possible.  Makes testing easier and allows a more or less Lego-like approach to writing code (one library I wrote handled HTTP from the client side.  I had one function that would establish the connection and obtain the document, but *not* follow redirects.  Another function, using the first, would make a connection and follow any redirects.  Most of the time I would call the second function, but there were cases where it was nice to have the lower level function available).

Sean Conner
Saturday, February 21, 2004
