Fog Creek Software Discussion Board

"C" 32 bit short?

Programming in ANSI C

Here's the situation:
I don't know where my code will be compiled/executed, but I do know that the compiler used will be ANSI compliant.
The program will write out data records to a file.
These data records will be read later on a system where a short is 16 bits and int and long are both 32 bits.

Here are my questions:
Can I use a type of short in the struct in my program, or could there be a possibility that the program will be compiled on some system where short is 32 bits (does such a system exist currently)?

If I need to expect the possibility of a 32 bit short or larger (in the future), then how can I guarantee that the value stored in the short will be written to the file as 16 bits?

Danny Hamilton
Friday, October 3, 2003

A 32 bit short is standard compliant.

The only size guarantee is char <= short <= int <= long (plus the minimum ranges in <limits.h>: char at least 8 bits, short and int at least 16, long at least 32).

You could always pull out the 16 least significant bits with &.
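
For instance (a minimal sketch, using an unsigned type so the masking is well defined):

unsigned long x = 0x12345678UL;
unsigned long low16 = x & 0xFFFFUL;  /* the 16 least significant bits */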

The <limits.h> header is actually part of the ANSI standard (it's not Microsoft-specific); it gives you SHRT_MAX and friends to test against.

Anonymous
Friday, October 3, 2003

I've found that the best practice is not to write export files in binary format.

Use a simple ASCII format, such as fixed-length or comma-separated values; or even XML if you must, and your data will be usable in any combination of processor, operating system or language.

HeWhoMustBeConfused
Friday, October 3, 2003

Yes, I knew that a 32 bit short was standard compliant; that's why I asked the question.  What I was wondering is:

Does such a thing actually exist right now?

By the way I like the idea of pulling out the 16 least significant bits.

Not certain exactly how to do that, though:

say for instance on a system with 32 bit short

short x;
x = 15;

What type do I declare to hold the 16 bit value, and how do I set it equal to the 16 least significant bits of x?

-  Danny

Danny Hamilton
Friday, October 3, 2003


In addition to the width problem, there's the byte order problem.

With the width problem, if you know what systems the program will be compiled on, then you could typedef a name for your app's 32bit integer. This could then be defined as the appropriate underlying type on each system.

You could do something similar with the byte order, but it would be more complex (i.e. write out the integers as a stream of bytes in a standard order, then read the integers back as a byte stream and assemble the bytes into integers in the correct order).

Or, use ASCII.
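
A sketch of the typedef idea (the header name and platform macro here are hypothetical; you pick the right underlying type per target):

/* app_types.h -- one branch per supported platform */
#ifdef PLATFORM_LONG_IS_32   /* a target where long is 32 bits */
typedef long app_int32;
#else                        /* a target where int is 32 bits */
typedef int app_int32;
#endif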

njkayaker
Friday, October 3, 2003

HeWhoMustBeConfused,

I agree, and have made use of that concept in this program.

My program receives a data stream that I get to define, then acts as a filter to write out record formats that already exist on a legacy system.  The file is then transferred to the legacy system for use.  So I defined my input data stream as pipe-delimited; but since my program needs to be able to run on any system with an ANSI compliant compiler, and I need to write out records that will be usable on the legacy system, you can see my dilemma.

-  Danny

Danny Hamilton
Friday, October 3, 2003

A 32bit short is not standard compliant.

Only "short <= int <= long" is.

njkayaker
Friday, October 3, 2003

njkayaker,

Oh my gosh!  I hadn't even thought about the byte order issue!

I'll have to think that over.  I'm a bit confused on your answer to the width issue though.

As you can see from my last post, due to legacy issues I can't use ASCII.

typedef a name for my app's 32bit integer?

you mean like:

typedef short short32;

?

OK? Then you suggest defining it as the appropriate underlying type?

How do you mean?  So I've got this 32bit short defined as type short32 in my app and I need to write out a 16 bit short to a file...  Not sure what to do here.

-  Danny

Danny Hamilton
Friday, October 3, 2003

If you must use binary, then your best bet is to externalize your shorts using code along the lines of:

short s;
unsigned char c[2];

c[0] = (unsigned char)(s >> 8);   /* high-order byte first */
c[1] = (unsigned char)(s & 0xFF); /* then the low-order byte */

fwrite(c, 1, 2, fp);

Slow, but it will work unless chars are not 8 bits wide.  But in that case you're stuck anyway, since you have no 8-bit unit to build on.

Make sure to get this code right... otherwise you'll be crapping in your shorts. :-)

David Jones
Friday, October 3, 2003

njkayaker,

Not sure what you mean by 32bit short is not standard compliant?

As long as a 32 bit short still satisfies short <= int <= long,

a 32 bit short is not "not compliant"... or rather, it is compliant.

No?

-  Danny

Danny Hamilton
Friday, October 3, 2003

The common way to deal with this problem is:

1. Don't store binary data. Everybody's said it, and I'm gonna say it too. Just don't do it.

2. If you disregard #1, then stop using the built-in types directly. Use either an automatic or a manual system in which you define the endianness and the size of types through a header file, then use only the "guaranteed proper size" types. The autoconf system that's prevalent on Unix is one way to do it.
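
A minimal sketch of the manual flavor (the names and values here are hypothetical, and would be edited per target when porting):

/* platform_config.h */
#define APP_BIG_ENDIAN 0         /* set to 1 on big-endian targets */
typedef unsigned short app_u16;  /* a type that is exactly 16 bits on this target */
typedef unsigned long  app_u32;  /* a type that is exactly 32 bits on this target */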

Brad Wilson (dotnetguy.techieswithcats.com)
Friday, October 3, 2003

David,

You're a godsend!  I believe that I can expect char to be 8 bits (since ANSI defines it as 1 byte).  Speed isn't an issue here, just portability and reliability.

Now I suppose I still need to be concerned about byte order?  That's really going to make a mess of things, isn't it?

-  Danny

Danny Hamilton
Friday, October 3, 2003

Brad,

I agree fully with #1, which is why I defined my input stream to be pipe-delimited.  However, due to legacy system issues, I'm bound to #2 for my output.

If I read #2 correctly, you suggest using header files to store functions for re-arranging the byte order if necessary, and then including instructions for the user that say:

If your system uses ??? byte order, then compile with header file x.h otherwise compile with header file y.h

Is that what you mean?  I'd rather not do that, but it may be the only way to handle the byte order issue.

-  Danny

Danny Hamilton
Friday, October 3, 2003

(Read it backwards.)

Basically, the point I was trying to make is that one can't assume that a particular width is associated with a given type.

njkayaker
Friday, October 3, 2003

If you are using a compiler that implements the 1999 C standard (ISO/IEC 9899:1999), the header file stdint.h will contain definitions of fixed-sized integers intN_t and uintN_t, where N is the bit width, for example, uint16_t. Generally, you can expect N values of 8, 16, and 32 to exist for a 32 bit target environment, and 64 should exist in most cases as well. See section 7.18.1.1 of the standard for details.
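
For instance, a quick sketch assuming a C99 <stdint.h> is available:

#include <stdint.h>

uint16_t record_length;  /* exactly 16 bits, unsigned */
int32_t  record_count;   /* exactly 32 bits, signed */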

You can check for byte order problems as you would with Unicode UTF-16 encoding: write a known multi-byte integer value at the beginning of the file that will show you whether or not a byte reversal has occurred. If byte reversal has occurred, you have to perform byte-level operations on the data to fix it.
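
A minimal sketch of the marker idea (fp is an already-opened FILE *; the value 0xFEFF follows the usual byte-order-mark convention):

#include <stdint.h>

uint16_t marker = 0xFEFF;
fwrite(&marker, sizeof marker, 1, fp);  /* a reader that sees 0xFFFE knows the bytes are reversed */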

Writing the data as text may be a solution. If you choose to write data as text, and your data includes text strings, be careful that the file cannot be corrupted by the contents of the text strings. For example, you might escape control characters, in a manner similar to XML. If you use a text encoding and your data includes text strings, you might want to write the file using Unicode, so your file format does not have to change in the future if the underlying application supports Unicode. The Unicode UTF-8 encoding is efficient for characters that can be represented in 8 bits.
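
A sketch of the escaping idea, XML-style (purely illustrative):

#include <stdio.h>

/* Write ch to f, escaping control characters as numeric character references. */
void put_escaped(FILE *f, unsigned char ch)
{
    if (ch < 0x20 && ch != '\n' && ch != '\t')
        fprintf(f, "&#%u;", (unsigned)ch);
    else
        fputc(ch, f);
}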

Dan Brown
Friday, October 3, 2003

njkayaker,

No problem.  That point is the reason for my question in the first place.

Nobody has really answered my first question though.

I know that you CAN have a 32 bit short and still be ANSI compliant, but am I getting overly concerned about something that is only possible but doesn't actually exist (or isn't likely to any time soon)?

-  Danny

Danny Hamilton
Friday, October 3, 2003

Dan,

As for using text, or placing an int at the top of the file: they are excellent ideas, but if you look back through this thread you'll find that, due to legacy system issues, they won't work well in my situation.

Although I suppose it would work if I write the file with the known value at the top, then check the value, then re-write the entire file without the known value at the top (and with byte reversal if needed).

I don't know if I can count on the 1999 standard, but it's an excellent idea.  I'll check with requirements to see if it's an option.

-  Danny

Danny Hamilton
Friday, October 3, 2003

As everyone has said, this can easily be solved by typedefs. Use the standard int16_t, uint16_t, int32_t, uint32_t, and friends. And if those don't exist on one of your target platforms, then use typedefs to create THOSE typedefs.

Actually, I would recommend insulating your code even further with something like the following. If you fear the implementation details of "short" in your code, then you should also fear the implementation detail of "uint16_t" in your code.

/* some header file */

#ifdef SUCKY_COMPILER
typedef unsigned short uint16_t;
#endif /* SUCKY_COMPILER */

typedef uint16_t my_record_length_t;

runtime
Friday, October 3, 2003

It depends a lot. I've seen systems where everything is 24 bits (DSP processors) because that's the only way to make ptr++ work (actually it isn't, but the mess folks got themselves into on either Crays or CDC Cybers proves that the alternative is generally worse).
It's likely that on 64 bit systems you may come across a 32 bit short: the compiler writer might have decided that for compatibility reasons an int should be the same size as a pointer, and therefore a short should be 32 bits.
In these days of ANSI C it's unlikely. Personally I'd put some typedefs in and use those.  The Microsoft documentation for the AMD64 compiler indicates that short is 16 bits.

Peter Ibbotson
Friday, October 3, 2003

The other obvious solution is to NOT use C. Use something more portable, like Python, and then (as mentioned above) use a non-binary file format, such as XML.

runtime
Friday, October 3, 2003

I've worked on a system (using a TI DSP processor) where char, short, int and long were ALL 32 bits; the CPU only natively handled 32-bit chunks, nothing shorter.

So this is a potentially real problem.

The solution is as others have mentioned, your friendly neighborhood typedef.

Chris Tavares
Friday, October 3, 2003

runtime,

I don't get it.

if I:

typedef unsigned short uint16_t;

on a system with 32bit short, doesn't that just give me a 32bit data type called uint16_t?

How will this help me write out a 16 bit short in the file?

If I can assume that uint16_t will already be available as a 16 bit int, then the:

typedef uint16_t my_record_length_t;

sort of makes sense (although it seems I could just as easily use the uint16_t type in place of the my_record_length_t type).  Depending on how the "my_record_length_t" type was defined, it MAY make the code more readable, but I don't think it would buy me anything in functionality.

Several people have suggested using typedef to create a 16 bit int if it doesn't exist.  What nobody has told me yet is how to create such a type on a system that has 32bit short, or how to create such a type in a way that I will end up with a 16 bit short both on systems with 16bit short and 32 bit short.

-  Danny

Danny Hamilton
Friday, October 3, 2003

Chris,

A 32 bit char?  That would really make a mess of my program.  I'm pretty sure that all the way back to the first ANSI C (1989), an ANSI compliant C compiler has had to treat a char as one byte (I hope I'm correct on this).

Like many others you've suggested using typedef, but I still don't know how to create a 16 bit type with typedef on a system where char = 8 bits and short = 32 bits.

-  Danny

Danny Hamilton
Friday, October 3, 2003

Peter,

Again someone suggests using typedef, but doesn't explain how.

Can you give me an example of how to reliably create a 16 bit type with typedef?

Would I do something like:

typedef struct uint16_t {
  char myfield[2];
} UINT16_T;

???

Then on a system with 32bit short can I:

short x;
UINT16_T y;

x = 15;
y=(UINT16_T)x;

???

My gut feeling tells me this won't work, but I haven't tried it yet.

-  Danny

Danny Hamilton
Friday, October 3, 2003

Hmm... that's your problem: on DSP chips a byte is NOT 8 bits. The online ANSI standard I found says this about a byte (the IEEE and ISO tend to use "octet" for 8 bits; dunno about ANSI, but I would guess they're the same):

3.4
      [#1] byte
      addressable unit of data storage large enough to hold any member of the basic character set of the execution environment

      [#2] NOTE 1  It is possible to express the address of each individual byte of an object uniquely.

      [#3] NOTE 2  A byte is composed of a contiguous sequence of bits, the number of which is implementation-defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit.

Don't sweat it too much. This is probably ONLY an issue for DSP chips (or other odd embedded environments). I would imagine that on a 64 bit compiler where short != 16 bits there'll be some kludges like "short short" to mean 16 bits.
In practice byte ordering is a bigger issue. Intel chips have instructions to flip the byte order; others, such as the Motorola 68K family, don't.

The simple solution is to put some typedefs in, then at runtime check that your int16 definition is still correct,
i.e. that (0xfffe + 1) == 0xffff (checks the type isn't 8 bits) and (0xffff + 1) == 0x0000 (checks wrapping at 16 bits; use an unsigned type, so the wrap is well defined).
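
For instance (a sketch; app_u16 stands for whatever typedef you chose, and it must be an unsigned type for the wrap test to be well defined):

#include <assert.h>

void check_app_u16(void)
{
    app_u16 x = 0xFFFF;
    assert(x == 0xFFFF);            /* the type holds at least 16 bits */
    assert((app_u16)(x + 1) == 0);  /* and wraps at exactly 16 bits */
}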

Given that a usable 16 bit math class can be created simply in C++ for any compiler with a 32 bit integer, I wouldn't worry about it too much.

Peter Ibbotson
Friday, October 3, 2003

Even if your file format assumes 16-bit integers (whether the compiler calls them chars, shorts, wchar_ts, or whatever) you probably do not need to use 16-bit integers in memory. At work, I maintain a program that parses variable-sized records from (file or network) byte streams. I use a technique similar to the following:

uint32_t value = read_next_byte(stream);
value <<= 8; /* shift value "one byte" to the left to make room for the next byte */

value |= read_next_byte(stream);

then my value should be something >= 0x00000000 and <= 0x0000ffff. Of course, you might have to worry about little- and big-endian machines! I will leave that as a problem for the reader. ;-) The endianness of the byte stream my program reads differs from the endianness of my target CPU, so I actually read the "low" byte first and the "high" byte second and swap them before shifting and or'ing them into the 32-bit integer.

runtime
Friday, October 3, 2003


You are correct that the above typedef example would be broken on a machine with a 32-bit short. My example was really just if your compiler did not support the ANSI uint16_t typedef. If your platform used 32-bit shorts, I would expect the compiler to have a non-portable way to get smaller integers.

(The preprocessor can't evaluate sizeof, so the test has to be done against <limits.h> instead.)

#include <limits.h>

#if USHRT_MAX == 0xFFFF          /* short is 16 bits */
    typedef unsigned short uint16_t;
#elif USHRT_MAX == 0xFFFFFFFF    /* short is 32 bits */
    typedef NONPORTABLE_16BIT_INT_TYPEDEF uint16_t;
#else
    #error YOU ARE HOSED!
#endif

runtime
Friday, October 3, 2003

Correct me if I'm wrong, but doesn't something like what David suggested take care of the byte order?  The bitwise operators don't care about memory representation.  For example, using VC++ on a Windows/Intel machine, the short 0xABCD is represented in memory as 0xCD 0xAB.  However, the operation ((x & 0xFF00) >> 8) has the value 0x00AB, not 0xCD.

Is there any reason why something like the following shouldn't work in a standard manner (things like error handling obviously missing for clarity)?

  void WriteShort16(FILE* f, short s)
  {
    unsigned char c[2] = { (unsigned char)(s >> 8), (unsigned char)(s & 0xFF) };
    fwrite(c, 1, 2, f);
  }

  short ReadShort16(FILE* f)
  {
    unsigned char c[2];
    fread(c, 1, 2, f);
    return (short)((c[0] << 8) | c[1]);
  }

A different approach that I've seen for handling byte order is to use the socket functions htons and ntohs.  These convert between "host" order and "network" order. 

On another note, text files are nice when feasible, but would you guys seriously consider writing something like a JPEG as text rather than binary?

SomeBody
Friday, October 3, 2003

runtime,

Your suggestion of reading 1 byte and left shifting into the variable matches David's suggestion, but in reverse, since he was suggesting right shifting before writing instead of left shifting before reading (by the way, it's the writing that I need to do).

It looks like that's the best suggestion I've got so far.  It may be an issue on machines where a char != 8 bits, but I'll deal with that later (one problem at a time).

Thanks to all the assistance I've received from the helpful people who've posted to this thread, it looks like I've got a workable solution to the issue of needing a 16 bit value on a machine with short = 32 bits or greater (or less).

So the next problem it looks like I'll need to tackle is byte order.  The legacy system reading the files with the 16bit shorts in them is running in MS DOS on an Intel 486.

So I'll need to make sure that when I write these values the byte order matches that system.  I suppose I can use right shift on a value and see what the new value is.  That will tell me which byte order the system is running where my program will run.

Then the program can reverse the byte order if needed based on the result?

I'll do some investigation as to whether I can count on my program running on systems where char is 8 bits.  If not, and if I can't come up with a good way to handle it on my own, I may be back asking for a good way to write a 16 bit value to a file where char != 8 bits.

-  Danny

Danny Hamilton
Friday, October 3, 2003

>>On another note, text files are nice when feasible, but would you guys seriously consider writing something like a JPEG as text rather than binary?

No; when you write out JPEG data to a file, you are already taking care of things like byte order.  The JPEG spec defines the representation that you have to use.  I'm more familiar with the TIFF spec, where the 1st and 2nd bytes specify the byte order (endianness) of the rest of the data (though for some reason, instead of calling it "big endian" or "little endian", they refer to it as 'I'ntel and 'M'otorola).  I believe someone suggested something similar.
The OP seems not to have an output spec that he's writing to, otherwise this whole thing wouldn't be an issue.  Instead, it sounds like he just wants to write code and hope it works cross-platform.  At least with a spec, you can reasonably write test code.

Brian
Friday, October 3, 2003

Brian,

Maybe I didn't explain well enough.

I know exactly what the output needs to look like; it's how to create consistent output no matter what platform my program is compiled on that I was struggling with.

I've got a data stream of pipe-delimited ASCII as input.

I've got to write out records that include several fields (some 1 byte ints, some 2 byte ints, some 4 byte ints, some character strings, even some packed BCD).  I've already got the program written, compiled, tested, and working on the legacy platform.

Now I've been told that they want to be able to compile the source on other platforms to generate the legacy files elsewhere and then move the files to the legacy system.  But at this time it's uncertain exactly where they will be compiling/using the program.  So it needs to be written/modified so that it will generate consistent output wherever they choose to use it.

There are a lot of "short cuts" or "kludges" that I could use to get things done, such as a bunch of #ifdef bigendian ... #endif blocks or #ifdef 32bitshort ... #endif, and then include some instructions that say, "If the machine has a 32 bit short, make sure to add #define 32bitshort to the program before compiling."

What I had hoped to do, since I have some time, and access to the wealth of shared knowledge available here, is to create a single portable source that will create consistent output no matter where it is used.

The best answer I've seen (and am disappointed I hadn't thought of it myself) is to use bit shifting to load an array of bytes with the various values I need, and then write out the array of bytes.

I'm especially disappointed with myself for not thinking of the bit shifting method, since I'm already doing something like that for the packed BCD.  I was just so focused on native data types that it didn't cross my mind.  As soon as David mentioned it, it was pretty clear that it would work, and unless someone has a better/simpler/faster idea, it's what I'll end up doing.
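
(For reference, the BCD packing I mean is along these lines -- a minimal sketch:)

/* Pack two decimal digits (0-9 each) into one byte of packed BCD. */
unsigned char pack_bcd(unsigned int tens, unsigned int units)
{
    return (unsigned char)(((tens & 0x0F) << 4) | (units & 0x0F));
}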

When byte order was first mentioned, I became a bit concerned that I hadn't thought of it as an issue, and this thread took a bit of a twist.  As SomeBody mentioned, since I know the byte order the output needs to be in, it shouldn't really be an issue.  The bit shifting operations work consistently no matter which byte order the machine uses natively, so I should easily be able to create a consistently ordered 16 bit int to write to the file.

Now the only remaining concern is systems where char != 8 bits!

-  Danny

Danny Hamilton
Friday, October 3, 2003

Would the htons(), htonl(), ntohs(), and ntohl() functions help you at all?

They should help at least with the byte order.
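
Something along these lines (a sketch; on POSIX systems the declarations live in <arpa/inet.h>, on Windows in <winsock2.h>):

#include <arpa/inet.h>

uint16_t host_val = 0x1234;
uint16_t net_val = htons(host_val);  /* host order -> big-endian "network" order */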

Happy to be working
Friday, October 3, 2003

Danny,

Just exactly what exotic systems do you expect this to compile on?  Pretty much all modern systems will have "stdint.h" and functions for converting integers to/from network byte order.

I don't think it's unreasonable to limit it to platforms with that level of support -- it should cover nearly all of them.

Almost Anonymous
Friday, October 3, 2003

Almost anonymous,

As I mentioned, I know the byte order of the output, so byte order shouldn't be a concern any how.  And I think David and "Somebody" came up with identically workable ideas to ensure 16bit int.

Unless there's a better way to handle it, I think that pretty much takes care of everything except:

If char != 8bits.

-  Danny

Danny Hamilton
Friday, October 3, 2003

Almost Anonymous,

As far as what systems it will compile on...

I know there is potential for SUN or Intel at a minimum.

As for OS...
SCO UNIX, SuSE Linux, MS DOS, and WindowsNT are all likely candidates.

The issue isn't so much where it will compile.  It would probably be safe to assume it will compile on an Intel box running either WindowsNT, MS DOS, SCO UNIX, or some flavor of Linux.

I could probably push back and ask for more specific requirements, and get it narrowed down to a few realistic potential systems it will run on.

However, since I'm not up against a wall as far as the deadline goes, and I have access to the wealth of knowledge available here, I figured it would be a good opportunity to:

a) learn something new

b) create something that will remain useful for an extended period of time

c) practice writing well-thought-out, fully portable (or as fully as possible) code.

d) create impressive code that handles not only likely situations but most possible situations.

e) avoid bringing questions to my employer to find answers when he is already very busy.

-  Danny

Danny Hamilton
Friday, October 3, 2003

We have had this problem in spades: we have legacy software that has migrated from 16-bit little-endian architecture (short = int = 16 bits, long = 32 bits) through 32-bit big-endian to 64-bit little-endian.

The only practical way to do this is to use symbolic types internally (we have WORD, DWORD, and so on... based on the original PL/M types), and create a set of routines for packing and unpacking structures into the legacy format. You create a hierarchy like this:

typedef struct {
    char *buf;
    size_t len;
    size_t cur;
} buffer_t;

typedef union {
    char c[DWORD_LEN];
    DWORD dw;
} dword_t;

BOOL read_dword(buffer_t *b, DWORD *dest)
{
    dword_t *wp;
    if(b->len - b->cur < DWORD_LEN)
        return FALSE;
    wp = (dword_t *)&b->buf[b->cur];
    *dest = legacy2dword(wp);
    b->cur += DWORD_LEN;
    return TRUE;
}

...

BOOL read_struct_foo(buffer_t *b, struct foo *dest)
{
    struct foo tmp;
    if (!read_dword(b, &tmp.first_dw))
        return FALSE;
    ...
    return TRUE;
}

And same for writing.

Then you define legacy2dword and dword2legacy in another include file that looks like this:

#ifdef LITTLE_ENDIAN
#define legacy2dword(dwp) ((dwp)->dw)
#else
#define legacy2dword(dwp) ((DWORD)(unsigned char)(dwp)->c[0]<<24 | (DWORD)(unsigned char)(dwp)->c[1]<<16 | (DWORD)(unsigned char)(dwp)->c[2]<<8 | (DWORD)(unsigned char)(dwp)->c[3])
#endif

And as someone else noted you may be able to use the network-byte-order macros htonl, htons, and so on to shortcut this.

Peter da Silva
Sunday, October 5, 2003

Whoops... in read_struct_foo I forgot

    *dest = tmp;

at the end. :)

You probably want to save and restore b->cur as well.

Peter da Silva
Sunday, October 5, 2003
