Fog Creek Software
Discussion Board




My pointer stupidity

I was trying to tokenize a space separated string to an array, and came uip with the following routine

char** MakeSpacedWords(char* string) {
    char ** words = new char*[1024];
    char *token;
    token = strtok(string, " ");
    int lgt = 0;
    while( token != NULL )
    {
        words[lgt] = new char[1024];
        sprintf(words[lgt], "%s", token);
        token = strtok(NULL, " ");
        lgt++;
    }
    delete[] token;
    return words;
}

After much core dumps and swearing i gave up and added a int pointer so that the function also returns the length of the generated array.
My question is: how can you "tell" that there are no more elements in a dynamically allocated array? What looping construct  must i use so that i don't get illegal memory operations. I also googled it but couldn't find a very clear explanation. The way i see it, if a pointer is not allocated, it should be ==NULL.
Thanks, Andrei

Andrei DRAGOMIR
Tuesday, July 20, 2004

Use the STL vector to allocate dynamic arrays in C++. It's MUCH easier.

MT Heart
Tuesday, July 20, 2004

All you need to do is add a null pointer terminator to your list of arrays.

Add the line words[lgt] = 0; after your loop.

I really hope you're just learning the language because that is one horribly ugly bit of code, not to mention absurdly free with the ol' memory.

Mr Jack
Tuesday, July 20, 2004

Yeah, I don't do C. I was just trying to add some features to some piece of code i use and it's all written like this, so I figured I'll write my piece in the same way. Besides from the obvious ugliness, what do you mean by 'free with the memory' ?

Andrei DRAGOMIR
Tuesday, July 20, 2004

beware the dreaded buffer overflow..

i like i
Tuesday, July 20, 2004

> The way i see it, if a pointer is not allocated, it should be ==NULL

The thing is that "char*" is a dumb type ... like "int", it doesn't have a constructor that automatically initializes its value.

If you write "int* int_array = new int[1024];" there's no guarantee that all the elements of the array are initialized to 0 (it may or may not be .... it's O/S and compiler-dependent ... and may be different in debug and release builds) ... and similarly thee's no guarantee that when you call "char** words = new char*[1024]" that all the pointers in the array are initialized to NULL.

It's an instance of the general rule "when you define a variable, don't assume that you know what its value is until after you have initialized it."

Christopher Wells
Tuesday, July 20, 2004

A Must Read OP !!

http://www.oreilly.com/catalog/opensources/book/appa.html

Hold On!!
Tuesday, July 20, 2004

Ignore the previous post.. meant for a different thread.

Hold On!!
Tuesday, July 20, 2004

Yeah, I remembered now... line upon line of memset calls.
Thanks for the explanations. I'll try not to write any more stupid code :).

Andrei DRAGOMIR
Tuesday, July 20, 2004

    while( token != NULL )
    {
        ...
    }
    delete[] token;

Not sure if delete of a NULL is an error, but certainly redundant.

Check C++ FAQ ( http://new-brunswick.net/workshop/c++/faq/ ) for a concise guide to C++

Honu
Tuesday, July 20, 2004

myArray = myString.split()

Oh wait, you're trying to do this in C, not Python. Posts like this remind me why I left C and never looked back.

Tom H
Tuesday, July 20, 2004

"Besides from the obvious ugliness, what do you mean by 'free with the memory' ?"

You allocate fixed arrays all the time (1024 of this, 1024 of that) where it's simply not necessary. You could find out how many words there are before allocating the (4k) array of pointers and only allocate what you need. You really, really should only allocate as much space as you for each word (how many 1023 letter words do you think you are likely to get?).

As it is you are taking a (say) 60 character, 12 word, string and taking up 16k to store it, where you need only allocate 113bytes (+ memory management overhead, of course).

That's free with the memory by anyones standards.

Mr Jack
Tuesday, July 20, 2004

This isn't C, it's C++ written as if it were C. If you're going to use C++, do it the C++ way. Use strings and vectors, don't mess with pointers and manual memory allocation. This kind of code will only get you in trouble.

sid6581
Tuesday, July 20, 2004

Can't you just go through the string replacing the spaces with zero bytes in place, and putting the pointers to the start of the tokens into a char ** array as you go?

Matt
Tuesday, July 20, 2004

Holy Crap!
What a wacky implementation!
Why sprintf?!
( okay I didnt compile the below, but its tighter )

// whats the most possible number of strings
int maxLen = strlen( inputString ) / 2 + 1;
char** tmpResult = malloc( maxLen * sizeof( char* ) );

// iterate the tokens, count for later.
int iToken = 0;
nextToken = strtok( inputString, " " );
while(  nextToken ) {
  tmpResult[ iToken++ ] == nextToken;
  nextToken = strtok( NULL, " " )
}

// copy to a actual size
char** minResult = malloc( (1+iToken) * sizeof( char* ) );
memcpy( minResult, tmpResult, iToken * sizeof( char*) );
free( tmpResult)

// mark the end with a 0
minResult[ iToken ] = 0;
return minResult;

Woah
Tuesday, July 20, 2004

Holy buffer overflows, Batman!

'How many 1023 letter words do you think you are likely to get?' One; my specially-crafted exploit that will allow me to 0wnz j00.

Bad code. Since you're using C++, I'll echo previous posters in saying use std::string and std::vector.

If you were doing C instead, here's some pointers:

1) use strdup instead of allocating memory and copying using sprintf

2) use realloc to resize the dynamic array when it's too small (doubling in size, say). You can shrink it at the end if you want.

Good code (and by extension, secure code) takes more effort.

scruffie
Tuesday, July 20, 2004

Well yes, but why not just do what Matt said? strtok even puts the nulls in the original string for you.

as
Wednesday, July 21, 2004

*  Recent Topics

*  Fog Creek Home