Fog Creek Software
Discussion Board




Remove duplicate from sorted array

How to remove duplicate from sorted array? You can use int counters, but you are not allowed to decalre another array.

Mark
Thursday, March 13, 2003

I am not sure if you are asking how to do this. But if you wanted to a solution I can provide 1.

Since it is a sorted array any duplicates would be next to each other. So start at the beginning and compare each element to it neighbor on its right. If there are no duplicates then you have made n-2 comparisons and you are done.

If you find a duplicate you are going to have a hole. Since it is an array an not a link list you will need to move every element down. (You could create a new array if that was allowed.) However you do not just want to start moving everything down, you can simply mark with a pointer or counter where the hole is and move on to the next comparison. When you get the next non-duplicate you move(copy) that element into the hole and increment your hole counter by 1.

So here is some C code that will do just that.

//Returns false on errors
bool bRemoveDuplicates(int array[], int iSize){
  if(iSize <1)
  return false;
  if(array == NULL)
    return false;
  if(iSize == 1)
    return true;

  int left_comp = 0;
  int right_comp = 1;
  bool start_move = false;
  int hole = iSize;

  for(; right_comp <iSize;right_comp++,left_comp++){
    if(array[left_comp] == array[right_comp]){     
      if(!start_move){
        start_move = true;
        hole = right_comp;
      }
    }
    else if(start_move){
      array[hole++] = array[right_comp];
    }
  }

  for(;hole<iSize;hole++){
      array[hole] = 0;
  }

  return true;
}

void main(){
  int array[] = {1,1,1,1,1,1,1,1,2,2,4,5,7,8,8,9,9,10,11,11,12,13,14};
  int iSize = sizeof(array) / sizeof(int);
  int i;

  cout<<"Before List"<<endl;
  for(i=0;i<iSize;i++)
    cout<<array[i]<<" , ";
  if(!bRemoveDuplicates(array,iSize)){
      cout<<endl<<"Error Occured"<<endl;
  }
  cout<<endl<<"After List"<<endl;
  for(i=0;i<iSize;i++)
      cout<<array[i]<<" , ";
  cin>>i;

}

Jacob Blumberg
Thursday, March 13, 2003

Great. I was trying solution based on same line, but my logic was not flawless.

Thanks again.
(This was my first question on this site, While I'm preparing for job interview, you will more questions like this)

Mark
Friday, March 14, 2003

How about this?

/*
* 'a' is the array
* 'n' is the number of elements in the array
* Returns the number of elements in the array after
* removing duplicate elements
*/
int
unique(int *a, int n)
{
    int i, k;

    k = 0;
    for (i = 1; i < n; i++) {
        if (a[k] != a[i]) {
            a[k+1] = a[i];
            k++;
        }
    }
    return (k+1);
}

int
main()
{
    int a[] = {1, 1, 2, 2, 3, 3, 4, 4, 5, 5};
    int n;

    n = unique(a, 10);
    for (i = 0; i < n; i++)
            printf("a[%d] = %d\n", i, a[i]);
}

Shiva
Monday, March 24, 2003

I see, you are just doing the opposite of what I did, instead of keeping track of where they are equal you assume they are equal and make the special case when they are not. The question comes down to are there going to be more duplicates than not. Statistically there should be less duplicates if the numbers are completely random and span the full integer range. However if there are more duplicates you algorithm will run better. You are also making needless assignments in the case that there are no duplicates. You should also declare I in main and check your input for errors :) (nit picking) But it provides a bit cleaner approach.

Jacob Blumberg
Monday, March 24, 2003

I wonder if anyone remembers how to use pointers anymore.  Seems to be a lost skill.

Anways, lets make the problem simpler, by assuming that arrays of numbers are zero terminated (like C strings).  Then I'd probably be thinking along these lines.

int i, * p, * q ;
for (i= *ablock, q= p= (ablock +1); (* q); q ++) {
    if (* q != i) {
        i= * q ;
        if (q != p) { *p= i ; } p ++ ;
    }
}
if (p != q) * p= 0 ;

But that's not very readable.  What I'd actually write since most likely sometime later in the future I'd be coming back to this and wonder what I was thinking should look more like this:

    /*** unique
        test a zero terminated array for consequtive duplicate values
        modifies array to only hold unique entries
    */

void    unique(int * ablock)
{
    int curval ;
    int * p, * q ;

    ASSERT(ablock != NULL) ;
    ASSERT(* ablock) ;

        // loop through looking for unique values
    for (curval= *ablock, q= p= (ablock +1); (* q); q ++) {
        if (* q != curval) {
            curval= * q ;
                // relocate value if new array is shorter
            if (q != p) { *p= curval ; }
            p ++ ;
        }
    }
        // terminate new list if shorter
    if (p != q) * p= 0 ;
}

This is pretty streamlined, assumes writes are kind of expensive, and will handle any valid array with one or more elements in it.

Now as to testing it, you want to exercise a couple of different cases and make sure to get all the boundary conditions (no dups, dups at the end, dups at the begining, etc.)

int main(int, char **)
{
    int testa[]= { 1, 2, 4, 6, 7, 8, 9, 12, 15, 19, 0 },
        testb[]= { 1, 1, 1, 4, 5, 9, 9, 10, 10, 12, 15, 18, 20, 0 },
        testc[]= { 1, 2, 4, 5, 6, 8, 8, 8, 8, 8, 0 } ;
    int * p ;

    unique(testa) ;
    for (p= testa; (* p); p ++) { printf(" %d, ", * p) ; } printf("\n") ;
    unique(testb) ;
    for (p= testb; (* p); p ++) { printf(" %d, ", * p) ; } printf("\n") ;
    unique(testc) ;
    for (p= testc; (* p); p ++) { printf(" %d, ", * p) ; } printf("\n") ;
}

That should cover it.

Derek Woolverton
Wednesday, April 09, 2003

*  Recent Topics

*  Fog Creek Home