Fog Creek Software
Discussion Board




problems with Java code snippet

I'm having some problems with this code snippet, which tries to remove some noise words from a Vector of Vectors, where each element is another Vector.
I get an array index out of bounds exception at removeElementAt at run time; it compiles fine.
Does anybody know what the problem is?

for(int i=0;i<v.size();i++){
    //ielements are a vector, process in another loop    
   
      Vector temp=(Vector)(v.elementAt(i));
      System.out.println(temp);
      System.out.println(temp.size());    
     
      for(int j=0;j<temp.size();j++){
         
            String val=(String)temp.elementAt(j);
          for(int iter=0;iter<noisewords.length;iter++){
            System.out.println(" j is "+j+ "iter is " +iter);    
            if(val.equalsIgnoreCase(noisewords[iter]))
            System.out.println("in here");
            ((Vector)v.elementAt(i)).removeElementAt(j);    
          }                         
      }            
        
   

    
}//end for    

sumit
Wednesday, May 28, 2003

This is just an idle thought but where you have:

if(val.equalsIgnoreCase(noisewords[iter]))
            System.out.println("in here");
            ((Vector)v.elementAt(i)).removeElementAt(j);   


You are ALWAYS removing an element regardless of whether val looks to be a 'noiseword'.  Do you actually mean:

if(val.equalsIgnoreCase(noisewords[iter])) {
            System.out.println("in here");
            ((Vector)v.elementAt(i)).removeElementAt(j);   
}

Because currently, on each iteration of your loop:

for(int iter=0;iter<noisewords.length;iter++)

You are removing noisewords.length elements from your vector.  So if noisewords.length > v.elementAt(i).size() before removal of any noisewords, you'll end up with the ArrayIndexOutOfBoundsException that you have.

Konrad
Wednesday, May 28, 2003

Rather than iterating through both vectors, wouldn't you be better off putting your noisewords into a collection and then using:

temp.removeAll(CollectionOfNoiseWords)

from http://java.sun.com/j2se/1.3/docs/api/java/util/Vector.html :
removeAll
public boolean removeAll(Collection c)
Removes from this Vector all of its elements that are contained in the specified Collection.
Specified by:
removeAll in interface List
Overrides:
removeAll in class AbstractCollection
Returns:
true if this Vector changed as a result of the call.
Since:
1.2

Ged Byrne
Wednesday, May 28, 2003

I'm not a Java programmer, but ISTM that if you have a vector of, say, 3 things, and you take the second one out, then when you try to access the third one you might well be out of bounds (it depends on whether removing an element shifts the remaining elements).
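For what it's worth, java.util.Vector does shift the remaining elements down when one is removed. A minimal sketch of the effect, assuming a made-up class name (ShiftDemo) and a hard-coded noise word, and using generics, which need a newer Java than the 2003 versions discussed here:

```java
import java.util.Arrays;
import java.util.Vector;

public class ShiftDemo {
    // Removes "the" with a forward index loop and returns what is left.
    static Vector<String> removeForward(Vector<String> words) {
        for (int j = 0; j < words.size(); j++) {
            if (words.elementAt(j).equals("the")) {
                // Every later element shifts down by one position here.
                words.removeElementAt(j);
                // j is still incremented, so the element now sitting at j is never checked.
            }
        }
        return words;
    }

    public static void main(String[] args) {
        Vector<String> words = new Vector<>(Arrays.asList("the", "the", "cat"));
        System.out.println(removeForward(words)); // prints [the, cat]
    }
}
```

The second "the" survives because it slid into index 0 right after the first one was removed, and the loop had already moved on to index 1.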


Wednesday, May 28, 2003

import java.util.*;
import java.io.*;

public class NoiseWords {
    public static void main(String[] args) {
        Vector v = new Vector();
        v.add(createVector("The rain in spain falls mainly in the plane"));
        v.add(createVector("To be or not to be"));
        v.add(createVector("The fat cat sat on the mat"));
        v.add(createVector("The quick brown fox jumped over the rugged man"));
       
        Vector noiseWords = createVector("the is on to in or");
                       
        Vector element;
       
        for (Iterator it=v.iterator(); it.hasNext(); ) {
            element = (Vector) it.next();
            element.removeAll(noiseWords);
            for (Iterator innerit = element.iterator(); innerit.hasNext();) {
                System.out.print(" " + innerit.next());
            }
            System.out.println(".");
        }
       
       
       
       
    }
   
    private static Vector createVector(String s) {
        Vector v = new Vector();
        StringTokenizer parser = new StringTokenizer(s);
        while (parser.hasMoreTokens()) {
            v.add(parser.nextToken().toLowerCase());
        }
        return v;
    }
}

Ged Byrne
Wednesday, May 28, 2003

It looks from the println statements like you need to use an interactive debugger. Try http://www.eclipse.org if you don't already have one. It should be easy to figure out with the right tools.

Scot
Wednesday, May 28, 2003

Additionally, instead of looping through the noisewords array, you should probably just use a hashset:

Set noiseSet=new HashSet();
for(int qw=0;qw<noisewords.length;qw++){
    noiseSet.add(noisewords[qw].toLowerCase());
}

then for your comparisons:

  String val=(String)temp.elementAt(j);
  if(noiseSet.contains(val.toLowerCase())){
      System.out.println("in here");
      ((Vector)v.elementAt(i)).removeElementAt(j);   
  }

anon
Wednesday, May 28, 2003

When you remove an element, the Vector gets smaller and the remaining elements shift down, but you keep using the same j in the inner loop and then incrementing it. Once enough elements have been removed, j is no longer a valid index.

I would strongly recommend using an Iterator to go through the Vector. Then you can do it.remove() to remove the current element without having to worry about indices. I would also recommend using a List rather than a Vector to avoid unnecessary synchronization.
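A rough sketch of the Iterator approach, assuming an ArrayList in place of the Vector and a hypothetical class and method name (IteratorRemoval, removeNoise). It uses generics and the for-each loop, so it needs a newer Java than the one in the original question:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorRemoval {
    // Removes every word that matches a noise word, ignoring case.
    static void removeNoise(List<String> words, String[] noiseWords) {
        for (Iterator<String> it = words.iterator(); it.hasNext(); ) {
            String word = it.next();
            for (String noise : noiseWords) {
                if (word.equalsIgnoreCase(noise)) {
                    it.remove(); // removes the element last returned by next()
                    break;       // remove() may be called only once per next()
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(
                Arrays.asList("The", "fat", "cat", "sat", "on", "the", "mat"));
        removeNoise(words, new String[] {"the", "on"});
        System.out.println(words); // prints [fat, cat, sat, mat]
    }
}
```

The iterator keeps track of its own position, so there is no index to get out of step with the shrinking list.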

anon's suggestion to use a HashSet is excellent. It's a much more appropriate data structure for what you're trying to do. If string case doesn't matter, lowercase the words before adding them and before looking them up, or use a TreeSet built with String.CASE_INSENSITIVE_ORDER (a HashSet can't take a comparator).
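One caveat worth illustrating: String.CASE_INSENSITIVE_ORDER is a Comparator, so it plugs into a sorted set such as TreeSet rather than a HashSet. A small sketch under that assumption, with a made-up class and helper name (CaseInsensitiveNoise, noiseSet):

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class CaseInsensitiveNoise {
    // Builds a set whose lookups ignore case, so no lowercasing is needed at the call site.
    static Set<String> noiseSet(String[] noiseWords) {
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(Arrays.asList(noiseWords));
        return set;
    }

    public static void main(String[] args) {
        Set<String> noise = noiseSet(new String[] {"the", "is", "on"});
        System.out.println(noise.contains("THE")); // prints true
        System.out.println(noise.contains("cat")); // prints false
    }
}
```

A TreeSet lookup is O(log n) rather than the HashSet's expected O(1); for a short noise-word list the difference is unlikely to matter.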

Just curious - what is this code for - a class project or real product?

igor
Wednesday, May 28, 2003

hi igor,
real product, part of a real product for document clustering of a web search engine...got most of the code, stuck on the indexes here... have used HashSets elsewhere

sumit
Wednesday, May 28, 2003

Agree with Konrad,

You're missing a couple of brackets around the if statement checking for the presence of noise words. 

It's probably a bug, but I don't know if it'll solve all your problems. :)

Crimson
Wednesday, May 28, 2003

When removing elements from a vector, work from the bottom up, not the other way. As others have noted, removing elements changes the size of the vector. Eventually you'll throw an out of bounds exception.
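A minimal sketch of the bottom-up loop, assuming the noise words are already stored in lowercase in a Set. The class and method names (BackwardRemoval, removeNoise) are made up for illustration, and the generics need a newer Java than the one in the question:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.Vector;

public class BackwardRemoval {
    // Walks from the last index down to 0, so a removal only shifts
    // elements that have already been examined.
    static void removeNoise(Vector<String> words, Set<String> lowercaseNoise) {
        for (int j = words.size() - 1; j >= 0; j--) {
            if (lowercaseNoise.contains(words.elementAt(j).toLowerCase())) {
                words.removeElementAt(j);
            }
        }
    }

    public static void main(String[] args) {
        Vector<String> words = new Vector<>(Arrays.asList("the", "The", "cat", "on"));
        Set<String> noise = new HashSet<>(Arrays.asList("the", "on"));
        removeNoise(words, noise);
        System.out.println(words); // prints [cat]
    }
}
```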

David Geller
Wednesday, May 28, 2003

OFF-TOPIC:

What an ugly language!!!
Isn't it amazing that Java got widespread use when it requires such cumbersome, error-prone operations to do something as simple as what the original poster needs. Sheesh...

For comparison, this is how the same is done in a high-level language (Ruby, in this case):

for vector in vector_of_vectors
        vector.delete_if { |word| noise_words.include?  word.downcase }
end

Sorry for the off-topic, but I just couldn't resist.

raindog
Wednesday, May 28, 2003

raindog,

to someone who's not very familiar with Ruby your code looks even more obscure. If you look at anon's changes you'll see that the proper solution is quite readable. I'm sure it's easy to write bad code in Ruby just as much as java.

.
Wednesday, May 28, 2003

>>  to someone who's not very familiar with Ruby your code looks even more obscure. If you look at anon's changes you'll see that the proper solution is quite readable. I'm sure it's easy to write bad code in Ruby just as much as java.

I didn't say it's obscure. I said it's ugly, but it's a matter of taste. Sorry again, I really don't mean to hurt anyone's feelings.

As to the bad code: yes, you're right. It's easy to write bad code in any language. However, a solution in Perl, Python or Ruby takes 5-10 times fewer LOCs than Java, on average. And the error rate per LOC doesn't depend on the language used. Sorry, I don't have references to these 2 facts handy. Just believe me :-)

raindog
Wednesday, May 28, 2003

In Java 1.5, isn't Java going to have better syntax for going through a Collection's elements?  Like "for item in blahList"?

Correct Java code isn't as bad as the OP's code snippet.  But it's still a bondage & discipline language, nothing like Python or the rest.

anonymous
Wednesday, May 28, 2003

Yes, I'm looking forward to ver. 1.5

Java is a good environment. It just needs a good language now :)

raindog
Wednesday, May 28, 2003

[[ However, a solution in Perl, Python or Ruby takes 5-10 times less LOCs than Java, on average ]]

You're comparing apples and oranges. A sports car may go much faster and more easily than a truck, but it can't carry as much load. I used to develop a lot in both Perl and Java, and each has advantages and disadvantages over the other. There are applications where Perl fits better, and there are those where Java does.
Comparing LOCs, number of errors, etc. is still a meaningless task, IMHO. Perl vs. Java is rather a long talk and I have a lot of thoughts about it, but we're going off-topic too much already ;)

Evgeny Goldin
Thursday, May 29, 2003

Am I missing something?  Is there a compelling reason not to use removeAll?

        Vector element;     
        for (Iterator it=v.iterator(); it.hasNext(); ) {
            element = (Vector) it.next();
            element.removeAll(noiseWords);
        }

Seems elegant enough to me.  With 1.4 I believe you can even get rid of the cast.

Ged Byrne
Thursday, May 29, 2003

Ged, it almost works. Unfortunately, the comparison should be case-insensitive and we need to downcase the elements of an array (assuming noiseWords are already in lowercase).

Here's the problem: even if the Java API provides a nice convenience method (like removeAll), the language is not flexible enough to extend the applicability of the method and make it really practical.

Ruby's delete_if (or similar Python or Perl high level functions) is similar to removeAll, but it takes a function as an argument, so it's really easy to customize: instead of downcasing, you could check only the first 10 letters, or only the first word of a string, and so on.

Other than that, Java 1.5 will fix the other 2 non-elegant things: generics will let you get rid of type casts, and the foreach loop gets rid of the unneeded boilerplate (the for (Iterator = ..) thingy).

Evgeny, I agree with you and didn't mean to start a language war here. Fortunately, it's not Slashdot :) But my point is that Java's strength is in its environment (libraries, servers, IDEs, all the hype, books, etc.) and it would be nice to add some language features to make the life of Java programmers (like myself) better. Java 1.5 is the way to go.
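For readers coming to this later, here is a sketch of how those two features look in practice. It also leans on Collection.removeIf, which only arrived in Java 8 and takes a function much like delete_if does, so this goes beyond what 1.5 itself offered. The class and method names (ModernNoiseFilter, removeNoise) are made up, and the noise set is assumed to hold lowercase words:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ModernNoiseFilter {
    // Generics remove the casts; for-each removes the explicit Iterator.
    static void removeNoise(List<List<String>> documents, Set<String> lowercaseNoise) {
        for (List<String> words : documents) {
            // removeIf (Java 8+) takes a predicate, so the match rule is easy to customize.
            words.removeIf(word -> lowercaseNoise.contains(word.toLowerCase()));
        }
    }

    public static void main(String[] args) {
        List<List<String>> documents = new ArrayList<>();
        documents.add(new ArrayList<>(Arrays.asList("To", "be", "or", "not", "to", "be")));
        Set<String> noise = new HashSet<>(Arrays.asList("to", "or"));
        removeNoise(documents, noise);
        System.out.println(documents); // prints [[be, not, be]]
    }
}
```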

raindog
Thursday, May 29, 2003
