Fog Creek Software
Discussion Board

finding the count of a number of terms in a list

Is there an algorithm lees than O(n^2) for this?

Friday, July 4, 2003

I'm  not sure I understand the question, I think you need to be more precise.

However I suspect that your answer (if any) is to be found within the Declarative programming community (Functional and Logic Programming) who make heavy use of list manipulation, typical languages being Haskell, OCaml, Oz and so on.

David Basil Wildgoose
Friday, July 4, 2003

Sumit, are you making the list yourself, or is it given to you by some(one|thing) else?

Matthew Lock
Friday, July 4, 2003

If you mean you want to find the number of times each element of the list appears in the list, then - yes, you can do it in O(n) average with a hash, or O(n log n) worst case with a balanced tree of some sort.

Friday, July 4, 2003

I am making the list myself, I am trying to find the number of duplicates in a sentence, how can I do this in less than O(n^2), by using say a hash?

Friday, July 4, 2003

Yes you should use a hash to count the number of terms. They are orders of magnitude faster than scanning arrays.

Matthew Lock
Friday, July 4, 2003

To clarify create in the loop that scans your sentence create a hash for each word you find and increment it by one. At the end of the loop your hash will contain a count of the words in the sentence.

Matthew Lock
Friday, July 4, 2003

You can sort a list in O(n log n) steps, and then find duplicates in the sorted list in O(n) steps.

Frederik Slijkerman
Friday, July 4, 2003

you view the sentence as a 'list of words'; are there other ways to view it?

constructive comment
Friday, July 4, 2003

How can I find duplicates in the list in O(n) steps?
Is there any other way than a hash? the system is such that incorporating a Hash will require a lot of changes

Friday, July 4, 2003

Without using a hash I think the best approach is to first sort the list using the fastest means possible.

Then iterate the list, counting duplicates.

Ged Byrne
Friday, July 4, 2003

I don't see how using a hash could require a lot of changes in your code. As I understand, you use the hash only to find your number of duplicates, and throw it away afterwards.

Roel Schroeven
Friday, July 4, 2003

What language?

If the list is small, then in VB you could use:

  Dim c              As New Collection
  Dim buf          As Variant
  Dim i              As Integer
  buf = Array("one", "two", "three", "one", "a", "one")
  On Error Resume Next
  For i = 0 To UBound(buf, 1)
      c.Add Null, buf(i)
  Next i
  On error goto 0 
  Debug.Print c.Count

The above results in a answer of 4, since there is only 4 unique elements

Albert D. Kallal
Edmonton, Alberta Canada

Albert D. Kallal
Friday, July 4, 2003

use SQL  :=)

SELECT [Word], Count(*)
HAVING Count(*) > 1

Saturday, July 5, 2003

You find the duplicates as you create the list, whether you hash/index or not depends on whether you care about this at some later date, ie you store the occurrence of duplication.

I typically store all words over a particular length (and those designated keywords), in knowledge systems as a separate table with the 'document' id and offset of the word.  Duplicates within the same document get ignored in that case as its used to search the knowledgebase, but they could be stored as a count in the same record.

Simon Lucy
Sunday, July 6, 2003

This sounds like a job for...

Regular Expressions!!!!

Have a look at the RegExp component. You don't say what language, so I'm going to guess VB. RegExp's are available in many, many, many other fine languages.

All you would have to do is count the matches that come back. I don't know how this goes for speed, but it's been pretty good to me.

Geoff Bennett
Sunday, July 6, 2003

You can get O(1) if you want.

Maintain the count every time a term is added.  For example:


  return count;


Monday, July 7, 2003

*  Recent Topics

*  Fog Creek Home