finding the count of a number of terms in a list Is there an algorithm lees than O(n^2) for this? Sumit Friday, July 4, 2003 I'm  not sure I understand the question, I think you need to be more precise. However I suspect that your answer (if any) is to be found within the Declarative programming community (Functional and Logic Programming) who make heavy use of list manipulation, typical languages being Haskell, OCaml, Oz and so on. David Basil Wildgoose Friday, July 4, 2003 Sumit, are you making the list yourself, or is it given to you by some(one|thing) else? Matthew Lock Friday, July 4, 2003 If you mean you want to find the number of times each element of the list appears in the list, then - yes, you can do it in O(n) average with a hash, or O(n log n) worst case with a balanced tree of some sort. He-who-would-not-be-named Friday, July 4, 2003 I am making the list myself, I am trying to find the number of duplicates in a sentence, how can I do this in less than O(n^2), by using say a hash? Sumit Friday, July 4, 2003 Yes you should use a hash to count the number of terms. They are orders of magnitude faster than scanning arrays. Matthew Lock Friday, July 4, 2003 To clarify create in the loop that scans your sentence create a hash for each word you find and increment it by one. At the end of the loop your hash will contain a count of the words in the sentence. Matthew Lock Friday, July 4, 2003 You can sort a list in O(n log n) steps, and then find duplicates in the sorted list in O(n) steps. Frederik Slijkerman Friday, July 4, 2003 you view the sentence as a 'list of words'; are there other ways to view it? constructive comment Friday, July 4, 2003 How can I find duplicates in the list in O(n) steps? Is there any other way than a hash? the system is such that incorporating a Hash will require a lot of changes Sumit Friday, July 4, 2003 Without using a hash I think the best approach is to first sort the list using the fastest means possible. Then iterate the list, counting duplicates. Ged Byrne Friday, July 4, 2003 I don't see how using a hash could require a lot of changes in your code. As I understand, you use the hash only to find your number of duplicates, and throw it away afterwards. Roel Schroeven Friday, July 4, 2003 What language? If the list is small, then in VB you could use:   Dim c              As New Collection   Dim buf          As Variant   Dim i              As Integer       buf = Array("one", "two", "three", "one", "a", "one")   On Error Resume Next   For i = 0 To UBound(buf, 1)       c.Add Null, buf(i)   Next i   On error goto 0    Debug.Print c.Count The above results in a answer of 4, since there is only 4 unique elements Albert D. Kallal Edmonton, Alberta Canada kallal@msn.com http://www.attcanada.net/~kallal.msn Albert D. Kallal Friday, July 4, 2003 use SQL  :=) SELECT [Word], Count(*) GROUP BY [Word] HAVING Count(*) > 1 DJ Saturday, July 5, 2003 You find the duplicates as you create the list, whether you hash/index or not depends on whether you care about this at some later date, ie you store the occurrence of duplication. I typically store all words over a particular length (and those designated keywords), in knowledge systems as a separate table with the 'document' id and offset of the word.  Duplicates within the same document get ignored in that case as its used to search the knowledgebase, but they could be stored as a count in the same record. Simon Lucy Sunday, July 6, 2003 This sounds like a job for... Regular Expressions!!!! Have a look at the RegExp component. You don't say what language, so I'm going to guess VB. RegExp's are available in many, many, many other fine languages. All you would have to do is count the matches that come back. I don't know how this goes for speed, but it's been pretty good to me. Geoff Bennett Sunday, July 6, 2003 You can get O(1) if you want. Maintain the count every time a term is added.  For example: Addterm(term):   containter.add(term)   count++; FindCount():   return count; -- ee eclectic_echidna Monday, July 7, 2003   Fog Creek Home