SORT WHAT IS THE BEST WAY TO SORT A VERY LARGE FILE, IF THIS FILE IS TOO LARGE FOR THE MEMORY? JOE Wednesday, August 4, 2004 Send the output to the card punch. Take your stack of cards to the machine operator and tell her to run them through the sort process. Wait about 4 hours and go back to get your card stack now in sorted order. Feed them back into the punch card reader and print it all out on 11x17 fanfold paper to verify. old_timer Wednesday, August 4, 2004 Use the Bayes-Knorr algorithm with a Inverse-Tree-Traverse pattern. Google should give some more info on this. Dennis Forbes Wednesday, August 4, 2004 How old is your computer that it doesn't even type in lower case? Bill Rushmore Wednesday, August 4, 2004 Don't listen to Dennis, he's just trying to trick you. Everybody knows you'll need to first run it through a Hayes-Bozon Leaf Pruning filter in order to remove duplicates before you apply a recursive Radix sort to achieve O(n) performance. Captain McFly Wednesday, August 4, 2004 What does your data look like?  Is it random or does it follow some known distribution?  Does the sort need to be stable or not?  You can probably use something like bucket sort (or a recursive bucket sort like quicksort).  You just need to work out a reasonable scheme for storing the buckets (since they can be larger than your memory capacity). Kalani Wednesday, August 4, 2004 A toolshed or an abandoned warehouse are both excellent places to stash excess buckets until you need them again. muppet Wednesday, August 4, 2004 The great implementation advice on JOS is the main reason I keep coming back. Bored Bystander Wednesday, August 4, 2004 Don't let anyone fool you!  There is only one sort: BUBBLE!  But make sure to run it twice in case one slips by. n^2 > n log n > n How can you dispute that!  Bubble rocks! Capn' Kirk Wednesday, August 4, 2004 I believe the method of superior sorting algorithm you seek is "bogosort". Here is a link for more information: http://en.wikipedia.org/wiki/Bogosort Thank you. anon-y-mous cow-ard Wednesday, August 4, 2004 I thought the obvious answer would be to buy more memory? Ankur Wednesday, August 4, 2004 Undo your Caps Lock and everything will sort fine, my friend.   Wednesday, August 4, 2004 If you need to sort ONE very large file, then it means it has only ONE element, which means it's SORTED in any way you want. Wednesday, August 4, 2004 "Bogosort" We had a programmer who to get a random number from 1 to 10 kept would keep calling our random function (which returned 0-65535) until the return value was between 1 and 10. Also, he was an Indian on an H1-B visa. Maybe that and the bogosort is just what they teach over there? Babu Jones Wednesday, August 4, 2004 What does it mean to sort a very large file?  Are you trying to arrange the bytes in the file in ascending order? name withheld out of cowardice Wednesday, August 4, 2004 As Bored Bystander said, it's also a good idea to examine how you use the file in the first place.  If you write infrequently but read frequently, for example, you should store the file in sorted form (in this case your time to sort will obviously be O(1) but your insert time will be something like O(log n)). If you have any control over it, use a RDBMS so that you can worry about your application's primitives instead of partially reimplementing an RDBMS. Kalani Wednesday, August 4, 2004 JOE - In case you haven't figured it out yet, no one is answering your post because (1) it looks like a homework assignment and (2) the ALL CAPS screams "I'm a clueless asshat!" But, being that I'm in a charitable mood, Google for external mergesort. Yet another anon Wednesday, August 4, 2004 Pfft. O(1) is weak. I'm working on a O( n - n - 1 ) sort which actually sorts the file before you create it. I'm having a little bit of trouble with duplicate values, though. Captain McFly Wednesday, August 4, 2004 Note that JOE has not checked back. Guys like JOE expect the phone to ring with a specific answer in response. Wednesday, August 4, 2004 To give a somewhat serious answer, if faced with such a situation I would chop the file into X parts and then sort and persist each of the X parts using conventional high efficiency sorting algorithms. Imagine having 5 sorted decks of cards with the lowest cards up - it's fairly easy and efficient from there to build it into one sorted deck. Dennis Forbes Wednesday, August 4, 2004 External mergesort? Clearly some people have retroactively stolen my idea. I'm going to have to go get a patent and seek justice for this wrong. Dennis Forbes Wednesday, August 4, 2004 The obvious answer is not to sort it and go home and drink a beer - it will sort itself out, given enough time. coward Wednesday, August 4, 2004 "retroactively stolen my idea" Dude, I hear ya. Remember that math formula, sum 1...n = (n*(n+1))/2.  I invented that.  Only I did it about 200 or 300 years too late. If there's ever a time machine invented, first thing I'm gonna do is go back in time and whack that motherfucker whose name is in all the textbooks.  Just a low-life retro-thief is what he is. Yet another anon Wednesday, August 4, 2004 If you're sorting lines in the file, maybe you have enough memory to store pointers to the start of each line in the file (if not the actual lines themselves). In this case you could use any decent sort algorithm just reading the relevant lines from disk when you need to compare them... Also, like someone said, you might want to store the data in the file in such a way that it's always sorted, or build some kind of index or hash table to make sorting and lookups quicker. So yeah, essentially reimplementing a DBMS. Waste of time! been done! just embed something like SQLLite (which can store the data in a specified file and doesn't need any kind of database server daemon running). Perhaps even something simple like BDB could handle this? Matt Wednesday, August 4, 2004 The OP's question reeks of some sort of low-level computer assignment - maybe a summer school high school computer class. Dennis Forbes Wednesday, August 4, 2004 Bubble sort, of course, as invented by my acquaintance Keith, the parking lot attendant, who assures me that it is the most efficient sort algorithm of all time. "But everybody invents the bubble sort," I told him. "Yes, but I invented it first." How can you argue with logic like that? E. Naeher Wednesday, August 4, 2004 I have a book at home about mathematical cranks -- it is quite amazing the number of people who are thoroughly convinced that (a) they invented "casting out nines", and (b) they could make millions of dollars by patenting their invention and selling it to... someone.  Eric Lippert Wednesday, August 4, 2004 Mr. Lippert, how scriptin' going?   Wednesday, August 4, 2004 Sorry, I didn't realize that this was probably a joke until I saw the previous post on sorting. Kalani Wednesday, August 4, 2004 +++We had a programmer who to get a random number from 1 to 10 kept would keep calling our random function (which returned 0-65535) until the return value was between 1 and 10+++ we had a similar colleague who wrote this function: while (1) {     x = rand(100);     if (x < 3) { break; } } show_random_picture(x);   Wednesday, August 4, 2004 Did you know, your caps lock uses up CPU and ram? So things will go much faster if you capitalize properly. Anon to protect the guilty Wednesday, August 4, 2004 Assuming this question is for real.... Donald Knuth wrote some books called The Art of Computer Programming. You want volume 3. I don't know if the second edition has what you are looking for, but I know my first edition does (its an f*ing antique too), with all kinds of fold out graphs showing samples of how to do merge-sorts with tape drives or punch cards. From the bad old days when computers had like 16k of core memory and operating systems used like 2k of memory. Peter Wednesday, August 4, 2004 JOE, just a quick idea. Split the file into N files, each small enough to sort in memory. Sort each, one at a time. "Merge" the files -- go through each, pick the top strings, figure out which is the minimum, add to output file, repeat. If you can't afford to have 10,000 file handles open, merge two at a time. Alex Wednesday, August 4, 2004 Sorry, Dennis, I just swooped right down and replied. Missed your answer. Alex Wednesday, August 4, 2004 >> if (x < 3) { break; } Really appalling is that he put the break between braces! ;) Alex Wednesday, August 4, 2004 SORT. I JUST WANT TO SORT. YOU GUYS MAKE THIS STUFF WAY TOO CONFUSING. I THINK YOUR CODING ABILITIES HAVE GONE TO YOUR HEADS. I HAVE A BUNCH OF DATA IN A REALLY LARGE FILE. WHO CARES IF ITS LINES OF TEXT OR WHATEVER. I'D USE A DAMNED BUCKET IF I KNEW WHAT IT WAS. DAMNIT. SORT! I WANT TO SORT! WHERES LOTHAR OF THE HILL PEOPLE WHEN YOU NEED HIM TO MAKE THINGS SIMPLER. WOULD BE SORTER Wednesday, August 4, 2004 Buy a Mac. S. Jobs Wednesday, August 4, 2004 Would be sorter: You do realize that you've not told us anything useful. What do you mean by sort? Do you want the lines in sorted order? If they're not text, as your last post seems to imply, then what criteria to sort by, as lexical sorting will obviously not work. What kind of system are you on? If you're on a Unix system, and you're dealing with text files, you can use the sort command. But if your OS is a secret, it's kind of hard for us to know the easiest way. If you expect an answer, you should stop being an asshole. We don't know everything about your problem, and you won't tell us. You won't be polite, you type in all caps, and generally are acting oafish. Our abilities haven't gone to our heads; rather, our abilities allow us to see a huge number of variations on your problem, and we don't know which applies. Furthermore, it sounds like a homework question. If it is, do your own damned homework. If not, why the secrecy? Anon to protect the guilty Wednesday, August 4, 2004 The right sort for your sort is a good sort. Google "porn" and don't come back. caltrop Thursday, August 5, 2004 This would be childsplay in Delphi. Just me (Sir to you) Thursday, August 5, 2004 HELP ME SORT. WOULD BE SORTER Thursday, August 5, 2004 "HELP ME SORT." No. Anon to protect the guilty Thursday, August 5, 2004   Fog Creek Home