Fog Creek Software
g
Discussion Board




SpreadSheets rows limit

First, I don't undestand why excel (or others spreadsheet softwares) have a number of rows limit. My PC has 512MB ram, but I'm limited by this. Sad :(

Ok, my problem: I have some CSV text files with a lot os rows. How do I manipulate the data in the files ? Delete some columns, multiply or divide lines/columns by a constant, sort, filter, etc...

Someone knows a software (free if possible) that do this ?

Of course I can make pearl or c# scripts myself. But a ready solution will be great.

Augusto
Sunday, October 5, 2003

Use Access.

Or use Cygwin's "split" utility to split the files into
smaller one that Excel can handle.

Amour Tan
Sunday, October 5, 2003

"First, I don't undestand why excel (or others spreadsheet softwares) have a number of rows limit."

2^16 = 65,536.  Offhand, I'm going to say it's because the row index is still using a 16-bit variable.  (=

I don't have it on my machine, but does OpenOffice.org have the same limitation?  Alternately, if you wanted to stay in the MS Office family, you could import your CSV data into Access.  (Access would actually let you manipulate the CSV file as a linked table, so you mightn't even have to import and export -- although depending on how much crunching you're doing, this could be rather slow.)

Sam Livingston-Gray
Sunday, October 5, 2003

If you're running into the size limitations of a spreadsheet, that's generally an indication that a spreadsheet is the wrong tool for the job.  It's like trying to cut railroad ties with a hacksaw.  Spreadsheets are designed to do the sort of things that people normally do with spreadsheets.  People don't normally go anywhere near the 65000 row limit.  Something like MATLAB, IDL, Octave, or even ordinary C/C++/Fortran code would likely be a better fit your purposes.

Matt Latourette
Sunday, October 5, 2003

Sam,

I know the limit is by the variable limit. I just cant understand why the design is "static".

Quatro Pro or Lotus 123 in 1981(?) with this limit is admissible. But more than 20 years and the same limit ???

Augusto
Sunday, October 5, 2003

First, try opening a data source where you keep the data in a Access table or MSDB (free SQL Server) table. That way you can still do pivot tables on it.

Second, if the data averages out, you might try a trick one of my coworker did, you basically randomly delete the original data so that you are left with an average sample of less than 60K rows.

Third, get a real analysis tool.

:-)

Li-fan Chen
Sunday, October 5, 2003

There are APIs that allow you to query csv files as if they are database tables, so you could use SQL. I know perl and python have libraries available.

Tom Vu
Sunday, October 5, 2003

Li-fan Chen, what is wrong with using Excel as a manipulation tool? I know it won't work for his current problem of >65K rows, but Excel is excellent for quick manipulations of data. I even use it to write repetitive code.

Troy King
Sunday, October 5, 2003

It depends on what you want to do after you've gotten the data into a program, but our scientific graphics program (GraphPad Prism) allows an essentially unlimited number of rows. You can get a 30 day demo from:
http:\\www.graphpad.com

You'll be able to import your data. After that, it depends on what you need to do.

Harvey Motulsky
Sunday, October 5, 2003

If you have that much data either dump in in excel, which is the better tool anyway, or throw it on a Unix box and use Sed and Awk.

Mike
Monday, October 6, 2003

Oops, meant to say dump it into access

Mike
Monday, October 6, 2003

re: If you have that much data either dump in in excel, which is the better tool anyway, or throw it on a Unix box and use Sed and Awk.

You don't even need to dump it on a unix box to use awk. There's versions of gawk which will run on a windows box. google it to get the right one for your system.

I reguarlly handle text files with 100K or 200K lines where the format is a pipe delimited fields and crlf terminators and i have found awk to be very easy to use. If you understand basic c syntax, then you can get a script which does a count of the fourth field of every line in the file in two lines. I also find awk to be very quick, both in script creation and in general running time.

ko
Monday, October 6, 2003

I ran into this limitation the other day when I tried to import an IIS log.  I wanted to use the filtering and sorting features on it but it exceeded the 64k row limit.  I ended up writing a script to prefilter out some of the events that I knew I wasn't concerned with.

SomeBody
Monday, October 6, 2003

Using a canoe to transport people across the Atlantic is feasible but not practical.

Simon Lucy
Monday, October 6, 2003

Unfortuantely OpenOffice is worse than excel, the row limit is 32768. I usually use awk for simple calculations on csv files.

Martin Beckett
Monday, October 6, 2003

With the full 64k rows in use, Excel responds instantly for me.  It's very quick and easy to drill down to the rows I need using the active filter feature.  It also has no problem handling a formula extended throughout all rows.  I'd hardly consider it impractical. 

The reasons why Excel is very practical for things like this are (a) it's already installed on my computers, (b) I already know how to use it, and (c) it's already installed on pretty much everyone else's computers. 

I better analogy would be taking a cruise ship with a capacity of 2000 passengers and only letting 100 on board.

SomeBody
Monday, October 6, 2003

Thanks Mike. I didnt know about awk. One 30 minute course on it and I think I get _the_ solution. Great tool.

I should learn linux too. It has great 'secrets' to the windows developer...

Augusto
Monday, October 6, 2003

Augusto, I know a little Sed and am just learning awk myself.  We do use Sed in production, on Windows, here at my work.  We use it to run over some .csv files and replace characters and whatnot, then the file is later imported to SQL Server. 

Whether you like *nix or think it sucks there is one thing seems to outpace just about anything on and that is the sheer power of munging text. 

Mike
Monday, October 6, 2003

learn perl.

sometimes known as the pathalogically eclectic rubbish lister.

who is this larry wall guy anyway?

mb
Tuesday, October 7, 2003

The column limit is even more severe.

If anyone knows of spreadsheet software without artificial limits on rows/columns, please advise!

i like i
Tuesday, October 7, 2003

(To explain how severe the column limit is, my dataset was a 766x766 grid of scores.  Could I load it?  Could I hell..  grrhhh)

i like i
Tuesday, October 7, 2003

*  Recent Topics

*  Fog Creek Home