Fog Creek Software
Discussion Board




"Better" language vs. Big Software

Since the Smalltalk thread came up, I have a general question. You very often hear that rather academic (read: interesting but not used much in practice) languages like Smalltalk, Ruby, etc. are real productivity boosters when it comes to development time. I'm a student, and this is something many of the older lecturers claim.

I never bought into that very much. I mostly program in C++, with some Visual Basic and smaller scripting work at my part-time job. I always wonder whether these claimed productivity boosts hold up for larger applications. I don't care very much if Smalltalk beats C++ for 200 LOC applications, but I'm not sure development time really is, say, halved when we're talking about 500,000 LOC (C++) applications. Nor am I sure you _could_ develop software of that size with the same characteristics. Would it load/run as fast as a C++ app?

A specific question: it seems hard to get a decent comparison, but are there any large, widespread software applications written in something other than C/C++?

A Student
Wednesday, December 10, 2003

COBOL, FORTRAN... :-)

ice
Wednesday, December 10, 2003

I've seen production GUIs in Smalltalk, I've seen Python in production, and Yahoo! Store was (is?) written in Lisp.

Don't believe all the FUD: try them out and see what you think. In production, you should think about the best tool for the job.

fw
Wednesday, December 10, 2003

Since I failed to mention Ruby and did screw up my spelling, maybe I should write something in Ruby to say 'this makes no sense; reword it before you post it, you little monkey.'

fw
Wednesday, December 10, 2003

This article, "Scripting: Higher Level Programming for the 21st Century", written by the designer of Tcl, has a table of examples comparing the size of Java and C programs to their Tcl and some Perl equivalents.

http://home.pacbell.net/ouster/scripting.html

"In every case the scripting version required less code and development time than the system programming version; the difference varied from a factor of 2 to a factor of 60."

Matthew Lock
Wednesday, December 10, 2003

I was just having similar thoughts myself.

I love old books and finding random second-hand copies of classics.

Recently I've acquired Smalltalk-80 [1] and Advanced C Primer++ [2].

There's only a couple of years' difference between the two books, but the surprising thing is how up to date Smalltalk-80 is and how archaic the C Primer is. This is unexpected, because C++ won the language wars, not Smalltalk.

C(++)'s real power is its agility.  The big projects are all written in this language, but the principles are stolen from all these other upstarts.

Take the Windows API, for example.  All those messages being sent around are emulating Smalltalk.

What I think happens is that all the productivity boosting features offered by the new languages are absorbed by C++ in the form of libraries or frameworks.

[1] http://www.amazon.com/exec/obidos/tg/detail/-/0201136880/002-6121207-8112800?v=glance
[2] http://www.amazon.com/exec/obidos/ASIN/0672224860/pdxbookscom/002-6121207-8112800

Ged Byrne
Wednesday, December 10, 2003

I believe C++ can never be as inherently productive as other languages. The design decisions taken just don't lend themselves to it.

You're always going to have to worry about stray pointers, memory leaks, obscure little corners of the spec that will bite you one day, etc.

The decisions were made because the C++ philosophy is "no overhead if you don't use it, and as little as possible if you do". Which lends itself well to machine efficiency, but very poorly to human efficiency.

Although some people have even questioned the machine efficiency aspect. An interesting speculation from some of the non-mainstream language advocates is that one of the reasons C and C++ perform as well as they do is that there has been so much time and money spent on optimising the compilers. They suggest if as much time was spent on alternative languages, then C and C++ might not hold such an edge.

Interesting thought, but I doubt we're ever going to see Smalltalk or Lisp catch on to the extent that such an investment ever occurs, so we'll never know.

As for the original post, some people speculate that one of the reasons Lisp and Smalltalk never caught on is that they were never the scripting language for anything. They believe that many large projects start out as small projects that someone hacks up in a scripting language first (with C regarded as the original "scripting" language for Unix). Because Lisp and Smalltalk don't get used for small projects (where the convenience of having the tool already there tends to outweigh the long-term consequences), they never gained the momentum of being used when those projects grow big (which is when you really need to care about having a better language!).

Personally, I think it's a load of bunk and the reasons why Lisp and Smalltalk never caught on are largely sociological and political, but I find the theory interesting anyway.

Sum Dum Gai
Wednesday, December 10, 2003

Ged, they're not always assimilated well: compare the Boost Lambda Library to Scheme's lambda/closure type, or Prolog's constraint-based programming to C++'s ??? (maybe SQL is the closest to this).

K
Wednesday, December 10, 2003

About ten years ago, I was working on a pretty big system written in Smalltalk. It was a big simulation for complicated electronics systems and had a built-in rule engine and lots of other stuff. It was a big, complicated system, even by today's standards.

Smalltalk made writing code quickly and cleanly a breeze.  I've done a lot of non-Smalltalk work, both before and after that project, and I can honestly say I've never used a more developer-friendly environment than the VisualWorks Smalltalk package (once you get used to it), or even the Digitalk Smalltalk package we used before it. 

Even better when these environments were augmented by some of the (way ahead of their time) tools that we added in, like Envy Developer for version control, or the Refactoring Browser for code maintenance.  I remember one gizmo we had that would try to help you with potential compile problems.  For example, if you compiled a method that referenced an undeclared variable, a dialog would pop up asking if you'd like to define it as a temporary, define it as an instance variable, fail the compile and edit the code, or just compile and move on.  The entire system was designed to make the process of writing code quick and easy.

The project immediately following the Smalltalk work was to build a similar system for a different client, but in Java.  Granted, that was in 1997 when Java environments were relatively immature, but the switch was shocking.  Where before I felt like I was bouncing along at a comfortable pace, now it felt like trying to run through deep mud. 

Even now, using Visual Studio .NET with C#, I still offer an occasional comment to my teammates about how much *more* productive we could be if we were using my old Smalltalk system.  It really was that good.

I will admit, though, that some perspectives on Smalltalk wouldn't be so kind.  It was hell to deploy a Smalltalk application properly. Big time.  I'm still in therapy for it :P

It was also, in some instances, hard on maintenance programmers because it was just as easy to write really bizarre, enormously ugly code as it was to write clean, elegant code.  The environment would conspire to help you do it :)

It was also probably hard on the end users because the GUI was non-standard (for Windows apps), and the system was packaged as a single, enormous executable so it took a long time to load up. 

Anxiously awaiting Smalltalk.NET
Wednesday, December 10, 2003

Sum Dum Gai, perhaps I'm missing your point about stray pointers and such, but what's the problem with using a garbage collector in C++ or selectively using a managed pointer type? You can use a garbage collector on old programs that you'd rather not delve into to fix some odd memory leak or another, and managed pointer types for libraries where it makes sense (e.g., some library for parsing regular expressions and returning a dynamically allocated state machine representing the expression).

The STL's auto_ptr is good for small-scale, simple libraries (like the regex example), and Boost has a shared_ptr type that does reference counting (instead of auto_ptr's single-owner model). Neither of those will help with cyclic dependencies, of course, but that's where you might hook in a traditional garbage collector.

As complex as C++ is, its features are more or less orthogonal, so the esoteric details of other features don't affect the use of (say) managed pointers.  Of course, you still need to know which details are relevant to which features you're using, but that's true of any language.  Deterministic destruction is one C++ language feature that you've got to understand to really get how managed pointers work in the language (though I think that it's widely regarded as a benefit).

K
Wednesday, December 10, 2003

"You're always going to have to worry about stray pointers, memory leaks, obscure little corners of the spec that will bite you one day, etc"

Let me preface this by saying that I think garbage collection, as seen in .NET, is one of the most idiotic and most poorly thought out concepts going -- it's taking something simple (object goes out of scope -- destroy it) into something ridiculously complex. Now, instead of worrying about stray pointers and memory leaks, you're worrying that you've left an ADO.NET connection on the "stack" that garbage collection will destroy whenever it gets around to it. Did you call .Close? If not, it's nearly as bad as leaking massive amounts of memory: your scalability self-destructs as all those abandoned, still-connected connections hang around.

In all my C++ programming, including some very large projects, I've never, ever had a problem with lifetime management of memory objects. If I did, I'd have no trust in my own code, as there are countless less obvious but more damaging mistakes that you can make in a hand-holding environment like .NET.

Dennis Forbes
Wednesday, December 10, 2003

K,

they don't need to be assimilated well - just well enough.

Ged Byrne
Wednesday, December 10, 2003

Spot on Student,

I think the reason academia often overstates the benefits of language X over language Y is that, for most academics, development centers around writing brief (< 5 KLOC) demonstrations/examples that never need to be maintained.

Anecdote: Years ago there was a very large public software development project that went spectacularly wrong (yes, you have heard about it). A friend of mine had worked on it for one of the subcontractors and had explained to me (as one does over beers) how the whole development was completely fucked up, since there was no QA on the (very numerous) subcontractors at all and schedule management was a joke. As a result, all the subcontractors had put the project at lowest priority and only assigned their most junior developers, in a last-minute rush to put out something that arguably complied with the contract.
As it happens, a few years later there was a talk by a university professor reporting on a large international academic post-mortem study they had completed on this failed project. Their main conclusion: the project should have used Java instead of C++.

Just me (Sir to you)
Wednesday, December 10, 2003

I would rather write and maintain a Python project than a C++ project.

Why?

Because the Python source will in most cases be a lot shorter than the C++ source.

There is also a practical reason for using Delphi, VB, or Python rather than C++: it is much, much harder to find a competent C++ programmer than a competent VB programmer.

I learned this the hard way. :-(

Also, about suitability to very large projects, among some modern languages I know:

C# and Java are excellent for very large projects

Python, Delphi and C++ are good for very large projects

VB, JavaScript, etc. are a disaster for very large projects :)

MX
Wednesday, December 10, 2003

I know for a fact that one of the largest banks in the world (top 5) uses Smalltalk for their clearing system. It is used to move $18 bn every night.

whattimeisiteccles
Wednesday, December 10, 2003

"Let me preface this by saying that I think garbage collection, as seen in .NET, is one of the most idiotic and most poorly thought out concepts going -- it's taking something simple (object goes out of scope -- destroy it) into something ridiculously complex. "

I have to disagree with this statement in the general case.  First of all, in an environment where there are a lot of long-lived, collaborating objects, it is very difficult to get the manual deallocation logic correct in the first place.  Even if you do manage to get it right, you have to build a lot of knowledge about system structure into places that have no business knowing about such things.

Second, this makes the code very brittle. Making a modification requires you to know a lot about how and where memory is managed.  Finally, you get into a situation where you design around memory management rather than designing around the semantics of your problem and solution strategy.

There are times when you *must* optimize your memory management and there are situations that lend themselves naturally to a manual approach, too (highly procedural code, for one). It is good to be able to understand what's going on and to do it correctly. These situations are the exception, rather than the rule, in a lot of instances though.  In most other cases, robust garbage collection is a far superior approach.

contrarian
Wednesday, December 10, 2003

From my (little) experience, interpreted languages are much more productive than compiled ones, especially if they have a good debugger.

Let's say you're coding a DB application in which it takes 5 minutes to make the query. You're looping through your 2,192nd record and find a bug when a field is NULL. If you're in C++, Java, or another compiled language, you'll have to:

1) stop the program
2) make a correction and hope it works
3) recompile
4) lose another 5 minutes waiting for the query

If you're in VB, or other interpreted language you:

1) Make a correction in the code WHILE THE PROGRAM IS STILL RUNNING
2) Drag-and-drop the cursor to the desired instruction
3) Check if it works, and repeat until it works

In some applications where I work, although some things in VB bug me, it would be a nightmare to switch to C++ or Java. Sure, you can serialize/unserialize your data so that reloading it takes less time than querying the DB. But as soon as the data is more complex than a flat struct, the serialization code becomes harder to write.

Eric V.
Wednesday, December 10, 2003

I meant run the query, not make. Sorry

Eric V.
Wednesday, December 10, 2003

Yeah, but I can't have my users downloading Perl or Python in addition to my .exe.

VB and C++ for now.

 
Wednesday, December 10, 2003

The user never has to download Perl or Python.

You can precompile your perl (I would guess python too) and then bundle the interpreter, your code, and any 3rd party libraries into a nice installer.

VB uses a runtime too, ya know.

In these days of 250GB hard drives (where most people probably have at least 20GB), I doubt the users will notice an extra 1.5 to 3 MB in the EXE.

I haven't been a Perl programmer in a while, but when I was, the perl.exe distributed alongside my Windows app was 1.5 MB and had no external dependencies. It was a simple master/detail grid using Perl/Tk; you unzipped it to a folder to install, and the total size of the entire thing was 5 MB.

Damn, I coulda saved so much space if I'd been allowed to use Delphi :P (which I would have, but it had to run on Linux and Solaris too).

Richard P
Wednesday, December 10, 2003

For the record, for Python (on Windows), it's even easier: you just have py2exe generate a directory containing an .exe and one or two DLLs.

GP
Wednesday, December 10, 2003

The way .NET and Java implement garbage collection is non-ideal I think. It's certainly possible to have garbage collection and also have your database connections closed for you in most cases (and almost all important cases). Other languages do it, but neither Java nor C# aimed to be the most innovative language; instead, they went with "safe" choices (safe as in not too far from what people are used to).

The problem with memory management in C++ is that there are just so many ways to do it. It's fine if you're the only one working on the code and you can mandate consistency for yourself, but that's unlikely to be the case in practice. Even if you're the sole developer, you still have third party libraries to worry about.

Garbage collection cannot simply be added to C++. It is impossible to write a garbage collector that collects 100% of the garbage in C++, at least without extending the language. The garbage collector simply doesn't have enough information to know what is a pointer and what isn't, so it has to be conservative and treat anything that could be pointing to memory as pointing to memory. It will therefore always run the risk of leaking.

What's the point of garbage collection if you're going to leak anyway?

Anyway, back to C# or Java, it's easy to tell if you're holding important objects by not disposing them. Put in their finalizers something like:

DebugMessage("Blah finalizer called");

That way, if they ever reach finalization, rather than being explicitly destroyed, you'll know the first time you run that code. Which is a hell of a lot better than the average stray pointer or memory leak error!

Sum Dum Gai
Wednesday, December 10, 2003
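
The finalizer tripwire described above is C#/Java advice. For comparison, here is a minimal Python sketch of the same idea, assuming CPython, where `__del__` runs as soon as the last reference to an object disappears. `Connection` and `tripwire_demo` are hypothetical names, and the "connection" wraps no real resource. The finalizer warns only if `close()` was never called, so a warning in a test run points at the forgotten call.

```python
import warnings


class Connection:
    """Hypothetical stand-in for a database connection; holds no real resource."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        # The explicit, deterministic path we expect every caller to take.
        self.closed = True

    def __del__(self):
        # Tripwire: reaching the finalizer with the connection still open
        # means some caller forgot to call close().
        if not self.closed:
            warnings.warn(
                f"Connection {self.name!r} finalized without close()",
                ResourceWarning,
            )
            self.close()


def tripwire_demo():
    """Return the warning messages raised for one tidy and one leaky connection."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # ResourceWarning is hidden by default
        tidy = Connection("tidy")
        tidy.close()
        del tidy  # closed explicitly, so the finalizer stays quiet
        leaky = Connection("leaky")
        del leaky  # never closed: CPython runs __del__ here and the warning fires
    return [str(w.message) for w in caught]
```

Using `ResourceWarning` mirrors what CPython itself does for file objects that are reclaimed while still open; those warnings are hidden by default and appear with `python -X dev` or `-W default`.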

AFAIK, Java and .NET use "true" garbage collectors. That is, they go out, figure out what's garbage, and toss it.

Python (and maybe Perl, IIRC) use reference counting collectors.  Objects are "garbage" and deallocated as soon as the reference count goes to zero.

So, the only complicated case is circular references (A references B, which references A, or longer variations thereof). Python has a secondary "cycle collector" that looks for these loops and deallocates them, as long as none of the objects have finalizers. So, if you have cycles containing external resources like files, you will still want them to have an explicit close mechanism.

But the vast majority of the time, you don't even need to think about it.

Phillip J. Eby
Wednesday, December 10, 2003
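
A minimal sketch of the two reclamation paths just described, assuming CPython (other implementations such as PyPy do not use reference counting, so the immediate release is not guaranteed there). `Node` and `collection_demo` are hypothetical names; `weakref.finalize` records the moment each object is actually reclaimed. Note that the post above describes Python as of 2003: since Python 3.4 (PEP 442), cycles whose objects define `__del__` are collected too.

```python
import gc
import weakref


class Node:
    """A trivial object that may hold a reference to another Node."""

    def __init__(self, name):
        self.name = name
        self.other = None


def collection_demo():
    """Show when CPython reclaims a plain object versus objects in a cycle."""
    log = []
    was_enabled = gc.isenabled()
    gc.disable()  # stop automatic cycle collection so the timing is predictable
    try:
        a = Node("a")
        weakref.finalize(a, log.append, "a freed")
        del a  # refcount drops to zero: reclaimed immediately
        after_del_a = list(log)

        b, c = Node("b"), Node("c")
        b.other, c.other = c, b  # b -> c -> b keeps both counts above zero
        weakref.finalize(b, log.append, "b freed")
        weakref.finalize(c, log.append, "c freed")
        del b, c  # unreachable now, but refcounting alone cannot free them
        after_del_cycle = list(log)

        gc.collect()  # the cycle collector finds and frees the loop
        after_collect = list(log)
    finally:
        if was_enabled:
            gc.enable()
    return after_del_a, after_del_cycle, after_collect
```

The first list shows "a freed" right after `del a`; the cycle's entries appear only after `gc.collect()`, and their relative order is not something to rely on.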

"The way .NET and Java implement garbage collection is non-ideal I think."

Why?

contrarian
Wednesday, December 10, 2003

Well, garbage collection may be implemented like sh#t in Java, C# and Python.

From a productivity point of view, I don't know, and I don't care.

What I care about is that they work pretty well, and this increases programmers' productivity.

Things don't need to be perfect (especially from an academic point of view) to enhance your productivity.

You may have an academic programming language which is conceptually perfect, yet not very productive, and you can have a programming language that's not as perfect, but it's a lot more productive and practical.

When talking about productivity, practical things like library availability (is there a library to access database X?), third party components, ease of writing in the language, etc, matter a lot.

MX
Thursday, December 11, 2003

Why? Because there are ways of implementing garbage collection that allow some types to be garbage collected, and some types to be automatically collected as they go out of scope.

I feel that's a better choice. Others' opinions may vary on that.

In practice, Python's approach of ref counts plus loop detection is pretty good, although it is somewhat slow compared to a proper GC. However, the programmer experience is good, so given that Python doesn't aim to be super fast, it's a good compromise.

As for library availability, the importance of that can vary wildly. In some cases, it's the most important thing. In other cases, you're doing most stuff from scratch and so it doesn't really matter. Horses for courses.

In any case, there are libraries for many common tasks (sockets/internet protocols, XML parsing, databases, etc.) available in C, C++, Delphi, Java, C#, Perl, Python, etc. Library availability is a bit of a red herring, since most established languages have pretty good libraries available these days. Additionally, most languages can call C code fairly easily, so if a library is available in C, it's available pretty much everywhere.

Sum Dum Gai
Thursday, December 11, 2003

Sum Dum Gai: It's not true that Lisp and Smalltalk were never the scripting language for anything. Smalltalk wasn't, so far as I know, but versions of Lisp are used for scripting (1) Emacs, a text editor, and (2) AutoCAD, a CAD package. As is the way of such things, the versions of Lisp they use are ugly, old-fashioned, poorly implemented, and generally not such as to help the reputation of the language :-).

Lisp has been used for some large systems. Notably, the "Lisp machines" that were (slightly) popular back in the 80s (before "AI winter" set in) had all their system software written in Lisp. Operating system, compilers, editors, mail clients, the lot. People who have used those machines say that they were among the best programming environments *ever*, which suggests that Lisp isn't a bad language for such purposes.

Gareth McCaughan
Thursday, December 11, 2003

Dennis: "Let me preface this by saying that I think garbage collection, as seen in .NET, is one of the most idiotic and most poorly thought out concepts going -- it's taking something simple (object goes out of scope -- destroy it) into something ridiculously complex."

I agree 100% with this. There is no reason why memory cannot be allocated / freed automatically using scope rules. You can do this in C++ if you don't alias pointers. I even did this in http://www.lingolanguage.com, which manages arbitrarily complex structures without explicit allocation or deallocation instructions. In my view, garbage collection is a broad highway leading to a dead end in language design.

Bill
Thursday, December 11, 2003

"There is no reason why memory cannot be allocated / freed automatically using scope rules."

And what if you have something that has a dynamic lifetime, or lives beyond the scope in which it was allocated?

Chris Tavares
Thursday, December 11, 2003

"I agree 100% with this. There is no reason why memory cannot be allocated / freed automatically using scope rules."

If you don't mind making memory management the primary focus of your application design efforts, this may be true.  I prefer to focus my design efforts on the problem I am trying to solve. 

What's worse, if you are developing libraries or anything that is meant to be used by someone other than yourself, your approach will many times force a similar decision on your client.  Instead of constraining his use of your product through semantic constraints, you do it through temporal constraints and memory use.  This isn't something I want to do to my clients, and it's certainly not something I want done to me. 

I have yet to run into a general situation where I'd call robust garbage collection a liability.  I cannot say this about any alternative with which I am familiar.  Garbage collection for me.

contrarian
Thursday, December 11, 2003

contrarian: "I prefer to focus my design efforts on the problem I am trying to solve. "

I agree, and that is exactly why memory should be handled automatically, without the non-determinism introduced by garbage collection (and as accurately described by the OP).

Bill
Thursday, December 11, 2003

Bill,

I don't see why automatic memory management without the non-determinism implies a focus on your problem (unless your problem is automatic memory management without non-determinism). Also, I don't understand the reference to the OP. Can you clarify for me? Thanks!

contrarian
Thursday, December 11, 2003

It seems that, with a little discipline, good resource management can be achieved without the overhead of a garbage collector: 'Resource Acquisition Is Initialization', for example [1][2].

I prefer a GC myself, but I always like to know there are alternatives out there.

[1] http://www.artima.com/intv/modern3.html
[2] http://www.relisoft.com/resource/resmain.html

Ged Byrne
Thursday, December 11, 2003
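
RAII is a C++ idiom, but the same "release when the scope ends" discipline can be sketched in Python with a context manager. This is a minimal illustration, not a real database API: `FakeConnection`, `opened_connection`, and `scoped_demo` are hypothetical names. The `finally` clause runs when the `with` block exits, whether normally or because an exception escaped, so release is deterministic without waiting for a collector.

```python
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a database connection."""

    def __init__(self):
        self.is_open = True

    def close(self):
        self.is_open = False


@contextmanager
def opened_connection():
    conn = FakeConnection()  # acquire the resource
    try:
        yield conn  # the body of the with-statement runs here
    finally:
        conn.close()  # release on normal exit and when an exception escapes


def scoped_demo():
    """Return whether each connection is open inside/after its with-block."""
    with opened_connection() as c:
        was_open_inside = c.is_open
    try:
        with opened_connection() as d:
            raise RuntimeError("query failed")  # simulate an error mid-block
    except RuntimeError:
        pass
    return was_open_inside, c.is_open, d.is_open
```

Both connections end up closed, including the one whose block was abandoned by an exception; the trade-off compared with C++ RAII is that the caller must remember to write the `with`.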

I don't want to step into the argument about GC vs. non-GC, but I can tell you that I want the programmers working on the application I'm managing to be thinking about what the app has to do and what the user wants, not about memory management techniques. If .NET isn't efficient, I don't care until it impacts my app, and if required we'll write the smallest possible subset in unmanaged code. I really doubt that will be necessary.

This is the real gripe I have with C/C++ as a language for writing standard business applications. We've proved for ourselves that (even crummy) VB is more productive than C/C++ in most cases.

pdq
Thursday, December 11, 2003

A very late follow-up, but I could not stop myself.

In my humble opinion, the discussion has drifted from the topic, first of all.

The language you use to build a software product, whether a library, an operating system, a simple desktop application, or a web site, depends on the purpose. At first sight, each language has one major purpose; beyond that, languages differ from each other in how they implement the logic they carry.

I have used primarily C and C++ in almost all of my applications (including web pages), because I have had two primary goals:
1. Having broad control over the platform the application will run on
2. Having top speed

On the other hand, I do not agree with the commenters who argue that a garbage collector is required. I think no GC is needed. Of course, every developer involved in a project has to have a clear picture in mind and know when an object is constructed, destroyed, or needed. Developers should first understand the logic, since that is what they are coding for, so they have to understand this part clearly; there is no escape from this. One should then design a solution that maintains both speed and stability throughout the run time of the application. Speed is a key concern: speed is why we use computers, because they do things faster than humans. However, the development team may need to tip the speed/stability balance, because code pushed to very high speed can end up with a weak implementation in the logic layer.

Big software can still be written in C and C++ by a good development team, made up of people who know the code and can design scalable but fast algorithms. As for the "better language", that depends on the purpose. This subject, I think, exposes a beautiful side of the .NET environment: you can use any language that generates CLR code. By the way, as far as I know, .NET uses a generational GC that traces objects at run time and moves them if required to reduce fragmentation, and you also have a choice (in C++; I am not sure about other languages) to "pin" the pointer, or you may keep the object alive in the unmanaged heap if you are paranoid about it being collected from the heap. Still, I do not like C++ .NET, and I find .NET very slow and not widespread enough. But .NET is not something dumb. We may not manage memory as well as the .NET GC does: we may cause fragmentation, because we allocate and then free but don't (or can't) move objects to another location to reduce fragmentation, which is a critical performance concern.

Microsoft builds giant packages with .NET, such as BizTalk Server, and uses mixed code in the Visual Studio IDE; Microsoft also uses ASP.NET and other .NET packages on its own live web site, in the MSN world, and across the company. If there were something wrong with it, I guess Microsoft would be the first to loudly and clearly curse .NET's mother.

Old Geek
Friday, July 9, 2004
