Fog Creek Software
Discussion Board




Syntactic similarity of languages

I agree that the syntactic differences between imperative languages are generally uninteresting, but don't forget about functional languages (CLISP, ML, Scheme, etc.). Programming in these languages is significantly different from programming in imperative languages, and it would be nice to have the choice to use them, especially if one gains easy access to the .NET class library as well as portability to all systems implementing the Common Language Infrastructure.

Kevin Postlewaite
Monday, May 06, 2002

If you really wanted to argue the point about .NET, you could say the following (of course, the best approach is always to choose the right tool for the job):

>"But how do I develop a UNIX command line utility in .NET?"
Just write a .Net app and let it run on go-mono (http://www.go-mono.net).

>"How do I develop a tiny Windows EXE in less than 16K in .NET?"
Just write a normal .Net app. Since it compiles to MSIL and already has access to all the libraries (no linking necessary), you will normally get applications in the 10K range. Of course, this assumes that the CLR is already installed, but that is a flaw in MS's execution, not .NET.

Mark
Monday, May 06, 2002

>> but that is a flaw in MS's execution not .NET.

Flaw?  I think that is the upgrade strategy, not a flaw in execution.

Nat Ersoz
Monday, May 06, 2002

"Of course this assumes that the CLR is already installed" -- aha you figured it out.

With all due respect to the mono project I don't think it's quite ready yet.

Joel Spolsky
Monday, May 06, 2002

It's featuritis. James over on the third floor loves Perl, so now he can write code for .NET in Perl. Mike over on the fifth floor loves C, so he can write .NET apps in C. Caesar on the tenth floor, that hippy in the corner, loves Haskell to death, and he can write his .NET apps in Haskell.

It is advantageous for two reasons. First, the people who complain "I'd use this API but it's not available for Language X" stop complaining. Second, the more languages .NET is available to, the more apps will be written for the .NET runtime. In theory, you can take advantage of that business-wise.

Adam Keys
Monday, May 06, 2002

The problem is not that the languages are all the same except for the syntax; it's that almost none of the languages survive with their semantics intact. In C++, they had to get rid of destructors and multiple inheritance. Last time I checked, these were pretty core features. Kind of precludes the use of ATL, wouldn't you say? Java probably could be ported verbatim, since that is what they copied and incrementally improved anyway.

Orangefield
Monday, May 06, 2002

You've sure heated up the weblogs today, Joel. :) Here's my take:

http://www.quality.nu/dotnetguy/2002/05/06.aspx

Enjoy!

Brad Wilson
Monday, May 06, 2002

""" It is advantageous for two reasons. First, the people who complain "I'd use this API but its not available for Language X" stop complaining. Two, the more languages .NET is available to, the more apps will be written to the .NET runtime. In theory you can take advantage of that business-wise. """

Nice in theory, looks good in marketing material, doesn't work in practice. Unfortunately, and I find this very saddening, most of the software industry today produces low-quality products that can only be used in very limited circumstances.

Let's put Lisp and Haskell aside for the moment - the 3 people using them actually know what they are doing, but ...

Let's say I'm using Visual Basic to write code, and I'm trying to employ a C# or C++ component. It seems to work well, but then it breaks. So I launch the debugger and go looking through the component to find exactly where it breaks... except that I'm not that proficient with C# or managed C++ (which has substantial semantic differences compared to standard C++, with which I'm familiar). I find what looks like the problem, try to fix it, and it seems to work, but it breaks something else because of my lack of proficiency. That is, of course, assuming I have the source of the component to begin with, which I probably won't in the near future.

"But that's always been a problem with component software ... hasn't it?" Well, yes and no. Yes, if you couldn't debug a component yourself and it had a flaw, you were in big trouble. However, the well-defined interface between languages and/or technologies PROMOTED interfaces that are complete and (relatively) well specified semantically; it also more or less forced component designers to think about typical use cases in an environment other than their own.

Come .net, this distinction is going away. C++ class libraries, or COM objects meant to be used from C/C++, often require you to inherit from an existing class or to create a new COM class adhering to a given interface. How easy is this to comply with? Pretty easy, as long as you're using C/C++; a little harder with interpreted languages such as Python or Perl, but still pretty easy; hard or nearly impossible with languages like batch files or VB (at least the last time I used VB, which was a LONG time ago; has this become easier?). [And, yes, batch files are still immensely useful.] If you look closely, you'll notice that components designed for use within VB look and behave distinctly differently from those designed for use within e.g. C++, even if they both use the same COM technology. It's that way for a reason, and it should stay that way.

And it WILL stay that way with professional component designers, because different languages have different idioms, different uses and different needs - and in order to be able to sell their merchandise, they'll cater to those needs.

But it won't stay that way with less-than-professional programmers, who constitute the majority of developers out there. If .net actually delivers language neutrality (which it doesn't seem to; as noted by others, .net is more of a skinnable language than anything else), apps developed by 2nd-floor Perl-loving Mike, 4th-floor VB-religious Jake and 5th-floor Haskell-master Jonathan will be entirely unmaintainable and largely buggy.

I expect every shop to standardize on one (or at most two) languages in .net, which is how things are now. People seem to be forgetting that it's not the mechanics of combining two languages that are the problem (COM, JNI, SWIG, ffcall and other solutions already do this quite well); it's the semantics and idioms. If you have problems combining C++ and Java through JNI or SWIG, you won't fare any better through .net (except, possibly, by using a C++ subset that's perfectly compatible with Java semantics, which is exactly what managed C++ is all about, and for which the name C++ is a misnomer).

Ori Berger
Tuesday, May 07, 2002

"Let's say I'm using Visual Basic to write code, and I'm trying to employ a C# or C++ component."

Systems built out of multiple languages have their problems. I acknowledge those problems and am not condoning them. However, I think it is true that the number of possible apps for a system goes up as the number of languages that system can be coded in increases. Perhaps not linearly, but it should go up.

Said apps need not be cross-language implementations.  Given the nature of .NET, anything that does not use the implementation language of the CLR will be a cross-language implementation.  This doesn't have to be a bad thing.  When you get right down to it, most systems are cross-language.  Java, Python, Perl, Emacs and any kind of command interpreter are all implemented with a mixture of high-level assembly (C) and then some kind of high-level language above that.  It is presumed that if you are mucking around in the C part of the implementation, you are familiar with both languages.

I am not too familiar with component software, except for the plumbing I've picked up in Unix.  I know the gist of things like COM and Bonobo and what they purport to be doing.  I am not quite convinced that component software is an answer to anything besides GUI applications.  Perhaps you could enlighten me?

I agree with you on standardization, but I think it will most likely happen at the project level and probably at the in-company group level.  For example, the Widget business group might standardize on C# while the Automaton department might standardize on Haskell.

Regarding language neutrality, I must admit I'm watching all this from the sidelines and have not gotten my hands on anything .NET. The commonly held notion that .NET reduces languages to a compatible subset seems reasonable to me.

Adam Keys
Tuesday, May 07, 2002

""" I am not too familiar with component software, except for the plumbing I've picked up in Unix. I know the gist of things like COM and Bonobo and what they purport to be doing. I am not quite convinced that component software is an answer to anything besides GUI applications. Perhaps you could enlighten me?"""

IMO, it works best for 'interactive' applications, of which GUI is a prime example. By 'interactive' I mean "dependent on input which arrives while the program is running, possibly based on previous program output"; it doesn't have to involve a GUI. It can be a motor control program or something.

It usually doesn't work as well for batch applications; I don't see anything material that would make it so, but it seems most components are designed for interactive systems.

Unix plumbing is significantly simpler than COM or Bonobo: it's usually based on passing textual data, more often than not in just one direction. COM is in many senses not much more than a well-defined ABI for a C++ class (inaccurate, but close enough for a one-line description). Like many C++ classes, and in contrast with Unix plumbing, COM components assume that whatever uses them is willing to work with them interactively.

Let's take, for example, a simple Unix "component integration" example:

wc -l * | sort -nr | head -10

(That means "for each file in the current directory, count the number of lines in that file; then sort the output numerically with the largest numbers at the beginning, and finally select the first 10 items." Yes, it says all of that in that one line, and it's standard practice to actually do this.)
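To make the three stages concrete, here is a minimal Python sketch of the same pipeline run in one process. The function names `count_lines` and `top_by_lines` are hypothetical. It assumes, like the shell glob `*`, that dot-files are skipped, and it does not emit the "total" line that `wc` appends when given several files (which `sort -nr` would push to the top).

```python
import os


def count_lines(paths):
    """Stage 1, like `wc -l *`: yield (line_count, path) for each regular file."""
    for path in paths:
        if not os.path.isfile(path):
            continue  # wc would complain about directories; this sketch just skips them
        with open(path, "rb") as f:
            # wc -l counts newline characters, so count b"\n" in fixed-size chunks
            newlines = sum(chunk.count(b"\n") for chunk in iter(lambda: f.read(65536), b""))
        yield newlines, path


def top_by_lines(directory=".", n=10):
    """Run the three stages: count, sort numerically descending, keep the first n."""
    # The shell glob `*` skips dot-files, so skip them here too
    names = [name for name in os.listdir(directory) if not name.startswith(".")]
    paths = [os.path.join(directory, name) for name in names]
    # Stage 2, like `sort -nr`: numeric sort, largest counts first
    ranked = sorted(count_lines(paths), key=lambda pair: pair[0], reverse=True)
    # Stage 3, like `head -10`: keep only the first n entries
    return ranked[:n]
```

Calling `top_by_lines()` returns a list of `(count, path)` pairs; the difference from the shell version is that all three stages share one address space instead of talking through text streams.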

A solution in COM that does the same would be along the lines of having a "count" component that gets a list of files and provides a similar list with the line count appended; a "sort" component, a class that receives an input, sorts it by some criterion and makes it available in sorted order; and a "head browser" component that takes input, goes through the beginning of that input until some condition (e.g., 10 items provided) has been met, and then terminates. Very similar to the distinct Unix components.

The difference lies in how the components communicate. In Unix, the natural way is one unidirectional textual stream. In COM, input would instead be provided by creating an instance of class "Enumerator" (the COM name for an iterator) that yields each input/output in turn when called. 'sort' would get an enumerator object from 'count', on which it calls a method each time a new item is required. 'sort' would in turn provide an enumerator to 'head', which would yield the sorted outputs. All components actively participate in the data flow process, in contrast to the unidirectional nature of the Unix way.
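The pull-based flow described above can be sketched in Python as follows. This is not COM code: the `Enumerator` protocol and its `next_item` method are hypothetical, loosely modelled on the `Next` method of COM's `IEnum*` interfaces, and "files" are assumed to be `(name, text)` pairs so the sketch needs no file system. Note how each component calls into the one before it, and how 'sort' has to drain its whole source before it can hand out a single item.

```python
from typing import Iterable, Optional, Protocol, Tuple

Item = Tuple[str, int]  # (file name, line count)


class Enumerator(Protocol):
    """Hypothetical pull interface: returns the next item, or None when exhausted."""
    def next_item(self) -> Optional[Item]: ...


class CountEnumerator:
    """'count' component: appends a line count to each (name, text) pair it receives."""
    def __init__(self, files: Iterable[Tuple[str, str]]):
        self._files = iter(files)

    def next_item(self) -> Optional[Item]:
        pair = next(self._files, None)
        if pair is None:
            return None
        name, text = pair
        return (name, text.count("\n"))


class SortEnumerator:
    """'sort' component: must pull everything from its source before yielding anything."""
    def __init__(self, source: Enumerator):
        self._source = source
        self._sorted = None
        self._pos = 0

    def next_item(self) -> Optional[Item]:
        if self._sorted is None:
            items = []
            while (item := self._source.next_item()) is not None:
                items.append(item)
            self._sorted = sorted(items, key=lambda it: it[1], reverse=True)
        if self._pos >= len(self._sorted):
            return None
        item = self._sorted[self._pos]
        self._pos += 1
        return item


class HeadEnumerator:
    """'head' component: pulls at most `limit` items, then stops asking its source."""
    def __init__(self, source: Enumerator, limit: int):
        self._source = source
        self._remaining = limit

    def next_item(self) -> Optional[Item]:
        if self._remaining <= 0:
            return None
        self._remaining -= 1
        return self._source.next_item()
```

Wiring is done by the consumer: `HeadEnumerator(SortEnumerator(CountEnumerator(files)), 10)`, after which the caller repeatedly calls `next_item()` until it returns `None`.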

Components such as 'wc' and 'sort' are usually NOT available in COM, though; the overhead they incur (in lines of code) per use is not worth the trouble. The COM model is much more flexible than the Unix pipe, but that flexibility costs a lot in complexity. Personally, I prefer the Unix model whenever feasible: you pay in performance and flexibility, and in return get a dramatic reduction in complexity.

In Unix environments, integrators are often sysadmins (want to back up only files whose length is divisible by 50? You can do that in one line using "find", "grep", "tar" and friends). In Windows environments, integrators are programmers (want to back up only files whose length is divisible by 50? VB or Python are your best bet, but you need to be proficient with COM, and it's going to take you significantly longer).
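For comparison, a minimal sketch of the "back up only files whose length is divisible by 50" task written as a Python program rather than a shell one-liner, using only the standard `os` and `tarfile` modules. The function names and the gzip-compressed tar format are assumptions for illustration. Empty files are included, since 0 is a multiple of 50; the file list is collected before the archive is opened, so an archive written inside `root` does not end up archiving itself.

```python
import os
import tarfile


def files_with_size_multiple(root, divisor=50):
    """Walk `root` and yield regular files whose size in bytes is a multiple of `divisor`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and os.path.getsize(path) % divisor == 0:
                yield path


def backup(root, archive_path, divisor=50):
    """Write the selected files into a gzip-compressed tar archive; return what was archived."""
    # Collect the list first, before the archive file exists on disk
    selected = sorted(files_with_size_multiple(root, divisor))
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in selected:
            # Store paths relative to root, as `tar` would when run from inside it
            tar.add(path, arcname=os.path.relpath(path, root))
    return selected
```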

Ori Berger
Tuesday, May 07, 2002

ori - what you say is certainly true, but it is unfair to "blame" com i think. can one pipe the output from one x-windows app to another (and so on, ad infinitum, with switches to indicate what behaviour each tool should have)? in that respect clis seem much more flexible than guis to me.

nope
Wednesday, May 08, 2002

I'm not putting the blame on COM, and no, it isn't possible to pipe X applications into one another. The Unix way doesn't work well for GUI apps in general. There are some notable exceptions, though: GDB (the GNU Debugger) is a debugging engine that is driven interactively by many GUIs. This is done through a well-defined two-way textual data stream.

Compared to, e.g., a COM interface for a debugger, this has mostly performance disadvantages. If your front end wants to, e.g., display an entire linked list, you'd ask GDB to decode the first element (e.g., by issuing a "print *head" command), then the next ones (by issuing "print $$->next" until satisfied or a null pointer is reached), and every time you'll be serializing and deserializing the contents. A COM interface would probably let you directly access the debugged program's memory.
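A minimal Python sketch of that cost: a front end walking a linked list one textual round trip per node. `ToyEngine` and its reply format are hypothetical stand-ins, not GDB's actual command or output syntax; the debuggee's memory is modelled as a dict of `address -> (value, next_address)`, with 0 as the null pointer.

```python
class ToyEngine:
    """Hypothetical text-driven debugger engine (NOT GDB's real protocol)."""

    def __init__(self, memory, head):
        self._memory = memory  # {address: (value, next_address)}
        self._head = head
        self.requests = 0      # number of commands that crossed the text stream

    def execute(self, command: str) -> str:
        self.requests += 1
        target = command.split(" ", 1)[1]  # "print head" or "print 0x20"
        addr = self._head if target == "head" else int(target, 16)
        value, nxt = self._memory[addr]
        # Serialize the node into display text, as a textual engine must
        return f"value = {value}, next = {nxt:#x}"


def walk_list(engine):
    """Front end: one command per node, parsing every textual reply."""
    values = []
    reply = engine.execute("print head")
    while True:
        # Deserialize "value = 1, next = 0x20" back into fields
        fields = dict(part.split(" = ") for part in reply.split(", "))
        values.append(int(fields["value"]))
        nxt = fields["next"]
        if int(nxt, 16) == 0:  # null pointer reached
            return values
        reply = engine.execute(f"print {nxt}")
```

A list of N nodes costs N round trips, each with a format and a parse; an interface with direct memory access would skip both, which is the performance trade-off described above.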

The complexity, however, is significantly reduced. The interface was designed assuming that whoever uses the GDB engine mostly has text processing capabilities, so it can decode just about any expression into display form with type and value; COM interfaces usually stop decoding at raw information and leave final composition to the user.

Tracing and debugging a connection to GDB is significantly simpler than debugging a COM interface. Everything about the interaction between components is captured in the stream between them, which can easily be inspected by a variety of means, and can often be used to replay events with just one side active during debugging. If the COM interface gave access to the debuggee's memory, there's usually no feasible way to account for all interactions between components. If you can debug both at the same time (e.g., because you have the source to both), that's usually not a problem, but with commercially available components this is rarely the case.
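Because a text stream carries every interaction, recording and replaying it is straightforward. Here is a minimal Python sketch, assuming a backend that is just a function from command text to reply text; `RecordingChannel` and `ReplayChannel` are hypothetical names. The replay side lets you debug a front end with the engine absent, and flags the first point where the front end's behaviour diverges from the recording.

```python
from typing import Callable, List, Tuple

Transcript = List[Tuple[str, str]]  # (command, reply) pairs in order


class RecordingChannel:
    """Wraps a text request/reply backend and logs every exchange."""

    def __init__(self, backend: Callable[[str], str]):
        self._backend = backend
        self.transcript: Transcript = []

    def execute(self, command: str) -> str:
        reply = self._backend(command)
        self.transcript.append((command, reply))
        return reply


class ReplayChannel:
    """Plays back a recorded transcript with only the front end active."""

    def __init__(self, transcript: Transcript):
        self._transcript = list(transcript)
        self._pos = 0

    def execute(self, command: str) -> str:
        if self._pos >= len(self._transcript):
            raise RuntimeError(f"unexpected extra command: {command!r}")
        expected, reply = self._transcript[self._pos]
        if command != expected:
            raise RuntimeError(
                f"divergence at step {self._pos}: expected {expected!r}, got {command!r}"
            )
        self._pos += 1
        return reply
```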

I usually prefer raw performance when choosing implementations, but common _practice_ with present component architectures makes robustness so hard to achieve that I usually prefer the Unix way to the COM way, even though the latter is theoretically superior.

[Note: consider databases as a component in your system. Using SQL or MDX to communicate is the Unix way, no matter how you try to wrap it. A "native" COM-style interface would have looked a lot like the old Btrieve or ISAM interfaces, and no one likes to use those.]

Ori Berger
Wednesday, May 08, 2002

Ori, thanks for your explanation of component programming. From your explanation, it seems COM and Unix piping solve the same problem: I don't want to write a {word counter, sorting component, text editing widget}, so I'll pipe it to a component and get the output when it's done.

Performance-wise, component programming certainly has an advantage because it communicates via the stack instead of using stdin and stdout. Perhaps it loses that advantage in marshalling/unmarshalling arguments? Is it easier to parse output from stdout or to let the COM infrastructure massage the arguments into what you need?

What advantages are there to using components over shared libraries?  It seems you can get 90% of the functionality you need in reusing code out of a few function calls to utility libraries.  So in what situations do component libraries trump traditional usage of shared libraries?

Can COM/Bonobo cure brittle interfaces? In the GDB example, let's say the GDB folks decide that a "successor" command should replace the "next" command. If GDB were a component, then you'd probably (?) get a runtime linking error and the program would fail. With something like DDD that interfaces with GDB through text, it would just get really confused and not work until you downgraded GDB or upgraded DDD. Presumably it is the job of the embedder to ensure that new versions of {components, applications} don't break their application. That aside, can COM prevent these kinds of problems, or are they universal problems we have to deal with no matter what?

My last question about component programming is, if I am using WinkSoft's Text Editing Component and then decide that CerealWare's Text Editing Plus Mr. Coffee Component is better, can I just drop CerealWare's component in, or is there some kind of retooling required?  I guess I'm asking, is there any kind of standardization in interfaces?

[IIRC, it is actually possible to embed an X11 app within another.  Outside of the realm of dock applets, I don't think it is used often.  Unfortunately I have no evidence to really substantiate this.  On further thought, I can prove it by pointing at window managers.  They are X11 clients that embed other apps within them.  Admittedly the communication is extremely limited.]

Thanks again, Ori.  You're welcome to email me if you'd like to take it off-board.  I'm new around here so I don't know if I'm increasing or decreasing the signal/noise ratio.

Adam Keys
Wednesday, May 08, 2002

Worry not, Mr. Keys.  You're raising the S/N ratio.  A lot.  Ori too.  Trust me.  :-)  I don't have enough time to read what y'all wrote over the past few days, but I sure hope to.  Please continue your discussion here.

Paul Brinkley
Thursday, May 09, 2002

Well, components ARE shared libraries, despite what some people believe, though they do have somewhat better specified semantics than "plain" shared libraries.

A COM component, for example, specifies a "factory" entry point that can be used to generate new instances of the component. The COM model, being a C++ binary interface[1], specifies that each class (or, more generally, instance) provides dynamic cast abilities, some type information, etc. The COM model also dictates how you do threading (you have some choice, but you have to declare upfront which of the possible conventions you're using) and how you use resources (e.g., you can't just allocate memory with "malloc" and expect it to work in all cases). But in the end, it's just a shared library.
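The factory-plus-dynamic-cast shape can be sketched in Python. In real COM the pieces are the DLL's exported `DllGetClassObject`, `IClassFactory::CreateInstance`, and `IUnknown`'s `QueryInterface`/`AddRef`/`Release`, with `E_NOINTERFACE` returned for unsupported interfaces; the Python below only mirrors that shape. All class names are hypothetical, and the interface IDs are random UUIDs generated for illustration, whereas real components ship with fixed GUIDs.

```python
import uuid


class NoInterface(Exception):
    """Raised where real COM would return E_NOINTERFACE."""


# Hypothetical 128-bit interface IDs, generated at import time purely for illustration
IID_ILineCounter = uuid.uuid4()
IID_ISorter = uuid.uuid4()


class LineCounter:
    """A toy 'component' that supports exactly one interface."""
    _interfaces = {IID_ILineCounter}

    def __init__(self):
        self._refcount = 1  # the creator holds the first reference

    def query_interface(self, iid):
        """The 'dynamic cast': hand back an interface reference or refuse."""
        if iid not in self._interfaces:
            raise NoInterface(iid)
        self._refcount += 1
        return self

    def release(self):
        self._refcount -= 1
        return self._refcount

    def count(self, text):
        return text.count("\n")


class LineCounterFactory:
    """Stand-in for a class factory, the object the library's entry point hands out."""

    def create_instance(self, iid):
        obj = LineCounter()
        try:
            return obj.query_interface(iid)
        finally:
            # Drop the creator's reference; the caller keeps the one from query_interface
            obj.release()
```

The caller never names the concrete class: it asks the factory for an interface ID and gets either a usable reference or a refusal, which is what lets a shared library behave as a "component".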

The COM framework also specifies how the availability of the component can be advertised in the Windows registry. Component interfaces have a 128-bit unique identifier and a "pretty" name, such as "Microsoft.Word.8", which identifies the 8th revision of the "Microsoft.Word" interface. The interface naming is supposed to solve the problem of backward compatibility, as it allows either a new component to provide two incompatible interfaces (each having a different ID and name), or both revisions to be installed side by side independently. Microsoft Word, for example, makes new versions provide old interfaces because it's not designed to be installed alongside an older version of itself.
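A minimal Python sketch of that naming scheme: a dict stands in for the shared registry, mapping pretty names to 128-bit IDs and IDs to implementations, so two incompatible revisions can be installed side by side. The names "Toy.Counter.1", "Toy.Counter.2" and the version-independent "Toy.Counter" are hypothetical, and IDs are generated with `uuid4` only for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Registry:
    """Toy stand-in for the shared registry: pretty name -> 128-bit ID -> implementation."""
    names: Dict[str, uuid.UUID] = field(default_factory=dict)
    factories: Dict[uuid.UUID, type] = field(default_factory=dict)

    def register(self, name, cls):
        clsid = uuid.uuid4()  # illustration only: real components ship with fixed GUIDs
        self.names[name] = clsid
        self.factories[clsid] = cls
        return clsid

    def create(self, name):
        return self.factories[self.names[name]]()


class CounterV1:
    def count(self, text):
        return text.count("\n")


class CounterV2:
    def count(self, text):
        # Incompatible change: v2 also counts a final line without a trailing newline
        return len(text.splitlines())


registry = Registry()
registry.register("Toy.Counter.1", CounterV1)
registry.register("Toy.Counter.2", CounterV2)
# A version-independent name that tracks the newest revision
registry.names["Toy.Counter"] = registry.names["Toy.Counter.2"]
```

Old clients that asked for "Toy.Counter.1" keep getting the old behaviour, while new clients can ask for the new revision or the version-independent name.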

This is also how different vendors are able to provide alternative component implementations - their component just has to support the (hopefully well defined) interface that is expected by the user. Sometimes it works, and sometimes it doesn't; Obviously, every provider attempts to lock users in.

Also, it is common for COM components to assume that the user will implement some interface in order to use the component. If you use the "IWebBrowser2" interface of "Internet.Explorer.5", for example, you have to implement an "IEventSink2" component, which the WebBrowser will use to notify you of events such as clicks, navigation events, etc.
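A minimal Python sketch of that obligation: the component is useless until the user supplies an object implementing a callback interface. `BrowserEvents`, `ToyBrowser` and `LoggingSink` are hypothetical names, not the real WebBrowser interfaces; the `advise` method is named after COM's `IConnectionPoint::Advise`, which is how a sink is attached in real COM.

```python
from typing import List, Protocol


class BrowserEvents(Protocol):
    """Hypothetical callback interface the *user* must implement (an 'event sink')."""
    def on_navigate(self, url: str) -> None: ...
    def on_click(self, element_id: str) -> None: ...


class ToyBrowser:
    """Hypothetical component that reports everything through attached sinks."""

    def __init__(self):
        self._sinks: List[BrowserEvents] = []

    def advise(self, sink: BrowserEvents) -> None:
        self._sinks.append(sink)

    def navigate(self, url: str) -> None:
        for sink in self._sinks:
            sink.on_navigate(url)

    def simulate_click(self, element_id: str) -> None:
        for sink in self._sinks:
            sink.on_click(element_id)


class LoggingSink:
    """The user's side of the contract: implements every method in BrowserEvents."""

    def __init__(self):
        self.log: List[str] = []

    def on_navigate(self, url: str) -> None:
        self.log.append(f"navigate {url}")

    def on_click(self, element_id: str) -> None:
        self.log.append(f"click {element_id}")
```

Contrast this with a plain shared library exposing a "model" the caller queries: here control flows back into user code, so the user must implement the full callback interface before the component is useful.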

The practical differences between "simple" shared libraries and COM lie in the ability to identify and advertise component existence (through the shared, global Windows registry and a standard for doing so), and an assumption of interactivity between the component and its user - shared libraries usually provide a "model" (in the model-view-controller sense) for the user to query, sometimes supported by simple callbacks to be implemented by the user. COM components can rarely function without a (usually nontrivial) "callback" interface implemented by the user.

Pipes and component programming address the same problem space. However, they have very distinct natures: pipes provide (relatively) easy to specify/document/use, well-isolated, batch-suited, not-too-high-performance components. COM/CORBA and friends provide (relatively) complex to specify/document/use, not-so-well-isolated, interactive-suited, potentially very-high-performance components. As a result, they rarely compete with each other when considered by a knowledgeable developer. Unfortunately, most developers are only familiar with one (usually COM), so it's very easy to find solutions done one way that would have been much better done the other way.

It should be noted that COM's later extensions (DCOM and COM+) are significantly more complex frameworks, though.

[1] Again, not entirely accurate - but close enough for discussion purposes.

Paul: Thanks for the compliment. I'm with you on keeping discussions public - in fact, at work I tried to institute a policy that private emails on technical subjects should be outlawed, and that all professional correspondence should be carried in public forums (newsgroups or public exchange folders).

Ori Berger
Thursday, May 09, 2002
