Fog Creek Software
Discussion Board




Syntax, Idioms, and Japanese

Benji raised the issue of an improved syntax for C++. I strongly disagree with the idea that code readability should be measured in some absolute terms, or by the (relatively objective) non-professional-programmer bystander.

Code should be concise; verbosity improves clarity only for someone who is not versed in the language.  If you work on an Ada project, you very quickly develop a strong dislike of overly verbose languages. That has been my experience, and that of many others, anyway. VHDL (an Ada-inspired language for hardware design) is infinitely more 'readable' to an untrained eye than Verilog (a C-inspired language for hardware design[1]). At least, it seems that way initially, but soon enough you notice that without proper training you can't make any real sense of code written in either (and with VHDL it takes you much longer to figure that out), and that once you do have proper training, the "readable" syntax of VHDL gets in the way of expressing details concisely and clearly. It may come as a great surprise to you, but C is vastly more popular than Ada, and Verilog more popular than VHDL. While the complexities of hardware design strongly stress the need for proper training, that need is no less real for 'pure' software.

You can't escape knowing all the little details. If the source says "a logical-or b" rather than "a || b", you still have to be aware that it short-circuits, or you won't be able to read the code. Similarly, you still have to remember that "a divide b" behaves differently when both arguments are integers than when one of them is floating point -- e.g., "a divide (a divide 2)" is exactly 2.0 for floating-point a=1.0, but a division by zero for int a=1, and it may be only approximately 2.0 for other floating-point values. I don't think you would want the operators to be called "a shortcircuit-logical-or b" and "a integer-divide b". And if you do want to call them that, you DO know that 'shortcircuit' is a term some people will need explained, along the lines of 'a logical-or-that-evaluates-second-only-if-first-false b'. Every programming language has its idioms, and some languages (K, Lisp) even have meta-idioms that guide the creation of new idioms. It's an inherent feature of the software world.
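To make the two pitfalls above concrete, here is a minimal C sketch. The function names (noisy, operands_evaluated, half_int, half_double) are made up for illustration and are not from any library.

```c
static int calls = 0;

/* Records that it was evaluated, then returns its argument. */
static int noisy(int value) { calls++; return value; }

/* Returns how many operands of (a || b) were actually evaluated. */
int operands_evaluated(int a, int b)
{
    calls = 0;
    (void)(noisy(a) || noisy(b)); /* noisy(b) runs only when a is 0 */
    return calls;
}

/* a / 2 with both operands int: the fractional part is discarded. */
int half_int(int a) { return a / 2; }

/* a / 2 with a floating-point operand: the fractional part is kept. */
double half_double(double a) { return a / 2; }
```

With a = 1, half_int(a) is 0, so dividing a by it is the division by zero described above, while 1.0 / half_double(1.0) is exactly 2.0. Spelling the operators out in words would not change either behavior.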

So you might as well spend some time and get used to the syntax. At the extreme, there are languages like APL and K, which reduce program lengths 100- to 1000-fold. They look unreadable to anyone not versed in them, but once you "get it" (not an easy task), they are much more readable. Reading a line of K may take 10 times as long, even if you are versed, but if the program is 1000 times shorter, don't you think the tradeoff is justified? Plus, there's much less room for bugs to hide :)

I recently read an article by a Japanese fellow who mourned the declining use of kanji. That struck me as very odd, as I had always thought that moving to a phonetic system was making things better for the Japanese. His complaint mostly revolved around "I can read kanji more than 10 times as fast as I can read the phonetic alphabet" (whose name escapes me at the moment. Kana, maybe?).

The philosophical debate over whether increased complexity justifies a higher barrier to entry for a natural language is one I do not want to get into (at least not in this forum). However, software people are supposed to be professional (yeah, right), and are supposed to make software that works and that can be supported by a software professional. It won't ever be supported by Joe User, because of pitfalls like 'divide' and 'logical-or'. So I think there is _every_ reason for 'kanji'-like programming languages (e.g., K).

You can find a shallow introduction to K in:
[ http://www.kuro5hin.org/story/2002/11/14/22741/791 ]

A few examples of how concise (and strange looking, until you get used to it) K is, can be seen in
[ http://nsl.com/papers/papers.htm ]

And I quote, from the "N queens" link (a generalization of the 8-queen problem):

qn:{[n],/nq'[n;w-1;w+1]w:!n}
nq:{[n;l;r;v]
:[n=#v;,v;,/{nq[n;-1+l,x;1+r,x]v,x}'(!n)_dvl v,l,r]}

It would have fit in just two lines if this edit box were slightly wider. An additional line,

bd:`0:"_Q"(!8)=/:p

prints a solution neatly as an 8x8 board.
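For readers who don't read K, here is a rough C sketch of the same row-by-row backtracking idea. It is not a translation. It counts solutions instead of listing them, and it uses bitmasks where the K code (as I read it) carries lists of blocked columns in v, l and r. The names count_queens and n_queens are made up for this sketch, and it assumes 1 <= n <= 31.

```c
/* Place one queen per row. cols, ld and rd are bitmasks of the squares
   in the current row that are attacked along a column, a left-leaning
   diagonal and a right-leaning diagonal by the queens placed so far. */
static unsigned long count_queens(int n, int row,
                                  unsigned cols, unsigned ld, unsigned rd)
{
    unsigned full = (1u << n) - 1;            /* n low bits set */
    unsigned free_squares = full & ~(cols | ld | rd);
    unsigned long total = 0;

    if (row == n)
        return 1;                             /* all n queens placed */

    while (free_squares) {
        unsigned bit = free_squares & (0u - free_squares); /* lowest set bit */
        free_squares &= ~bit;
        /* Diagonal attacks shift by one column as we move down a row. */
        total += count_queens(n, row + 1, cols | bit,
                              ((ld | bit) << 1) & full, (rd | bit) >> 1);
    }
    return total;
}

/* Number of ways to place n non-attacking queens on an n-by-n board. */
unsigned long n_queens(int n)
{
    return count_queens(n, 0, 0u, 0u, 0u);
}
```

n_queens(8) gives the familiar 92 solutions. Even in this compact form the C version is several times longer than the K one, which is rather the point.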

As for kanji examples .... sorry, I don't speak Japanese :)

Ori Berger
Sunday, January 26, 2003

Japanese _is_ much, much easier to read with kanji (Chinese characters) because the language happens to have very few sounds and is thus filled with homophones, i.e. words that sound the same and would be written the same way in a phonetic writing system. There is no such ambiguity with kanji, since they are symbols that represent meanings, not sounds.

Hence the (at first look) funny idea that a symbol-based writing system is easier to use than a phonetic system.

Frederic Faure
Sunday, January 26, 2003

I'm a Japanese speaker and let me tell you that Kanji is hard to learn, but not as hard as you might think.

Although there are around 2000 characters to learn before you can read a newspaper, the characters are not all completely different but are made out of about 80 sub-characters (called radicals).

For example, the character for "like" is made by joining the characters for woman and child.

As for the homophones, they are only a problem in theory. In practice, Japanese speakers have no trouble making their meaning clear when speaking. Only a very cryptic and silly sentence would be ambiguous in speech but clear in kanji.

Matthew Lock
Sunday, January 26, 2003

I also feel very strongly that code should be concise because:
a) less typing
b) fewer lines means you can read more code on the screen
c) easier to remember

Compare the use of localtime in Perl:

($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);


Against localtime in C...


#include <time.h>
struct tm * localtime( const time_t *timer );
struct tm *_localtime( const time_t *timer,
          struct tm *tmbuf );

struct tm {
    int tm_sec;   /* seconds after the minute -- [0,61]  */
    int tm_min;   /* minutes after the hour   -- [0,59]  */
    int tm_hour;  /* hours after midnight     -- [0,23]  */
    int tm_mday;  /* day of the month         -- [1,31]  */
    int tm_mon;   /* months since January     -- [0,11]  */
    int tm_year;  /* years since 1900                    */
    int tm_wday;  /* days since Sunday        -- [0,6]   */
    int tm_yday;  /* days since January 1     -- [0,365] */
    int tm_isdst; /* Daylight Savings Time flag          */
};

Matthew Lock
Sunday, January 26, 2003

I think that use of the time function is, well, simplistic...

First, #include <time.h> eliminates the need for your declarations.

Therefore, usage of localtime is:

#include <time.h>
struct tm tm_loc;
time_t t_sec;

// ... your stuff, whatever ...

localtime_r( &t_sec, &tm_loc );
// prefer localtime_r as it's thread-safe; error checking omitted here
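Spelled out as a complete function, that usage might look like the sketch below. It assumes a POSIX system, where localtime_r takes the time first and the caller-owned result second. The name format_local_time and the output format are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes localtime_r on strict compilers */
#include <stddef.h>
#include <time.h>

/* Formats t as local time, "YYYY-MM-DD HH:MM:SS", into buf.
   Returns the number of characters written, or 0 on failure
   (conversion error, or buf too small for the result). */
size_t format_local_time(time_t t, char *buf, size_t size)
{
    struct tm tm_loc;

    /* Input time first, caller-owned result second. */
    if (localtime_r(&t, &tm_loc) == NULL)
        return 0;
    return strftime(buf, size, "%Y-%m-%d %H:%M:%S", &tm_loc);
}
```

A caller would pass time(NULL) and a buffer of at least 20 bytes. The individual fields (tm_loc.tm_hour and so on) are there by name, with no need to remember their position in a nine-element list.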

Here, try this in Perl:

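#include <fcntl.h>
#include <sys/mman.h>

int fd;
unsigned char *ptr;
size_t len = 640*480;  // prefer ioctl to get framebuffer size

fd = open( "/dev/fb0", O_RDWR );

// writable, shared mapping so that stores actually reach the display
ptr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);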

Now ptr is a pointer mapped directly into a display framebuffer.  Draw away.

Cryptic?  Perhaps, to the uninitiated (read the man page).
Powerful? Always.

That's why C will always be with you.  The OS is written in C (and likely will be for some time to come) and you can always get from *here* to *there*.

And certainly for the localtime example, nothing was gained by Perl (count the number of bytes typed as a first-order approximation, since that's what Matt L. offered as one form of simplicity).

Nat Ersoz
Monday, January 27, 2003

Hey, people!

No comments about K?

Ori Berger
Tuesday, January 28, 2003

Actually, kanji can be tricky because each individual character can have up to about ten different meanings and associated pronunciations.

This is more a problem with proper names, rather than with everyday speech/writing. The rule of thumb my teacher provided was that you couldn't tell how a name was pronounced by the kanji, and you couldn't tell how it was spelled by the sound.

Steve Wheeler
Tuesday, January 28, 2003

Well, K just struck me as a shorthand.  Shorthand is useful for the reasons you describe, but it does have problems in communicating with a wide range of people.

Usually there are two separate issues in terseness:  the power to create abstractions, and actually having the little symbols themselves be tiny.  I usually care about the former, since I'm transitioning away from C-like languages, which don't have this power.  (Just received my copy of Norvig's PAIP!)  I'm not quite ready for the latter.  But it looks very enchanting; it makes me think there will always be these special languages.  Sometimes concepts are so old that they only require the tiniest symbols available.

Tj
Tuesday, January 28, 2003

Check out Paul Graham's article "Succinctness is Power":

http://paulgraham.com/power.html

Robert Cowham
Wednesday, February 05, 2003
