Fog Creek Software
Discussion Board

Assembler? HL Assembler? Anybody?

I am curious if anybody still codes in assembler?

I realize there is not much benefit for what most people seem to be programming these days, which is usually I/O or network bound.

I have always hated assembler (despite having some experience, though not much with x86 assembler, which always looks to me hugely uglier than, say, 68K).

I recently ran across a book (actually about game programming) which has a big section on MASM HL.  MASM HL is basically a bunch of macros that let you define locals, procedures, loops (while, etc.), conditions, and so on.  The book has a whole Tetris-like game coded this way, and get this: it really is not much worse to read than an integer-only C program that does the same thing.

Mostly I am shooting the breeze here, but there is sort of a practical application in my mind.

In one of my own apps, I do have a CPU bound routine which is (a) about as fast as I can make it in C, and (b) complicated enough that I don't want to hand-code it in traditional assembler.  So at a practical level, I'm wondering if MASM HL is going to produce better code than the C version.

S. Tanna
Wednesday, June 11, 2003

I doubt it would be worth your time, especially if your minimum target machine runs the C code fast enough.

For sale: 1 Programmer's Workbench.  Slightly worn but useable.

Dave B.
Wednesday, June 11, 2003

Unless you're an assembly language God, you're unlikely to outperform a good optimizing compiler in either straight assembly or a higher level abstraction.

You'd be much better off trying to come up with an algorithmic improvement rather than spend time on micro-optimization.

Mister Fancypants
Wednesday, June 11, 2003

I read _Zen of Code Optimization_ and it's amazing what _can_ be done with assembler.

On the other hand, the Pentium (with its dual pipeline) is really hard (or time-consuming) to optimize by hand.

On the third hand, I notice that many routines in Microsoft's C run-time library are coded in assembler.

My guess is that for any but the smallest functions, the extra time you spend in assembler would be better spent in improving your algorithm.

Christopher Wells
Wednesday, June 11, 2003

(It has to be said)

The same number of people who program in compiler.

Wednesday, June 11, 2003

> algorithmic improvement rather than spend time on micro-optimization.

What if you've reached the limits of algorithmic improvement but are still not happy?

S. Tanna
Wednesday, June 11, 2003

Do you know why your routine is slow? What is it doing? Consider first ensuring it is cache-efficient and does not touch memory more than necessary. On modern CPUs this is _very_ important, and can affect the speed a great deal.
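To make the cache point concrete, here is a minimal sketch in C, assuming a plain 2D int array. The sizes and function names are made up for illustration. Both functions compute the same sum; only the order in which they touch memory differs.

```c
#include <stddef.h>

#define ROWS 512   /* hypothetical sizes, chosen only for illustration */
#define COLS 512

/* Row-major walk: consecutive iterations touch adjacent addresses,
   so every cache line that is fetched gets fully used. */
long sum_row_major(int grid[ROWS][COLS])
{
    long total = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            total += grid[r][c];
    return total;
}

/* Column-major walk: same result, but each access jumps
   COLS * sizeof(int) bytes, so most accesses land on a new cache line. */
long sum_col_major(int grid[ROWS][COLS])
{
    long total = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            total += grid[r][c];
    return total;
}
```

How big the difference is depends on the machine and the array size, so time both on your own target rather than trusting a rule of thumb.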

Consider whether it's even possible to make the routine fast. You're going to have a hard time speeding this kind of thing up greatly:


because the load from "table[src[x]]" is dependent on the load of "src[x]". Make sure you know your target instruction set too. You're not using "*dest++" on x86 without checking the compiler output... right?
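A minimal sketch of a loop with this kind of dependent load, assuming byte-sized data and a 256-entry table. The function name, types and table size are hypothetical, not taken from the post.

```c
#include <stddef.h>

/* Hypothetical example: translate each source byte through a
   256-entry lookup table. Names and types are assumptions. */
void translate(unsigned char *dest, const unsigned char *src,
               const unsigned char table[256], size_t n)
{
    for (size_t x = 0; x < n; x++) {
        /* The table load cannot be issued until src[x] has arrived,
           because src[x] supplies the index. */
        dest[x] = table[src[x]];
    }
}
```

The sketch writes through dest[x] rather than "*dest++"; which form produces better x86 code is exactly the sort of thing to confirm in the compiler output.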

Before doing anything, profile it with one of the various profilers around: VC6 has one built in that takes metrics, and you can use AMD's free CodeAnalyst for sampling. Use both. (If you have VTune, that is better than CodeAnalyst, so use that.)

Oh, and *check the assembler output* to ensure that there's nothing heinous going on. Make sure you output assembly _and_ machine code, and check the instructions for inefficient encodings that could be made better by simple changes. (For example, on x86 and 68000 you can move values closer to the start of a struct so the constant offset can be encoded in 8 bits, etc.)
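As a rough illustration of the struct-offset point, here is a sketch using offsetof. The struct and field names are hypothetical. On x86 an 8-bit displacement covers -128 to 127, so a field at offset 256 needs the longer 32-bit displacement form, while a field at offset 0 does not.

```c
#include <stddef.h>

/* Hypothetical record: a large buffer placed before a hot counter. */
struct cold_first {
    char scratch[256];
    int  hits;
};

/* Same fields, with the frequently used counter moved to the front. */
struct hot_first {
    int  hits;
    char scratch[256];
};

/* The constant offsets the compiler has to encode when it reaches
   `hits` through a pointer to each struct. */
size_t hits_offset_cold_first(void)
{
    return offsetof(struct cold_first, hits);
}

size_t hits_offset_hot_first(void)
{
    return offsetof(struct hot_first, hits);
}
```

Whether the shorter encoding makes a measurable difference is something to verify in the listing and the profiler; the sketch only shows where the offsets come from.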

My gut feeling would be you will get better results from C than from any horrible concoction of macros that you feed into your assembler. The reason is that the compiler has an optimiser to fix up the generated code, where the assembler doesn't. The ultimate output from your macros will probably resemble unoptimised C code. The assembler output from a compiler by contrast often bears surprisingly little relation to the code that went in.

If you're hot stuff at assembly language, you will of course get best results from using that! But it seems damnably hard these days to beat the compiler by a significant margin. (Nope I don't claim to be great at assembly language!!)

Wednesday, June 11, 2003

<<CPU bound routine>>

argh I didn't notice this, well anyway!

Wednesday, June 11, 2003

I don't program IN assembly (at least not often), but I do have the dubious joy of working with assembly a fair bit (on all sorts of chips).

As for your specific problem, S. Tanna, my gut feeling is your best likelihood of improving your CPU bound routine will be to compile it with all speed optimizations on, then look at the generated assembly and attempt to tune that. Remember to compare speed of your tuned version with the originally compiled versions, as pipelines and such can really bite you in the ass.
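A minimal sketch of that comparison step, assuming the routine can be wrapped behind a common function pointer type. Everything here is hypothetical: sum_simple stands in for the originally compiled version, sum_unrolled for the hand-tuned one, and time_fn is a crude harness built on the standard clock() call.

```c
#include <stddef.h>
#include <time.h>

typedef long (*sum_fn)(const int *data, size_t n);

/* Baseline, standing in for the originally compiled routine. */
long sum_simple(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* "Tuned" variant, unrolled four ways, standing in for hand-edited code. */
long sum_unrolled(const int *data, size_t n)
{
    long a = 0, b = 0, c = 0, d = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a += data[i];
        b += data[i + 1];
        c += data[i + 2];
        d += data[i + 3];
    }
    for (; i < n; i++)      /* leftover elements */
        a += data[i];
    return a + b + c + d;
}

/* Call fn `reps` times; return elapsed CPU seconds and store the
   last result so the two versions can be checked against each other. */
double time_fn(sum_fn fn, const int *data, size_t n, int reps, long *result)
{
    clock_t start = clock();
    long r = 0;
    for (int k = 0; k < reps; k++)
        r = fn(data, n);
    *result = r;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Check that both versions return the same result before comparing times, and use enough repetitions that the elapsed time is well above the resolution of clock(). A sampling profiler will give a more trustworthy picture than this kind of stopwatch.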

Steven C.
Wednesday, June 11, 2003

PureBasic is a rather impressive BASIC compiler, the core of which has been written in NASM.

It allows you to mix assembler with your Basic, just like the good old days :)

Ged Byrne
Wednesday, June 11, 2003

I work with assembly (variety of MCUs and CPUs), mostly to bring up ports to new hardware or to code hardware specific things like writing to flash memory etc.  So, yes, there are still people who do it; even if not on a PC.

I have to echo Steven C.'s comment.  It is likely much more productive for you to read the compiler output and see what is going on.  I think the Linux kernel guys do this.  I've read that they write their C so as to produce a particular assembly output.

Thursday, June 12, 2003

I've recoded graphics routines in assembly recently, with mixed results.  (And graphics routines are a special case: most compilers can't generate MMX instructions, and MMX helps a lot.)

In general, assembly isn't worth it.  I got only a 33% speed improvement, which wasn't enough to justify the maintenance headache.

Optimization tips:

First, profile.

Second, fix your algorithm.

Third, grok the memory hierarchy, and stop blowing out your caches and TLB.  A register access takes less than a cycle; fetching something from main memory takes about 200 cycles if it's in the TLB, and a TLB miss takes basically forever.  This stuff typically matters much more than the quality of your generated code.

Fourth, try to use specialized instruction sets your compiler doesn't have access to.

Fifth, actually (gag) try to beat the compiler by hand-optimizing its output.  You'll need good reference materials on how the Pentium and AMD pipelines work this year.
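One way the third tip can play out in C is loop tiling. This is a sketch only: the matrix size, the tile size and the function name are hypothetical, and the right tile size depends on the cache of the target machine.

```c
#include <stddef.h>

#define N    256  /* hypothetical matrix size; must be a multiple of TILE */
#define TILE 16   /* hypothetical tile edge; tune for the target cache */

/* Transpose src into dst one TILE x TILE block at a time, so the rows
   being read and the rows being written can both stay in cache while
   a block is processed. */
void transpose_tiled(int dst[N][N], int src[N][N])
{
    for (size_t rb = 0; rb < N; rb += TILE)
        for (size_t cb = 0; cb < N; cb += TILE)
            for (size_t r = rb; r < rb + TILE; r++)
                for (size_t c = cb; c < cb + TILE; c++)
                    dst[c][r] = src[r][c];
}
```

A straightforward transpose strides through one of the two arrays a whole row apart on every access; working block by block keeps the working set small. Profile before and after, since the gain varies a lot with matrix size.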

J. Random Hacker
Friday, June 13, 2003
