Fog Creek Software
Discussion Board




More thoughts on single threaded servers

Ori and anyone else who cares.  I've taken your advice and implemented a single threaded server.  I thought I was pretty bright and all.  I was convinced this would provide the ultimate server performance.  I wanted it to be THE server solution. Then I realized something bad... 

Page Faults happen....

If the server process page faults while running the one thread, I'm screwed.  The whole process stops while an I/O operation of unknown duration occurs.  In a server which made use of kernel threading it wouldn't be as big of a problem, since only the thread handling the connection when the page fault occurred would block.

NT is particularly bad.  There really is no way to control how the paging mechanism works.  If the server goes idle, NT slowly starts paging the process out to disk.  If traffic picks up suddenly, the server could behave very slowly while the image is paged back into memory.

I think what I really want is a non-preemptive or cooperative threading mechanism that task switches when page faults occur.  I know of no major server OS that provides such a mechanism.

Any thoughts?

christopher baus (www.baus.net)
Saturday, December 13, 2003

Use an API like SetProcessWorkingSetSize to reduce NT's paging of your process to disk?

Christopher Wells
Saturday, December 13, 2003

According to MSDN...

Using the SetProcessWorkingSetSize function to set an application's minimum and maximum working set sizes does not guarantee that the requested memory will be reserved, or that it will remain resident at all times. When the application is idle, or a low-memory situation causes a demand for memory, the operating system can reduce the application's working set. An application can use the VirtualLock function to lock ranges of the application's virtual address space in memory; however, that can potentially degrade the performance of the system.
....

Maybe if used in combination with VirtualLock this would work.  I'll have to look into it...
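For what it's worth, a minimal sketch of the locking idea (POSIX mlock shown, since Linux comes up later in the thread; VirtualLock plus SetProcessWorkingSetSize are the Windows analogues, and alloc_pinned is just an illustrative name, not an API):

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate a page-aligned recv buffer and try to pin it in physical RAM,
 * so a later recv() into it can't page-fault.  On Windows the analogous
 * calls are SetProcessWorkingSetSize + VirtualLock. */
void *alloc_pinned(size_t size, int *locked)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf = NULL;
    if (posix_memalign(&buf, (size_t)page, size) != 0)
        return NULL;
    memset(buf, 0, size);               /* fault the pages in once */
    *locked = (mlock(buf, size) == 0);  /* may fail under RLIMIT_MEMLOCK */
    return buf;
}
```

Note that mlock can fail if the locked-memory rlimit is low, which is why the result is reported rather than assumed.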

christopher baus (www.baus.net)
Saturday, December 13, 2003

NT never seems willing to "guarantee" much about performance, to applications ... but I found that if the system has plenty of memory, then increasing the process working set size reduces paging out (even when the application is idle), which for me alleviated the "If traffic picks up suddenly ..." problem.

Christopher Wells
Saturday, December 13, 2003

Maybe put in some low-CPU loop (maybe a socket connecting once a second and then closing, or a calculation of 1+1) into the main thread and run it in a loop; i.e., to the OS it will never seem idle

the artist formerly known as prince
Saturday, December 13, 2003

if (numConnection <= 2)
{
...Do Something
}

The Artist Again
Saturday, December 13, 2003

if (numConnection <= 2)
{
...Do Something
}

That would be a BadIdea w/out at least a sleep.  But then a sleep would ruin performance if there is only one thread.

I just wish I could be notified when a page fault is about to occur.  Then I could go do something else.  Problem is that would make the code look really weird.  Just about any line could cause a page fault. 

I'm starting to think that, under NT at least, using a single threaded server isn't optimal.  It seems like a great idea, but for this problem...

christopher baus (www.baus.net)
Saturday, December 13, 2003

I've probably missed something in a previous discussion; but I don't think this is a problem of single vs. multiple threads.

Windows (and presumably Linux) won't swap out your code if it is running. No matter how many threads are running through that code. Once your server is idle, you have only one thread blocking on an accept call -- whether the server is coded as single- or multi-threaded. Therefore, Windows feels free to swap you out of memory.

Of course, this is all relatively moot if some other process is getting hammered on the machine and is eating up available memory. In that case, your server WILL probably be swapped out regardless of how many threads you have running.

Jeff
Saturday, December 13, 2003

Jeff - I think the point is that when a single threaded server has a page fault the whole process is blocked by the IO.  When a multi-threaded server has a page fault, that thread is blocked by IO, but another thread can do some work in the meantime.

r1ch
Saturday, December 13, 2003

Two points:

1. I have witnessed idle processes on idle machines getting paged out.  i.e. the machine will page you out even if it does NOT need the memory.  Perhaps this is a result of poor tuning, perhaps the systems are creating a contingency for new processes. But it does happen.

2. Some systems implement threads in user-space.  If a page fault occurs, then the whole process is suspended to handle the fault, so multiple threads don't run.  I think FreeBSD 4.x is like this (and 5.x fixes this).  Beware writable pages shared between threads; if they get paged out then all threads can block on them.

David Jones
Saturday, December 13, 2003

Hello Christopher.

Yes, page faults could be a problem.  The quick-and-dirty solution is to make sure you access relevant pages continuously - e.g., once every second. But that's more of a band-aid than a real solution.
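The quick-and-dirty version of that band-aid is just a loop that reads one byte per page, run every second or so from the event loop (touch_pages and sink are illustrative names, not an API):

```c
#include <stddef.h>
#include <unistd.h>

/* Keep a buffer warm in the working set by reading one byte per page.
 * Run this, say, once a second; it's purely a band-aid - the read
 * faults any evicted page straight back in.  The volatile sink stops
 * the compiler from optimizing the reads away. */
volatile unsigned char sink;

void touch_pages(const unsigned char *buf, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    for (size_t i = 0; i < len; i += page)
        sink = buf[i];
    if (len)
        sink = buf[len - 1];    /* don't miss the tail of the last page */
}
```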

This problem isn't unique to single threaded servers, btw - it's the same for multithreaded servers. Less likely, but still happens.

The first thing I'd do to solve this is increase my working set - by actually continuously accessing relevant pages. If that's not enough, and you can run another process to take the load (without requiring much synchronization) then I'd do that - you'll get the benefit that you can also distribute over multiple machines.

If all else fails, what I do in these cases is set up special purpose threads that do a specific task; e.g., suppose what you run is a webserver, and there are requests that require computation over 200megabytes of memory which are likely to be swapped out. Set up a thread that waits for a message, does the calculation, and posts a message when complete.

The main thread, upon request, will post this message, and return to handling other connections. When the computation is complete, it will return the reply. If you need more concurrency, set up more threads like this. However, make sure that the initial "start processing" message delivered upfront contains everything needed about the job - or you would need all of the main thread to be instrumented with synchronization primitives.
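Sketched in C with pthreads (all names - job_t, run_job, the trivial squaring stand-in for the real 200MB computation - are illustrative): the worker blocks on a condition variable, the main thread posts a self-contained job, and the result comes back the same way.

```c
#include <pthread.h>

/* One dedicated compute thread, fed by message passing: the I/O thread
 * posts a job and keeps servicing connections; the worker wakes, runs
 * the long computation, and posts the result back. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int  has_job, has_result;
    long input, result;
} job_t;

static void *worker(void *arg)
{
    job_t *j = arg;
    pthread_mutex_lock(&j->lock);
    while (!j->has_job)
        pthread_cond_wait(&j->cond, &j->lock);
    long in = j->input;              /* copy the job up front */
    pthread_mutex_unlock(&j->lock);  /* no shared state during the work */

    long out = in * in;              /* stand-in for the heavy computation */

    pthread_mutex_lock(&j->lock);
    j->result = out;
    j->has_result = 1;
    pthread_cond_signal(&j->cond);
    pthread_mutex_unlock(&j->lock);
    return NULL;
}

long run_job(long input)
{
    job_t j = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
                0, 0, input, 0 };
    pthread_t t;
    pthread_create(&t, NULL, worker, &j);

    pthread_mutex_lock(&j.lock);
    j.has_job = 1;
    pthread_cond_signal(&j.cond);
    /* ...a real server would return to its event loop here... */
    while (!j.has_result)
        pthread_cond_wait(&j.cond, &j.lock);
    pthread_mutex_unlock(&j.lock);

    pthread_join(t, NULL);
    return j.result;
}
```

The point of copying the input out of the shared struct before unlocking is exactly the one above: the job message carries everything, so the worker never touches the main thread's state mid-computation.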

This, IMHO, is a proper way to use threads - the common way, of making all threads equivalent in their function, is what makes them so hard to use.

Ori Berger
Saturday, December 13, 2003

Problem is that as soon as I transfer memory from the TCP/IP stack into my process with recv (or similar), a page fault could happen.  This would be a really bad time for a page fault to occur.  I have assumed that I/O has finished asynchronously and I'm ready to process the data, then bam, page fault, all processing stops while an I/O operation of unknown length occurs. 

This probably isn't horrible if I do everything possible to lock buffers into physical RAM (with the previously stated tricks), but it really messes up my original assumptions.  And I'm not convinced a single threaded solution will always outperform a solution with kernel mode threads. 

christopher baus (www.baus.net)
Saturday, December 13, 2003

It could happen -- almost anything could happen -- but it shouldn't, at least not often. I have written some memory-hungry servers, and none had a problem with the recv() buffer swapping out.

Consider recv() into a local variable -- the probability that the stack will be paged out is much, much lower.

Also, if you're running on Windows, the best thing you can do is use WSARecv; It's asynchronous and won't ever block. It will alert your thread when the data has been received. If you're using Unix, use aio_read/aio_write stuff (sadly, the asynchronous stuff is not standard across platforms).

Is this _really_ a problem you observe in practice? What kind of server is that, if I may ask? What kind of data is being transferred (length, encoding, statistics, etc ...)?

Ori Berger
Sunday, December 14, 2003

This is NOT a problem in real world practice.  I think you are over-analyzing the situation. Put adequate RAM in your machine, you will be fine.

dwilliams
Sunday, December 14, 2003

In my opinion putting the recv buffer on the stack is a bad idea as well.  You wonder why so many programs have buffer overrun problems?  Here you go...

Programmer puts recv buffer on stack, and since the BSD Socket API only accepts raw buffers, there is no built in protection.  Now programmer operates on buffer, such as parsing the data.  Malformed data isn't properly checked and the program runs past the end of the buffer into the stack frame. 

Bad guys put code in the stack so when function returns the code is executed. 

Service Release 8125 is put out to fix the problem.

I prefer to allocate my recv buffers off the heap at startup, and pool them. 
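A minimal sketch of that pooling scheme (sizes and names are illustrative): allocate every buffer off the heap at startup, hand them out from a free list, and treat an empty pool as "refuse the connection".

```c
#include <stdlib.h>

/* A fixed pool of recv buffers allocated off the heap at startup. */
#define POOL_SIZE 4
#define BUF_SIZE  4096

typedef struct {
    char *slots[POOL_SIZE];
    int   free_top;              /* number of buffers still available */
} buf_pool_t;

int pool_init(buf_pool_t *p)
{
    p->free_top = 0;
    for (int i = 0; i < POOL_SIZE; i++) {
        p->slots[i] = malloc(BUF_SIZE);
        if (!p->slots[i])
            return -1;           /* startup failure: bail out early */
    }
    p->free_top = POOL_SIZE;
    return 0;
}

char *pool_get(buf_pool_t *p)    /* NULL => at capacity: refuse the connection */
{
    return p->free_top > 0 ? p->slots[--p->free_top] : NULL;
}

void pool_put(buf_pool_t *p, char *buf)
{
    p->slots[p->free_top++] = buf;
}
```

Because the pool is bounded, running out of buffers doubles as a natural connection limit, and every recv gets a known, fixed-size heap buffer rather than whatever happens to be on the stack.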

christopher baus (www.baus.net)
Monday, December 15, 2003

Suit yourself.

Most buffer overruns are exploitable, whether on the heap or the stack, although the stack is sometimes easier to exploit. A bug is a bug - don't delude yourself that you're doing anything inherently more secure.

You didn't tell us, though - is the buffer page-out a real problem observed with real world workload?

Ori Berger
Monday, December 15, 2003

Ori,

I'm going to put the server under load to see what happens, but my thesis still holds true.  There HAVE to be some page faults occurring, and I'm not totally sure yet what the exact implications are.  The problem with a single threaded server is that ANY page fault is really bad.  I don't think it is possible to ignore this.  The question is: is the time saved by not invoking context switches greater than the time lost by page faults?  I'm not sure what the answer is yet. 

Yes, buffer overruns are bad in general, but I think putting the recv buffer on the stack is a really bad practice, and it also assumes that the buffer scope doesn't exceed one function call, which isn't true in my application. 

It also assumes that the stack isn't paged out as often as the rest of memory, which I haven't seen documented anywhere.  Richter's book seems to imply that it is about the same as the heap.  On Linux I would expect it to be about the same. 

christopher baus (www.baus.net)
Monday, December 15, 2003

Page faults (and virtual memory) exhibit a "phase transition" phenomenon. In most environments, either you have enough physical memory and don't have any noticeable paging - or you don't have enough physical memory and you thrash all the time, often slowing down by a factor of 1000.

There is very little in-between, unless you've designed your system for that in-between behaviour (e.g., you make sure almost all memory access is made of long sequential bursts).

My reason to avoid threads is not context switches at all. It's the synchronization overhead (compared to which, context switches are cheap). And it's also the correctness - it's damn hard to trust code when everything's implicitly shared and may change at any time because locks weren't used properly (if you don't trust placing data on the stack for security reasons, surely you can't trust that all synchronization was correctly employed). Furthermore, race conditions are rarely reproducible.

Don't use single threaded async code to improve performance - it does get you that, if done right, but in 6 months the average 10% improvement you get from hardware would have bought you the same gain.

The reason (my reason, anyway) to use it is to create a robust, easy to debug server with reproducible behaviour.

Ori Berger
Tuesday, December 16, 2003

p.s. the local stack is rarely, if ever, paged out because you use it. If you allocate a 4k buffer on the local stack and recv() to it, you can be almost certain that the recv() won't pagefault (as long as you don't have a 1MB buffer allocated on the same stack which disconnects your 4k buffer from the routine's local variables).

Ori Berger
Tuesday, December 16, 2003

Unless, when you grow the stack by 4k, that memory must be paged in.  The very act of calling a function with a 4k buffer might cause a page fault.  This was one of my arguments as to why C++ exception handling is bogus.  There is no such thing as "nothrow" in my opinion. 

My goal is 1000 concurrent connections.  I would guess that with that many connections context switches could be a problem.  My application would require 2 threads per connection. 

My goals for the project are:

Correctness
Security
Scalability

In that order.

The page fault issue will not prevent me from using a single threaded model, but it is something to keep in mind.  It is my intention to lock the amount of memory required by the server under full load at startup.  Servers that degrade exponentially as the number of connections increases should limit the number of connections they allow.

~christopher

christopher baus (www.baus.net)
Tuesday, December 16, 2003

I've seen small 20k stacks paged out only on a machine under extreme load (600MB allocated and actively used, 128MB physical memory available). I was actually debugging a kernel driver that had to do with disk access, so I was well aware of what was paged.

As long as your code manages to run, it will touch the stack pages it uses. The recent 8k or so will not get out of the working set, and will not be paged out.

I agree that, theoretically, every memory access can cause a page fault. Practically, however, this doesn't happen.

And if it does happen (such as in the case I observed), multiple threads would have probably saved it from slowing down by a factor of 1000 as it did, to a factor of 300 or so. Not worth the complications, in my book.

Virtual memory is a band-aid. Often a very useful band-aid, but still a band-aid. May help for surface wounds, but not if your arm is being cut off. If there isn't enough physical memory to support the app, it won't run well; threads may, perhaps, reduce by 10% the physical memory required to give the same kind of responsiveness, but the problem remains.

(And no, I don't have hard numbers to back that 10% reduction in required physical memory. I can argue that threads actually increase physical memory consumption. But I won't, not in this thread anyway).

Ori Berger
Tuesday, December 16, 2003
