Debunking the Erlang and Haskell hype for servers

Category: Internet, Programming

Ever since Intel and AMD began selling multi-core CPUs, the Erlang hype has been growing continuously. High-profile projects using Erlang were loudly announced across the blogosphere as the coming of the second C.

We kept hearing about RabbitMQ, CouchDB, nearly all of Amazon’s AWS, Heroku’s routing grid, and Facebook chat all switching over to Erlang because it was the fast and concurrent language of the future. No other language could hold a candle to a language run by telecom giant Ericsson and then validated by Amazon and Facebook! Or could it? When I tried to find benchmarks for Erlang, they all showed otherwise.

http://muharem.wordpress.com/2007/07/31/erlang-vs-stackless-python-a-first-benchmark/

http://www.joeandmotorboat.com/2009/01/03/nginx-vs-yaws-vs-mochiweb-web-server-performance-deathmatch-part-2/

Since I initially bought into the hype, I felt compelled to test it myself to see if they made a mistake. To do this, I wrote a simple HTTP server in Erlang, Haskell, and Python that outputs the HTTP reply “Pong!”. Here are the results in a graph.

The green line is the maximum req/sec possible. Higher is better.

For more details, continue reading.

Update
See the comments for a faster Haskell implementation that puts it almost on par with the Erlang version.

Summary

A simple server was written in Python, Haskell, and Erlang. The server accepts any input from clients and outputs

HTTP/1.0 200 OK
Content-Length: 5
 
Pong!

and then disconnects. The servers were benchmarked with httperf compiled with an increased select limit of 65535 connections. In every test, 0 errors occurred.
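To make the wire format concrete, here is a minimal Python 3 sketch that splits the reply above into its status line, headers, and body. The helper name `parse_reply` is hypothetical and is not part of any of the benchmarked servers; it assumes the whole reply arrives in one buffer.

```python
# Hypothetical helper (not from the benchmarked servers): split the fixed
# reply into its parts to show what a client actually receives.
REPLY = b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!"


def parse_reply(raw):
    """Return (status_line, headers_dict, body) for a complete HTTP/1.0 reply."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    status_line = lines[0].decode("ascii")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        headers[name.strip().lower().decode("ascii")] = value.strip().decode("ascii")
    # Honour Content-Length: bytes beyond that count are not part of the body.
    length = int(headers.get("content-length", len(body)))
    return status_line, headers, body[:length]


if __name__ == "__main__":
    print(parse_reply(REPLY))
```

Because the helper honours Content-Length, any bytes a server sends after the five-byte body are ignored by it.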

Python

  • 1st place
  • A single process/thread using epoll sustained the most connections per second before hitting a CPU bottleneck.
  • This was highly unexpected since we are comparing it with Erlang and Haskell.

Erlang

  • 2nd place.
  • Having SMP/multicore enabled reduced requests/sec by a factor of 4!
  • Enabling kernel polling (epoll) had a negligible performance impact (less than ±1%).
  • Someone suggested enabling “active” receive mode which asynchronously puts received packets in the Erlang message queue. This made a negligible difference.

Haskell

  • 3rd place
  • This might be because it uses select instead of epoll. However, this did not make a difference for Erlang, so I suspect it would not for Haskell either.
  • The program was compiled with -O2 with modest performance gains. Compiling with -threaded for SMP support reduced performance by a factor of 2!

Conclusion

DO NOT WRITE A SERVER IN ERLANG JUST BECAUSE YOU HEARD ERLANG IS THE FASTEST AND MOST CONCURRENT LANGUAGE.

Erlang is not “made for multicore”. Erlang only just received SMP support in 2006!

Setup

2 x 4 core Xeon E5420 @ 2.50GHz

The following steps were done to lift the usual limits that prevent default installations of Linux from being able to hammer servers. Neglecting this step will artificially cap concurrent connections at 1024 while timeout errors will increase. This is an important step that many other benchmarks have left out, and it shows in the error rate. All tests here resulted in 0 errors.

Httperf also depends on select, so the select limit was increased to 65,535 file descriptors.

  • Edit /etc/security/limits.conf and add the lines "* hard nofile 65535" and "* soft nofile 65535" (reboot if ulimit -n does not change).
  • Edit /usr/include/bits/typesizes.h and change "#define __FD_SETSIZE 1024" to "#define __FD_SETSIZE 65535".
  • Compile httperf from source.
  • Increase the kernel file descriptor limit: sudo bash -c 'echo 128000 > /proc/sys/fs/file-max'
  • Increase the backlog: sudo sysctl -w net.core.netdev_max_backlog=60000
  • Increase the maximum connection limit: sudo sysctl -w net.core.somaxconn=250000
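Before running a benchmark it can help to confirm that the limits above actually took effect. The following is a minimal Python 3 sketch, assuming a Linux machine with /proc mounted; the function names `read_int` and `current_limits` are my own and purely illustrative.

```python
import resource


def read_int(path):
    """Return the integer stored in a /proc file, or None if it is unavailable."""
    try:
        with open(path) as f:
            return int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def current_limits():
    """Collect the per-process and kernel limits touched by the steps above."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return {
        "nofile_soft": soft,
        "nofile_hard": hard,
        "fs.file-max": read_int("/proc/sys/fs/file-max"),
        "net.core.somaxconn": read_int("/proc/sys/net/core/somaxconn"),
        "net.core.netdev_max_backlog": read_int("/proc/sys/net/core/netdev_max_backlog"),
    }


if __name__ == "__main__":
    for name, value in sorted(current_limits().items()):
        print("%-28s %s" % (name, value))
```

If nofile_soft still reports 1024, the limits.conf change has not been picked up by the current session.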

Software

Ubuntu 9.04 x64

Erlang: BEAM 5.6.5

Haskell: GHC 6.10.4

Python: CPython 2.6.4

Httperf: 0.9.0

Httperf was run with the following settings, where port and rate were adjusted accordingly:

httperf --port=8000 --num-conns=40000 --rate=5000
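Sweeping the rate by hand gets tedious, so here is a hedged Python 3 sketch of how such a sweep could be scripted. It is not the script used for the graph. The `--server` flag, the default of localhost, and the rate range in the usage section are assumptions for illustration; `--port`, `--num-conns`, and `--rate` are the flags shown above.

```python
import subprocess


def httperf_args(port, rate, num_conns=40000, server="localhost"):
    """Build the argument list for one httperf run at a fixed connection rate."""
    return [
        "httperf",
        "--server=%s" % server,   # assumption: server under test runs locally
        "--port=%d" % port,
        "--num-conns=%d" % num_conns,
        "--rate=%d" % rate,
    ]


def sweep(port, rates):
    """Run httperf once per rate and yield (rate, raw_report_text)."""
    for rate in rates:
        output = subprocess.check_output(httperf_args(port, rate))
        yield rate, output.decode("utf-8", "replace")


if __name__ == "__main__":
    # Hypothetical rate range; adjust it to bracket the server's saturation point.
    for rate, report in sweep(8000, range(1000, 16000, 1000)):
        print(rate, [l for l in report.splitlines() if l.startswith("Request rate")])
```

Each run uses the same number of connections, so only the offered rate changes between data points.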

Erlang active mode

-module(echo).
-export([listen/1]).
 
-define(TCP_OPTIONS, [binary, {packet, 0}, {active, true}, {reuseaddr, true}, {backlog, 60000}]).
 
% Call echo:listen(Port) to start the service.
listen(Port) ->
    {ok, LSocket} = gen_tcp:listen(Port, ?TCP_OPTIONS),
    spawn(fun() -> accept(LSocket) end).
 
% Wait for incoming connections and spawn the reply loop when we get one.
accept(LSocket) ->
    {ok, Socket} = gen_tcp:accept(LSocket),
    Pid = spawn(fun() -> loop(Socket) end),
    gen_tcp:controlling_process(Socket, Pid),
    accept(LSocket).
 
% Answer the first data received on Socket with the fixed reply, then close.
loop(Socket) ->
    receive
        {tcp, Socket, Data} ->
            gen_tcp:send(Socket, "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n"),
            gen_tcp:close(Socket);
        {error, eaddrinuse} ->
            done
    end.

Haskell

import IO
import Control.Exception hiding (catch)
import Control.Concurrent
import Network
import System.Posix
 
main = withSocketsDo (installHandler sigPIPE Ignore Nothing >> main')
main' = listenOn (PortNumber 9900) >>= acceptConnections
 
acceptConnections sock = do
        conn@(h,host,port) <- accept sock
        forkIO $ catch (talk conn `finally` hClose h) (\e -> print e)
        acceptConnections sock
 
talk conn@(h,_,_) = hPutStrLn h "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n" >>
                    hFlush h >>
                    hClose h

Python

import select
import socket
 
EPOLLIN = select.EPOLLIN
EPOLLOUT = select.EPOLLOUT
 
epoll = select.epoll(60000)
connections = {}
 
class Server(object):
    def __init__(self):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.setblocking(0)
        sock.bind(('', 8050))
        sock.listen(60000)
        self.socket = sock
 
        fileno = sock.fileno()
        connections[fileno] = self
        epoll.register(fileno, EPOLLIN)
 
    def onInput(self):
        sock, address = self.socket.accept()
        Client(sock)
 
class Client(object):
    input  = ''
    output = "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n"
 
    def __init__(self, sock):
        sock.setblocking(0)
        fileno = sock.fileno()
        epoll.register(fileno, EPOLLIN|EPOLLOUT)
        connections[fileno] = self
        self.socket = sock
 
    def onInput(self):
        newdata = self.socket.recv(1024)
        if len(newdata) == 0:
            # The peer closed the connection; nothing more to read.
            self.close()
            return
        self.input += newdata

    def onOutput(self):
        sent = self.socket.send(self.output)
        self.output = self.output[sent:]
        if len(self.output) == 0:
            self.close()
 
    def close(self):
        fileno = self.socket.fileno()
        del connections[fileno]
        epoll.unregister(fileno)
        self.socket.close()
 
Server()
 
while 1:
    for fd, event in epoll.poll():
        if event & EPOLLIN:
            connections[fd].onInput()

        if event & EPOLLOUT:
            connections[fd].onOutput()

Source Code

Erlang version 1 (active mode): hello.erl

Erlang version 2 (passive mode): hello2.erl

Python: hello.py

Haskell: hello.hs



116 Comments

  1. Don Stewart says:

    Judging by the length of the Python example (look how low level it is!!) this kind of benchmark is highly dependent on the quality of the code in each language.

    For example, in Haskell you’re doing slow String IO, instead of bytestring IO, and not using the new epoll library: http://www.serpentine.com/blog/2009/12/17/making-ghcs-io-manager-more-scalable/

    Here’s a simple improvement to make the Haskell not entirely naive: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=16221#a16221

    On my machine, I can get 10.2k conn/sec, while your example only does 6k conn/sec.

    • admin says:

      The Python version is longer because lightweight threading isn’t built into the core VM, unlike in Haskell and Erlang. I implemented the multiplexing in pure Python, which should by all means be a slower language than Haskell or Erlang.

      I could have implemented this in asynchat, and it would have hidden away much of the length as you can see here.

      I tried to get your example to work, but it said “Could not find module `Network.Socket.ByteString’” even after I did cabal install Network. I also couldn’t find out where to download the epoll library in that link. I welcome any additions or improvements to these examples.

      • network-bytestring is the package you’re looking for (you can find’em using Hoogle or Hayoo)

        • admin says:

          Thanks it started working, but I ran into another problem.

          hello2.hs:18:25:
          Couldn't match expected type `ByteString'
          against inferred type `[Char]'
          In the second argument of `sendAll', namely `msg'
          In a stmt of a 'do' expression: sendAll c msg
          In the expression:
          do sendAll c msg
          sClose c

          Edit
          I found out the missing code to get it to work:

          import qualified Data.ByteString.Char8 as B
          msg = B.pack "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n"

          Benching results with -O2 -threaded:
          rate req/s
          9000 8998.9
          10000 9806.4
          11000 7460.2

          • Don Stewart says:

            Will you update the graphs to use the bytestring version?

          • admin says:

            @Don I’m not sure it would be fair to Erlang to do so.

            First of all, this is a package that is not bundled with the standard distribution and is even labeled experimental.

            And second, you have to convert all your strings by packing them, which is an O(n) operation.

          • Don Stewart says:

            That’s why I used {-# LANGUAGE OverloadedStrings #-}, which is the idiomatic way to use bytestring literals. Did you remove that from your version? This is idiomatic Haskell, so I don’t think it is fair not to use it.

          • Don Stewart says:

            BTW, bytestring isn’t “experimental” in that sense. It is in the Haskell Platform: http://hackage.haskell.org/platform/contents.html which is the only standard Haskell distribution.

          • admin says:

            @Don, if I am reading the manual correctly, all that Overloaded string does is to make the conversion from string to bytestring implicit. Behind the scenes, it still has to convert with an O(n) cost.

            And while bytestring is included in that distro, the package network-bytestring does not appear to be there.

          • Don Stewart says:

            Indeed, there is a conversion, but it happens at compile time.

            It’s fine if you’re not going to redo the benchmarks, but it’s a bit silly to be deciding what is and isn’t idiomatic Haskell. Using String for a high performance server isn’t idiomatic, for example, so the result is meaningless.

    • Don Stewart says:

      Here’s a simple example in Haskell using the new epoll scalable IO library, available at http://github.com/tibbe/event

      The code is here, and is little changed from the naive code: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=16253#a16253

      While your original example reaches 6k conn/sec on my measurements, the epoll-based version reached 15.1k conn/sec.

      I’ve summarised the three Haskell versions: String-based concurrent, bytestring concurrent and epoll bytestring on a wiki page here, along with corresponding measurements: http://haskell.org/haskellwiki/Simple_Servers

      I think the conclusion is that a bit of custom epoll code pushes the work into the kernel, and you’ll get much the same performance in *any* language when the program is not compute bound.

      I’d be interested to see if you get similar results to mine for the Haskell epoll version.

      • Don Stewart says:

        On my machine, your Python epoll version achieves 10.1k req/sec, while the Haskell epoll version above achieves 15k req/sec.

        • admin says:

          I just tested my Python again, and it still bottlenecks at > 12k/sec. I noticed that you have num-conns at 10,000. This number is a bit low and prone to measurement errors, especially at a rate of 10,000+. I used num-conns 40,000.

          • Don Stewart says:

            I get the same results for Haskell epoll and Python epoll with --num-conns=40000:

            * Haskell epoll, Request rate: 14683.7 req/s
            * Python epoll, Request rate: 10103.3 req/s

            Linux 2.6.31-ARCH x86_64, ghc 6.10.4

          • admin says:

            Did you do the kernel tweaks?

    • Don Stewart says:

      Following up, Bryan and Johan have published a paper on replacing the GHC IO manager with epoll:

      http://www.serpentine.com/bos/files/ghc-event-manager.pdf

      In particular, look at the benchmarks on page 5, where they evaluate performance of HTTP servers in Haskell, getting results comparable to my epoll example below, and far more realistic than your naive example above.

      • admin says:

        Well that’s good news now that it’s finally been implemented.

        But you are in no position to say it is naive because:

        1. Haskell didn’t have a functional epoll manager at the time of the post.
        2. This paper you show also uses a “pong” test which is no more realistic than mine.
        3. They are using a newer and faster processor.

        You really have to stop trying so hard to spread Haskell propaganda.

        • Don Stewart says:

          It’s naive to use Strings in a network benchmark for Haskell, instead of bytestring, when on the other hand you’re hand coding low-level epoll manipulation. If you’re not seeking to compare similar things, you should just say so.

          • admin says:

            Don, we’ve been over this before, and this is the last time I am going to say it.

            When bytestring stops being experimental and epoll on Haskell is actually part of the standard distribution, this will still remain a valid comparison.

            People using epoll on Python and Erlang have every expectation that it is part of the default distribution and is stable. Anyone on the Internet can write a fast binding in C for their favorite language.

  2. k.pierre.k says:

    So you’re basically comparing non-preemptive select vs. multiple preemptive processes, not python and erlang/haskell runtimes. Very funny, indeed.

    Now try this: for every input, you should talk to a database. Or compose a heavy webpage. Or you have a pretty slow client. Your python solution would block everyone while it processes data for one client (e.g. waits for db), while solutions in haskell and erlang would not. That’s the whole point. In python/ruby/… the real solution would consist of something like multiple single-threaded interpreter processes balanced by an external server. Which is far more ugly and slow.

    For comparison to be fair, you should spawn a preemptive thread (os one in python?) for each client or write your own scheduler for python (i doubt it’s possible)

    • admin says:

      Congratulations for being the first person to blatantly skim through the details. I knew it wouldn’t take very long. I tested SMP performance, which should have blown a single thread out of the water on an 8 core system.

      Since it decreased performance, I decided to turn off SMP for both Haskell and Erlang. Even after doing this, they are still not as fast as Python which is considered to be a slower language.

      “Your python solution would block everyone while it processes data for one client (e.g. waits for db), while solutions in haskell and erlang would not.”

      That is incorrect. You can use a thread in Python to call C modules which may release the global lock. Erlang and Haskell are not devoid of blocking function calls as well. It depends on the implementation.
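As a small illustration of the point about threads and the global lock, here is a minimal Python 3 sketch. It uses `time.sleep` as a stand-in for a blocking call into C (a database driver, for example); that stand-in is an assumption chosen because `time.sleep` is known to release the global interpreter lock while it waits. The function name `run_blocking_calls` is hypothetical.

```python
import threading
import time


def run_blocking_calls(n_threads, seconds):
    """Start n_threads that each block in time.sleep; return the wall time."""
    threads = [threading.Thread(target=time.sleep, args=(seconds,))
               for _ in range(n_threads)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start


if __name__ == "__main__":
    elapsed = run_blocking_calls(4, 0.2)
    # If the blocking calls overlapped, this is close to 0.2s rather than 0.8s.
    print("4 threads x 0.2s of blocking took %.2fs of wall time" % elapsed)
```

This only shows that blocking calls which release the lock can overlap; it says nothing about CPU-bound Python code, which still runs one thread at a time.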

      • k.pierre.k says:

        > I tested SMP performance, which should have blown a single thread out of the water on an 8 core system.

        This is a common misconception, SMP does not always help. Especially when doing various no-ops.

        > Since it decreased performance, I decided to turn off SMP for both Haskell and Erlang.

        Turning off SMP does not turn off preemption in haskell and erlang. Their runtime performs preemption without any OS help (python can’t do that)

        > That is incorrect. You can use a thread in Python to call C modules which may release the global lock.

        “You can use a thread” — of course, you mean OS thread? Well, try to do something meaningful with 10000+ of these.

        > Erlang and Haskell are not devoid of blocking function calls as well.

        They are not, but i’m not talking about blocking system/function calls there. In your python example you block with every computation, might it use any syscalls or not, but in haskell i’m sure you could compute something like pi digits in non-blocking way.

        Your comparison is incorrect because haskell is preemptive here (without any use of OS threads! — it is OS threads that are called ‘SMP’ in haskell) and python is not.

        • admin says:

          You are severely overestimating the overhead of a context switch for the lightweight threads used for Haskell and Erlang.

          The cost of preempting every X opcodes in Haskell and in Erlang is negligible. It is only when you use real OS threads and processes that context switching carries a large cost.

          If you want to compare this with Python’s implementation of stackless Python, the context switching overhead is merely the cost of 1 Python function call. This is not enough to cover the discrepancy in speed outlined in the graph here.

          • k.pierre.k says:

            > You are severely overestimating the overhead of a context switch for the lightweight threads used for Haskell and Erlang.

            You are severely overestimating your (and everyone’s) knowledge of what’s happening there. For example, your cpu might lose its cache while preempting etc. When doing no-ops, everything is very slow in comparison to no-ops. When doing real work it is not, so you should do some benchmark with real work first and _then_ say that “preemption is negligible in this benchmark”.

          • admin says:

            @k.pierre.k

            “You are severely overestimating your (and everyone’s) knowledge of what’s happening there. When doing no-ops, everything is very slow in comparison to no-ops.”

            I seriously doubt you have a clue what you are talking about. The cache loss in a green thread should be equivalent to implementing your own multiplexer as I have in Python. This is common knowledge to anyone who knows something about operating systems and has seen the benchmarks of Stackless Python.

            Right now it sounds like you simply read about preemption on Wikipedia and are trying to apply it to every situation that has that word.

  3. Ulf Wiger says:

    I assume that by “hype”, you mean the number of blogs written by people who just picked up the language or read stuff about it on the web? You certainly cannot accuse Ericsson of hyping Erlang as a product, as they do not market it at all.

    Haskell is arguably the most powerful programming language on earth. Speed has never been a priority, yet it is surprisingly fast. Erlang was designed for writing complex telecom systems with near-zero downtime. Speed was not a priority, as long as it is “fast enough”.

    When people say that Erlang was “designed for multicore”, it is obviously strictly untrue, as Erlang pre-dates multicore by almost two decades. There was, however, a working SMP prototype for Erlang in 1997, which showed how Erlang fits naturally with SMP. It was not made into a product, since the commercial systems using Erlang were embedded systems with neither sufficient space nor power budget to house the SMP boards available at the time.

    Erlang has never done that well in microbenchmarks. The advice has always been to consider your total requirements and try to find experience reports from people who’ve written real products with similar requirements, or write representative prototypes and measure.

    In your particular benchmark, I imagine that the dramatic drop is due to SYN flood protection, and the python program fares best because it’s picking off connections in a tight loop. Also, the fact that enabling multicore reduces performance for both Erlang and Haskell suggests that what your benchmark is measuring is the ability to peel connections off the socket(s) as fast as possible, in which case disabling SMP and running a tight loop with blocking semantics would be the fastest option by far.

    • admin says:

      “In your particular benchmark, I imagine that the dramatic drop is due to SYN flood protection”

      My machine has no iptables settings that limit this. The kernel backlog limit was enlarged to 60,000 as I explained in detail, which means that it is impossible for this to be due to disconnection of half-open sockets when you keep in mind that the total connections is 40,000. I know this is a common flaw for many benchmarks online which is why I specifically mentioned it above.

      HTTPERF also indicated 0 errors for all tests. The dropoff you are seeing is due to the saturation in CPU usage, not any connection problems.

      “running a tight loop with blocking semantics would be the fastest option by far.”

      I did bench Erlang and Haskell with SMP disabled, and the Python loop is not blocking at all. I am quite certain that Erlang and Haskell also utilize nonblocking sockets since forkIO does not dedicate an OS-thread, and Erlang’s active mode asynchronously receives messages in a queue.

      “Erlang has never done that well in microbenchmarks.”

      The problem here is that if Erlang should be fast at anything, it should be fast at being a server and handing out bytes. After all, that is what it is made for, isn’t it?

      I agree with you that the code-swapping is good, but is it really necessary these days when distributed systems are spread through multiple machines that can be turned off at will? A couple of decades ago, a company like Ericsson had a handful of computers. Now companies have tens of thousands in a single datacenter.

      • Ulf Wiger says:

        The normal backlog setting does not affect the SYN flood protection in the IP stack (as far as I’m aware, this is not an iptables issue).
        See e.g. http://www.erlang.org/cgi-bin/ezmlm-cgi/4/43624 for some more detail on this. The drop-off in your graphs indicates that there is more going on than simply CPU saturation, which by itself doesn’t tend to give such dramatic performance degradation.

        And no, this is not what Erlang was made for. Erlang was made for control system logic, where you do have lots of communication over sockets, but most importantly, there is coordination between lots of different “actors” – resource reservation, bandwidth regulation, billing data reporting, etc.
        Most of the time, these systems are not “central office applications”, but spread out all over the place, and with fairly stringent footprint and power consumption requirements.

        And what matters most in these systems is not that they are as fast as can be, but that they behave robustly and are easy to evolve and maintain. They need to be “fast enough”, and Erlang has proven to provide sufficient performance to satisfy this requirement. From what I’ve seen of Haskell and Python benchmarks, so would they. The question then is if they meet the other requirements. The answer will vary from domain to domain, just as it does for Erlang.

        One of the things that tends to complicate matters a lot is when you start thinking about fault tolerance, and “recovery units”. In Erlang, for a wide range of errors, the only affected part of the system is the session where the error occurred. For a fairly ambitious study of Erlang’s performance in more realistic setting, read e.g. http://www.macs.hw.ac.uk/~dsg/telecoms/publications/erlang03.pdf

        • admin says:

          Quote from your link:

          “My experiments with this is that there is a very close relation between
          the backlog and how many connections / second you can handle.”

          I increased the backlog limit to 60,000 as I mentioned above, so there is no syn flood protection dropping connections. And even if there was, httperf would have detected it as a timeout error.

          • Michael says:

            It’s funny how you keep misrepresenting what Erlang is supposed to be good at, and then ignoring the rebuttals which clarify things, both in this thread, your article, and throughout the comments section.

          • admin says:

            @Michael

            Why don’t you elaborate then? I have responded to every supposed flaw, while the detractors here keep ignoring them and repeating the same thing over and over again.

            Here is a list of nonsensical arguments that I have debunked that you all keep repeating:

            - Preemption at the VM level of Erlang and Haskell should be as expensive as a true OS thread switch:

            Wrong, it should only cost maybe 2-3 C level function calls, with most of the registers remaining unchanged. When you compare this to an implementation in Python, there is absolutely no reason for these languages to be slower.

            -This benchmark doesn’t do anything like query databases and do expensive computations:

            The point of this benchmark was to test the language implementation of multiplexing. What are you supposed to do in a benchmark? You remove variables, you don’t add them.

            - The graph is inaccurate because of syn protection etc…:

            Wrong. I’ve tweaked the kernel to support more connections than I test with.

            - The Python implementation is stateless or it isn’t a scheduler:

            Wrong, the Python implementation is stateful. Every socket is attached to a Python object, and you can attach arbitrary state to it. And the accusations about not being a “real” scheduler? How do you think a scheduler is implemented? Hint: it looks like the Python script.

            - Erlang wasn’t made for fast servers:

            Ok, I’ll take your word for it. But it still doesn’t change the public perception. And if it’s not a suitable language for servers, then what is it good for?

  4. augustss says:

    Since your benchmark does no significant computation a “slow language” versus a “fast language” makes very little difference. You are benchmarking some limited aspects of the runtime system. That’s not a bad thing, but it’s good to know where the time is spent.

  5. I gave this some thought and here is my hypothesis:

    It is clear that the Python example is different from the Haskell and Erlang examples. In the Python example, we run rounds off of epoll(), whereas the other systems use a lightweight userland scheduler. In the Python code, each epoll round accepts any new connections via the onInput() dispatch in the server. Haskell and Erlang will only do this whenever the server process is scheduled to run on a core. The chance that the server process is run will dwindle when we have many processes, in particular when we accept a lot of processes and can’t complete all of them in their time slot. In contrast, no such thing will happen in the Python process. It will clog when the acceptor queue fills up, and that happens when it can’t complete epoll rounds fast enough.

    This hypothesis can be tested by measuring the number of active processes in the Erlang and Haskell runtimes compared to when the server process runs. Haskell and Erlang die due to the scheduler.

    A knob worth trying to play with is the Erlang +A option. It controls async threads for IO use. Even a low setting like 32 should experience a speedup if there is anything to win here.

    Another idea is to run the scheduler in Erlang and Haskell in a transposed fashion like epoll() does in Python. That is, we spawn a fairly small number of processes. Each process basically runs a forever loop in which it, in turn, 1) accepts and 2) communicates. It may not help much though.
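The shape of that "small fixed pool, each worker accepts then communicates" idea can be sketched in Python 3 with OS threads. This is only an illustration of the structure, written in Python because that is the one language all three sides of this thread can read. It says nothing about how an Erlang or Haskell version would perform, and the names `worker`, `start_pool`, and `fetch` are hypothetical.

```python
import socket
import threading

REPLY = b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!"


def worker(listener):
    """Forever: 1) accept one connection, 2) communicate, then close it."""
    while True:
        try:
            conn, _ = listener.accept()
        except OSError:
            return                # listener was closed; stop this worker
        try:
            conn.recv(1024)       # read (and ignore) the request
            conn.sendall(REPLY)
        except OSError:
            pass                  # client went away; nothing to do
        finally:
            conn.close()


def start_pool(port=0, n_workers=4, backlog=1024):
    """Bind a listening socket and start n_workers threads that share it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", port))
    listener.listen(backlog)
    for _ in range(n_workers):
        t = threading.Thread(target=worker, args=(listener,))
        t.daemon = True           # do not keep the interpreter alive
        t.start()
    return listener


def fetch(port):
    """Tiny client: send a request and read until the server disconnects."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as c:
        c.sendall(b"GET / HTTP/1.0\r\n\r\n")
        chunks = []
        while True:
            chunk = c.recv(1024)
            if not chunk:
                return b"".join(chunks)
            chunks.append(chunk)


if __name__ == "__main__":
    port = start_pool().getsockname()[1]   # port=0 lets the OS pick a free port
    print(fetch(port))
```

The number of workers bounds how many connections are in flight at once, which is the trade-off the hypothesis above is probing.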

    • Ulf Wiger says:

      I don’t think the +A option will make any difference in this case. The inet driver doesn’t make use of the asynch thread pool (the driver has to do this explicitly, and e.g. the efile driver does).

      Personally, I think it’s pretty impressive that Erlang is able to keep up as far as to 10K requests/sec, considering that for each incoming, it (1) spawns a process, (2) hands over control of the socket to the new process, (3) schedules it for execution, (4) the new process sends a reply when it gets a timeslice, and (5) the process is terminated and memory recovered.

      …all within 100 usecs of effective cpu time.

      For this tiny benchmark, Erlang’s acrobatics are clearly overkill, and there really is no way in Erlang to cut down on the overhead much. This is due to the assumption that, for all meaningful applications where Erlang is a suitable choice, this is the bare minimum of what you need.

      • admin says:

        “For this tiny benchmark, Erlang’s acrobatics are clearly overkill, and there really is no way in Erlang to cut down on the overhead much.”

        Erlang “processes” are supposed to be lightweight. It is not a true OS thread switch. Handing over control of the socket to a new process is merely writing a (process, fileno) tuple into a hash table or tree. Scheduling of execution should then put the “pid” at the end of the scheduler’s linked list if it needs more cpu, or put it into the list for select or epoll to poll. None of this should be cpu intensive.

        I am effectively doing the same thing that Erlang does in this Python script by creating a new Python object which costs at least 280 + 64 bytes not even including the member variables every time a client comes in. The socket is handed over by saving it as a member variable and then saving it to the global fd->client hash table where it is scheduled to be checked on upon connection. When disconnected, the client object removes itself from the global mapping, and is appropriately garbage collected.

    • admin says:

      If Erlang and Haskell run round robin schedulers, then it should have the same logic as the Python script.

      There is 1 acceptor “thread” and X client “threads”. With each call of epoll (which is essentially the python scheduler here), one accept is made per loop, and in each loop the number of client “threads” accumulates.

      If Erlang/Haskell is dedicating all the cpu to accepting connections instead of serving clients, this is indicative of a bad scheduler. A good scheduler would round robin through the tasks.
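For readers who have not seen one, a round-robin scheduler over cooperative tasks can be sketched in a few lines of Python 3 using generators. This is a toy model with simulated work, not the benchmark code and not how the Erlang or Haskell runtimes are implemented; the names `acceptor`, `client`, and `run_round_robin` are hypothetical.

```python
from collections import deque


def acceptor(n_clients, spawn, log):
    """Pretend to accept one client per time slice, then give up the CPU."""
    for i in range(n_clients):
        log.append("accept %d" % i)
        spawn(client(i, log))
        yield


def client(i, log):
    """Pretend to serve one client in two steps: read, then reply."""
    log.append("read %d" % i)
    yield
    log.append("reply %d" % i)


def run_round_robin(n_clients):
    """Run the acceptor and its clients to completion; return the event log."""
    log = []
    ready = deque()
    ready.append(acceptor(n_clients, ready.append, log))
    while ready:
        task = ready.popleft()
        try:
            next(task)            # run the task until its next yield
        except StopIteration:
            continue              # task finished; drop it
        ready.append(task)        # still has work; back of the queue
    return log


if __name__ == "__main__":
    for event in run_round_robin(2):
        print(event)
```

With two clients the log interleaves as accept 0, read 0, accept 1, reply 0, read 1, reply 1: the acceptor gets one slice per pass through the queue, so it neither starves nor monopolises the CPU.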

      • You are exactly where I am: in Python, you are essentially measuring how fast epoll is. In Haskell and Erlang you are measuring how fast select/epoll is with the added work of running the scheduler on top of that. It should come as no surprise that Python is faster in this case. “Green” threads may be cheap, but they are certainly not free.

        One thing makes me wonder however: why are the Erlang and Haskell programs not receiving up to 1024 bytes as is the case for Python?

        • Don Stewart says:

          So this really is “debunking user land schedulers versus custom epoll” for a benchmark :-)

        • admin says:

          The problem here is that if Erlang and Haskell were sanely implemented, they should be blocking on epoll or select when the threads were waiting on IO.

          Otherwise, what you are suggesting is that Erlang and Haskell are wasting cycles on busy loops while blocking on IO.

          And no, I am not simply measuring how fast epoll is. I am using epoll to trigger my own scheduler written in pure Python which should by all means be slower than Erlang or Haskell.

          • Don Stewart says:

            It is an interesting result: a custom user land scheduler in Python can beat general purpose preemptive threads in Erlang or Haskell, for this particular benchmark.

            And in general, I would be fairly confident custom schedulers in each language will beat general purpose schedulers in each language.

            I’d still love to see a custom scheduler in Erlang and Haskell as well, alongside, e.g. using the Haskell epoll user events library, http://github.com/tibbe/event/blob/master/docs/design.md

          • Ulf Wiger says:

            >The problem here is that if Erlang and Haskell were sanely implemented

            You should at least try to consider the possibility that there is some sanity in the designs of the Haskell and Erlang runtimes, and that your small benchmarks may not hold the entire truth about performance in concurrency-related applications.

            Having been fairly closely involved with the last 13 years of tuning Erlang for massively concurrent commercial systems, I can safely say that your benchmark ignores several things that are real issues in products of any significant complexity.

            You choose not to believe this. Fine. You are not the first to be supremely confident that your small prototype beats mature frameworks, and that the things you’ve omitted will have no effect on the outcome, once added. Verifying your assumptions in a full-scale project will be a good learning experience, if nothing else.

            Depending on the problem you’re trying to solve, your approach may indeed give better performance (and be sufficiently powerful). In order to draw more general conclusions, you need to do more work.

  6. Jim Roepcke says:

    You’re surprised a single thread on a single core outperformed SMP and multicore implementations for an IO-bound problem? This lesson is taught in every undergraduate operating systems, concurrency and networking course.

    Modify your examples to handle hundreds of thousands of processes, all seamlessly distributed over multiple nodes, gracefully handling errors between nodes… that is trivial with Erlang, and will continue to perform very well. How much code is it going to take to add those capabilities to your super-fast single-threaded utterly not fault-tolerant Python program? Lots. How fast is it going to be once you start throwing in locks to work around shared memory concurrency? Not… very.

    The conclusion of this web server shootout is about as useful as racing a jetski against a tugboat, declaring the jetskis to be faster since the jetski won the race, and then deciding to pull barges into the harbour with jetskis from now on since they’re so much faster.

    • admin says:

      Please actually read the entire article. I disabled SMP support.

      And no, it is not an IO bound problem. The bottleneck was the CPU while IO measured at below 1 megabyte per second.

      • Jim Roepcke says:

        The fact you disabled SMP support has nothing to do with it. I did read the article… several times. It starts:

        “Ever since Intel and AMD have been selling multi-core cpus, the Erlang hype has been growing continuously.”

        And then you “debunk” the concurrency Erlang hype by showing us a python program that runs on a SINGLE core and does not use ANY concurrency constructs like threads, processes, fibres, coroutines, (etc), WHATSOEVER.

        Let me state that again, because it really is stunning… you assert Erlang’s hype for multi-core support is undeserved because a *non*-concurrent, single-core Python program ran something a little bit faster than a *concurrent* Erlang program running on a single core. *claps*

        As I said, you’ve demonstrated what every undergrad in CS learns, that a single thread of execution will perform better on a single core than multiple threads will. This is well understood!

        The Erlang program you posted spawns a new process for every accepted connection. Yes, Erlang processes are very lightweight, but they are not a NOOP. The Python program does not spawn a new process, or thread, or any other concurrency construct. The Python and Erlang programs consume bytes over the network at the same rate. The difference in performance is found with what the programs are doing during and between the connections. The Python program does nothing, while the Erlang program is pre-emptively scheduling thousands of processes to handle the incoming connections.

        Again, there is no surprise in your results, and it certainly does nothing to dampen the enthusiasm for a language and runtime built for concurrency, distribution and fault-tolerance running on distributed clusters of multi-core systems.

        • admin says:

          “And then you “debunk” the concurrency Erlang hype by showing us a python program that runs on a SINGLE core and does not use ANY concurrency constructs like threads, processes, fibres, coroutines, (etc), WHATSOEVER.”

          If you don’t think a webserver shows any concurrency whatsoever, I don’t know what to tell you. Just because you can see the scheduler in Python doesn’t mean that it is magically non-concurrent.

          “that a single thread of execution will perform better on a single core than multiple threads are. This is well understood!”

          And you are simply incorrect when the bottleneck is the cpu. I just added a fork on the python version and it increased req/sec by 1,000. So what do you say to that?

          “while the Erlang program is pre-emptively scheduling thousands of processes to handle the incoming connections.”

          You need to brush up on your Erlang skills because that is not what my Erlang script does.

          • Jim Roepcke says:

            In your Erlang accept/1: “Pid = spawn(fun() -> loop(Socket) end)”

            accept/1 spawns a process for each connection. BEAM schedules the execution of the processes running for each connection.

          • admin says:

            @Jim

            Erlang is synchronously creating a worker for each client as they come in. It is not by any means pre-emptively creating these threads before the clients come in, and therefore no busy waiting is done by Erlang.

          • Jim Roepcke says:

            I’m not sure what part of spawn creating new Erlang processes in accept you aren’t willing to recognize and admit here. I’m not suggesting busy waiting occurs, I’m suggesting BEAM context switches between the processes and that takes cycles.

            The Erlang program uses its VM’s concurrency mechanism; the Python program uses only epoll and thus only the OS’ asynchronous I/O mechanism. No Python language or VM features for concurrency are at play, only kernel-level event handling. This is not an apples-to-apples comparison.

          • admin says:

            Haskell and Erlang also use “kernel-level event handling”. They use select and epoll.

            Do you see the multiplexing part that is calling methods on the client objects? This is concurrency whether you think so or not.

          • Jim Roepcke says:

            It is vastly less resource-intensive (by design) to do epoll-based multiplexing than pre-emptive scheduling of hundreds or thousands of concurrent Erlang processes each doing IO.

            Your comparison of epoll vs. select(or epoll (or whatever))+spawn is not even close to reasonable, and it absolutely has NOTHING to do with Erlang’s suitability in a multi-core environment, because your argument is based on Python code that does not utilize multiple cores, never mind multiple nodes, which Erlang is designed to support as transparently as it does a single core on a single node.

            Write a Python server that, for each connection, spawns a thread or whatever lighter-weight alternative you prefer to be competitive with Erlang’s lightweight processes, ensure those processes run evenly on all (or one less to avoid the ‘last core parallel slowdown’ problem on Linux) available cores, and then you can start comparing Python to Erlang and evaluating Erlang’s suitability for multi-core development.

            You do everyone a disservice by suggesting multi-core performance can be characterized by these trivial programs.

          • admin says:

            Last time I checked, serving a web page per client belongs in the class of embarrassingly parallel problems.

            The Python server already has lightweight “threads of execution”. It is called the onInput and onOutput method. It would probably be helpful for you to imagine these methods unified as resumeThread. This is essentially equivalent to what is happening under the hood of Erlang and Haskell.
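The “resumeThread” idea can be illustrated with Python generators. This is only a sketch under stated assumptions: `client_thread` and `run_round_robin` are hypothetical names, and real socket I/O is replaced by log entries so that the interleaving is visible. Each `yield` marks the point where a callback-style server would return from onInput and later be re-entered through onOutput.

```python
from collections import deque


def client_thread(name, log):
    """A lightweight 'thread': yield hands control back to the scheduler."""
    log.append((name, "read request"))   # what onInput would do
    yield                                # wait until the socket is writable
    log.append((name, "write Pong!"))    # what onOutput would do


def run_round_robin(threads):
    """Resume each thread in turn until all of them have finished."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)                 # the 'resumeThread' step
        except StopIteration:
            continue                     # this thread is done
        ready.append(thread)             # otherwise requeue at the back


log = []
run_round_robin([client_thread("a", log), client_thread("b", log)])
```

After the run, `log` shows the two clients interleaved: both requests are read before either reply is written, even though only one OS thread is involved.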

            And speaking of multicore development, you still haven’t addressed why if I put a single fork statement in the python script, effectively making it multithreaded, it adds 1000 req/sec.

            I am done arguing with a wall here. I would suggest that you try to implement your own user-mode scheduler as it is clear you do not understand how they work.

          • Jim Roepcke says:

            You keep suggesting they’re equivalent, but they aren’t. Erlang processes are general purpose, select/epoll are not.

            I can’t address the fork version of your Python script without seeing it!

            Your assumption I do not understand what is happening is invalid. I have, for example, implemented a coroutine library in C, written a web server using that library, and evaluated its performance. So I’ve done these shootouts before, I know the results. They’re not a surprise. No, mine are not published, but why would they be, these results are in every textbook. They are not interesting.

            Like I said, if you want to evaluate the hype around Erlang’s multi-core support, you have to dig a lot deeper than a hard-coded 5-byte HTTP response, and when comparing performance of a Python script to an Erlang program that uses thousands of Erlang processes, the Python script should also be using a general purpose concurrency mechanism, not something optimized for a single purpose like select/epoll and furthermore executing only at the kernel level and not at the language/VM level.

            Since you refuse to address the obvious holes in your comparison, there is no hope for a useful conclusion here.

            Good day, and good luck.

          • admin says:

            The source code is already out there for you to try. Since you claim to know Python, there is no reason for you to not try it yourself.

            I don’t doubt that adding a scheduling mechanism adds overhead.

            The real question, which you keep avoiding, is this: does it really add over 20% overhead on top of Python being a slower dynamic language? The shootout benchmarks show that Python is expected to be 3-30x slower.

            Since you claim to have implemented coroutines and schedulers yourself, then you should already know that the context switch overhead is very small and equivalent to a function call.

            Only a few registers are changed, and the instruction register is not loaded from an unusual point where the branch predictor fails. The cost of a few mov opcodes is negligible compared to the rest of the program. How you refuse to understand this is simply baffling.

  7. Steve Davis says:

    If python does it for you, then I guess you should continue to use it.

    If you claim to “debunk” then you need to cover the angles. Your arguments are totally unconvincing.

    /s

  8. MononcQc says:

    I certainly am partial to Erlang, but I’m curious as to if there is any gain in performance by using the newest version of Erlang (R13B03, erts-5.7.4).

    The implementation you use (R12B5, erts-5.6.5) pre-dates big improvements made to the SMP system in R13B. Using a new version might boost performance. You might also want to try compiling with HiPE enabled, which might or might not make the code faster (you really have to test it to know).

    See the SMP details and why it might make a difference here: http://www.erlang.org/doc/highlights.html

    • admin says:

      Hello MononcQc, thank you for the suggestion.

      I benchmarked with R13B03 (erts-5.7.4) HIPE compiled native and it made the SMP problem go away. Now it is on par with smp disabled Erlang.

      • MononcQc says:

        Glad to know it made it better. I’d be interested to see a follow-up article with graphs showing the relative speedups of each of your programs when adding/removing cores, if you’re ever interested in doing that.

  9. Robert says:

    You say funny things.

  10. simonmar says:

    The Haskell program as it stands won’t scale up on a multicore because it only has a single accept loop, and the subtasks are too small. The cost of migrating a thread for load-balancing is too high compared to the cost of completing the request, so it’s impossible to get a speedup this way. If you create one accept loop per CPU then in principle it ought to scale, but in practice it won’t at the moment because there is only one IO manager thread calling select(). Hopefully this will be fixed as part of the ongoing epoll() work that was mentioned earlier.

    Regarding the slowdown you see with -threaded, this is most likely because you’re running the accept loop in the main thread. The main thread is special – it is a “bound thread”, which means it is effectively a fully-fledged OS thread rather than a lightweight thread, and hence communication with the main thread is very expensive. Fork a subthread for the accept loop, and you should see a speedup with -threaded.

    More background on a similar benchmark in this ticket: http://hackage.haskell.org/trac/ghc/ticket/3758

  11. Josh says:

    This would be impressive if you actually did something with the client connections.

    Unfortunately for you, since Erlang and Haskell compile to machine code, the more computation you do for each client, the better they’re going to fare versus Python.

    • admin says:

      How much is enough to be considered “something”?

      I guarantee you that as long as the benchmarks show Python performing faster than Haskell and Erlang, people like you will still complain that the benchmark isn’t doing “something”.

    • Ulf Wiger says:

      Erlang doesn’t necessarily compile down to machine code. Most of what’s going on in this benchmark is exercising the runtime system and the inet driver.

      A big deal is made out of the fact that the python program hits the ceiling at 20% higher load than Erlang. Yet the python program schedules completely on socket events and is completely stateless, whereas the Erlang processes have their own thread of control and are able to maintain state through context switches as well as block individually and selectively await certain signals. While one can discuss back and forth how much overhead it should add to accommodate that kind of functionality, the only proper way to find out is to perform the relevant benchmarks.

      If all you need to do is switch on incoming packets and dispatch stateless callbacks, this benchmark shows that you can easily write a program in Python that outperforms an Erlang program. This should be encouraging news to python programmers who have exactly this problem.

      Others can conclude from the figures that Erlang and Haskell do pretty well too, even though this is more low-level than the problems they were intended for. Ultimately, all three languages demonstrate that they can be pretty darn fast, even though they all have a reputation for being slow.

      • admin says:

        “Yet the python program schedules completely on socket events and is completely stateless”

        That’s incorrect. The state of each socket is encoded in a Python object, which is at least 280 + 64 bytes. You can add any state you like inside this object with the appropriate instance variables or overridden methods.

        • Dennis Decker Jensen says:

          The Python objects are stateless, because they don’t carry around message queues, run-time-stacks, and so forth.

          And all of this still runs on one machine, so it really doesn’t show anything about how well a distributed system in each language is going to run – and “well” can mean a lot of different things depending on what one wants to achieve.

          • admin says:

            An object by its very nature is stateful. It already contains state for input and output buffers along with the socket.

            Just because you stash data in a member variable as opposed to a C stack does not make it “stateless”. This is like arguing “a stack allocated in the heap is not really a stack”.

            And about scaling to multiple computers, that is not the objective of this benchmark. Scaling to multiple cores however, was shown to have slowdowns in Erlang and Haskell.

          • Dennis Decker Jensen says:

            When people talk of state in this context, they aren’t just talking about allocating memory, which all of the programs are doing of course. It’s obvious that more is going on, right? An Erlang process, an active object, is not the same as a passive object although they both take up memory. I can’t speak for the Haskell example.

            As far as I can see for the Python program, you have exercised its ability to use the epoll of the OS, and it does so rather well, but that has nothing to do with scaling to multiple cores. That has to do with how well the OS does polls meaningfully in accordance with the specification of epoll – and from Python, and how well the hardware can do that.

            It would be really interesting to see an example in Twisted (Python framework) with some kind of scheduling above to compare it to Erlang – or the other way around to see an Erlang example of a custom scheduler on top of a call directly to epoll like the Python program does.

            I suspect there wouldn’t be much of a difference. Once the wall of the OS or the hardware has been hit, there isn’t much more that can be done.

          • admin says:

            @Dennis: I’ve been over this again and again, and I’m getting very tired of dispelling this myth.

            When you run Erlang on their lightweight thread, this is not an OS thread, the “context switching” is managed by their VM.

            Guess how “context switching” happens in the Python script? That’s right, it is also managed by the VM, except it is interpreted, which means it should be even slower!

            I simply do not understand why there are a few people who refuse to understand this and are very vocal about it. The context-switching overhead for Erlang should be at the slowest, equivalent to the Python interpreter which calls much more than 1 C function per instruction executed. There is no lock contention slowdown for Erlang and Haskell in this case because I disabled SMP.

            And finally, I don’t know how many times I have to say this, but this benchmark is about servers.

            If you have a server that uses 10 cores and is slower than a server that uses 1 core, which one are you going to pick?

            This benchmark doesn’t show the limiting factor to be the system calls. If it was, Erlang would at least bottleneck at the same amount of requests/second with Kernel polling enabled since it also uses epoll.

          • dude says:

            @admin

            You’re _such_ a dick and _so_ wrong about all of this that it’s beyond stupid and has moved into ‘amusing’ territory.

            Despite my amusement, I won’t be returning to your blog in the future.

  12. jrEving says:

    I don’t get it. Ok, python’s FFI works. Your program (scheduler?) looks like any C or C++ code snippet out there, used for the last 20 years (with bsd select instead of poll) to explain to beginners how to use sockets, epoll, select and the like. Ok, it’s not C++, it’s python. However, what does that show? Ok, that Haskell is a really nice programming language. But wait, the Haskell program is a totally different beast, yielding the same “Nothing”, and when used wisely (see Don Stewart’s comments) faster than the python thing. And so what?
    I would suggest: write the Haskell piece again using the same FFIs and the same imperative do monad scheduler(what’s scheduled here? what the OS is telling, giving to us; in the same order the system is giving that to us. Wow! call it a soldier, not a scheduler, or even multiplexer, pah!) until the Haskell program looks much more similar to THE WORLDWIDE WELLKNOWN socket/accept/poll loop you are so proud of, generating tons of lazy calls, ignoring “onInput :: () -> ()” :-)

  13. GR says:

    @admin

    I wanted to read this article to *learn* something.

    However, after reading this article, and your rather antagonistic comments, you come across as a stone wall of arrogance rather than a font of knowledge. Did you take this ‘benchmark’ project on in order to learn something, or rather just to pick a fight to up your ‘street cred’?

    After reading the comments I wanted to learn more about who you are, and to find out if your credentials can back up your arrogance…

    Alas, you:

    - Hide behind the ‘admin’ name. A rather unusual choice for the blog poster himself to be the ‘anonymous coward’.
    - Provide no information about yourself or your experience in performing benchmarks.
    - Your codexon.com domain lives behind a secret DNS registration that was only registered 8 months ago.
    - You tend to write controversial articles about US Taxes, browser benchmarks, and how to game the system to gain reputation points on StackOverflow. ( http://bit.ly/ckhCps )
    - Several of your articles claim to ‘debunk’ something.

    Since you decline to provide any bona fides or reasons why I should assign you any credibility at all I will choose not to do so. You shall soon be forgotten as just another benchmark poser looking for a quick fix on digg.com, reddit, stackoverflow, etc.

    Too bad.

    • admin says:

      I do not present credentials because I prefer that the content speaks for itself. The benchmarks and source code are always available for you to verify.

      Would these facts be any less correct if I said I did not even graduate from high school? No. Would these facts be any more correct if I said I had a PhD from Harvard? No.

      You are a shining example why private DNS registration is free from 1and1. And so what if this domain was only registered 8 months ago. Are you saying that the older your domain is, the more people should believe whatever you say?

      “Since you decline to provide any bona fides or reasons why I should assign you any credibility at all I will choose not to do so.”

      If you choose to ignore a reproducible benchmark, that is your loss and reflects on your own laziness and arrogance. I do not feel the need to reveal my personal information just because you cannot be bothered to reproduce it. I can tell you right now that no one who has downloaded the source code has refuted the results.

      And about my tone, it’s true it isn’t always courteous. But what do you expect when most of the people complaining such as yourself are far more argumentative, and saying things like “I didn’t do X” when it says “I did do X”. The only reason you didn’t call them out for being arrogant is because you are an Erlang fanboy yourself. You are hardly unbiased.

  14. anon says:

    Egads. Anyone doing any research on Erlang online will quickly realize that it doesn’t do benchmarks very well. It sits within a battleship of a VM with oodles of overhead written in C to do very generalized message passing between lightweight distributed node computers on vast networks. It’s simply a cost effective way for Ericsson to run and update their networks. Go into the VM source code and take out the stuff you don’t need for a web server and then you might start seeing some speed. The only slower languages are like PHP and javascript. Haskell is made to be sublimely academic. It was made to outdo Lisp as a meta language for proving math theorems and PhD CS problems. No one is selling it as a web server solution. Finally, you don’t usually roll your own web servers. Apache wasn’t built in a day, so in general you need to see what language you want to use on top of the web server, not as the web server. Where erlang is being sold is as a unix-like “been there, done that” vm that has solved a lot of real world problems already so that you don’t have to make the same mistakes all over again. This is mainly for proprietary networks/protocols, not the Internet. What would be a more interesting comparison for you would be Python vs Lua, which really is being touted as fast.

  15. Banador says:

    Ok, this is a bit ad hominem, but if you take on the quest, it shall give us more… It should also be mentioned that some humour is required, sadly. :P

    What have you done with Erlang in real-life? I don’t say that you evade this question, but it seems that you could be a former “Erlang blogbaboonfanboy” yourself.

    Does Erlang suck at everything you do with it?

    I’m a CS major myself and I could not write such a Python set. In my uni the “CS” is more about information system building and engineering instead of the tightest loop and the slimmest select. Surely those companies have used time and money to investigate the available languages/libraries before doing such a drastic step to adopt yet another language that barely anyone uses professionally compared to Python, Java and C++. I’d say, that is the biggest NO-NO currently in Erlang. Those ass-stenching micro-benchmarks mean nothing in the management level. See Bjarne Däcker’s paper on what Ericsson Radio Ab did to Erlang. (Yeah, you know it already. It never died and its open sourcing gave you a cancer in the costume of a messiah. Cancer as in a crab.)

    The thing I have learned from the blogosphere is that one should stay extremely critical. You are creating a double-fool of yourself here. You debunked yourself because you were a fool by believing the fanboy blogs and you debunked the management of those mentioned companies. How foolish is that? Perhaps you are a polymath triple degree manager of a multi-million corporation using Erlang for something it shouldn’t, but since the boulder has started rolling you cannot reverse the outcome: your company is going to suffer, because the ugly truth presented in this blog.

    How about that. Looks like I wrote a letter!

  16. Tim says:

    If you’re not doing any meaningful work in each request then you may as well compare programs that simply print a message like “Hello World” or “Pong”. Oh, wait… that IS what you are doing. If you can find a useful application for a system that simple and does nothing else, and make money from it, then maybe you are on to something. Good luck with that, I hope you make millions.

    I hate to say it, but the naivete of the benchmark and your mostly lame responses to valid criticisms indicate to me that you are one of :
    a) A Python fan-boy.
    b) An out and out troll.
    c) Just not that well rounded in terms of real-world experience.

    So prove me wrong by changing your benchmark so that each request does something that is more in tune with the real world :
    1) Have each request do some real work before completing. e.g. do some calculations, query a database row and do an update. Just do something that actually consumes not insignificant resources before it finishes.
    2) Include the simulation of bugs by having some of the requests deliberately hard-crash by doing a divide-by-zero, and don’t catch the errors in the code. (For it to be meaningful to the real world it MUST include this.)

    When you’ve done all that, come back and tell us what you see. If Python stands up equally well as it does in your current benchmark then you will have something interesting to say. Until you can do that your benchmark proves nothing and has no value in either an academic or industrial context.

    • admin says:

      The whole point of this benchmark is to see what the overhead in creating a web server is. Why should I fill my benchmark with useless database queries that are probably written as modules in C? If I did that, then I would have blathering idiots saying that I wasn’t really benchmarking Python.

      You see I have 2 choices:

      1. Use the scientific method and try to benchmark the bare minimum amount of code in order to have 1 independent variable. And as a consequence, be criticized for the benchmark “not doing enough”.

      With a 20% speed difference that Python has over the other 2 languages, it is easy enough to add some junk computation and still be #1. I leave it as an exercise to you to implement it.

      2. Benchmark a “real life load” whatever that is, and be criticized for doing too much and comparing apples to oranges since many Python modules are implemented in C.

      If you want to benchmark exceptions, go ahead. But the fact is that exceptions are meant to be exceptional. If you have a web server throwing exceptions 1,000 times a second, you have bigger problems than the language. Not to mention that Python exception handling isn’t slow.

      So far, the only people who have criticized this benchmark are fanboys from the other two languages. This is probably because everyone on Twitter and elsewhere has caused it to rank highly on Google for Erlang and Haskell. Next time I would suggest tweaking the benchmark to your tastes (the source is on the page) instead of repeating Ian Bicking.

      • Tim says:

        No, your point apparently was “Debunking the Erlang and Haskell hype for servers”. Well here is a quick attempt at debunking your debunking.

        You don’t have to do database queries to make it a meaningful measure, but you DO have to do SOMETHING. Clearly you have great technical ability, but I think you are failing to temper it with the bigger picture, and without that your knowledge is much less useful than it could otherwise be; what you are apparently failing to understand or willfully ignoring is the whole point that your Python test does not serve the queries in parallel (it is doing them one at a time) whereas Erlang is doing them all at the same time. So you’re not comparing apples to apples, and it’s questionable if you’re even comparing apples to any kind of fruit at all.

        Watch and learn.

        I’m going to keep this simple, and not mess with any OS or tcp tuning, because it’s simply not necessary to prove my point. Also I’m not even going to bother with including the failure case that I said was mandatory; if I did then you’d still get good performance from Erlang, but your Python test would fall flat on its face (but that’s not what I’m trying to illustrate here: I’ll leave that as an exercise for you, if you care to understand it). I’m also not even going to worry about a high load in terms of number of connections per second; again, I can easily show how bad your Python method sucks wind without having to bother with any of the above fluff.

        Here’s what you do : take your python code and add the following immediately after “def onOutput(self):”, “time.sleep(.01)”. This is a naive test, but all I’m illustrating here is that it does take some kind of wall-time to serve non-static content, i.e. each time an HTTP request is made a real-world server has to do “something”. The something in this case is to simply sleep for 1/100th of a second.

        Now, I’ll do the same thing to the Erlang server, from hello2.erl : in “loop(Socket)” immediately after “{ok, _} ->” I’ve added “timer:sleep(2000)”. Note this well : I am giving you a BIG headstart, and lots of rope for you to try to hang me : each Python iteration will sleep for 0.01 seconds, whereas each Erlang iteration will sleep for a whole 2 seconds. Yes, I’m going to sleep for 2 seconds in each and every Erlang transaction.

        Now, let’s benchmark them. I’m running Erlang on port 8050 and Python on port 8051. First let’s just run one transaction through to verify my code does what I’m saying it does :
        $ httperf --port=8050 --num-conns=1 --rate=1
        Total: connections 1 requests 1 replies 1 test-duration 2.024 s
        $ httperf --port=8051 --num-conns=1 --rate=1
        Total: connections 1 requests 1 replies 1 test-duration 0.010 s

        As expected the Erlang server took 2 seconds to serve one lousy request, and Python served one in .01 seconds.

        Now we’ll run 1,000 connections at 1,000 a second. What should we expect ? Well, “work done” by Erlang will be “1000*2 seconds”, and work done by Python will be “1000*0.01 seconds”. Python should win hands down, right ?
        Let’s see :

        $ httperf --port=8050 --num-conns=1000 --rate=1000
        Total: connections 1000 requests 1000 replies 1000 test-duration 3.001 s
        $ httperf --port=8051 --num-conns=1000 --rate=1000
        Total: connections 1000 requests 1000 replies 728 test-duration 10.052 s

        OK, so Erlang took 3 seconds (1 second is httperf sending requests and 2 seconds is Erlang processing 1000 two second delays in parallel), versus 10 for Python (1000 * 0.01 in serial). You can see here that in a non-trivial real-world system, by far the most important point is wall-time to service an http request, not how long it takes to handle the IO alone. If you want to live in the world of glorified “hello world” then by all means go ahead, but don’t expect to earn much geek-cred, and by doing that you’re not proving or disproving anything, and you’re certainly not debunking any putative hype.
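The arithmetic behind those two durations can be sketched as a tiny model. This is only an illustration, not part of either server: the function names are made up here, and the model assumes requests arrive evenly at the httperf rate and that the serial server is always busy (arrivals outpace service).

```python
def serial_wall_time(num_requests, service_time):
    """Wall time when one thread blocks on each request in turn."""
    return num_requests * service_time


def parallel_wall_time(num_requests, rate, service_time):
    """Wall time when every request waits independently of the others.

    Requests arrive evenly at `rate` per second, so the last one arrives at
    (num_requests - 1) / rate and completes `service_time` later.
    """
    last_arrival = (num_requests - 1) / rate
    return last_arrival + service_time


# Blocking Python handler: 1000 requests, 0.01 s sleep each.
print(serial_wall_time(1000, 0.01))
# Erlang handler: 1000 requests at 1000/s, 2 s sleep each, all overlapping.
print(parallel_wall_time(1000, 1000, 2.0))
```

The model lands close to the 10.052 s and 3.001 s that httperf reported, which suggests the gap comes from blocking versus overlapping waits rather than from raw I/O speed.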

        And we’re not even comparing apples to apples. Let’s try it when Erlang sleeps for 0.01 seconds, as Python does :

        $ httperf --port=8050 --num-conns=1 --rate=1
        Total: connections 1 requests 1 replies 1 test-duration 0.039 s
        $ httperf --port=8050 --num-conns=1000 --rate=1000
        Total: connections 1000 requests 1000 replies 1000 test-duration 1.010 s

        Do you now understand why your test has little or no bearing on the real world ?

        • admin says:

          Your test has even less bearing on the real world. How many websites do you know have a “sleep” function so they can return a request 1 second slower? Python can do an event based sleep just as efficiently as Erlang.

          A Python “sleep” guarantees your process will idle, while the Erlang one doesn’t. Erlang calls the same OS sleep function in its VM scheduler. You can implement that in Python as well. The big difference is that Python allows you to choose, while in Erlang, this is implicit.
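As a rough illustration of an event-based sleep in Python, here is a minimal sketch using the standard asyncio module. It is not the epoll server from the article: asyncio is swapped in for brevity, and handle_request, serve_many and timed_run are hypothetical names. Each simulated request yields to the event loop while it waits, so the waits overlap inside one thread.

```python
import asyncio
import time


async def handle_request(delay):
    # asyncio.sleep hands control back to the event loop instead of
    # blocking the whole thread the way time.sleep would.
    await asyncio.sleep(delay)
    return b"Pong!"


async def serve_many(num_requests, delay):
    # Start every simulated request at once and wait for all replies.
    tasks = [handle_request(delay) for _ in range(num_requests)]
    return await asyncio.gather(*tasks)


def timed_run(num_requests, delay):
    start = time.perf_counter()
    replies = asyncio.run(serve_many(num_requests, delay))
    return replies, time.perf_counter() - start
```

Calling timed_run(1000, 0.01) should take far less than the 10 seconds a blocking time.sleep(0.01) per request would need, because no request holds up the others while it sleeps.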

          your Python test does not serve the queries in parallel whereas Erlang is doing them all at the same time.

          Is Erlang more “parallel”, for whatever fuzzy definition you want to use, than Python? Of course. But as the test shows, you might as well just use a Python implementation and fork it for each CPU you have. (Erlang does not even context switch if a thread uses less than 2000 function calls).

          The question you need to ask yourself is this: Do I really need Erlang’s capability to serve 1,000 requests equally slowly with the overhead of switching 1,000 times, or would I rather serve 1 request (and multiply that for each CPU) at a time so I can serve 1,200 times in the same period?

          For 99% of web servers, the answer isn’t Erlang.

          • Tim says:

            I did say my test was naive, but doing the sleep takes a lot more effort than doing nothing at all, which is what you are measuring in your tests. Again, you miss the point : serving “nothing” proves nothing. To serve “something” you have to do something, and that “something” will take time. In this case the sleep is the “something”. You can substitute the sleep with any other piece of logic of your choosing and my point will remain valid.

            My definition is not “fuzzy” at all. Erlang serves the requests in parallel, your Python code is strictly sequential. That is crystal clear.

            You say “at a time so I can serve 1,200 times in the same period?” ; Did you even read the httperf results ? Apparently not… I left something in my results that I deliberately didn’t point out : the Python test did not even complete all the requests in time. In the version where Erlang did the equivalent delay, it served each request within a few milliseconds of being submitted, whereas Python served each request with a longer and longer wall time until eventually the last few hundred requests timed out altogether. Until you have a forking version of Python that serves in parallel then you don’t have an equivalent test.

            Erlang is not a silver bullet, and it’s certainly not the right answer for 99% of web servers (who ever said it was ?). Make both the Erlang and Python programs do real work then you will see performance profiles change again. The only real point I’m trying to make here is that you are basically saying “Python is faster because it does task X faster than Erlang does task X” but in fact you are having Erlang and Python perform entirely different tasks.

            This is the last I will say on the matter. I think I and others have gone to great lengths to explain the flaws in your tests to you, and so far you’ve shown little ability to grasp these extremely simple concepts. This discussion has now deviated far away from your initial proposition, and is now mainly about your shocking lack of understanding of some extremely elementary multitasking concepts. I find it impossible to accept that anyone with your technical ability can be that ignorant. Therefore, as per my initial suspicions, I have now concluded that you are trolling. Good day.

          • admin says:

            The default Python sleep function will tell the OS to pause the thread completely, while the Erlang sleep function will return immediately and pass control to the Erlang scheduler, potentially executing more cpu cycles. These are two completely different things. (You apparently do not know what cooperative multitasking is)

            The fact that you are even trying to argue this means that you are either extremely ignorant, or you are a troll.

            Since I cannot convince you of your ignorance (you are obviously out to prove me wrong), please post on Stackoverflow asking if sleep in Erlang takes more “effort” than buffering IO. You will simply get laughed at.

          • Are you *really* trying to refute this example? It’s quite obvious that Erlang handles concurrency better. Two seconds per request vs .01 and still blows it away! Your benchmarks prove that your python code can print “Pong!” faster than the other two. Impressive. These benchmarks won’t prove how the languages will do in the real world unless they do some sort of work. And the real world performance is what you are trying to debunk, no?

          • admin says:

            Matt, read below where Tim finally admits my benchmark is valid. Stop trolling for the sake of your future customers.

  17. Sandeep says:

    I hope readers haven’t been put off by the lack of positive comments here, mostly because the author has done a good job with the rebuttals.

    As the author said, most of the people here do not have a fundamental understanding of how concurrency in Erlang works and it reflects in their comments. Erlang and Haskell both have a fanatical fanbase, and any negative article will get attacked like this. That is why this article has been mentioned so many times on Twitter and Delicious.

    You have done a great service in showing that it is not a panacea or even suitable for mainstream servers. With any luck, this will temper the next wave of software choosing Erlang for no good reason.

    • Tim says:

      @admin : I’m replying out of thread here, sorry. I’d reply above, but there’s no reply button now. (?)

      Ug… I’m not sure why I’m still here. Once more : again, it’s not about “sleep”, it’s about doing work.

      Just show us a benchmark where the programs do some actual work (an equivalent amount), and where Python beats Erlang. I’m going to predict the future and say that you either don’t respond to this, or waffle on about why you shouldn’t have to, don’t need to, or don’t want to. But the simple answer is you won’t because you can’t. Why not just implement a version in Python that actually does a fork ? Heck, you might even have a point to make ! OK that’s enough from me, I’m not saying anything else until you’ve posted some benchmarks of programs that actually do something.

      @Sandeep : Depending on exactly what you mean by “mainstream”… if you mean “general purpose web server” I think you’ll struggle to find all that many people who yet advocate Erlang as their primary platform of choice, for many reasons ; I think for the foreseeable future we can very safely leave that crown to the established players such as Apache, etc. There ARE some folks who are using Erlang based web servers, but they generally have specific reasons for doing so, and fall outside of mainstream use. Erlang is definitely a specialized technology. On the other hand, if by “mainstream” you mean “suitable for use in highly available large scale systems” then I think you’ll find that Ericsson (AXD301 switch), Facebook (Chat), and Amazon (SimpleDB) are some obvious examples where it is indeed suitable.

      • admin says:

        You are here simply because you are trying to prove me wrong, and even facts and logic will not change your mind.

        I know that the speed/instruction of Python is slower than Erlang. It’s obvious that adding more Python will cause it to converge to Erlang’s speed.

        What is not obvious, is that the overhead of a bare server in Python is competitive or even better than Erlang. There’s plenty of computation you can add into the Python server before it slows down to the level of Erlang. Maybe turn it into a simple K/V store like memcached. I am willing to bet money that it still won’t slow down to the same speed as Erlang.

        Why not just implement a version in Python that actually does a fork ?

        I did. I put a single fork statement and it increased the number of requests/sec by 2000. This really doesn’t change anything because of the fact I am about to tell you again:

        Erlang in this benchmark, is not any more concurrent than Python. It only does a context switch after 2000 function calls. I don’t know how all of you supposed Erlang experts missed this piece of crucial knowledge. It only proves my point that Erlang is overhyped and full of people willing to defend it without knowing anything about it.

        • Tim says:

          I’m more than willing to be persuaded : facts and logic WILL change my mind, but you need to present them. Just post the code with benchmarks. Just saying “I did it” does not cut the mustard. So : no benchmarks, as predicted, essentially waffle, as predicted. You can’t do it, you failed. QED. Goodbye.

          • admin says:

            No you are not willing to be persuaded. If I came up with another benchmark, you would just find another excuse. If I bowed down to the wishes of every Internet troll, I’d never have time to do any actual programming.

            Let me give you a hint since programming is so hard for you. Simply copy and paste this line at an appropriate place: import os;os.fork()

            And what a surprise, you have nothing to say to the fact that Erlang doesn’t context switch if a thread calls less than 2000 functions (which means it is just as concurrent as Python here). This piece of knowledge debunks your whole argument.
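For readers who want to see where a fork would go, here is a minimal pre-fork sketch. It is not the article's epoll server: it uses a plain blocking accept loop, and handle, serve and the workers parameter are hypothetical names chosen for illustration. The listening socket is created first, so every forked process inherits it and the kernel spreads incoming connections across them.

```python
import os
import socket

RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!"


def handle(conn):
    # Read whatever the client sent, reply with the fixed response, hang up.
    conn.recv(4096)
    conn.sendall(RESPONSE)
    conn.close()


def serve(port, workers):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("", port))
    listener.listen(1024)

    # Fork after listen() so each child shares the same listening socket.
    for _ in range(workers - 1):
        if os.fork() == 0:
            break  # child: stop forking and go straight to the accept loop

    while True:
        conn, _addr = listener.accept()
        handle(conn)
```

Running serve(8051, 2) would correspond to the single os.fork() suggested above: two processes accepting on one port, typically one per CPU core.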

        • Tim says:

          OK last point (I promise !) Maybe I should be helping you rather than just criticizing. Here, I did a simple test : replace the sleep with a loop that counts to 100,000 on each request. Timings ?

          Erlang :
          Total: connections 1000 requests 1000 replies 1000 test-duration 3.765 s
          Python :
          Total: connections 1000 requests 1000 replies 827 test-duration 14.506 s
          Errors: total 173 client-timo 0 socket-timo 0 connrefused 0 connreset 173

          Now that is hardly scientific, but it’s not “plenty of computation” and it’s already slower, and failing too. I don’t think it’s possible to do a truly scientific test, as various parts of each language are going to be faster or slower than the equivalent in the other language. So you might be able to find cases where Python is faster.

          So take that as a starting point and see if you can get Python to run faster.

          • admin says:

            The fact that the Python benchmark had connreset errors shows that you didn’t apply the kernel tweaks. It is quite obvious that you ran Erlang first, and reached the 1024 socket limit, and with some of those connections still attached to Erlang, you ran the Python. Here is the result with your busy loop, done properly:

            Total: connections 1000 requests 1000 replies 1000 test-duration 2.731 s
            Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
            Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0

            By now it is blatantly obvious that you have a superficial understanding of Erlang, let alone basic networking principles. I am wasting my time trying to teach you, when you think you are trying to teach me.

            And again, you have no comment on the fact that Erlang is running non-concurrently in this benchmark.

          • Tim says:

            Sticks and stones may break my bones, but names will never hurt me. Hey, I know far from everything, and I’m always willing to learn (and still hoping I can learn something off this page), but on the other hand I’ve been doing network programming and testing for 20 years in a professional capacity, so I suspect my knowledge is a little more than superficial, especially when it comes to testing the things that actually matter.

            OK, so I applied the tweaks, that gets rid of the reset errors. So back to debunking Erlang performance, which is, I believe, your thesis : the Python version still takes around 13 seconds to serve the 1000 requests (this is where we count to 100,000 on each request), whereas the Erlang version takes a tenth of that time.

            If I bump the httperf requests up to 10,000 then Python takes 130 seconds, again versus about a tenth of that for Erlang, so there’s a consistent pattern. Now, of course this counting test is entirely arbitrary, but then your test of doing nothing at all is equally arbitrary. But at least mine acknowledges that real world servers that you are attempting to “debunk” do more work than nothing at all. And like I’ve said before, you may be able to find scenarios where Python can handle whatever the work is faster than Erlang can.

            There is no absolute and general truth that you can apply to network servers : what’s best depends on your needs, your software, your hardware, your architecture, the loads you subject it to, your error handling mechanisms, and your error and failure rates.

            What this boils down to is you underestimating how fast your Python I/O is and overestimating how fast it is at doing non-I/O stuff : the networking stuff is fast (because it’s essentially the underlying OS doing the work for you, not Python itself), the rest is not so fast. Using your original Python code, on my machine I can go all the way to 29,000 connections, no problem. So yes, the speed you rip the connection attempts off the network is nice and fast, perhaps even impressive…. but as soon as you try to actually do any work at that high rate and keep up then Python falls hopelessly behind. So your point about it serving dumb requests is valid – but your point about saying it beats Erlang for real servers is not yet proven (at least not with this type of rudimentary test), so I can’t see how you can legitimately claim to have debunked anything.

            I’ll worry about “Erlang running non-concurrently” when you manage to show Python running faster (*with* evidence !). In my job, if someone came to me and said “I have technology X that can go super-fast”, I ask for a demo of it doing real work, not just a skeleton that showed it had the potential to maybe do work – which is basically all you have done. I’d then ask for cold hard figures, with the source code. And I’d then independently verify the test results. So let’s rewind, once more : just show me some Python code doing some actual work rather than just “pong”, with benchmarks, that show it being faster. You are yet to do that.

          • admin says:

            No offense, but you are obviously not very good at your job if I had to tell you that a connreset error is indicative of hitting the default Linux socket limit of 1024. These are basic tweaks for any production server.
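One way to check the limit being discussed is to read the per-process open file descriptor limit, since every socket consumes a descriptor. The sketch below uses Python's standard resource module (Unix only); open_file_limits and fits_in_limit are hypothetical helper names, and the headroom of four descriptors is an assumption covering stdin, stdout, stderr and the listening socket.

```python
import resource


def open_file_limits():
    # Returns (soft, hard) limits on open file descriptors for this process.
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fits_in_limit(connections, soft_limit, headroom=4):
    # An unlimited soft limit can hold any number of connections.
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return connections + headroom <= soft_limit


soft, hard = open_file_limits()
print("soft limit:", soft, "hard limit:", hard)
print("room for 1000 concurrent sockets:", fits_in_limit(1000, soft))
```

With a soft limit of 1024, a 1000-connection run only fits if no descriptors are still held by an earlier test, which is the scenario admin describes.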

            I am quite certain that you also made a mistake in your benchmark since I have a result of 2.7 seconds. I have timed the loop to run 100,000 empty iterations on my machine and it is less than 3 ms. Httperf confirms that the requests take around 3 ms each.

            For you to claim that it expands to 13ms is simply ridiculous and unbelievable.
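The per-request cost of the counting loop can be measured in isolation, which separates interpreter time from network time. This is a minimal sketch with the standard timeit module; busy_loop and seconds_per_call are hypothetical names, and the figure will differ by machine and Python version, so no expected value is given.

```python
import timeit


def busy_loop(iterations=100000):
    # The "work" added to each request in the thread: an empty counting loop.
    for _ in range(iterations):
        pass


def seconds_per_call(repeats=20):
    # Take the best of several runs to reduce scheduling noise.
    return min(timeit.repeat(busy_loop, number=1, repeat=repeats))


print("busy loop: %.3f ms per call" % (seconds_per_call() * 1000))
```

Multiplying the result by the number of requests gives a lower bound on the test duration for a server that handles requests one at a time.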

            when you manage to show Python running faster

            I did and you refuse to acknowledge the evidence. You keep complaining about the benchmark not doing exactly what you would use in your imaginary business environment. Go ahead and change it to whatever you like. I don’t have time to bend to the wishes of every heckler on the Internet because it doesn’t suit his specific use-case. I already wasted enough of my time teaching you to increase the socket limit on your machine.

            You also keep saying that Erlang is doing things “concurrently” in this benchmark when it isn’t any more concurrent than Python in this benchmark. You keep dodging this point because you really don’t have a suitable retort.

          • Tim says:

            Actually that is offensive, but you’ve been rude to several people before on this blog, so it wasn’t unexpected. You can resort to name calling and not backing up your claims with documentation, but that just goes to show you are acting like an amateur, not I.

            So using this at the head of the onOutput(self),
            for x in range(100000):
                pass
            I get an average of about 9.5 seconds for Python. Let me know what you are doing and I will test with that and see if I can recreate your figures. Maybe the “pass” is making Python stall unnecessarily ? Like I said earlier, Python isn’t my forte.

            I’ll worry about Erlang concurrency when you’ve shown me it even needs looking at. So far all you’ve done is FAIL to show me that Python is faster, which is the core of your claims.

            If you really can’t post any code and numbers then there’s no point having any more discussion.

          • admin says:

            I have been rude, but for good reason. You start off your comments by characterizing me as “naivete”, “lame”, or “troll”. Is this not amateur name calling?

            You expect me to put up with this crap while being told I am wrong from people who don’t even know how to tweak a kernel or how multitasking works in Erlang? I am not afraid to call people out on their bullshit especially when it is readily apparent they don’t even know what they are talking about.

            And stop dodging the question about Erlang concurrency. You and many others used the same argument, that Erlang was doing things concurrently, that’s why it was slower than Python. The burden of proof is on you to show that Erlang can be faster. Not the other way around, since I already presented proof that it is slower in this case. I have presented 10x more evidence than people like you who have done exactly what you complain about: all talk and no testing.

            For the Python program, first you need to use xrange instead of range. Range creates a list of N integers in memory. Second, you need to put the loop in Server.onInput, because Client.onOutput may be called multiple times per request. That is why your test is skewed.
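The range versus xrange point is specific to Python 2, where range builds a full list in memory before the loop starts. The sketch below is written for Python 3, where range is already lazy, so list(range(n)) stands in for the Python 2 range; eager_size and lazy_size are hypothetical names.

```python
import sys


def eager_size(n):
    # Size of a materialised list of n integers' references,
    # comparable to what Python 2's range(n) allocates on every request.
    return sys.getsizeof(list(range(n)))


def lazy_size(n):
    # Size of the lazy range object, comparable to Python 2's xrange(n).
    return sys.getsizeof(range(n))


print("list of 100000:", eager_size(100000), "bytes")
print("lazy range:    ", lazy_size(100000), "bytes")
```

The list figure counts only the list's own storage, not the integer objects it points to, so the real allocation cost per request in Python 2 would be higher still.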

          • Tim says:

            OK, I’ll fess up to the trolling/lame stuff, that was born out of frustration, I apologize for that.

            But now we are talking and actually showing code, we have real progress…

            I moved the range, and made it xrange. That makes a big improvement.

            The Erlang I added is a simple tail-recursive loop:

            do_more_work(0) ->
                ok;
            do_more_work(N) ->
                do_more_work(N - 1).

            and “ok = do_more_work(100000)” in the tcp receive.

            Here’s the numbers I get, including SMP mode (“erl +sbt db”), for a run of 10,000.

            Python: no errors, 10.9 seconds, 1.1ms/req
            Erlang: no errors, 13.6 seconds, 1.4ms/req
            Erlang SMP: no errors, 11.04 seconds, 1.1ms/req

            So on that simple test your 20% argument holds up. With SMP enabled Erlang goes basically the same speed. Results of course will vary with more complicated work, and it would be interesting to see what Erlang can do with more cores.

            If we simulate more work by tweaking the code to count to a million, for a run of 1,000 only I see… (and I’m not making specific claims about Python or Erlang here, just illustrating a point)
            Python : 46 seconds
            Erl/1 : 13.5 seconds
            Erl/2 : 9.5 seconds

            So what does all this show ? Firstly, it shows there is some value to your 20% claim. Python is fast at ripping the stuff off the net, and for a light workload it holds up fine. Secondly, it shows my point about application logic is also true : the network I/O is an important part of the equation, but it’s just that, only a part. Having a benchmark that just measures I/O and then claims that the technology is superior is not telling the full story. Lastly, it shows that there’s no single truth. Which is faster, Erlang, Python or some other technology ? It depends on what you’re doing. The axiom of “right tool for the right job” holds true.

            That’s all I needed to see : real results.

          • admin says:

            It was born out of frustration that it was the first thing you had to say before we even had a conversation?

            Anyway, of course the benchmark holds up. Not even the people above are arguing that the timing is wrong. And as for running a loop of a million, it is obvious that python becomes slower.

            As I’ve said, Python has a slower speed/instruction. If you were running that many loops, you would be implementing it in C which would make it faster than Erlang.

          • Tim says:

            No, out of frustration to your replies to previous posters. People were not arguing your figures, they were making the same point as me, i.e. that the benchmark only really means anything if the webserver does some work, and you were not addressing their concerns. Again, I apologize for jumping in with the fanboy comments, that was uncalled for, wrong of me, and unnecessary.

            I can summarise what some of the other guys said :

            “k.pierre.k” made the point about doing work, early on, and you responded by saying that SMP made things worse (under no load). (And we’ve now seen my tests show that under high load SMP does of course make things better.)

            Ulf Wiger nailed it when he said “… python program fares best because it’s picking off connections in a tight loop.”, and we now see that’s true : make the code do some work, and the loop isn’t so tight, and Python can’t keep up. In the same paragraph Ulf says that disabling SMP would help in this special case, which apparently led to your confusion and repeated challenges to run Erlang in SMP mode…. which when I finally did do under load proved that the SMP was actually faster.

            I think this whole mess is best summarized by the last 3 paragraphs of Ulf’s last comment. With the benefit of the numbers we saw out of my last test where we put high load on it and saw SMP Erlang beat Python by a wide margin, go back and read what Ulf said and maybe it will seem more pertinent now.

            In various postings you also make points about Erlang C modules and Python C modules. In those cases we’re hardly talking about Erlang versus Python servers are we ? We’d be talking about some kind of hybrid, and that is clearly not what the title of this blog mentions. Erlang “hype” is hype about servers actually written in Erlang. Trust me, when I’m writing an Erlang server, about the last thing I want to do is to drop down to C ; it’s less reliable, far more laborious to code in, harder to maintain, and it’s a pain to do all the ports work. Sure, it might be faster under some circumstances, but in terms of overall cost and ability to do work, that just doesn’t come into the cost-benefit analysis. As Ulf pointed out, all it needs to be is “fast enough”.

            And we haven’t even touched on reliability. I made the point way back that if you simulate a bug that causes a hard crash once in a while then there are severe implications for something like Python, whereas something like Erlang would cope with it much better. These bugs do happen in real life, which is why you are ill advised to run all your communications channels on a single point of failure : you get an unhandled exception and one dies then they all die. That just won’t fly in applications that demand high availability and reliability (sometimes with contractual and even legal ramifications), and those problem domains are where Erlang excels.

            To sum up : we’ve seen your numbers hold up to scrutiny, but only in the special case of a server handling light load on a single core. When we look at moderate load then you could go either way. When we look at heavy load on multicore CPUs your hypothesis fails. So that doesn’t definitively debunk Erlang performance does it ? Well anyway, I’ll sign off. Thanks for sticking with me through this. I learned something, I hope you did too.

          • admin says:

            I still stand by my benchmark showing that SMP Erlang does not make much of a difference.

            Even if we accept the marginal improvements shown in your benchmark, it is still approximately equivalent (assuming best case scenario for Erlang that you are running 2 cpus) to the 16% improvement from simply putting 1 fork statement in the python server. With all the mistakes you’ve made so far, I would not be surprised if there was another problem in your benchmark. Therefore the argument about SMP is invalid.

            What you keep repeating now is that Erlang is faster in terms of speed/average instruction. Everyone already knew this. No one is arguing that this isn’t the case. This is a very useless metric since you might as well pick one of the 14 languages that are faster that doesn’t come with horrible syntax, SSA, and poor documentation.

            http://shootout.alioth.debian.org/u64q/which-programming-languages-are-fastest.php

            Erlang is at the bottom of the list right there with Python. If you are choosing to sacrifice programming ease for speed, you might as well write a part of your server in C instead of Erlang. More importantly however is that this does not prove my article invalid: that it is possible to write a faster Python server than Erlang.

            Ulf wasn’t saying the obvious in that Python is slower in speed/instruction. He was arguing that Erlang operates concurrently, which is why it has a penalty. Even if this was true (it’s not in this benchmark because Erlang doesn’t context switch under 2000 function calls), it doesn’t disprove the article debunking Erlang being automatically better for every web server.

            And about exception handling, it’s easy enough to put one at the base of your handling loop and it will never crash. There are servers written in C that are as stable as Erlang. This is another over-hyped part of Erlang.
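The idea of putting an exception handler at the base of the handling loop can be sketched as follows. run_handlers is a hypothetical name, and the handlers stand in for per-request callbacks such as onInput. Note the limits of the sketch: it catches Python-level exceptions only, and a crash of the interpreter process itself would still take down every connection it holds.

```python
def run_handlers(handlers):
    """Call each request handler, isolating failures from one another."""
    served, failed = 0, 0
    for handler in handlers:
        try:
            handler()
            served += 1
        except Exception:
            # One bad request is counted and dropped; the loop keeps going.
            failed += 1
    return served, failed


def ok_request():
    return b"Pong!"


def buggy_request():
    raise ValueError("simulated bug in request handling")


print(run_handlers([ok_request, buggy_request, ok_request]))
```

In a real server the except branch would also close the offending connection and log the error rather than silently counting it.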

  18. AdamJ says:

    The people complaining here are idiots. I was astounded by how vocal and obnoxious they are when it was clear they didn’t know basics like tuning a Linux server EVEN WHEN THE INSTRUCTIONS ARE WRITTEN ON THE SAME PAGE.

    But it just goes to show that most people on the Internet are stupid and don’t know how to read.

    “Erlang is fast enough?” I could say the same thing about Python, except in this case, it is faster than Erlang.

    • Edwin Fine says:

      >>The people complaining here are idiots.
      One of the so-called idiots posting here was Ulf Wiger, the Chief Designer of the well-known Ericsson AXD-301 switch, comprising 2+ million lines of Erlang code and some C code.

      What have YOU done lately to equal or surpass this “idiot’s” achievements and experience?

      • AdamJ says:

        I was the “Chief Designer” for a system made for an intelligence agency which consisted of 3+ million lines of Cobol. See how pointless this type of argument is?

        I’ve never heard of the AXD-301, and I doubt that it is much different from its non-Erlang counterparts. So what if he wrote it in Erlang? This is like someone caring if I wrote a bloated piece of software in Fortran.

        It doesn’t matter if he wrote a billion lines of Erlang for one product in the only company that had access to said language (not much competition). The fact that he can’t read, or doesn’t know basic Linux is evident.

    • ahahaha, AdamJ == admin!

  19. Terry says:

    Although you cannot dispute the admin’s benchmark, benchmarks are pointless except for the specific example.

    So you’ve proved that Python is faster in one circumstance congrats.
    Do you consider your example the same as what every server does?
    If you do then you are wrong. I know most servers in this world do not go “Pong” when they are interacting.

    Another point to be made is that Programming Languages are neither slow or fast. The implementation of a language makes it slow or fast.

    You can find countless benchmarks where Java is faster than C++ and vice versa. What do they prove? Not a lot.

    Reading the arguments has proven that you either lack experience or are thick as a rock.

    1 benchmark does not prove a general theory. You would need to do countless benchmarks to prove that one is better than the other and even then you would only prove that they are better in those tests and nothing else.

    Languages all have strong suits and you have proven 1 and only 1 case where Haskell and Erlang are slower. Good for you!

    I love Python so you can’t claim I’m just a stupid Erlang lover. You keep arguing your article is valid, and it is for the specific instance created by yourself, but that is as good as saying I’m amazing! *here is some example* Can you prove me wrong? Probably not.

    • admin says:

      There’s one big problem with your argument. Erlang was designed from the ground up with one thing in mind: for Ericsson’s telecom and internet needs. This is different from C++ and Python. Erlang, being an older and less dynamic language is supposed to be faster, not slower when benchmarking a minimal server.

      So you’ve proved that Python is faster in one circumstance congrats.

      I’ve proved Python is faster with a 20% margin. This is quite significant. And it is in a situation where Erlang is supposed to shine, regardless of merely being a “Pong” server.

      Another point to be made is that Programming Languages are neither slow or fast. The implementation of a language makes it slow or fast.

      This is a very naive point of view that is often repeated by people who have never designed a language. Languages have a level of abstraction. Those that are closest to modern computers or the von Neumann architecture have the potential to be faster than those that are more abstracted (dynamic/functional languages). You can always write a slower lower level language on purpose, but the fastest high level language will never beat the fastest lower level language.

      Since you’ve already made up your mind that I am wrong and in a rather impolite manner, here is an article by someone else explaining the same thing. http://andresosinski.com.ar/blog_view_entry/?id=2

      In conclusion, programming languages do have a speed. That is why people don’t write operating systems in Python or Erlang. If the difference doesn’t matter to you, good for you. It doesn’t change the validity of this benchmark.

      • Terry says:

        I never said your benchmark was invalid.
        The point you totally missed / ignored is that you’ve run 1 test and put a blanket statement on it. Look at a proper benchmarking site: http://www.phoronix.com/

        They do lots of benchmarks and then show the difference before making any claims(if any).

        Believe what you want about language speed vs implementation. Another blogger’s post proves nothing, but this is all beside the point anyway. (If you look at one of the original links in this article, he reverses his opinion on Erlang vs Python.)

        Obviously you are not a stupid person. You hold a strong opinion and you should state it as such instead of ignoring criticism and restating that you are correct.

        Unfortunately all I’ve learnt from this article is Erlang is not a silver bullet. Which I already knew.

        • admin says:

          Alright let me try to explain this very simply so there is no more equivocation.

          Popular Mistaken Hype: Erlang is hands down the best choice for a fast web server.

          What this benchmark shows: Erlang’s web serving implementation built into the VM is slower than one written in Python.

          What critics wrongfully think I am saying: Erlang is unilaterally slower than Python for all web servers.

          • Pedram says:

            I didn’t really want to comment as I deemed you to be a bit unfriendly to cogent thought. I just wanted to say that Tim did establish that Python doesn’t stand on its own and beat Erlang as a high performance web server… By “stand on its own” I mean you aren’t really testing Python, you’re testing epoll and its C bindings…

            You’ve seriously debunked nothing; in fact you’ve proved the opposite, which is that even though in many cases Erlang may prove slower, it is undoubtedly more mature for real-world concurrency.

            Not to get off topic, but look at Apache Cassandra, it powers Facebook, it’s slow by some standards but it scales linearly…

            If I cut off that sentence after the words “some standards”

            You would have completely missed the point, just like you have in this article.

            The only places where I agree with you are the kernel tuning and the lack of repeatable benchmarks.

            As for this article, I think you pointed out something unimportant in complicated real-time or high concurrency systems.

            You’ve also shown how all the assumptions on which you’ve based this article fail when you start to veer off the code that accesses the epoll and socket C bindings and start to actually use Python…

            You aren’t the first to ignore the big picture and become overly delighted in a trivial feature of a specific language…
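To make the epoll point concrete, here is a minimal sketch of a single-threaded epoll “Pong” server in Python. It is not the benchmarked code from the article; the function name `serve` and the `max_requests` parameter are assumptions added for illustration, and `select.epoll` is Linux-only. Nearly every line in the loop is a thin wrapper around a system call, which is the substance of the objection above.

```python
import select
import socket

# The reply described in the article's summary.
RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!"


def serve(listener, max_requests=None):
    """Hypothetical helper: answer each connection with RESPONSE, then close it.

    `listener` is an already bound and listening TCP socket.
    `max_requests` (an assumption for this sketch) stops the loop after
    that many replies; None means run forever.
    """
    listener.setblocking(False)
    ep = select.epoll()
    ep.register(listener.fileno(), select.EPOLLIN)
    conns = {}  # file descriptor -> client socket
    served = 0
    try:
        while max_requests is None or served < max_requests:
            for fd, _events in ep.poll(1.0):
                if fd == listener.fileno():
                    # New client: accept it and watch it for readable data.
                    try:
                        conn, _addr = listener.accept()
                    except BlockingIOError:
                        continue
                    conn.setblocking(False)
                    ep.register(conn.fileno(), select.EPOLLIN)
                    conns[conn.fileno()] = conn
                else:
                    # Client sent something (or hung up): reply and disconnect.
                    conn = conns.pop(fd)
                    ep.unregister(fd)
                    try:
                        conn.recv(4096)
                        conn.send(RESPONSE)
                    except OSError:
                        pass
                    finally:
                        conn.close()
                    served += 1
    finally:
        ep.close()
    return served
```

Whether this counts as “testing Python” or “testing epoll” is exactly what the two sides of this thread disagree about; the sketch only shows how little interpreter-level work sits between the system calls.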

          • admin says:

            @Pedram.

            I never claimed Erlang was slower overall. Why would I say that when the Great Programming Language Shootout has already done the work? I really don’t understand why you and so many other people here have problems with reading comprehension.

            The point I made, and will continue to make, is that Erlang’s core implementation is slower than Python. This is surprising given the fact that Erlang was built from the ground-up for fast servers and network switches. This is like finding out Haskell performs better than C for building operating systems or device drivers.

            And why should I be friendly? I have not replied in an unfriendly tone to anyone here who wasn’t unfriendly first. This is my website, and none of you have made any convincing arguments. You all keep repeating the same arguments that I have already addressed:

            1. The concurrency argument is invalid because the VM doesn’t even do a context switch.

            2. This benchmark is about server core, and not any other imaginary application you have tacked on.

  20. James says:

    I am appalled at the lack of intelligence and civility being shown by the Erlang supporters.

    Since I share the same sentiments as AdamJ, I will refrain from repeating what he just said, though it will probably be overlooked in the sea of ignorant comments.

    When I am looking for an alternative to C to write my server in, I do not care whether this benchmark is testing binding speed. I am not going to bother writing my own bindings nor am I going to edit the VM to support what I need. This benchmark is absolutely valid in what it claims.

    I would suggest anyone else that feels the urge to complain to try to improve the Erlang VM instead.

    • TJB says:

      I agree. For instance the Javascript regex implementation on V8 is faster than C++. You would have to be willfully stupid to not use V8 for regex heavy code if you were familiar with Javascript already.

      With regards to this page, I’m willing to bet most people are more familiar with C++ and Python than Erlang.

      There are simply too many projects that have switched to Erlang for some phantom performance boost for their server.

  21. JK says:

    Python seems to win this micro benchmark but I would never use python for anything that needs to scale. The reason is the GIL and other issues with the python interpreter.

    • Eric says:

      So, I think one big misunderstanding about this benchmark is that it is not meant to prove scalability. Python can and does scale perfectly well. It doesn’t do so in a vacuum though. We use Python at my work and it performs really well, but the reality is there is a whole host of reasons it performs reasonably. MongoDB is one aspect that has helped performance, alongside our load balancing proxy/caching server. We sharded our data and systems along reasonable facets to get better speed. Our application was designed in a relatively stateless fashion to make spinning up more nodes simple. In short, “Python” didn’t scale, but was used in a system that scaled.

      I think this is the same point that other people are making when they say the benchmark doesn’t prove anything. Erlang and Haskell both *can* scale, but that is not the point of the benchmark. It is simply to prove Erlang and Haskell are not the panacea of network programming that some people have made them out to be. Again, I’m not saying all bloggers who have promoted either language are drinking some sort of kool aid. It is just that there are people (obviously, based on the response to this post) who are overzealous regarding both languages and their usefulness.

      Saying something like “I would never use python for anything that needs to scale” is simply ridiculous. In fact it is just as ridiculous as all the folks arguing the benchmark is invalid (and doing so rather rudely). It would be ridiculous to say Haskell or Erlang can’t scale because some benchmark is I/O bound. It really blows my mind there has been such a negative response to this kind of article, and it sincerely makes me wonder if other criticisms of the Haskell and Erlang communities have more merit than I have given them in the past. I don’t plan on letting one stream of comments ruin my perspective, but it seems clear there is some fanaticism out there that doesn’t appeal to me.

      As a Python user, I’ve seen plenty of benchmarks where Python is slower. Even within Python web frameworks/tools/servers, my tool of choice has proven to be slower. While this might be the case, I’ve stuck with my decisions because the speed issues were easy to work around. I didn’t feel a need to comment regarding the incorrectness of the benchmark (unless it was to point out reasonable mistakes). I understood the benchmark was limited and in my use case, the benchmark didn’t apply. A great example of this is the Async vs. Threaded servers. Yes async servers can handle tons of connections. But they make easily using the standard library near impossible. If my application needs to support tons of connections per process and I can implement any other aspects I need, then sure, I’d go with (and have in the past) an async library. If it doesn’t matter, I’m perfectly fine using a simple Python server and the standard library to get things done.
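For contrast with an async loop, the “simple Python server and the standard library” option could look roughly like the sketch below. It is an illustration only, not code from the article: the class names `PongHandler` and `ThreadedPongServer` are made up here, and the reply bytes are taken from the article’s summary. Each connection gets its own thread, so ordinary blocking standard-library calls can be used inside `handle`.

```python
import socketserver

# The reply described in the article's summary.
RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!"


class PongHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler: read whatever the client sent, reply, done."""

    def handle(self):
        # Blocking calls are fine here because each client has its own thread.
        self.request.recv(4096)
        self.request.sendall(RESPONSE)


class ThreadedPongServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    """One thread per connection, using only the standard library."""

    allow_reuse_address = True
    daemon_threads = True


if __name__ == "__main__":
    # Port 8080 is an arbitrary choice for this sketch.
    with ThreadedPongServer(("127.0.0.1", 8080), PongHandler) as server:
        server.serve_forever()
```

The trade-off is the one described above: a thread per connection is simple and keeps the standard library usable, but it will not hold as many simultaneous connections per process as an event loop.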

      In short, there is really nothing to get overzealous about.

      Now, as for Emacs! That is something I’ll get zealous about ;)

  22. Jonas says:

    You’re funny.

    To show that erlang scales worse on multicore than python, you show that not creating a bunch of processes is faster than doing it! Then you type some absolutely incoherent stuff about python objects and their state as if anyone still doubted your knowledge level in these matters.

    Very insightful.

    And to top it off we get to see a raving “I’m never wrong” lunatic on the Internet. Cheers!

  23. HNreader says:

    I find nothing wrong with this benchmark. Macro benchmarks are made up of Micro benchmarks.

    What is a real-life “work-load” that will satisfy every critic here? There is none.

    My advice to the author: Don’t let these buffoons discourage you. Their rampant name-calling shows they don’t have any valid arguments.

  24. Lawrence says:

    This article tells you everything you need to know about Erlang and their piss poor community.

    Hype, hype, hype, and hype. Poor documentation. Poor debugger. What you have are self-promoting smucks trying to charge you 200 euros for consulting because they are a hand-full of people desperate enough to be an early adopter.

    The other 90% of the know-it-alls here have never used Erlang commercially. So fuck off wankers.

    • anonifun says:

      “What you have are self-promoting smucks trying to charge you 200 euros for consulting because they are a hand-full of people desperate enough to be an early adopter.”

      Wow. Just wow. You just don’t get it. “Desperate enough to be an early adopter?” You know, I just read a wall-full of ignorance and trolling and I didn’t care to respond, but this phrase I cannot in conscience bear silently.

      Did you never learn to play as a child? Do you program because that’s the only way you feed yourself? Did you ever try learning a totally exotic language or cooking food not native to your culture? Maybe people want to try programming in something new for a change? Perhaps some people even want to try making money while at it! O ye orthodox gods, the agony!

      Desperate enough to be an early adopter – ha! Who painstakingly switches to learning a completely different paradigm of programming than the norm when financially desperate? Does that even make sense? If you don’t like the implementation fix it! Write some docs, help test out the debugger. If you don’t like the community, get some friends and code in the language! Who told you that you deserve free production-quality software implementations handmade for your taste?
      Allow the rest of us to try and make our field a little bit more fun and less wretched code hauling.

      Also, a slightly technical point: What do you mean by ‘early adopter’? Erlang was open-sourced in ’98. It’s been about 11-12 years since. Do you normally wait more than 10 years before trying to code in a language? Do you mean an early adopter in your field of interest?

      The admin’s proved his point, namely that the main Erlang/Haskell implementations are not as good at what they’re hyped up to do by a bunch of fresh language go-ers. That’s fine, this happens with every language taken outside its first domain. I’d chide the admin for showing the obvious, but hey, there’s no point in this case.

  25. Cmoh says:

    Hey. I’ve run your benchmark, using your code for Erlang and for Python, on my own computer. After starting up the servers and running httperf a few times I ran into the errors related to system limits (ulimit, sysctl, FD_SET_SIZE etc.). Leaving the servers running, I restarted httperf after recompiling and setting all the ulimits and running the sysctl -w commands you recommended, but (my mistake) the ulimit was valid only for httperf!

    I didn’t manage to cap any of the servers with --rate=5000, 10000, 20000, 40000 or 80000 (with num-conn adjusted accordingly). All reported back in ~8 seconds with all requests serviced. So I turned directly to the numbers --num-conn=32000000 and --rate=400000.

    With this setup erlang responded to all requests in 76.8 seconds (yeah, I miscounted a zero), while httperf failed to perform the test with the python server (connection failed with error 99 – meaning EADDRNOTAVAIL, meaning all 32767 ports available for the client were in use). See more in the httperf man page.

    Next, looking at the Erlang results, I noticed httperf reported that Erlang maintained a maximum of 2046 simultaneous connections, which reminded me I had started the servers BEFORE setting the ulimit etc. I rebooted and started the servers using the 65535 ulimit settings, the sysctl settings, everything.

    After this change the Python server yielded error 99 at around --rate=15000, and the Erlang server at around --rate=6000, which is somewhat consistent with your own data.

    Before moving on, please note that the above are merely facts, not assumptions or opinions.

    Now my own conclusions:

    1. Throwing big numbers into system settings does not guarantee increased performance. Both the Python and Erlang servers were unable to get above 16000 requests per second with ulimit -n = 65535, while both were able to service 80000 requests per second with ulimit -n = 2048. I believe each of the systems can be maxed out by graphing the max requests per second vs. ulimit -n (and perhaps the other net.core.blah settings you mentioned).
    2. The benchmarked code involves a socket server replying with a predefined text (which simulates an HTTP reply) and closing the connection. The tuning I recommend in (1) should only be regarded as valid for systems of this complexity. As an example, a system opening a disk file on each request would need (perhaps) a file descriptor limit twice as big. Better would be to tune the actual system under expected usage patterns.
    3. Tuning servers is a time-consuming task, but worthwhile. It is now clear to me that a lot of the software technology on the market misuses the available hardware resources for lack of proper tuning.
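Since the mistake described above was that the raised ulimit applied only to httperf and not to the already-running servers, it can help to check the limit a process actually started with. The sketch below is one way to do that on Linux, assuming `/proc/<pid>/limits` is available; the helper name `open_file_limit` is hypothetical and not part of any library.

```python
import os
import resource


def open_file_limit(pid):
    """Hypothetical helper: return (soft, hard) "Max open files" for a process.

    Reads /proc/<pid>/limits (Linux-specific). A value of "unlimited" is
    reported as None. Returns None if the file cannot be read or parsed.
    """
    try:
        with open(f"/proc/{pid}/limits") as f:
            for line in f:
                if line.startswith("Max open files"):
                    # Columns: "Max open files", soft limit, hard limit, units
                    soft, hard = line.split()[3:5]
                    return tuple(
                        None if value == "unlimited" else int(value)
                        for value in (soft, hard)
                    )
    except (OSError, ValueError):
        return None
    return None


if __name__ == "__main__":
    # Limit of the current process, as set by `ulimit -n` in the launching shell.
    print("this process:", resource.getrlimit(resource.RLIMIT_NOFILE))
    # The same information read from /proc; pass a server's pid to inspect it instead.
    print("from /proc:  ", open_file_limit(os.getpid()))
```

Running this against the pid of a server started before `ulimit -n` was raised would show the old limit, which matches the 2046-connection ceiling reported above.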

    I do not agree with your conclusion, admin, that your benchmarks are a solid base for debunking the erlang hype. However, I learned a lot while reading this thread and performing my own tests.

    Thanks for the opportunity.

    P.S. Notes on my system:
    uname -a
    Linux pink 2.6.30-gentoo-r5 #9 SMP PREEMPT Sat Jun 19 17:29:50 EEST 2010 x86_64 Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz GenuineIntel GNU/Linux

    I have 8GB of RAM and hyperthreading enabled, leading to 8 logical processors.

    Erlang R13B04 (erts-5.7.5) [source] [64-bit] [smp:8:8] [rq:8] [async-threads:0] [hipe] [kernel-poll:false]

    (not sure why kernel-poll is set to false. It is still compiled in but…)

    Python 2.6.5 (release26-maint, Jun 19 2010, 03:49:47)
    [GCC 4.4.3] on linux2

    P.P.S on my system I see the default is:
    cat /proc/sys/fs/file-max
    752300

    Might be worth checking before echo-ing 128000 into that.
