The task: Write a portable tool that takes a file name as its first argument and outputs, for that file, the number of lines, the longest line, the length of the longest line, and the line number of the longest line, in a binary-safe fashion (meaning that \n is the only thing that delimits a line; embedded \0s may occur).
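To make the requirements concrete, here is a minimal Python 3 sketch of the task. It is not one of the benchmarked programs, and the names `find_longest_line` and `main` are made up for illustration. The task statement leaves a few details open, so the sketch assumes that the trailing \n does not count toward a line's length, that line numbers start at 1, that the first of several equally long lines wins, and that a final line without a trailing \n still counts as a line.

```python
import sys


def find_longest_line(path):
    """Return (line_count, longest_line, longest_length, longest_line_number).

    Hypothetical helper, not taken from any of the benchmarked programs.
    """
    line_count = 0
    longest = b""
    longest_no = 0  # stays 0 only for an empty file
    # Binary mode: iteration splits on b"\n" only, so b"\r" and b"\0"
    # are treated as ordinary bytes inside a line.
    with open(path, "rb") as f:
        for line in f:
            line_count += 1
            if line.endswith(b"\n"):
                line = line[:-1]  # assumption: the delimiter is not part of the line
            # Strict '>' keeps the first of several equally long lines.
            if longest_no == 0 or len(line) > len(longest):
                longest = line
                longest_no = line_count
    return line_count, longest, len(longest), longest_no


def main(argv):
    if len(argv) != 2:
        sys.stderr.write("usage: %s FILE\n" % argv[0])
        return 1
    count, line, length, number = find_longest_line(argv[1])
    print("lines:", count)
    print("longest line number:", number)
    print("longest line length:", length)
    sys.stdout.flush()
    # Write the raw bytes so embedded \0s reach the output unchanged.
    sys.stdout.buffer.write(line + b"\n")
    return 0
```

To use it as a command-line tool, the script would end with `sys.exit(main(sys.argv))`. The whole line is held in memory, which is simple but may be costly for files with very long lines.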

My sinister side goal is to show how complex the C code has to be in order to get performance near that of C++. It is possible to write a simpler version in C that is 20% faster for /usr/share/dict/words, but it is then 4x slower for the random files. It would also be possible to write a faster non-portable version using memory mapping, but I can’t really call that simple. C++ has both readability and speed. If I missed some obvious portable optimization in the C code, let me know…

The contestants are:

| Source | Version / Flags | Source Lines | Source Bytes |
|---|---|---|---|
| find-longest-line.c | gcc 4.3.2 -std=c99 -O3 | 77 lines | 2410 bytes |
| find-longest-line.cpp | g++ 4.3.2 -O3 | 33 lines | 964 bytes |
| find-longest-line.php | PHP 5.2.9 | 34 lines | 751 bytes |
| find-longest-line.pl | Perl 5.10.0 | 31 lines | 658 bytes |
| find-longest-line.py | Python 2.5.2 (provided by Kniht / Roger Pate) | 22 lines | 515 bytes |
| find-longest-line.cs | Mono 2.2 (provided by Windcape / Claus Jørgensen) | 45 lines | 1155 bytes |

