C, C++, Code, Java, Perl, PHP, Python | 17 Mar 2010 21:13:56

Language Comparison: Find Longest Line

The task: write a portable tool that takes a file name as its first argument and outputs, for that file, the number of lines, the longest line, the length of the longest line, and the line number of the longest line. It must be binary-safe: \n is the only line delimiter, and embedded \0 bytes may occur.

My sinister side goal is to show how complex the C code has to be to get performance close to C++. A simpler C version is possible that is 20% faster on /usr/share/dict/words, but it is then 4x slower on the random files. A faster non-portable version using memory mapping would also be possible, but I can't really call that simple. C++ has both readability and speed. If I missed some obvious portable optimization in the C code, let me know.

The contestants are C, C++, PHP, Perl, Python, and C# (C# appears only in the /usr/share/dict/words run).

  • Callgrind Ticks are captured via valgrind --tool=callgrind --compress-strings=no --compress-pos=no --collect-jumps=yes --collect-systime=yes.
  • The number of allocations and bytes allocated are measured via valgrind.
  • Random files are made with dd if=/dev/urandom of=randomfile bs=10M count=10.

Running the tools on /usr/share/dict/words yields:

| Tool | Run Times (s) | Callgrind Ticks | Allocations | Bytes Allocated |
|---|---|---|---|---|
| C | 0.040 user, 0.005 sys | 186545376 | 9 | 808 |
| C++ | 0.045 user, 0.003 sys | 174583115 | 10 | 9140 |
| PHP | 0.394 user, 0.014 sys | 1862715779 | 26616 | 3577469 |
| Perl | 0.188 user, 0.008 sys | 975388038 | 5478 | 596786 |
| Python | 0.259 user, 0.006 sys | 1154764602 | 2780 | 7501695 |
| C# | 0.173 user, 0.012 sys | 8556327874 | 11640 | 20428039 |

Running the tools on a pre-generated 100 MB random file yields:

| Tool | Run Times (s) | Callgrind Ticks | Allocations | Bytes Allocated |
|---|---|---|---|---|
| C | 0.095 user, 0.046 sys | 381132062 | 21 | 16936 |
| C++ | 0.092 user, 0.049 sys | 366872893 | 16 | 35460 |
| PHP | 0.412 user, 0.055 sys | 1830647006 | 26617 | 3577494 |
| Perl | 0.352 user, 0.044 sys | 1861345814 | 5498 | 662696 |
| Python | 0.312 user, 0.057 sys | 1287075330 | 188052 | 197365544 |

Running the tools on a pre-generated 500 MB random file yields:

| Tool | Run Times (s) | Callgrind Ticks | Allocations | Bytes Allocated |
|---|---|---|---|---|
| C | 0.469 user, 0.247 sys | 1904131492 | 21 | 16936 |
| C++ | 0.467 user, 0.252 sys | 1827557566 | 14 | 29675 |
| PHP | 2.084 user, 0.184 sys | 9024389200 | 26617 | 3577494 |
| Perl | 1.623 user, 0.271 sys | 9262805274 | 5499 | 684029 |
| Python | 1.605 user, 0.246 sys | 6357906786 | 932498 | 976961973 |

2 Responses to “Language Comparison: Find Longest Line”

  1. dominik said on 18 Mar 2010 at 10:03:37:

    Would be nice to include Java in the test as well.
    Thx for publishing such interesting tests!

  2. Tino Didriksen said on 25 Apr 2010 at 11:08:37:

    Well, provide some Java source and I'll add it. The same goes for any other language that Fedora 10 has a package for.
