Posted on 17 Mar 2010 at 21:13:56 in C, C++, Code, Java, Perl, PHP, Python
Language Comparison: Find Longest Line
The task: Write a portable tool that takes a file name as its first argument and outputs, for that file, the number of lines, the longest line, the length of the longest line, and the line number of the longest line, in a binary-safe fashion (meaning \n is the only thing that delimits a line; embedded \0 bytes may occur).
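To make the task concrete, here is a minimal Python sketch of it. It is not one of the benchmarked sources. The name `longest_line`, the output format, excluding the trailing \n from the length, and reporting the first line on a tie are all assumptions, since the task statement does not pin them down.

```python
import sys


def longest_line(path):
    """Return (line_count, longest_line, its_length, its_line_number).

    Assumptions (not from the task statement): the trailing newline is not
    counted in the length, and the first of several equally long lines wins.
    """
    count = 0
    best, best_len, best_no = b"", -1, 0
    # Binary mode: Python splits on b"\n" only, so \r and \0 stay in the line.
    with open(path, "rb") as fh:
        for line in fh:
            count += 1
            if line.endswith(b"\n"):
                line = line[:-1]
            if len(line) > best_len:
                best, best_len, best_no = line, len(line), count
    return count, best, max(best_len, 0), best_no


if __name__ == "__main__" and len(sys.argv) > 1:
    total, line, length, number = longest_line(sys.argv[1])
    print(f"lines: {total}, longest: line {number}, length: {length}", flush=True)
    # Write the raw bytes so embedded \0s survive on output.
    sys.stdout.buffer.write(line + b"\n")
```

An empty file yields zero for the count, length, and line number in this sketch; the task statement does not say what should happen there.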
My sinister side goal is to show how complex the C code has to be in order to get performance near C++. It is possible to write a simpler version in C that is 20% faster for /usr/share/dict/words, but it is then 4x slower for the random files. It would also be possible to write a faster non-portable version using memory mapping, but I can’t really call that simple. C++ has both readability and speed. If I missed some obvious portable optimization in the C code, let me know…
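For readers unfamiliar with the memory-mapping idea, here is a rough Python sketch of it. This is not the C variant alluded to above, and it says nothing about that variant's speed. Python's `mmap` module hides the platform differences that make the approach non-portable in C. The name `longest_line_mmap` and the tie-breaking and newline-handling choices are assumptions.

```python
import mmap


def longest_line_mmap(path):
    """Scan a memory-mapped file for its longest line.

    Returns (line_count, longest_line, its_length, its_line_number).
    Assumptions: the trailing newline is not counted, first longest line wins.
    """
    with open(path, "rb") as fh:
        try:
            mm = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            # An empty file cannot be mapped; treat it as having no lines.
            return 0, b"", 0, 0
        with mm:
            size = len(mm)
            count, start = 0, 0
            best_start, best_len, best_no = 0, -1, 0
            while start < size:
                end = mm.find(b"\n", start)
                if end == -1:          # last line has no trailing newline
                    end = size
                count += 1
                if end - start > best_len:
                    best_start, best_len, best_no = start, end - start, count
                start = end + 1
            # Copy only the winning line out of the mapping.
            return count, mm[best_start:best_start + best_len], best_len, best_no
```

Only offsets are tracked during the scan, so no per-line copies are made until the single winning line is sliced out at the end.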
The contestants, as listed in the result tables below, are C, C++, PHP, Perl, Python, and C# (C# appears only in the /usr/share/dict/words run).
- Callgrind ticks are captured via `valgrind --tool=callgrind --compress-strings=no --compress-pos=no --collect-jumps=yes --collect-systime=yes`.
- Number of allocations and bytes allocated are captured via `valgrind`.
- Random files are made with `dd if=/dev/urandom of=randomfile bs=10M count=10`.
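Where `dd` or /dev/urandom is not available, a comparable random file could be produced with a short Python sketch like the one below. The function name `make_random_file` and its parameters are hypothetical. The defaults mirror the `bs=10M count=10` of the command above, that is, ten blocks of 10 × 1024 × 1024 bytes.

```python
import os


def make_random_file(path, block_size=10 * 1024 * 1024, count=10):
    """Write `count` blocks of `block_size` random bytes to `path`.

    os.urandom draws from the operating system's random source, so the
    result contains arbitrary bytes, including \n and \0.
    """
    with open(path, "wb") as fh:
        for _ in range(count):
            fh.write(os.urandom(block_size))


if __name__ == "__main__":
    make_random_file("randomfile")
```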
Running the tools on /usr/share/dict/words yields:
Tool | Run Times | Callgrind Ticks | Allocations | Bytes Allocated |
---|---|---|---|---|
C | 0.040 user, 0.005 sys | 186545376 ticks | 9 allocs | 808 bytes |
C++ | 0.045 user, 0.003 sys | 174583115 ticks | 10 allocs | 9140 bytes |
PHP | 0.394 user, 0.014 sys | 1862715779 ticks | 26616 allocs | 3577469 bytes |
Perl | 0.188 user, 0.008 sys | 975388038 ticks | 5478 allocs | 596786 bytes |
Python | 0.259 user, 0.006 sys | 1154764602 ticks | 2780 allocs | 7501695 bytes |
C# | 0.173 user, 0.012 sys | 8556327874 ticks | 11640 allocs | 20428039 bytes |
Running the tools on a pre-generated 100 MB random file yields:
Tool | Run Times | Callgrind Ticks | Allocations | Bytes Allocated |
---|---|---|---|---|
C | 0.095 user, 0.046 sys | 381132062 ticks | 21 allocs | 16936 bytes |
C++ | 0.092 user, 0.049 sys | 366872893 ticks | 16 allocs | 35460 bytes |
PHP | 0.412 user, 0.055 sys | 1830647006 ticks | 26617 allocs | 3577494 bytes |
Perl | 0.352 user, 0.044 sys | 1861345814 ticks | 5498 allocs | 662696 bytes |
Python | 0.312 user, 0.057 sys | 1287075330 ticks | 188052 allocs | 197365544 bytes |
Running the tools on a pre-generated 500 MB random file yields:
Tool | Run Times | Callgrind Ticks | Allocations | Bytes Allocated |
---|---|---|---|---|
C | 0.469 user, 0.247 sys | 1904131492 ticks | 21 allocs | 16936 bytes |
C++ | 0.467 user, 0.252 sys | 1827557566 ticks | 14 allocs | 29675 bytes |
PHP | 2.084 user, 0.184 sys | 9024389200 ticks | 26617 allocs | 3577494 bytes |
Perl | 1.623 user, 0.271 sys | 9262805274 ticks | 5499 allocs | 684029 bytes |
Python | 1.605 user, 0.246 sys | 6357906786 ticks | 932498 allocs | 976961973 bytes |
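
To read the 500 MB table as relative slowdowns, one option is to add user and sys time per tool and divide by a baseline. The sketch below does that with the figures copied from the table. The helper name `relative_to` and the choice of C++ as the baseline are assumptions made for illustration.

```python
# (user, sys) run times copied from the 500 MB table above.
times_500mb = {
    "C": (0.469, 0.247),
    "C++": (0.467, 0.252),
    "PHP": (2.084, 0.184),
    "Perl": (1.623, 0.271),
    "Python": (1.605, 0.246),
}


def relative_to(baseline, times):
    """Return each tool's user+sys total divided by the baseline's total."""
    base = sum(times[baseline])
    return {tool: sum(pair) / base for tool, pair in times.items()}


if __name__ == "__main__":
    ratios = relative_to("C++", times_500mb)
    for tool, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{tool:7s} {ratio:.2f}x")
```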
on 18 Mar 2010 at 10:03:37 1.dominik said …
would be nice to include java in the test as well.
thx for publishing such interesting tests!
on 25 Apr 2010 at 11:08:37 2.Tino Didriksen said …
Well, provide some Java source and I’ll add it…same goes for any other language that Fedora 10 has a package for.