I got very little speed-up on pypy from being in a function (0.043 vs 0.033), but I got a 1.8X speed-up for py2 vs py3 (0.7686/0.4322 = 1.78). (All programs emit 4999874750 as expected.)
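For reference, here is a minimal sketch of the kind of loop being timed (my reconstruction, not the exact program from upthread):

```python
# Reconstruction of the benchmark: sum of min(i, 500) over i = 0 .. 9_999_999.
# Wrapping it in a function lets CPython use fast local-variable opcodes.
def do_it():
    i = 10_000_000
    r = 0
    while i > 0:
        i -= 1
        r += min(i, 500)
    return r

print(do_it())  # 4999874750, matching the figure above
```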
I recall that when Python 3 was first going through its growing pains in the mid noughties, there were promises of eventually clawing back performance. This clawback now seems to have been a fantasy (or perhaps they have just piled on so many new features that they clawed back and then regressed?).
Anyway, Nim is even faster than PyPy and uses less memory than either version of CPython:
doIt.nim:
proc doIt() =
  var i = 10000000
  var r = 0
  while i > 0:
    i = i - 1
    r += min(i, 500)
  echo r
doIt()
#TM 0.024735 wall 0.024743 usr 0.000000 sys 100.0% 1.504 mxRM
In terms of "how Python-like is Nim", all I did was change `def` to `proc` and add the 2 `var`s and change `print` -> `echo`. { EDIT: though if you love Py print(), there is always https://github.com/c-blake/cligen/blob/master/cligen/print.n... or some other roll-your-own idea. Then instead of py2/py3 print x,y vs print(x,y) you can actually do either one in Nim since its call syntax is so flexible. }
It is perhaps noteworthy that, if the values are realized in some kind of array rather than generated by a loop, modern CPUs can do this kind of calculation well with SIMD, and compilers like gcc can even recognize the constructs and auto-vectorize. Of course, needing to load data costs memory bandwidth, which may not be great compared to SIMD instruction throughput at scales past 80 MiB as in this problem.
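Assuming NumPy is available, the array-realized version alluded to above might look like this (the array is ~80 MB at 8 bytes per element, which is where the memory-bandwidth caveat bites):

```python
import numpy as np

# Realize all the values in an array, then let vectorized minimum + sum
# run over it; NumPy's inner loops are the SIMD-friendly kind gcc would
# also auto-vectorize in the equivalent C.
i = np.arange(10_000_000, dtype=np.int64)
r = np.minimum(i, 500).sum()
print(r)  # 4999874750
```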
> very little speed-up on pypy from being in a function
I believe PyPy is clever and checks for new or removed elements in module and builtin scope. If they haven't changed then lookup can re-use previously resolved information.
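For contrast, in CPython the function-scope win mostly comes down to which lookup opcode the compiler emits; a quick (hypothetical) way to see the difference with the `dis` module:

```python
import dis

def module_style():
    return min(3, 500)  # `min` resolved via LOAD_GLOBAL (globals then builtins)

def local_style(min=min):  # bind the builtin once, at def time
    return min(3, 500)     # now resolved via LOAD_FAST, a simple array index

# Inspect the bytecode to see the differing lookup opcodes.
print(dis.Bytecode(module_style).dis())
print(dis.Bytecode(local_style).dis())
```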
> clawing back performance
On my laptop, appropriately instrumented, I see Python 3.12 is faster than 2.7. (Note that they use different versions of clang):
Good data. Thanks. I got a bit of a speed-up on python-3.12.5, 0.599 down from 0.7686, but still slower than 2.7 on the same (old) i7-6700k CPU. { EDIT: All versions on both CPUs compiled on gcc-13 -O3, FWIW. }
On a different CPU i7-1370P AlderLake P-core (via Linux taskset) I got a 1.5X time ratio (py3.12.5 being 0.371 and py2.7 0.245). Anyway, I should have qualified that it's likely the kind of thing that varies across CPUs. Maybe you are on AMD. And no, I sure don't have access to the CPUs I had back in 2007 anymore. :-) So, there is no danger of this being a very scientific perf shade toss. ;-)
Same pypy3 also showed very minor speed-up for being inside a function. So, I think you are probably right about PyPy's optimization there and maybe it does just vary across PyPy versions. Not sure how @mg several levels up got his big 2X boost, but down at the 10s of usec to 10s of ms scale, just fluctuations in OS scheduler or P-core/E-core kinds of things can create confusion which is part of the motivation for aforementioned
https://github.com/c-blake/bu/blob/main/doc/tim.md.
I should perhaps also have mentioned https://github.com/yglukhov/nimpy as a way to write extensions in Nim. Kind of like Cython/SWIG-style tools, but for Nim.
EDIT: This uses a Nim program https://github.com/c-blake/bu/blob/main/doc/ru.md run as `ru -t`, but for the very fast variants you can get a more precise wall time from https://github.com/c-blake/bu/blob/main/doc/tim.md