In 2017, a paper called Energy Efficiency across Programming Languages was published in the proceedings of an ACM SIGPLAN conference.
The paper compares different programming languages on standardised algorithmic compute benchmarks and ranks them according to their energy efficiency.
One of the most loved and used languages today, Python, ranks very low in this paper, showing a 75x higher energy consumption than the low-level language C.
To this day the paper is cited in many media outlets, on LinkedIn, and even on TikTok! It certainly shook up many people by painting a doom-and-gloom scenario in which many digital products run on one of the most inefficient languages currently in use.
Many caveats apply to this claim, though.
However, now close to 7 years and many Python versions as well as implementations later, we thought it was due for a revisit of the paper.
Since the paper used Python 3.6, we will look at newer versions like Python 3.9 and Python 3.12, as well as Mojo, RustPython and PyPy 3.10, to see whether Python has advanced in the CPython reference implementation and whether different interpreters might ease the inefficiency a bit.
We used the same benchmarks as in the original paper: The Computer Language Benchmark Game.
Since the archived repository on GitLab contains multiple submitted variants, we resorted to using the benchmarks originally selected by the study authors in their GitHub repository.
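To give a feeling for what these workloads look like, below is a heavily shortened Python sketch of the binary-trees benchmark (allocation-heavy tree building plus a checksum pass). It is an illustration only, not the exact CLBG submission that was measured:

```python
# Minimal sketch of the CLBG binary-trees workload (illustration only,
# not the exact submission used in the study or in our measurements).
import sys

def make_tree(depth):
    # A node is a (left, right) tuple; leaves are (None, None).
    if depth == 0:
        return (None, None)
    return (make_tree(depth - 1), make_tree(depth - 1))

def check_tree(node):
    # Count the nodes as a simple checksum so the work cannot be optimized away.
    left, right = node
    if left is None:
        return 1
    return 1 + check_tree(left) + check_tree(right)

def main(n):
    max_depth = max(n, 6)

    # Stretch tree: allocate once, check, and throw away.
    print("stretch tree check:", check_tree(make_tree(max_depth + 1)))

    # Long-lived tree kept alive while many short-lived trees are built.
    long_lived = make_tree(max_depth)
    for depth in range(4, max_depth + 1, 2):
        iterations = 2 ** (max_depth - depth + 4)
        check = sum(check_tree(make_tree(depth)) for _ in range(iterations))
        print(iterations, "trees of depth", depth, "check:", check)

    print("long lived tree check:", check_tree(long_lived))

if __name__ == "__main__":
    main(int(sys.argv[1]) if len(sys.argv) > 1 else 10)
```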
The Green Metrics Tool makes it very easy to consume these benchmarks directly. We only added a container that ships the necessary Python version.
We just boot the container, execute the CLI command and let the Green Metrics Tool do its automated measurement magic. Find an example usage_scenario for Python 3.6 here.
During the run we mainly look at the CPU energy and the total machine energy. The original paper only looked at the CPU energy.
Pro Tip: If you do not know what the Green Metrics Tool is: it is our all-in-one open-source professional software benchmarking and optimization solution. Find more info here
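Side note on the metric: on Linux, a CPU energy reading typically comes from the Intel RAPL counters exposed via the powercap interface. Purely to give an intuition for what is being measured, here is a minimal sketch of wrapping a command with such a reading. The RAPL path, the required permissions and the benchmark filename are assumptions about the machine, and the Green Metrics Tool itself does considerably more (continuous sampling, multiple metric providers, machine-level energy):

```python
# Minimal sketch: CPU package energy around a workload via Linux powercap/RAPL.
# Illustration only -- path and permissions are machine-specific assumptions,
# and RAPL counter overflow is not handled here.
import subprocess
import time

RAPL_FILE = "/sys/class/powercap/intel-rapl:0/energy_uj"  # package 0 (assumption)

def read_energy_uj():
    with open(RAPL_FILE) as f:
        return int(f.read())

def measure(cmd):
    energy_before = read_energy_uj()
    time_before = time.time()
    subprocess.run(cmd, check=True)
    duration_s = time.time() - time_before
    energy_j = (read_energy_uj() - energy_before) / 1_000_000  # µJ -> J
    return energy_j, duration_s

if __name__ == "__main__":
    # "binarytrees.py" is a hypothetical local copy of the benchmark script.
    joules, seconds = measure(["python3", "binarytrees.py", "21"])
    print(f"CPU energy: {joules:.2f} J over {seconds:.2f} s")
```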
Language | Benchmark | vs. C [CPU Energy] | vs. C [Machine Energy] |
---|---|---|---|
Python 3.6 | binary-trees | 60x | 56x |
Python 3.6 | fannkuch-redux | 66x | 63x |
Python 3.6 | fasta | 34x | 38x |
Python 3.6 | TOTAL | 61x | 58x |
Language | Benchmark | vs. C [CPU Energy] | vs. C [Machine Energy] |
---|---|---|---|
Python 3.9 | binary-trees | 51x | 49x |
Python 3.9 | fannkuch-redux | 72x | 68x |
Python 3.9 | fasta | 30x | 33x |
Python 3.9 | TOTAL | 63x | 61x |
Source: Measurements, charts and details
Language | Benchmark | vs. C [CPU Energy] | vs. C [Machine Energy] |
---|---|---|---|
Python 3.12 | binary-trees | 33x | 33x |
Python 3.12 | fannkuch-redux | 57x | 54x |
Python 3.12 | fasta | 28x | 31x |
Python 3.12 | TOTAL | 50x | 48x |
Source: Measurements, charts and details
Language | Benchmark | vs. C [CPU Energy] | vs. C [Machine Energy] |
---|---|---|---|
PyPy 3.10 | binary-trees | 5x | 7x |
PyPy 3.10 | fannkuch-redux | 21x | 25x |
PyPy 3.10 | fasta | 22x | 18x |
PyPy 3.10 | TOTAL | 18x | 21x |
Language | Benchmark | vs. C [CPU Energy] | vs. C [Machine Energy] |
---|---|---|---|
Mojo | binary-trees | 51x | 48x |
Mojo | fannkuch-redux | 65x | 62x |
Mojo | fasta | 32x | 36x |
Mojo | TOTAL | 58x | 56x |
58x difference in CPU energy (mean), 56x difference in total machine energy (mean); the change compared to Python 3.6 is significant.
What stands out with these results is that we cannot exactly reproduce the 75x difference between Python and C. Our data only shows a 60x difference.
These are the Python variants ranked from best to worst:
Language | Overhead vs. C [Machine Energy] |
---|---|
PyPy 3.10 | 21x
Python 3.12 | 48x
Mojo | 56x
Python 3.6 | 58x
Python 3.9 | 61x
The reason for that is most likely that we use newer and different hardware. However, what should be expected is that we would at least see a similar offset for the individual tests, which is also not the case.
In the original paper the differences for the single tests comparing Python 3.6 with C are:
So not only are the values off, the tendency also swaps direction for the fasta test.
We have no explanation for that at the moment.
For the TOTAL value of all three tests combined there is at least some uncertainty about what exactly the authors accumulated: whether it is just the average of the ratios ((60 + 66 + 34) / 3) or the sum of the total energies and then the ratio (which is what the Green Metrics Tool does).
In any case, this would not explain the differences in the individual tests, so we did not investigate further here.
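To make the ambiguity concrete, here is a small sketch of the two aggregation methods. The per-benchmark energy values are made-up placeholders that merely reproduce our Python 3.6 CPU ratios (60x, 66x, 34x); they are not our measured Joule values:

```python
# Sketch: two ways to aggregate per-benchmark results into a TOTAL factor.
# The Joule values are made-up placeholders, NOT measurements -- they only
# reproduce the 60x / 66x / 34x ratios to illustrate the mechanics.
python_j = {"binary-trees": 600.0, "fannkuch-redux": 1320.0, "fasta": 170.0}
c_j      = {"binary-trees":  10.0, "fannkuch-redux":   20.0, "fasta":   5.0}

# Method 1: average of the per-benchmark ratios
mean_of_ratios = sum(python_j[b] / c_j[b] for b in python_j) / len(python_j)

# Method 2: ratio of the summed energies (what the Green Metrics Tool reports)
ratio_of_sums = sum(python_j.values()) / sum(c_j.values())

print(f"mean of ratios: {mean_of_ratios:.1f}x")  # (60 + 66 + 34) / 3 = 53.3x
print(f"ratio of sums:  {ratio_of_sums:.1f}x")   # 2090 / 35 = 59.7x
```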
What we can see though is that Python definitely gained efficiency from Python 3.6 to Python 3.12 (with a surprising bump for Python 3.9 :) )
Coming back to our initial research question, we can attest that using Python today is around ~18% more efficient. That being said, you are probably still at least around 48x worse than C on a plain compute job :)
Moving to a different interpreter like PyPy, though, brings a strong improvement overall and is more than 50% more efficient than Python 3.12. Which means it is only around 21x worse than C … in selective cases even down to 7x, which is pretty strong!
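For transparency, here is how those percentages follow from the TOTAL overhead factors in the tables above (the ~18% figure lines up with the CPU energy column; the machine energy column tells a similar story):

```python
# Sketch: deriving the headline percentages from the TOTAL "vs. C" factors above.
totals_cpu     = {"Python 3.6": 61, "Python 3.12": 50, "PyPy 3.10": 18}
totals_machine = {"Python 3.6": 58, "Python 3.12": 48, "PyPy 3.10": 21}

def improvement(factors, old, new):
    # Relative reduction of the overhead factor when moving from `old` to `new`.
    return (factors[old] - factors[new]) / factors[old]

for label, t in (("CPU energy", totals_cpu), ("machine energy", totals_machine)):
    print(f"{label}: "
          f"3.6 -> 3.12 improves {improvement(t, 'Python 3.6', 'Python 3.12'):.0%}, "
          f"3.12 -> PyPy 3.10 improves {improvement(t, 'Python 3.12', 'PyPy 3.10'):.0%}")
```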
Mojo showed no relevant improvements, which is mostly due to the fact that it cannot natively enhance Python code at the time of writing.
It will just wrap the Python code, import it as a module and then run it with the native Python interpreter (libpython). See our discussion on their GitHub.
In this case study we have looked at Python and how it compares to C in a nostalgic look back on the original test setup from Greenlab and their paper Energy Efficiency across Programming Languages.
We have seen that Python has improved quite a bit (~18%) and that different interpreters like PyPy can remedy the slowness problem of the language quite substantially.
Some numbers were hard to compare and their actual offset remains unknown. However, this should not reduce the validity of the findings in this case study.
The general setup and the absolute claims about whether Python is really 48x or 75x worse than C are also heavily debated on Reddit, where the authors' choice to just use the fastest benchmark they could find in the repository of The Computer Language Benchmark Game is criticized. People argue that using similarly written implementations would be more representative.
What exactly that means we will leave up to you curious researchers. We encourage you to ping us if this article sparked your interest, if you would like to ask questions, if you spotted a flaw, or if you even want to reproduce our measurements. It could be a nice opportunity to give our free open-source Green Metrics Tool a test drive ;)
Contact us about this article at [email protected]