Energy Efficiency of programming languages - Revisiting Python in 2024
- by Arne Tarara

In 2017, a paper called Energy Efficiency across Programming Languages was published in the proceedings of a 2017 ACM SIGPLAN conference.

The paper compares different programming languages on standardised algorithmic compute benchmarks and ranks them according to their energy efficiency.

One of the most loved and used languages today, Python, ranks very low in this paper, with a 75x higher energy consumption than the low-level language C.

Figure: Python vs. C - Energy consumption (relative)

To this day the paper is cited by many media outlets, on LinkedIn, and even on TikTok! It certainly shook up many people by painting a doom-and-gloom scenario in which many digital products are built on one of the most inefficient languages currently in use.

However, this claim comes with many caveats, such as:

  • Most Python code runs its compute-intensive work through C libraries and not through Python directly
  • Real programs seem to differ by more like 4x, because a program is typically not compute-only but also does a lot of I/O, syscalls etc. [1].

However, now close to 7 years and many Python versions and implementations later, we thought the paper was due for a revisit.

Since Python 3.6 was used in the paper, we will look at newer versions like Python 3.9 and Python 3.12, as well as Mojo, RustPython and PyPy 3.10, to see whether Python has advanced in the CPython reference implementation or whether different interpreters might ease the inefficiency a bit.

Agenda
What do we want to find out?

Research question
How has the energy efficiency changed with Python 3.9 and Python 3.12, and how do PyPy and Mojo compare to C?

Benchmarking with the Green Metrics Tool

We used the same benchmarks as in the original paper: The Computer Language Benchmarks Game.

Since the archived repository on GitLab contains multiple submitted variants, we resorted to using the benchmarks originally selected by the study authors in their GitHub repository.

The Green Metrics Tool makes it very easy to consume these benchmarks directly. We only added a container with the required Python version.

We just boot the container, execute the CLI command and let the Green Metrics Tool do its automated measurement magic. Find an example usage_scenario for Python 3.6 here.

During the run we are mainly looking at the CPU energy and the total machine energy. In the original paper they only looked at the CPU energy.

Pro Tip: If you do not know what the Green Metrics Tool is: It is our all-in-one open source professional software benchmarking and optimization solution. Find more information here.

Results

Language | Benchmark | vs. C [CPU Energy] | vs. C [Machine Energy]
Python 3.6 | binary-trees | 60x | 56x
Python 3.6 | fannkuch-redux | 66x | 63x
Python 3.6 | fasta | 34x | 38x
Python 3.6 | TOTAL | 61x | 58x
Source: Measurements, charts and details


Language | Benchmark | vs. C [CPU Energy] | vs. C [Machine Energy]
Python 3.9 | binary-trees | 51x | 49x
Python 3.9 | fannkuch-redux | 72x | 68x
Python 3.9 | fasta | 30x | 33x
Python 3.9 | TOTAL | 63x | 61x

Source: Measurements, charts and details


Language | Benchmark | vs. C [CPU Energy] | vs. C [Machine Energy]
Python 3.12 | binary-trees | 33x | 33x
Python 3.12 | fannkuch-redux | 57x | 54x
Python 3.12 | fasta | 28x | 31x
Python 3.12 | TOTAL | 50x | 48x

Source: Measurements, charts and details


Language | Benchmark | vs. C [CPU Energy] | vs. C [Machine Energy]
PyPy 3.10 | binary-trees | 5x | 7x
PyPy 3.10 | fannkuch-redux | 21x | 25x
PyPy 3.10 | fasta | 22x | 18x
PyPy 3.10 | TOTAL | 18x | 21x

Language | Benchmark | vs. C [CPU Energy] | vs. C [Machine Energy]
Mojo | binary-trees | 51x | 48x
Mojo | fannkuch-redux | 65x | 62x
Mojo | fasta | 32x | 36x
Mojo | TOTAL | 58x | 56x

Discussion

What stands out in these results is that we cannot exactly reproduce the 75x difference between Python and C. Our data shows only a roughly 60x difference for Python 3.6 (61x for CPU energy, 58x for machine energy).

These are the Python versions and implementations ranked from most to least efficient:

Language | Overhead vs. C [Machine Energy]
PyPy 3.10 | 21x
Python 3.12 | 48x
Mojo | 56x
Python 3.6 | 58x
Python 3.9 | 61x

The most likely reason for the gap to the paper's 75x is that we use newer and different hardware. However, we would expect at least a similar offset for the individual tests, which is also not the case.

In the original paper, the differences for the individual tests comparing Python 3.6 with C are:

  • binary-trees: 45x (CPU Energy) => 25% less than our measurement
  • fannkuch-redux: 59x (CPU Energy) => 11% less than our measurement
  • fasta: 38x (CPU Energy) => 11% more than our measurement

So not only are the values off, the deviation also changes direction for the fasta test.
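The deviations listed above can be re-derived from the rounded ratios in the tables. The sketch below assumes the deviation is taken relative to our own measurement. Because it works with rounded inputs, a result can differ by about a percentage point from the figures quoted above. The helper name `relative_deviation` is ours and purely illustrative.

```python
# Ratios vs. C for Python 3.6 (CPU energy), rounded as in the text above.
ours = {"binary-trees": 60, "fannkuch-redux": 66, "fasta": 34}
paper = {"binary-trees": 45, "fannkuch-redux": 59, "fasta": 38}


def relative_deviation(paper_ratio, our_ratio):
    """Deviation of the paper's ratio from ours, in percent of our value.

    Assumption: our measurement is the baseline for the percentage.
    """
    return (paper_ratio - our_ratio) / our_ratio * 100


for name in ours:
    deviation = relative_deviation(paper[name], ours[name])
    direction = "less" if deviation < 0 else "more"
    print(f"{name}: paper reports {abs(deviation):.0f}% {direction} than we measured")
```

A negative deviation means the paper reports a smaller Python overhead than we measured; only fasta comes out positive.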

We have no explanation for that at the moment.

For the TOTAL value of all three tests combined, it is at least unclear what exactly the authors accumulated: either the average of the ratios, (60+66+34)/3, or the ratio of the summed total energies (which is what the Green Metrics Tool does).
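The two aggregation methods can produce noticeably different totals. In the minimal sketch below, the per-benchmark ratios are our Python 3.6 CPU values, while the absolute energies in joules are purely hypothetical numbers chosen only to illustrate the effect. They are not measurements.

```python
# Per-benchmark ratios vs. C (Python 3.6, CPU energy) from our results.
ratios = {"binary-trees": 60, "fannkuch-redux": 66, "fasta": 34}

# HYPOTHETICAL absolute energies of the C runs in joules (not measured values).
c_energy = {"binary-trees": 10.0, "fannkuch-redux": 40.0, "fasta": 5.0}

# Python energies that are consistent with the ratios above.
python_energy = {name: c_energy[name] * ratios[name] for name in ratios}

# Method 1: average the three ratios.
mean_of_ratios = sum(ratios.values()) / len(ratios)

# Method 2: sum all energies first, then take one ratio.
ratio_of_sums = sum(python_energy.values()) / sum(c_energy.values())

print(f"mean of ratios: {mean_of_ratios:.1f}x")
print(f"ratio of sums:  {ratio_of_sums:.1f}x")
```

With these made-up energies, the ratio of sums is pulled towards the benchmark that consumes the most energy in absolute terms, which is why the two totals differ even though the per-benchmark ratios are identical.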

In any case, it would not explain the differences in the individual tests, so we did not investigate any further.

What we can see, though, is that Python definitely became more efficient from Python 3.6 to Python 3.12 (with a surprising bump for Python 3.9 :) )

Coming back to our initial research question, we can attest that using Python today is around 18% more efficient than with Python 3.6. That being said, you are probably still around 48x worse than C on a plain compute job :)

Moving to a different interpreter like PyPy, though, makes a strong improvement: overall it is more than 50% more efficient than Python 3.12. This means it is only around 21x worse than C … and in selected cases as low as 7x, which is pretty strong!
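The percentages quoted here follow from the TOTAL rows of the result tables. A short sketch, assuming that "more efficient" means the relative reduction in energy for the same work (the helper name `energy_reduction` is ours):

```python
def energy_reduction(old_ratio, new_ratio):
    """Relative energy saving in percent when the overhead vs. C drops
    from old_ratio to new_ratio."""
    return (1 - new_ratio / old_ratio) * 100


# TOTAL overheads vs. C taken from the results (rounded values).
print(f"Python 3.6 -> Python 3.12 (CPU energy): {energy_reduction(61, 50):.0f}%")
print(f"Python 3.12 -> PyPy 3.10 (machine energy): {energy_reduction(48, 21):.0f}%")
```

The first line corresponds to the roughly 18% improvement of CPython, the second to the "more than 50%" gain from switching to PyPy.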

Mojo showed no relevant improvement, mostly because it cannot natively enhance Python code at the time of writing. It just wraps the Python code, imports it as a module and then runs it with the native Python interpreter (libpython). See our discussion on their GitHub.

Summary and further considerations

In this case study we have looked at Python and how it compares to C, in a nostalgic look back at the original test setup from Greenlab and their paper Energy Efficiency across Programming Languages.

We have seen that Python has improved noticeably (~18%) and that different interpreters such as PyPy can remedy much of the language's slowness.

Some numbers were hard to compare and their actual offset remains unknown. However, this should not reduce the validity of the findings in this case study.

The general setup and the absolute claims about whether Python is really 48x or 75x worse are also heavily debated on Reddit, where the authors' choice to just use the fastest benchmark they could find in the repository of The Computer Language Benchmarks Game is criticized. People argue that using a similar implementation in each language would be more representative.

What exactly that means we will leave up to you curious researchers. We encourage you to ping us if this article sparked your interest, if you would like to ask questions, if you spotted a flaw, or even if you want to reproduce our measurements. It could be a nice opportunity to give our free open source Green Metrics Tool a test drive ;)

Contact us on this article at [email protected]

Sources

[1] Microsoft research on energy consumption of UI apps