Energy Efficiency of Programming Languages - Revisiting Python in 2024
- by Arne Tarara

In 2017, the paper "Energy Efficiency across Programming Languages" was published in the Proceedings of the 2017 ACM SIGPLAN International Conference on Software Language Engineering (SLE).

The paper compares different programming languages on standardised algorithmic compute benchmarks and ranks them by their energy efficiency.

One of the most loved and most used languages today, Python, ranks very low in this paper, with 75x the energy consumption of the low-level language C.

Figure: Python vs. C - Energy consumption (relative)

To this day, the paper is cited by many media outlets, on LinkedIn, and even on TikTok! It certainly shook up many people by painting a doom-and-gloom scenario in which many digital products are built on one of the most inefficient languages currently in use.

However, this claim comes with several caveats:

  • Most Python code performs compute-intensive work through C libraries rather than in Python directly.
  • In real-world programs the difference appears to be closer to 4x, because such programs are not purely compute-bound but also spend much of their time on I/O, syscalls, etc. [1]
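To illustrate the first caveat, here is a minimal sketch that computes the same sum once with an explicit Python loop and once with the built-in `sum()`, whose loop runs inside CPython's C code. Timings will vary with hardware and Python version, so no specific speedup is claimed here; libraries such as NumPy apply the same principle to much larger numeric workloads.

```python
import timeit


def pure_python_sum(n: int) -> int:
    """Sum 0..n-1 with an explicit loop: every addition is executed by the interpreter."""
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_sum(n: int) -> int:
    """Same result, but the loop runs inside CPython's C implementation of sum()."""
    return sum(range(n))


if __name__ == "__main__":
    n = 1_000_000
    assert pure_python_sum(n) == builtin_sum(n)
    for fn in (pure_python_sum, builtin_sum):
        # Time 5 calls of each variant; absolute numbers depend on the machine.
        seconds = timeit.timeit(lambda: fn(n), number=5)
        print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```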

Still, close to 7 years and many Python versions and implementations later, we felt a revisit of the paper was due.

Since the paper used Python 3.6, we look at newer versions such as Python 3.9 and Python 3.12, as well as Mojo, RustPython and PyPy 3.10, to find out whether Python has advanced in the CPython reference implementation or whether different interpreters can ease the inefficiency.

Agenda
What do we want to find out?

Research question
How has energy efficiency changed with Python 3.9 and Python 3.12, and how do PyPy and Mojo compare to C?

Benchmarking with the Green Metrics Tool

We used the same benchmarks as the original paper, taken from The Computer Language Benchmarks Game. In this article we look at binary-trees, fannkuch-redux and fasta.

Since the archived repository on GitLab contains multiple submitted variants per benchmark, we used the implementations originally selected by the study authors, as published in their GitHub repository.
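To give an idea of what these benchmarks exercise, below is a heavily simplified sketch in the style of the Python binary-trees programs from the Benchmarks Game. It is not the exact program selected by the study authors (the real benchmark also keeps a long-lived tree and reads its depth from the command line), and the `run` helper with its default depths is a hypothetical illustration. The workload is pure allocation and recursion with almost no I/O, which is exactly the "plain compute" case the paper measures.

```python
def make_tree(depth: int) -> tuple:
    """Build a perfect binary tree as nested tuples; a leaf is (None, None)."""
    if depth == 0:
        return (None, None)
    depth -= 1
    return (make_tree(depth), make_tree(depth))


def check(tree: tuple) -> int:
    """Walk the tree recursively and count its nodes."""
    left, right = tree
    if left is None:
        return 1
    return 1 + check(left) + check(right)


def run(min_depth: int = 4, max_depth: int = 10) -> None:
    """Allocate and walk many short-lived trees; deeper trees are built fewer times."""
    for depth in range(min_depth, max_depth + 1, 2):
        iterations = 2 ** (max_depth - depth + min_depth)
        total = sum(check(make_tree(depth)) for _ in range(iterations))
        print(f"{iterations}\t trees of depth {depth}\t check: {total}")


if __name__ == "__main__":
    print(check(make_tree(4)))  # 31: a perfect tree of depth 4 has 2**5 - 1 nodes
    run()
```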

The Green Metrics Tool makes it very easy to consume these benchmarks directly. We only had to add a container with the required Python version.

We just boot the container, execute the CLI command and let the Green Metrics Tool do its automated measurement magic. Find an example usage_scenario for Python 3.6 here.

During each run, we mainly look at CPU energy and total machine energy. The original paper only looked at CPU energy.
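For readers wondering where CPU energy numbers can come from: on many Linux machines, Intel RAPL exposes a cumulative package energy counter through the powercap sysfs interface. The sketch below is not the Green Metrics Tool's implementation; it only illustrates the basic idea, assuming the `intel-rapl:0` package domain exists and is readable (recent kernels usually require root). The function names are hypothetical. Machine energy cannot be read this way and is typically measured with an external power meter.

```python
import subprocess
from pathlib import Path

# Assumption: Linux with RAPL package domain 0 exposed via powercap.
# On recent kernels, reading energy_uj usually requires root privileges.
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def read_counter(name: str) -> int:
    """Read an integer counter file (in microjoules) from the RAPL domain."""
    return int((RAPL_DOMAIN / name).read_text().strip())


def energy_delta_uj(before: int, after: int, max_range_uj: int) -> int:
    """Energy between two readings, tolerating a single wrap-around of the counter."""
    if after >= before:
        return after - before
    return (max_range_uj - before) + after


def cpu_energy_joules(cmd: list[str]) -> float:
    """Run cmd and return the package energy consumed meanwhile, in joules.

    This covers everything running on the CPU package, not only cmd,
    so idle baseline and background load are included.
    """
    max_range = read_counter("max_energy_range_uj")
    before = read_counter("energy_uj")
    subprocess.run(cmd, check=True)
    after = read_counter("energy_uj")
    return energy_delta_uj(before, after, max_range) / 1_000_000
```

On a machine where the counter is readable, `cpu_energy_joules(["python3", "-c", "sum(range(10**7))"])` would return the package energy for that run. Real tools additionally subtract an idle baseline and repeat runs to reduce noise.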

Pro tip: If you are not familiar with the Green Metrics Tool, it is our all-in-one open source solution for professional software benchmarking and optimization. Find more information here.

Results

Language | Benchmark | vs. C [CPU Energy] | vs. C [Machine Energy]
Python 3.6 | binary-trees | 60x | 56x
Python 3.6 | fannkuch-redux | 66x | 63x
Python 3.6 | fasta | 34x | 38x
Python 3.6 | TOTAL | 61x | 58x
Source: Measurements, charts and details


Language | Benchmark | vs. C [CPU Energy] | vs. C [Machine Energy]
Python 3.9 | binary-trees | 51x | 49x
Python 3.9 | fannkuch-redux | 72x | 68x
Python 3.9 | fasta | 30x | 33x
Python 3.9 | TOTAL | 63x | 61x

Source: Measurements, charts and details


Language | Benchmark | vs. C [CPU Energy] | vs. C [Machine Energy]
Python 3.12 | binary-trees | 33x | 33x
Python 3.12 | fannkuch-redux | 57x | 54x
Python 3.12 | fasta | 28x | 31x
Python 3.12 | TOTAL | 50x | 48x

Source: Measurements, charts and details


Language | Benchmark | vs. C [CPU Energy] | vs. C [Machine Energy]
PyPy 3.10 | binary-trees | 5x | 7x
PyPy 3.10 | fannkuch-redux | 21x | 25x
PyPy 3.10 | fasta | 22x | 18x
PyPy 3.10 | TOTAL | 18x | 21x

Language | Benchmark | vs. C [CPU Energy] | vs. C [Machine Energy]
Mojo | binary-trees | 51x | 48x
Mojo | fannkuch-redux | 65x | 62x
Mojo | fasta | 32x | 36x
Mojo | TOTAL | 58x | 56x

Discussion

What stands out in these results is that we cannot exactly reproduce the paper's 75x difference between Python and C: our Python 3.6 measurements show roughly 60x (61x CPU energy, 58x machine energy).

Ranked from most to least efficient, the Python versions and implementations compare as follows:

Language | Overhead vs. C [Machine Energy]
PyPy 3.10 | 21x
Python 3.12 | 48x
Mojo | 56x
Python 3.6 | 58x
Python 3.9 | 61x

The most likely reason for the gap to the paper's 75x is that we used newer and different hardware. However, we would then still expect a similar offset across the individual tests, which is not the case.

In the original paper, the per-test CPU energy ratios for Python 3.6 vs. C compare with our measurements as follows:

  • binary-trees: 45x vs. our 60x => 25% less
  • fannkuch-redux: 59x vs. our 66x => 11% less
  • fasta: 38x vs. our 34x => 12% more
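The percentages above express how far the paper's ratio lies from ours. A short sketch that recomputes them from the two sets of CPU energy ratios quoted in this article:

```python
# CPU energy ratios of Python 3.6 vs. C for each benchmark.
OURS = {"binary-trees": 60, "fannkuch-redux": 66, "fasta": 34}   # our measurements
PAPER = {"binary-trees": 45, "fannkuch-redux": 59, "fasta": 38}  # original paper


def relative_deviation(paper_ratio: float, our_ratio: float) -> float:
    """Signed deviation of the paper's ratio from ours (negative = paper reports less)."""
    return paper_ratio / our_ratio - 1


for test, ours in OURS.items():
    deviation = relative_deviation(PAPER[test], ours)
    direction = "less" if deviation < 0 else "more"
    print(f"{test}: {abs(deviation):.0%} {direction}")
```

Rounded to whole percent, this prints 25% less, 11% less and 12% more: only fasta deviates in the opposite direction.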

So not only are the values off, the direction of the deviation also flips for the fasta test.

We have no explanation for that at the moment.

For the TOTAL value of all three tests combined, there is at least some uncertainty about what exactly the authors aggregated: whether it is simply the average of the per-test ratios, i.e. (60 + 66 + 34) / 3, or the ratio of the summed total energies (which is what the Green Metrics Tool does).

In any case, this would not explain the differences in the individual tests, so we did not investigate it any further.
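The two aggregation methods mentioned above can yield quite different TOTALs. A minimal sketch, using hypothetical energy values (not our measurements) chosen only so that the per-test ratios match our Python 3.6 CPU numbers of 60x, 66x and 34x:

```python
from statistics import mean


def average_of_ratios(python_j: list[float], c_j: list[float]) -> float:
    """Mean of the per-benchmark Python/C ratios: every benchmark weighs the same."""
    return mean(p / c for p, c in zip(python_j, c_j))


def ratio_of_sums(python_j: list[float], c_j: list[float]) -> float:
    """Total Python energy divided by total C energy: energy-heavy benchmarks dominate."""
    return sum(python_j) / sum(c_j)


# Hypothetical energies in joules; NOT measured values.
c_energy = [10.0, 100.0, 5.0]             # binary-trees, fannkuch-redux, fasta
python_energy = [600.0, 6600.0, 170.0]    # gives per-test ratios of 60x, 66x, 34x

print(f"average of ratios: {average_of_ratios(python_energy, c_energy):.1f}x")
print(f"ratio of sums:     {ratio_of_sums(python_energy, c_energy):.1f}x")
```

With equal weights the result is about 53.3x, while the ratio of sums gives about 64.1x, because in this made-up example fannkuch-redux consumes by far the most energy and therefore dominates the total.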

What we can see, though, is that Python clearly became more efficient from Python 3.6 to Python 3.12 (with a surprising regression for Python 3.9 :) )

Coming back to our initial research question, we can attest that Python 3.12 uses around 18% less energy than Python 3.6 (CPU energy; about 17% for machine energy). That being said, it is still roughly 48x worse than C on a plain compute job :)

Switching to a different interpreter such as PyPy, however, brings a strong improvement: overall it needs less than half the energy of Python 3.12. This means it is only around 21x worse than C … and in selected cases as little as 7x (binary-trees, machine energy), which is pretty strong!
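Because every overhead factor refers to a C baseline, the savings quoted above follow directly from the TOTAL rows of the result tables. A minimal sketch, assuming the C baseline is identical across all measurements:

```python
# TOTAL overhead factors vs. C from the result tables: (CPU energy, machine energy).
TOTALS = {
    "Python 3.6": (61, 58),
    "Python 3.12": (50, 48),
    "PyPy 3.10": (18, 21),
}


def energy_saving(old_factor: float, new_factor: float) -> float:
    """Share of energy saved by switching from old to new, given factors vs. the same C baseline."""
    return 1 - new_factor / old_factor


for old, new in [("Python 3.6", "Python 3.12"), ("Python 3.12", "PyPy 3.10")]:
    cpu = energy_saving(TOTALS[old][0], TOTALS[new][0])
    machine = energy_saving(TOTALS[old][1], TOTALS[new][1])
    print(f"{old} -> {new}: {cpu:.0%} (CPU), {machine:.0%} (machine)")
```

This yields about 18% (CPU) and 17% (machine) for Python 3.6 to 3.12, and about 64% (CPU) and 56% (machine) for Python 3.12 to PyPy 3.10.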

Mojo showed no relevant improvement, mostly because, at the time of writing, it cannot natively speed up Python code: it simply imports the Python code as a module and runs it with the regular CPython interpreter (libpython). See our discussion on their GitHub.

Summary and further considerations

In this case study, we compared Python with C, taking a nostalgic look back at the original test setup from Greenlab and their paper Energy Efficiency across Programming Languages.

We have seen that Python has improved quite a bit (~18%) and that alternative interpreters such as PyPy can remedy much of the language's slowness.

Some numbers were hard to compare with the original paper, and the reason for their offset remains unknown. However, this should not reduce the validity of the findings in this case study.

The general setup, and absolute claims about whether Python is really 48x or 75x worse, are also heavily debated on Reddit, where the authors' choice to simply use the fastest implementation they could find in the repository of The Computer Language Benchmarks Game is criticized. People argue that using similar implementations across languages would be more representative.

What exactly this means we leave up to you, curious researchers. We encourage you to ping us if this article sparked your interest, if you would like to ask questions, if you spotted a flaw, or if you even want to reproduce our measurements. It could be a nice opportunity to give our free open source Green Metrics Tool a test drive ;)

Contact us on this article at info@green-coding.io

Sources

[1] Microsoft research on energy consumption of UI apps