In 2017, a paper called Energy Efficiency across Programming Languages was published in the proceedings of an ACM SIGPLAN conference.
The paper compares different programming languages on standardised algorithmic compute benchmarks and ranks them according to their energy efficiency.
One of the most loved and used languages today, Python, ranks very low in this paper, showing a 75x higher energy consumption than the low-level language C.
To this day the paper is cited in many media outlets, on LinkedIn, and even on TikTok! It certainly shook up many people by painting a doom-and-gloom scenario in which many digital products run on one of the most inefficient languages currently in use.
Many caveats apply to this claim, though.
However, now close to 7 years and many Python versions as well as implementations later, we thought it was due for a revisit of the paper.
Since the paper used Python 3.6, we will look at newer versions like Python 3.9 and Python 3.12, as well as Mojo, RustPython and PyPy 3.10, to see whether Python has advanced in the CPython reference implementation and whether different interpreters might ease the inefficiency a bit.
We used the same benchmarks as in the original paper: The Computer Language Benchmark Game.
Since the archived repository on GitLab contains multiple submitted variants, we resorted to using the benchmarks originally selected by the study authors in their GitHub repository.
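To give a feeling for what these workloads look like, below is a heavily shortened Python sketch of the binary-trees benchmark (allocation-heavy tree building plus a checksum pass). It is an illustration only, not the exact CLBG submission that was measured:

```python
# Minimal sketch of the CLBG binary-trees workload (illustration only,
# not the exact submission used in the study or in our measurements).
import sys

def make_tree(depth):
    # A node is a (left, right) tuple; leaves are (None, None).
    if depth == 0:
        return (None, None)
    return (make_tree(depth - 1), make_tree(depth - 1))

def check_tree(node):
    # Count the nodes as a simple checksum so the work cannot be optimized away.
    left, right = node
    if left is None:
        return 1
    return 1 + check_tree(left) + check_tree(right)

def main(n):
    max_depth = max(n, 6)

    # Stretch tree: allocate once, check, and throw away.
    print("stretch tree check:", check_tree(make_tree(max_depth + 1)))

    # Long-lived tree kept alive while many short-lived trees are built.
    long_lived = make_tree(max_depth)
    for depth in range(4, max_depth + 1, 2):
        iterations = 2 ** (max_depth - depth + 4)
        check = sum(check_tree(make_tree(depth)) for _ in range(iterations))
        print(iterations, "trees of depth", depth, "check:", check)

    print("long lived tree check:", check_tree(long_lived))

if __name__ == "__main__":
    main(int(sys.argv[1]) if len(sys.argv) > 1 else 10)
```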
The Green Metrics Tool makes it very easy to consume these benchmarks directly. We only added a container that ships the necessary Python version.
We just boot the container, execute the CLI command and let the Green Metrics Tool do its automated measurement magic. Find an example usage_scenario for Python 3.6 here.
During the run we mainly look at the CPU energy and the total machine energy. The original paper only looked at the CPU energy.
Pro Tip: If you do not know what the Green Metrics Tool is: it is our all-in-one open-source professional software benchmarking and optimization solution. Find more info here
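Side note on the metric: on Linux, a CPU energy reading typically comes from the Intel RAPL counters exposed via the powercap interface. Purely to give an intuition for what is being measured, here is a minimal sketch of wrapping a command with such a reading. The RAPL path, the required permissions and the benchmark filename are assumptions about the machine, and the Green Metrics Tool itself does considerably more (continuous sampling, multiple metric providers, machine-level energy):

```python
# Minimal sketch: CPU package energy around a workload via Linux powercap/RAPL.
# Illustration only -- path and permissions are machine-specific assumptions,
# and RAPL counter overflow is not handled here.
import subprocess
import time

RAPL_FILE = "/sys/class/powercap/intel-rapl:0/energy_uj"  # package 0 (assumption)

def read_energy_uj():
    with open(RAPL_FILE) as f:
        return int(f.read())

def measure(cmd):
    energy_before = read_energy_uj()
    time_before = time.time()
    subprocess.run(cmd, check=True)
    duration_s = time.time() - time_before
    energy_j = (read_energy_uj() - energy_before) / 1_000_000  # µJ -> J
    return energy_j, duration_s

if __name__ == "__main__":
    # "binarytrees.py" is a hypothetical local copy of the benchmark script.
    joules, seconds = measure(["python3", "binarytrees.py", "21"])
    print(f"CPU energy: {joules:.2f} J over {seconds:.2f} s")
```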
Language | Benchmark | vs. C [CPU Energy] | vs. C [Machine Energy] |
---|---|---|---|
Python 3.6 | binary-trees | 60x | 56x |
Python 3.6 | fannkuch-redux | 66x | 63x |
Python 3.6 | fasta | 34x | 38x |
Python 3.6 | TOTAL | 61x | 58x |
Language | Benchmark | vs. C [CPU Energy] | vs. C [Machine Energy] |
---|---|---|---|
Python 3.9 | binary-trees | 51x | 49x |
Python 3.9 | fannkuch-redux | 72x | 68x |
Python 3.9 | fasta | 30x | 33x |
Python 3.9 | TOTAL | 63x | 61x |
Source: Measurements, charts and details
Language | Benchmark | vs. C [CPU Energy] | vs. C [Machine Energy] |
---|---|---|---|
Python 3.12 | binary-trees | 33x | 33x |
Python 3.12 | fannkuch-redux | 57x | 54x |
Python 3.12 | fasta | 28x | 31x |
Python 3.12 | TOTAL | 50x | 48x |
Source: Measurements, charts and details
Language | Benchmark | vs. C [CPU Energy] | vs. C [Machine Energy] |
---|---|---|---|
PyPy 3.10 | binary-trees | 5x | 7x |
PyPy 3.10 | fannkuch-redux | 21x | 25x |
PyPy 3.10 | fasta | 22x | 18x |
PyPy 3.10 | TOTAL | 18x | 21x |
Language | Benchmark | vs. C [CPU Energy] | vs. C [Machine Energy] |
---|---|---|---|
Mojo | binary-trees | 51x | 48x |
Mojo | fannkuch-redux | 65x | 62x |
Mojo | fasta | 32x | 36x |
Mojo | TOTAL | 58x | 56x |
58x difference in CPU energy (mean), 56x difference in total machine energy (mean); the change compared to Python 3.6 is significant.
What stands out with these results is that we cannot exactly reproduce the 75x difference between Python and C. Our data only shows a 60x difference.
These are the Python variants ranked from best to worst:
Language | Overhead vs. C [Machine Energy] |
---|---|
PyPy 3.10 | 21x
Python 3.12 | 48x
Mojo | 56x
Python 3.6 | 58x
Python 3.9 | 61x
The reason for that is most likely that we use newer and different hardware. However, what should be expected is that we would at least see a similar offset for the individual tests, which is also not the case.
In the original paper the differences for the single tests comparing Python 3.6 with C are:
So not only are the values off, the tendency also swaps direction for the fasta test.
We have no explanation for that at the moment.
For the TOTAL value of all three tests combined there is at least some uncertainty about what exactly the authors accumulated: whether it is just the average of the ratios ((60 + 66 + 34) / 3) or the sum of the total energies and then the ratio (which is what the Green Metrics Tool does).
In any case, this would not explain the differences in the individual tests, so we did not investigate further here.
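To make the ambiguity concrete, here is a small sketch of the two aggregation methods. The per-benchmark energy values are made-up placeholders that merely reproduce our Python 3.6 CPU ratios (60x, 66x, 34x); they are not our measured Joule values:

```python
# Sketch: two ways to aggregate per-benchmark results into a TOTAL factor.
# The Joule values are made-up placeholders, NOT measurements -- they only
# reproduce the 60x / 66x / 34x ratios to illustrate the mechanics.
python_j = {"binary-trees": 600.0, "fannkuch-redux": 1320.0, "fasta": 170.0}
c_j      = {"binary-trees":  10.0, "fannkuch-redux":   20.0, "fasta":   5.0}

# Method 1: average of the per-benchmark ratios
mean_of_ratios = sum(python_j[b] / c_j[b] for b in python_j) / len(python_j)

# Method 2: ratio of the summed energies (what the Green Metrics Tool reports)
ratio_of_sums = sum(python_j.values()) / sum(c_j.values())

print(f"mean of ratios: {mean_of_ratios:.1f}x")  # (60 + 66 + 34) / 3 = 53.3x
print(f"ratio of sums:  {ratio_of_sums:.1f}x")   # 2090 / 35 = 59.7x
```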
What we can see though is that Python definitely gained efficiency from Python 3.6 to Python 3.12 (with a surprising bump for Python 3.9 :) )
Coming back to our initial research question, we can attest that using Python today is around ~18% more efficient. That being said, you are probably still at least around 48x worse than C on a plain compute job :)
Moving to a different interpreter like PyPy, though, brings a strong improvement overall and is more than 50% more efficient than Python 3.12. Which means it is only around 21x worse than C … in selective cases even down to 7x, which is pretty strong!
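For transparency, here is how those percentages follow from the TOTAL overhead factors in the tables above (the ~18% figure lines up with the CPU energy column; the machine energy column tells a similar story):

```python
# Sketch: deriving the headline percentages from the TOTAL "vs. C" factors above.
totals_cpu     = {"Python 3.6": 61, "Python 3.12": 50, "PyPy 3.10": 18}
totals_machine = {"Python 3.6": 58, "Python 3.12": 48, "PyPy 3.10": 21}

def improvement(factors, old, new):
    # Relative reduction of the overhead factor when moving from `old` to `new`.
    return (factors[old] - factors[new]) / factors[old]

for label, t in (("CPU energy", totals_cpu), ("machine energy", totals_machine)):
    print(f"{label}: "
          f"3.6 -> 3.12 improves {improvement(t, 'Python 3.6', 'Python 3.12'):.0%}, "
          f"3.12 -> PyPy 3.10 improves {improvement(t, 'Python 3.12', 'PyPy 3.10'):.0%}")
```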
Mojo showed no relevant improvements, which is mostly due to the fact that it cannot natively enhance Python code at the time of writing.
It will just wrap the Python code, import it as a module and then run it with the native Python interpreter (libpython). See our discussion on their GitHub.
In this case study we have looked at Python and how it compares to C in a nostalgic look back on the original test setup from Greenlab and their paper Energy Efficiency across Programming Languages.
We have seen that Python has improved quite a bit (~18%) and that different interpreters like PyPy can remedy the slowness problem of the language quite substantially.
Some numbers were hard to compare and their actual offset remains unknown. However, this should not reduce the validity of the findings in this case study.
The general setup and the absolute claims about whether Python is really 48x or 75x worse than C are also heavily debated on Reddit, where the authors' choice to just use the fastest benchmark they could find in the repository of The Computer Language Benchmark Game is criticized. People argue that using similarly written implementations would be more representative.
What exactly that means we will leave up to you curious researchers. We encourage you to ping us if this article sparked your interest, if you would like to ask questions, if you spotted a flaw, or if you even want to reproduce our measurements. It could be a nice opportunity to give our free open-source Green Metrics Tool a test drive ;)
Contact us about this article at [email protected]