Artificial Intelligence is on the rise, and it is using huge amounts of resources. [1] [2] [3] People are still debating exactly how much, but it is clear that it is quite a lot. While the training of LLMs is widely discussed, the inference stage is hardly discussed at all. Inference is the process of getting a model to answer a prompt, and every query put to a model also consumes a considerable amount of resources. We started researching this topic some time ago and wanted to see how many resources this really takes, how different models compare to each other, and whether different types of queries take different amounts of energy. Coming out of this research we want to present: [...]
