# How fast can we get for the European Extremely Large Telescope? Performance optimizations on real-time hardware

### 2020 started recently, so it’s time to look into the stars again. I will pick up where last year’s blog post “A new year has started. Let’s look into the stars!” left off and share my current challenges at work, which are all about the question “How fast can we get for the European Extremely Large Telescope?”

As already described in my first blog post, my research deals with adaptive optics and, in particular, with the problem of atmospheric tomography for the European Extremely Large Telescope (ELT). Because the atmosphere changes rapidly, this problem has to be solved in real time. In our case, real time means within two milliseconds. This requirement is quite challenging for the ELT because the telescope is large, which leads to a huge amount of data that has to be processed. The challenge I’m dealing with is how to optimize the performance of solving the atmospheric tomography problem.
The first decision you have to make when dealing with performance optimizations is which algorithm you want to use. Within my research, I have to solve a system of equations, which is a common problem in many real-world applications. You can choose either a direct solver or an iterative one. In contrast to a direct solver, an iterative approach starts with an initial guess and iterates until it reaches a solution that is good enough. In general, iterative solvers are better suited to very large and sparse systems. Moreover, you can use an iterative solver in a matrix-free fashion, i.e., avoid storing all the matrix entries, which saves a lot of memory.
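
To make the matrix-free idea concrete, here is a minimal sketch in Python. It is not the solver I use for the ELT; it assumes a small illustrative system (a tridiagonal matrix with 2 on the diagonal and -1 next to it) and uses the conjugate gradient method as one example of an iterative solver. The names `apply_operator` and `conjugate_gradient` are made up for this example. The point is that the solver only ever asks for the product of the matrix with a vector, so the matrix itself is never stored.

```python
def apply_operator(x):
    """Return A @ x for the tridiagonal matrix with 2 on the diagonal and -1 beside it.

    Illustrative assumption: the matrix is never stored, only its action is coded.
    """
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y


def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(apply_a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A, given only x -> A x."""
    x = [0.0] * len(b)   # initial guess
    r = list(b)          # residual b - A x for the zero guess
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    if rs_old ** 0.5 < tol:
        return x, 0
    for iteration in range(1, max_iter + 1):
        ap = apply_a(p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        if rs_new ** 0.5 < tol:   # good enough: stop iterating
            return x, iteration
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


b = [1.0] * 50
solution, iterations = conjugate_gradient(apply_operator, b)
```

Storing this matrix densely would take n² numbers, while the function above needs only a few vectors of length n. For this toy problem the saving is irrelevant, but the same pattern carries over to much larger systems.
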
If you want to improve the performance of an existing algorithm, you usually try to parallelize it, i.e., execute steps in parallel. For example, when adding two vectors of length n, every entry is completely independent of the others. Thus, you can execute the n additions in parallel and save a lot of time. For more complex algorithms, deciding where and how to parallelize becomes quite a challenging task.
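
The vector addition example can be sketched in a few lines of Python. This is only an illustration of how the work is split, using the standard library's thread pool; the function names `add_chunk` and `parallel_add` and the choice of four workers are assumptions for this example. In standard CPython the global interpreter lock typically prevents a real speed-up for such a simple loop, whereas on a GPU each entry would typically be handled by its own thread.

```python
from concurrent.futures import ThreadPoolExecutor


def add_chunk(a, b, start, stop):
    """Add the entries with index start..stop-1; no other entries are touched."""
    return [a[i] + b[i] for i in range(start, stop)]


def parallel_add(a, b, workers=4):
    """Add two equally long vectors by giving each worker its own index range."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    n = len(a)
    chunk = max(1, -(-n // workers))  # ceiling division, at least one entry per chunk
    ranges = [(start, min(start + chunk, n)) for start in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(add_chunk, a, b, start, stop) for start, stop in ranges]
        result = []
        for future in futures:  # collect the pieces in their original order
            result.extend(future.result())
    return result


a = [float(i) for i in range(10)]
b = [10.0 * i for i in range(10)]
total = parallel_add(a, b)
```

Because no chunk reads or writes the entries of another chunk, the workers never have to wait for each other. That independence is exactly what is missing in more complex algorithms, where one step needs the result of the previous one.
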
Another non-negligible aspect is the right hardware, a topic in which mathematicians are in general not well versed. This is one of the nice things about ROMSOC. For half of my PhD, I am working at the company Microgate with specialists in real-time hardware. This gives me the possibility to look into various fields, not only mathematics, and to improve my interdisciplinary skills. At the moment we are running our algorithm on the high-end GPU NVIDIA Tesla V100 at Microgate as well as on a high-performance computing cluster called Radon1 from RICAM in Linz.

Altogether, performance optimization is a hard task. You never know in advance whether you can gain enough speed for your underlying algorithm, which can become quite gruelling after a while. Nevertheless, it is really important for practical applications, and it is something many mathematicians do not pay enough attention to.