Daniel Gruner, SciNet HPC Consortium, University of Toronto
Daniel Gruner is the Chief Technical Officer at the SciNet High Performance Computing Consortium at the University of Toronto. SciNet runs the largest academic computer systems in Canada. Daniel has more than thirty years' experience in computational science and scientific computing, working with a variety of programming languages, parallel computing, scientific modelling, software architecture, windowing system GUI programming, administration of large Beowulf clusters and large shared-memory parallel computers, system administration, and networking.
He earned a bachelor's degree in chemistry and physics from Hebrew University of Jerusalem, and a doctorate in chemical physics from the University of Toronto.
Computation is now pervasive, affecting and helping all fields of research and human endeavour. Being in the middle of it, teaching researchers and enabling their work, is therefore tremendously important.
So you have a big HPC resource - how do you make sure it is utilized effectively?
We all want a big cluster, a fast network, huge storage... Then we need proper software, running
in parallel, heavily optimized. We get the cool science projects, and the eager students, who
try to run their jobs, manage their data, and get reasonable results. But it is not all smooth
sailing: there are lots of stumbling blocks, frustrated users, misused resources, filesystems
hammered to death...
This is a common picture, surely recognized by all of you. So what is missing?
Training and education, of course! This is so often overlooked: by data centres, by schools,
and by professors who expect their students to know all this stuff instinctively.
At SciNet we have made it our mission to help fix this, primarily by having a very extensive
education (a.k.a. "training") programme. I will describe our approach, and success stories, in