All entries for Tuesday 25 July 2023
July 25, 2023
A tricksy bit of C++ templating
Basic C++ Templates
My First Templates
Most people's first encounter with C++ templates is when they want to write a function which can take several types and do the same thing regardless of which. That is, something where the function body ends up the same, but the signatures differ. The templating engine lets you do this really easily, and better still, only the versions of the function you actually use will be created by the compiler which reduces the size of your compiled binary.
Let's make a couple of dummy functions to discuss:
template <typename T>
T myfunc(T);
template <typename T>
void myfunc(const & T);
The first of these is probably the simplest possible template function - we define a name for our 'type' parameter, calling it T, and use T where we would normally use a type such as int, double etc. The line-break is purely convention, but is the common way to write these. We can use T in the signature and/or the body, or we can not use it at all in certain circumstances. This is the 'template' for how the function should look, and if we call it from our code we'll get a version where T is substituted for an actual type, based on, in this case, the type of the argument we pass.
The second block defines a slightly more complex function, where we add some qualifiers to the T type argument, in this case making it a const reference. Note we can overload between T, const T, ref T etc just like we can with explicit types, int and int & being different functions.
Overload resolution between templated functions and between templated functions, non-templated functions and what are called template specialisations are complicated, but mostly can be thought of as going from more to less restrictive - int myfunc(int) would take priority over the templated version for example.
Templates go in Header Files
The fact that T is a placeholder for an actual type is a chunk of why template _definitions_ have to go in header files (excepting so called 'explicit instantiation' which wont discuss here) - the compiler needs the definition to compile the function when it encounters a call to it, and make sure that this function can compile for that type - for instance if the function body passes T to other functions, or calls member functions on it (is T is meant to be a class), are these available? If not, the function cannot be compiled. Since a header file is included (directly or indirectly) in every C++ file which needs it, the full definition is always available during compilation.
Or putting it another way, we can't compile a C++ file containing a templated definition on it's own because we don't know which types we're compiling for, and we certainly don't want to compile for every possible type, even if we could, as this would produce a ton of useless code.
By the way, because templates are not completely 'fleshed out' functions, they are implicitly exempted from the 'one-definition' rule, although properly using include guards etc is still a good idea!
Some More Specific Template Stuff
Template Specialisation
So what if we want to use a different version of a function for, say, integers and class types? Well, we can provide an integer _specialisation_ of the template, which will take priority over the general version, so like
template <typename T>
T myfunc(T in);
template <>
int myfunc(int in);
(Note we have to indicate this is a specialisation using the template<> bit. If we leave that out we get something slightly different and have to consider those overload resolution rules properly - lets not go into the details because they start to hurt).
Suppose in the generic version we use something like 'in.size()' - this wont compile for an integer type. But the compiler doesn't want to do extra work, and is programmed to let an explicit specialisation take priority over a general version, so, having found the specialisation, the compiler stops looking.
Thinking about this for a few moments, we see that a) if this wasn't the case these specialisations would be a bit useless and b) this is actually very powerful - we have potentially uncompilable code, but as long as we never _use_ it we're safe.
Templates For Specific Types
OK, so suppose we have a method that makes sense for int, double, float, long etc, all of the numeric types - we kinda want a template that only allows one of those. We can do it, although we wont show the details, because exactly how is messy and varies between C++11, 14, 17 and newer as this ability has got refined. The trick though, is creating a function template that only works for numeric types, by making something that tries to create an invalid type otherwise. Non-numeric types will fail to substitute, so trying to call our function with these fails - as we desire since this method doesn't make sense.
In some cases we might want to select between methods depending on something like 'is this type numeric' or is it an integer or other stuff. Assuming this technique does something similar to what we just described, we're going to get some options where certain types 'don't make sense' with potential function overloads, but we want to skip those and look at others instead.
This last bit is important - it is a fundamental principle in C++ that templates that "don't work out" don't count as compiler failure. This is "Substitution Failure Is Not An Error" or Sfinae (see other vid, link in description).
This is very powerful, pretty complicated and a big old mess, so lets leave the details alone! Instead, lets look at something neat but clever which we can do in the next section.
Decorator or Wrapper Classes
A Basic Wrapper Class - Function Pass-thru
Suppose we're writing a class which is a bit like a Python decorator - we have some class and we want to encase it in another which adds a bit of function. This is the "Decorator" pattern in design pattern terms too. A common idea is to decorate a single function with e.g. code to time it's execution. The crux is to have two things which are independent, and the ability to combine them - timing code for example would simply execute the function, timing as it goes, and return whatever it returns. It would not and should not care about what the function is, what it does, or how it does it.
A wrapper class is a similar idea, where we want to encase one class in a larger one, and in some sense 'pass through' it's functions. Perhaps we want to change the interface (function names, signatures etc) of a class to make it fit better with calling code (the Adapter pattern), or add defaults or something, and we do not want to edit the wrapped class to do so. Or again, suppose we want to decorate the class with some extra function.
As a pretty dumb but decently useful example, suppose we have some logger class for writing information to file or screen, and we wanted to add a timestamp (imagine our favorite system doesn't have this already). Naively, we might imagine just stringifying everything, then passing those to the logger, having prepended our extra info. But this is clumsy - why restrict ourselves to only taking strings? Better to write wrapper functions that take any type using a template. For instance (semi pseudocode)
class logger; // This comes from some library
class timestampedLogger{
logger myLogger;
string getTimeNow(){...}
template <typename T>
void write(T in){
logger.write(getTimeNow());
logger.write(in);
}
};
We insert a timestamp, but otherwise, we just funnel everything through. In some senses, we're making timestampedLogger substitutable for plain logger, BUT AT COMPILE TIME. Instead of creating a 'logger' we create our wrapped timestampedLogger, and implement all of the same functions, meaning we can simply swap it out. We're not inheriting from logger, which would make us substitutable at runtime, we're replacing it completely at compile time.
Compile time fiddling like this is what templates allow us to do, and do (fairly) cleanly.
Unknown Return Types
Now imagine a slightly more fiddly thing, which is something we've been playing with doing recently - suppose we have some data-storage classes, for example vector or array and we want to wrap them up in something which is aware of physical units (kg, m, s etc). This is very much a decorator pattern, with a slight twist which is that not only do we want to pass-through (certain) member functions of the storage type unchanged, we also add functions dealing with say multiplying one thing with another.
The motivation for this is a kind of enhanced type safety - a length and a time cannot be added together, so lets exploit the type system to enforce this. Ideally this happens at compile time, as our simulations are time-critical and we don't want needless runtime checks for something that, once programmed correctly, wont change. But we might be adding or changing the equations we're solving, so just manually checking things once is insufficient.
The details of doing this are actually pretty mucky and horrible (see the Github Repo for the details, but beware!), but it showed up an interesting tricky bit with wrappers like this - can we write a function which has a return type which is not fixed, nor deducible from its arguments directly, but which depends on the function we are wrapping, in particular what its return type is?
C++14 introduced the idea of an auto function return type. Why should the programmer have to specify something that the compiler can work out? Consider the following:
returnType getLogFilename(){return myLogger.getLogFilename();}
returnType might be a few things, but let's say it's a string. Or could it be a wstring (string supporting wide characters, such as accented or non-Latin)? Why should we have to go rummage in the logger class? And why should our intermediate, our wrapper, have to care if this changed in an update (leaving aside the myriad problems with changing interfaces...).
Obviously the return type of that can't be deduced (worked out) from the arguments - there are none! But it can be worked out from the call and the compiler can and will do this, so we don't have to, if instead of returnType, we put 'auto'.
A Templated Class
So one more twist - suppose we want to wrap one of multiple possible logger classes. Imagine one of these implements a method 'setToRed()' to make output red (e.g. in a terminal). If we write the following function
void setToRed(){myLogger.setToRed();};
this compiles fine for the class which implements that function, but if the logger we are wrapping does not, we get a compile error, whether or not we ever call this function. This is obvious - that code can't compile. So how can we make this available if and only if the backing class offers it?
We might hope that this:
template <typename T>
void setToRed(){myLogger.setToRed();};
would somehow do it - a templated function "isn't compiled until it's used" we think.
But it doesn't. I honestly don't understand the details of what is, and isn't allowed here, because compilers are complicated things and a lot of work goes into making template code compile as fast as possible, since it must be done every time.
But we canuse templating to do this and actually if we write our wrapper "properly" we'll get this ability for free! Before, I assumed we'd just swap out the 'logger' entity rarely, and want only one possible option at any time, so I assumed 'logger' was provided by a library, and I'd maybe use a typedef or using to make sure whatever logging class I was using had that name.
A proper wrapper class would be a templated class, where we can provide information on the logger type when we create it. So we'd do something like this:
template <typename L>
class timestampedLogger{
public:
L myLogger;
string getTimeNow(){...}
template <typename T>
void write(T in){
logger.write(getTimeNow());
logger.write(in);
}
void setToRed(){myLogger.setToRed();};
auto getLogFilename(){return myLogger.getLogFilename();}
};
We've included that 'auto' return type we mentioned, as well as an explicitly templated function - notice we have to name the parameters differently - T and L are both template types, but are unrelated.
Now to actually create one of these we have to do:
timestampedLogger<logger> myTLogger;
or we could offer a shortcut with a 'using' statement:
using tLogger = timestampedLogger<logger>;
Only if we use a function will the full body be handled, so as long as we don't use it, we can leave the setToRed here. We are now relying on the code which calls (or does not call) this function to know if it can or not - but if we're thinking of our wrapper as substitutable for a plain logger class, that was already the case, so there is no real disadvantage there.
With those two layers of templating we can write everything we need to wrap timestamping around our logger. We have to wrap every function we want to expose, but using templates and auto we can minimise how much information we're repeating about those functions - which we should always try to do. The compiler will work all that out once we substitute a real logger class type for that L parameter
Like That Type, But Different
For a brief final foray into the power, but also the attendant horror, of the templating system, lets imagine a more complicated decorator class, where our wrapping functions might change the type returned by functions they wrap.
The units checking project we're working on adds one more twist to this idea - how can we wrap our units wrapper onto something we're returning from a function whose return type we don't know? For instance, suppose we have scalars (single numbers), vectors (physics style so triplets of numbers) and tensors (ditto, so a 3x3 array). Suppose some function goes "down the hierarchy" so returns a lower rank for each, except scalars where it returns the same type.
That is how would we implement the following?
class scalar{
public:
int dat{0};
scalar()=default;
scalar(int val):dat(val){};
scalar exampleFunction(){return scalar(1);}
};
class vector{
public:
int dat[3];
vector()=default;
vector(int a, int b, int c){dat[0]=a; dat[1]=b; dat[2]=c;};
scalar exampleFunction(){return scalar(2);}
};
class tensor{
public:
int dat[9];
vector exampleFunction(){return vector(1, 2, 3);}
};
template <typename ST>
class unitsType{
public:
ST myData;
unitsType<???> exampleFunction(){unitsType<???> val; val.myData = myData.exampleFunction(); return val;}
};
What replaces those '???' ? We can't write anything in there ourselves because it depends on the type of the parameter ST, and crucially is NOT ST itself which would be simple. We can use auto in place of the return type explicitly, but not for the declaration of val. But the compiler knows on some level what type myData.exampleFunction()
will return, and we can ask it to help us by spitting that out:
using ReturnTypeExample = decltype((std::declval<ST>()).exampleFunction());
Here decltype is saying 'give me just the type information and declval is taking a type, ST, and creating a sort of compile-time instance of that type so that we can access a member function by name. So we have what we need: unitsType<ReturnTypeExample>
is the return value and the type we need to give to val in the body. Perfect.
But I Want to do it Myself
But there's one last twist - what if we have the same desire as before - to allow this for functions that ST may or may not implement? This is actually using the function at compile time to get that return value, AND there's no template substitution going on which might defer that until the wrapper function is actually used. The solution to this is a pretty horrible trick, and I DO NOT KNOW for sure it will work in all circumstances, but it solves this problem:
We have to add another layer of template, so that only when a specific instantiation uses the function will everything unroll and the function actually be needed. Lets name this possibly-nonexistent function nopeFunction() since nope, I didn't implement it...
So we do this:
template <typename Q>
using ReturnTypeNope = decltype((std::declval<Q>()).nopeFunction());
auto nopeFunction(){
unitsType<ReturnTypeNope<ST> > val;
val.myData = myData.nopeFunction();
return val;
}
This works - only when the nopeFunction() here is called does anything attempt to unpack its return type. The auto return type deduction can only happen when the body is compiled too.
As a final note, suppose we wanted things to be a bit clearer, or auto would not work for us for some reason. The following DOES NOT work:
unitsType<ReturnTypeNope<ST> > nopeFunction(){
unitsType<ReturnTypeNope<ST> > val;
val.myData = myData.nopeFunction();
return val;
}
because the return type tries to substitute when this line is reached. We need to keep that deferred layer, so we have to do:
template <typename Q>
unitsType<ReturnTypeNope<Q> > nopeFunction(){
unitsType<ReturnTypeNope<Q> > val; val.myData = myData.nopeFunction(); return val;}
(NOTE we don't have to name this Q, it has no relation to the other Q, it's just handy to re-use the letters for similar purposes where they don't overlap, to avoid running out).
This leaves one FINAL (promise) problem: when calling this wrapper, the type for Q cannot be deduced so we have to do ugly things in the calling code like this:
unitsType<scalar> myTyp;
myTyp.nopeFunction<scalar>();
We fix this with a final twist - a default value for the template parameter Q of ST. Note carefully: this is not defining them to be equal, it is setting a default. We get:
template <typename Q>
using ReturnTypeNope = decltype((std::declval<Q>()).nopeFunction());
template <typename Q=ST>
unitsType<ReturnTypeNope<Q> > nopeFunction(){
unitsType<ReturnTypeNope<Q> > val;
val.myData = myData.nopeFunction();
return val;
}
And now as long as the calling code doesn't call nopeFunction, nothing cares if our ST type implements it or not.
This is, frankly, a bit too deep for sanity, but it does work and is not exactly an edge use-case when dealing with decorators and wrappers.