Current Projects

Recent Posts
stencet v0.1.16
release
01 April 2013
rikitiki v0.1.67
release
01 April 2013
Working with tuples part II
blog
27 March 2013
stencet v0.0.6
release
24 March 2013

All Posts...

Working with tuples

16 March 2013 -
By

Synopsis

Efficiently do things like:

struct PrintFunctor {
  template <typename T> 
  size_t operator()( T& s) {
    std::cout << s << std::endl;
    return sizeof(T);
  }

  template <typename T> 
  size_t operator()(size_t i, T& s) {
    std::cout << i << ": " << s << std::endl;
    return sizeof(T) + i;
  }

};

int main(){
  auto tuple = std::make_tuple("Hello", 12, 'w');
  PrintFunctor functor;

  // Print 'Hello'	
  tupleExt::run(functor, 0, tuple); 

  // Print '12'
  size_t size = tupleExt::get(functor, 1, tuple);
    
  // Print 'is size 4' (Or whatever sizeof(int) is on your system).  
  std::cout << "Is size " << size << std::endl;  

  // Print 'Hello', '12', 'w' each on its own line. 
  tupleExt::iterate(functor, tuple); 

  // Print '0: Hello', '1: 12', '2: w' each on its own line. 
  tupleExt::iterate_i(functor, tuple); 

  // Print 'Hello', '12', 'w' each on its own line. 
  // but also returns a vector with all the sizes mapped. 
  std::vector<size_t> sizes = tupleExt::to_vector(functor, tuple);

  // Print 13 -- the addition of all the types in the tuple
  std::cout << tupleExt::fold(functor, tuple, 0) << std::endl;
}

Working with Tuples

Tuples can get a little annoying to work with in C++ if you aren’t used to working with tuples in a statically driven language. For instance, in python, this is perfectly acceptable:

tuple = ("Hello", 12, 'w')
for i in range(0, 3):
    print(tuple[i]);

Analogously, in C++:

auto tuple = std::make_tuple("Hello", 12, 'w');	
for(int i = 0;i < 3;i++)
	std::cout << std::get<i>(tuple) << std::endl;

Gives the following error:

the value of ‘i’ is not usable in a constant expression. 
note: ‘int i’ is not const

This highlights the troubles with tuples in a static language — C++ must know at compile time exactly what type that ‘std::get’ needs to return, and to do that it needs to know the value of ‘i’.

Remember: std::get isn’t a single function you are passing the variable N to. It’s multiple functions(one per element in the tuple) and you are specifying which one to use with N.

What we really need is something that knows how to print any given type in our tuple. Something like:

template <typename T> 
void print()( T& s) {
     std::cout << s << std::endl;
}

So much like std::get is a set of functions, one per N, this print function is a set of functions, one per type T given to it. What we want to do now is something like:

auto tuple = std::make_tuple("Hello", 12, 'w');	
for(int i = 0;i < 3;i++)
	std::cout << run(i, &print, tuple) << std::endl; // Will not work!

But since print isn’t a single function, we can’t dereference it quite like that. What we can do though, is use a functor which has the print definition in it:

struct PrintFunctor {
  template <typename T> 
  void operator()( T& s) {
      std::cout << s << std::endl;
  }
};

And then do:

auto tuple = std::make_tuple("Hello", 12, 'w');	
PrintFunctor print;
for(int i = 0;i < 3;i++) {
	// Will work, once we define the 'run' function.
	std::cout << run(print, i, tuple) << std::endl; 
}

The Run Function

And now we only need the run function.

The basic strategy for doing operations on tuples is to start at the 0th element and recursively walk through to the next element, to the last element. We can create our run function with this same pattern:

template <std::size_t N = 0, typename F, typename... Args>
  auto inline run(F& f, const size_t i, const std::tuple<Args...>& t) 
  -> typename std::enable_if<N == sizeof...(Args), void>::type 
  { }

template <std::size_t N = 0, typename F, typename... Args>
  auto inline run(F& f, const size_t i, const std::tuple<Args...>& t)
  -> typename std::enable_if<N < sizeof...(Args), void>::type 
  {
    if(i == N)
      f( std::get<N>(t) );    
    else
      run<N+1, F, Args...>(f, i, t);
  }

But what about ‘efficient’?

Since our run function starts at ‘0’, and tries each index in succession, it has a run-time of O(n). For a function that simply runs against one thing at an index, this is non-viable in general.

If we think about how this code looks after inlining, it should look like this:

if(i == 0)
  f( std::get<0>(t) );    
else if(i == 1)
  f( std::get<1>(t) );    
else if(i == 2)
  f( std::get<2>(t) );    
...
else if(i == sizeof...(Args) - 1)
  f( std::get<sizeof...(Args) - 1>(t) );    

If we saw this code like this, we’d want to replace that if chain with a switch statement; which would jump to the right function call immediately. In fact, one might suppose the compiler sees the above, and optimizes exactly into a switch statement which does a jump (constant time) based on i. We can easily test this — if run(print, 0, tuple) runs as fast as run(print, N-1, tuple), then the switch optimization happened.

We’ll test it with:

template <typename... T> static auto make_2(T&... t) RETURN( std::make_tuple(t..., t...))
template <typename... T> static auto make_4(T&... t) RETURN( make_2(t..., t...))
template <typename... T> static auto make_8(T&... t) RETURN( make_4(t..., t...))
template <typename... T> static auto make_16(T&... t) RETURN( make_8(t..., t...))
template <typename... T> static auto make_32(T&... t) RETURN( make_16(t..., t...))
template <typename... T> static auto make_64(T&... t) RETURN( make_32(t..., t...))
template <typename... T> static auto make_128(T&... t) RETURN( make_64(t..., t...))
template <typename... T> static auto make_256(T&... t) RETURN( make_128(t..., t...))
template <typename... T> static auto make_512(T&... t) RETURN( make_256(t..., t...))

int g = 0; // We use this so that our operator isn't a no op, and gcc can't optimize it out
      	   // entirely. 

struct TestFunctor {
  inline int operator()( const std::string& s) {
    return g+=s.size();
  }
};

int main(int argc, char** argv){
  TestFunctor f;
  std::string s("Testing");
  auto data = make_64(s); // 128 and up crash my gcc...
  int i = argc == 2 ? atoi(argv[1]) : 0; 
  for(int z = 0;z < 1000000000;z++)
     tupleExt::run(f, i, data);
  std::cerr << i << std::endl; // Print our index used
  std::cerr << g << std::endl; // Sanity check -- should be constant
  	       	    	       // regardless of i. 
}

On my machine (gcc 4.7.2), I get the following:

$ g++ -std=c++11 -O3 tupleTest.cc -o tupleTest
$ time ./tupleTest 0
0
-1884901888
2.144 secs
$ time ./tupleTest 63
63
-1884901888
33.103 secs

So we can conclude that it in fact does not optimize it for us. Notice that clang 3.2 gave me similar results. Now, if we could demonstrate that the compiler did the heavy lifting for us, there would be arguments for and against leaving our code like this. But two out of three mainstream compilers that can operate with variadic arguments this large don’t, so realistically we need to find another approach.

Implementing a jump table

If the compiler won’t do it for us, we’ll have to make a jump table ourselves.

The basic idea behind being able to do this is that, while each std::get function returns a different type, and each functor operation takes in a different type, the composition of both operations has the same signature — namely, it takes in a functor and a tuple, with no return type. (It is straight forward to implement one that returns a value, so long as the return types are all consistent/covariant — see ‘get’ in tuple_ext.h).

Our basic strategy is going to be this: Define an array of functions of our desired type — void(Functor&, std::tuple&) — with the same size as the number of elements in the tuple — sizeof…(Args). On the first pass through, populate this array with a function that handles each tuple element(Linear time, for those who care). On each subsequent time through, simply execute the function at the desired index in the array against the passed in tuple and functor.

template <typename F, typename FT, std::size_t N, typename... Args>
auto inline make_jumptable(FT* jump[sizeof...(Args)]) 
-> typename std::enable_if< N == sizeof...(Args), void>::type {}

template <typename F, typename FT, std::size_t N, typename... Args>
auto inline make_jumptable(FT* jump[sizeof...(Args)]) 
-> typename std::enable_if< N < sizeof...(Args), void>::type 
{
  jump[N] = [](F& f, const std::tuple<Args...>& t) { f( std::get<N>(t)); };  
  make_jumptable<F, FT, N+1, Args...>(jump);
}

template <typename F, typename... Args>
auto inline run(F& f, const size_t i, const std::tuple<Args...>& t) -> void
{
  using FT = void ( F&, const std::tuple<Args...>& );
  static FT* jump[sizeof...(Args)];
  static bool init = false;
  if(!init){
    make_jumptable<F, FT, 0, Args...>(jump);
    init = true;
    std::cout << "Test!" << std::endl;
  }
  jump[i](f, t);
}

So instead of ‘run’ being recursive, we created a ‘make_jumptable’ recursively defined function which fills in our jump table. Since this recursive bit is only called the first time, every time after that is constant. With this new code, our test runs return:

$ time ./tupleTest 0
0
-1884901888
3.183 secs
$ time ./tupleTest 63
63
-1884901888
3.168 secs

This confirms that it doesn’t matter the index of the desired element, we are running in constant time.

That isn’t actually sufficient to show that time access is constant — we also have to make sure that accessing the first element in a tuple with 1 element has roughly the same run-time as accessing the last element of one with sixty-four elements. Rerunning against an 8 tuple (under the assumption that anything smaller than 5 could be optimized specifically) returned similar results — over a couple of runs the times averaged out to be roughly the same.

Also, tests were ran against just directly using std::get the same number of times. The results being that our ‘run’ function is roughly half as fast as std::get. This is roughly in line with intuition though — empty virtual functions are about twice as slow as direct, and what we are doing is roughly analogous to virtual functions (if virtual template functions were supported, we’d be using them instead to do this).

In theory, the generated machine code could be faster. Since we are using an array of function pointers, there is no way for the compiler to inline the actual composed function call. But it isn’t by any means slow either.

Other Functions

From this, and the last post, we have the working for both direct access operators into a tuple and iterative operators into a tuple. From there we can define a whole host of functions.

Parting Words

It is worth noting, that just because you can do these things, doesn’t mean you should. If you need this sort of functionality, it is worthwhile to take a step back and evaluate whether you might want to use another language construct. There are times when tuple based functionality like this is very useful, but it is no replacement for class inheritance/virtual members.

Also you should understand the resource implications of heavy template use. We are creating a jump table and needed functions for every combination of tuple and functor used. For each type in your tuple, the functor is creating a function too. It is a very powerful set of constructs which give very fast execution times, however it comes at the cost of heavy code replication in the resulting assembly.

Source

Source from this post, including the demo at the top, can be found here. A header file with the given tuple transforms is here.

Corrections:

mttd pointed out that a few of the functors were operating with int, when size_t is more appropriate.

comments powered by Disqus