Saturday, July 8, 2017

Alteryx - Using the Runner Macro / Chaining Workflows / Caching Data How-To

In my conversations with people at Alteryx Inspire, a couple of questions come up frequently: how can I cache data that's already been processed, and how can I schedule workflows to run consecutively?

My answer to both of those questions is the Runner macro.

Yes, there is the cache data tool, but it takes manual work: I have to change its setting from input to output.

With the method I'm describing here, caching happens automatically, because each workflow saves its results to a file that the next workflow reads.

The first step is to plan out what you want to do with your data: where you want to cache it and how you want to process it. Then divide those steps into logical sections, and build workflows that each address only one section.

In this example, all workflow 1 does is create a .yxdb from my data. That's always a good place to start: if you have, for instance, a .csv file coming in, Alteryx will process that same data much faster as a .yxdb than if it remains a .csv. Yes, the conversion takes some processing time, but everything downstream runs faster afterward.
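Alteryx does this step visually, and a .yxdb can't be written from plain Python, so the following is only an analogy, not how Alteryx works internally. It's a minimal sketch that swaps in Python's standard-library `pickle` module as the binary cache; the function and file names are hypothetical. The idea it shows is the same: pay the cost of parsing the text file once in "workflow 1", then have every later step read the binary copy.

```python
import csv
import os
import pickle
import tempfile
from typing import Dict, List


def convert_csv_to_cache(csv_path: str, cache_path: str) -> int:
    """Stand-in for "workflow 1": parse the CSV once and save a binary copy."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    with open(cache_path, "wb") as f:
        pickle.dump(rows, f)
    return len(rows)  # number of data rows written to the cache


def read_cache(cache_path: str) -> List[Dict[str, str]]:
    """Downstream "workflows" load the cached rows instead of re-parsing text."""
    with open(cache_path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    csv_path = os.path.join(workdir, "sales.csv")    # hypothetical input file
    cache_path = os.path.join(workdir, "sales.pkl")  # stand-in for a .yxdb
    with open(csv_path, "w", newline="") as f:
        f.write("region,amount\nEast,10\nWest,20\n")
    print(convert_csv_to_cache(csv_path, cache_path))  # 2
    print(read_cache(cache_path))
```

As with any pickle file, only load caches you created yourself.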



Then you can build as many other workflows off that .yxdb as you like. Maybe one workflow does all your joins, and another creates all your calculations.



Splitting up my workflows makes the most sense when I have multiple disparate data sources. Getting all that data into a consistent shape can take a lot of processing, and each data set needs a different kind of processing. So after I convert each data source to a .yxdb, I make one workflow to process data source 1, another to process data source 2, and so on.

Finally, I make a workflow that reunites the outputs of all those other workflows.
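To make the per-source idea concrete, here is a small Python sketch, an analogy rather than anything Alteryx runs. The two sources, their column names, and the function names are all hypothetical. Each "workflow" function does only the cleanup its own source needs, so the final step just stacks the already-consistent rows (roughly what Alteryx's Union tool does).

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def process_source_1(rows: List[Row]) -> List[Row]:
    # Hypothetical source 1: capitalized column names, amounts stored as text
    return [{"region": r["Region"], "amount": float(r["Amt"])} for r in rows]


def process_source_2(rows: List[Row]) -> List[Row]:
    # Hypothetical source 2: different column names, amounts in cents
    return [{"region": r["region_name"], "amount": r["cents"] / 100} for r in rows]


def reunite(*processed: List[Row]) -> List[Row]:
    # The final step only stacks outputs that already share one schema
    combined: List[Row] = []
    for rows in processed:
        combined.extend(rows)
    return combined


if __name__ == "__main__":
    src1 = [{"Region": "East", "Amt": "10.5"}]
    src2 = [{"region_name": "West", "cents": 2000}]
    print(reunite(process_source_1(src1), process_source_2(src2)))
```

Because each source's quirks live in their own function, a change in one source only touches one piece.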



The magic touch is to create one last master workflow that runs all the others, and this is where the Runner macro comes in. I add a Runner tool for each of my underlying workflows and separate them with Block Until Done tools, so each workflow finishes before the next one starts.
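The ordering the master workflow enforces can be sketched in a few lines of Python. This is only an analogy for Runner plus Block Until Done, not the macro's actual implementation; the step names are hypothetical, and stopping at the first failure is a choice made for this sketch rather than necessarily how the Runner macro behaves.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]  # (name, function that runs one "workflow")


def run_master(steps: List[Step]) -> List[str]:
    """Run each step to completion, in order; stop at the first failure."""
    completed: List[str] = []
    for name, step in steps:
        try:
            step()  # the next step starts only after this call returns
        except Exception as exc:
            print(f"{name} failed: {exc}; skipping remaining steps")
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    log: List[str] = []
    steps: List[Step] = [
        ("convert_to_cache", lambda: log.append("converted")),
        ("process_source_1", lambda: log.append("processed 1")),
        ("reunite", lambda: log.append("reunited")),
    ]
    print(run_master(steps))  # ['convert_to_cache', 'process_source_1', 'reunite']
```

Returning the list of completed steps makes it easy to see where a run stopped, which is the same benefit the small-workflow approach gives when something breaks.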





Then, when I'm ready to process my data, I only need to run one workflow (the master workflow with the Runner tools), and everything else happens automatically.

This setup makes it a LOT easier to fix things when they break (and we all know they sometimes do), because each problem is isolated in one small workflow. I also don't have to sit and toy with the cache tool: I can hit run and walk away or do something else.

If you haven't tried this method, I encourage you to experiment. If you have any questions, feel free to comment or ping me directly on Twitter @thizviz.

You can download the sample workflows here:

Happy analyzing!