These past few months I’ve seen a rapid rise in the use of Piwik as a webanalytics solution. My clients are choosing Piwik as an alternative to paid solutions like ComScore Digital Analytix or free solutions like Google Analytics. Benefits include the data storage location, ownership of data and the ability to access non aggregate data.
Piwik stores clickstream data like visitors, visits and pageviews but is also able to store information like onsite search queries and eCommerce data. Data is stored within a MySQL database which can be hosted wherever you want giving you full control over data ownership and data storage location. I’ve seen implementations with over >100M interactions a year and I am currently working within an organization implementing a 50M interactions a year setup.
Analyzing the data gathered by Piwik can be done through their web interface which allows you to generate reports and segment the data, but as a Tableau enthusiast I was also curious if I could connect directly to the underlying MySQL database. It turns out you can!
Piwik has excellent documentation on their data model available on their website. There is a description for each table and field in the database and what the relationship is between the various tables. An excellent starting point for your analysis.
Connecting Tableau to the Piwik MySQL database
You can use the default MySQL connector available within Tableau to access the Piwik database. All you have to do is fill out the database url or ip, provide the database credentials and your good to go!
Note that you are accessing a live database. Each change to your worksheet causes one or more calls to the database which could have a severe impact on performance of your database server. I usually create an extract with incremental updates (in this example on server_time field) and publish that to Tableau Server which in turn updates the data every night when my Piwik database server is under minimum load. At this point you could also setup filters to reduce the size of your dataset and extract.
Creating a top 10 most viewed pages
Piwik is storing all interactions registered on the site in the table piwik_log_link_visit_action and stores all other information available on these actions in the table piwik_log_action. Creating a left join between these tables allows you to combine the id and the amount of interactions of an action with the url of the requested page and/or it’s page title. I am using a left join on idaction_url from piwik_log_link_visit_action with idaction from piwik_log_action. This allows me to create a list with the top performing url’s within my site. You could also join with the field idaction_name which allows you to create a list with page titles.
After creating the connection you have to move all fields from Measures to Dimensions with an exception for custom_float and time_spent_ref_action.
Create a filter on site_id from piwik_log_link_visit_action to only select pages from the site you are interested in. This is only needed if you have more than 1 site within your Piwik implementation (you could also do a left join with the piwik_site table on the field idsite to see the name and url of the selected site).
Piwik stores all actions including pageviews, events, external links, downloads en mailto links so we’ll have to filter down to only include pageviews. We can do this by applying a filter on the type field from the piwik_log_action table. In this case we only want to include actions with value 1.
Put the sum of the Number of Records from the measures pane (you can duplicate and rename this if you want) on Columns and the name field from the piwik_log_action table on Rows. You should now get a horizontal bar chart. Sort this top down.
To limit the view to a Top 10 we need to create a boolean calculated field with INDEX()<=10 and drag this to the filter shelf only accepting the value true. Make sure this field is calculated Table (Down).
Other visualizations and analysis
In the next few weeks I will write some more guides on how to create often used webanalytics visualizations. I would love to hear if you have any comments or suggestions on how to improve the above described method or if you have any visualizations or analysis you would like to see build with Piwik and Tableau.