I have an app that sends data to Google Analytics. I am interested in accessing and storing this data on a Hadoop cluster. I am guessing this raw data will be in the form of
To get GA data click by click you can make queries in a way that gives you the ability to join data together.
First you need to prepare the data in GA. So with each hit you send, add some hashed value or the clientId + some timestamp into a custom dimension. This will give you the ability to join each query result.
E.g. (this is how we do it at Scitylana) This script below hooks into GA's tracking script and makes sure each hit contains a key for later stitching of query results
Of course now you need to make some script that joins all the results you have taken out of GA.