I have an app that sends data to Google Analytics. I am interested in accessing and storing this data on a Hadoop cluster. I am guessing this raw data will be in the form of
since we're supposed to answer the original question, there is no way to get actual raw Google Analytics logs other than by duplicating the server call system.
In other words, you need to use a modified copy of the analytics.js script to point to a hosted webserver that can collect server calls.
Long story short, you want your site to capture hits to http://www.yourdatacollectionserver.com/collect?v=1&t=pageview[...] instead of http://www.google-analytics.com/collect?v=1&t=pageview[...]
This is easily deployed using a tag manager such as Google's GTM, along with normal Google Analytics tags.
That will effectively create log entries in your web server which you can process using an ETL or Snowplow or Splunk or your favorite Python/perl/Ruby text parsing engine.
It is then up to you to process the actual raw logs into something manageable. And before you ask, this is not retroactive.