How to transform an SQL table into a list of row sequences using BigQuery and Apache Beam?

Submitted by 谁说我不能喝 on 2019-12-13 02:55:09

Question


I have a very large table where each row represents an abstraction called a Trip. Trips consist of numeric columns such as vehicle id, trip id, start time, stop time, distance traveled, driving duration, etc. So each Trip is a 1D vector of floating point values.

I want to transform this table, or list of vectors, into a list of Trip sequences where Trips are grouped into sequences by vehicle id and are in order according to start time. The sequence length needs to be limited to a specific size such as 256 but there can / should be multiple sequences with the same VehicleId.

Example:
(sequence length = 4)

[  
(Vehicle1, [Trip1, Trip2, Trip3, Trip4]),  
(Vehicle1, [Trip5, Trip6, Trip7]),  
(Vehicle2, [Trip1, Trip2, Trip3, Trip4])  
]

I'm trying to model driving patterns based on these Trips using a sequence-based model such as an LSTM / Transformer. Imagine each Trip as a word embedding and each sequence of trips as a sentence. Somehow I need to construct these sentences through a combination of BigQuery / Apache Beam functions (or any other recommended tools) since we're talking about hundreds of gigabytes of data. I'm fairly new to both tools so any help would be greatly appreciated.


Answer 1:


The query below is for BigQuery Standard SQL:

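#standardSQL
-- Number each vehicle's trips by start_time, then integer-divide by the
-- sequence length (4 here) so every run of 4 consecutive trips shares a grp.
SELECT trip.vehicle_id, ARRAY_AGG(trip ORDER BY trip.start_time) trips
FROM (
  SELECT trip, DIV(ROW_NUMBER() OVER(PARTITION BY vehicle_id ORDER BY start_time) - 1, 4) grp
  FROM `project.dataset.table` trip
)
-- One output row per (vehicle, group); the last group may hold fewer than 4 trips.
GROUP BY trip.vehicle_id, grp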

The query above orders trips by start_time and uses a sequence length of 4; change the divisor in DIV to use a different length, such as 256.
It also returns vehicle_id as part of each trip's info inside the array, as in the example below:

Row vehicle_id  trips.vehicle_id    trips.trip_id   trips.start_time    trips.stop_time  
1   Vehicle1    Vehicle1            Trip1           1                   2    
                Vehicle1            Trip2           2                   3    
                Vehicle1            Trip3           3                   4    
                Vehicle1            Trip4           4                   5    
2   Vehicle1    Vehicle1            Trip5           5                   6    
                Vehicle1            Trip6           6                   6    
                Vehicle1            Trip7           7                   6    
3   Vehicle2    Vehicle2            Trip1           2                   3    
                Vehicle2            Trip2           3                   4    
                Vehicle2            Trip3           4                   5    
                Vehicle2            Trip4           5                   6    

To drop the duplicated vehicle_id from the array elements, try the query below:

#standardSQL
SELECT vehicle_id,
  -- Rebuild each array without the redundant vehicle_id field, keeping start_time order.
  ARRAY(
    SELECT AS STRUCT * EXCEPT(vehicle_id)
    FROM UNNEST(trips)
    ORDER BY start_time
  ) trips
FROM (
  -- Same grouping as the first query: runs of 4 trips per vehicle.
  SELECT trip.vehicle_id, ARRAY_AGG(trip ORDER BY trip.start_time) trips
  FROM (
    SELECT trip, DIV(ROW_NUMBER() OVER(PARTITION BY vehicle_id ORDER BY start_time) - 1, 4) grp
    FROM `project.dataset.table` trip
  )
  GROUP BY trip.vehicle_id, grp
)


Row vehicle_id  trips.trip_id   trips.start_time    trips.stop_time  
1   Vehicle1    Trip1           1                   2    
                Trip2           2                   3    
                Trip3           3                   4    
                Trip4           4                   5    
2   Vehicle1    Trip5           5                   6    
                Trip6           6                   6    
                Trip7           7                   6    
3   Vehicle2    Trip1           2                   3    
                Trip2           3                   4    
                Trip3           4                   5    
                Trip4           5                   6    
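
To sanity-check the grouping rule on a small sample before running it over the full table, the same logic can be sketched in plain Python. This is only an illustration, not part of the original answer: the function name `chunk_trips`, the `seq_len` parameter, and the dict-based row layout are assumptions made for the example, and the sample rows are taken from the result tables above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Trip = Dict[str, object]  # hypothetical row layout: one dict per table row


def chunk_trips(trips: Iterable[Trip], seq_len: int = 4) -> List[Tuple[object, List[Trip]]]:
    """Group trips by vehicle_id, order them by start_time, and split each
    vehicle's trips into consecutive sequences of at most seq_len trips."""
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")

    # Equivalent of PARTITION BY vehicle_id.
    by_vehicle = defaultdict(list)
    for trip in trips:
        by_vehicle[trip["vehicle_id"]].append(trip)

    sequences = []
    for vehicle_id in sorted(by_vehicle):
        # Equivalent of ORDER BY start_time inside each partition.
        ordered = sorted(by_vehicle[vehicle_id], key=lambda t: t["start_time"])
        # Slicing in steps of seq_len matches DIV(ROW_NUMBER() - 1, seq_len).
        for start in range(0, len(ordered), seq_len):
            chunk = [
                # Drop vehicle_id from each element, like * EXCEPT(vehicle_id).
                {k: v for k, v in t.items() if k != "vehicle_id"}
                for t in ordered[start:start + seq_len]
            ]
            sequences.append((vehicle_id, chunk))
    return sequences


# Sample rows mirroring the tables above, deliberately given out of order.
sample = [
    {"vehicle_id": "Vehicle1", "trip_id": f"Trip{i}", "start_time": i}
    for i in range(7, 0, -1)
] + [
    {"vehicle_id": "Vehicle2", "trip_id": f"Trip{i}", "start_time": i + 1}
    for i in range(1, 5)
]

result = chunk_trips(sample, seq_len=4)
for vehicle_id, seq in result:
    print(vehicle_id, [t["trip_id"] for t in seq])
```

This version holds all of a vehicle's trips in memory, so it is suited to spot checks on a sample rather than the full dataset. In an Apache Beam pipeline, a similar per-vehicle function could presumably be applied after grouping rows by vehicle id, under the same assumption that one vehicle's trips fit on a single worker.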


Source: https://stackoverflow.com/questions/58699663/how-to-transform-an-sql-table-into-a-list-of-row-sequences-using-bigquery-and-ap
