问题
Related issue: How to create dummy variable columns for thousands of categories in Google BigQuery
I have a table of list of weighted edges which is a list of user-item rating, it looks like this:
| userId | itemId | rating
| 001 | 001 | 5.0
| 001 | 002 | 4.0
| 002 | 001 | 4.5
| 002 | 002 | 3.0
I want to convert this weighted edge list into a adjacency matrix:
| userId | item001 | item002
| 001 | 5.0 | 4.0
| 002 | 4.5 | 3.0
According to this post, we can do it in two steps, the first step is to extract the matrix entry's value to generate a query, and second step is to run the query which is generated from 1st step.
But my question is how to extract the rating value and use the rating value in the IF()
statement? My intuition is to put a nested query inside the IF()
statement such like:
IF(itemId = blah,
(select rating
from mytable
where
userId = blahblah
and itemId = blah),
0)
But this query looks too expensive, can someone give me an example?
Thanks
回答1:
Unless I am missing something - it is quite similar to the post you referenced
Step 1 - generate query
SELECT 'SELECT userID, ' +
GROUP_CONCAT_UNQUOTED(
'SUM(IF(itemId = "' + STRING(itemId) + '", rating, 0)) AS item' + STRING(itemId)
)
+ ' FROM YourTable GROUP BY userId'
FROM (
SELECT itemId
FROM YourTable
GROUP BY itemId
)
Step 2 - run generated query
SELECT
userID,
SUM(IF(itemId = "001", rating, 0)) AS item001,
SUM(IF(itemId = "002", rating, 0)) AS item002
FROM YourTable
GROUP BY userId
Result as expected
userID item001 item002
001 5.0 4.0
002 4.5 3.0
来源:https://stackoverflow.com/questions/37825142/build-adjacency-matrix-from-list-of-weighted-edges-in-bigquery