I'm trying to replicate, roughly, the dplyr package from R using Python/Pandas (as a learning exercise). Something I'm stuck on is the "piping" functionality.
While I can't help mentioning that using dplyr from Python might be the closest thing to having dplyr in Python (it has the rshift operator, but as a gimmick), I'd also like to point out that the pipe operator might only be necessary in R because R uses generic functions rather than methods as object attributes. Method chaining gives you essentially the same thing without having to override operators:
dataf = (DataFrame(mtcars).
         filter('gear>=3').
         mutate(powertoweight='hp*36/wt').
         group_by('gear').
         summarize(mean_ptw='mean(powertoweight)'))
Note that wrapping the chain in a pair of parentheses lets you break it across multiple lines without needing a trailing \ on each line.
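Since the question is about Pandas, here is a rough sketch of the same idea using only Pandas' own methods. The small `cars` frame is a made-up stand-in for mtcars (the numbers are illustrative, not the real dataset), and `add_power_to_weight` is a hypothetical helper, included only to show how `DataFrame.pipe` lets a plain function join the chain:

```python
import pandas as pd

# Hypothetical stand-in for a few mtcars rows (values are illustrative only).
cars = pd.DataFrame({
    "gear": [4, 4, 3, 5, 2],
    "hp": [110, 90, 150, 200, 100],
    "wt": [2.0, 3.0, 3.0, 4.0, 2.5],
})


def add_power_to_weight(df, factor=36):
    """Hypothetical helper: return a copy of df with a powertoweight column."""
    return df.assign(powertoweight=df["hp"] * factor / df["wt"])


result = (
    cars
    .query("gear >= 3")                        # roughly dplyr's filter()
    .pipe(add_power_to_weight)                 # roughly mutate(), via a plain function
    .groupby("gear", as_index=False)           # roughly group_by()
    .agg(mean_ptw=("powertoweight", "mean"))   # roughly summarize()
)
print(result)
```

Each step returns a new object, so the next method can be called on it directly, which is all the "pipe" needs to be here. With the toy data above, the gear 4 group averages 1980 and 1080 to give a `mean_ptw` of 1530.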