In R (thanks to magrittr) you can now perform operations with a more functional piping syntax via %>%. This means that instead of nesting calls inside-out, e.g. third(second(first(x))), you can write x %>% first %>% second %>% third. Is there a way to get similar functionality in Python?
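For a concrete Python illustration of the problem (a hypothetical example using the same sqrt/str chain the answers below use), the nested form reads inside-out:

from math import sqrt

# without pipes, the chain reads inside-out:
print(str(sqrt(12)))  # '3.4641016151377544'

# the desired pipe form would read in execution order, e.g. 12 | sqrt | str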
One possible way of doing this is with the module macropy. Macropy allows you to apply transformations to the code that you have written, so a | b can be transformed to b(a). This has a number of advantages and disadvantages.
Compared to the solution mentioned by Sylvain Leroux, the main advantage is that you do not need to create infix objects for the functions you are interested in using -- just mark the areas of code where you intend to use the transformation. Secondly, since the transformation is applied at compile time rather than at runtime, the transformed code suffers no overhead during runtime -- all the work is done when the byte code is first produced from the source code.
The main disadvantages are that macropy must be activated in a particular way for it to work (mentioned later). Also, in exchange for the faster runtime, the parsing of the source code is more computationally complex, so the program will take longer to start. Finally, it adds a syntactic style that programmers who are not familiar with macropy may find harder to understand.
run.py
import macropy.activate
# Activates macropy, modules using macropy cannot be imported before this statement
# in the program.
import target
# import the module using macropy
target.py
from fpipe import macros, fpipe
from macropy.quick_lambda import macros, f
# The `from module import macros, ...` must be used for macropy to know which
# macros it should apply to your code.
# Here two macros have been imported `fpipe`, which does what you want
# and `f` which provides a quicker way to write lambdas.
from math import sqrt
# Using the fpipe macro in a single expression.
# The code between the square brackets is interpreted as str(sqrt(12)).
print fpipe[12 | sqrt | str] # prints 3.46410161514
# using a decorator
# All code within the function is examined for `x | y` constructs.
x = 1 # global variable
@fpipe
def sum_range_then_square():
    "expected value (1 + 2 + 3)**2 -> 36"
    y = 4 # local variable
    # `f[_**2]` is macropy syntax for `lambda x: x**2`, which would also work here
    return range(x, y) | sum | f[_**2]

print sum_range_then_square() # prints 36
# using a with block.
# same as a decorator, but for limited blocks.
with fpipe:
    print range(4) | sum # prints 6
    print 'a b c' | f[_.split()] # prints ['a', 'b', 'c']
And finally, the module that does the hard work. I've called it fpipe, for functional pipe, since it emulates shell syntax for passing output from one process to another.
fpipe.py
from macropy.core.macros import *
from macropy.core.quotes import macros, q, ast

macros = Macros()

@macros.decorator
@macros.block
@macros.expr
def fpipe(tree, **kw):
    @Walker
    def pipe_search(tree, stop, **kw):
        """Search code for bitwise or operators and transform `a | b` to `b(a)`."""
        if isinstance(tree, BinOp) and isinstance(tree.op, BitOr):
            operand = tree.left
            function = tree.right
            newtree = q[ast[function](ast[operand])]
            return newtree
    return pipe_search.recurse(tree)
Pipes were added to Pandas in version 0.16.2.
Example:
import pandas as pd
from sklearn.datasets import load_iris
x = load_iris()
x = pd.DataFrame(x.data, columns=x.feature_names)
def remove_units(df):
    df.columns = pd.Index(map(lambda x: x.replace(" (cm)", ""), df.columns))
    return df

def length_times_width(df):
    df['sepal length*width'] = df['sepal length'] * df['sepal width']
    df['petal length*width'] = df['petal length'] * df['petal width']
x.pipe(remove_units).pipe(length_times_width)
x
NB: The Pandas version retains Python's reference semantics. That's why length_times_width doesn't need a return value; it modifies x in place.
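If you prefer to avoid the in-place mutation, a side-effect-free variant is a small change: have the step work on a copy and capture the chain's value explicitly. A minimal sketch (the _pure suffix is my own naming, not a pandas convention):

def length_times_width_pure(df):
    # work on a copy so the caller's frame is left unchanged
    df = df.copy()
    df['sepal length*width'] = df['sepal length'] * df['sepal width']
    df['petal length*width'] = df['petal length'] * df['petal width']
    return df

result = x.pipe(length_times_width_pure)
# `result` has the extra columns; `x` is left unchanged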
If you just want this for personal scripting, you might want to consider using Coconut instead of Python. Coconut is a superset of Python, so you can use Coconut's pipe operator |> while completely ignoring the rest of the Coconut language.
For example:
def addone(x):
    return x + 1

3 |> addone
compiles to
# lots of auto-generated header junk

# Compiled Coconut: -----------------------------------------------------------

def addone(x):
    return x + 1

(addone)(3)
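The pipe chains left to right, so a multi-step pipeline like 12 |> sqrt |> str compiles to roughly the following plain Python (modulo the header junk; the exact output varies by Coconut version):

from math import sqrt

# roughly what Coconut emits for `12 |> sqrt |> str`
print((str)((sqrt)(12)))  # 3.4641016151377544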
Pipe-like functionality can also be achieved by chaining pandas methods with the dot. Here is an example.
Load a sample data frame:
import seaborn
iris = seaborn.load_dataset("iris")
type(iris)
# <class 'pandas.core.frame.DataFrame'>
Illustrate the composition of pandas methods with the dot:
(iris.query("species == 'setosa'")
     .sort_values("petal_width")
     .head())
You can add new methods to the pandas DataFrame if needed (for example, as sketched below):
pandas.DataFrame.new_method = new_method
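For instance, here is a minimal sketch of what such an attachment can look like (select_species is my own name, not a pandas API):

import pandas as pd

def select_species(df, name):
    # @name resolves to this function's local variable inside query()
    return df.query("species == @name")

pd.DataFrame.select_species = select_species

# now it chains like a built-in method:
iris.select_species("setosa").sort_values("petal_width").head()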
PyToolz allows arbitrarily composable pipes; they just aren't defined with that pipe-operator syntax.
See the PyToolz docs for the quickstart, and here's a video tutorial: http://pyvideo.org/video/2858/functional-programming-in-python-with-pytoolz
In [1]: from toolz import pipe
In [2]: from math import sqrt
In [3]: pipe(12, sqrt, str)
Out[3]: '3.4641016151377544'
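Each stage passed to pipe must be a one-argument callable, so for functions that need extra arguments the curried variants shipped in toolz.curried are handy. A small sketch:

from toolz import pipe
from toolz.curried import filter, map

pipe(range(10),
     filter(lambda n: n % 2 == 0),  # keep the even numbers
     map(lambda n: n * n),          # square each one
     sum)                           # 0 + 4 + 16 + 36 + 64 = 120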
One alternative solution would be to use the workflow tool dask. Though it's not as syntactically fun as...
var
| do this
| then do that
...it still allows your variable to flow down the chain and using dask gives the added benefit of parallelization where possible.
Here's how I use dask to accomplish a pipe-chain pattern:
import dask

def a(foo):
    return foo + 1

def b(foo):
    return foo / 2

def c(foo, bar):
    return foo + bar

# pattern = 'name_of_behavior': (method_to_call, variables_to_pass_in, variables_can_be_task_names)
workflow = {'a_task': (a, 1),
            'b_task': (b, 'a_task'),
            'c_task': (c, 99, 'b_task')}

# dask.visualize(workflow)  # visualization available
dask.get(workflow, 'c_task')
# returns 100.0
After having worked with Elixir, I wanted to use the piping pattern in Python. This isn't exactly the same pattern, but it's similar and, as I said, comes with the added benefit of parallelization: if you tell dask to get a task in your workflow that isn't dependent on others running first, those independent tasks will run in parallel.
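For example, here is a small sketch (task and function names are my own) where two tasks share no dependencies, so dask's threaded scheduler is free to run them concurrently:

import dask.threaded

def double(x):
    return x * 2

def add_ten(x):
    return x + 10

def combine(p, q):
    return p + q

workflow = {'d_task': (double, 21),   # no dependencies
            'e_task': (add_ten, 32),  # no dependencies
            'combined': (combine, 'd_task', 'e_task')}

# d_task and e_task may run in parallel; combined waits for both
dask.threaded.get(workflow, 'combined')  # returns 84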
If you wanted easier syntax, you could wrap it in something that takes care of naming the tasks for you. Of course, in this situation you'd need all functions to take the pipe as their first argument, and you'd lose any benefit of parallelization. But if you're OK with that, you could do something like this:
def dask_pipe(initial_var, functions_args):
    '''
    call dask_pipe with an initial_var and either a list of functions
    or a dict mapping each function to a list of its extra arguments:
        workflow, last_task = dask_pipe(initial_var, [function_1, function_2])
        workflow, last_task = dask_pipe(initial_var, {function_1: [], function_2: [arg1, arg2]})
        dask.get(workflow, last_task)
    '''
    workflow = {}
    if isinstance(functions_args, list):
        for ix, function in enumerate(functions_args):
            if ix == 0:
                workflow['task_' + str(ix)] = (function, initial_var)
            else:
                workflow['task_' + str(ix)] = (function, 'task_' + str(ix - 1))
        return workflow, 'task_' + str(ix)
    elif isinstance(functions_args, dict):
        # relies on the dict preserving insertion order (guaranteed in Python 3.7+)
        for ix, (function, args) in enumerate(functions_args.items()):
            if ix == 0:
                workflow['task_' + str(ix)] = (function, initial_var, *args)
            else:
                workflow['task_' + str(ix)] = (function, 'task_' + str(ix - 1), *args)
        return workflow, 'task_' + str(ix)
# piped functions
def foo(df):
    return df[['a', 'b']]

def bar(df, s1, s2):
    return df.columns.tolist() + [s1, s2]

def baz(df):
    return df.columns.tolist()

# setup
import dask
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3], 'c': [1, 2, 3]})
Now, with this wrapper, you can make a pipe following either of these syntactical patterns:
# wf, lt = dask_pipe(initial_var, [function_1, function_2])
# wf, lt = dask_pipe(initial_var, {function_1:[], function_2:[arg1, arg2]})
like this:
# test 1 - lists for functions only:
workflow, last_task = dask_pipe(df, [foo, baz])
print(dask.get(workflow, last_task)) # returns ['a','b']
# test 2 - dictionary for args:
workflow, last_task = dask_pipe(df, {foo:[], bar:['string1', 'string2']})
print(dask.get(workflow, last_task)) # returns ['a','b','string1','string2']