Tableau error “All Fields must be aggregate or constant” when invoking TabPy SCRIPT_REAL

主宰稳场 提交于 2020-02-23 08:17:08

问题


I am calling a TabPy server via a calculated field in a Tableau worksheet to run a hypothesis test: does the rate of Bookings vary significantly by Group?

I have a table such as:

     Group  Bookings
0        A         1
1        A         0
3998     B         1
3999     B         0

In Python, on the same server (using the python 2.7 docker image) the test I want is simply:

from scipy.stats import fisher_exact
df_cont_tbl = pd.crosstab(df['Group'], df['Bookings'])
prop_test = fisher_exact(df_cont_tbl)
print 'Fisher exact test: Odds ratio = {:.2f}, p-value = {:.3f}'.format(*prop_test)

Returns: Fisher exact test: Odds ratio = 1.21, p-value = 0.102

I connected Tableau to the TabPy server and can execute a hello-world calculated field. For example, I get 42 back with the calculated field: SCRIPT_REAL("return 42", ATTR([Group]),ATTR([Bookings]) )

However, I try to invoke the stats function above with a calculated field to extract the p-value:

SCRIPT_REAL("
import pandas as pd
from scipy.stats import fisher_exact
df_cont_tbl = pd.crosstab(_arg1, _arg2)
prop_test = fisher_exact(df_cont_tbl)
return prop_test[1]
", [Group], [Bookings] )

I get the notification: The calculation contains errors with the drop-down All fields must be aggregate or constant when using table calculation functions or fields from multiple data sources

I tried wrapping the inputs with ATTR(), as in:

SCRIPT_REAL("
import pandas as pd
from scipy.stats import fisher_exact
df_cont_tbl = pd.crosstab(_arg1, _arg2)
prop_test = fisher_exact(df_cont_tbl)
return prop_test[1]
",ATTR([Group]), ATTR([Bookings])
)

Which changes the notification to "The calculation is valid" but returns a Pandas ValueError from the server:

An error occurred while communicating with the External Service.
Error processing script
Error when POST /evaluate: Traceback
Traceback (most recent call last):
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tabpy_server/tabpy.py", line 467, in post
result = yield self.call_subprocess(function_to_evaluate, arguments)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tornado/gen.py", line 1008, in run
value = future.result()
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tornado/gen.py", line 1014, in run
yielded = self.gen.throw(*exc_info)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tabpy_server/tabpy.py", line 488, in call_subprocess
ret = yield future
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tornado/gen.py", line 1008, in run
value = future.result()
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/concurrent/futures/_base.py", line 400, in result
return self.__get_result()
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/concurrent/futures/_base.py", line 359, in __get_result
reraise(self._exception, self._traceback)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/concurrent/futures/_compat.py", line 107, in reraise
exec('raise exc_type, exc_value, traceback', {}, locals_)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/concurrent/futures/thread.py", line 61, in run
result = self.fn(*self.args, **self.kwargs)
File "<string>", line 5, in _user_script
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/pandas/tools/pivot.py", line 479, in crosstab
df = DataFrame(data)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/pandas/core/frame.py", line 266, in __init__
mgr = self._init_dict(data, index, columns, dtype=dtype)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/pandas/core/frame.py", line 402, in _init_dict
return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/pandas/core/frame.py", line 5398, in _arrays_to_mgr
index = extract_index(arrays)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/pandas/core/frame.py", line 5437, in extract_index
raise ValueError('If using all scalar values, you must pass'
ValueError: If using all scalar values, you must pass an index
Error type : ValueError
Error message : If using all scalar values, you must pass an index

Example dataset:

To generate the CSV I am connecting to:

import os
import pandas as pd
import numpy as np
from collections import namedtuple

OUTPUT_LOC = os.path.expanduser('~/TabPy_demo/ab_test_demo_results.csv')

GroupObs = namedtuple('GroupObs', ['name','n','p'])

obs = [GroupObs('A',3000,.10),GroupObs('B',1000,.13)] 
# note true odds ratio = (13/87)/(10/90) = 1.345

np.random.seed(2019)

df = pd.concat( [ pd.DataFrame({'Group': grp.name,
                                'Bookings':  pd.Series(np.random.binomial(n=1, 
                                                            p=grp.p, size=grp.n))
                              }) for grp in obs
                  ],ignore_index=True )

df.to_csv(OUTPUT_LOC,index=False)

回答1:


Old question, but perhaps this will help someone else. There are a couple of issues. First is in relation to the way the data is passed to the pd.crosstab. Tableau passes a list of values to the tabpy server so wrap this in an array to fix your error you are getting.

SCRIPT_REAL(
  "
   import pandas as pd
   import numpy as np
   from scipy.stats import fisher_exact
   df_cont_tbl = pd.crosstab(np.array(_arg1), np.array(_arg2))
   prop_test = fisher_exact(df_cont_tbl)
   return prop_test[1]
   ", 
   attr([Group]), attr([Bookings])

   )

Another problem is the way the table calculation is being performed. You want to send tabpy two lists of information each as long as your table. In the default case tableau wants to calculate at the row level which is not going to work.

I included the row count F1 into the csv that I built the workbook on and made sure to calculate the python value along this function.

Now when you put F1 into the worksheet it will return the P-value as many times as you have rows, A workaround for this is to wrap your calculation in another calculation to only return the value if it is the first row and place this in your worksheet.

Now you can place this into a worksheet.



来源:https://stackoverflow.com/questions/56400669/tableau-error-all-fields-must-be-aggregate-or-constant-when-invoking-tabpy-scr

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!