In pyspark 1.6.2, I can import the col function with

from pyspark.sql.functions import col

but when I try to look it up in the GitHub source code, I can't find any col function defined there.
As pointed out by @zero323, there are several Spark functions that have wrappers generated at runtime by adding them to the globals dict and then adding those names to __all__.
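To illustrate the pattern @zero323 describes, here is a rough sketch of how such wrappers can be generated at runtime (a simplified stand-in, not PySpark's actual source; the wrapper body and the list of names are made up for illustration):

__all__ = []

def _make_wrapper(name):
    def wrapper(*args, **kwargs):
        # The real PySpark wrappers delegate to the JVM function of the same name;
        # this stand-in only shows that the function is created at runtime.
        return "{}({}, {})".format(name, args, kwargs)
    wrapper.__name__ = name
    return wrapper

# Inject the wrappers into the module namespace in a loop, so a static analyzer
# never sees a literal "def col(...)" anywhere in the file.
for _name in ["col", "count", "expr", "upper"]:
    globals()[_name] = _make_wrapper(_name)
    __all__.append(_name)

At runtime the import works fine, but PyCharm's static inspection has nothing to resolve, hence the warning.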
As pointed out by @vincent-claes, referencing the functions through the module path (as F, or as something else; I prefer something more descriptive) keeps the imports from showing an error in PyCharm. However, as @nexaspx alluded to in a comment on that answer, that only shifts the warning to the usage line(s). As mentioned by @thomas, pyspark-stubs can be installed to improve the situation.
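For reference, that module-alias approach looks something like this (a small sketch; df is assumed to be an existing DataFrame):

from pyspark.sql import functions as F

# The import line itself raises no inspection warning in PyCharm...
df2 = df.withColumn("name_upper", F.upper(F.col("name")))
# ...but unresolved-reference warnings can now appear on the F.col / F.upper usages.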
But if, for some reason, adding that package is not an option (maybe you are using a Docker image for your environment and can't add it to the image right now), or it isn't working, here is my workaround: first, add an import for just the generated wrapper with an alias, then disable the inspection for just that import. This leaves the inspections active for all the other functions used in the same statements, reduces the warnings to a single spot, and then ignores that one warning.
from pyspark.sql import functions as pyspark_functions
# noinspection PyUnresolvedReferences
from pyspark.sql.functions import col as pyspark_col
# ...
pyspark_functions.round(...)
pyspark_col(...)
If you have several imports, group them like so, so that you need only one noinspection comment:
# noinspection PyUnresolvedReferences
from pyspark.sql.functions import (
col as pyspark_col, count as pyspark_count, expr as pyspark_expr,
floor as pyspark_floor, log1p as pyspark_log1p, upper as pyspark_upper,
)
(This is how PyCharm formatted it when I used the Reformat File command.)
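Put together, a minimal end-to-end use of the aliased imports might look like this (a sketch assuming Spark 2.x+ where SparkSession is available; the data and column names are made up):

from pyspark.sql import SparkSession
from pyspark.sql import functions as pyspark_functions
# noinspection PyUnresolvedReferences
from pyspark.sql.functions import col as pyspark_col, upper as pyspark_upper

spark = SparkSession.builder.appName("noinspection-example").getOrCreate()
df = spark.createDataFrame([("alice", 3.14159), ("bob", 2.71828)], ["name", "value"])

df.select(
    pyspark_upper(pyspark_col("name")).alias("name_upper"),
    pyspark_functions.round(pyspark_col("value"), 2).alias("value_rounded"),
).show()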
While we're on the subject of how to import pyspark.sql.functions, I recommend not importing the individual functions from pyspark.sql.functions, to avoid shadowing Python builtins, which can lead to obscure errors, as @SARose states.
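For example, this hypothetical snippet shows the kind of shadowing that recommendation avoids (sum and round exist both as Python builtins and in pyspark.sql.functions):

# These unaliased imports shadow Python's builtin sum() and round():
from pyspark.sql.functions import sum, round

# Plain-Python code further down now calls the PySpark column functions instead
# of the builtins, and fails (or misbehaves) in ways that can be hard to trace:
total = sum([1, 2, 3])        # no longer the builtin sum; does not return 6
nearest = round(3.14159, 2)   # no longer the builtin round; does not return 3.14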