databricks

How to stop a notebook streaming job gracefully?

泪湿孤枕 submitted on 2020-04-16 13:51:46
Question: I have a streaming application running in a Databricks notebook job (https://docs.databricks.com/jobs.html). I would like to stop the streaming job gracefully using the stop() method of the StreamingQuery class, which is returned by the stream.start() method. That of course requires either having access to the streaming instance mentioned above or accessing the context of the running job itself. In the second case the code could look like this: spark.sqlContext.streams.get(
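A minimal sketch of the second approach, assuming the query was started with an explicit name so it can later be looked up through the StreamingQueryManager (spark.streams, the same manager the question reaches via spark.sqlContext.streams); the rate source and the name "my_stream" are illustrative, not from the question.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Start the stream with an explicit name so the running job can find it later.
query = (spark.readStream.format("rate").load()
         .writeStream.format("memory").queryName("my_stream").start())

# Later, from the job's Spark context, look the query up and stop it gracefully.
for q in spark.streams.active:
    if q.name == "my_stream":
        q.stop()  # StreamingQuery.stop(), as referenced in the question
```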

Where is the Delta table location stored?

北战南征 submitted on 2020-03-25 21:59:29
Question: We just migrated from Parquet to Databricks Delta using the Hive metastore. So far everything seems to work fine: when I print out the location of the new Delta table using DESCRIBE EXTENDED my_table, the location is correct, although it is different from the one found in the hiveMetastore database. When I access the hiveMetastore database I can successfully identify the target table (the provider is also correctly set to Delta). To retrieve the previous information I am executing a join between
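A minimal sketch of the DESCRIBE EXTENDED lookup mentioned above; my_table stands in for the asker's actual table name, and the filter assumes the usual col_name/data_type layout of that command's output.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# DESCRIBE EXTENDED returns rows of (col_name, data_type, comment);
# the "Location" row holds the path backing the Delta table.
details = spark.sql("DESCRIBE EXTENDED my_table")
details.filter("col_name = 'Location'").show(truncate=False)
```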

Saving Matplotlib Output to DBFS on Databricks

你。 submitted on 2020-03-01 04:41:16
Question: I'm writing Python code on Databricks to process some data and output graphs. I want to be able to save these graphs as a picture file (.png or similar; the exact format doesn't really matter) to DBFS. Code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'fruits': ['apple', 'banana'], 'count': [1, 2]})
plt.close()
df.set_index('fruits', inplace=True)
df.plot.bar()
# plt.show()
Things that I tried: plt.savefig("/FileStore/my-file.png"), which fails with [Errno 2] No such file or directory: '
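A minimal sketch of one common way to do this, assuming the cluster exposes DBFS through the local /dbfs FUSE mount so that local-file APIs such as plt.savefig can reach it; the file name is a placeholder.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'fruits': ['apple', 'banana'], 'count': [1, 2]})
df.set_index('fruits', inplace=True)
df.plot.bar()

# Local file APIs see DBFS under /dbfs/, so /FileStore/... becomes /dbfs/FileStore/...
plt.savefig("/dbfs/FileStore/my-file.png")
```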

How to extract a single (column/row) value from a dataframe using PySpark?

不羁的心 submitted on 2020-02-25 22:43:31
Question: Here's my Spark code. It works fine and returns 2517. All I want to do is print "2517 degrees"... but I'm not sure how to extract that 2517 into a variable. I can only display the dataframe, not extract values from it. Sounds super easy, but unfortunately I'm stuck! Any help will be appreciated. Thanks!
df = sqlContext.read.format("csv").option("header", "true").option("inferSchema", "true").option("delimiter", "\t").load("dbfs:/databricks-datasets/power-plant/data")
df
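A minimal sketch of pulling a single value out of a PySpark dataframe into a Python variable, assuming sqlContext is available as in a Databricks notebook; the row/column picked below is illustrative, since the excerpt does not show which expression produces 2517.

```python
df = (sqlContext.read.format("csv")
      .option("header", "true").option("inferSchema", "true")
      .option("delimiter", "\t")
      .load("dbfs:/databricks-datasets/power-plant/data"))

# collect() returns plain Python Row objects on the driver;
# a Row can be indexed by position or by column name.
row = df.limit(1).collect()[0]   # first row; stands in for whatever produced 2517
value = row[0]                   # or row["AT"] to pick a column by name
print(f"{value} degrees")
```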

get datatype of column using pyspark

允我心安 submitted on 2020-02-17 05:51:08
Question: We are reading data from a MongoDB collection. A collection column can hold values of two different types (e.g. (bson.Int64, int) or (int, float)). I am trying to get a datatype using pyspark. My problem is that some columns have mixed datatypes. Assume quantity and weight are the columns:
quantity            weight
---------           --------
12300               656
123566000000        789.6767
1238                56.22
345                 23
345566677777789     21
Actually, we didn't define a data type for any column of the Mongo collection. When I query the count from the pyspark dataframe
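A minimal sketch of inspecting a column's Spark-side datatype; the dataframe construction is illustrative, since the excerpt does not show how the MongoDB collection is loaded.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative dataframe standing in for the one read from MongoDB.
df = spark.createDataFrame(
    [(12300, 656.0), (123566000000, 789.6767), (1238, 56.22)],
    ["quantity", "weight"],
)

# df.dtypes is a list of (column, type-string) pairs; df.schema exposes the
# same information as DataType objects.
print(dict(df.dtypes)["quantity"])     # e.g. 'bigint'
print(df.schema["weight"].dataType)    # e.g. DoubleType()
```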

Saving spark dataframe from azure databricks' notebook job to azure blob storage causes java.lang.NoSuchMethodError

a 夏天 submitted on 2020-02-06 10:14:06
Question: I have created a simple job using a notebook in Azure Databricks. I am trying to save a Spark dataframe from the notebook to Azure Blob Storage. Attaching the sample code:
import traceback
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType
# Attached the spark-submit command used
# spark-submit --master local[1] --packages org.apache.hadoop:hadoop-azure:2.7.2,
# com.microsoft.azure:azure-storage:3.1.0 ./write_to_blob_from_spark.py
# Tried with com.microsoft.azure:azure
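A minimal sketch of the kind of write being attempted, assuming the storage account key is set on the Spark session and the wasbs:// URL points at an existing container; the account, container, key, and output path are placeholders, and the NoSuchMethodError itself is not addressed here.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

# Placeholder storage account, container, and key.
account = "mystorageaccount"
container = "mycontainer"
spark.conf.set(f"fs.azure.account.key.{account}.blob.core.windows.net",
               "<storage-account-key>")

# Illustrative dataframe standing in for the one in the job.
df = spark.createDataFrame(["a", "b", "c"], StringType()).toDF("value")

# Write through the wasbs:// scheme provided by hadoop-azure.
df.write.mode("overwrite").csv(
    f"wasbs://{container}@{account}.blob.core.windows.net/output/")
```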

Scala & DataBricks: Getting a list of Files

久未见 submitted on 2020-02-04 22:58:26
Question: I am trying to make a list of files in an S3 bucket on Databricks within Scala, and then filter them with a regex. I am very new to Scala. The Python equivalent would be:
all_files = map(lambda x: x.path, dbutils.fs.ls(folder))
filtered_files = filter(lambda name: True if pattern.match(name) else False, all_files)
but I want to do this in Scala. From https://alvinalexander.com/scala/how-to-list-files-in-directory-filter-names-scala:
import java.io.File
def getListOfFiles(dir: String): List[File] = {
val d
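A minimal sketch of a Scala equivalent, assuming it runs in a Databricks Scala notebook where dbutils is available (rather than java.io.File, which only sees the driver's local filesystem); the folder path and regex are placeholders.

```scala
// Placeholders for the asker's bucket path (or mount) and regex.
val folder = "dbfs:/mnt/my-bucket/some-prefix/"
val pattern = ".*\\.csv$".r

// dbutils.fs.ls returns FileInfo objects; keep the paths, as in the Python map.
val allFiles = dbutils.fs.ls(folder).map(_.path)

// Keep only the paths the regex matches, as in the Python filter.
val filteredFiles = allFiles.filter(p => pattern.findFirstIn(p).isDefined)
```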