ST_GeomFromText function using Spark / Java

南旧 2020-12-20 03:39

Since ST_GeomFromText is not part of org.apache.spark.sql.functions, Spark does not recognise it internally. I need to first define a UDF for this function. means I

2 answers
  • 2020-12-20 04:06

    Similar questions:

    1. GeoSpark librairy using Spark Java
    2. From ResultSet to Spark dataframe using Java
    3. GeoSpark using Spark / Java
    4. Undefined function: 'ST_GeomFromText' Using Spark / Java

    I think you haven't followed the GeoSparkSQL-Overview/#quick-start thoroughly:

    1. As per the quick start, you need to add GeoSpark-core and GeoSparkSQL to your project's pom.xml or build.sbt
    <!-- Geo spark lib doc - https://datasystemslab.github.io/GeoSpark/api/sql/GeoSparkSQL-Overview/#quick-start-->
            <dependency>
                <groupId>org.datasyslab</groupId>
                <artifactId>geospark-sql_2.3</artifactId>
                <version>1.3.1</version>
            </dependency>
            <!-- https://mvnrepository.com/artifact/com.vividsolutions/jts -->
            <dependency>
                <groupId>com.vividsolutions</groupId>
                <artifactId>jts</artifactId>
                <version>1.13</version>
            </dependency>
            <!-- https://mvnrepository.com/artifact/org.datasyslab/geospark-viz -->
            <dependency>
                <groupId>org.datasyslab</groupId>
                <artifactId>geospark-viz_2.3</artifactId>
                <version>1.3.1</version>
            </dependency>
            <dependency>
                <groupId>org.datasyslab</groupId>
                <artifactId>geospark</artifactId>
                <version>1.3.1</version>
            </dependency>
    
    2. Declare your Spark session
    SparkSession sparkSession = SparkSession.builder()
                    .config("spark.serializer", KryoSerializer.class.getName())
                    .config("spark.kryo.registrator", GeoSparkKryoRegistrator.class.getName())
                    .master("local[*]")
                    .appName("myGeoSparkSQLdemo")
                    .getOrCreate();
    
    3. Register all the functions from geospark-sql_2.3 with the sparkSession so that they can be used directly in Spark SQL
    // register all functions from geospark-sql_2.3 to sparkSession
    GeoSparkSQLRegistrator.registerAll(sparkSession);
    

    Here is a working example:

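            // imports needed by this snippet
            import org.apache.spark.serializer.KryoSerializer;
            import org.apache.spark.sql.Dataset;
            import org.apache.spark.sql.Row;
            import org.apache.spark.sql.SparkSession;
            import org.datasyslab.geospark.serde.GeoSparkKryoRegistrator;
            import org.datasyslab.geosparksql.utils.GeoSparkSQLRegistrator;
            import static org.apache.spark.sql.functions.expr;

            SparkSession sparkSession = SparkSession.builder()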
                    .config("spark.serializer", KryoSerializer.class.getName())
                    .config("spark.kryo.registrator", GeoSparkKryoRegistrator.class.getName())
                    .master("local[*]")
                    .appName("myGeoSparkSQLdemo")
                    .getOrCreate();
    
            // register all functions from geospark-sql_2.3 to sparkSession
            GeoSparkSQLRegistrator.registerAll(sparkSession);
            try {
                System.out.println(sparkSession.catalog().getFunction("ST_Geomfromtext"));
                // Function[name='ST_GeomFromText', className='org.apache.spark.sql.geosparksql.expressions.ST_GeomFromText$', isTemporary='true']
            } catch (Exception e) {
                e.printStackTrace();
            }
            // https://datasystemslab.github.io/GeoSpark/api/sql/GeoSparkSQL-Function/
            Dataset<Row> dataframe = sparkSession.sql("select ST_GeomFromText('POINT(-7.07378166 33.826661)')");
            dataframe.show(false);
            dataframe.printSchema();
            /**
             * +---------------------------------------------+
             * |st_geomfromtext(POINT(-7.07378166 33.826661))|
             * +---------------------------------------------+
             * |POINT (-7.07378166 33.826661)                |
             * +---------------------------------------------+
             */
    
            // using longitude and latitude column from existing dataframe
            Dataset<Row> df = sparkSession.sql("select -7.07378166 as longitude, 33.826661 as latitude");
            df.withColumn("ST_Geomfromtext",
                    expr("ST_GeomFromText(CONCAT('POINT(',longitude,' ',latitude,')'))"))
            .show(false);
            /**
             * +-----------+---------+-----------------------------+
             * |longitude  |latitude |ST_Geomfromtext              |
             * +-----------+---------+-----------------------------+
             * |-7.07378166|33.826661|POINT (-7.07378166 33.826661)|
             * +-----------+---------+-----------------------------+
             */
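
The CONCAT expression above assembles the WKT text inside Spark SQL. As a minimal sketch of the same string-building in plain Java (no Spark or GeoSpark required), the hypothetical helper `WktPoint.toWkt` below puts longitude first and latitude second, as in the example. The class name, the method names, and the range checks are assumptions made for illustration and are not part of GeoSpark; the checks assume geographic coordinates in degrees.

```java
import java.math.BigDecimal;

public class WktPoint {

    // Hypothetical helper, not part of GeoSpark: formats one coordinate
    // without exponent notation (e.g. 1.0E-5 becomes 0.000010).
    private static String plain(double value) {
        return BigDecimal.valueOf(value).toPlainString();
    }

    // Builds "POINT(<longitude> <latitude>)": x (longitude) first, then y (latitude).
    // The range checks assume geographic coordinates in degrees.
    public static String toWkt(double longitude, double latitude) {
        if (Double.isNaN(longitude) || longitude < -180.0 || longitude > 180.0) {
            throw new IllegalArgumentException("longitude out of range: " + longitude);
        }
        if (Double.isNaN(latitude) || latitude < -90.0 || latitude > 90.0) {
            throw new IllegalArgumentException("latitude out of range: " + latitude);
        }
        return "POINT(" + plain(longitude) + " " + plain(latitude) + ")";
    }

    public static void main(String[] args) {
        // Same coordinates as the example in this answer.
        System.out.println(toWkt(-7.07378166, 33.826661));
    }
}
```

The resulting string could be passed to ST_GeomFromText as a literal. The range checks are only a cheap guard against swapping the two arguments, and `toPlainString` is used so that very small values are not written in exponent notation.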
    
  • 2020-12-20 04:20

    I think you should use a library like GeoSpark for that. I don't see the function ST_Geomfromtext listed there, but there are constructors for other formats such as WKT: https://datasystemslab.github.io/GeoSpark/api/sql/GeoSparkSQL-Constructor/#st_geomfromwkt. Many other functions on geometric data types are already implemented, which I believe will make your life much easier if you need to calculate areas, crossing points, intersections, and so on.

    I am not sure which DB you are using (PostGIS, SQL Server Spatial, etc.). The definition of ST_Geomfromtext may differ slightly among them, but WKT should be the same everywhere, as it is a standard text representation of geometry.
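
To illustrate the point that the WKT text itself stays the same whichever constructor receives it, here is a small plain-Java sketch that only builds the SQL strings and does not run them. `WktConstructors` and `selectGeometry` are hypothetical names introduced for this example; ST_GeomFromWKT is the constructor on the linked page and ST_GeomFromText is the one asked about in the question. The sketch assumes the resulting strings would be handed to `sparkSession.sql(...)` in a session where the GeoSpark SQL functions are registered.

```java
public class WktConstructors {

    // Hypothetical helper: wraps a WKT literal in a one-column SELECT
    // that calls the given constructor function.
    public static String selectGeometry(String constructor, String wkt) {
        // WKT has no use for single quotes, so reject them rather than
        // risk breaking the SQL string literal.
        if (wkt.contains("'")) {
            throw new IllegalArgumentException("unexpected quote in WKT: " + wkt);
        }
        return "SELECT " + constructor + "('" + wkt + "') AS geom";
    }

    public static void main(String[] args) {
        // One WKT literal, reused unchanged for both constructor names.
        String polygon = "POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))";
        System.out.println(selectGeometry("ST_GeomFromText", polygon));
        System.out.println(selectGeometry("ST_GeomFromWKT", polygon));
    }
}
```

Only the function name differs between the two queries; the geometry text is identical.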

    Hope this helps
