coalesce

Cast string to number, interpreting null or empty string as 0

Anonymous (unverified), submitted 2019-12-03 03:04:01
Question: I have a Postgres table with a string column carrying numeric values. I need to convert these strings to numbers for math, but I need both NULL values and empty strings to be interpreted as 0.

I can convert empty strings into null values:

    # select nullif('','');
     nullif
    --------

    (1 row)

And I can convert null values into a 0:

    # select coalesce(NULL,0);
     coalesce
    ----------
            0
    (1 row)

And I can convert strings into numbers:

    # select cast('3' as float);
     float8
    --------
          3
    (1 row)

But when I try to combine these techniques, I get errors:
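The excerpt stops before the failing query, so what follows is only a sketch of one common way to chain the three steps: keep everything as text until the very end (so both arguments to coalesce have the same type), then cast once. It runs against SQLite through Python's sqlite3 module as a stand-in for Postgres; the table t, the column val, and the helper to_number_sql are hypothetical. In Postgres the same shape would presumably read coalesce(nullif(val, ''), '0')::float.

```python
import sqlite3


def to_number_sql(values):
    """Return each value as a float, treating NULL and '' as 0."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: one text column holding numeric strings.
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
    conn.executemany("INSERT INTO t (val) VALUES (?)", [(v,) for v in values])
    rows = conn.execute(
        # '' -> NULL, then NULL -> '0' (still text), then a single cast.
        "SELECT CAST(COALESCE(NULLIF(val, ''), '0') AS REAL) FROM t ORDER BY id"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


result = to_number_sql(["3", "", None, "2.5"])
print(result)  # [3.0, 0.0, 0.0, 2.5]
```

The default is written as the string '0' rather than the number 0 so that coalesce never has to reconcile a text argument with a numeric one.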

MySQL GROUP_CONCAT vs. COALESCE concerning NULL values

Anonymous (unverified), submitted 2019-12-03 03:03:02
Question: UPDATE: I just noticed that on the server the table3.note values are NULL, while on my local machine they are empty strings. After this embarrassing discovery I did some testing, and everything works the same on both platforms.

This is what they produce when I have two cells and the second one contains an actual value (the first is NULL):

    //1st
    GROUP_CONCAT(COALESCE(`table3`.`note`, '') SEPARATOR ';') AS `table3_note`
    //var_dump():
    array(2) { [0]=> string(0) "" [1]=> string(4) "Test" }

    //2nd
    GROUP_CONCAT(`table3`.`note` SEPARATOR ';')
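The difference between the two forms comes from the aggregate skipping NULL inputs: wrapping the column in COALESCE turns each NULL into an empty string that still takes a slot in the result. A minimal sketch of that behaviour, using SQLite's group_concat(X, sep) through Python's sqlite3 as a stand-in for MySQL's GROUP_CONCAT(... SEPARATOR ...); the two-row table is made up to match the case described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table3 (id INTEGER PRIMARY KEY, note TEXT)")
# First cell is NULL, second holds an actual value.
conn.executemany("INSERT INTO table3 (note) VALUES (?)", [(None,), ("Test",)])

# NULL becomes '', so it contributes an (empty) element to the result.
with_coalesce = conn.execute(
    "SELECT group_concat(COALESCE(note, ''), ';') FROM table3"
).fetchone()[0]

# NULL is skipped by the aggregate, so only 'Test' remains.
without_coalesce = conn.execute(
    "SELECT group_concat(note, ';') FROM table3"
).fetchone()[0]
conn.close()

print(len(with_coalesce.split(";")))     # two elements, one of them empty
print(len(without_coalesce.split(";")))  # one element
```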

MySQL greatest value in row?

Anonymous (unverified), submitted 2019-12-03 02:23:02
Question: I'm using MySQL with PHP. This is like my table (I'm using 3 values, but there are more):

    id | 1  | 2  | 3
    ---+----+----+------
    1  | 3  | 12 | -29
    2  | 5  | 8  | 8
    3  | 99 | 7  | NULL

I need to get the greatest value's column name in a certain row. It should get:

    id | maxcol
    ---+-------
    1  | 2
    2  | 2
    3  | 1

Are there any queries that will do this? I've been trying, but I can't get it to work right.

Answer 1: Are you looking for something like the GREATEST function? For example:

    SELECT id, GREATEST(col1, col2, col3) FROM tbl WHERE ...

Combine it with a CASE statement to
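The answer is cut off, so the following is a guess at the shape it was heading toward, not the answerer's actual query. It compares the row maximum against each column in a CASE and wraps each column in COALESCE, because a NULL argument would otherwise make the whole maximum NULL (as far as I know this holds for MySQL's GREATEST as well as for SQLite's multi-argument max, which is used here as a stand-in). The sentinel LOW and the column names col1..col3 are assumptions.

```python
import sqlite3

LOW = -1e308  # hypothetical sentinel, assumed smaller than any real value

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, col1 REAL, col2 REAL, col3 REAL)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [(1, 3, 12, -29), (2, 5, 8, 8), (3, 99, 7, None)],
)

# max(a, b, c) with several arguments is SQLite's scalar counterpart of GREATEST.
# The CASE returns the first column whose value equals the row maximum.
query = """
SELECT id,
       CASE max(COALESCE(col1, :low), COALESCE(col2, :low), COALESCE(col3, :low))
           WHEN col1 THEN '1'
           WHEN col2 THEN '2'
           WHEN col3 THEN '3'
       END AS maxcol
FROM tbl
ORDER BY id
"""
maxcols = conn.execute(query, {"low": LOW}).fetchall()
conn.close()

print(maxcols)  # [(1, '2'), (2, '2'), (3, '1')]
```

On a tie the first matching WHEN wins, which is why row 2 reports column 2 rather than column 3.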

How to implement coalesce efficiently in R

Anonymous (unverified), submitted 2019-12-03 02:08:02
Question:

Background

Several SQL dialects (I mostly use PostgreSQL) have a function called coalesce, which returns the first non-null column element for each row. This can be very efficient to use when tables have a lot of NULL elements in them.

I encounter this in a lot of scenarios in R as well, when dealing with loosely structured data that contains a lot of NAs.

I have made a naive implementation myself, but it is ridiculously slow.

    coalesce <- function(...) {
      apply(cbind(...), 1, function(x) {
        x[which(!is.na(x))[1]]
      })
    }

Example

    a <- c(1, 2, NA
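The naive version calls a function once per row; a likely cheaper alternative is to walk the columns instead and fill only the positions that are still missing, so each input vector is visited once. The sketch below shows that column-wise idea in Python rather than R, with None standing in for NA. The function and the sample data are hypothetical and are not taken from the truncated example above.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def coalesce(*columns: Sequence[Optional[T]]) -> List[Optional[T]]:
    """Element-wise: the first value that is not None across the columns."""
    if not columns:
        return []
    result = list(columns[0])
    for column in columns[1:]:
        if len(column) != len(result):
            raise ValueError("all columns must have the same length")
        # Keep what is already filled; take the new column only where missing.
        result = [r if r is not None else c for r, c in zip(result, column)]
    return result


# Hypothetical sample data.
a = [1, 2, None, 4, None]
b = [None, None, None, 5, 6]
c = [7, 8, 9, None, None]
filled = coalesce(a, b, c)
print(filled)  # [1, 2, 9, 4, 6]
```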

Write single CSV file using spark-csv

Anonymous (unverified), submitted 2019-12-03 01:38:01
Question: I am using https://github.com/databricks/spark-csv and trying to write a single CSV file, but I am not able to; it creates a folder instead. I need a Scala function that takes parameters such as a path and a file name and writes that CSV file.

Answer 1: It is creating a folder with multiple files because each partition is saved individually. If you need a single output file (still in a folder) you can repartition (preferred if upstream data is large, but requires a shuffle):

    df
      .repartition(1)
      .write.format("com.databricks.spark.csv")
      .option("header", "true")

Changing a SUM returned NULL to zero

折月煮酒, submitted 2019-12-03 01:12:06
I have a stored procedure as follows:

    CREATE PROC [dbo].[Incidents] (@SiteName varchar(200))
    AS
    SELECT (
        SELECT SUM(i.Logged)
        FROM tbl_Sites s
        INNER JOIN tbl_Incidents i ON s.Location = i.Location
        WHERE s.Sites = @SiteName
          AND i.[month] = DATEADD(mm, DATEDIFF(mm, 0, GetDate()) -1,0)
        GROUP BY s.Sites
    ) AS LoggedIncidents

    'tbl_Sites contains a list of reported on sites.
    'tbl_Incidents contains a generated list of total incidents by site/date (monthly)
    'If a site doesn't have any incidents that month it won't be listed.

The problem I'm having is that a site doesn't have any Incidents this month
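When the inner SELECT matches no rows, the scalar subquery yields NULL, and wrapping it in COALESCE(..., 0) is one way to turn that into zero. The sketch below shows the pattern against SQLite through Python's sqlite3, with a deliberately simplified, hypothetical single-table schema (incidents with site and logged columns) in place of the two tables in the procedure. In T-SQL the same wrapper, or ISNULL, would go around the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema: site "A" has incidents, site "B" has none.
conn.execute("CREATE TABLE incidents (site TEXT, logged INTEGER)")
conn.executemany("INSERT INTO incidents VALUES (?, ?)", [("A", 4), ("A", 6)])


def logged_incidents(site):
    """Sum of logged incidents for a site, or 0 when nothing is listed."""
    row = conn.execute(
        "SELECT COALESCE("
        "  (SELECT SUM(logged) FROM incidents WHERE site = ? GROUP BY site),"
        "  0) AS LoggedIncidents",
        (site,),
    ).fetchone()
    return row[0]


total_a = logged_incidents("A")
total_b = logged_incidents("B")
print(total_a, total_b)  # 10 0
```

The COALESCE sits outside the subquery on purpose: with GROUP BY and no matching rows the subquery returns no row at all, so a COALESCE around SUM inside it would never be evaluated.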

PySpark study notes

Anonymous (unverified), submitted 2019-12-02 23:49:02
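Spark lets you set the number of files that are finally written, which addresses problems such as too many small files, and it is more convenient and intuitive than doing so in Hive. There are two methods, repartition and coalesce, and both are available on RDDs and on DataFrames.

The difference between repartition and coalesce:

    repartition(numPartitions: Int): RDD[T]
    coalesce(numPartitions: Int, shuffle: Boolean = false): RDD[T]

Both redistribute the partitions of an RDD; repartition is simply shorthand for coalesce with shuffle set to true. Suppose the RDD has N partitions and needs to be redistributed into M partitions.

If N < M: the N partitions usually hold unevenly distributed data, and HashPartitioner is used to redistribute the data into M partitions. In this case shuffle must be set to true.

If N > M and the two are not far apart (say N is 1000 and M is 100): several of the N partitions can be merged into one new partition, ending up with M partitions, and shuffle can be set to false. With shuffle set to false, coalesce has no effect when M > N; no shuffle takes place, and the parent RDD and child RDD are linked by a narrow dependency.

If N > M and the two differ greatly: with shuffle set to false, the parent and child RDDs are linked by a narrow dependency and sit in the same stage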
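The two behaviours can be pictured with a toy model in plain Python: a list of lists plays the role of an RDD's partitions. This is only an illustration of the narrow-versus-shuffle distinction described above and not how Spark implements either method (Spark, for instance, also weighs data locality when grouping partitions). The function names coalesce_no_shuffle and repartition_with_shuffle are hypothetical.

```python
from typing import List


def coalesce_no_shuffle(partitions: List[list], m: int) -> List[list]:
    """Merge neighbouring partitions into m groups without moving single elements."""
    n = len(partitions)
    if m >= n:
        # Without a shuffle the partition count cannot grow: nothing changes.
        return [list(p) for p in partitions]
    merged: List[list] = [[] for _ in range(m)]
    for i, part in enumerate(partitions):
        # Each parent partition feeds exactly one child (narrow dependency).
        merged[i * m // n].extend(part)
    return merged


def repartition_with_shuffle(partitions: List[list], m: int) -> List[list]:
    """Route every element by hash, so any parent may feed any child."""
    out: List[list] = [[] for _ in range(m)]
    for part in partitions:
        for x in part:
            out[hash(x) % m].append(x)
    return out


parts = [[0, 1], [2], [3, 4, 5], [6]]
merged = coalesce_no_shuffle(parts, 2)        # 4 partitions -> 2
grown = coalesce_no_shuffle(parts, 8)         # asks for 8, still 4
shuffled = repartition_with_shuffle(parts, 8) # 8 partitions after a shuffle
print(len(merged), len(grown), len(shuffled))  # 2 4 8
```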

Handling NULL values with COALESCE in Hive

房东的猫, submitted 2019-12-02 01:53:37
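COALESCE(expression_1, expression_2, ..., expression_n) evaluates its argument expressions in order, stops at the first non-null value, and returns it. If every expression is null, the result is null. To give a column a default value when it is null, a common form is COALESCE(a, 10), where a is the column name and 10 is the default.

Source: https://www.cnblogs.com/fanhuazhixia/p/11724331.html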