distinct

Efficiently merge string arrays in .NET, keeping distinct values

落花浮王杯 submitted on 2019-12-03 08:08:45
Question: I'm using .NET 3.5. I have two string arrays, which may share one or more values:

    string[] list1 = new string[] { "apple", "orange", "banana" };
    string[] list2 = new string[] { "banana", "pear", "grape" };

I'd like a way to merge them into one array with no duplicate values:

    { "apple", "orange", "banana", "pear", "grape" }

I can do this with LINQ:

    string[] result = list1.Concat(list2).Distinct().ToArray();

but I imagine that's not very efficient for large arrays. Is there a better way?

Answer 1:

Spark DataFrame: count distinct values of every column

Anonymous (unverified) submitted on 2019-12-03 07:50:05
Question: The question is pretty much in the title: Is there an efficient way to count the distinct values in every column in a DataFrame? The describe method provides only the count but not the distinct count, and I wonder if there is a way to get the distinct count for all (or some selected) columns.

Answer 1: Multiple aggregations would be quite expensive to compute, so I'd advise you to use an approximate distinct count:

    val df = Seq((1, 3, 4), (1, 2, 3), (2, 3, 4), (2, 3, 5)).toDF("col1", "col2", "col3")
    val exprs = df

SQL Distinct keyword bogs down performance?

Anonymous (unverified) submitted on 2019-12-03 07:36:14
Question: I have received a SQL query that makes use of the DISTINCT keyword. When I tried running the query, it took at least a minute to join two tables with hundreds of thousands of records and actually return something. I then took out the DISTINCT and it came back in 0.2 seconds. Does the DISTINCT keyword really make things that bad?

EDIT: here's the query:

    SELECT Distinct c.username, o.orderno, o.totalcredits, o.totalrefunds, o.recstatus, o.reason from management.contacts c join management.orders o on (c.custID = o.custID) where o.recDate > to

Using distinct on a column and doing order by on another column gives an error

妖精的绣舞 submitted on 2019-12-03 05:58:30
I have a table abc_test with columns n_num and k_str. This query doesn't work:

    select distinct(n_num) from abc_test order by(k_str)

But this one works:

    select n_num from abc_test order by(k_str)

How do the DISTINCT and ORDER BY keywords work internally such that the two queries behave differently?

Abhishek Bhandari: As far as I understood from your question:

distinct: select distinct values (all selected values should be unique).
order by: simply order the selected rows as per your requirement.

The problem in your first query is: for example, I have a table

    ID  name
    01  a
    02  b
    03  c
    04  d
    04

Oracle 11g SQL to get unique values in one column of a multi-column query

时间秒杀一切 submitted on 2019-12-03 05:46:20
Question: Given a table A of people, their native language, and other columns C3 .. C10 represented by ...

Table A

    PERSON  LANGUAGE  ...
    bob     english
    john    english
    vlad    russian
    olga    russian
    jose    spanish

How do I construct a query which selects all columns of one row for each distinct language?

Desired Result

    PERSON  LANGUAGE  ...
    bob     english
    vlad    russian
    jose    spanish

It doesn't matter to me which row of each distinct language makes the result. In the result above, I chose the lowest row number of each

How to make a “distinct” join with MySQL

别说谁变了你拦得住时间么 submitted on 2019-12-03 05:42:56
I have two MySQL tables (product and price history) that I would like to join:

Product table:

    Id = int
    Name = varchar
    Manufacturer = varchar
    UPC = varchar
    Date_added = datetime

Price_h table:

    Id = int
    Product_id = int
    Price = int
    Date = datetime

I can perform a simple LEFT JOIN:

    SELECT Product.UPC, Product.Name, Price_h.Price, Price_h.Date FROM Product LEFT JOIN Price_h ON Product.Id = Price_h.Product_id;

But as expected, if I have more than one entry for a product in the price history table, I get one result for each historical price. How can I structure a join that will only return one

Efficient Count Distinct with Apache Spark

﹥>﹥吖頭↗ submitted on 2019-12-03 05:30:34
Question: 100 million customers click 100 billion times on the pages of a few websites (let's say 100 websites), and the click stream is available to you in a large dataset. Using the abstractions of Apache Spark, what is the most efficient way to count distinct visitors per website?

Answer 1: visitors.distinct().count() would be the obvious way. With distinct you can specify the level of parallelism and also see an improvement in speed. If it is possible to set up visitors as a stream

DISTINCT with PARTITION BY vs. GROUPBY

会有一股神秘感。 submitted on 2019-12-03 05:19:27
Question: I have found some SQL queries in an application I am examining like this:

    SELECT DISTINCT Company, Warehouse, Item, SUM(quantity) OVER (PARTITION BY Company, Warehouse, Item) AS stock

I'm quite sure this gives the same result as:

    SELECT Company, Warehouse, Item, SUM(quantity) AS stock GROUP BY Company, Warehouse, Item

Is there any benefit (performance, readability, additional flexibility in writing the query, maintainability, etc.) of using the first approach over the latter?

Answer 1: Performance:

How to quickly select DISTINCT dates from a Date/Time field, SQL Server

£可爱£侵袭症+ submitted on 2019-12-03 04:36:42
Question: I am wondering if there is a well-performing query to select distinct dates (ignoring times) from a table with a datetime field in SQL Server. My problem isn't getting the server to actually do this (I've seen this question already, and we had something similar already in place using DISTINCT). The problem is whether there is any trick to get it done more quickly. With the data we are using, our current query is returning ~80 distinct days for which there are ~40,000 rows of data (after

Javascript json data grouping [closed]

烈酒焚心 submitted on 2019-12-03 04:22:59
Question: This question is unlikely to help any future visitors; it is only relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applicable to the worldwide audience of the internet. For help making this question more broadly applicable, visit the help center. Closed 7 years ago.

Sorry if this has been asked before, but I couldn't find a good example of what I'm trying to accomplish. Maybe I'm just not searching for the right