uniq

Text processors - wc, cut, sort, uniq

爷,独闯天下 submitted on 2019-11-27 04:47:38
wc: word count, prints the line, word, and byte counts of a text file. Usage: wc filename, which outputs the line count, word count, file size in bytes, and the filename.
  -l line count
  -w word count
  -c byte count
cut: splits lines into fields.
  -d specify the delimiter; -d[ :] (several delimiters at once) does not work
  -f select which fields to output:
    -f2 field 2
    -f1-3 fields 1 through 3
    -f1,3 fields 1 and 3
  --output-delimiter='xx' print 'xx' between output fields
  [cut's limitations: 1. -d accepts only a single delimiter; 2. it cannot do advanced formatted output; so get fluent with awk]
sort: sorts lines, by default on the leading characters (digits, letters, spaces, and special characters alike), in ASCII order (so upper and lower case sort differently).
  -f ignore case, e.g. A and a are treated as the same code
  -b ignore the leading whitespace
  -M sort by month name, e.g. JAN, DEC, and so on
  -n sort as pure numbers (the default sorts everything as text)
  -r reverse the order
  -u the same as uniq: of identical records, keep only one representative line
  -t set the field separator, tab by default
  -k sort on the given field (field range)
uniq: only consecutive identical lines are treated as duplicates, so the advice is: sort first, then deduplicate.
  -c count how many times each line repeats (important):
  sort filename | uniq -c
tar: an archiver, applied to directories.
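A quick demonstration of the flags above; /etc/passwd is the classic colon-separated example, and names.txt is a made-up input file:
cut -d: -f1,7 --output-delimiter=' ' /etc/passwd   # field 1 (user) and field 7 (shell), space-separated
sort names.txt | uniq -c                           # sort first so duplicates are adjacent, then count them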

Text processing tools – wc, cut, sort, uniq

两盒软妹~` submitted on 2019-11-27 04:46:44
wc command: word count, counts the contents of a text file.
  -l: line count
  -w: word count
  -c: byte count
cut command: splits lines into fields.
  -d DELIMITER: specify the delimiter
  -f: select which fields to output
    #: the #th field
    #,#[,#]: several discrete fields, e.g. 1,3,6
    #-#: a consecutive range of fields, e.g. 1-6
    mixed use: 1-3,7
  --output-delimiter='xx': specify the string printed between output fields
sort command: sorts lines, by default on the leading characters (digits, letters, spaces, and special symbols alike), in ASCII order.
  -f: ignore character case
  -r: reverse order
  -t DELIMITER: specify the field separator
  -k #: sort on the specified field
  -n: sort by numeric value
  -u: uniq, deduplicate after sorting
uniq command: only consecutive identical lines are treated as duplicates.
  -d, --repeated: show only the lines that repeat
  -u, --unique: show only the lines that never repeat
  -c, --count: count how many times each line repeats
  sort FILENAME | uniq -c
Source: https://www.cnblogs.com/azuressy/p/11344106.html
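A short sketch of the three uniq flags listed above (words.txt is a stand-in file name):
sort words.txt | uniq -c   # every distinct line, prefixed with its count
sort words.txt | uniq -d   # only the lines that occur more than once
sort words.txt | uniq -u   # only the lines that occur exactly once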

Rails 3, ActiveRecord, PostgreSQL - “.uniq” command doesn't work?

两盒软妹~` submitted on 2019-11-27 02:39:52
Question: I have the following query:
Article.joins(:themes => [:users]).where(["articles.user_id != ?", current_user.id]).order("Random()").limit(15).uniq
and it gives me the error
PG::Error: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list LINE 1: ...s"."user_id" WHERE (articles.user_id != 1) ORDER BY Random() L...
When I update the original query to
Article.joins(:themes => [:users]).where(["articles.user_id != ?", current_user.id]).order("Random()").limit(15)#.uniq
so the error

Sort and keep a unique duplicate which has the highest value

China☆狼群 submitted on 2019-11-26 23:32:50
Question: I have a file like the one shown below, and I want to keep, for each combination of the first and second fields, the line with the highest value in the third field (the ones marked with arrows; the arrows are not in the actual file).
1 1 10
1 1 12 <-
1 2 6 <-
1 3 4 <-
2 4 32
2 4 37
2 4 39
2 4 40 <-
2 45 12
2 45 15 <-
3 3 12
3 3 15
3 3 17
3 3 19 <-
3 15 4
3 15 9 <-
4 17 25
4 17 28
4 17 32
4 17 36 <-
4 18 4 <-
in order to have an output like this:
1 1 12
1 2 6
1 3 4
2 4 40
2 45 15
3 3 19
3 15 9
4 17
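One possible shell answer, sketched here rather than taken from the truncated question: let awk remember, per (field1, field2) pair, the row with the largest third field, then restore numeric order:
# keep, for each (col1, col2) pair, the row with the largest col3
awk '$3 > max[$1" "$2] { max[$1" "$2]=$3; best[$1" "$2]=$0 }
     END { for (k in best) print best[k] }' file | sort -k1,1n -k2,2n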

Is there a way to 'uniq' by column?

浪子不回头ぞ submitted on 2019-11-26 14:02:32
I have a .csv file like this:
stack2@example.com,2009-11-27 01:05:47.893000000,example.net,127.0.0.1
overflow@example.com,2009-11-27 00:58:29.793000000,example.net,255.255.255.0
overflow@example.com,2009-11-27 00:58:29.646465785,example.net,256.255.255.0
...
I have to remove duplicate e-mails (the entire line) from the file (i.e. one of the lines containing overflow@example.com in the above example). How do I use uniq on only field 1 (separated by commas)? According to man, uniq doesn't have options for columns. I tried something with sort | uniq but it doesn't work.
sort -u -t, -k1,1 file
-u
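The accepted one-liner above, annotated (the flag behavior described is GNU sort's):
sort -u -t, -k1,1 file
# -t,    split fields on commas
# -k1,1  compare lines on field 1 only (the e-mail address)
# -u     output a single line per distinct key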

Remove duplicate lines without sorting [duplicate]

风流意气都作罢 submitted on 2019-11-26 06:56:14
Question: This question already has an answer here: How to delete duplicate lines in a file without sorting it in Unix? (8 answers)
I have a utility script in Python:
#!/usr/bin/env python
import sys

unique_lines = []
duplicate_lines = []

for line in sys.stdin:
    if line in unique_lines:
        duplicate_lines.append(line)
    else:
        unique_lines.append(line)
        sys.stdout.write(line)

# optionally do something with duplicate_lines
This simple functionality (uniq without needing to sort first, stable ordering) must be
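The same stable, no-sort deduplication is often written as an awk one-liner, equivalent in spirit to the script above (input.txt is a stand-in name):
awk '!seen[$0]++' input.txt   # print each line only the first time it appears, preserving input order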

Website visitor IP statistics and filtering, plus fixing a missing build environment on Linux

家住魔仙堡 submitted on 2019-11-26 04:03:08
[Practical queries for website visitor IP statistics (deduplicated)]
(1) Count distinct visitor IPs:
awk '{print $1}' access.log | sort | uniq | wc -l
(2) Count how many times each IP repeats:
awk '{print $1}' access.log | sort | uniq -c
(3) The 10 IPs with the most visits (sort -nr is needed before head, otherwise the list is not ordered by count):
awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -n 10
(4) Find the minutes with the most requests in the log:
awk '{print $4}' access.log | cut -c 14-18 | sort | uniq -c | sort -nr | head
(5) Find the most visited pages in the log:
awk '{print $7}' access.log | sort | uniq -c | sort -nr | head
[Fixing a missing build environment]
If installation fails with the error below, the build environment is missing; install the tools and libraries needed to compile the source code.
"./configure: error: C compiler cc is not found"
Fix: yum install gcc gcc-c++ ncurses-devel perl
Source: 51CTO, author: 天使不会KU, link: https://blog.51cto.com/13520779/2153766
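For reference, query (2) produces count-prefixed lines like the following (the addresses and counts are invented for illustration):
awk '{print $1}' access.log | sort | uniq -c
#   3 10.0.0.5
#  17 192.168.1.9
#   1 203.0.113.7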

How to select unique elements

余生长醉 submitted on 2019-11-26 01:44:26
Question: I would like to extend the Array class with a uniq_elements method which returns those elements with a multiplicity of one. I would also like my new method to accept a block, as uniq does. For example:
t=[1,2,2,3,4,4,5,6,7,7,8,9,9,9]
t.uniq_elements # => [1,3,5,6,8]
Example with a block:
t=[1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]
t.uniq_elements{|z| z.round} # => [2.0, 5.1]
Neither t-t.uniq nor t.to_set-t.uniq.to_set works. I don't care about speed, I call it only once in my
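In this page's shell idiom, "keep only the elements that occur exactly once" is exactly what uniq -u does; a sketch with the first example's numbers:
printf '%s\n' 1 2 2 3 4 4 5 6 7 7 8 9 9 9 | sort | uniq -u   # prints 1, 3, 5, 6, 8, one per line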

The most frequent visitor IPs in web server logs

本小妞迷上赌 submitted on 2019-11-26 00:46:52
The IPs with the most visits in an apache log.
Suppose the apache log format is:
118.78.199.98 - - [09/Jan/2010:00:59:59 +0800] "GET /Public/Css/index.css HTTP/1.1" 304 - "http://www.a.cn/common/index.php" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; GTB6.3)"
Question 1: find the 10 IPs with the most visits in the apache log.
awk '{print $1}' apache_log | sort | uniq -c | sort -nr | head -n 10
awk first pulls the IP out of each log line; if the log format has been customized, use -F to define the delimiter and print to pick the column.
sort does the initial ordering, so that identical records end up adjacent.
uniq -c merges the repeated lines and records how many times each repeats.
sort -nr re-sorts numerically in descending order.
head filters out the top ten.
The command I used as a reference, which shows the 10 most-used shell commands:
sed -e "s/| /\n/g" ~/.bash_history | cut -d ' ' -f 1 | sort | uniq -c | sort -nr | head
Question 2: find the minutes with the most visits in the apache log.
awk
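For question 1 the pipeline prints count-prefixed IPs in descending order, along these lines (the counts and the last two addresses are invented for illustration):
awk '{print $1}' apache_log | sort | uniq -c | sort -nr | head -n 10
#  212 118.78.199.98
#   47 203.0.113.4
#   15 198.51.100.7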