uniq

Text processors - wc, cut, sort, uniq

爷,独闯天下 submitted on 2019-11-27 04:47:38
wc: word count, prints the line, word, and byte counts of a text file. Usage: wc filename, which outputs the line count, word count, file size in bytes, and the filename.
  -l line count
  -w word count
  -c byte count
cut: splits lines into fields.
  -d specify the delimiter; -d[ :] (several delimiters at once) does not work
  -f select which fields to output:
    -f2 field 2
    -f1-3 fields 1 through 3
    -f1,3 fields 1 and 3
  --output-delimiter='xx' print 'xx' between output fields
  [cut's limitations: 1. -d accepts only a single delimiter; 2. it cannot do advanced formatted output; so get fluent with awk]
sort: sorts lines, by default on the leading characters (digits, letters, spaces, and special characters alike), in ASCII order (so upper and lower case sort differently).
  -f ignore case, e.g. A and a are treated as the same code
  -b ignore the leading whitespace
  -M sort by month name, e.g. JAN, DEC, and so on
  -n sort as pure numbers (the default sorts everything as text)
  -r reverse the order
  -u the same as uniq: of identical records, keep only one representative line
  -t set the field separator, tab by default
  -k sort on the given field (field range)
uniq: only consecutive identical lines are treated as duplicates, so the advice is: sort first, then deduplicate.
  -c count how many times each line repeats (important):
  sort filename | uniq -c
tar: an archiver, applied to directories.
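A quick demonstration of the flags above; /etc/passwd is the classic colon-separated example, and names.txt is a made-up input file:
cut -d: -f1,7 --output-delimiter=' ' /etc/passwd   # field 1 (user) and field 7 (shell), space-separated
sort names.txt | uniq -c                           # sort first so duplicates are adjacent, then count them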

Text processing tools – wc, cut, sort, uniq

两盒软妹~` submitted on 2019-11-27 04:46:44
wc command: word count, counts the contents of a text file.
  -l: line count
  -w: word count
  -c: byte count
cut command: splits lines into fields.
  -d DELIMITER: specify the delimiter
  -f: select which fields to output
    #: the #th field
    #,#[,#]: several discrete fields, e.g. 1,3,6
    #-#: a consecutive range of fields, e.g. 1-6
    mixed use: 1-3,7
  --output-delimiter='xx': specify the string printed between output fields
sort command: sorts lines, by default on the leading characters (digits, letters, spaces, and special symbols alike), in ASCII order.
  -f: ignore character case
  -r: reverse order
  -t DELIMITER: specify the field separator
  -k #: sort on the specified field
  -n: sort by numeric value
  -u: uniq, deduplicate after sorting
uniq command: only consecutive identical lines are treated as duplicates.
  -d, --repeated: show only the lines that repeat
  -u, --unique: show only the lines that never repeat
  -c, --count: count how many times each line repeats
  sort FILENAME | uniq -c
Source: https://www.cnblogs.com/azuressy/p/11344106.html
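A short sketch of the three uniq flags listed above (words.txt is a stand-in file name):
sort words.txt | uniq -c   # every distinct line, prefixed with its count
sort words.txt | uniq -d   # only the lines that occur more than once
sort words.txt | uniq -u   # only the lines that occur exactly once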

Rails 3, ActiveRecord, PostgreSQL - “.uniq” command doesn't work?

两盒软妹~` submitted on 2019-11-27 02:39:52
Question: I have the following query:
Article.joins(:themes => [:users]).where(["articles.user_id != ?", current_user.id]).order("Random()").limit(15).uniq
and it gives me the error
PG::Error: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list LINE 1: ...s"."user_id" WHERE (articles.user_id != 1) ORDER BY Random() L...
When I update the original query to
Article.joins(:themes => [:users]).where(["articles.user_id != ?", current_user.id]).order("Random()").limit(15)#.uniq
so the error

Sort and keep a unique duplicate which has the highest value

China☆狼群 submitted on 2019-11-26 23:32:50
Question: I have a file like the one shown below, and I want to keep, for each combination of the first and second fields, the line with the highest value in the third field (the ones marked with arrows; the arrows are not in the actual file).
1 1 10
1 1 12 <-
1 2 6 <-
1 3 4 <-
2 4 32
2 4 37
2 4 39
2 4 40 <-
2 45 12
2 45 15 <-
3 3 12
3 3 15
3 3 17
3 3 19 <-
3 15 4
3 15 9 <-
4 17 25
4 17 28
4 17 32
4 17 36 <-
4 18 4 <-
in order to have an output like this:
1 1 12
1 2 6
1 3 4
2 4 40
2 45 15
3 3 19
3 15 9
4 17
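One possible shell answer, sketched here rather than taken from the truncated question: let awk remember, per (field1, field2) pair, the row with the largest third field, then restore numeric order:
# keep, for each (col1, col2) pair, the row with the largest col3
awk '$3 > max[$1" "$2] { max[$1" "$2]=$3; best[$1" "$2]=$0 }
     END { for (k in best) print best[k] }' file | sort -k1,1n -k2,2n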

Is there a way to 'uniq' by column?

浪子不回头ぞ submitted on 2019-11-26 14:02:32
I have a .csv file like this:
stack2@example.com,2009-11-27 01:05:47.893000000,example.net,127.0.0.1
overflow@example.com,2009-11-27 00:58:29.793000000,example.net,255.255.255.0
overflow@example.com,2009-11-27 00:58:29.646465785,example.net,256.255.255.0
...
I have to remove duplicate e-mails (the entire line) from the file (i.e. one of the lines containing overflow@example.com in the above example). How do I use uniq on only field 1 (separated by commas)? According to man, uniq doesn't have options for columns. I tried something with sort | uniq but it doesn't work.
sort -u -t, -k1,1 file
-u
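The accepted one-liner above, annotated (the flag behavior described is GNU sort's):
sort -u -t, -k1,1 file
# -t,    split fields on commas
# -k1,1  compare lines on field 1 only (the e-mail address)
# -u     output a single line per distinct key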

Remove duplicate lines without sorting [duplicate]

风流意气都作罢 submitted on 2019-11-26 06:56:14
Question: This question already has an answer here: How to delete duplicate lines in a file without sorting it in Unix? (8 answers)
I have a utility script in Python:
#!/usr/bin/env python
import sys

unique_lines = []
duplicate_lines = []

for line in sys.stdin:
    if line in unique_lines:
        duplicate_lines.append(line)
    else:
        unique_lines.append(line)
        sys.stdout.write(line)

# optionally do something with duplicate_lines
This simple functionality (uniq without needing to sort first, stable ordering) must be
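The same stable, no-sort deduplication is often written as an awk one-liner, equivalent in spirit to the script above (input.txt is a stand-in name):
awk '!seen[$0]++' input.txt   # print each line only the first time it appears, preserving input order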

Website visitor IP statistics and filtering, plus fixing a missing build environment on Linux

家住魔仙堡 submitted on 2019-11-26 04:03:08
[Practical queries for website visitor IP statistics (deduplicated)]
(1) Count distinct visitor IPs:
awk '{print $1}' access.log | sort | uniq | wc -l
(2) Count how many times each IP repeats:
awk '{print $1}' access.log | sort | uniq -c
(3) The 10 IPs with the most visits (sort -nr is needed before head, otherwise the list is not ordered by count):
awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -n 10
(4) Find the minutes with the most requests in the log:
awk '{print $4}' access.log | cut -c 14-18 | sort | uniq -c | sort -nr | head
(5) Find the most visited pages in the log:
awk '{print $7}' access.log | sort | uniq -c | sort -nr | head
[Fixing a missing build environment]
If installation fails with the error below, the build environment is missing; install the tools and libraries needed to compile the source code.
"./configure: error: C compiler cc is not found"
Fix: yum install gcc gcc-c++ ncurses-devel perl
Source: 51CTO, author: 天使不会KU, link: https://blog.51cto.com/13520779/2153766
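For reference, query (2) produces count-prefixed lines like the following (the addresses and counts are invented for illustration):
awk '{print $1}' access.log | sort | uniq -c
#   3 10.0.0.5
#  17 192.168.1.9
#   1 203.0.113.7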

How to select unique elements

余生长醉 submitted on 2019-11-26 01:44:26
Question: I would like to extend the Array class with a uniq_elements method which returns those elements with a multiplicity of one. I would also like my new method to accept a block, as uniq does. For example:
t=[1,2,2,3,4,4,5,6,7,7,8,9,9,9]
t.uniq_elements # => [1,3,5,6,8]
Example with a block:
t=[1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]
t.uniq_elements{|z| z.round} # => [2.0, 5.1]
Neither t-t.uniq nor t.to_set-t.uniq.to_set works. I don't care about speed, I call it only once in my
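In this page's shell idiom, "keep only the elements that occur exactly once" is exactly what uniq -u does; a sketch with the first example's numbers:
printf '%s\n' 1 2 2 3 4 4 5 6 7 7 8 9 9 9 | sort | uniq -u   # prints 1, 3, 5, 6, 8, one per line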

The most frequent visitor IPs in web server logs

本小妞迷上赌 submitted on 2019-11-26 00:46:52
The IPs with the most visits in an apache log.
Suppose the apache log format is:
118.78.199.98 - - [09/Jan/2010:00:59:59 +0800] "GET /Public/Css/index.css HTTP/1.1" 304 - "http://www.a.cn/common/index.php" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; GTB6.3)"
Question 1: find the 10 IPs with the most visits in the apache log.
awk '{print $1}' apache_log | sort | uniq -c | sort -nr | head -n 10
awk first pulls the IP out of each log line; if the log format has been customized, use -F to define the delimiter and print to pick the column.
sort does the initial ordering, so that identical records end up adjacent.
uniq -c merges the repeated lines and records how many times each repeats.
sort -nr re-sorts numerically in descending order.
head filters out the top ten.
The command I used as a reference, which shows the 10 most-used shell commands:
sed -e "s/| /\n/g" ~/.bash_history | cut -d ' ' -f 1 | sort | uniq -c | sort -nr | head
Question 2: find the minutes with the most visits in the apache log.
awk
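For question 1 the pipeline prints count-prefixed IPs in descending order, along these lines (the counts and the last two addresses are invented for illustration):
awk '{print $1}' apache_log | sort | uniq -c | sort -nr | head -n 10
#  212 118.78.199.98
#   47 203.0.113.4
#   15 198.51.100.7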