uniq

bash tail on a live log file, counting uniq lines with same date/time

余生颓废 submitted on 2019-12-10 14:58:16
Question: I'm looking for a good way to tail a live log file and display the number of lines with the same date/time. Currently this works: tail -F /var/logs/request.log | [cut the date-time] | uniq -c BUT the performance is not good enough: there is a delay of more than one minute, and it outputs in bulks of a few lines each time. Any idea? Answer 1: Your problem is most likely related to buffering in your system, not anything intrinsically wrong with your line of code. I was able to create a test
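The buffering diagnosis above suggests the standard remedy: force line-buffering on each pipeline stage. A minimal sketch, assuming GNU coreutils `stdbuf` and a log whose first 16 characters are the date plus hh:mm timestamp (both assumptions, not stated in the answer):

```shell
# Live version: stdbuf -oL makes each stage flush per line instead of
# per 4 KB block, so counts appear promptly.
#   tail -F /var/logs/request.log | stdbuf -oL cut -c1-16 | stdbuf -oL uniq -c
# The same counting step, shown here on a finite sample:
printf '%s\n' \
  '2019-12-10 14:58:01 GET /a' \
  '2019-12-10 14:58:02 GET /b' \
  '2019-12-10 14:59:00 GET /c' \
  | cut -c1-16 | uniq -c
```

Note that `uniq -c` still prints a group's count only once the next group begins, so the final minute's count appears when the timestamp rolls over.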

Using awk to get the maximum value of a column, for each unique value of another column

为君一笑 submitted on 2019-12-10 10:04:51
Question: So I have a file such as: 10 1 abc 10 2 def 10 3 ghi 20 4 elm 20 5 nop 20 6 qrs 30 3 tuv I would like to get the maximum value of the second column for each value of the first column, i.e.: 10 3 ghi 20 6 qrs 30 3 tuv How can I do this using awk or similar unix commands? Answer 1: You can use awk: awk '$2>max[$1]{max[$1]=$2; row[$1]=$0} END{for (i in row) print row[i]}' file Output: 10 3 ghi 20 6 qrs 30 3 tuv Explanation: the awk command uses an associative array max with $1 as the key and $2 as the value. Every
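The answer's one-liner can be checked end to end; here the sample rows are fed in via printf instead of a file, and the output is piped through sort because the iteration order of awk's `for (i in row)` is unspecified:

```shell
# Keep the whole row whose 2nd field is the max seen for its 1st field.
printf '%s\n' '10 1 abc' '10 2 def' '10 3 ghi' \
              '20 4 elm' '20 5 nop' '20 6 qrs' '30 3 tuv' \
  | awk '$2>max[$1]{max[$1]=$2; row[$1]=$0} END{for (i in row) print row[i]}' \
  | sort
# -> 10 3 ghi
#    20 6 qrs
#    30 3 tuv
```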

A collection of grep commands for analyzing Apache logs on Linux

你说的曾经没有我的故事 submitted on 2019-12-10 06:25:08
A collection of grep commands for analyzing Apache logs on Linux; with these commands you can handle the bulk of everyday Apache log analysis. For reference, the English month abbreviations that appear in the logs: Jan. January, Feb. February, Mar. March, Apr. April, May May, June June, July July, Aug. August, Sept. September, Oct. October, Nov. November, Dec. December.
Log analysis recipes (grep):
1. Find the top 20 most-requested URLs on 2012-05-04, sorted by count: cat access.log |grep '04/May/2012'| awk '{print $11}'|sort|uniq -c|sort -nr|head -20
Find the IP addresses whose requested URL contains www.abc.com: cat access_log | awk '($11~/\www.abc.com/){print $1}'|sort|uniq -c|sort -nr
2. Get the 10 IP addresses with the most requests (this can also be restricted by time): cat linewow-access.log|awk '{print $1}'|sort|uniq -c|sort -nr|head
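The count-and-rank idiom behind recipe 1 can be sketched on a few fabricated combined-log lines. Note the URL lands in field $7 here; the article's $11 depends on the exact log format, so treat the field number as an assumption:

```shell
# print URL field, sort so uniq -c can group, count, rank descending
printf '%s\n' \
  '1.1.1.1 - - [04/May/2012:10:00:00 +0000] "GET /a HTTP/1.1" 200 123' \
  '1.1.1.1 - - [04/May/2012:10:00:01 +0000] "GET /a HTTP/1.1" 200 123' \
  '2.2.2.2 - - [04/May/2012:10:00:02 +0000] "GET /b HTTP/1.1" 200 123' \
  | awk '{print $7}' | sort | uniq -c | sort -nr | head -20
# -> 2 /a
#    1 /b
```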

calling uniq and sort in different orders in shell

荒凉一梦 submitted on 2019-12-07 18:15:46
Question: Is there a difference in the order of uniq and sort when calling them in a shell script? I'm talking here about time and space. grep 'somePattern' | uniq | sort vs. grep 'somePattern' | sort | uniq A quick test on a 140k-line text file showed a slight speed improvement (5.5 s vs 5.0 s) for the first method (get uniq values and then sort). I don't know how to measure memory usage, though. The question now is: does the order make a difference? Or is it dependent on the returned greplines
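One point worth settling before timing anything: the two orders are not equivalent, because uniq only collapses adjacent duplicates. A minimal demonstration:

```shell
# uniq first: the two b's are not adjacent, so the duplicate survives
printf 'b\na\nb\n' | uniq | sort   # prints a, b, b
# sort first: duplicates become adjacent, so uniq removes them
printf 'b\na\nb\n' | sort | uniq   # prints a, b
```

So uniq-then-sort is only safe when the input already has its duplicates grouped; otherwise sort-then-uniq (or `sort -u`) is required.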

Replacing an SQL query with unix sort, uniq and awk

ぐ巨炮叔叔 submitted on 2019-12-07 07:32:26
We currently have some data on an HDFS cluster on which we generate reports using Hive. The infrastructure is in the process of being decommissioned, and we are left with the task of coming up with an alternative way of generating the reports on the data (which we imported as tab-separated files into our new environment). Assume we have a table with the following fields: Query, IPAddress, LocationCode. The original SQL query we used to run on Hive was (well, not exactly... but something similar): select COUNT(DISTINCT Query, IPAddress) as c1, LocationCode as c2, Query as c3 from table group by Query,
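A hedged sketch of an equivalent sort/uniq/awk pipeline (the column order, tab separation, and the truncated GROUP BY clause are all assumptions): deduplicate whole rows first, then count rows per (Query, LocationCode) pair, which yields the distinct (Query, IPAddress) count within each group:

```shell
# columns assumed: Query <TAB> IPAddress <TAB> LocationCode
printf 'q1\t1.1.1.1\tUS\nq1\t1.1.1.1\tUS\nq1\t2.2.2.2\tUS\nq2\t1.1.1.1\tDE\n' \
  | sort -u \
  | awk -F'\t' '{c[$1 "\t" $3]++} END{for (k in c) print c[k] "\t" k}' \
  | sort
# -> 1  q2  DE
#    2  q1  US
```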

Common text-processing commands

跟風遠走 submitted on 2019-12-06 18:55:29
Contents: 1. awk (basic syntax; filtering records; specifying the field separator; special keywords; regular expressions; writing to different output files; interacting with environment variables) 2. grep 3. sed 4. sort and uniq 5. Practice: process a given file's contents, extracting the domain names and producing a sorted count. awk examples.
Many Linux text tools use regular expressions, which can greatly simplify system administration work. There are plenty of regex tutorials online, so they are not repeated here; I learned from the regular-expression tutorial on the Runoob (菜鸟) site, and an afternoon of reading plus a few experiments is enough for the basics. Apart from positive lookahead and lookbehind, which are a bit more involved, most of it is quite simple, and you can always look the syntax up as you go instead of memorizing it. This section focuses on awk and the other text-processing tools that use regular expressions.
1. awk. awk programs must be enclosed in single quotes. Basic syntax: awk -F'input separator' 'BEGIN{initialization} condition {per-line action}... END{final cleanup}' There can be multiple condition/action blocks in the middle; each input line is run through every condition, while the BEGIN and END blocks execute only once.
Filtering records: awk '$3==0 && $6=="LISTEN"' netstat.txt awk '$3==0 && $6=="LISTEN" || NR==1' netstat.txt
Specifying the separator: awk -F: '{print $1,$3,$6}' /etc/passwd which is equivalent to
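The BEGIN / per-line / END shape described above, as one tiny runnable example that sums the second colon-separated field (the input data is made up):

```shell
# BEGIN runs once, the middle block runs per line, END runs once
printf 'a:1\nb:2\nc:3\n' \
  | awk -F: 'BEGIN{sum=0} {sum+=$2} END{print "total:", sum}'
# -> total: 6
```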

How to delete duplicate lines in a file…AWK, SED, UNIQ not working on my file

断了今生、忘了曾经 submitted on 2019-12-06 09:23:55
I have found many ways to do this (AWK, SED, UNIQ), but none of them are working on my file. I want to delete duplicate lines. Here is an example of part of my file: KTBX KFSO KCLK KTBX KFSO KCLK PAJZ PAJZ NOTE: I had to manually add line feeds when I cut and pasted from the file... for some reason it was putting all the variables on one line. That makes me think my 44,000-line text file actually has only "1" line? Is there a way to modify it so I can delete the dups? Answer (philshem): You can see all non-printed characters with this command: od -c oldfile If all your records are on one line, you can use sed to
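If the whole file really is "one line", the record separators may be bare carriage returns rather than newlines, which is a guess consistent with the od -c suggestion. Normalizing them first and then applying the usual order-preserving awk dedup idiom handles both problems:

```shell
# turn \r separators into newlines, then keep each line's first occurrence
printf 'KTBX\rKFSO\rKCLK\rKTBX\rKFSO\rKCLK\rPAJZ\rPAJZ\r' \
  | tr '\r' '\n' \
  | awk '!seen[$0]++'
# -> KTBX, KFSO, KCLK, PAJZ (one per line, original order kept)
```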

using Linux cut, sort and uniq

随声附和 submitted on 2019-12-06 04:17:19
Question: I have a list with population, year, and county, and I need to cut the list and then find the number of unique counties. The list starts off like this: #Population, Year, County 3900, 1969, Beaver 3798, 1970, Beaver 3830, 1971, Beaver 3864, 1972, Beaver 3993, 1973, Beaver 3976, 1974, Beaver 4064, 1975, Beaver There is much more to this list, and many more counties. I have to cut out the county column, sort it, and then output the number of unique counties. I tried this command: cut -c3- list.txt
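The attempted `cut -c3-` slices characters, not fields; for a comma-separated file the delimiter form is what's needed. A sketch on two of the sample rows plus one invented second county, so the count actually changes:

```shell
# field 3 is the county; sort -u dedupes, wc -l counts distinct values
printf '%s\n' '3900, 1969, Beaver' '3798, 1970, Beaver' '4064, 1975, Box Elder' \
  | cut -d',' -f3 | sort -u | wc -l
```

This prints 2 for the sample above. Stripping the leading space (e.g. with `sed 's/^ //'`) may be wanted before comparing values against other data.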

“Illegal Byte sequence” error while using shell commands in mac bash terminal

十年热恋 submitted on 2019-12-06 03:42:48
Question: I am getting an "illegal byte sequence" error while trying to extract non-English characters from a large file in the macOS bash shell. This is the script that I am trying to use: sed 's/[][a-z,0-9,A-Z,!@#\$%^&*(){}":/_-|. -][\;''=?]*//g' < $1 >Abhineet_extract1.txt; sed 's/\(.\)/\1\ /g' <Abhineet_extract1.txt | sort | uniq |tr -d '\n' >&1; rm Abhineet_extract1.txt; and here is the error that I am getting: uniq: stdin: Illegal byte sequence '+? Answer 1: It seems that a UTF-8 locale is causing Illegal byte
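The usual workaround is to run the affected stages under the C locale, so sort and uniq handle the stream as raw bytes instead of attempting (and failing) UTF-8 decoding. Shown here on plain input; on a file containing stray non-UTF-8 bytes this is what avoids the error:

```shell
# LC_ALL=C overrides the UTF-8 locale for just these commands
printf 'b\na\na\n' | LC_ALL=C sort | LC_ALL=C uniq
# -> a
#    b
```

Exporting `LC_ALL=C` at the top of the script applies the same fix to every command in it.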

[Linux] A shell script to extract email addresses from logs

谁说胖子不能爱 submitted on 2019-12-06 03:17:46
The requirement is to extract all email addresses from the logs, filtered by a given keyword. The addresses appear in the format \txxx@xxx\t, and the logs are stored in directories named by date.
#!/bin/bash
logBasePath="/data1/mailLog/app/kafka/";
monthYearDay=`date -d "1 day ago" +"%Y-%m-%d"`;
# daily email count for Android
logPath="${logBasePath}${monthYearDay}/api-mail-sina-com-cn.log";
tmpFile="/tmp/${monthYearDay}.android.email.log";
echo "start android email...";
cat $logPath|grep '2026078627'|grep -oP '\\t[^\\]+@.*?\\t'|sed 's/\\t//g'|uniq|sort -u > $tmpFile;
echo $tmpFile;
androidEmailNum=`wc -l ${tmpFile}`;
# daily email count for iOS
tmpFile="/tmp/${monthYearDay}.ios.email.log";
echo "start ios email...";
cat $logPath|grep '2503566089'|grep -oP '\\t[
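The extraction idiom at the heart of the script, demonstrated on one fabricated log line. GNU grep is assumed for -oP; note also that `sort -u` alone would suffice here, since the preceding `uniq` sees unsorted input and is redundant:

```shell
# the log stores literal \t markers (backslash + t) around each address;
# grep -oP pulls out each \t...@...\t span, sed strips the markers
printf 'x\\tfoo@example.com\\ty x\\tbar@example.com\\tz\n' \
  | grep -oP '\\t[^\\]+@.*?\\t' | sed 's/\\t//g' | sort -u
# -> bar@example.com
#    foo@example.com
```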