anonymize

Anonymizing data / replacing names

◇◆丶佛笑我妖孽 submitted on 2019-12-24 14:13:49
Question: Normally I anonymize my data by using hashlib and the .apply(hash) function. Now I'm trying a new approach. Imagine I have the following df called 'data':

contributor -- amount payed
eric -- 10
frank -- 28
john -- 49
frank -- 77
barbara -- 31

I want to anonymize it by turning the names into 'person1', 'person2', etc., like this:

contributor -- amount payed
person1 -- 10
person2 -- 28
person3 -- 49
person2 -- 77
person4 -- 31

So my first thought was summarizing the name column so the
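
The renaming described in the question can be sketched in plain Python as follows. This is only an illustration, not the asker's or any answerer's code: the helper name `pseudonymize` and the `prefix` parameter are made up here, and the list stands in for the 'contributor' column.

```python
def pseudonymize(names, prefix="person"):
    """Replace each distinct name with prefix + N, numbered by first appearance.

    Returns the relabelled list and the name -> label mapping.
    `pseudonymize` and `prefix` are hypothetical names used for illustration.
    """
    mapping = {}
    labels = []
    for name in names:
        if name not in mapping:
            # First time this name is seen: give it the next free number.
            mapping[name] = f"{prefix}{len(mapping) + 1}"
        labels.append(mapping[name])
    return labels, mapping


# Stand-in for the 'contributor' column from the question.
contributors = ["eric", "frank", "john", "frank", "barbara"]
labels, mapping = pseudonymize(contributors)
print(labels)  # ['person1', 'person2', 'person3', 'person2', 'person4']
```

Repeated names get the same label, which is what makes 'frank' map to 'person2' on both rows. The mapping dictionary is the re-identification key, so it would need to be discarded or stored separately from the anonymized data. In pandas, `pandas.factorize` produces the same first-appearance numbering as integer codes.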

Which algorithm for hashing name, firstName and birth-date of a person

拜拜、爱过 submitted on 2019-12-13 12:34:08
Question: I have to save the combination of last name, first name, and birth date of a person as a hash. This hash is later used to search for the same person with exactly the same properties. My question is whether SHA-1 is a meaningful algorithm for this. As far as I understand SHA-1, there is virtually no possibility that two different persons (with different attributes) will ever get the same hash value. Is this right?

Answer 1: If you want to search for a person knowing only those credentials, you could
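
A lookup key of the kind described in the question might be built as in the sketch below. Note the swap: it uses HMAC with SHA-256 rather than the plain SHA-1 the question asks about, because names and birth dates come from a small value space that can be enumerated and hashed by anyone who knows the scheme. `SECRET_KEY`, `normalize`, and `person_key` are hypothetical names, and the normalization rules are an assumption about what "the same person" should mean.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical secret; in practice it would be loaded from protected configuration.
SECRET_KEY = b"replace-with-a-real-secret"


def normalize(value):
    """Make matching insensitive to case, surrounding spaces, and Unicode form."""
    return unicodedata.normalize("NFKC", value).strip().casefold()


def person_key(last_name, first_name, birth_date, key=SECRET_KEY):
    """Return a hex lookup key for one person (hypothetical helper).

    Fields are joined with the ASCII unit separator so that
    ("ab", "c") and ("a", "bc") cannot produce the same message.
    """
    fields = [normalize(last_name), normalize(first_name), birth_date.strip()]
    message = "\x1f".join(fields).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


key_a = person_key("Doe", "Jane", "1980-01-31")
key_b = person_key(" doe ", "JANE", "1980-01-31")
print(key_a == key_b)  # True: both spellings normalize to the same fields
```

The key is deterministic, so the stored value can be recomputed from the same three attributes at search time. Accidental collisions between different people are not a practical concern with a 256-bit digest; the exposure is guessing, which the secret key is meant to limit.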

Replacing strings in file, using patterns from another file

女生的网名这么多〃 submitted on 2019-12-11 07:29:40
Question: I need to replace many strings in many files using another file with patterns (like a database of strings). It is for anonymizing files. Example:

File #1:
"Administrator";"512";"Built-in account for administering the computer/domain";"False";"False";"Administrator";"True";"True";"True";"S-1-5-21-3445027559-693823181-3401817782-500";"User";"OK";"23. 1. 2012 9:41:34";"20. 1. 2012 16:01:33";"10";"True";*

File #2 (the pattern file):
Guest;user1
Administrator;user2
system;user3

The output in File
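
One way to apply such a pattern file is sketched below in Python. It assumes each pattern line has the form `old;new` and that matching is case-sensitive and literal; `load_mapping` and `build_replacer` are hypothetical helper names, and the pattern lines are given inline instead of being read from File #2.

```python
import re


def load_mapping(lines):
    """Parse 'old;new' lines into a dict, skipping blank lines."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        old, new = line.split(";", 1)
        mapping[old] = new
    return mapping


def build_replacer(mapping):
    """Return a function that applies every replacement in one pass over the text."""
    if not mapping:
        return lambda text: text
    # Longest patterns first, so a longer pattern wins over one that is its prefix.
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(old) for old in ordered))
    return lambda text: pattern.sub(lambda m: mapping[m.group(0)], text)


# Pattern lines from File #2 in the question.
pattern_lines = ["Guest;user1", "Administrator;user2", "system;user3"]
replace = build_replacer(load_mapping(pattern_lines))

line = '"Administrator";"512";"Built-in account for administering the computer/domain"'
print(replace(line))
# "user2";"512";"Built-in account for administering the computer/domain"
```

Compiling all patterns into a single alternation means each file is scanned once regardless of how many patterns there are, and replaced text is never re-scanned, so a replacement value cannot be rewritten by another pattern.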

Anonymization of Account Numbers in 2TB of CSV's

回眸只為那壹抹淺笑 submitted on 2019-12-10 20:13:19
Question: I have ~2TB of CSVs where the first 2 columns contain two ID numbers. These need to be anonymized so the data can be used in academic research. The anonymization can be (but does not have to be) irreversible. These are NOT medical records, so I do not need the fanciest cryptographic algorithm. The question: standard hashing algorithms make really long strings, but I will have to do a bunch of ID matching (i.e. 'for subset of rows in data containing ID XXX, do...') to process the anonymized
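
A short, fixed-length replacement ID can be produced with a keyed hash truncated to a few bytes, applied row by row so the files never need to fit in memory. The sketch below is one possible approach under stated assumptions, not the accepted answer: it assumes the IDs sit in columns 0 and 1, that the files have no header row, and that 8 bytes (16 hex characters) is an acceptable length. `SECRET_KEY`, `short_id`, and `anonymize_rows` are hypothetical names.

```python
import csv
import hashlib
import io

# Hypothetical secret; discarding it afterwards makes the mapping irreversible in practice.
SECRET_KEY = b"replace-with-a-real-secret"


def short_id(value, key=SECRET_KEY, digest_size=8):
    """Keyed BLAKE2b digest truncated to digest_size bytes (16 hex chars by default)."""
    return hashlib.blake2b(
        value.encode("utf-8"), key=key, digest_size=digest_size
    ).hexdigest()


def anonymize_rows(reader, writer, id_columns=(0, 1)):
    """Stream rows from reader to writer, replacing the ID columns."""
    for row in reader:
        for index in id_columns:
            row[index] = short_id(row[index])
        writer.writerow(row)


# Small in-memory demo; real use would open the CSV files instead.
source = io.StringIO("1001,2002,5.0\n1001,3003,7.5\n")
target = io.StringIO()
anonymize_rows(csv.reader(source), csv.writer(target))
rows = list(csv.reader(io.StringIO(target.getvalue())))
print(rows[0][0] == rows[1][0])  # True: the same ID always maps to the same token
```

Because the mapping is deterministic, ID matching across rows and files still works on the anonymized values. The shorter the digest, the higher the chance that two distinct IDs collide; with 64 bits that only becomes likely once the number of distinct IDs reaches the billions.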

Which algorithm for hashing name, firstName and birth-date of a person

丶灬走出姿态 submitted on 2019-12-06 04:57:51
I have to save the combination of last name, first name, and birth date of a person as a hash. This hash is later used to search for the same person with exactly the same properties. My question is whether SHA-1 is a meaningful algorithm for this. As far as I understand SHA-1, there is virtually no possibility that two different persons (with different attributes) will ever get the same hash value. Is this right?

If you want to search for a person knowing only those credentials, you could store the SHA-1 in the database (or MD5 for speed, unless you have like a quadrillion people to sample). The hash

Anonymize IP logging in nginx?

江枫思渺然 submitted on 2019-11-30 04:38:48
To respect the privacy of my users I'm trying to anonymize their IP addresses in nginx log files. One way to do this would be defining a custom log format, like so:

log_format noip '127.0.0.1 - [$time_local] '
    '"$request" $status $body_bytes_sent '
    '"$http_referer" "$http_user_agent" $request_time';

This method has two downsides: I can't distinguish between two users, and I can't use geolocation tools. The best thing would be to 'shorten' the IP address (87.12.23.55 would become 87.12.23.1). Is there a possibility to achieve this using nginx config scripting? Thanks.

Mike Bretz
Even if there

Anonymize IP logging in nginx?

天大地大妈咪最大 submitted on 2019-11-29 00:03:28
Question: To respect the privacy of my users I'm trying to anonymize their IP addresses in nginx log files. One way to do this would be defining a custom log format, like so:

log_format noip '127.0.0.1 - [$time_local] '
    '"$request" $status $body_bytes_sent '
    '"$http_referer" "$http_user_agent" $request_time';

This method has two downsides: I can't distinguish between two users, and I can't use geolocation tools. The best thing would be to 'shorten' the IP address (87.12.23.55 would become 87.12.23.1).
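
The 'shortening' itself amounts to masking the host part of the address. The Python sketch below only illustrates that transformation, for example when post-processing existing logs; it is not nginx configuration and does not answer how to do this inside nginx. The function name `anonymize_ip` and the default prefix lengths (/24 for IPv4, /48 for IPv6) are assumptions. It yields a final octet of 0 rather than the 1 used in the question's example.

```python
import ipaddress


def anonymize_ip(address, v4_prefix=24, v6_prefix=48):
    """Zero the host bits of an address, keeping only the network prefix.

    `anonymize_ip` and the default prefix lengths are illustrative choices.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets ip_network accept an address with host bits set.
    network = ipaddress.ip_network(f"{address}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("87.12.23.55"))  # 87.12.23.0
```

Keeping the first three octets preserves coarse geolocation while making all clients in the same /24 indistinguishable from each other, which is the trade-off the question describes.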