问题
I found a "fun" project online for f# and the idea behind it is to find the number of substrings within a given string.
Here's the prompt:
Description:
You are given a DNA sequence:
a string that contains only characters 'A', 'C', 'G', and 'T'.
Your task is to calculate the number of substrings of sequence,
in which each of the symbols appears the same number of times.
Example 1:
For sequence = "ACGTACGT", the output should be 6
All substrings of length 4 contain each symbol exactly once (+5),
and the whole sequence contains each symbol twice (+1).
Example 2:
For sequence = "AAACCGGTTT", the output should be 1
Only substring "AACCGGTT" satisfies the criterion above: it contains each symbol twice.
Input: String, a sequence that consists only of symbols 'A', 'C', 'G', and 'T'.
Length constraint: 0 < sequence.length < 100000.
Output: Integer, the number of substrings where each symbol appears equally many times.
I'm not exactly sure where to go with this, or more specifically what to do. I've looked around on the internet to try and find what I'm supposed to do and I've only found the following code (I added the input variable, var variable, and changed the show "things" to input then the substring to search for (i hope that makes sense)):
open System
let countSubstring (where :string) (what : string) =
match what with
| "" -> 0
| _ -> (where.Length - where.Replace(what, @"").Length) / what.Length
[<EntryPoint>]
let main argv =
let input = System.Console.ReadLine();
let var = input.Length;
Console.WriteLine(var);
let show where what =
printfn @"countSubstring(""%s"", ""%s"") = %d" where what (countSubstring where what)
show input "ACGT"
show input "CGTA"
show input "GTAC"
show input "TACG"
0
Anyways, if anyone can help me with this, it would be greatly appreciated.
Thanks in advance
回答1:
First declare a function numberACGT
that from a string returns 1 if the number of characters A is the same as C, G and T and 0 otherwise. For this, declare an array N of 4 integers initialized to 0 and run throw the string, incrementing the corresponding counter. In late compare array elements between them.
Then for each sub-string (fixed length multiple of 4) call numberACGT
and add to counter count
(initialized to 0 at the beginning)
let numberACGT (aString:string) =
let N = Array.create 4 (0:int)
let last = aString.Length - 1
for i = 0 to last do
match aString.[i] with
| 'A' -> N.[0] <- N.[0] + 1
| 'C' -> N.[1] <- N.[1] + 1
| 'G' -> N.[2] <- N.[2] + 1
| _ -> N.[3] <- N.[3] + 1
if (N.[0] = N.[1]) && (N.[1] = N.[2]) && (N.[2] = N.[3]) then 1 else 0
let numberSubStrings (aString:string) =
let mutable count = 0
let len = aString.Length
for k = 1 to len / 4 do //only multiple of 4
for pos = 0 to len - 4*k do
count <- count + numberACGT (aString.[pos..pos+4*k-1])
count
I hope that it is fast enough.
[<EntryPoint>]
let main argv =
let stopWatch = System.Diagnostics.Stopwatch.StartNew()
let input = Console.ReadLine() in
printf "%i " (numberSubStrings input)
stopWatch.Stop()
let g = Console.ReadLine()
0
Result:
62 4.542700
An new version in O(n²):
let numberSubStringsBis (aString:string) =
let mutable count = 0
let len = aString.Length
for pos = 0 to len - 1 do
let mutable a = 0
let mutable c = 0
let mutable g = 0
let mutable t = 0
let mutable k = pos
while k + 3 <= len - 1 do
for i in [k..k+3] do
match aString.[i] with
| 'A' -> a <- a + 1
| 'C' -> c <- c + 1
| 'G' -> g <- g + 1
| _ -> t <- t + 1
k <- k + 4
if a=c && c=g && g=t then count <- count + 1
count
回答2:
Here is a solution that generates all substrings that have length divisible by four and then counts how many of those have equal amount of symbols. Note that if the length of a substring is not divisible by four it cannot have equal amount of four different symbols.
let hasEqualAmountOfSymbols (substring : string) =
let symbolAppearances =
['A'; 'C'; 'G'; 'T']
|> List.map (fun symbol ->
substring
|> Seq.filter ((=) symbol)
|> Seq.length)
symbolAppearances
|> List.pairwise
|> List.forall (fun (x, y) -> x = y)
let countSubstrings input =
let potentialSubstrings =
let lastIndex = String.length input - 1
[ for i in 0 .. lastIndex do
for j in i + 3 .. 4 .. lastIndex do
yield input.Substring(i, j - i + 1) ]
potentialSubstrings
|> List.filter hasEqualAmountOfSymbols
|> List.length
countSubstrings "ACGTACGT" // -> 6
countSubstrings "AAACCGGTTT" // -> 1
来源:https://stackoverflow.com/questions/40010725/how-do-i-find-a-substring-within-a-string-in-f