How do I find a substring within a string in F#?

那年仲夏 posted on 2020-01-03 13:31:06

Question


I found a "fun" project online for F#. The idea behind it is to count the substrings of a given string that meet a certain condition.

Here's the prompt:

Description:
You are given a DNA sequence:
a string that contains only characters 'A', 'C', 'G', and 'T'.
Your task is to calculate the number of substrings of sequence,
in which each of the symbols appears the same number of times.

Example 1:
For sequence = "ACGTACGT", the output should be 6
All substrings of length 4 contain each symbol exactly once (+5),
and the whole sequence contains each symbol twice (+1).

Example 2:
For sequence = "AAACCGGTTT", the output should be 1
Only substring "AACCGGTT" satisfies the criterion above: it contains each symbol twice.


Input: String, a sequence that consists only of symbols 'A', 'C', 'G', and 'T'.
Length constraint: 0 < sequence.length < 100000.

Output: Integer, the number of substrings where each symbol appears equally many times.

I'm not exactly sure where to go with this, or more specifically what to do. I've looked around online and only found the following code. (I added the input variable and the var variable, and changed the show calls to take input followed by the substring to search for; I hope that makes sense.)

open System

let countSubstring (where : string) (what : string) =
    match what with
    | "" -> 0
    | _ -> (where.Length - where.Replace(what, @"").Length) / what.Length


[<EntryPoint>]
let main argv =

    let input = System.Console.ReadLine()
    let var = input.Length
    Console.WriteLine(var)
    let show where what =
        printfn @"countSubstring(""%s"", ""%s"") = %d" where what (countSubstring where what)
    show input "ACGT"
    show input "CGTA"
    show input "GTAC"
    show input "TACG"
    0

Anyways, if anyone can help me with this, it would be greatly appreciated.

Thanks in advance


Answer 1:


First, declare a function numberACGT that takes a string and returns 1 if the characters A, C, G and T all occur the same number of times, and 0 otherwise. To do this, declare an array N of 4 integers initialized to 0 and run through the string, incrementing the corresponding counter. Finally, compare the array elements with one another.

Then, for each substring whose length is a multiple of 4, call numberACGT and add the result to a counter count (initialized to 0 at the beginning).

let numberACGT (aString:string) =
    let N = Array.create 4 (0:int)
    let last = aString.Length - 1 
    for i = 0 to last do
        match aString.[i] with
        | 'A' -> N.[0] <- N.[0] + 1
        | 'C' -> N.[1] <- N.[1] + 1
        | 'G' -> N.[2] <- N.[2] + 1
        | _ -> N.[3] <- N.[3] + 1
    if (N.[0] = N.[1]) && (N.[1] = N.[2]) && (N.[2] = N.[3]) then 1 else 0 

let numberSubStrings (aString:string) =
    let mutable count = 0
    let len = aString.Length 
    for k = 1 to len / 4 do //only multiple of 4
        for pos = 0 to len - 4*k do
            count <- count + numberACGT (aString.[pos..pos+4*k-1])
    count

I hope that it is fast enough.

[<EntryPoint>]
let main argv = 
  let stopWatch = System.Diagnostics.Stopwatch.StartNew()
  let input =  Console.ReadLine() in
    printf "%i  " (numberSubStrings input)
  stopWatch.Stop()
  let g =  Console.ReadLine()
  0

Result:

62    4.542700

A new version in O(n²):

let numberSubStringsBis (aString:string) =
    let mutable count = 0 
    let len = aString.Length 
    for pos = 0 to len - 1 do
        let mutable a = 0 
        let mutable  c = 0 
        let mutable g = 0 
        let mutable t = 0 
        let mutable k = pos 
        while k + 3 <= len - 1 do
            for i in k .. k + 3 do
                match aString.[i] with
                | 'A' -> a <- a + 1
                | 'C' -> c <- c + 1
                | 'G' -> g <- g + 1
                | _ -> t <- t + 1
            k <- k + 4 
            if a=c && c=g && g=t then count <- count + 1               
    count
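
With sequences of up to 100000 characters, even an O(n²) scan may turn out to be slow. The following is a sketch of a different technique, not taken from the answers here: a prefix-difference count. A substring is balanced exactly when the triple (#A - #C, #A - #G, #A - #T) is the same for the prefix ending just before it and the prefix ending at its last character, so it is enough to count pairs of prefixes that share a triple. The name countBalancedSubstrings and the int64 result type are assumptions made for illustration.

```fsharp
// Hypothetical helper, not part of the original answers.
// Counts substrings in which 'A', 'C', 'G' and 'T' occur equally often.
let countBalancedSubstrings (sequence: string) : int64 =
    // Update the running triple (#A - #C, #A - #G, #A - #T) with one symbol.
    // As in the code above, any character other than A, C, G is treated as T.
    let step (dC, dG, dT) symbol =
        match symbol with
        | 'A' -> (dC + 1, dG + 1, dT + 1)
        | 'C' -> (dC - 1, dG, dT)
        | 'G' -> (dC, dG - 1, dT)
        | _ -> (dC, dG, dT - 1)
    sequence
    |> Seq.scan step (0, 0, 0)   // one triple per prefix, including the empty prefix
    |> Seq.countBy id            // how many prefixes share each triple
    |> Seq.sumBy (fun (_, n) -> int64 n * int64 (n - 1) / 2L)   // pairs of equal prefixes

printfn "%d" (countBalancedSubstrings "ACGTACGT")    // 6, as in Example 1
printfn "%d" (countBalancedSubstrings "AAACCGGTTT")  // 1, as in Example 2
```

Each character is visited once and Seq.countBy groups the triples by hashing, so the expected running time should grow linearly with the length of the sequence.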



Answer 2:


Here is a solution that generates all substrings whose length is divisible by four and then counts how many of them contain equal numbers of each symbol. Note that a substring whose length is not divisible by four cannot contain the four symbols in equal numbers.

let hasEqualAmountOfSymbols (substring : string) =
    let symbolAppearances =
        ['A'; 'C'; 'G'; 'T']
        |> List.map (fun symbol ->
            substring
            |> Seq.filter ((=) symbol)
            |> Seq.length)
    symbolAppearances
    |> List.pairwise
    |> List.forall (fun (x, y) -> x = y)


let countSubstrings input =
    let potentialSubstrings =
        let lastIndex = String.length input - 1
        [ for i in 0 .. lastIndex do
            for j in i + 3 .. 4 .. lastIndex do
                yield input.Substring(i, j - i + 1) ]
    potentialSubstrings
    |> List.filter hasEqualAmountOfSymbols
    |> List.length


countSubstrings "ACGTACGT" // -> 6
countSubstrings "AAACCGGTTT" // -> 1


Source: https://stackoverflow.com/questions/40010725/how-do-i-find-a-substring-within-a-string-in-f
