How do I parse a .srt subtitle file

六眼飞鱼酱① submitted on 2020-06-17 15:29:26

Question


I'm trying to load and parse a .srt subtitle file in VB.net. It is a very simple text file, but I'm having difficulty.

Here is the structure:

1
00:00:01,600 --> 00:00:04,200
English (US)

2
00:00:05,900 --> 00:00:07,999
This is a subtitle in American English
Sometimes subtitles have 2 lines

3
00:00:10,000 --> 00:00:14,000
Adding subtitles is very easy to do

Each entry consists of:

  • A number
  • Followed by the start and end time
  • Followed by the text, which can be one or multiple lines
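The structure described above can be read block by block. The following is only a minimal sketch, not a full SRT parser: `SrtEntry`, `SrtSketch` and `ParseSrt` are hypothetical names introduced for illustration, and the code assumes that blocks are separated by blank lines and that the timing line contains " --> ".

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical container for one subtitle block (not part of any library).
Public Class SrtEntry
    Public Property Index As Integer
    Public Property StartTime As String
    Public Property EndTime As String
    Public Property Text As New List(Of String)
End Class

Public Module SrtSketch
    ' Splits the lines of an .srt file into entries.
    ' Assumes: blank line between blocks, counter line first, timing line second.
    Public Function ParseSrt(lines As IEnumerable(Of String)) As List(Of SrtEntry)
        Dim entries As New List(Of SrtEntry)
        Dim block As New List(Of String)

        For Each line As String In lines
            If line.Trim().Length = 0 Then
                AddBlock(block, entries)
                block.Clear()
            Else
                block.Add(line)
            End If
        Next
        AddBlock(block, entries) ' the last block may not be followed by a blank line
        Return entries
    End Function

    Private Sub AddBlock(block As List(Of String), entries As List(Of SrtEntry))
        Dim number As Integer
        ' Skip anything that does not look like "number, timing line, text..."
        If block.Count < 2 OrElse Not Integer.TryParse(block(0).Trim(), number) Then Return
        Dim times = block(1).Split(New String() {" --> "}, StringSplitOptions.None)
        If times.Length <> 2 Then Return

        Dim entry As New SrtEntry With {
            .Index = number,
            .StartTime = times(0).Trim(),
            .EndTime = times(1).Trim()
        }
        ' Everything after the timing line is subtitle text (one or more lines)
        entry.Text.AddRange(block.GetRange(2, block.Count - 2))
        entries.Add(entry)
    End Sub
End Module
```

With the example file above, the last entry returned by `ParseSrt` would have an `EndTime` of 00:00:14,000. Reading the whole file into entries is more than the question strictly needs, but it makes the number / timing / text layout explicit.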

What I'm really trying to do is find the length in time of the subtitle file, meaning the last end time in the file. I'm creating a program that hard-codes subtitles into a video file, so I need to know how long the video should be based on the length of the subtitle file.

The outcome I'm looking for is:

After reading a .srt file, I want to know its "length" in time, meaning the last time code. In the example above that would be 00:00:14,000, the last moment a subtitle is displayed.


Answer 1:


You can do it easily with LINQ and File.ReadLines:

Dim SrtTimeCode As String = ""
Dim lastTimeLine As String = File.ReadLines(FILE_NAME) _
    .LastOrDefault(Function(s) s.Contains(" --> "))

If lastTimeLine IsNot Nothing Then
    SrtTimeCode = lastTimeLine.Split(New String() {" --> "}, StringSplitOptions.None)(1)
End If

Note that File.ReadLines keeps only the current line in memory when enumerating the lines. It does not store the whole file. This scales better with big files.
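Since the goal is a video length, it may help to turn the extracted end time into a TimeSpan rather than keeping it as text. The following is a small sketch under that assumption; `SrtTimeSketch` and `ParseSrtTime` are hypothetical names, and the format string assumes the usual hh:mm:ss,fff layout with hours below 24.

```vbnet
Imports System
Imports System.Globalization

Public Module SrtTimeSketch
    ' Converts an SRT timecode such as "00:00:14,000" into a TimeSpan.
    ' Returns Nothing when the text is not in hh:mm:ss,fff form.
    Public Function ParseSrtTime(timeCode As String) As TimeSpan?
        Dim result As TimeSpan
        ' The backslashes mark ":" and "," as literal characters in the format
        If TimeSpan.TryParseExact(timeCode.Trim(), "hh\:mm\:ss\,fff",
                                  CultureInfo.InvariantCulture, result) Then
            Return result
        End If
        Return Nothing
    End Function
End Module
```

For the example file, `ParseSrtTime("00:00:14,000")` yields a TimeSpan of 14 seconds, so `.Value.TotalSeconds` can be passed on to whatever builds the video.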




Answer 2:


This can also be done with regular expressions:

Imports System.IO
Imports System.Text.RegularExpressions
'...

Private Sub TheCaller()
    Dim srtFile As String = "English.srt"
    Dim endTime = "Not Found!"

    If File.Exists(srtFile) Then
        '\s? tolerates an optional space on either side of the comma
        Dim patt As String = ">.(\d\d:\d\d:\d\d\s?,\s?\d{3})"
        'Get the last match, --> 00:00:14,000 in your example:
        Dim lastMatch = File.ReadLines(srtFile).
            LastOrDefault(Function(x) Regex.IsMatch(x, patt))

        If lastMatch IsNot Nothing Then
            endTime = Regex.Match(lastMatch, patt).Groups(1).Value
        End If
    End If

    Console.WriteLine(endTime)
End Sub

The output is (regex101):

00:00:14,000

If you want to get rid of the milliseconds part, then use the following pattern instead:

Dim patt As String = ">.(\d\d:\d\d:\d\d)"

and you will get (regex101):

00:00:14



Answer 3:


Comments and explanations in-line.

Private Sub OpCode()
    'Using Path.Combine you don't have to worry about if the backslash is there or not
    Dim theFile1 = Path.Combine(Application.StartupPath(), ListBox1.SelectedItem.ToString)
    'A StreamReader needs to be closed and disposed; File.ReadAllLines opens the file, reads it, and closes it.
    'It returns an array of lines
    Dim lines = File.ReadAllLines(theFile1)
    Dim LastLineIndex = lines.Length - 1
    Dim lastLine As String = lines(LastLineIndex)
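    'You tried to parse the entire line. You only want the first character.
    'The length check skips blank lines at the end of the file, where Substring(0, 1) would throw.
    Do Until lastLine.Length > 0 AndAlso Integer.TryParse(lastLine.Substring(0, 1), Nothing)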
        LastLineIndex -= 1
        lastLine = lines(LastLineIndex)
    Loop
    'The lower case c tells the compiler that the preceding string is really a Char.
    Dim splitLine = lastLine.Split(">"c)
    'Starting at index 1 because there is a space between > and 0
    Dim SrtEndTimeCode As String = splitLine(1).Substring(1, 12)
    MessageBox.Show(SrtEndTimeCode)
End Sub



Answer 4:


Well I guess I got it - it's probably not the best code, but it works:

Here's what's going on in the code. I have a ListBox with .srt files. The code takes the selected .srt file and loads it into a TextBox. It then scans backwards from the last line, up to 20 lines (to leave room for extra line breaks at the end of the file, etc.), looking for the first line that contains only an integer, which is the counter of the last subtitle. The line after that is the timecode, and the part to the right of the arrow is the end time. That end time is the "length" of the .srt file.

   Dim appPath As String = Application.StartupPath() ' app path
        Dim theFile1 As String

        theFile1 = appPath & "\" & ListBox1.SelectedItem.ToString 'this is where i have the .srt files

        Dim FILE_NAME As String = theFile1

        Dim TextLine As String

        If System.IO.File.Exists(FILE_NAME) = True Then

            'Using closes and disposes the reader even if an exception is thrown
            Using objReader As New System.IO.StreamReader(FILE_NAME)

                Do While objReader.Peek() <> -1

                    TextLine = TextLine & objReader.ReadLine() & vbNewLine

                Loop

            End Using

            TextBox7.Text = TextLine ' load .srt into textbox

        Else

            MessageBox.Show("File Does Not Exist")

        End If
        Dim SrtTimeCode As String
        SrtTimeCode = ""

        If TextBox7.Lines.Any = True Then ' only execute if textbox has lines

            Dim lastLine As String

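            'Check from the end of the text back at most 20 lines for the final subtitle chunk;
            'Math.Min keeps the index in range when the file has fewer than 20 lines
            For i = 1 To Math.Min(20, TextBox7.Lines.Length)
                lastLine = TextBox7.Lines(TextBox7.Lines.Length - i)

                If Integer.TryParse(lastLine, Nothing) Then   ' if the counter line of the last subtitle is found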

                    SrtTimeCode = TextBox7.Lines(TextBox7.Lines.Length - i + 1) 'the last timecode has been found - now it needs to be split

                    GoTo TheEnd
                End If


            Next i
        End If


theEnd:
        Dim ChoppedSRTTimeCodeFinal As String = ""
        'Take everything after ">" and trim the space that follows the arrow
        Dim ChoppedSRTTimeCode As String = SrtTimeCode.Substring(SrtTimeCode.IndexOf(">"c) + 1).Trim()

        'Drop the milliseconds (everything from the comma on); skipped when no timecode was found
        If ChoppedSRTTimeCode.Contains(",") Then
            ChoppedSRTTimeCodeFinal = ChoppedSRTTimeCode.Substring(0, ChoppedSRTTimeCode.IndexOf(","))
        End If

        MsgBox(ChoppedSRTTimeCodeFinal) ' this is the final timecode parsed


Source: https://stackoverflow.com/questions/59326128/how-do-i-parse-a-srt-subtitle-file
