Reading a character string of unknown length

Posted by 依然范特西╮ on 2019-12-20 09:43:10

Question


I have been tasked with writing a Fortran 95 program that will read character input from a file, and then (to start with) simply spit it back out again. The tricky part is that these lines of input are of varying length (no maximum length given) and there can be any number of lines within the file.

I've used

    do
      read( 1, *, iostat = IO ) DNA    ! reads to EOF -- GOOD!!
      if ( IO < 0 ) exit               ! if EOF is reached, exit do
      I = I + 1
      NumRec = I                       ! used later for total no. of records
      allocate( Seq(I) )
      Seq(I) = DNA
      print*, I, Seq(I)
      X = Len_Trim( Seq(I) )           ! length of individual sequence
      print*, 'Sequence size: ', X
      print*
    end do

However, my initial statements list

    character(100), dimension(:), allocatable :: Seq
    character(100)  DNA

and the appropriate integers etc.

I guess what I'm asking is if there is any way to NOT list the size of the character strings in the first instance. Say I've got a string of DNA that is 200+ characters, and then another that is only 25, is there a way that the program can just read what there is and not need to include all the additional blanks? Can this be done without needing to use len_trim, since it can't be referenced in the declaration statements?


Answer 1:


To progressively read a record in Fortran 95, use non-advancing input. For example:

    CHARACTER(10) :: buffer
    INTEGER :: size
    READ (unit, "(A)", ADVANCE='NO', SIZE=size, EOR=10, END=20) buffer

will read up to 10 characters worth (the length of buffer) each time it is called. The file position will only advance to the next record (the next line) once the entire record has been read by a series of one or more non-advancing reads.

Barring an end of file condition, the size variable will be defined with the actual number of characters read into buffer each time the read statement is executed.

The EOR and END specifiers are used to control execution flow (execution will jump to the appropriately labelled statement) when end of record or end of file conditions occur respectively. You can also use an IOSTAT specifier to detect these conditions, but the particular negative values to use for the two conditions are processor dependent.

You can sum size within a particular record to work out the length of that particular record.

Wrap such a non-advancing read in a loop that detects the end of file and end of record conditions and you have the incremental reading part.
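A minimal Fortran 95 sketch of such a loop follows. The module and procedure names (record_length_sketch, next_record_length), the 10-character chunk size and the variable name nread (used instead of size to avoid hiding the SIZE intrinsic) are illustrative choices, not part of the original answer.

```fortran
module record_length_sketch
  implicit none
contains
  ! Measures the next record on `unit` by summing the SIZE= values
  ! returned by a series of non-advancing reads.
  ! On a normal return the file is positioned after the measured record.
  subroutine next_record_length(unit, length, at_eof)
    integer, intent(in)  :: unit
    integer, intent(out) :: length
    logical, intent(out) :: at_eof
    character(len=10) :: buffer     ! chunk size is arbitrary
    integer :: nread

    length = 0
    at_eof = .false.
    do
      read (unit, '(A)', advance='NO', size=nread, eor=10, end=20) buffer
      length = length + nread       ! full chunk read, record continues
    end do
10  length = length + nread         ! last (possibly empty) chunk of the record
    return
20  at_eof = .true.                 ! nread is not defined in this case
  end subroutine next_record_length
end module record_length_sketch
```

Calling next_record_length repeatedly until at_eof comes back true gives the length of every record in the file.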

In Fortran 95, the length specification for a local character variable must be a specification expression - essentially an expression that can be safely evaluated prior to the first executable statement of the scope that contains the variable's declaration. Constants represent the simplest case, but a specification expression in a procedure can involve dummy arguments of that procedure, amongst other things.

Reading the entire record of arbitrary length in is then a multi stage process:

  • Determine the length of the current record by using a series of incremental reads. These incremental reads for a particular record finish when the end of record condition occurs, at which time the file position will have moved to the next record.
  • Backspace the file back to the record of interest.
  • Call a procedure, passing the length of the current record as an argument. Inside that procedure have a character variable whose length is given by the corresponding dummy argument.
  • Inside that called procedure, read the current record into that character variable using normal advancing input.
  • Carry out further processing on that character variable!

Note that each record ends up being read twice - once to determine its length, the second to actually read the data into the correctly "lengthed" character variable.
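The stages listed above could be sketched in Fortran 95 roughly as follows. The names two_pass_read, read_and_echo and echo_record are hypothetical, the 64-character chunk size is arbitrary, and the "further processing" is reduced to writing the line back out, as the question asks.

```fortran
module two_pass_read
  implicit none
contains
  ! Pass 1 measures the current record, then the file is backed up and
  ! pass 2 reads it into a character variable of exactly that length.
  subroutine read_and_echo(unit, out_unit, at_eof)
    integer, intent(in)  :: unit, out_unit
    logical, intent(out) :: at_eof
    character(len=64) :: buffer
    integer :: nread, length

    length = 0
    at_eof = .false.
    do
      read (unit, '(A)', advance='NO', size=nread, eor=10, end=20) buffer
      length = length + nread
    end do
10  length = length + nread
    backspace (unit)                ! return to the start of the measured record
    call echo_record(unit, out_unit, length)
    return
20  at_eof = .true.
  end subroutine read_and_echo

  subroutine echo_record(unit, out_unit, n)
    integer, intent(in) :: unit, out_unit, n
    character(len=n) :: line        ! automatic length taken from the dummy argument
    read (unit, '(A)') line         ! ordinary advancing read of the whole record
    write (out_unit, '(A)') line    ! further processing would go here
  end subroutine echo_record
end module two_pass_read
```

A driver would call read_and_echo in a loop and exit once at_eof is true.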

Alternative approaches exist that use allocatable (or automatic) character arrays of length one. The overall strategy is the same. Look at the code of the Get procedures in the common ISO_VARYING_STRING implementation for an example.

Fortran 2003 introduces deferred length character variables, which can have their length specified by an arbitrary expression in an allocate statement or, for allocatable variables, by the length of the right hand side in an assignment statement. This (in conjunction with other "allocatable" enhancements) allows the progressive read that determines the record length to also build the character variable that holds the contents of the record. Your supervisor needs to bring his Fortran environment up to date.
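As a small illustration of those two ways of fixing the length of a deferred length variable, here is a sketch that needs a Fortran 2003 compiler. The module and function names (deferred_length_sketch, blank_string, append_chunk) are made up for this example.

```fortran
module deferred_length_sketch
  implicit none
contains
  ! Length set by an arbitrary expression in an ALLOCATE statement.
  function blank_string(n) result(s)
    integer, intent(in) :: n
    character(len=:), allocatable :: s
    allocate (character(len=n) :: s)
    s(:) = ' '                      ! fill in place, length stays n
  end function blank_string

  ! Length taken from the right hand side of the assignment.
  function append_chunk(so_far, chunk) result(s)
    character(len=*), intent(in) :: so_far, chunk
    character(len=:), allocatable :: s
    s = so_far // chunk             ! s is (re)allocated to len(so_far) + len(chunk)
  end function append_chunk
end module deferred_length_sketch
```

The second form is what lets a progressive read grow the result one buffer at a time.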




Answer 2:


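Here's a function for Fortran 2003 that sets an allocatable string (InLine) to exactly the length of the input line (optionally trimmed), or returns .false. at end of file. Because it has optional and allocatable dummy arguments, it needs an explicit interface, so put it in a module: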

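    function ReadLine(aunit, InLine, trimmed) result(OK)
        integer, intent(in) :: aunit
        character(LEN=:), allocatable, intent(inout), optional :: InLine
        logical, intent(in), optional :: trimmed
        integer, parameter :: line_buf_len = 1024*4
        character(LEN=line_buf_len) :: InS
        logical :: OK, set
        integer :: status, size

        OK = .false.
        set = .true.
        do
            ! Non-advancing read: fetch up to line_buf_len characters of the current record.
            read (aunit, '(a)', advance='NO', iostat=status, size=size) InS
            ! Give up at end of file, and also on a genuine I/O error (status > 0),
            ! which would otherwise leave size undefined and the loop running.
            OK = (.not. IS_IOSTAT_END(status)) .and. (status <= 0)
            if (.not. OK) return
            if (present(InLine)) then
                if (set) then
                    InLine = InS(1:size)            ! first chunk of this line
                    set = .false.
                else
                    InLine = InLine // InS(1:size)  ! append later chunks
                end if
            end if
            ! End of record means the whole line has now been read.
            if (IS_IOSTAT_EOR(status)) exit
        end do
        if (present(trimmed) .and. present(InLine)) then
            if (trimmed) InLine = trim(adjustl(InLine))
        end if

    end function ReadLine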

For example, to do something with every line of the file connected to unit "aunit":

    character(LEN=:), allocatable :: InLine

    do while (ReadLine(aunit, InLine))
      [.. something with InLine]
    end do



Answer 3:


I have used the following. Let me know if it is better or worse than yours.

!::::::::::::::::::::: SUBROUTINE OR FUNCTION :::::::::::::::::::::::::::::::::::::::                                                                                                                                   
!__________________ SUBROUTINE lineread(filno,cargout,ios) __________________________                                                                                                                                   
subroutine lineread(filno,cargout,ios)                                                                                                                                                                                  
Use reallocate,ErrorMsg,SumStr1,ChCount                                                                                                                                                                                 
! this subroutine reads                                                                                                                                                                                                 
! 1. following row in a file except a blank line or the line begins with a !#*                                                                                                                                          
! 2. the part of the string until first !#*-sign is found or to end of string                                                                                                                                           
!                                                                                                                                                                                                                       
! input Arguments:                                                                                                                                                                                                      
! filno (integer)             input file number                                                                                                                                                                         
!                                                                                                                                                                                                                       
! output Arguments:                                                                                                                                                                                                     
! cargout (character)     output character string, converted so that all unnecessary spaces/tabs/control characters are removed.

implicit none                                                                                                                                                                                                           
integer,intent(in)::filno                                                                                                                                                                                               
character*(*),intent(out)::cargout                                                                                                                                                                                      
integer,intent(out)::ios                                                                                                                                                                                                
integer::nlen=0,i,ip,ich,isp,nsp,size                                                                                                                                                                                   
character*11,parameter::sep='=,;()[]{}*~'                                                                                                                                                                               
character::ch,temp*100                                                                                                                                                                                                  
character,pointer::crad(:)                                                                                                                                                                                              

nullify(crad)                                                                                                                                                                                                           
cargout=''; nlen=0; isp=0; nsp=0; ich=-1; ios=0                                                                                                                                                                         
Do While(ios/=-1) !The eof() isn't standard Fortran.                                                                                                                                                                    
READ(filno,"(A)",ADVANCE='NO',SIZE=size,iostat=ios,ERR=9,END=9)ch ! start reading file                                                                                                                                  
! read(filno,*,iostat=ios,err=9)ch;                                                                                                                                                                                     
    if(size>0.and.ios>=0)then                                                                                                                                                                                           
     ich=iachar(ch)                                                                                                                                                                                                     
    else                                                                                                                                                                                                                
     READ(filno,"(A)",ADVANCE='no',SIZE=size,iostat=ios,EOR=9); if(nlen>0)exit                                                                                                                                          
    end if                                                                                                                                                                                                              
    if(ich<=32)then        ! tab(9) or space(32) character                                                                                                                                                              
        if(nlen>0)then                                                                                                                                                                                                  
            if(isp==2)then
                isp=0;
            else
                isp=1;
            end if
        end if; cycle;
    elseif(ich==33.or.ich==35.or.ich==38)then !if char is comment !# or continue sign &                                                                                                                                 
     READ(filno,"(A)",ADVANCE='yes',SIZE=size,iostat=ios,EOR=9)ch; if(nlen>0.and.ich/=38)exit;                                                                                                                          
    else                                                                                                                                                                                                                
     ip=scan(ch,sep);                                                                                                                                                                                                   
     if(isp==1.and.ip==0)then; nlen=nlen+1; crad=>reallocate(crad,nlen); nsp=nsp+1; endif                                                                                                                               
     nlen=nlen+1; crad=>reallocate(crad,nlen); crad(nlen)=ch;                                                                                                                                                           
     isp=0; if(ip==1)isp=2;                                                                                                                                                                                             
    end if                                                                                                                                                                                                              
end do                                                                                                                                                                                                                  
9 if(size*ios>0)call ErrorMsg('Met error in reading file in [lineread]',-1)                                                                                                                                             
! ios<0: Indicating an end-of-file or end-of-record condition occurred.                                                                                                                                                 
if(nlen==0)return                                                                                                                                                                                                       
!write(6,'(a,l)')SumStr1(crad),eof(filno)                                                                                                                                                                               
!do i=1,nlen-1; write(6,'(a,$)')crad(i:i); end do; if(nlen>0)write(6,'(a)')crad(i:i)                                                                                                                                    
 cargout=SumStr1(crad)                                                                                                                                                                                                  
 nsp=nsp+1; i=ChCount(SumStr1(crad),' ',',')+1;                                                                                                                                                                         
if(len(cargout)<nlen)then                                                                                                                                                                                               
 call ErrorMsg(SumStr1(crad)// " is too long!",-1)                                                                                                                                                                      
!elseif(i/=nsp.and.nlen>=0)then                                                                                                                                                                                         
! call ErrorMsg(SumStr1(crad)// " has unrecognizable data number!",-1)                                                                                                                                                  
end if                                                                                                                                                                                                                  
end subroutine lineread                                                                                                                                                                                                 



Answer 4:


I'm using Fortran 90 to do this:

    X = Len_Trim( Seq(I) )           ! length of individual sequence
    write(*,'(a<X>)') Seq(I)(1:X)

You can simply declare Seq to be a large character string and then trim it as you write it out. I don't know how kosher this solution is, but it certainly works for my purpose. Note that "variable format expressions" such as <X> are a non-standard extension that some compilers do not support, but there are various workarounds that do the same thing almost as simply.

GNU Fortran variable expression workaround.
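One such workaround, sketched here with hypothetical names (trimmed_write_sketch, write_trimmed), is to build the format string at run time with an internal write, which is standard Fortran 95:

```fortran
module trimmed_write_sketch
  implicit none
contains
  ! Writes `text` without its trailing blanks, using a format such as
  ! '(A25)' assembled at run time instead of the extension '(a<X>)'.
  subroutine write_trimmed(unit, text)
    integer, intent(in) :: unit
    character(len=*), intent(in) :: text
    character(len=32) :: fmt
    integer :: n

    n = len_trim(text)
    if (n == 0) then
      write (unit, '(A)') ''        ! a zero field width is not allowed for A
    else
      write (fmt, '(A,I0,A)') '(A', n, ')'
      write (unit, fmt) text(1:n)
    end if
  end subroutine write_trimmed
end module trimmed_write_sketch
```

In this particular case the explicit width is not strictly needed: an A edit descriptor without a width uses the length of the output item, so write(*,'(a)') Seq(I)(1:X) produces the same record.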



Source: https://stackoverflow.com/questions/14765382/reading-a-character-string-of-unknown-length
