Python numpy ndarray skipping lines from text

Submitted by 自作多情 on 2020-01-06 13:28:54

Question


Based on this answer, I am using the changethis function:

import numpy as np
import os

def changethis(pos):
    # pos = (word, 1-based line number, character offset); file is a global array
    appex = file[pos[1]-1][:pos[2]] + '*' + file[pos[1]-1][pos[2]+len(pos[0]):]
    file[pos[1]-1] = appex

pos = ('stack', 3, 16)
file = np.genfromtxt('in.cpp',dtype='str',delimiter=os.linesep)
changethis(pos)
print(file)

where in.cpp is a source file that contains the following:

/* Multi-line 
comment
*/

#include <iostream>
#include <fstream>

using namespace std;

int main (int argc, char *argv[]) {
  int linecount = 0;
  double array[1000], sum=0, median=0, add=0;
  string filename;
  if (argc <= 1)
      {
          cout << "Error" << endl;
          return 0;
      }

I get the output:

['using namespace std;' 'int main (int argc, char *argv[]) {'
 'int linecount = *' 'double array[1000], sum=0, median=0, add=0;'
 'string filename;' 'if (argc <= 1)' '{' 'cout << "Error" << endl;'
 'return 0;' '}']

Notice that the lines of the multi-line comment, the include statements, and the empty lines are missing from the ndarray.

I do not understand why this happens, since the delimiter is set to the newline character and should therefore account for every line.

Any suggestions on how to get the following output instead?

['/* Multi-line' 'comment' '*/' '' '#include <iostream>',
 '' '#include <fstream>' '' 'using namespace std;'
 '' 'int main (int argc, char *argv[]) {'
 'int linecount = *' 'double array[1000], sum=0, median=0, add=0;'
 'string filename;' 'if (argc <= 1)' '{' 'cout << "Error" << endl;'
 'return 0;' '}']

Answer 1:


Again, sorry for the use of genfromtxt. I didn't understand your intentions and just tried to provide a possible solution to the problem. As a follow-up to that particular solution (others have been provided), you can just do:

import numpy as np
import os

def changethis(pos):
    # Notice file is in global scope
    appex = file[pos[1]-1][:pos[2]] + '*' + file[pos[1]-1][pos[2]+len(pos[0]):]
    file[pos[1]-1] = appex

pos = ('stack', 3, 16)
file = np.array([i for i in open('in.txt','r')]) # instead of genfromtxt
changethis(pos)
print(file)

which resulted in:

['/* Multi-line \n' 'comment\n' '*/\n*' '\n' '#include <iostream>\n'
 '#include <fstream>\n' '\n' 'using namespace std;\n' '\n'
 'int main (int argc, char *argv[]) {\n' '  int linecount = 0;\n'
 '  double array[1000], sum=0, median=0, add=0;\n' '  string filename;\n'
 '  if (argc <= 1)\n' '      {\n' '          cout << "Error" << endl;\n'
 '          return 0;\n' '      }']
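A note on the '*/\n*' entry above: once no lines are skipped, line 3 is '*/\n', which is shorter than 16 characters. Python slices that run past the end of a string do not raise an error, so the '*' is simply appended to the line. A minimal illustration, using the values from this example:

```python
line = '*/\n'            # line 3 of in.txt once nothing is skipped
word, offset = 'stack', 16

# Both slices are clamped to the string bounds instead of raising IndexError.
head = line[:offset]                # the whole line
tail = line[offset + len(word):]    # empty string
patched = head + '*' + tail

print(repr(patched))
```

So the position has to be recomputed against the full file, otherwise the replacement lands on the wrong line without any warning.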

EDIT: Also another relevant point mentioned by another user is the scope I was using for file. I did not mean to tell you to do stuff in global scope, I meant to explain that the function was working because file was in global scope. In any case you can create a function to hold the scope:

import numpy as np
import os

def changeallthese(poslist,path):
    def changethis(pos):
        appex = file[pos[1]-1][:pos[2]-1] + '*' + file[pos[1]-1][pos[2]-1+len(pos[0]):]
        file[pos[1]-1] = appex
    file = np.array([str(i) for i in open(path,'r')])
    for i in poslist:
        changethis(i)
    return file

poslist = [('stack', 3, 16),('stack', 18, 1),('/* Multi-line', 1, 1)]
file =   changeallthese(poslist,'in.txt')
print(file)

which results in:

['* \n' 'comment\n' '*/\n*' '\n' '#include <iostream>\n'
 '#include <fstream>\n' '\n' 'using namespace std;\n' '\n'
 'int main (int argc, char *argv[]) {\n' '  int linecount = 0;\n'
 '  double array[1000], sum=0, median=0, add=0;\n' '  string filename;\n'
 '  if (argc <= 1)\n' '      {\n' '          cout << "Error" << endl;\n'
 '          return 0;\n' '* }']

To write the array to a file, you can either use Python's regular file writing:

fid = open('out.txt','w')
fid.writelines(file)
fid.close()

or use a function from numpy (I'm not sure whether it adds extra line endings, so be careful):

np.savetxt('out.txt',file,fmt='%s')
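On the line-ending question: lines obtained by iterating over an open file keep their trailing '\n', and writelines adds nothing of its own, so the file round-trips unchanged. np.savetxt, by contrast, writes its newline argument (default '\n') after every element, so elements that already end in '\n' would likely come out double-spaced unless they are stripped first. A small sketch of the writelines round trip, using a temporary file and made-up sample lines:

```python
import os
import tempfile

# Sample lines as they would come from iterating over the input file.
lines = ["/* Multi-line \n", "comment\n", "*/\n"]

out_path = os.path.join(tempfile.mkdtemp(), "out.txt")

# newline="" disables newline translation so the bytes round-trip exactly.
with open(out_path, "w", newline="") as fid:
    fid.writelines(lines)  # writelines does not append line endings

with open(out_path, newline="") as fid:
    round_trip = fid.readlines()

print(round_trip == lines)
```

Using a with block also closes the file even if writing fails partway through.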



Answer 2:


If you want a list of strings representing the lines of a file, open the file and use readlines():

with open('in.cpp') as f:
    lines = f.readlines()

# Have changethis take the list of lines as an argument
changethis(lines, pos)

Don't use np.genfromtxt; that's a tabular data parser with all sorts of behavior you don't want, such as treating # as a line comment marker.
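To see the comment handling in isolation, here is a small sketch (the sample text and variable names are made up for illustration). It feeds a few lines to np.genfromtxt from an in-memory buffer, once with the default comments='#' and once with comments=None. The #include line should disappear in the first call and survive in the second, and the blank line should be dropped in both. This only demonstrates the # and blank-line behavior; it does not claim to explain every missing line in the question's output.

```python
import io

import numpy as np

sample = "#include <iostream>\n\nusing namespace std;\nint x = 0;\n"

# Default settings: '#' starts a comment, so the include line becomes empty
# and is skipped together with the blank line.
skipped = np.genfromtxt(io.StringIO(sample), dtype=str, delimiter="\n")

# Disabling comment handling keeps the include line; blank lines still go.
kept = np.genfromtxt(io.StringIO(sample), dtype=str, delimiter="\n",
                     comments=None)

print(list(skipped))
print(list(kept))
```

Even with comments=None the blank lines are lost, which is one more reason to read the file with plain Python when every line matters.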

Depending on what you intend to do with this list, you can probably even avoid needing an explicit list of lines. Also, file is a bad choice of variable name (in Python 2 it shadows the built-in file type), and changethis should really take the list as an argument instead of relying on a global variable. In general, the earlier answer you got was pretty terrible.
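As a rough sketch of that last suggestion, changethis could take the list explicitly and check that the word really sits at the given position before replacing it. The name change_this, the replacement parameter, and the 1-based column convention are assumptions made for this illustration, not part of the original code:

```python
def change_this(lines, pos, replacement="*"):
    """Replace a word at a 1-based (line, column) position in a list of lines."""
    word, line_no, col_no = pos
    row, start = line_no - 1, col_no - 1
    line = lines[row]
    # Refuse to edit if the word is not actually at that position.
    if line[start:start + len(word)] != word:
        raise ValueError(f"{word!r} not found at line {line_no}, column {col_no}")
    lines[row] = line[:start] + replacement + line[start + len(word):]
    return lines


lines = ["/* Multi-line \n", "comment\n", "*/\n"]
change_this(lines, ("Multi", 1, 4))
print(lines)
```

The explicit check turns a stale position into an error instead of a silently corrupted line.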




Answer 3:


If the file is not too big:

import numpy as np
import os

def changethis(linelist,pos):
    appex = linelist[pos[2]-1][:pos[3]] + pos[1] + linelist[pos[2]-1][pos[3]+len(pos[0]):]
    linelist[pos[2]-1] = appex

pos = ('Multi','Three', 1, 3)

with open('in.cpp','r')  as f:
    lines=f.readlines()
    changethis(lines,pos)
print(''.join(lines))

readlines turns your file into a list of lines (this loads the whole file into memory, which is wasteful for large files, but it does the job; for files under about 1k lines it should be fine).

The function takes a list of lines as input, in addition to pos. I also modified the function to replace pos[0] with pos[1] (instead of a *) on line pos[2], starting at character offset pos[3].

I get this as output:

/* Three-line 
comment
*/

#include <iostream>
#include <fstream>

using namespace std;

int main (int argc, char *argv[]) {
  int linecount = 0;
  double array[1000], sum=0, median=0, add=0;
  string filename;
  if (argc <= 1)
      {
          cout << "Error" << endl;
          return 0;
      }
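If the file is too large for readlines, one alternative is to stream it line by line and write each line out as you go. The sketch below keeps this answer's pos layout (old word, new word, 1-based line number, 0-based character offset); the function name replace_streaming and the in-memory buffers standing in for real files are assumptions for illustration:

```python
import io


def replace_streaming(src, dst, pos):
    """Copy src to dst line by line, patching only the target line."""
    old, new, line_no, offset = pos
    for current, line in enumerate(src, start=1):
        if current == line_no:
            line = line[:offset] + new + line[offset + len(old):]
        dst.write(line)


# In-memory stand-ins for open('in.cpp') and open('out.cpp', 'w').
src = io.StringIO("/* Multi-line \ncomment\n*/\n")
dst = io.StringIO()
replace_streaming(src, dst, ("Multi", "Three", 1, 3))
result = dst.getvalue()
print(result)
```

Only one line is held in memory at a time, at the cost of writing to a second file rather than editing a list in place.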


Source: https://stackoverflow.com/questions/36924519/python-numpy-ndarray-skipping-lines-from-text
