Replace a single character in a Numpy list of strings

Submitted by 孤人 on 2021-01-28 11:20:55

Question


I have a Numpy array of datetime64 objects that I need to convert to a specific time format: yyyy-mm-dd,HH:MM:SS.SSS. Numpy has a function called datetime_as_string that outputs ISO 8601 time (yyyy-mm-ddTHH:MM:SS.SSS), which is extremely close to what I want. The only difference is that there is a T where I want a comma.

Is there a way to quickly swap the "T" for a ","? Here is an example data set:

import numpy as np
offset = np.arange(0, 1000)
epoch = np.datetime64('1970-01-01T00:00:00.000')
time_objects = epoch + offset.astype('timedelta64[ms]')
time_strings = np.datetime_as_string(time_objects)

I have had success using a lambda and a list comprehension, but it seems awkward switching back and forth from a Python list to a Numpy array.

f = lambda x: x[:10] + ',' + x[11:]
np.array([f(x) for x in time_strings])

I know that in some cases a lambda can be applied directly to a Numpy array, but it doesn't work in this case: f(time_strings) produces a TypeError. Any thoughts?

I know I could convert back to a Python datetime (which is the direction I'm coming from) or use Pandas. But the datetime_as_string function is really fast and I'd like to stick to a Numpy solution.

--- Conclusions based on answers ---
It turns out that Paul's view casting black magic was roughly 75x faster than my list comprehension, and roughly 100x faster than np.char.replace(). Here are the results from the three methods (all were initialized with the above dataset, but with 1,000,000 elements).

start = time.time()
time_strings[..., None].view('U1')[..., 10] = ','
print(time.time() - start, 'seconds')
0.016000747680664062 seconds

start = time.time()
f = lambda x: x[:10] + ',' + x[11:]
time_strings = np.array([f(x) for x in time_strings])
print(time.time() - start, 'seconds')
1.1740672588348389 seconds

start = time.time()
time_strings = np.char.replace(time_strings,'T',',')
print(time.time() - start, 'seconds')
1.4980854988098145 seconds

Answer 1:


You could use viewcasting to get access to individual characters:

time_strings[...,None].view('U1')[...,10] = ','

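This changes time_strings in place. The added trailing axis lets view('U1') expose each fixed-width string as a row of single characters, and index 10 is the position of the T.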
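To make the one-liner less magical, here is a minimal sketch of the same view cast split into steps. The two-element array and the name chars are only for illustration and are not part of the original answer.

```python
import numpy as np

# Small stand-in for the question's data: fixed-width ISO 8601 strings.
time_strings = np.array(['1970-01-01T00:00:00.000',
                         '1970-01-01T00:00:00.001'])

# Add a trailing axis of length 1, then reinterpret each fixed-width
# string as a row of single characters. This is a view, not a copy.
chars = time_strings[..., None].view('U1')

# Column 10 holds the 'T' of every string; overwrite it in place.
chars[..., 10] = ','

# The original array sees the change because both share one buffer.
print(time_strings)
```

Note that this only works because every string has the T at the same index. If the position varied from element to element, something like np.char.replace would be the safer choice.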




Answer 2:


In [309]: np.char.replace(time_strings,'T',',')                                 
Out[309]: 
array(['1970-01-01,00:00:00.000', '1970-01-01,00:00:00.001',
       '1970-01-01,00:00:00.002', '1970-01-01,00:00:00.003',
       '1970-01-01,00:00:00.004', '1970-01-01,00:00:00.005',
       '1970-01-01,00:00:00.006', '1970-01-01,00:00:00.007',
       ....

But @PaulPanzer's in-place approach is much faster (even if it is a bit more obscure):

In [316]: %%timeit temp=time_strings.copy() 
     ...: temp[...,None].view('U1')[...,10] = ','                                                                      
8.48 µs ± 34.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [317]: timeit np.char.replace(time_strings,'T',',')                          
1.23 ms ± 1.12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
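For anyone who wants to rerun the comparison outside IPython, here is a rough sketch using the standard timeit module. The helper names view_cast, char_replace and list_comp are made up for this example, and view_cast includes the cost of a copy so that the input is not modified between runs. Absolute timings will depend on the machine and the Numpy version, so none are shown here.

```python
import timeit
import numpy as np

# Same data set as in the question (1000 timestamps, 1 ms apart).
offset = np.arange(0, 1000)
epoch = np.datetime64('1970-01-01T00:00:00.000')
time_strings = np.datetime_as_string(epoch + offset.astype('timedelta64[ms]'))

def view_cast(strings):
    # Copy first so the caller's array is left untouched.
    out = strings.copy()
    out[..., None].view('U1')[..., 10] = ','
    return out

def char_replace(strings):
    return np.char.replace(strings, 'T', ',')

def list_comp(strings):
    return np.array([s[:10] + ',' + s[11:] for s in strings])

# Sanity check: all three approaches should agree before timing them.
expected = view_cast(time_strings)
assert np.array_equal(expected, char_replace(time_strings))
assert np.array_equal(expected, list_comp(time_strings))

for fn in (view_cast, char_replace, list_comp):
    # Best of 5 repeats, 100 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(time_strings), number=100, repeat=5)) / 100
    print(f'{fn.__name__}: {best * 1e6:.1f} µs per call')
```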


Source: https://stackoverflow.com/questions/58227354/replace-a-single-character-in-a-numpy-list-of-strings
