Extract tiles from tiled TIFF and store in numpy array

走远了吗. Submitted on 2021-01-29 02:05:40

Question


My overall goal is to crop several regions from an input mirax (.mrxs) slide image to JPEG output files.

Here is what one of these images looks like:

[image]

Note that the darker grey area is part of the image, and the regions I ultimately wish to extract in JPEG format are the 3 black square regions.

Now, for the specifics:

I'm able to extract the color channels from the mirax image into 3 separate TIFF files using vips on the command line:

vips extract_band INPUT.mrxs OUTPUT.tiff[tile,compression=jpeg] C --n 1

Where C corresponds to the channel number (0-2), and each output file is about 250 MB in size.

The next job is to somehow recognize and extract the regions of interest from the images, so I turned to several python imaging libraries, and this is where I encountered difficulties.

When I try to load any of the TIFFs with OpenCV using:

i = cv2.imread('/home/user/input_img.tiff',cv2.IMREAD_ANYDEPTH) 

I get this error: (-211) The total matrix size does not fit to "size_t" type in function setSize

I managed to get a little more traction with Pillow, by doing:

from PIL import Image
tiff = Image.open('/home/user/input_img.tiff')
print(len(tiff.tile))  # number of tiles in the file
print(tiff.tile[0])    # descriptor of the first tile
print(tiff.info)       # file-level metadata

which outputs:

636633
('jpeg', (0, 0, 128, 128), 8, ('L', ''))
{'compression': 'jpeg', 'dpi': (25.4, 25.4)}

However, beyond loading the image, I can't seem to perform any useful operations. For example, calling tiff.tostring() (in an attempt to convert the PIL object to a numpy array) results in a MemoryError. I'm not sure this operation is even valid given the existence of tiles.

From my limited understanding, these TIFFs store the image data in 'tiles' (of which the above image contains 636633) in a JPEG-compressed format.

It's not clear to me, however, how one would extract these tiles for use as regular JPEG images, or even whether the sequence of steps I outlined above is a useful way of accomplishing the overall goal of extracting the ROIs from the mirax image.

If I'm on the right track, then some guidance would be appreciated, or, if there's another way to accomplish my goal using vips/openslide without python I would be interested in hearing ideas. Additionally, more information about how I could deal with or understand the TIFF files I described would also be helpful.

The ideal situations would include:

1) Some kind of autocropping feature in vips/openslide which can generate JPEGs from either the TIFFs or the original mirax image, along the lines of what the following command does, but without generating tens of thousands of images:

vips dzsave CMU-1.mrxs[autocrop] pyramid

2) Being able to extract tiles from the TIFFs and store the data corresponding to the image region as a numpy array, in order to detect the 3 ROIs using OpenCV or another method.


Answer 1:


I would use the vips Python binding. It is very like PIL, but it can handle these huge images. Try something like:

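import sys

from gi.repository import Vips

# Example region in pixels; replace with the coordinates of your ROI.
left, top, width, height = 0, 0, 1024, 1024

slide = Vips.Image.new_from_file(sys.argv[1])
tile = slide.extract_area(left, top, width, height)
tile.write_to_file(sys.argv[2])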
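
If the goal is a numpy array rather than a file, a cropped region can be copied into memory and wrapped. The following is only a sketch: it assumes the newer pyvips binding (not the gi.repository one), an 8-bit image, and a region small enough to fit in RAM. The function name region_to_numpy is made up for this illustration.

```python
import numpy as np


def region_to_numpy(image, left, top, width, height):
    """Crop a region from a pyvips-style image and return it as a numpy array.

    Assumption: `image` behaves like a pyvips.Image (crop, write_to_memory,
    height, width, bands, format) and holds 8-bit ("uchar") pixels.
    """
    region = image.crop(left, top, width, height)
    if region.format != "uchar":
        raise ValueError("this sketch only handles 8-bit images")
    data = region.write_to_memory()  # raw pixel bytes, row by row
    pixels = np.frombuffer(data, dtype=np.uint8)
    return pixels.reshape(region.height, region.width, region.bands)
```

With pyvips installed, this would be called on the result of pyvips.Image.new_from_file(...). The array is always three-dimensional (height, width, bands), even for a single-band image.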

You can also extract areas on the command-line, of course:

$ vips extract_area INPUT.mrxs OUTPUT.tiff left top width height

Running that command once per region will be a little slower than a loop in Python, though. You can use crop as a synonym for extract_area.
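
The Python loop could look roughly like this. It is a sketch, not a tested recipe: crop_regions and the roi_{}.jpg naming pattern are invented for the example, and the slide argument is assumed to be a vips image object that offers crop and write_to_file.

```python
def crop_regions(slide, regions, pattern="roi_{}.jpg"):
    """Crop each (left, top, width, height) region and save it to its own file.

    Assumption: `slide` is a vips image (for example one opened with
    pyvips.Image.new_from_file). Returns the list of file names written.
    """
    written = []
    for index, (left, top, width, height) in enumerate(regions):
        tile = slide.crop(left, top, width, height)
        filename = pattern.format(index)
        tile.write_to_file(filename)  # vips chooses the format from the suffix
        written.append(filename)
    return written
```

The slide is opened once and reused for every region, which is where the saving over repeated command-line calls would come from.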

openslide attaches a lot of metadata to the image describing the layout and position of the various subimages. Try:

$ vipsheader -a myslide.mrxs 

Have a look through the output; you might be able to calculate the position of your subimages from that. I would also ask on the openslide mailing list: they are very expert and very helpful.
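
The same metadata can be read from Python. A minimal sketch, assuming the pyvips binding (whose images provide get_fields and get) and assuming that the openslide entries are the ones whose names start with "openslide."; the helper name openslide_fields is made up here.

```python
def openslide_fields(image):
    """Collect the openslide.* metadata of a vips image into a dict.

    Assumption: `image` behaves like a pyvips.Image, i.e. get_fields()
    lists the metadata names and get(name) returns one value.
    """
    return {
        name: image.get(name)
        for name in image.get_fields()
        if name.startswith("openslide.")
    }
```

Which fields are present depends on the slide format and the openslide version, so it is worth printing the whole dict before relying on any particular key.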

One more thing you could try: get a low-res overview, corner-detect on that, then extract the tiles from the high-res image. To get a low-res version of your slide, try:

$ vips copy myslide.mrxs[level=7] overview.tif

Level 7 is downsampled by 2 ** 7, so 128x.
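
Once the overview is loaded as a small greyscale array, the regions can be located there and their coordinates scaled back up. The sketch below swaps corner detection for something simpler: a threshold followed by a flood fill that groups connected dark pixels. It is plain Python for illustration only. The threshold of 50, the function names, and the assumption that the ROIs are the only near-black pixels are all guesses about the data, and on a real overview an OpenCV connected-components routine would likely be the faster choice.

```python
from collections import deque


def dark_boxes(gray, threshold=50):
    """Bounding boxes (left, top, width, height) of connected dark regions.

    `gray` is a 2D sequence of 0-255 values (rows of the overview image).
    Pixels below `threshold` count as dark; the value is an assumption.
    """
    rows, cols = len(gray), len(gray[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for y in range(rows):
        for x in range(cols):
            if seen[y][x] or gray[y][x] >= threshold:
                continue
            # Breadth-first flood fill over 4-connected dark pixels.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < cols and 0 <= ny < rows
                            and not seen[ny][nx] and gray[ny][nx] < threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1 - x0 + 1, y1 - y0 + 1))
    return boxes


def to_level0(box, level=7):
    """Scale an overview box to full-resolution pixels (factor 2 ** level)."""
    scale = 2 ** level
    return tuple(value * scale for value in box)
```

Each scaled box can then be passed to extract_area on the full-resolution slide. Because one overview pixel covers 128 full-resolution pixels at level 7, the boxes are only accurate to that granularity, so padding them slightly may be sensible.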



Source: https://stackoverflow.com/questions/29988739/extract-tiles-from-tiled-tiff-and-store-in-numpy-array
