Blog: Hello Umang!
Yes, you can! In fact, that's exactly what I did: I only considered pixels in the same image for this model. I preprocessed DICOM files, a format used by many medical equipment vendors, into several images for each exam.
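A minimal sketch of that preprocessing step, assuming pydicom and Pillow are available (the directory layout, file names, and helper names here are just illustrations, not my actual pipeline):

```python
import numpy as np


def normalize_to_uint8(pixels):
    """Scale raw pixel data (e.g. 12/16-bit DICOM values) into 0-255."""
    out = pixels.astype(np.float64)
    out -= out.min()
    peak = out.max()
    if peak > 0:
        out *= 255.0 / peak
    return out.astype(np.uint8)


def dicom_dir_to_pngs(dicom_dir, out_dir):
    """Convert every .dcm file in a directory into an 8-bit PNG slice."""
    from pathlib import Path
    import pydicom          # third-party DICOM reader
    from PIL import Image   # third-party image writer

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, path in enumerate(sorted(Path(dicom_dir).glob("*.dcm"))):
        ds = pydicom.dcmread(path)
        img = normalize_to_uint8(ds.pixel_array)
        Image.fromarray(img).save(out / f"slice_{i:04d}.png")
```

Note that real DICOM data often needs windowing (rescale slope/intercept) rather than a plain min-max normalization; this is just the simplest possible version.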
You’d preferably want to choose a lossless format, so compression doesn’t affect your training too much. TIFF can be lossy or lossless, so you’ll need to check. *
To do so, you’ll only need to create a list of input/output pairs. Shuffle this list with an appropriate random algorithm, then follow the mainstream approach (split the dataset, train, validate, test).
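The pair/shuffle/split workflow above could look something like this (the split fractions and the fixed seed are arbitrary choices for illustration, not what I actually used):

```python
import random


def shuffled_splits(inputs, outputs, val_frac=0.15, test_frac=0.15, seed=42):
    """Pair inputs with outputs, shuffle reproducibly, and split."""
    pairs = list(zip(inputs, outputs))
    # A seeded Random instance makes the shuffle reproducible.
    random.Random(seed).shuffle(pairs)
    n_test = int(len(pairs) * test_frac)
    n_val = int(len(pairs) * val_frac)
    test = pairs[:n_test]
    val = pairs[n_test:n_test + n_val]
    train = pairs[n_test + n_val:]
    return train, val, test
```

Shuffling the *pairs* (rather than inputs and outputs separately) is the important part: it keeps each image aligned with its segmentation mask.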
Other people use 3D kernels for this kind of problem. It is not so trivial to do it in 3D with images, mainly because of memory management, but totally possible. If you want to give it a try, I would recommend creating a description file indicating the layers. This file could be something very simple, such as a list of the images in the correct order. That way you can map which image is above or below, then map everything into a 3D matrix, similar to the data structure before preprocessing.
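A sketch of that idea, assuming the description file is a plain-text list of image paths, one per line, top slice first (file format and function names are hypothetical):

```python
import numpy as np


def stack_slices(slices):
    """Stack 2D slice arrays into one 3D volume: (depth, height, width)."""
    return np.stack(list(slices), axis=0)


def load_volume(order_file):
    """Read a plain-text file listing one image path per line (in slice
    order) and return the stacked 3D volume."""
    from pathlib import Path
    from PIL import Image  # third-party; used only to load the images

    lines = Path(order_file).read_text().splitlines()
    paths = [ln.strip() for ln in lines if ln.strip()]
    return stack_slices(np.asarray(Image.open(p)) for p in paths)
```

All slices must share the same height and width for `np.stack` to work, which is usually true within a single DICOM series.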
* I digress: actually you can use lossy images, but when introducing them into medical exams you really need to understand how they affect your data, which by itself is not as obvious as it seems. It can make your training harder than it should be. One could argue that this would help the learning process, since it implies the network would need to learn more obvious patterns, but I strongly disagree. If you want this kind of behavior, you can use several other techniques such as dropout, masking inputs, etc. For this case, I adapted the pipeline to work with my own equipment, so I allowed myself to resize the images only enough not to lose any features. If storage were a problem, I would rather save the inputs in a lossless format anyway and save the outputs as RLE, due to the nature of their content. TL;DR: I guess I wouldn’t accept cancer exam results with a wrong diagnosis due to a storage problem.
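To illustrate why RLE suits segmentation outputs: masks are long runs of identical values (mostly background), so a toy run-length encoder like the one below compresses them very well. This is just a sketch of the idea; any real RLE format has its own conventions.

```python
def rle_encode(flat_mask):
    """Run-length encode a flat (1D) mask as a list of (value, run) pairs."""
    runs = []
    prev, count = flat_mask[0], 0
    for v in flat_mask:
        if v == prev:
            count += 1
        else:
            runs.append((prev, count))
            prev, count = v, 1
    runs.append((prev, count))
    return runs
```

For a mask that is 99% background, the encoded list is a tiny fraction of the raw pixel count, which is why outputs tolerate this kind of storage trick while inputs do not.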
* I digress (2): if you’re going to give it a try with the same dataset, you’ll need to check your outputs. As I mentioned in another response, during the preprocessing phase I removed about half of the pairs which had an empty output. It is necessary to do so because of the way the outputs were collected. For example, it is useless to search for the liver above the lungs, so many liver segmentation outputs were simply empty. This proportion was just so high that shuffling the list of image pairs could yield a very biased batch of samples.
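Filtering the dataset that way could be sketched as below. The 50% keep rate mirrors the "about half" above; the function name and pair layout are hypothetical:

```python
import random


def drop_empty_pairs(pairs, keep_empty_frac=0.5, seed=0):
    """Keep all pairs with a non-empty mask; keep only a fraction of the
    pairs whose mask is entirely empty (all zeros).

    `pairs` is a list of (image, mask) tuples, with mask as a flat
    sequence; for numpy masks, replace `any(mask)` with `mask.any()`.
    """
    rng = random.Random(seed)
    kept = []
    for image, mask in pairs:
        is_empty = not any(mask)
        if is_empty and rng.random() >= keep_empty_frac:
            continue  # drop this empty-output pair
        kept.append((image, mask))
    return kept
```

The point is to rebalance before shuffling and batching, so a random batch is not dominated by trivially empty targets.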