Machine Learning

Machine learning, a form of artificial intelligence, will likely become a significant part of pathology research over the next few decades. While computers lack the intuition and improvisational abilities of humans (at least for the moment), they can assess and analyze data much more quickly and effectively than the human mind, particularly with large data sets.

For example, a patient with a salivary gland carcinoma will have thousands to millions of data points (depending on how one counts them) that a computer can use. These might include laboratory tests, radiographs (which are composed of pixels, each its own data point), and clinical outcomes. If a slide is scanned digitally, it can also serve as a collection of data points that can be analyzed alongside the clinical data.

Machine learning allows a computer to learn without explicit programming. To make this webpage, I had to write out exactly what I wanted the computer to do. Machine learning instead gives a computer a desired outcome and lets it determine the best way to achieve that outcome. Pathologists do this on a daily basis: we look for certain histological characteristics and patterns that tend to be associated with a better or worse prognosis. Computers may be able to do it better because they can assess the quantitative data associated with scanned images. While a pathologist might make a qualitative judgement about a tumor front being composed of nests and islands, a computer might use some relationship between nuclear size, spacing between nuclei, or space between nests and the larger tumor nest. It might even use a complicated formula like (nuclear size)² / √(2π × (mean nuclear distance from mass)). Of course, this means that a computer often cannot communicate why it finds the results it does (the so-called "black box" of machine learning), and that a human might not be able to use the same formula in practice.
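To make that made-up formula concrete, here is a minimal Python sketch of how it could be evaluated. The function name, parameter names, and input values are hypothetical and the units are arbitrary. The formula is only the illustration from the paragraph above, not a validated prognostic measure.

```python
import math


def nest_feature(nuclear_size, mean_nuclear_distance):
    """Return (nuclear size)^2 / sqrt(2 * pi * mean nuclear distance from mass).

    Both arguments are hypothetical measurements in arbitrary units.
    """
    if mean_nuclear_distance <= 0:
        raise ValueError("mean_nuclear_distance must be positive")
    return nuclear_size ** 2 / math.sqrt(2 * math.pi * mean_nuclear_distance)


# Made-up inputs, chosen only so the arithmetic is easy to follow:
# 2 * pi * (2 / pi) = 4, sqrt(4) = 2, and 2^2 / 2 = 2.
score = nest_feature(nuclear_size=2.0, mean_nuclear_distance=2.0 / math.pi)
print(score)
```

A person could compute this by hand for one nucleus, but not for every nucleus on a scanned slide, which is where the computer has the advantage.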

[Figure: CNN 1]

One common use of machine learning today is the convolutional neural network (CNN, see above), whose design resembles how neurons in the human visual system work. Images received by photoreceptors in the eye undergo neural processing in which neurons for receptors in close proximity interact with each other to identify patterns. This makes sense, since receptors in close proximity assess a similar part of the visual field, and the information from those receptors is contextually related. Convolutional neural networks act in a similar fashion, allowing for the identification of shapes and patterns in an image. In fact, neural networks are how companies like Facebook can recognize your face in an image, or how Google can search for pictures online that don't have any tags.

As the computer scans over an image, it picks up a user-defined block of pixels (for example, 10 x 10 pixels), skips over a user-defined number of pixels (called the "stride") before picking up the next block, and processes each block. It applies mathematical formulae to those pixels to create a sort of filter, similar to what one might do in graphics editing software like Photoshop. The way these filters interact with the data can allow for things like edge detection, which is important for identifying cellular structures (see how bright that mitotic figure is at 8 o'clock?). Next, a user can down-sample the data (basically shrinking the block by half to 5 x 5 in our example, called "pooling") and can run more formulae if desired. The result then goes into a neural network where each piece of data is weighted differently, and additional layers with different weights using different sets of data can be added. It's a bit confusing; one might check out this excellent interactive tool to help understand how these systems are designed, and this tool to understand how the data is transformed. For the first link, design your network, hit "play," and the computer starts going through learning tests, which involve trial and error to see which parameters fit the data best. This strategy does require enough examples of the object the machine is trying to classify; your face can be recognized by Facebook because they have billions of images of faces, and often hundreds of a single person.
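The block, stride, filter, and pooling steps described above can be sketched in a few lines of plain Python. This is a toy illustration under several assumptions, not how any particular CNN library works. The 6 x 6 image is made up, the filter is a hand-written Sobel-style edge kernel (in a real CNN the filter values are learned), and pooling is done by taking the maximum of each block, which is one common choice. The function names are hypothetical.

```python
def convolve2d(image, kernel, stride=1):
    """Slide a square kernel over the image, moving `stride` pixels per step."""
    k = len(kernel)
    rows = (len(image) - k) // stride + 1
    cols = (len(image[0]) - k) // stride + 1
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            top, left = r * stride, c * stride
            total = 0
            # Multiply each pixel in the block by the matching kernel weight.
            for i in range(k):
                for j in range(k):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


def max_pool(feature_map, size=2):
    """Down-sample by keeping the largest value in each size x size block."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Toy 6 x 6 grayscale image: dark (0) on the left, bright (9) on the right.
image = [[0, 0, 0, 9, 9, 9] for _ in range(6)]

# Sobel-style kernel that responds to left-to-right changes in brightness.
vertical_edge = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

edges = convolve2d(image, vertical_edge, stride=1)  # 4 x 4 feature map
pooled = max_pool(edges, size=2)                    # 2 x 2 after pooling

for row in edges:
    print(row)
print(pooled)
```

The filter output is zero where the brightness is flat and large where the dark and bright halves meet, which is the edge-detection effect mentioned above. Pooling then halves each dimension of the feature map while keeping the strongest response in each block.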

My involvement with this research is the identification of mitotic figures in breast carcinomas. This could conceivably allow for improved quantitative tools for digital histology software (either for diagnosis or for research). Our work is still ongoing, and this page will be updated once it is published.