Biological Data Science and Why Context is King

August 13, 2019
Biological Data Science is like picking pink pandas out of a crowd

If I showed you a picture of a cat and told you it was a pink panda bear, would you believe me? How would you go about validating what I was telling you? You can do this inherently because you have a built-in network of neurons (not to be confused with a neural network) that lets you decide on your own whether an algorithm’s image classification is correct. In the case of ImageNet, that works out to about 94% accuracy. In biological data science, this kind of at-a-glance check isn’t possible, and that’s why the field relies so heavily on expertise and context.

Biological data science makes it easier to classify cats
Stanford CS231n

Biological Data Science and Validating Results

What would you do if I showed you a gene sequence for blonde hair and told you it coded for freckles? Would you take my word for it? I mean, we just met, and on the internet for that matter. Do you already trust me that much?!

No. More likely, you would rely on a second algorithm, most likely BLAST, to tell you what the genetic sequence I presented actually was.
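To make the idea of checking a claim against references concrete, here is a minimal sketch in Python. It is a toy stand-in and not BLAST itself: it scores a query against a couple of labeled references by naive position-by-position identity, with no alignment, gaps, or statistics. The sequences, the labels `trait_A` and `trait_B`, and the helper names are all made up for illustration.

```python
def percent_identity(a, b):
    """Share of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy comparison expects equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_reference(query, references):
    """Return (label, identity) for the reference closest to the query."""
    scored = ((label, percent_identity(query, seq))
              for label, seq in references.items())
    return max(scored, key=lambda pair: pair[1])


# Made-up sequences for illustration only; these are not real genes.
references = {
    "trait_A": "ATGGCCTTAGGCATCGTA",
    "trait_B": "ATGTTTCCGGAACTGTGA",
}

query = "ATGGCCTTAGGAATCGTA"   # differs from trait_A at a single position
claimed_label = "trait_B"      # what the stranger on the internet told you

label, identity = best_reference(query, references)
claim_supported = (label == claimed_label)

print(f"closest reference: {label} ({identity:.1f}% identity)")
print(f"claim supported: {claim_supported}")
```

Even in this toy version, the answer is only as good as the reference labels, which is exactly where domain expertise comes in.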

Biological data science makes figuring out gene sequences easier.

This is why biological data science is so dependent on domain expertise and context. If you can’t trust the algorithm that is producing results, then you can’t trust your results.

With the explosion of biological data being collected every day, it’s even more important to pair domain expertise and biological context with data science skills. While it’s easy to look at a huge data set of genomic information and just run with analyses, this can often lead to major errors.

Take, for example, an n x m matrix of n features and m patients. If you ran a 2-D convolutional neural network on this matrix, you would be implying that samples sitting next to each other are related. For independent people whose order in the matrix is arbitrary, this just isn’t true.
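A quick way to see the problem is to shuffle the patient columns and check what changes. The sketch below is a hand-rolled illustration in plain Python, not a real CNN: `conv2d_valid` is a hypothetical helper that applies one 2 x 2 filter the way a convolutional layer would, and the numbers in the matrix are invented. Since the patients are independent, their column order carries no information, so a sensible summary should not care about it.

```python
def conv2d_valid(matrix, kernel):
    """Slide one filter over the matrix (cross-correlation, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_rows = len(matrix) - kh + 1
    out_cols = len(matrix[0]) - kw + 1
    return [
        [
            sum(matrix[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]


def permute_columns(matrix, order):
    """Reorder the patient columns; each patient's values are unchanged."""
    return [[row[k] for k in order] for row in matrix]


def feature_means(matrix):
    """Per-feature mean across patients, which ignores patient order."""
    return [sum(row) / len(row) for row in matrix]


# Invented data: 3 features (rows) x 6 patients (columns).
data = [
    [5, 1, 9, 3, 7, 2],
    [4, 8, 0, 6, 2, 9],
    [7, 3, 5, 1, 8, 4],
]
kernel = [[1, -1], [1, -1]]     # compares each patient with the next column
order = [3, 0, 5, 1, 4, 2]      # an arbitrary reshuffling of the patients
shuffled = permute_columns(data, order)

means_match = feature_means(data) == feature_means(shuffled)
conv_match = conv2d_valid(data, kernel) == conv2d_valid(shuffled, kernel)

print("feature means unchanged by shuffling:", means_match)
print("convolution output unchanged by shuffling:", conv_match)
```

The per-feature means survive the shuffle, while the filter output does not, because the filter reads meaning into which patients happen to sit side by side.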

Because of the importance of context, make sure you pair biological data science work with a healthy dose of domain expertise to make sure you are doing cool things in the right way.


