Biological Data Science and Why Context is King
If I showed you a picture of a cat and told you it was a pink panda bear, would you believe me? How would you go about validating what I was telling you? You can do this instinctively because you have a built-in network of neurons (not to be confused with an artificial neural network) that lets you decide for yourself whether an algorithm’s image classification is correct. On ImageNet, people manage this with roughly 94% accuracy. In biological data science, this isn’t possible.
Biological Data Science and Validating Results
What would you do if I showed you a gene sequence for blonde hair and told you it coded for freckles? Would you take my word for it? I mean, we just met, and on the internet for that matter. Do you already trust me that much?!
No. More likely, you would have to rely on a second algorithm, probably BLAST, to tell you what the genetic sequence I showed you actually was.
This is why biological data science is so dependent on domain expertise and context. If you can’t trust the algorithm that is producing results, then you can’t trust your results.
With the explosion of biological data being collected
Take, for example, an n × m matrix of n features and m patients. If you ran a 2-D convolutional neural network over this matrix, you would be implying that the samples (patients) sitting next to each other have a relationship. For independent people, this just isn’t true.
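To make the point concrete, here is a minimal sketch using only NumPy. Everything in it is an assumption for illustration: the data are random numbers rather than real patient measurements, the matrix sizes are arbitrary, and `conv2d_valid` is a hand-rolled stand-in for a single CNN filter, not a full network. It shuffles the patient columns and shows that a 2-D convolution produces different values even though no patient's data changed, while a model that treats each patient independently gives the same per-patient answers.

```python
import numpy as np


def conv2d_valid(x, kernel):
    """Plain 'valid' 2-D cross-correlation, the core operation of a CNN filter."""
    kh, kw = kernel.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out


rng = np.random.default_rng(0)

# Synthetic stand-in for an n x m matrix: n features (rows), m patients (columns).
n_features, m_patients = 6, 5
data = rng.normal(size=(n_features, m_patients))

# Reorder the patients. Nothing about any individual patient changes;
# only who happens to sit next to whom in the matrix.
perm = np.array([2, 0, 4, 1, 3])
shuffled = data[:, perm]

# A single 3x3 filter with arbitrary (random) weights.
kernel = rng.normal(size=(3, 3))
conv_original = conv2d_valid(data, kernel)
conv_shuffled = conv2d_valid(shuffled, kernel)

# Compare the outputs as unordered collections of values, so a mere
# reordering of the feature map would not count as a change.
conv_changed = not np.allclose(
    np.sort(conv_original, axis=None),
    np.sort(conv_shuffled, axis=None),
)

# A per-patient model (here, one weighted sum of features per column)
# does not care about column order: each patient keeps the same score.
weights = rng.normal(size=n_features)
scores_original = weights @ data
scores_shuffled = weights @ shuffled
per_patient_consistent = np.allclose(scores_original[perm], scores_shuffled)

print("Convolution output changed after shuffling patients:", conv_changed)
print("Per-patient scores unchanged after shuffling patients:", per_patient_consistent)
```

The convolution depends on an ordering of patients that has no biological meaning, which is exactly the kind of hidden assumption domain context helps you catch.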
Because context matters so much, pair biological data science work with a healthy dose of domain expertise to make sure you are doing cool things in the right way.