Yep, garbage in, garbage out. I'm a data scientist, and you'd be surprised how many of my peers think you can solve all kinds of problems simply by having a mountain of data, without questioning how that data relates to the underlying real-world processes that generate it. If there's a ton of noise in the data-generating process, or it is far removed from the systems you want to understand or improve, you're SOL. You can spend heaps of money and resources and end up creating nothing but confusion.
This is pretty disheartening. Even back in my psych undergrad, we had validity, reliability, and the importance of research design hammered into our brains. When did people decide that anything called research is just hunky dory?
Unfortunately, in tech most of the attention goes to sexy new tools, and much less goes to boring old fundamentals like statistical validity. Incidentally, stats is also harder and requires more critical thinking than just hammering a problem with compute power. You can get through a CS degree and become a "data scientist" with only the barest minimum of statistics fundamentals. I studied math and got a solid dose of stats with it, so I was relatively well off in that respect, but not everyone comes to data science or ML via that route. Now you have the leading "AI" labs running "benchmarks" on data that was probably in their training sets.