In an age of social media and big data, we need to keep our heads above water when it comes to data science, said Michael Hennessey, a professor of economics at the University of Wisconsin-Madison.
“If you want to do good science, you should be looking at the data,” he told Business Insider.
“It’s really not a matter of how much data you have, but how much you can put into it.”
It’s easy to see how this can be tricky, and the problem of ‘good data’ comes up often in Hennessey’s research.
“You have to have the right data,” Hennessey said.
“The data can be very valuable.
You can see that the health of your population is going up, but the data can’t tell you if it’s a good or bad thing.”
The data can also be quite expensive.
Hennessey, who specializes in health care, told Business Insider that the typical cost of a research project is around $300,000.
That figure doesn’t include the additional time, money and research resources involved.
Hennessey, who studies the business of data, said this kind of research can also require data that is difficult to acquire.
“That can be costly, and that can be difficult to collect, and the costs can be really high,” he said.
If you want data to be valuable, you have to put it into good form, Hennessey advises data scientists.
“People need to understand that the value is going to be in the quality, not in the quantity,” he added.
But Hennessey said there are a few things data scientists can do to make their data more valuable.
“I would say that the most important thing is to use a good data set,” he explained.
“Data sets are really good at capturing the essence of what we are trying to understand, and in this case we are interested in a particular disease or a particular population.”
The quality of the data also needs to be high, Hennessey advised.
“In a lot of cases, you are looking for a certain number of statistically significant correlations.
But if you are not careful, you can miss something,” he said.
“We are trying so hard to make sure that the data are as consistent and valid as possible.
If we don’t have the data, it’s really hard to know what to do.”
This is not the first time Hennessey has come across the concept of ‘quality over quantity’.
In 2009, he said he noticed that many medical journals weren’t using high-quality data sets, so he decided to write a paper to show why.
“Most of the people who were using this [low-quality] data set had not read a lot about it,” he wrote.
“They were not reading it, and they didn’t know anything about it.”
He also noticed that the quality of the studies was low, with many of them reaching very few conclusions.
“A large part of the reason for this is that we do not use the data that is available in the field,” Hennessey said.
He said it was also important to use good data sets, even if they were not necessarily representative of the population being studied.
“When you study this topic you are trying not to take data from people with high risk, people with very different backgrounds and the like, but from people who have similar health problems,” he continued.
“So I don’t want to say that you shouldn’t use data from the population with high mortality, but I want to avoid using data from that population.”
But it was the quality and quantity of the information in the studies that made them useful, and this was also why Hennessey’s study reached some strong conclusions.
It showed that the risk of heart disease was higher among people who had a high BMI, and it also showed that people with diabetes were at higher risk of developing heart disease.
“What we saw is that people who do not have diabetes and people who don’t develop diabetes are at higher risks,” Hennessey said, explaining that this might have something to do with the high amount of fat in the body.
But the data needed to be robust, and good data was required to make those conclusions.
Hennessey also said that the researchers had found some positive effects of using a good set of data.
“This study was quite comprehensive,” he noted.
“For example, it looked at all the studies and it was very detailed.
So if you were to look at this with a statistical tool that looked at one study, you would see some of the findings are not true.”
But this was not a straightforward study, as it was written in the context of a lot more data.
Hennessey suggested that data should be used to inform decisions about treatments, rather than simply to find out if a treatment worked or not.
“My suggestion would be that you use the best data possible,” he said.