Texas A&M researchers awarded $2.4 million grant to efficiently reduce size of data sets

The U.S. Department of Energy awarded $13.7 million to nine projects as part of the Advanced Scientific Computing Research program
Published: Oct. 7, 2021 at 5:09 PM CDT

COLLEGE STATION, Texas (KBTX) - A research team at Texas A&M was just awarded a $2.4 million grant to develop efficient ways to reduce the size of massive scientific data sets.

The team is led by Byung-Jun Yoon, an associate professor in Texas A&M’s Department of Electrical and Computer Engineering. Yoon says scientific facilities often generate exabytes of data these days. One exabyte is equal to one billion gigabytes. Locating the important data they need can be like finding a needle in a haystack, Yoon says.

Yoon says people tend to assume that more data always makes a goal easier to reach, but that is not always the case. That’s why the team is working on a mechanism that can get rid of the unnecessary data without compromising what’s needed.

“We call our data compression approach an objective-driven data compression approach because the first thing is we want to define a metric that can be used for finding out how data compression would affect our final goal,” Yoon said. “We want to measure the impact of data compression on the things we’re interested in detecting or achieving.”

Based on this metric, they can figure out the amount of compression they can use before it impacts the final scientific goal, Yoon says.
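
As a rough illustration of that idea, and not the team's actual method, the Python sketch below tries progressively coarser versions of a signal and keeps the coarsest one that still meets a chosen objective. The objective (finding the strongest sample), the metric, the synthetic signal and every function name are hypothetical, made up for this example.

```python
import math

def quantize(samples, bits):
    """Lossy step: round values in [-1, 1] onto a grid of 2**bits levels."""
    levels = 2 ** bits - 1
    return [round((x + 1) / 2 * levels) / levels * 2 - 1 for x in samples]

def peak_index(samples):
    """Hypothetical scientific objective: locate the strongest sample."""
    return max(range(len(samples)), key=lambda i: abs(samples[i]))

def objective_loss(original, compressed):
    """Hypothetical metric: how far the detected peak moved, in samples."""
    return abs(peak_index(original) - peak_index(compressed))

def most_aggressive_bits(samples, candidate_bits, tolerance=0):
    """Scan from mild to aggressive compression and stop at the first
    bit depth that pushes the objective loss past the tolerance."""
    best = None
    for bits in sorted(candidate_bits, reverse=True):
        if objective_loss(samples, quantize(samples, bits)) <= tolerance:
            best = bits
        else:
            break
    return best

# Synthetic signal (an assumption for the demo): one bump plus a small ripple.
samples = [0.8 * math.exp(-((i - 40) / 5) ** 2) + 0.05 * math.sin(i)
           for i in range(100)]
best = most_aggressive_bits(samples, [16, 12, 8, 4, 2])
print("fewest bits that keep the peak in place:", best)
```

The point of the sketch is the order of operations: the metric is defined first, and the compression level is chosen only after checking it against that metric.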

“We are focusing on an objective and making sure that any data compression technique that we’re developing is not degrading the achievement of that objective,” Yoon said.

Yoon compares what his team is doing to audio compression, which, when done well, reduces the size of music files without sacrificing audio quality. He says their process is much more subtle because the objectives of scientific endeavors often differ.

“We can start from CD-quality music, which is about 700 megabytes for, let’s say, 70 minutes worth of music, and we can compress it to about 70 megabytes, one-tenth of the original size,” Yoon said. “Not many people think that MP3 or AAC is that bad, and it’s because it is compressed. But it’s in such a way that the quantities of interest, the ones we actually care about, are not affected too much by that data compression.”
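
For readers who want to check the arithmetic behind those round numbers, here is a short Python sketch. It assumes standard CD audio parameters (44,100 samples per second, 16 bits per sample, two channels), which are not stated in the article, and it uses decimal megabytes.

```python
# Standard CD audio parameters (an assumption; the article gives only totals).
SAMPLE_RATE_HZ = 44_100
BYTES_PER_SAMPLE = 2   # 16 bits
CHANNELS = 2           # stereo

minutes = 70
seconds = minutes * 60

# Uncompressed size of 70 minutes of CD-quality audio.
cd_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * seconds
cd_megabytes = cd_bytes / 1_000_000

# The compressed size Yoon cites for the same music.
compressed_megabytes = 70
ratio = cd_megabytes / compressed_megabytes

# Average bitrate implied by a 70 MB file lasting 70 minutes.
compressed_kbps = compressed_megabytes * 1_000_000 * 8 / seconds / 1000

print(f"CD audio: {cd_megabytes:.0f} MB, ratio about {ratio:.1f} to 1, "
      f"compressed bitrate about {compressed_kbps:.0f} kbps")
```

Under those assumptions the uncompressed figure comes out slightly above 700 megabytes, so the quoted one-tenth ratio is a fair approximation.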

Copyright 2021 KBTX. All rights reserved.