Higinio O. Maycotte is CEO of Umbel, a company that uses data to increase the understanding media companies have of their audiences and online advertising revenue. Knight Foundation supports Umbel through its Enterprise Fund. Below, Maycotte writes about the theme of the first Knight News Challenge of 2014: How can we strengthen the Internet for free expression and innovation? Photo credit: Flickr user Kris Krug.

The amount of data collected on the Internet is overwhelming. Facebook alone collects 500 terabytes a day. As of 2013, 667 exabytes of data flow over the Internet annually. And these numbers, as hard as they are to wrap our heads around, are only going to keep increasing, and rapidly.

RELATED LINKS
"Towards a stronger Internet" by John Bracken and Chris Sopher on KnightBlog.org
"Our future's Internet strengthened today" by Jenny Toomey on KnightBlog.org
"A $2.75 million challenge to create a more open Internet" by Mark Surman on KnightBlog.org
"Creating safe spaces for innovation on the Internet" by Kwasi Asare on KnightBlog.org
"Refusing to unlearn a free and open Internet" by Shazna Nessa on KnightBlog.org
"4 most common News Challenge questions answered" by John Bracken on KnightBlog.org
"Restoring equilibrium to the web" by Tyler Fisher on KnightBlog.org

In the journalism sphere, massive data collection has produced data journalist roles. These writers and editors use data collected by third-party agencies to create some of the most viral images on the Web. Anytime The Atlantic publishes a map of the states with the highest poverty levels, it uses big data. Anytime The New York Times publishes a quiz about where your accent comes from, it uses big data. These stories and images get shared hundreds of thousands of times and are driving much-needed traffic to publishers. This is about much more than an interesting listicle.
Data journalism is about taking big data concepts, visualizing them for the audience and showing readers who they are, or at least who the data says they are. And this is revolutionary. For the first time in the history of journalism, writers and editors no longer need to rely on surveys taken by only a few or on numbers published by biased corporations. Now, thanks to the massive amounts of data collected by Facebook, Google, Apple, Amazon and so on, data journalism can be precise rather than estimating the sentiment of the country from a very small sample.

But there’s danger here, too. Consumers and readers are very often unaware of, or poorly informed about, the data collection processes that directly involve them. Many of these big corporations, the ones collecting so much personal data about where readers live, what they buy, where they work and who they are in a relationship with, sell their data points to others. Those buyers can then personalize ads, content and the online experience with very little, if any, consent from the user.