by John Dodge

Was Wikipedia the Start of Big Data?

Oct 18, 2013
4 mins
Big Data

How many times a week do you use it?

Wikipedia has a wonderful definition of Big Data. It’s complete and full of examples, all backed up by 89 source references.  

If you think about it, Wikipedia is Big Data. Thanks to Wikipedia (and search engines), you can find expansive definitions and spellings for just about everything.   

For all its vastness, Wikipedia cannot keep up with itself, and I chuckled when I saw it. Its own definition says it has 365 million users and is the #6 most popular web site in the world. Its own page for “most popular web sites” also ranks it #6.

But the appeal for donations plastered on all its pages says it has 500 million users and is now the #5 web site. Time to update its own definitions, I would think. 

Dear Wikipedia readers: We are the small non-profit that runs the #5 website in the world. We have only 175 staff but serve 500 million users, and have costs like any other top site: servers, power, programs, and staff. Wikipedia is something special. It is like a library or a public park. It is like a temple for the mind, a place we can all go to think and learn. To protect our independence, we’ll never run ads. We take no government funds. We survive on donations averaging about $30. Now is the time we ask. If everyone reading this gave $3, our fundraiser would be done within an hour. If Wikipedia is useful to you, take one minute to keep it online and ad-free another year. Please help us forget fundraising and get back to Wikipedia. Thank you.

For journalists like me, Wikipedia is a godsend (I donated). I cannot personally attest to the overall accuracy of its 30 million articles in 287 languages, but the ones I have accessed over the years have generally been on the mark. I have heard few challenges to its accuracy, although researchers and students are discouraged from using it as a primary source.

It’s just too easy to use Wikipedia, much as it was too easy to do math with electronic calculators in the 1970s. We know how that worked out.

Here’s how a 2005 report in the respected science journal “Nature” on Wikipedia’s accuracy has been summarized:

“Wikipedia is often cited for factual inaccuracies and misrepresentations. However, a non-scientific report in the journal Nature in 2005 suggested that for some scientific articles Wikipedia came close to the level of accuracy of Encyclopædia Britannica and had a similar rate of ‘serious errors.’”

I have some sympathy for how hard it is to keep its definitions current. Wikipedia need look no further than the pages about itself to understand the impossible race against time.

Besides citing producers of Big Data like Walmart, the Large Hadron Collider, the human genome and Facebook, the Wikipedia Big Data page offers critiques, sizes up the market, and runs down research-related projects and architectures. It’s more a white paper sans vendor bias than a definition! Indeed, Wikipedia is more encyclopedia than dictionary.

I have read the “editing” section of Wikipedia’s definition of itself several times and I am still not sure how these encyclopedic segments come together so coherently given that they are compiled by outside contributors. Consider:

“In a departure from the style of traditional encyclopedias, Wikipedia is open to outside editing. This means that, with the exception of particularly sensitive and/or vandalism-prone pages that are ‘protected’ to some degree,[25] the reader of an article can edit the text without needing approval, doing so with a registered account or even anonymously.”

Now in its 13th year, Wikipedia is an amazing demonstration of how Big Data works for everyone.

By the way, there are 175 staffers at Wikipedia and I bet more than a few know a thing or two about Big Data.