Blippex

Second database dump

Today we are releasing the second dump of the Blippex database!
This second dump is more than twice as big as the first one and contains more than 4 million URLs.

The format is the same as last time: a BSON dump of our MongoDB database. You can find it on GitHub or download it directly here (577 MB compressed, over 1 GB extracted).
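If you want to peek into the dump without installing MongoDB, here is a minimal Python sketch that uses only the standard library. It assumes the extracted file follows mongodump's usual layout (BSON documents stored back to back, each starting with its total length as a little-endian int32) and it decodes only the common BSON types a schema like ours is likely to use; any other type raises an error. The function names are illustrative and not part of any library. If you do have MongoDB installed, `mongorestore` can load the dump directly.

```python
import datetime
import struct
from typing import Any, BinaryIO, Dict, Iterator, Tuple

# BSON UTC datetimes are milliseconds since this epoch.
_EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def _cstring(buf: bytes, pos: int) -> Tuple[str, int]:
    """Read a NUL-terminated UTF-8 string; return it and the position after the NUL."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1


def _decode_doc(buf: bytes, pos: int) -> Tuple[Dict[str, Any], int]:
    """Decode one BSON document starting at pos; return (dict, position after it)."""
    (length,) = struct.unpack_from("<i", buf, pos)
    end = pos + length - 1  # index of the document's trailing 0x00 byte
    pos += 4
    out: Dict[str, Any] = {}
    while pos < end:
        etype = buf[pos]
        name, pos = _cstring(buf, pos + 1)
        if etype == 0x01:  # double
            (out[name],) = struct.unpack_from("<d", buf, pos)
            pos += 8
        elif etype == 0x02:  # string: int32 byte length (incl. NUL), UTF-8 bytes, NUL
            (slen,) = struct.unpack_from("<i", buf, pos)
            out[name] = buf[pos + 4 : pos + 4 + slen - 1].decode("utf-8")
            pos += 4 + slen
        elif etype in (0x03, 0x04):  # embedded document / array
            sub, pos = _decode_doc(buf, pos)
            # Arrays are documents keyed "0", "1", ...; keep only the values, in order.
            out[name] = list(sub.values()) if etype == 0x04 else sub
        elif etype == 0x07:  # ObjectId: 12 raw bytes, returned as hex
            out[name] = buf[pos : pos + 12].hex()
            pos += 12
        elif etype == 0x08:  # boolean
            out[name] = buf[pos] != 0
            pos += 1
        elif etype == 0x09:  # UTC datetime: int64 milliseconds since the epoch
            (ms,) = struct.unpack_from("<q", buf, pos)
            out[name] = _EPOCH + datetime.timedelta(milliseconds=ms)
            pos += 8
        elif etype == 0x0A:  # null
            out[name] = None
        elif etype == 0x10:  # int32
            (out[name],) = struct.unpack_from("<i", buf, pos)
            pos += 4
        elif etype == 0x12:  # int64
            (out[name],) = struct.unpack_from("<q", buf, pos)
            pos += 8
        else:
            raise ValueError(f"unsupported BSON type 0x{etype:02x} for field {name!r}")
    return out, end + 1


def iter_bson(stream: BinaryIO) -> Iterator[Dict[str, Any]]:
    """Yield documents from a file of concatenated BSON documents (mongodump layout)."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of file
        if len(header) < 4:
            raise ValueError("truncated BSON length prefix")
        (length,) = struct.unpack("<i", header)
        if length < 5:
            raise ValueError("invalid BSON document length")
        body = stream.read(length - 4)
        if len(body) != length - 4:
            raise ValueError("truncated BSON document")
        doc, _ = _decode_doc(header + body, 0)
        yield doc
```

To use it, open the extracted .bson file in binary mode (`with open(path, "rb") as f:`) and loop over `iter_bson(f)`; `path` stands for wherever the file ended up on your machine. Reading one document at a time keeps memory use flat even though the extracted dump is over 1 GB.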

The database format looks like this (dummy data):

{
	"_id": "b919f02c8f053c41e8ee86311ca9b0f6",
	"url": "https://www.example.com/",
	"host": "www.example.com",
	"root": "example.com",
	"title": "Example Title",
	"time_spent": [
		{
			"sec": 45,
			"seen_at": ISODate("2013-06-23T00:41:44.0Z")
		},
		{
			"sec": 5,
			"seen_at": ISODate("2013-07-01T14:41:44.0Z")
		}
	]
}
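As a small example of working with this structure, the Python sketch below adds up the recorded seconds per root domain, treating each entry in `time_spent` as one visit with its `sec` value. It assumes the documents have already been decoded into Python dicts (for instance with PyMongo after restoring the dump with `mongorestore`); the function name `total_time_by_root` is illustrative.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Any, Dict, Iterable


def total_time_by_root(docs: Iterable[Dict[str, Any]]) -> Counter:
    """Sum the "sec" of every time_spent entry, grouped by each document's "root"."""
    totals: Counter = Counter()
    for doc in docs:
        # One URL can carry several time_spent entries; each adds its seconds.
        for visit in doc.get("time_spent", []):
            totals[doc["root"]] += visit["sec"]
    return totals


if __name__ == "__main__":
    # The dummy document above, as a decoded Python dict.
    sample = [
        {
            "_id": "b919f02c8f053c41e8ee86311ca9b0f6",
            "url": "https://www.example.com/",
            "host": "www.example.com",
            "root": "example.com",
            "title": "Example Title",
            "time_spent": [
                {"sec": 45, "seen_at": datetime(2013, 6, 23, 0, 41, 44, tzinfo=timezone.utc)},
                {"sec": 5, "seen_at": datetime(2013, 7, 1, 14, 41, 44, tzinfo=timezone.utc)},
            ],
        }
    ]
    print(total_time_by_root(sample).most_common())  # [('example.com', 50)]
```

Grouping by `root` rather than `host` folds subdomains such as www.example.com into example.com; swap the key if you want per-host totals instead.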

Compared to our “real” database, one thing is missing: the text of the web pages. Due to copyright concerns we do not publish it, because we would rather spend our money on developing Blippex than on lawyers.
We are once again curious to see what people will do with it!

Don’t forget to check out our Blippex Heartbeat to see what is going on at Blippex!

Published on 27 Aug 2013