Data Resources to Feed the Machine

Here are some data resources to feed your hungry ML models or other applications.
Remember to check each dataset's license for commercial-friendly terms 🙂 Viva la profit 🙂

Ideally we can leave the web scraping and data gathering to others. That gives us at least one layer of separation and saves a lot of time.

The Granddaddy:
“We build and maintain an open repository of web crawl data that can be accessed and analyzed by anyone.”
“Open Super-large Crawled Aggregated coRpus”
Lots of categorized data goodies here
“Open Data Sources”
This dataset may need extra scrutiny and cross-referencing for accuracy, made-up facts, and skewed bias,
but there is still a lot of potentially useful stuff here.