A Better Personal Information and Insight Pipeline

I have some recurring frustrations that revolve around several information and insight areas.

I have tried out several news apps, and Zite was pretty decent before they were bought out and replaced by Flipboard, which isn’t as nice.
I keep running out of fresh articles to go through; with all the content that is available, I should hardly ever run out. Even if the articles become less and less related, the feed should just keep going and going, feeding me related items.
And there are things that get in the way, such as the inability to copy text from an article so I can use it to search further on something that I want to learn more about. That is a basic thing that just shouldn’t be a problem.
What would be nice in a news feed solution is pre-parsing the article for terms and linking on long-touch/double-tap out to relevant articles wherever there is a name or concept etc.
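
As a rough illustration of the pre-parsing idea, the sketch below scans article text for candidate names and maps each one to a search link. The capitalized-word regex is only a naive stand-in for real entity recognition, and the `SEARCH_URL` endpoint and `link_terms` helper are hypothetical names made up for this example.

```python
import re
from urllib.parse import quote_plus

# Hypothetical search endpoint; swap in whatever article index is available.
SEARCH_URL = "https://example.com/search?q="

# Naive stand-in for entity recognition: runs of two or more capitalized words.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def link_terms(text):
    """Map each candidate name found in the text to a search link."""
    links = {}
    for match in NAME_PATTERN.finditer(text):
        term = match.group(0)
        # Keep the first link for a term; repeats in the text add nothing new.
        links.setdefault(term, SEARCH_URL + quote_plus(term))
    return links


article = "Researchers at Carnegie Mellon met Ada Lovelace fans."
for term, url in link_terms(article).items():
    print(term, "->", url)
```

A reader app could attach these links to a long-touch or double-tap handler. A proper NLP library would catch single-word names and concepts that this pattern misses.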

On the whole Gmail is pretty good and does a good job at things like spam filtering, but my major issue with it is search and analysis.
Now Google’s Inbox app is pretty good and headed in the right direction for extracting insights. What I really would like is a power user dashboard with things like timelines and significant word/phrase analysis, so I could see what I was doing at certain time periods based on my email, plus the ability to create summaries.
I know these things are starting to get into business intelligence products, but it would be very nice to have it all integrated.
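
To make the timeline idea a bit more concrete, here is a minimal sketch that buckets messages by month and lists the most frequent words in each bucket. It uses raw counts as a simple stand-in for real significance scoring (such as TF-IDF or Elasticsearch's significant terms aggregation). The `monthly_top_terms` function, the tiny stopword list, and the sample messages are all made up for illustration.

```python
import re
from collections import Counter, defaultdict
from datetime import date

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "for", "with", "that", "this", "fwd"}


def monthly_top_terms(messages, top_n=3):
    """messages: iterable of (date, text) pairs.

    Returns {"YYYY-MM": [(term, count), ...]} ordered by month.
    """
    buckets = defaultdict(Counter)
    for sent_on, text in messages:
        words = re.findall(r"[a-z']+", text.lower())
        # Skip stopwords and very short tokens such as "re".
        buckets[sent_on.strftime("%Y-%m")].update(
            w for w in words if len(w) > 2 and w not in STOPWORDS
        )
    return {month: counts.most_common(top_n) for month, counts in sorted(buckets.items())}


sample = [
    (date(2016, 1, 5), "Elasticsearch cluster upgrade"),
    (date(2016, 1, 9), "Re: Elasticsearch upgrade plan"),
    (date(2016, 2, 1), "Kibana dashboard"),
]
for month, terms in monthly_top_terms(sample).items():
    print(month, terms)
```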

General Research and Learning
Another, broader frustration I have is not being able to learn things as efficiently as I would like when I go to research one subject or another. Google may be the best at general web search, but their products could be way, way better.
The technology exists; it’s just a matter of getting it out to the consumer product level. And they are headed in the right direction with summary answers for question searches and image categorization tags (those are really cool).

Which brings me to the solution side of things. I think to bring all this together and better feed my insatiable appetite for information and insights, the best option, from an expediency and professional skill development standpoint, is to start rolling my own solutions with tools that are widely available.

Elasticsearch can be one of the primary cornerstone technologies, and to really go farther I can add network graph visualizations, some traditional machine learning, and perhaps even some neural networks.
Using Python of course. 🙂 It is a major data platform language, and its high-level nature makes development more efficient.

Once I get past the aggregation and basic analysis stage, I’ll have to dig more deeply into cloud machine learning solutions and limit custom work to things that aren’t available, at least for free or cheaply.

I have been working more with Elasticsearch recently at my day job. On the personal side, I recently installed Elasticsearch directly on my cloud virtual server rather than keeping it in the Docker container it was in; the container was giving me issues and I decided it was more trouble than it was worth.
I secured Kibana through port forwarding with authentication, limiting its native port to localhost only, and Elasticsearch is set to localhost only as well. I may explore Kibi some more (https://siren.solutions/kibi/), and especially their plugins (https://siren.solutions/searchplugins/).
I will have to put an API in place to serve out the results; I’m thinking Flask looks to be a good solution for a thin API framework.
The thought also occurred to me at one point that I could build an Elasticsearch extension that uses the natural language processing library spaCy (https://spacy.io/), which should be a pretty useful NLP analysis tool. I need to play around with that library some more.
D3 could play a prominent role in the visualization side of things.
A pretty big project, but could be pretty useful from a number of angles.

I need to start a Redmine project and start scoping out my wish list of features. 🙂
But there are some basic things I could do like indexing my email in Elasticsearch.
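
As a starting point for the email indexing, the sketch below turns messages into action dictionaries in the shape that the bulk helper of the official Elasticsearch Python client accepts (`_index`, `_id`, `_source`). It assumes the mail has been exported to an mbox file. The index name `email`, the field names, and the helper functions are hypothetical choices for this example, not anything Elasticsearch requires.

```python
import mailbox
from email import message_from_string
from email.utils import parsedate_to_datetime

INDEX_NAME = "email"  # hypothetical index name


def plain_text_body(msg):
    """Return the first text/plain part of a message, or an empty string."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload is not None:
                charset = part.get_content_charset() or "utf-8"
                return payload.decode(charset, errors="replace")
    return ""


def message_to_action(msg, index=INDEX_NAME):
    """Turn one email message into a bulk-index action dict."""
    try:
        sent = parsedate_to_datetime(msg.get("Date")).isoformat()
    except (TypeError, ValueError):
        sent = None  # missing or unparseable Date header
    action = {
        "_index": index,
        "_source": {
            "from": msg.get("From"),
            "to": msg.get("To"),
            "subject": msg.get("Subject"),
            "sent": sent,
            "body": plain_text_body(msg),
        },
    }
    if msg.get("Message-ID"):
        # Reusing the Message-ID keeps re-runs from creating duplicates.
        action["_id"] = msg.get("Message-ID")
    return action


def mbox_actions(path, index=INDEX_NAME):
    """Yield one action per message in an mbox file."""
    for msg in mailbox.mbox(path):
        yield message_to_action(msg, index)


sample = message_from_string(
    "Message-ID: <1@example.com>\n"
    "From: me@example.com\n"
    "To: you@example.com\n"
    "Subject: Status\n"
    "Date: Tue, 05 Jan 2016 10:00:00 +0000\n"
    "\n"
    "Cluster is green.\n"
)
print(message_to_action(sample))
```

With the client installed, the generator could be passed along as `elasticsearch.helpers.bulk(client, mbox_actions(path))`. An explicit mapping for the `sent` date field would be worth setting up before indexing.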
The news data pipeline is more complicated; I would really like to have access to my own copy of the Internet. 🙂
Ideally I would scan and extract insights and links without storing actual content.
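
One way to read "insights and links without storing actual content" is to parse each page once and keep only a small record: its URL, title, and outgoing links. The sketch below does that with the standard library's HTML parser. The `LinkExtractor` class, the `summarize_page` function, and the record layout are assumptions made for this example, and fetching the page is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the page title and absolute link targets, nothing else."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title_parts = []
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def summarize_page(url, html):
    """Return a small record for the page; the HTML itself is discarded."""
    parser = LinkExtractor(url)
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": "".join(parser.title_parts).strip(),
        "links": sorted(set(parser.links)),
    }


page = (
    "<html><head><title> Demo </title></head><body>"
    '<a href="/a">A</a><a href="https://other.example/b">B</a><a>no href</a>'
    "</body></html>"
)
print(summarize_page("https://example.com/start", page))
```

Records like this are small enough to index directly, and the link lists are the raw material for the network graph visualizations mentioned earlier.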

This doesn’t purely have to be about scratching my information itch; there is also professional development in the direction I would like to head anyway. But it would certainly be empowering not to have to rely on everyone else’s implementations and features, especially since I am not wealthy. 🙂
But if I were, I could do it anyway and build my own implementation team. 🙂 All in good time…

Anyone have suggestions of things to look at as I explore this project?

