janaka.dev

Introducing Osobisty, My Universal Personal Search Engine

01 October 2021

Over the past 3 weeks I built a universal personal search system which I’ve named Osobisty (which means personal or private in Polish). I’ll dive deeper into why I need Osobisty in another post. The short story is I’m looking at ways to improve how I capture information, learn (turn into knowledge), and recall that information+knowledge when needed. This post is about why I decided to build this rather than “buy” along with some technical details.

What is a Universal Personal Search Engine?

It’s a single place to search and access my curated content, both public and private. I currently have private notes (Zettelkasten), Kindle highlights, and bookmarked tweets indexed. Bookmarked web pages are next on the list. Contact list, starred emails, and starred Google Drive content might come down the road. You get the idea.

Inspiration

The penny dropped when I heard Linus Lee (aka The Sephist) talk about his Monocle project on The Changelog E455 - Building Software for Yourself. The problem(s) he’s trying to solve resonated. Apollo is another personal search engine, inspired by Monocle, which I also looked at.

The reasons why Linus builds his own software also resonated. I used to dismiss the idea outright but I’ll give it more consideration going forward. However, I don’t intend to build my own solutions to the extent Linus has. I’m definitely not going to build an entire custom stack and toolchain. I might implement a very basic programming language for fun and learning.

As a side note, Linus is a side project machine and obviously a very talented engineer. He shares a lot of interesting thoughts on various topics, from creative work, productivity, and startups to life. It’s worth following his writing for inspiration if those topics interest you.

Why build my own solution?

Generally my preference is to buy rather than build. Other than websites and this Desktop Clock I’ve never built software for personal use. My important needs and problems don’t tend to be unique so somebody else has already built a solution that is good enough. The same is of course true in this case so here are my reasons.

My main requirements:

  1. Full control over my private data. I can choose not to expose the system over the Internet. That makes it far less likely to get hacked and leaked in a public way compared to a commercial service.
  2. No data and information lock-in (including meta). I want to manage as much of the data as possible in easy-to-edit Markdown because I want it to be super portable over a long period of time (multiple decades into the future).
  3. Support for ingesting my information sources. This means having the flexibility to extend it with custom source formats.

These criteria exclude SaaS solutions. Monocle and Apollo are still in the running, as both can meet my main requirements. However, Monocle is built in Linus’ custom programming language (Ink) and UI framework (Torus). Apollo is written in Golang. If I’m going to spend any time coding I want to use languages and technology that are the most broadly applicable in the industry today. That gives me the most return on time invested because I gain knowledge that helps with my day job. That gets at the other key reason to build rather than buy: I want a good excuse to code and build, because I enjoy it. Something that adds value to my learning and my day job makes it worthwhile.

I think it’s obvious why I don’t want to learn Ink and Torus (Linus’ reasons are legit). What’s the problem with Golang then? It’s applicable in the industry. But I don’t want to build a web UI using Golang. It’s the wrong tool for the job in my opinion. If I had decided to build the search engine backend myself I might have used Golang, but I decided not to hand-build that (more about that below). Unlike Linus, I don’t have a goal to learn the insides of a search engine implementation. It’s something I’ve already dabbled in, though 10-15 years ago, and I don’t need to understand it any deeper at this time. That is, of course, assuming I could find a suitable search engine to build on top of…

Technologies used to build Osobisty

  - UI - ReactJS + TypeScript
  - Crawlers and indexers - NodeJS + TypeScript
  - Search engine - Typesense, a fast OSS search engine that’s really easy to work with

I started out considering DuckDB, which I’d come across recently (I think on Changelog E454) and which supports text indexing. Lucene was probably one of my backup options. But then I discovered Typesense. I also looked at Meilisearch but Typesense seemed to be better all-round. I didn’t do a deep analysis, just an on-paper assessment.

React is a very popular industry choice for web UI and I’m familiar with it. I did consider using this as an excuse to try out NextJS but decided against the overhead for now. I wanted to balance productivity and learning. Given the UI was going to be TypeScript I decided to use the same for the indexers. Again, that removes the learning/refreshing overhead of two languages; I’d rather double down on one.
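To give a feel for what an indexer in NodeJS + TypeScript might look like, here is a minimal sketch. It is not the actual Osobisty code. The `SearchDocument` shape, the `SourceAdapter` interface, both adapters, and the assumed input formats (a Markdown note with a `# ` heading, highlights separated by blank lines) are all hypothetical names and assumptions made up for illustration. The idea is simply that each source format gets a small adapter that turns raw text into a common document shape, which can then be serialised as JSON Lines for bulk import into a search engine.

```typescript
// Hypothetical common shape for anything that gets indexed (an assumption,
// not the real Osobisty schema).
interface SearchDocument {
  id: string;
  type: "note" | "kindle-highlight" | "tweet";
  title: string;
  content: string;
}

// Hypothetical adapter interface: one implementation per source format.
interface SourceAdapter {
  type: SearchDocument["type"];
  parse(name: string, raw: string): SearchDocument[];
}

// Markdown note: the first "# " heading becomes the title (falling back to
// the file name) and the whole file becomes the content.
const markdownNoteAdapter: SourceAdapter = {
  type: "note",
  parse(name, raw) {
    const heading = raw.split("\n").find((line) => line.startsWith("# "));
    const title = heading ? heading.slice(2).trim() : name;
    return [{ id: `note:${name}`, type: "note", title, content: raw.trim() }];
  },
};

// Highlights: assumes a plain-text export with one highlight per
// blank-line-separated block. Real export formats will differ.
const kindleHighlightAdapter: SourceAdapter = {
  type: "kindle-highlight",
  parse(name, raw) {
    return raw
      .split(/\n\s*\n/)
      .map((block) => block.trim())
      .filter((block) => block.length > 0)
      .map((content, i) => ({
        id: `kindle:${name}:${i}`,
        type: "kindle-highlight" as const,
        title: name,
        content,
      }));
  },
};

// Serialise documents as JSON Lines (one JSON object per line).
function toJsonl(docs: SearchDocument[]): string {
  return docs.map((doc) => JSON.stringify(doc)).join("\n");
}

// Usage: parse two sources and print the combined JSON Lines payload.
const docs = [
  ...markdownNoteAdapter.parse("zettel-1.md", "# Search engines\n\nSome body text.\n"),
  ...kindleHighlightAdapter.parse("Some Book", "First highlight.\n\nSecond highlight.\n"),
];
console.log(toJsonl(docs));
```

Adding a new source (bookmarked tweets, say) would then mean writing one more adapter rather than touching the rest of the pipeline, which is the kind of flexibility the third requirement above asks for.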

As a starting point the UI and functionality are a clone of Monocle (full credit to Linus in the code). I plan to open source Osobisty soon.

I’ve also started to work on an accompanying Chrome extension. More on that in a separate post. This is also inspired by Linus. He has a Chrome extension that does semantic search in Monocle to surface content related to the web page he is browsing. It looks like I’ll have to implement the semantic search algorithm.


Personal notes and thoughts on web technology, software development, and technical product management by Janaka Abeywardhana. On Github, Twitter, and Instagram
