Re-thinking document organization




Document Storage

If you are a large organization with documents piling up, you have probably searched for the perfect application to organize your document inventory.
You might already be using Box, Google Drive, Dropbox or something similar. They solve the need to keep your documents in a single place, accessible to all, which is great. However, what these tools fail to recognize is that an organization’s document inventory is like a big, unorganized library; storing it is only a small part of the problem.

The real challenge is to find a way to catalog, classify, tag and place the documents where they rightly belong, without having to hire a workforce for the task.

Well, that is exactly the problem statement we were faced with.

Engeo wanted an application to organize their large document inventory. It consisted largely of PDF files, email data and application-specific files, besides a huge inventory of image files. What they needed was the ability to find the right document in the first search.

Daunting task, huh! 

What we did

  • Angular Frontend 
  • Elasticsearch
  • Machine Learning


We quickly put together all the vital components required. For document storage, we searched for a cloud provider that could provide a highly scalable and secure service. We settled on Amazon S3; however, we also built two further variations, giving the application the ability to use local storage as well as a secure SFTP connection to store files on a remote host.
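
To illustrate how one application can switch between S3, local storage and SFTP, here is a minimal sketch of a pluggable storage layer. It is not the project's actual code: the names `StorageBackend`, `LocalStorage` and `make_storage` are hypothetical, and only the local variant is implemented so the example runs with the standard library alone. An S3 or SFTP variant would presumably implement the same two methods on top of a client library.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StorageBackend(ABC):
    """Common interface every storage variant implements (hypothetical)."""

    @abstractmethod
    def save(self, key: str, data: bytes) -> None:
        """Store `data` under `key`."""

    @abstractmethod
    def load(self, key: str) -> bytes:
        """Return the bytes previously stored under `key`."""


class LocalStorage(StorageBackend):
    """Keeps documents as files below a root directory."""

    def __init__(self, root) -> None:
        self.root = Path(root)

    def save(self, key: str, data: bytes) -> None:
        path = self.root / key
        # Keys may contain sub-folders, e.g. "reports/a.pdf".
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def load(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


def make_storage(kind: str, **options) -> StorageBackend:
    """Pick a backend by name; "s3" and "sftp" would be registered here too."""
    backends = {"local": LocalStorage}
    if kind not in backends:
        raise ValueError(f"unknown storage backend: {kind!r}")
    return backends[kind](**options)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = make_storage("local", root=tmp)
        store.save("reports/a.pdf", b"example bytes")
        print(store.load("reports/a.pdf"))
```

Because the rest of the application only talks to the interface, the choice of backend can come from configuration rather than code.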

Content extraction

We needed a way to extract textual content out of the documents, for which we used Apache Tika.

Fast search

To implement fast search, we set up a cluster of Elasticsearch instances and carefully tuned it for the search scenarios of our parent organization.
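
As a rough idea of what a tuned search request can look like, the sketch below builds an Elasticsearch query body that boosts title matches and optionally filters by tag. The function name `build_search_query` and the field names `title`, `content` and `tags` are assumptions for illustration, not the project's real index mapping. The function only builds the request body, so it runs without a cluster.

```python
import json


def build_search_query(text, tags=None, size=10):
    """Build a query body: full-text match plus an optional tag filter."""
    query = {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text match; "^2" weights title hits twice as much.
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["title^2", "content"],
                        }
                    }
                ],
                # Filters narrow the result set without affecting the score.
                "filter": [],
            }
        },
        # Ask for matching snippets from the extracted text.
        "highlight": {"fields": {"content": {}}},
    }
    if tags:
        # Assumes `tags` is indexed as a keyword field.
        query["query"]["bool"]["filter"].append({"terms": {"tags": list(tags)}})
    return query


if __name__ == "__main__":
    print(json.dumps(build_search_query("soil report", tags=["geotech"]), indent=2))
```

The resulting dictionary would be sent as the body of a search request against the document index.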


At the core of this was a REST API, sitting between all these different components and serving the Angular front-end app as well as other components. All background tasks were handled through a Redis-based queue, optimized for higher throughput.
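
A Redis-based task queue can be as small as a list that producers push onto and workers pop from. The sketch below shows that pattern under stated assumptions: the queue key `documents:tasks`, the task format and the helper names are all hypothetical, and the `InMemoryClient` class is a stand-in that mimics only the two Redis list commands used (`lpush` and `brpop`) so the example runs without a Redis server.

```python
import json
from collections import deque

QUEUE_KEY = "documents:tasks"  # hypothetical key name


def enqueue(client, task_type, payload):
    """Producer side: push a JSON-encoded task onto the head of the list."""
    client.lpush(QUEUE_KEY, json.dumps({"type": task_type, "payload": payload}))


def work_once(client, handlers, timeout=1):
    """Worker side: pop one task from the tail and dispatch it.

    Returns False when no task was available, True otherwise.
    """
    item = client.brpop(QUEUE_KEY, timeout=timeout)
    if item is None:
        return False
    _key, raw = item
    task = json.loads(raw)
    handlers[task["type"]](task["payload"])
    return True


class InMemoryClient:
    """Stand-in for a Redis client; unlike real BRPOP it never blocks."""

    def __init__(self):
        self._lists = {}

    def lpush(self, key, *values):
        queue = self._lists.setdefault(key, deque())
        for value in values:
            queue.appendleft(value)
        return len(queue)

    def brpop(self, key, timeout=0):
        queue = self._lists.get(key)
        if not queue:
            return None
        return key, queue.pop()


if __name__ == "__main__":
    client = InMemoryClient()
    enqueue(client, "extract_text", {"document_id": 1})
    work_once(client, {"extract_text": print})
```

Pushing at the head and popping from the tail gives first-in, first-out ordering, so documents are processed in the order they were uploaded.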

Auto Tagging & Classification

The brain of our application was built using textacy and spaCy, which are industrial-grade natural language processing frameworks.
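
To give a feel for what auto tagging does, here is a deliberately simplified sketch. It does not use textacy or spaCy and is not the project's method: it swaps in a plain term-frequency heuristic from the standard library, suggesting the most frequent non-trivial words in the extracted text as tags. The function name `suggest_tags` and the stopword list are assumptions for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; an NLP library ships a far better one.
STOPWORDS = frozenset(
    {"this", "that", "with", "from", "have", "were", "been", "their", "which"}
)


def suggest_tags(text, max_tags=5):
    """Return the most frequent words longer than three letters as tags."""
    words = re.findall(r"[a-z][a-z\-]+", text.lower())
    counts = Counter(
        word for word in words if len(word) > 3 and word not in STOPWORDS
    )
    return [word for word, _count in counts.most_common(max_tags)]


if __name__ == "__main__":
    sample = (
        "Soil report: the soil samples show stable soil. "
        "Foundation report attached."
    )
    print(suggest_tags(sample, max_tags=2))
```

An NLP framework improves on this by working with lemmas, noun phrases and named entities instead of raw word counts.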

An overview of the final architecture we settled on.


We use Atlassian’s Jira to handle task backlogs and define user stories.
  • Biweekly sprints
  • Project planning using Jira
  • Task tracking through git commits


New feature development is continuing on FAF. Applications like FAF are highly beneficial for organizations with huge amounts of data that need classification and cataloging.

Get in touch  

If you are interested in:
  • discussing your new project’s technical feasibility
  • finding a tech team to revamp your existing product
  • building a prototype or an MVP to test the market

Feel free to send us a note at usama@esipick.com
