Re-thinking document organization
If you are a large organization with documents piling up everywhere, you have probably been searching for the perfect application to organize them.
You may already be using Box, Google Drive, Dropbox or something similar. These tools do solve the need to keep your documents in a single place, accessible to all, which is great. What they fail to recognize, however, is that an organization’s document inventory is like a big, unorganized library; storing it is only a small part of the problem.
The real challenge is to catalog, classify, tag and place each document where it rightly belongs, without having to hire a workforce for the task.
That is exactly the problem we were faced with.
Engeo wanted an application to organize their large document inventory. It consisted largely of PDF files, email data and application-specific files, alongside a huge collection of image files. What they needed was the ability to find the right document on the first search.
Daunting task, huh!
What we did
- REST API
- Angular Frontend
- Elasticsearch
- Machine Learning
We quickly put together all the vital components. For document storage, we looked for a cloud provider offering a highly scalable and secure service. We settled on Amazon S3, but we also built two alternatives, giving the application the ability to use local storage as well as a secure SFTP connection to store files on a remote host.
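As an illustration of how interchangeable storage options like these can sit behind one interface, here is a minimal sketch in Python. The names `StorageBackend`, `LocalStorage` and `store_document` are hypothetical and not taken from the actual application; only the local variant is shown, and S3 or SFTP variants would be assumed to expose the same two methods.

```python
from pathlib import Path
from typing import Protocol


class StorageBackend(Protocol):
    """Hypothetical interface that every storage variant would implement."""

    def save(self, key: str, data: bytes) -> None: ...

    def load(self, key: str) -> bytes: ...


class LocalStorage:
    """Local-disk variant: each document is a file under a root directory."""

    def __init__(self, root: str) -> None:
        self.root = Path(root).resolve()

    def _path(self, key: str) -> Path:
        path = (self.root / key).resolve()
        # Reject keys that would escape the root directory, e.g. "../x".
        if self.root not in path.parents:
            raise ValueError(f"invalid key: {key!r}")
        return path

    def save(self, key: str, data: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def load(self, key: str) -> bytes:
        return self._path(key).read_bytes()


def store_document(backend: StorageBackend, key: str, data: bytes) -> None:
    """Application code depends only on the interface, not on the variant."""
    backend.save(key, data)
```

With this shape, switching from local disk to a remote store would be a configuration choice rather than a code change in the rest of the application.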
We needed a way to extract textual content out of the documents, for which we used Apache Tika.
To implement fast search, a cluster of Elasticsearch instances was carefully configured to optimize the search scenarios for our parent organization.
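To give a feel for what a search request against such a cluster might look like, the sketch below builds an Elasticsearch query body as a plain Python dictionary. The field names `title`, `content` and `tags`, and the boost on `title`, are assumptions made for illustration; the real index mapping is not described here.

```python
from typing import Iterable, Optional


def build_search_query(
    text: str, tags: Optional[Iterable[str]] = None, size: int = 10
) -> dict:
    """Build a query body in the Elasticsearch query DSL.

    Field names are hypothetical. The full-text part scores documents,
    while the optional tag filter only narrows the result set.
    """
    bool_query: dict = {
        "must": [
            {
                "multi_match": {
                    "query": text,
                    # "title^2" weights title matches twice as much as content.
                    "fields": ["title^2", "content"],
                }
            }
        ]
    }
    if tags:
        bool_query["filter"] = [{"terms": {"tags": list(tags)}}]
    return {"query": {"bool": bool_query}, "size": size}


body = build_search_query("soil report", tags=["geotechnical"], size=5)
```

The resulting dictionary could be sent as the JSON body of a search request; how it is sent depends on the client library in use.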
At the core of this was the REST API, sitting between all these components and exposed to an Angular front-end app as well as other components. All background tasks were handled through a Redis-based queue, optimized for higher throughput.
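A Redis-based job queue of this kind is often built on a Redis list, with producers pushing on one end and workers popping from the other. The sketch below assumes that pattern; the queue name and job fields are hypothetical. The functions call only `lpush` and `brpop`, which the redis-py client provides, and a small in-memory stand-in is included so the example runs without a Redis server.

```python
import json
from collections import deque
from typing import Optional

QUEUE_NAME = "documents:to_process"  # hypothetical queue name


def enqueue(client, job: dict) -> None:
    """Push a JSON-encoded job onto the left end of the list."""
    client.lpush(QUEUE_NAME, json.dumps(job))


def dequeue(client, timeout: int = 1) -> Optional[dict]:
    """Pop the oldest job from the right end, or return None if none arrives."""
    item = client.brpop(QUEUE_NAME, timeout=timeout)
    if item is None:
        return None
    _key, payload = item
    return json.loads(payload)


class InMemoryClient:
    """Stand-in for a Redis client, for illustration only.

    Unlike a real Redis BRPOP, this version never blocks: it returns
    None immediately when the list is empty.
    """

    def __init__(self) -> None:
        self._lists: dict = {}

    def lpush(self, name: str, *values: str) -> int:
        items = self._lists.setdefault(name, deque())
        for value in values:
            items.appendleft(value)
        return len(items)

    def brpop(self, name: str, timeout: int = 0):
        items = self._lists.get(name)
        if not items:
            return None
        return (name, items.pop())


client = InMemoryClient()
enqueue(client, {"document_id": 1, "task": "extract_text"})
enqueue(client, {"document_id": 2, "task": "extract_text"})
first_job = dequeue(client)
```

Pushing on the left and popping on the right gives first-in, first-out ordering, so documents are processed in the order they were submitted.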
Auto-tagging & classification
The brain of our application was built using textacy and spaCy, which are industrial-grade natural language processing libraries.
- Biweekly sprints
- Project planning using Jira
- Task tracking through git commits
New features and development are continuing on FAF. Applications like FAF are highly beneficial for organizations with huge amounts of data that need classification and cataloging.
Get in touch
If you are interested in
- discussing your new project’s technical feasibility
- finding a tech team to revamp your existing product
- building a prototype or an MVP to test the market
Feel free to send us a note at usama@esipick.com