This content originally appeared on Level Up Coding - Medium and was authored by Joosep Parts
This article explains how to install and run ElasticSearch with sist2 in Docker to index and search files on your local machine. You can index content from *.pdf, *.pptx, *.docx, and similar files, then use a browser UI to search the text inside them.
The motivation for this stack, for me at least, was to prepare for exams. 🤭 Some courses shared 40+ PDFs, and some of those files ran to more than 100 pages, so opening each one and pressing CTRL+F was out of the question. I still needed to find the right files fast, and sist2 is great for that! It shows a small preview of each file and its matching content. You can control the length of the returned text in the web UI (see Tip #2).
Of course, this doesn’t mean that’s the only use for this tool. Take a look at the sist2 documentation for inspiration to index something else.
I looked at different options to index local files and make them searchable. I found some applications online, but most seemed to be designed for Windows 98. Besides, I wanted this stack to be shareable with coursemates (so they could use this on open-book exams as well).
Alternatives that I tried
Fscrawler using ElasticSearch and Kibana.🤔 Though it worked, I ended up not using that stack. Getting Fscrawler to work on Windows was hard because of the conflicting versions and dependencies it required. Kibana is also not very intuitive for a first-time user. On top of that, file thumbnail generation was not supported, and getting longer text previews in Kibana or opening files via localhost links needed more fine-grained configuration than I was willing to do.
Tip #1
Before you start indexing, convert file types that need external tools to open (.doc, .pptx, etc.) into .pdf. This lets you open the files directly inside the browser.
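One way to do the conversion in bulk is sketched below. It is not part of the original setup: it assumes LibreOffice is installed and that its `soffice` command is on your PATH, and the function names and the list of extensions are only illustrative. The script finds office documents in a folder and calls LibreOffice in headless mode to write a PDF next to each one.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: these are the file types you want converted; adjust as needed.
OFFICE_EXTENSIONS = {".doc", ".docx", ".ppt", ".pptx"}


def find_office_files(folder: Path) -> list[Path]:
    """Return office documents under `folder`, sorted for stable output."""
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() in OFFICE_EXTENSIONS
    )


def conversion_command(source: Path, soffice: str = "soffice") -> list[str]:
    """Build a LibreOffice headless command that writes a PDF next to `source`."""
    return [
        soffice, "--headless",
        "--convert-to", "pdf",
        "--outdir", str(source.parent),
        str(source),
    ]


def convert_all(folder: Path) -> None:
    """Convert every office document found under `folder` to PDF."""
    soffice = shutil.which("soffice")
    if soffice is None:
        # Assumption: LibreOffice is the converter; stop early if it is missing.
        raise RuntimeError("LibreOffice (soffice) was not found on PATH")
    for source in find_office_files(folder):
        subprocess.run(conversion_command(source, soffice), check=True)
```

Calling `convert_all(Path("documents"))` would convert everything in the documents folder. The original files are left in place, so you may want to move them out before indexing to avoid duplicate search hits.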
Tip #2
In the sist2 UI, go to Settings -> Highlight context size in characters to increase the length of the text preview.
How to use
- Install Docker
- Download start.bat and docker-compose.yml
- Run start.bat
Source
Instructions and source can be found on GitHub https://github.com/Nurech/sist2_index_files
CMD script
The script does the basic folder setup, pulls the images, and then runs Docker Compose. Drag the files you want indexed into the \documents folder. The containers run exactly once, so the scan only indexes the files that are in the documents folder at that moment. If you want scanning and indexing to repeat at intervals, you have to set up a Windows service that periodically runs the containers doing those jobs.
https://raw.githubusercontent.com/Nurech/sist2_index_files/main/start.bat
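For readers not on Windows, the same three steps can be sketched in Python. This is not a translation of start.bat, whose exact contents are in the link above. It assumes the newer `docker compose` subcommand (the original may use `docker-compose`), that docker-compose.yml sits in the working directory, and that only the documents folder needs to exist beforehand. The function names are made up for this example.

```python
import subprocess
from pathlib import Path


def prepare_folders(workdir: Path) -> Path:
    """Create the documents folder (the only folder this sketch assumes)."""
    documents = workdir / "documents"
    documents.mkdir(parents=True, exist_ok=True)
    return documents


def setup_commands(compose_file: str = "docker-compose.yml") -> list[list[str]]:
    """Return the commands that pull the images and then start the stack."""
    base = ["docker", "compose", "-f", compose_file]
    return [base + ["pull"], base + ["up"]]


def run_setup(workdir: Path) -> None:
    """Prepare folders, then run each Docker command, stopping on failure."""
    prepare_folders(workdir)
    for command in setup_commands():
        subprocess.run(command, cwd=workdir, check=True)
```

Calling `run_setup(Path("."))` from the folder holding docker-compose.yml would perform one full run. Because the containers index only once per run, a scheduler that calls this function again is one possible substitute for the Windows service mentioned above.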
Docker Compose
Starts a stack of ElasticSearch and sist2 images. The stack goes through these steps:
- Start ElasticSearch
- Scan files and make an index
- Send index to ElasticSearch
- Launch web UI to view indexed files
https://raw.githubusercontent.com/Nurech/sist2_index_files/main/docker-compose.yml
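The order of the steps matters: the index can only be sent once ElasticSearch is accepting requests. If you want to check that from outside the stack, a small polling helper like the sketch below may be useful. It assumes ElasticSearch is reachable at http://localhost:9200 (its default port, which the compose file may or may not publish) and uses the standard cluster health endpoint. The helper names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(base_url: str) -> dict:
    """Call the cluster health endpoint and decode the JSON reply."""
    url = f"{base_url}/_cluster/health"
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def wait_until_ready(
    base_url: str = "http://localhost:9200",  # assumption: default port is published
    fetch: Callable[[str], dict] = fetch_health,
    attempts: int = 30,
    delay_seconds: float = 2.0,
) -> bool:
    """Poll until the cluster reports yellow or green, or attempts run out."""
    for _ in range(attempts):
        try:
            if fetch(base_url).get("status") in ("yellow", "green"):
                return True
        except OSError:
            # Connection errors are expected while the container is starting.
            pass
        time.sleep(delay_seconds)
    return False
```

A single-node cluster commonly reports yellow rather than green, which is why both values count as ready here. The `fetch` parameter exists only so the loop can be exercised without a running cluster.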
Thanks for reading!
Index and search text from local files using ElasticSearch and sist2 was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.
Joosep Parts | Sciencx (2022-11-28T01:11:43+00:00) Index and search text from local files using ElasticSearch and sist2. Retrieved from https://www.scien.cx/2022/11/28/index-and-search-text-from-local-files-using-elasticsearch-and-sist2/