The best Linux desktop search tools

Group Test

Tools such as grep, find and awk have often come to the rescue of gleeful Bash-mongers searching for files buried beneath gigabytes of other items. But when a typical Linux distro takes up a couple of gigs of disk space, it's not hard to imagine that finding your files will only become trickier over time.

Compared with their internet brethren, today's desktop search tools can be used not only to look for the names of files on your disk, but can also perform context-sensitive searches within email archives, images, videos and music. Some tools take it a bit further and even index your browser history and bookmarks. But with so many different tools to choose from, often offering the same or similar features, just which are worth trying?

We picked out the best desktop search tools for Linux and put them through their paces - read on to find out how they fared!

(PS: if you much prefer working on the command line, don't miss our how to find files on the Linux command line tutorial!)

How they work

Desktop search tools work by creating an index of all the files on your system. When you search for a file, instead of looking through the entire disk, the tools only search the index. Since the reason for the existence of desktop search tools is to enable you to find files faster and more conveniently than conventional tools such as find and grep, they need to be fast and reliable, and offer as much information as possible to help you identify items.
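The idea of searching an index rather than the disk can be sketched in a few lines of Python. This is only an illustration of the general technique (an inverted index mapping terms to files), not how any of the tools tested here is actually implemented; the sample paths and text are made up.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document paths that contain it."""
    index = defaultdict(set)
    for path, text in documents.items():
        for term in tokenize(text):
            index[term].add(path)
    return index


def search(index, term):
    """Look a single term up in the index instead of rescanning every file."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical sample data standing in for files on disk.
documents = {
    "notes/todo.txt": "Buy milk and renew the Linux Format subscription",
    "mail/inbox/42": "Your subscription to the podcast is confirmed",
    "docs/report.txt": "Quarterly report on Linux desktop usage",
}

index = build_index(documents)
print(search(index, "linux"))
print(search(index, "subscription"))
```

Building the index is the expensive step; each lookup afterwards is a single dictionary access, which is why indexed tools can answer faster than a fresh find or grep run.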

Most tools can therefore read the metadata on files, provide you with an excerpt from text documents, show you the resolution of images (along with a thumbnail) and provide other details for each file. Stemming - which means that, for example, if you search for 'beats' the tool will also match 'beat' and 'beating' - is standard, and with wildcards or partial-word matching a search for 'beat' can also turn up 'Beatles', 'deadbeat' and so on.
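As a rough illustration of how a tool might decide that two words belong together, the Python sketch below contrasts a crude suffix-stripping stemmer with plain partial-word matching. The function names and the suffix list are our own simplifications; real indexers use proper stemming algorithms.

```python
def crude_stem(word):
    """Strip a few common English suffixes; real stemmers are far more careful."""
    word = word.lower()
    for suffix in ("ing", "es", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_match(query, word):
    """True when both words reduce to the same crude stem."""
    return crude_stem(query) == crude_stem(word)


def substring_match(query, word):
    """True when the query appears anywhere inside the word."""
    return query.lower() in word.lower()


print(stem_match("search", "searching"))    # same stem
print(stem_match("beat", "deadbeat"))       # different stems
print(substring_match("beat", "deadbeat"))  # partial-word match
```

The two approaches answer different questions: stemming groups grammatical variants of one word, while partial-word matching finds the query inside longer, unrelated words.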

Convenient graphical interfaces are a common feature of most desktop search tools, but many also come with a suite of command line applications to help you index and search your system for files.

How we tested

Installation hiccups and large memory footprints are no concern for the desktop search tools in our list, with 512MB RAM sufficing for most of them. Since the size of the index grows with the number of files, you want a tool that can find items quickly and accurately.

So, how vague can you be and still end up with the file you want? And, on the other hand, how much information can you fill in to be as specific as possible? Can you use wildcards? Bonus points go to any tool that keeps the size of the index small and offers numerous search options, such as limiting the search to certain MIME types. Features such as stemming, reading metadata, reporting on exotic formats and searching text within files make for a great tool.

Beagle

Having been around since the glory days of Richard Burton, discussions about Beagle's memory-hogging habit should really have come to an end by now. If anything, however, it's become a prime example of an application that just can't drop an old tag, however inaccurate. Not surprisingly then, a large number of potential new users are scared away from Beagle because of the numerous forum and blog posts trashing it for its insatiable appetite for memory. The latest version, 0.3.9, is available in the repositories of just about all distros.

Controlling what directories to index and what paths to ignore has become standard in most desktop search tools, and Beagle doesn't disappoint. Unlike most other tools, however, it also enables you to index your emails, IM, RSS readers, address book and more, in addition to the browsing history and bookmarks from your browser. And, with its built-in Inotify support, Beagle updates the index as soon as it detects any changes to files or directories.

Beagle can index a wide variety of application data, including IM chats and RSS feed readers.

By default, it indexes everything in your home directory, excluding the default *~, ~.tmp and other such paths. To change this behaviour, start Beagle, listed as Search under the Applications > Accessories menu, and click Search > Preferences. From the Indexing tab of the Search Preferences window you can specify the directories to index, as well as the paths to exclude.

In addition to the graphical interface, Beagle has an extensive suite of command-line tools that you can use to create an index and search files. The command beagle-search .txt launches the graphical interface and shows you search results for .txt. The alternative is to use beagle-query, which prints out the search results on the terminal itself.

Browser interface

Beagle is equally at home on Gnome and KDE, but if you prefer a neutral environment, it can easily be arranged. To enable the web interface, open a terminal window and type the following:

beagle-config Networking WebInterface true

You can access the experimental web interface at http://localhost:4000. This is supposed to be accessible from other machines on the network too, but hey, it's still experimental.

When using these desktop search tools, remember that few of them can differentiate between filenames and file types, so searching for "mp3" and ".mp3" gives you very different results.

Beagle can extract text and metadata from a host of filetypes including Office documents, plain text, HTML, DocBook, various image and audio formats, and more. When looking for files, you can refine your search to one of the 14 available categories, such as Pictures, Media, Files, Archives, Mails and so on. Select a type from the Find In drop‑down list to narrow the criteria. Beagle displays only eight items per page, so it's best to refine the search as much as possible, or you'll be clicking away at the tiny blue navigation arrow until your mouse gives up.

If you don't specify a category when searching, Beagle will still break up the results into different categories, such as Images, Documents or Folders. You can then scroll through several pages of each of these if there are many results.

Search using wildcards and the OR and AND operators to quickly find files.

Beagle can also look for search terms within files. If your search returns files that contain the query term, clicking on the file will reveal a partial sentence or matching text within that file. This isn't true for PDFs, for which Beagle only gives you a thumbnail.

The interface isn't very aesthetically pleasing, but the real beauty of Beagle lies in its complex search options. You can, for instance, prefix terms with a minus sign to exclude them from your search, or use the OR operator to define your query, or the date operator to limit the search within a date range.
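To show how minus-prefixed exclusions and the OR operator might combine, here's a minimal Python sketch of a query evaluator. It's our own illustration under simple assumptions (terms are ANDed by default, OR separates alternative groups, and the date operator is left out), not Beagle's actual query parser; the sample documents are invented.

```python
def parse_query(query):
    """Split a query into OR-groups of required terms plus excluded terms."""
    groups, current, excluded = [], [], []
    for token in query.split():
        if token == "OR":
            if current:
                groups.append(current)
                current = []
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:].lower())
        else:
            current.append(token.lower())
    if current:
        groups.append(current)
    return groups, excluded


def matches(words, groups, excluded):
    """A document matches if no excluded term appears and one group is fully present."""
    words = {w.lower() for w in words}
    if any(term in words for term in excluded):
        return False
    return any(all(term in words for term in group) for group in groups)


def run_query(query, documents):
    """Return the sorted paths of all documents matching the query."""
    groups, excluded = parse_query(query)
    return sorted(
        path for path, text in documents.items()
        if matches(text.split(), groups, excluded)
    )


# Hypothetical sample data.
docs = {
    "music/beatles.txt": "beatles abbey road album",
    "music/stones.txt": "rolling stones album",
    "notes/road.txt": "road trip packing list",
}

print(run_query("album -stones", docs))
print(run_query("stones OR trip", docs))
```

In this sketch a query made up only of exclusions returns nothing, because there's no positive term to match against.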

Our verdict: Surviving despite the bad press, a little interface redesign will spell victory for Beagle. 9/10.

Google Desktop

Release cycles for most distros being what they are, it seems aeons ago that Google Desktop was available via the online software repositories. Nowadays, all you've got to do is head to the project's homepage and download the latest release. There's even a 64-bit version. Additional points go to Google Desktop for providing RPM and Deb binaries.

Once installed, Google Desktop sits pretty in the system tray and begins indexing immediately. While indexing the disk doesn't tax any of the system resources, it does take its own sweet time. Configure it to your liking from the get-go and define what directories to index and what types of files to ignore.

Google Desktop can also keep tabs on your browsing history and all of your email accounts, thanks to the incredible Thunderbird support. You can configure it to index your Gmail account too, even if you only access it via the browser.

The convenient search applet can be accessed by pressing the Ctrl key twice - and you can put it somewhere on your desktop for quick access. As you start typing into the search applet, matching results will be displayed there. You can then use the arrow keys to move down to the file you were looking for and hit Enter to open it. You can also click on See All Results In A Browser to open a new Firefox tab, if it's running. If not, Google Desktop launches Firefox for you.

Displaying six results by default, the applet can be configured from the Display tab of the User Preferences.

Google offers many diverse services, such as Groups, Maps and News, and you can configure the applet to search for the specified term in any one of them. Right-click the system tray button and choose the Default Search Type from Web, Desktop, News, Groups, Images and even I'm Feeling Lucky.

When you search your desktop, the results are displayed in the browser, much like a traditional Google search. A small icon to the left of each result reflects the type of file, so you can never confuse your MP3s with your emails. For text and PDF files, it'll also show you a small excerpt under the results. Clicking on the files opens them in the relevant associated application. Emails are opened in the browser itself, but you do get the Reply With Gmail and the Read In Gmail options when applicable.

Feeling lucky

To perform an advanced search, right-click the system tray button and click Show Home Page. Next, click the Advanced Search link. You can now limit your search to certain file types. For instance, if you only wish to search for ODT files, click the Files radio button and select OpenOffice.org Writer from the File Type drop-down list.

Despite the clever design, Advanced Search has one problem: you can't use it to see a list of all files of a particular type. Despite all the categories and file types to choose from, you must always specify a search term. So, if you want a list of all the PNG or MP3 files on your machine, just search for ".mp3" without going into the Advanced section.

Narrow down your search using Google Desktop's extensive Advanced Search options.

There's one Google Desktop feature that's not offered by the competition: Google calls it file versioning, and we call it a brilliant idea. Each time you edit a file, Google Desktop creates and stores a cached copy that you can access later if needed. Click the Cached link at the bottom-right of the result item whose copies you wish to review. All cached items are displayed, with the most recent at the top.

You can configure Google Desktop to no longer index deleted files, but if you do that, you'll also lose the ability to rescue files you've deleted accidentally.

Since Google Desktop is also available on Windows and Mac, it's natural to compare the features available on each platform. While the Linux variant works flawlessly, has a slick interface and can index your emails as well as browsing history, we do feel cheated to find that the bling of the proprietary versions is missing from the Linux version, even two years after Google Desktop 1.0 was released.

Our verdict: Fast and reliable with impressive features, but bare-bones compared with the Windows version. 8/10.

Tracker

Here's another desktop search tool with a fetish for your system tray, but only if you run Gnome. KDE users can feel free to hunt for it in the beloved K menu. Tracker, like most others in this test, is still to reach the pivotal 1.0 release and yet it too is robust, elegant and effective.

Available from the software repositories of most distros, Tracker is a bit more fussy about getting started. How else would you define a file indexer where indexing is disabled by default?

Use the tracker-preferences command to launch the Preferences window and enable indexing. Unlike the other tools, you can configure Tracker to index certain directories, but not actively watch them. That is, if you change the files within a directory that Tracker isn't watching, the changes won't reflect in the index.

Results are divided into the categories on the left. Use the arrows to navigate pages.

The Preferences window comprises several tabs, each of which deals with different aspects of Tracker. You can specify the paths and patterns that you want Tracker to ignore from the Ignored Files tab and enable Evolution email indexing from the Email tab. Future versions will support indexing browser bookmarks, history, notes, tasks etc.

You can't change the default Tracker behaviour of displaying 10 results per page, so keep an eye out for the Next and Previous buttons to scroll through pages. When you click on a result, Tracker shows you details about the file, such as its dimensions if it's an image.

Tracker also lets you add tags to each of the indexed files. You can use the same tags for different files and thus create a collection of grouped content that can be accessed easily. Unfortunately, this isn't very reliable because Tracker sometimes fails to display all files assigned the same tag.

Our verdict: A definite podium contender if it can improve its launch time and fix the broken tagging feature. 7/10.

Strigi

The fastest and smallest desktop search tool (as the Strigi developers claim) disappointed us, never once returning the correct search results for a query. Here's hoping that your mileage varies...

It can be installed easily enough from the software repositories of most distributions. Designed to be a graphical replacement for grep and find, its many shortcomings make Strigi our least favourite tool on this list.

You can launch Strigi using the strigiclient command or the Alt+F2 quicklauncher. By default, it doesn't create an entry in any of the menus. It begins indexing the directories listed as soon as you click the Start Indexing button. Depending on the number of files involved, the size of the index can be very large, so you might have to keep a careful watch on that number.

It can find many files, but never the ones you want. There's no mechanism to scroll the pages.

When you're ready to use Strigi, begin typing into the text bar at the bottom of the window. Matching results are displayed pretty much for each keystroke, so it does deserve the title of being the fastest search tool. But while Strigi claims to be able to search inside archives, all tests point to the contrary.

The biggest problem with Strigi is that it'll only show you the first 10 results for your query, so even if it finds 55 matching items, you can't browse this list. Being forced to refine your search so the file you want is listed in the first 10 matched items defies the entire premise of desktop search tools. Even command line noobs will probably have more luck using find and grep.

For a tool that racks up a 200MB index in half an hour without breaking a sweat, the fact that it can't find the file you're looking for is beyond baffling.

Our verdict: Not even usable enough to test the claims made on its website. Never shows any file you're looking for. 1/10.

Recoll

With the might of the Xapian search engine at its command, the lightweight Recoll might just be the tool for you, if you can get it up and running. Almost no distro carries Recoll in its software repositories, and the tool's dependency list might put off a few users. To begin, you need xapian-core, plus Qmake and Qt. Fortunately, these are readily available from the software repositories. This would translate to a 10-place penalty on the starting grid were it not for the packaged binaries for Ubuntu, Fedora, Mandriva and other distros.

Recoll starts indexing as soon as you click File > Update. It stores the index in the ~/.recoll/xapiandb/ directory. By default, it'll begin indexing from your home directory, including any mounted partitions or SMB shares. Distros that use Gvfs, the replacement tool for Gnome Virtual File System, mount shares under the ~/.gvfs/ directory, so when indexing begins at the user's home directory, it also engulfs the mounted shares. You can configure Recoll to avoid certain paths and directories in the Prefs window.

Along with the search bar at the top of the interface, you can use the All Terms drop‑down list to select one of Any Term, All Terms, File Name and Query Language. You can then limit your search to text files, or any other MIME type, by clicking the relevant radio button from under the search bar. For example, when you're looking for emails, select Messages.

You can use the *, ? and square bracket wildcards when searching for files in the index. This, along with the auto-complete feature (accessed by pressing Esc+Space) gives Recoll a slight edge over the other tools. For example, typing "pyt" and pressing Esc+Space displays a list of possible terms such as Python, pytype etc.
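The *, ? and square bracket wildcards behave much like shell globbing, so Python's standard fnmatch module makes for a convenient sketch of both wildcard expansion and prefix-based completion over a list of indexed terms. The term list here is hypothetical, and this isn't how Recoll itself expands wildcards internally.

```python
import fnmatch


def wildcard_terms(pattern, terms):
    """Return the indexed terms matching a pattern that uses *, ? or [...]."""
    return sorted(t for t in terms if fnmatch.fnmatchcase(t, pattern.lower()))


def complete(prefix, terms):
    """Return the indexed terms that start with what has been typed so far."""
    prefix = prefix.lower()
    return sorted(t for t in terms if t.startswith(prefix))


# Hypothetical set of lowercase terms taken from an index.
terms = {"python", "pytype", "pyramid", "perl", "report", "repost"}

print(wildcard_terms("py*", terms))
print(wildcard_terms("repo[rs]t", terms))
print(complete("pyt", terms))
```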

Hard-to-find files are easily revealed when you spend enough time with the advanced search.

Go fetch

Depending on your search term, and how many results turn up, you might have to browse through several pages to find what you're looking for - simply click the Next Page link at the top-right of the results panel. When displaying the results, Recoll prints a small excerpt with each, but this might not be enough to decide if the result is what you're looking for. Helpfully, you can click the Preview link next to the entries in the results list to read the contents of the file in Recoll's internal document viewer.

To help you refine your search, Recoll enables you to define keywords and then filter them by All Of These, None Of These, Any Of These and other such clauses from the Advanced Search dialog. You can also limit your search to specific MIME types, such as PDF or spreadsheets. Finally, if you know the general location of the file, you can contain the search to defined subtrees.
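Combining keyword clauses with a MIME type and a subtree restriction can be sketched as a simple filter. The function and parameter names below are our own, chosen to echo the dialog's clauses, and the MIME type is guessed from the file extension with Python's mimetypes module, which is cruder than what a real indexer does.

```python
import mimetypes


def advanced_search(documents, all_of=(), any_of=(), none_of=(),
                    mime_type=None, subtree=None):
    """Filter documents by keyword clauses, MIME type and directory subtree."""
    results = []
    for path, text in documents.items():
        words = set(text.lower().split())
        # Restrict to files below the given directory, if one was supplied.
        if subtree and not path.startswith(subtree.rstrip("/") + "/"):
            continue
        # Guess the MIME type from the extension only.
        if mime_type and mimetypes.guess_type(path)[0] != mime_type:
            continue
        if not all(term in words for term in all_of):
            continue
        if any_of and not any(term in words for term in any_of):
            continue
        if any(term in words for term in none_of):
            continue
        results.append(path)
    return sorted(results)


# Hypothetical sample data.
docs = {
    "/home/me/docs/budget.pdf": "annual budget report",
    "/home/me/docs/notes.txt": "budget meeting notes",
    "/home/me/music/list.txt": "budget album list",
}

print(advanced_search(docs, all_of=["budget"], none_of=["album"]))
print(advanced_search(docs, all_of=["budget"], mime_type="application/pdf"))
print(advanced_search(docs, any_of=["notes", "list"], subtree="/home/me/docs"))
```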

While most other tools keep a constant eye on your disk and keep the index abreast of any changes, Recoll by default only creates a static index. This means you must manually update the index by clicking File > Update if you wish to find up-to-date search results. You can, however, use Cron to configure periodic indexing. This is both a curse and an advantage: system resources aren't constantly manhandled in an attempt to keep the index up to date, but it requires users to take an extra step compared with the other tools.
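The trade-off between a static index and constant monitoring is easier to see with a sketch of what a periodic update pass might do: walk the tree and pick out only the files whose modification time has changed since the last run. This is an assumption-laden illustration in Python, not Recoll's own indexer, and it ignores deleted files entirely.

```python
import os
import tempfile


def changed_files(root, last_indexed):
    """Return paths under root whose modification time differs from the last run.

    last_indexed maps each path to the modification time seen previously
    and is updated in place, so the next call only reports newer changes.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            if last_indexed.get(path) != mtime:
                changed.append(path)
                last_indexed[path] = mtime
    return sorted(changed)


# Demonstration on a throwaway directory rather than a real home directory.
with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "notes.txt")
    with open(path, "w") as handle:
        handle.write("first draft")
    seen = {}
    print(changed_files(root, seen))  # first run: the new file is reported
    print(changed_files(root, seen))  # second run: nothing has changed
```

Run from Cron once a night, a pass like this costs nothing between runs, at the price of search results that can be up to a day out of date.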

Using only HTML, you can easily change the look of the results page. It's easy, fun and unique.

But there's a ray of hope, at least for those who want to compile Recoll themselves. File Alteration Monitor (Fam) and Inotify are two tools that monitor the filesystem for any changes. When compiling Recoll, you can enable support for either of these with the --with-fam or --with-inotify options.

Recoll isn't built to index all file types. To index PDF, MP3, RTF, MS Office and a few other exotic formats properly, you need to install additional packages, such as Antiword (for MS Word), and Catdoc (for MS Excel and PowerPoint). Without these tools, only filenames will be indexed and Recoll won't offer an abstract or the Preview function.

Our verdict: Faces tough competition from Beagle. The option to create static indexes is a real advantage. 9/10.

Our choice: Recoll

The most startling aspect of desktop search tools is that many are yet to reach the big version 1.0. Despite that, each program grows more impressive with every release. While the majority of their features are the same across the board, almost every application has something unique to offer.

As desktop search tools eat up enough disk space to shame the great mountain apes, you need to carefully choose the one that you want to implement on your system. While most of these tools have celebrated a few birthdays already, desktop search as a whole (and these tools individually) doesn't have a large user base. You'll still find more people discussing grep than you will Recoll.

Quite possibly, this stems from a misconception that desktop search tools are resource-hungry. Nothing could be further from the truth.

Top three

Although we weren't exactly spoilt for choice, it wasn't easy deciding the finishing order because the three podium contenders, Google Desktop, Beagle and Recoll, made for some stiff competition. Here's hoping that this also drives innovation.

But first, let's consider the ones that didn't make it to the top three. The worst thing about Strigi is its lack of page navigation. You can't use a tool if it won't let you browse through the pages of search results. Tracker, however, has all the makings of a top contender. The ability to specify which directories to watch and index translates to a smaller index size, which is why it gets 7/10 despite the broken tagging feature and slow launch time.

Coming in at number three is Google Desktop. This tool gets bonus points for using the browser to display results, but the advanced search section could still use a little refinement. The auto-suggestion feature in the search applet is helpful, although the fact that it's limited to showing only the top six results in the applet by default is a definite design flaw in our book.

Meanwhile, the unfairly maligned Beagle comes in at number two. One of this tool's strong points is that it keeps the size of the index small, especially compared with Strigi and Google Desktop. With a network-accessible browser interface in the offing, Beagle's future as one of the best is sealed.

With strong competition from Beagle, especially in the number of search options, Recoll just manages to cling on to the top spot. Forcing the user to index the system manually is actually a clever design. For anyone who's convinced that desktop search tools eat up too many resources, this offers the opportunity to index the system at a convenient time. It should therefore help Recoll to attract users who'd normally distance themselves from desktop search tools.

Defining the keywords to look for and avoid for each search helps Recoll accurately locate the correct file each time. Other tools should also consider adopting this feature.

Over to you!

Have we missed off your favourite file finder? Did we rate Strigi too low? Should we have ignored Beagle because some people hate Mono? Send us your comments below!

First published in Linux Format

Your comments

Arch Linux has Recoll

Recoll is in the Arch Linux repos.

Tracker 0.7

Pretty nice comparison you got there! really well written.

The only complaint I can make of it is that you probably used Tracker 0.6.X for your review. With the 0.7.X series Tracker has gotten A WHOLE LOT BETTER.

Maybe you should look into their latest version (0.7.15 as of now) and review it again?

Same experience, different conclusion

Unfortunately, Recoll is the best indexer. Its GUI is just awful. No decent Linux desktop search exists yet. A score of 5/10 for Recoll would be an optimistic maximum.
You have not tested a Java program that was fairly efficient. I have not tested the recent version. I will try it soon.
http://scan.sourceforge.net/

GNOME Do's Locate plugin

The GNOME Do Locate plugin works nicely: http://do.davebsd.com/wiki/Locate_Plugin

They all suck.

I go through periods (depending on what I'm doing, how much I need Windows and what state my installations of Windows and Linux are in) when I spend more time on Windows than Linux and vice versa. The last time I did mostly Windows, I installed Google Sidebar (and with it Google Desktop). I wound up removing it because Desktop was always hogging CPU time.

As for the other apps, I've used some but not others, but it is clear to me that they all lack the most important feature. I find that when I am searching for something that is not in some relatively small subdir (meaning at most a few seconds to search with find), I rarely look for it by name because I garble the name when searching (for example, a TuxRadar podcast might be saved under the abbreviation tuxrdr_pod, and I forget the abbreviation rdr or the reduction of podcast to pod). So for me the best searches are generally some subset of a name, along with a file size and a date range.
That is something I cannot find in the programs I've tried out.

Furthermore, the realtime updating drains my computer so much that I do not find the limited search abilities they give me to be worth it. The funny thing is that since I dual boot, I keep the lion's share of disk space in NTFS file systems so I can access the data from both platforms. None of the indexing programs I've seen so far uses NTFS's MFT to index a partition, despite the fact that even for large partitions, using the MFT for indexing only takes a few seconds.

MFT

How about multi-user indexes?

Linux is designed from the bottom up to be a multi-user operating system, but there seem to be many tools that are confined to work only on a per-user basis. If there are multiple users on a machine wanting to index a common shared file system, they will each have a copy of the same large index database.

Is there an indexing tool that will keep a single copy of an index that can be shared by multiple users?

Mal

Heavy thoughts

Very nice article, I like it, not so interested in desktop tools, but Recoll seems to be great, thanks!

Google Desktop

When I read this article in Linux Format I installed Google Desktop. As I have had bad experiences with resource hogging search tools in the past I tended to avoid them. I am running Google Desktop under KDE 4.3.4 and Sidux AMD64 and so far I have never noticed it running. It took two days to index my /home folder, but now it just ticks along unnoticed in the background and when needed it "just works".

I am finally happy with a desktop search tool.

Thanks

RE: How about multi-user indexes?

I thought Google Desktop could be set up as a server and used within an intranet. Can anyone confirm this?

So, about Beagle...

How much memory did it consume, Graham?

Recoll easily beats the rest by miles.

Looked into this about 18 months ago.

Beagle is a dog... no _really_, just, bad, bad, bad.

Tracker was a waste of time - the presentation of the results sucked so badly that I turned it off in Karmic.

Google looked good but crashed and crashed... and it's proprietary... and not great at finding what you want.

Catfish is a good front end for slocate and find - simple and clean. slocate is fast as it is regenerated each night.

Recoll is just fantastic for finding text in a document. I have 5 years of saved HTML data (500K items, totalling 5GB) and if it's there it will find it. Really the rest are toys compared to this program.

Oh and not a resource hog unlike the usual suspects.

Also it is all packaged and working in Karmic.

Sweet :)

strigi and UI

Strigi's primary usage is as the file indexing engine for Nepomuk and nearly all the usages of Strigi from user interfaces happens via Nepomuk.

While you can make direct Nepomuk searches, the primary usage for this is to integrate search features directly within applications.

While there is file searching using Nepomuk (and therefore Strigi) available in Dolphin, Konqueror, the Plasma Folder View and the Run Command (aka "KRunner") window, the trend of embedding search within applications, contextual to what the app does, is likely to accelerate as users find that useful and natural (especially when the searching is matched up with tagging and other metadata creation, either implicit or explicit).

Personally, I find complex search tools to be of limited use and utilization by the user base. There are so many of them, they have been around so long and yet they get so little use. I think the reason for that is that if you want a search tool it needs to be right at the fingertips (e.g. alt+f2 or alt+space) or else integrated with additional content features. Most search tools fail in both and so generally have limited appeal.

Tools like KRunner, Katapult, GNOME Do and others meet the "right at the fingertips" metric, while search integrated into apps meet the "additional content features" metric.

*shrug*

glimpse

Glimpse is an effective tool for finding instances of strings within files. It doesn't have a GUI but wasn't covered in your command-line search tools article either.

http://glimpse.cs.arizona.edu/

For personal use, the software is free but a charitable donation is requested.

Recoll

Recoll is now in the Ubuntu repositories.
Quick download,
quick install.
It is indexing now, using about 50% processor.

cheers
Pete

+1 for Google Desktop 4 Linux

I use it because it works better to fully index all my work emails in Thunderbird, despite its huge search index, way of starting.

Re @mal: multiuser indexes

Recoll can be set up with multiple indexes, switchable at search time. Typically, a user would have a personal index and also access some shared ones.

Strigi Works Fine

It works as part of an overall KRunner-Nepomuk-Strigi (and soon to be Akonadi) mix. And even then, Strigi uses different backends, which will affect your user experience. Of course, it is still in the works, but it has a promising future.

tagging works nicely, too

rather than focusing on indexing and searching file contents, i've been tagging the files instead:

http://pages.stern.nyu.edu/~marriaga/software/oyepa

works like a charm. It is just a pity he doesn't make .debs available.

Recoll Awesome

I've only tried pinot (briefly - too resource hungry), Google Desktop search, Beagle and Recoll. For my purposes, Recoll is the bomb. It's fast, I'm indexing now and it's not harming my ability to get anything done.
Beagle was the 2nd best of the lot. But in my experience it was quite a bit slower to find stuff. With recoll, i type in a search string, and a millisecond later, the results are in. It re-builds the index once a day - good enough for me.
On Windows locate32 is my prized search engine - super fast, super fast indexing. But it only searches file names - which is the majority of the searching I need. But you can get similar instant search results with recoll by just searching for file names. Plus you have the added advantage of being able to search content and many other options. So for me, now, Recoll is THE desktop linux search.

Google Desktop is completely

Google Desktop is completely unusable. It cannot be extended, the user has no way to define new file types besides the handful of the supported ones. And non-English speaking users will be severely disappointed by lack of stemming support.

tag

Can you tag with recoll?
Oyepa seemed it was what I need but scary for a newbie, how do I get it going, what is X?

Mentally inept

Very good info--I just put Ubuntu on 6 systems--i love it 10.4 but just learning----

No support in Google Desktop for Thunderbird 3

I haven't found a way to get Google Desktop to support Thunderbird 3, so that makes it a non-starter for me.
