Building efficient keyword queries for eDiscovery searches in Exchange and SharePoint

by Tony Redmond
Nov 26, 2015

These days it's somewhat strange when software allows you free rein, which is why it seems funny to be able to build keyword queries for eDiscovery searches from scratch without any assistance from Exchange or SharePoint. Perhaps the developers believe that all administrators are perfectly fluent in the Keyword Query Language (isn't everyone?). Or it's their little way of warning the unknowledgeable away from the unintelligible.

The ability to run searches across a mixture of Exchange mailboxes and SharePoint sites to uncover the deep and dark secrets of those who would prefer their work to go unnoticed is what, to some degree, eDiscovery is all about. Microsoft has invested heavily in the area of compliance over the last decade, with the initial work showing up in Exchange 2010 and SharePoint 2010. There was much to like in the first implementation, especially in Exchange 2010, if only because so many legal discovery actions center on email. Of course, time moves on and the degree of sophistication in the eDiscovery space has increased, so joined-up thinking is the order of the day to ensure that all manner of digital content can be revealed if the electronic bailiffs come calling.

The first tender signs of cooperation appeared when Exchange 2013 and SharePoint 2013 shared a common search infrastructure (the Search Foundation) and a common query language (KQL, or the Keyword Query Language). This set things up for the SharePoint eDiscovery center where eDiscovery cases that use sources drawn from Exchange and SharePoint can be managed. It has now brought us to the Office 365 Compliance Center, which is really Microsoft's way of telling everyone just what a good job it has done to incorporate eDiscovery and other compliance features across Office 365.

Search queries lie at the heart of all this work, including the aptly named Search-Mailbox cmdlet, which is still used to locate and remove offending content from mailboxes. You can construct the best possible indexes from mailbox databases but they are not worth a fig unless effective and efficient queries can be built to interrogate the indexes.

Which is why it sometimes seems strange that Microsoft has gone to all the expense and engineering effort to incorporate eDiscovery functionality into Exchange and SharePoint and yet is satisfied that the GUI provided to build the queries is, shall we say, lame. Or, to be kinder, basic. The same problem exists for SharePoint Online and Exchange Online as in the on-premises versions.

The problem is this. When the time comes to construct a search query, you are left to your own devices. A blank canvas awaits the dedicated legal investigator who must construct a query to interrogate what might be tens of millions of messages or documents. Given factors such as increased mailbox quotas, the lower cost of storage, and the propensity of people to avoid the mundane task of cleaning up digital content, the number of items indexed by Exchange and SharePoint is not shrinking. In turn, this means that the need for efficient and focused search queries is greater than ever before. Ten years ago a relatively efficient search might uncover one hundred items and it was easy enough to go through that amount of data to make a decision on what was really important and what could be discarded. Today, a query with the same relative degree of efficiency might return several thousand items and the thought of going through so much junk to focus in on what's really needed is too tiresome and expensive to bear.

I've chosen Exchange to illustrate the point here but exactly the same situation pertains in SharePoint and it happens for both on-premises and cloud platforms. The perfect emptiness of the space reserved for search query keywords is a challenge for those charged with building a query. Sure, it's easy to follow the advice in the pop-up panel and come up with queries like "(Cat OR HAT) NEAR Seuss", but these queries are unlikely to meet the needs of an investigation querying the darker secrets of stock market manipulation, fraud, patent infringement, or any of the other myriad reasons why eDiscovery happens. 

It would seem sensible for Microsoft to build something like a software wizard to guide the unwary through the basics of building good eDiscovery queries. After all, they already have some of the components in place, like the date range fields provided to limit search queries. It's not as if these kinds of queries are unknown to Exchange, as other places exist where software guides administrators in selecting the right data, like when creating the recipient query for a dynamic distribution group.
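
To make the idea of a guided builder a little more concrete, here is a minimal sketch in Python of what such a wizard might do behind the scenes: collect a few structured inputs and assemble a KQL string from them. Everything here is an assumption for illustration. The GuidedQuery class, its field names, and the sample address are hypothetical and are not part of Exchange or SharePoint; the only KQL pieces relied upon are the AND and OR operators, quoted phrases, the from: property, and date comparisons against sent.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


def _quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as a phrase."""
    term = term.strip().replace('"', "")
    return f'"{term}"' if " " in term else term


def _any_of(terms: List[str], prop: Optional[str] = None) -> str:
    """OR a list of terms together, optionally scoping each one to a property."""
    parts = [f"{prop}:{_quote(t)}" if prop else _quote(t) for t in terms]
    return parts[0] if len(parts) == 1 else "(" + " OR ".join(parts) + ")"


@dataclass
class GuidedQuery:
    """Hypothetical wizard state: the answers an investigator has supplied."""
    keywords: List[str] = field(default_factory=list)
    senders: List[str] = field(default_factory=list)
    sent_from: Optional[date] = None
    sent_to: Optional[date] = None

    def to_kql(self) -> str:
        clauses = []
        if self.keywords:
            clauses.append(_any_of(self.keywords))
        if self.senders:
            clauses.append(_any_of(self.senders, "from"))
        if self.sent_from:
            clauses.append(f"sent>={self.sent_from.isoformat()}")
        if self.sent_to:
            clauses.append(f"sent<={self.sent_to.isoformat()}")
        if not clauses:
            # An empty query would match everything, which is the opposite
            # of a focused search.
            raise ValueError("refusing to build an empty query")
        return " AND ".join(clauses)


# Illustrative values only; the address is made up.
query = GuidedQuery(
    keywords=["insider trading", "tip"],
    senders=["kim@contoso.com"],
    sent_from=date(2015, 1, 1),
    sent_to=date(2015, 6, 30),
)
print(query.to_kql())
```

The generated string could then be pasted into the keyword box or tweaked by hand, so a builder along these lines would not have to take anything away from investigators who prefer to write KQL directly.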

On the other hand, you can argue that dumbing-down search queries is exactly the wrong thing to do as it would remove the power of KQL queries from the hands of investigators. It's true that any interface presented by software is limited by the imagination and foresight of the designers, but it's not beyond the wit of man (or woman) to allow the unfettered (current) interface to exist alongside a software-guided query builder. Or so you'd think.

But perhaps Microsoft presents the blank canvas for a search query with intention. It's a form of warning that those who are not fluent in KQL should leave queries to the professionals. I don't think this is the case and view the situation as a potential opportunity for improvement.

And in the meantime, if you need help in understanding how to build queries for Exchange or SharePoint searches, you should head over to Quentin Christensen's blog and read what he has to say about queries. As the lead program manager for eDiscovery at Microsoft, he might know a thing or two about the topic!

Follow Tony @12Knocksinna
