I had to look for empty values in a mandatory field in SOLR today. Wait, what? Shouldn’t mandatory values in the index should be marked as required=”true” when you are defining the field?
Well yes, but some people forget to do it or maybe the spec was not fully completed at the time when they worked on the schema so they did not include it… just in case! (YAGNI definitively comes to mind)
Well, in any case I had to find which documents did not have the publication date (which sounds like a really really really mandatory field to me).
So how do you identify them?
Option A: Query *:* and start paginating taking down notes of which documents do not have the value… Ok this is totally brute force approach. But I wouldn’t be too impressed if I find someone doing it. The things I have seen…
Option B: Query *:* and in your fl include only id and publicationdate. Paginate or add enough rows. Very amateur but a bit better than before
Option C: Query *:*, include only the two fields in fl and sort asc! Much better as in your results you will have the ones with empty at the beginning.
Option Winner: or instead of *:* simply use the nice flexibility of a specific query and use q=-publicationdate:*
This is definitively the best approach and as I just demonstrated, there are many ways of finding a solution which go from biggest effort to most effective. I strive to do things always as efficiently as possible and so should you!