What I'm about to say is just my take on the situation... I could be wrong. I don't have any references for it, so if someone can jump in and confirm or disprove any of it, that is welcome.
I would assume that BIG differences would be much easier to recall. For instance, I can recall that the only MB Quart tweeters I have ever heard really hurt my ears. The differences between speakers are big enough that the main points about them, at least, should stick in memory (e.g. bright, dull, boomy, etc.). For instance, I can recall that the system in my old car had great bass up front and a high sound stage... these aren't really complex things about the system, but they are critical parts of an SQ system.
I also doubt that even the most golden-eared judge out there can recall ALL the subtle details of a system he heard long ago, or even an hour ago. For that matter, SQ competitions are so subjective that they're pretty much pointless as it is: maybe the judge liked it today but wouldn't tomorrow. That is an ongoing debate in competition circuits... SPL is you and the meter; SQ adds the human element, which can easily be wrong.