Well, assuming you don't have access to an RTA, here's how I go about it. First thing I do is set my crossovers at fairly arbitrary values that should be OK for the drivers in question. I usually run my mids as high as I can to take stress off the tweets, since a tweet playing too low is usually worse than a mid playing too high. This depends on speaker location: if the mids are big or way off axis, that doesn't always work. I use 24 dB/octave slopes so everything is in phase at the crossover point.
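If you want to see why a 24 dB/octave slope keeps things in phase, here's a rough sketch. It assumes the crossover is a 4th-order Linkwitz-Riley pair (two cascaded 2nd-order Butterworth sections), which is my assumption about the filter type; other 24 dB alignments behave differently. The 3000 Hz crossover in the usage note is just a made-up example value.

```python
import cmath
import math


def lr4_responses(freq_hz, crossover_hz):
    """Return (lowpass, highpass) complex responses of an assumed LR4 crossover."""
    s = 1j * freq_hz / crossover_hz          # frequency normalized to the crossover
    butter2 = s * s + math.sqrt(2) * s + 1   # 2nd-order Butterworth denominator
    lowpass = 1 / butter2 ** 2               # two cascaded 12 dB/oct low-pass sections
    highpass = s ** 4 / butter2 ** 2         # two cascaded 12 dB/oct high-pass sections
    return lowpass, highpass


def phase_difference_deg(freq_hz, crossover_hz):
    """Phase of the high-pass output relative to the low-pass output, in degrees."""
    lowpass, highpass = lr4_responses(freq_hz, crossover_hz)
    return math.degrees(cmath.phase(highpass / lowpass))


def level_db(response):
    """Magnitude of a complex response in dB."""
    return 20 * math.log10(abs(response))
```

Calling `lr4_responses(3000, 3000)` gives two outputs that are each about 6 dB down and sum back to 0 dB, and `phase_difference_deg(3000, 3000)` comes out at essentially zero. That only describes the electrical filters; driver placement and the car itself will still move the acoustic phase around, which is why the listening checks matter.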
Secondly, I play with phase. For mids I usually run bandwidth-limited pink noise within the band of the driver and flip polarity to find which way gives me the best acoustic polarity. Midbass is where you usually notice phase issues the most. Tweets aren't as important, but I'll usually try them both ways and see if I hear a difference. Check all combinations, starting with one midbass out and one in. Then I try one tweet out, and then both tweets out.
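If you don't have a test disc with bandwidth-limited pink noise, here's a minimal sketch for making your own using only the Python standard library. It approximates pink noise by summing random-phase sine waves across the band, with each tone's amplitude scaled by 1/sqrt(f) so power falls off as 1/f. The function names, the tone spacing, and the file name in the usage note are all my own choices, not anything from a particular tool.

```python
import math
import random
import struct
import wave


def band_limited_pink(low_hz, high_hz, seconds, rate=44100, step_hz=1.0, seed=0):
    """Approximate pink noise between low_hz and high_hz as a sum of sines."""
    rng = random.Random(seed)
    count = int((high_hz - low_hz) / step_hz) + 1
    # One tone per step, each with a random starting phase.
    tones = [(low_hz + i * step_hz, rng.uniform(0, 2 * math.pi)) for i in range(count)]
    samples = []
    for n in range(int(seconds * rate)):
        t = n / rate
        # Amplitude 1/sqrt(f) gives power proportional to 1/f (pink).
        samples.append(sum(math.sin(2 * math.pi * f * t + p) / math.sqrt(f)
                           for f, p in tones))
    peak = max(abs(x) for x in samples)
    return [0.5 * x / peak for x in samples]  # normalize to half of full scale


def write_wav(path, samples, rate=44100):
    """Write mono 16-bit PCM samples (floats in -1..1) to a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"".join(struct.pack("<h", int(x * 32767)) for x in samples))
```

Something like `write_wav("pink_100_400.wav", band_limited_pink(100, 400, seconds=1.0))` would do for the 100-400 Hz band. With 1 Hz tone spacing the signal repeats every second, so one second set to loop is enough. Pure Python is slow at this, so expect it to take a few seconds to run.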
For time alignment I break out the test tones: bandwidth-limited pink noise, 100-400 Hz. I play just the mids and delay the closer mid until the image centers up. After that I recheck phase. Once in a great while the delay will mess with it if I had to use a LOT of T/A to get it centered, but I usually use good speaker locations, so that's rare.
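I set the delay by ear, but if you want a starting number to dial in before you start listening, you can estimate it from a tape measure. This is just a sketch of the geometry: it assumes sound travels at about 343 m/s and that the distances are measured from each mid to your head. The distances in the usage note are made-up example values.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 degrees C


def alignment_delay_ms(near_m, far_m, speed_m_s=SPEED_OF_SOUND_M_S):
    """Delay (ms) to add to the nearer speaker so both arrive together."""
    if near_m > far_m:
        raise ValueError("near_m must not be larger than far_m")
    # Extra path length divided by the speed of sound, converted to ms.
    return (far_m - near_m) / speed_m_s * 1000.0
```

For example, `alignment_delay_ms(0.9, 1.4)` comes out to roughly 1.46 ms for the closer mid. Treat that as a place to start, then nudge it by ear until the noise centers up.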
Once I think my phase and time alignment are set up properly, I pull out the headphones. I use a CD with burned tones, then plug my iPod in to play the same tones to my headphones. The first tones I start with are near the crossover point. This lets me get the level matching correct a bit easier. I start with a tone within the mids' range and compare it to one that is just over into the tweeters' domain. I get it so the first tone on my headphones is just as loud as what's coming out of the stereo. Then when I switch to the higher tone, I level match that one. Honestly, I usually do it by ear and get very close; if I didn't have that ability I'd just use headphones as a reference on either side of the crossover point lol.
If I have a 32-band EQ, that is also how I EQ. I get a 1 kHz tone level with my headphones and then play the tones that match each band on my EQ. I try to get the relative jump in intensity from one tone to the next correct, using the jump up or down on the headphones as my reference. I rarely boost frequencies. Most nulls in a car are caused by cancellation, and you can't EQ that out anyway.
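If you need to work out which tones to burn, here's a small sketch that lists band centers. It assumes 1/3-octave spacing referenced to 1 kHz, which is my assumption about the EQ; with the defaults below that gives 31 centers from about 20 Hz to about 20 kHz. If the labels on your EQ say something different, or it has a different number of bands, go with what's printed on it.

```python
def third_octave_centers(bands_below=17, bands_above=13, ref_hz=1000.0):
    """Exact 1/3-octave center frequencies around ref_hz, lowest first."""
    return [ref_hz * 2 ** (n / 3) for n in range(-bands_below, bands_above + 1)]


if __name__ == "__main__":
    # Print rounded centers as a checklist for the tone disc.
    for freq in third_octave_centers():
        print(f"{freq:8.1f} Hz")
```

The exact values land a little off the round numbers printed on most EQs (for instance about 19.7 Hz for the band labeled 20 Hz), which is close enough for level matching by ear.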
Once I get each side dialed in, I go back and get my left and right to match as closely as possible, using the same 32 tones. Getting the left and right as close as you can really brings a good stereo image into focus. On a lot of tracks it's hard to hear if you're off tonally without a reference right next to you. Hearing your stage pull left or right is a lot more obvious. I bring down the stronger side and boost the weaker side if needed. Doing it that way preserves your overall tonal balance while centering up your stage.
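To put numbers on the "bring down the stronger side, boost the weaker side" idea, here's a tiny sketch. It assumes you have some estimate of how many dB apart the two sides are in a band, and it splits the difference evenly, which is my simplification; by ear you might favor the cut over the boost. The function name and the example levels are made up.

```python
def center_adjustments(left_db, right_db):
    """Return (left_change_db, right_change_db) that splits the gap evenly."""
    half_gap = (left_db - right_db) / 2.0
    # The louder side comes down and the quieter side comes up by the same
    # amount, so the average level of the band does not move.
    return -half_gap, half_gap
```

So if the left side is 3 dB hotter in a band, `center_adjustments(3.0, 0.0)` suggests 1.5 dB down on the left and 1.5 dB up on the right, which keeps the overall level of that band where it was.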
Lastly, I bring in the sub. I try to level match right around the crossover point. I bring the sub in as high as I can without drawing the stage back at low volumes. At high volumes everyone will have localization problems if the sub is behind you, unless you're running the sub very low. However, if you're running your mids super low, you can't ever get to high volume without the mids giving out lol. I'm a proponent of higher crossover points... Anyway, once I get it level matched near the crossover point, I blend it with the variable phase adjuster on the amp if it has one. If not, I use the 180 switch or wire the sub backwards and see which sounds closer. If I have time alignment, once I see which 180 setting is closer I delay EVERY speaker in the front stage until it blends. You really only need to do the mids to begin with; once the mids and sub blend, delay the tweets by that same additional amount. Then I cut 40 Hz by the requisite 3-6 dB or so lol; most cars have a bump right there no matter what your setup looks like.
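When you're trading a phase knob for delay, it helps to know roughly how much delay a given phase shift is worth at the crossover frequency. Here's a sketch of that arithmetic. It only holds exactly at the one frequency you plug in, and the 80 Hz crossover in the usage note is a made-up example value.

```python
def phase_to_delay_ms(phase_deg, freq_hz):
    """Delay (ms) that equals phase_deg of phase shift at freq_hz."""
    period_ms = 1000.0 / freq_hz           # one full cycle in milliseconds
    return phase_deg / 360.0 * period_ms   # fraction of a cycle times the period
```

At 80 Hz, `phase_to_delay_ms(180, 80)` works out to 6.25 ms, so a full polarity flip is a long way in delay terms down that low. That's why it's worth picking the closer 180 setting first and only then fine-tuning with delay on the front stage.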
After I do that I pull out some music I know and test by ear.
Edit: the Focal disc is very good; that's where I get my tones from.
Edit 2: RTAs are bunk. Without a gated measurement it's far too hard to correlate what you're actually hearing with what's on the screen. After I've tuned the car by ear, if I have an ungated RTA I'll use it to look for any major peaks or dips. Then I go back and re-inspect those areas; if I don't hear the issue, I don't "fix" it.