Koto-ish
When the koto (or whatever it was) first comes in, at about 0:00 - 0:08, the velocities just register as stiff to me; did you adjust the velocity magnitudes and the quantization? It feels quite on-the-grid, and it's especially noticeable whenever you're playing chords. I usually write the bottom chord note coming first and the top chord note coming last. If the person is plucking two notes at the same time on different strings, then I just slightly offset the timings in either direction, since one finger might just be faster than the other. Consider checking this stiffness in other spots for the (presumed) koto; even a few ms of offset matters, and it really helps to adjust the velocity magnitudes on each chord note, too. I tend to have the first note quietest and the last note loudest on a strummed chord; for two notes plucked at the same time on different strings, the velocities tend to be close to each other more often than not.
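To make the strum idea concrete, here is a rough Python sketch of how I'd think about it. Everything in it is an assumption for illustration: the function name `humanize_strum`, the 12 ms spread, the 2 ms jitter, and the 70 to 100 velocity range are made-up starting points, not values taken from your track or from any particular DAW.

```python
import random


def humanize_strum(notes, start_ms, spread_ms=12.0, vel_low=70, vel_high=100,
                   jitter_ms=2.0, seed=None):
    """Spread a chord into a low-to-high strum.

    notes: MIDI pitch numbers of the chord (any order).
    Returns a list of (pitch, onset_ms, velocity), lowest pitch first.
    All default values are hypothetical starting points to tweak by ear.
    """
    rng = random.Random(seed)
    ordered = sorted(notes)  # bottom chord note first, top note last
    count = len(ordered)
    events = []
    for i, pitch in enumerate(ordered):
        # 0.0 for the first (lowest) note, 1.0 for the last (highest) note
        frac = i / (count - 1) if count > 1 else 0.0
        # Stagger the onsets, plus a little random lateness so repeats differ
        onset = start_ms + frac * spread_ms + rng.uniform(0.0, jitter_ms)
        # First note quietest, last note loudest
        velocity = round(vel_low + frac * (vel_high - vel_low))
        velocity = max(1, min(127, velocity))  # keep inside the MIDI range
        events.append((pitch, onset, velocity))
    return events
```

Calling something like `humanize_strum([57, 60, 64], start_ms=0.0)` would give three notes a few ms apart with rising velocities. For two notes plucked together on different strings, I'd use a much smaller spread and nearly equal `vel_low` and `vel_high`.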
Also, I'm not sure how the ADSR envelope for the koto is set up (or if you can look into that at all), but assuming the release is low and the decay and sustain are high, I would guess that you can just shorten the sequenced note so that it stops before it reaches the pre-recorded vibrato in the sample. By "pre-recorded vibrato", I mean vibrato that is baked into the sample. With a sample like that, I would try to minimize how much of the pre-recorded vibrato ends up playing so that it doesn't sound so fake. Ideally, if possible, you should adjust the ADSR envelope so that the vibrato hardly ever plays, and then record your own vibrato manually using pitch bend "event edits" or "automation clips". That way it's more human.
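If it helps to picture what a hand-drawn vibrato looks like as data, here is a small Python sketch that generates pitch bend points: nothing for the first part of the note, then a wobble that fades in. The function name `vibrato_bend_events` and all the numbers (5.5 Hz rate, 30 cents depth, 250 ms delay, 300 ms fade-in) are my own assumptions. It also assumes the instrument's pitch bend range is +/- 2 semitones (200 cents), which is a common default but may not match your patch. The values are standard 14-bit MIDI pitch bend, where 8192 means no bend.

```python
import math


def vibrato_bend_events(duration_ms, rate_hz=5.5, depth_cents=30.0,
                        delay_ms=250.0, ramp_ms=300.0,
                        bend_range_cents=200.0, step_ms=10.0):
    """Return (time_ms, bend_value) points for a delayed, fading-in vibrato.

    bend_value is a 14-bit MIDI pitch bend value (0..16383, 8192 = center).
    bend_range_cents is an assumed +/- bend range; check your instrument.
    """
    events = []
    steps = int(duration_ms // step_ms) + 1
    for k in range(steps):
        t = k * step_ms
        if t < delay_ms:
            cents = 0.0  # hold the note steady before the vibrato starts
        else:
            since_start = t - delay_ms
            # Fade the depth in so the vibrato does not switch on abruptly
            ramp = min(1.0, since_start / ramp_ms) if ramp_ms > 0 else 1.0
            phase = 2.0 * math.pi * rate_hz * since_start / 1000.0
            cents = ramp * depth_cents * math.sin(phase)
        value = 8192 + round(cents / bend_range_cents * 8192)
        value = max(0, min(16383, value))  # clamp to the 14-bit range
        events.append((t, value))
    return events
```

Varying the rate, depth, and delay a little from note to note is what keeps it from sounding like the same vibrato every time.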
(Was that "stitched" sequencing of more than one instrument instance?)
-----
Erhu
I agree that the erhu doesn't have much realism; its slow attack, coupled with its same-y tone, makes it stick out as lacking the variation in tone that round robins would give. I also find the ride sample that comes in soon after (the one panned rather far right) a bit distracting.
-----
Big Picture / tl;dr (major: reverb, timing stiffness, velocity magnitude sameness, vibrato sameness)
Overall, this kind of reminds me of the Mystical Ninja Starring Goemon OST (great OST). This atmosphere works pretty well in terms of the instruments chosen. It's still kinda rough around the edges on getting the instruments to sound realistic (various stiff areas in the timings, especially with chords, and the number of times a seemingly pre-recorded vibrato plays). Another good idea is to keep the instrumentation sounding like it's all in the same room. I think the ideas are pretty good so far, but I also think, if you have the options available, that refining the cohesion between reverb tones would help (whether the reverb is primarily low-mids, midrange, or treble reflections, where the low cut and high cut of the reverb "wet signal threshold" are, etc.). Unfortunately, I think it's something you'd have to isolate in your DAW and listen for; it's not something I can hear in the full context since it's pretty subtle.
Generally, low-mid reflections add low-end ambience, midrange reflections may sound a bit metallic if overdone, and treble reflections may sound "hissy" if overdone. The low cut and high cut jointly mark the frequency range that the reverb will affect. Everything below the low cut frequency or above the high cut frequency won't be affected.
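As a very simplified picture of the low cut / high cut idea, here is a tiny Python sketch. It treats the cuts as hard edges, which is an assumption on my part (real reverb filters roll off gradually), and the names `reverb_affects` and `band_overlap_hz` plus the 200 Hz and 8000 Hz defaults are made up for illustration. The second function is just one way to compare how much two instruments' reverb ranges have in common.

```python
def reverb_affects(freq_hz, low_cut_hz=200.0, high_cut_hz=8000.0):
    """True if freq_hz falls inside the range the reverb acts on.

    Idealized hard-edged model: nothing below the low cut or above the
    high cut is affected. The default cut values are hypothetical.
    """
    return low_cut_hz <= freq_hz <= high_cut_hz


def band_overlap_hz(band_a, band_b):
    """Return the shared (low, high) range of two reverb bands, or None.

    Each band is a (low_cut_hz, high_cut_hz) pair.
    """
    low = max(band_a[0], band_b[0])
    high = min(band_a[1], band_b[1])
    return (low, high) if low < high else None
```

If two instruments' reverbs barely overlap in range, that is one possible reason they might not sound like they are in the same room, though it still comes down to listening to the wet signals in isolation.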