NFS doesn't do anything fancy though; the difference in time will be insignificant, as most of it is spent on the wire anyway. I think most of the emulations will be rather simple abstractions of common models that you'd otherwise code the same way yourself, and their actual performance overhead is rather slim.
Most, if not all, of these are limits we currently impose explicitly while testing this new feature. We intend to change, if not lift, most of them once we have a better understanding of how this behaves in reality and what we can and can't allow.
This one will stick with us for a while though, as it is hard enough to agree on one concurrent change in a distributed system. As it stands right now, a change is bound to the version of the entry and will be rejected if the versions don't match; the client should then fetch the latest version and try again. This will probably stay this way for a while.
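That version-bound flow amounts to optimistic concurrency control: fetch, mutate against the version you saw, and retry on a mismatch. A rough sketch of the client-side pattern, using a made-up in-memory `EntryStore` purely for illustration (the real safe-lib API looks different):

```python
class EntryStore:
    """In-memory stand-in for versioned MD entries (illustration only)."""

    def __init__(self):
        self._entries = {}  # key -> (value, version)

    def get(self, key):
        # Unknown keys start at version 0 with no value.
        return self._entries.get(key, (None, 0))

    def mutate(self, key, new_value, expected_version):
        _, current = self.get(key)
        if current != expected_version:
            # Someone else changed the entry since we fetched it.
            raise ValueError("version mismatch")
        self._entries[key] = (new_value, current + 1)


def update_with_retry(store, key, transform, max_attempts=5):
    """Fetch the entry, apply transform, and retry on version mismatch."""
    for _ in range(max_attempts):
        value, version = store.get(key)
        try:
            store.mutate(key, transform(value), version)
            return
        except ValueError:
            continue  # lost the race: fetch the new version and try again
    raise RuntimeError("gave up after repeated version mismatches")
```

For example, `update_with_retry(store, "counter", lambda v: (v or 0) + 1)` increments an entry safely even if another writer occasionally gets in between the fetch and the mutation.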
All this said, the MD has the disadvantage of - in its current implementation - centralising a lot of information on the same set of nodes (as the address is with the MD, not its keys), and one reason we have those limits is that we are concerned about what that might mean for the overall threat profile and vulnerability of the network - especially if those keys are predictable, as in the case of the comments feature.
As we haven't actually rolled out the feature in the network itself, I assume you are using mock routing? That only runs on your local system though, so its performance is bounded by your machine, while in reality the bottleneck will most likely be the wire (meaning latency) between the nodes.
As usual, the best approach for high throughput is caching. Especially if you have single-instance behaviour (meaning only one instance writes to the same MD at any given time), you could cache the result, return immediately, and only write occasionally (e.g. once per second), like redis does. Another option would be to shard the data and have different client instances read/write the different MDs, each with its own safe-lib and connection to the network. This way you'd balance the load across multiple nodes in the network (as different parts of the network answer for each MD) and on your own system. Whether the latter is really more efficient I am not sure, though, considering the new version is mostly async already and the wire will still be the bottleneck, not the CPU time this particular process gets...
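To make the two ideas concrete, here is a minimal sketch of a write-behind cache plus a shard picker. Everything here is hypothetical glue (`WriteBehindCache`, `pick_shard`, the `backend_write` callback standing in for an actual MD mutation via safe-lib) and assumes the single-writer behaviour mentioned above:

```python
import hashlib
import time


class WriteBehindCache:
    """Serve reads and writes from memory; flush to the backend at most
    once per interval. Safe only under a single-writer assumption."""

    def __init__(self, backend_write, interval=1.0):
        self._backend_write = backend_write  # stand-in for an MD mutation
        self._interval = interval
        self._data = {}
        self._dirty = False
        self._last_flush = time.monotonic()

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value
        self._dirty = True
        self._maybe_flush()

    def _maybe_flush(self):
        now = time.monotonic()
        if self._dirty and now - self._last_flush >= self._interval:
            # One batched network write instead of one per set().
            self._backend_write(dict(self._data))
            self._dirty = False
            self._last_flush = now


def pick_shard(key, num_shards):
    """Deterministically map a key onto one of several MDs/clients."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

With sharding, each client instance would own one shard index and only ever talk to its own MD, so the load spreads both across the network and across your local connections.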
What exactly are you intending to build? We'd love to know about more use cases, so we can include them in our models and plans.