WebSockets - Why differentiate between text and binary frames?

As I mentioned here, the WebSockets protocol is, at this point, a bit of a mess due to how it has evolved and the fact that it’s being pulled in various directions by various interested parties. I’m just ranting about some of the things that I find annoying…

Back when binary frames were mentioned in the WebSocket protocol specification as a slightly hand-wavy “something for the future”, and only text frames could actually be sent and received by the clients of the time, there MAY (in the strictest RFC meaning of the word) have been a need to differentiate between text and binary frames; especially since text frames were 0xFF terminated rather than length prefixed. Now, however, it seems pointless to have a protocol-level differentiation between text and binary frame types.
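To make the framing difference concrete, here’s a minimal sketch of the two styles; it’s illustrative only, assumes C++17, and omits masking, extended payload lengths, continuation frames and error handling. The point is that a sentinel-framed parser has to scan for the terminating 0xFF, whereas a length-prefixed header tells the reader the frame type and size up front.

```cpp
#include <cstdint>
#include <iostream>
#include <optional>
#include <string>
#include <vector>

// Hixie-style text frame: 0x00 <UTF-8 bytes...> 0xFF. The only way to
// locate the end of the frame is to scan for the 0xFF sentinel.
std::optional<std::string> ExtractSentinelFrame(
   const std::vector<uint8_t> &buffer)
{
   if (buffer.empty() || buffer.front() != 0x00)
   {
      return std::nullopt;                 // not a text frame start
   }

   for (size_t i = 1; i < buffer.size(); ++i)
   {
      if (buffer[i] == 0xFF)               // found the terminator
      {
         return std::string(buffer.begin() + 1, buffer.begin() + i);
      }
   }

   return std::nullopt;                    // incomplete; need more data
}

// RFC 6455-style frame: the header gives you the opcode (0x1 = text,
// 0x2 = binary) and the payload length up front, no scanning required.
struct FrameHeader
{
   uint8_t opcode;
   size_t payloadLength;
};

std::optional<FrameHeader> ParseLengthPrefixedHeader(
   const std::vector<uint8_t> &buffer)
{
   if (buffer.size() < 2)
   {
      return std::nullopt;                 // header not complete yet
   }

   // Length values 126 and 127 select extended encodings; omitted here.
   return FrameHeader{
      static_cast<uint8_t>(buffer[0] & 0x0F),
      static_cast<size_t>(buffer[1] & 0x7F)};
}

int main()
{
   const std::vector<uint8_t> hixieFrame = {0x00, 'h', 'i', 0xFF};

   if (const auto text = ExtractSentinelFrame(hixieFrame))
   {
      std::cout << "sentinel framed text: " << *text << "\n";
   }

   // 0x81 = FIN bit + text opcode; payload length 2, unmasked.
   const std::vector<uint8_t> hybiFrame = {0x81, 0x02, 'h', 'i'};

   if (const auto header = ParseLengthPrefixedHeader(hybiFrame))
   {
      std::cout << "opcode " << static_cast<int>(header->opcode)
                << ", length " << header->payloadLength << "\n";
   }

   return 0;
}
```

Since the current protocol gives every frame a length up front, the opcode is the only thing left distinguishing text from binary, which is rather the point of the complaint.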

This would, perhaps, be less pointless if a general purpose protocol handler could reliably deliver complete messages to an application but, as we’ve seen before, that’s not possible. Although backtracking to the last complete character in the UTF-8 stream is relatively straightforward (see here), there seems to be little value in converting a partial message to a wide string form only to have the application layer store and concatenate the pieces to obtain a complete message…
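For what it’s worth, the backtracking itself really is simple. Here’s a minimal sketch, assuming the buffer contains valid UTF-8 apart from possible truncation at the end, of how a handler might find the point up to which the buffered bytes form complete characters:

```cpp
#include <cstddef>
#include <cstdint>

// Returns the number of bytes at the start of the buffer that form
// complete UTF-8 characters; any bytes after that point belong to a
// character whose continuation bytes haven't arrived yet. Assumes the
// buffer is valid UTF-8 apart from possible truncation at the end.
size_t LengthOfCompleteUtf8(
   const uint8_t *buffer,
   const size_t length)
{
   if (length == 0)
   {
      return 0;
   }

   // A character is at most 4 bytes, so look back at most 4 bytes for
   // the lead byte of the final character; continuation bytes all
   // match the 10xxxxxx pattern.
   size_t i = length;
   size_t lookedBack = 0;

   while (i > 0 && lookedBack < 4)
   {
      --i;
      ++lookedBack;

      if ((buffer[i] & 0xC0) != 0x80)
      {
         break;                            // found a lead byte
      }
   }

   const uint8_t lead = buffer[i];
   size_t expected;

   if ((lead & 0x80) == 0x00) expected = 1;      // 0xxxxxxx - ASCII
   else if ((lead & 0xE0) == 0xC0) expected = 2; // 110xxxxx
   else if ((lead & 0xF0) == 0xE0) expected = 3; // 1110xxxx
   else if ((lead & 0xF8) == 0xF0) expected = 4; // 11110xxx
   else return i;  // stray continuation bytes; treat the tail as incomplete

   // If the final character's bytes are all present then the whole
   // buffer is complete; otherwise everything from the lead byte
   // onwards is a partial character.
   return (length - i >= expected) ? length : i;
}
```

Everything before the returned offset can be converted and delivered; the trailing bytes have to be held back until the rest of the character arrives, which is exactly the storing and concatenating that ends up being pushed onto the application layer anyway.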

Robert Gezelter probably says it far better than I can here on his blog; the differentiation between binary and text should occur at the application level.