I'm using the Zemanta API, which accepts up to 8 KB of text per call. I'm extracting the text to send to Zemanta from Web pages using JavaScript, so I'm looking for a fun
No, it's not safe to assume that 8 KB of text is 8192 characters, since in some character encodings (UTF-8, for example) a single character can take up multiple bytes.
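To illustrate, here is a rough sketch of measuring and trimming text by bytes rather than by characters. It assumes the 8 KB limit is counted in UTF-8 bytes, which nothing here confirms for Zemanta, and the names `MAX_BYTES`, `utf8ByteLength`, and `truncateToBytes` are made up for the example. `TextEncoder` is the standard encoder available in modern browsers and Node.js.

```javascript
// Assumption: the limit is 8 KB of UTF-8 bytes (not confirmed for Zemanta).
const MAX_BYTES = 8 * 1024;

// Number of bytes the string occupies once encoded as UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// Longest prefix of `text` whose UTF-8 encoding fits in maxBytes,
// without cutting a multi-byte character in half.
function truncateToBytes(text, maxBytes = MAX_BYTES) {
  let bytes = 0;
  let result = "";
  // for...of walks the string by code point, so surrogate pairs stay together.
  for (const ch of text) {
    const size = utf8ByteLength(ch);
    if (bytes + size > maxBytes) break;
    bytes += size;
    result += ch;
  }
  return result;
}
```

For plain ASCII the byte count and `text.length` agree, but something like `"€"` is one character and three bytes, so checking `text.length <= 8192` alone could let an oversized request through.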
If you're reading the data from files, can't you just grab the file size? Or read it in chunks of 8 KB?