How to add or remove a byte order mark

A byte order mark (BOM) consists of the character code […] at the beginning of a data stream, where it can be used as a signature defining the byte order and encoding form, primarily of unmarked plaintext files. Under some higher level protocols, use of a BOM may be mandatory (or prohibited) in the Unicode data stream defined in that protocol.

At the moment, I’m testing a few different scenarios where a client application uploads various XML files to a Java web service endpoint. Initially, this service didn’t support one of the two variants, either files with a BOM or files without one (I can’t remember which off the top of my head). At any rate, there’s a need to produce sample files of both types: UTF-8 and UTF-8 with BOM.

For this example, I tested with both Notepad++ (6.1) and Sublime Text 2.

  1. Open the file in your editor of choice.
  2. Select the required encoding option:

    • In Notepad++, pick Encoding from the main menu and pick either Encode in UTF-8 (implied with BOM) or Encode in UTF-8 without BOM.
    • In Sublime Text 2, pick File > Save with Encoding and pick either UTF-8 (implied without BOM) or UTF-8 with BOM.
  3. Save the file; the encoding change is not applied until you do.
  4. To ensure that the BOM is now present or absent in your file, I recommend opening it up in a hex editor. There are many out there, but I use the HexView plugin:

     //With BOM:
     00000000:  efbb bf42 4f4d 2074 6573 74      :...BOM test
     //Without BOM:
     00000000:  424f 4d20 7465 7374              :BOM test

In the first dump, you can clearly see the leading byte sequence 0xEF, 0xBB, 0xBF, as would be expected for a UTF-8 file with a BOM present.
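If you would rather script this than click through an editor, the same add, remove and check steps can be sketched in a few lines of Python. This is only a minimal sketch, not what I used for the tests above: the function names (has_bom, add_bom, remove_bom, convert_file) are made up for illustration, and it assumes the file is already UTF-8 and small enough to read into memory in one go.

```python
import codecs

# The UTF-8 BOM as raw bytes: 0xEF 0xBB 0xBF
UTF8_BOM = codecs.BOM_UTF8


def has_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM."""
    return data.startswith(UTF8_BOM)


def add_bom(data: bytes) -> bytes:
    """Prepend the UTF-8 BOM unless one is already present."""
    return data if has_bom(data) else UTF8_BOM + data


def remove_bom(data: bytes) -> bytes:
    """Strip a leading UTF-8 BOM if there is one."""
    return data[len(UTF8_BOM):] if has_bom(data) else data


def convert_file(path: str, with_bom: bool) -> None:
    """Hypothetical helper: rewrite a file in place with or without a BOM."""
    with open(path, "rb") as f:
        data = f.read()
    data = add_bom(data) if with_bom else remove_bom(data)
    with open(path, "wb") as f:
        f.write(data)


if __name__ == "__main__":
    sample = "BOM test".encode("utf-8")
    # Hex output needs Python 3.8+ for the separator argument.
    print(add_bom(sample).hex(" "))   # ef bb bf 42 4f 4d 20 74 65 73 74
    print(remove_bom(add_bom(sample)).hex(" "))  # 42 4f 4d 20 74 65 73 74
```

Working on raw bytes keeps the rest of the file untouched, which is what you want when producing test fixtures. Both add_bom and remove_bom are safe to run twice on the same data, so you never end up with a doubled BOM.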


