davidB/xml-to-json-bench-rs

Compare Rust crates and approaches for converting XML into JSON (serde_json::Value)

XML to JSON Benchmark for Rust

Comparison of Rust crates for converting XML (primarily JUnit reports) to JSON (serde_json::Value).

Purpose

This project evaluates different Rust XML parsing libraries to determine the best solution for integrating XML input support into cdviz-collector. The selected solution will be used for:

  • OpenDAL source XML processing
  • cdviz-collector send command XML input

Output Format Comparison

Comparison Table

| Converter | Attributes | Text Content | Root Preserved | Empty Text Nodes | Special Structure |
|---|---|---|---|---|---|
| quick-xml | @attributes object | #text | ✅ Yes | ❌ No | Direct nesting |
| quick-xml-to-json | @attr | #t | ✅ Yes | ❌ No | #c array wrapper |
| rsxml2json | -attr | #content | ✅ Yes | ❌ No | Dash prefix |
| serde-xml-rs | @attributes object | #text | ✅ Yes | ❌ No | Direct nesting |
| xml_to_json_rs | @attr | #text | ❌ No | ✅ Yes | Flattened (root stripped) |
| xmltree | @attributes object | #text | ✅ Yes | ❌ No | Direct nesting |
| roxmltree | @attributes object | #text | ✅ Yes | ❌ No | Direct nesting |
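
The naming conventions in the table can also be written down as code. The following is a minimal sketch and not code from this repository: the `Convention` enum and the `attr_path` and `text_key` helpers are hypothetical names introduced for illustration, and the key names are taken from the table.

```rust
/// Attribute/text naming conventions listed in the comparison table.
/// Hypothetical type for illustration; not part of any of the crates.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Convention {
    /// quick-xml, serde-xml-rs, xmltree, roxmltree
    AttributesObject,
    /// quick-xml-to-json
    QuickXmlToJson,
    /// rsxml2json
    DashPrefix,
    /// xml_to_json_rs
    AtPrefix,
}

/// Path of JSON keys, relative to the element object, under which the
/// XML attribute `name` is stored.
pub fn attr_path(conv: Convention, name: &str) -> Vec<String> {
    match conv {
        Convention::AttributesObject => {
            vec!["@attributes".to_string(), name.to_string()]
        }
        Convention::QuickXmlToJson | Convention::AtPrefix => vec![format!("@{}", name)],
        Convention::DashPrefix => vec![format!("-{}", name)],
    }
}

/// JSON key that holds the text content of an element.
pub fn text_key(conv: Convention) -> &'static str {
    match conv {
        Convention::AttributesObject | Convention::AtPrefix => "#text",
        Convention::QuickXmlToJson => "#t",
        Convention::DashPrefix => "#content",
    }
}
```

For example, `attr_path(Convention::DashPrefix, "time")` yields the single key `-time`, while the `AttributesObject` convention yields the two-step path `@attributes`, then `time`.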

Sample Input XML

All converters are tested against the same XML inputs, such as:

<?xml version="1.0" encoding="UTF-8"?>
<testsuites>
    <testsuite name="SimpleTests" tests="3" failures="1" errors="0" skipped="1" time="0.123">
        <testcase name="test_addition" classname="math.BasicTests" time="0.001"/>
        <testcase name="test_subtraction" classname="math.BasicTests" time="0.002">
            <failure message="Expected 5 but got 6" type="AssertionError">
at math.BasicTests.test_subtraction (BasicTests.java:42)
            </failure>
        </testcase>
        <testcase name="test_division" classname="math.BasicTests" time="0.000">
            <skipped message="Not implemented yet"/>
        </testcase>
    </testsuite>
</testsuites>

or

<?xml version="1.0" encoding="UTF-8"?>
<sample>
    <node attr1="val1.1" attr2="val2.1"/>
    <node attr1="val1.2">
        <child foo="bar"/>
    </node>
    <node attr1="val1.2">
        <child foo="bar2"/>
        hello
        <child foo="bar3"/>
        <childtext>world</childtext>
        <childtext><![CDATA[toto]]></childtext>
    </node>
    <!-- comment at the end -->
</sample>

Example outputs are available in tests/snapshots/, and an extract is included in the description of each crate below.

quick-xml - Manual event-based conversion
  • Version: 0.38+
  • Approach: Fast, zero-copy XML parser with manual event-based conversion
  • Pros: High performance, low memory overhead, actively maintained
  • Cons: More complex API, requires careful handling of attributes vs elements

Notable Features:

  • Attributes stored in @attributes object
  • Text content as #text
  • Preserves whitespace and text content
  • May miss text in some elements (implementation-dependent); in the sample output below, the text of the failure element is absent

Sample Output:

{
  "testsuites": {
    "testsuite": {
      "@attributes": {
        "errors": "0",
        "failures": "1",
        "name": "SimpleTests",
        "skipped": "1",
        "tests": "3",
        "time": "0.123"
      },
      "testcase": [
        {
          "@attributes": {
            "classname": "math.BasicTests",
            "name": "test_addition",
            "time": "0.001"
          }
        },
        {
          "@attributes": {
            "classname": "math.BasicTests",
            "name": "test_subtraction",
            "time": "0.002"
          },
          "failure": {
            "@attributes": {
              "message": "Expected 5 but got 6",
              "type": "AssertionError"
            }
          }
        },
        {
          "@attributes": {
            "classname": "math.BasicTests",
            "name": "test_division",
            "time": "0.000"
          },
          "skipped": {
            "@attributes": {
              "message": "Not implemented yet"
            }
          }
        }
      ]
    }
  }
}
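
The manual event-based approach can be sketched without any dependency. This is an illustration only, not the converter used in this benchmark: `Event` is a hypothetical stand-in for the events of a streaming XML reader (a self-closing element is modelled as `Start` followed by `End`), and `Json` is a stand-in for serde_json::Value. The sketch keeps a stack of open elements and promotes a repeated child name to an array.

```rust
use std::collections::BTreeMap;

/// Minimal JSON value, standing in for serde_json::Value.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Hypothetical parser events, shaped like those of a streaming XML reader.
pub enum Event {
    Start { name: String, attrs: Vec<(String, String)> },
    Text(String),
    End,
}

/// Insert `child` under `key`, promoting to an array when the key repeats.
fn add_child(parent: &mut BTreeMap<String, Json>, key: String, child: Json) {
    match parent.remove(&key) {
        None => {
            parent.insert(key, child);
        }
        Some(Json::Arr(mut items)) => {
            items.push(child);
            parent.insert(key, Json::Arr(items));
        }
        Some(first) => {
            parent.insert(key, Json::Arr(vec![first, child]));
        }
    }
}

/// Fold a well-formed event stream into a JSON object keyed by the root name.
pub fn events_to_json(events: Vec<Event>) -> Json {
    // Stack of (element name, object under construction).
    // The bottom entry represents the document itself.
    let mut stack: Vec<(String, BTreeMap<String, Json>)> =
        vec![(String::new(), BTreeMap::new())];
    for event in events {
        match event {
            Event::Start { name, attrs } => {
                let mut obj = BTreeMap::new();
                if !attrs.is_empty() {
                    let attrs: BTreeMap<String, Json> =
                        attrs.into_iter().map(|(k, v)| (k, Json::Str(v))).collect();
                    obj.insert("@attributes".to_string(), Json::Obj(attrs));
                }
                stack.push((name, obj));
            }
            Event::Text(text) => {
                // Whitespace-only text is skipped; later text overwrites earlier
                // text, so mixed content is not fully preserved in this sketch.
                let text = text.trim();
                if !text.is_empty() {
                    if let Some((_, obj)) = stack.last_mut() {
                        obj.insert("#text".to_string(), Json::Str(text.to_string()));
                    }
                }
            }
            Event::End => {
                // Ignore an unbalanced End rather than popping the document entry.
                if stack.len() > 1 {
                    let (name, obj) = stack.pop().unwrap();
                    let (_, parent) = stack.last_mut().unwrap();
                    add_child(parent, name, Json::Obj(obj));
                }
            }
        }
    }
    Json::Obj(stack.swap_remove(0).1)
}
```

The text-handling branch is where an implementation can lose content: if text is only recorded in some states, elements such as failure end up without a #text entry.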

quick-xml-to-json - Convenience library built on quick-xml
  • Version: 0.1+
  • Approach: Dedicated XML to JSON converter built on quick-xml
  • Pros: Purpose-built for XML→JSON conversion, simple API
  • Cons: Early version, less configurable

Notable Features:

  • Uses #c array for all child elements
  • Text content as #t (shorter)
  • Attributes with @ prefix directly on element
  • More verbose structure with explicit child arrays

Sample Output:

{
  "testsuites": {
    "#c": [
      {
        "testsuite": {
          "#c": [
            {
              "testcase": {
                "@classname": "math.BasicTests",
                "@name": "test_addition",
                "@time": "0.001"
              }
            },
            {
              "testcase": {
                "#c": [
                  {
                    "failure": {
                      "#t": "at math.BasicTests.test_subtraction (BasicTests.java:42)",
                      "@message": "Expected 5 but got 6",
                      "@type": "AssertionError"
                    }
                  }
                ],
                "@classname": "math.BasicTests",
                "@name": "test_subtraction",
                "@time": "0.002"
              }
            },
            {
              "testcase": {
                "#c": [
                  {
                    "skipped": {
                      "@message": "Not implemented yet"
                    }
                  }
                ],
                "@classname": "math.BasicTests",
                "@name": "test_division",
                "@time": "0.000"
              }
            }
          ],
          "@errors": "0",
          "@failures": "1",
          "@name": "SimpleTests",
          "@skipped": "1",
          "@tests": "3",
          "@time": "0.123"
        }
      }
    ]
  }
}
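
Because every child sits inside the #c array as a single-key wrapper object, looking up children by name takes one extra step compared with the direct-nesting converters. A minimal sketch of that lookup, assuming a stand-in `Json` enum in place of serde_json::Value; `children_named` is a hypothetical helper, not part of quick-xml-to-json:

```rust
use std::collections::BTreeMap;

/// Minimal JSON value, standing in for serde_json::Value.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Return the bodies of all children called `name` from an element that
/// stores its children as `"#c": [ { "<child name>": { ... } }, ... ]`.
pub fn children_named<'a>(element: &'a Json, name: &str) -> Vec<&'a Json> {
    let wrappers = match element {
        Json::Obj(map) => match map.get("#c") {
            Some(Json::Arr(items)) => items.as_slice(),
            // No "#c" key (or an unexpected shape): no children.
            _ => return Vec::new(),
        },
        _ => return Vec::new(),
    };
    wrappers
        .iter()
        .filter_map(|wrapper| match wrapper {
            Json::Obj(single) => single.get(name),
            _ => None,
        })
        .collect()
}
```

Applied to the testsuite object from the sample output, `children_named(suite, "testcase")` would return the three testcase bodies in document order.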

rsxml2json - Built on roxmltree with dash prefix
  • Version: 0.1+
  • Approach: Built on roxmltree with ConvertConfig for customization
  • Pros: Simple API, configurable conversion behavior
  • Cons: Early version, minimal documentation

Notable Features:

  • Unique: Uses - prefix for attributes instead of @
  • Text content as #content
  • Clean, straightforward structure
  • Good for scenarios where @ symbol conflicts with your schema

Sample Output:

{
  "testsuites": {
    "testsuite": {
      "-errors": "0",
      "-failures": "1",
      "-name": "SimpleTests",
      "-skipped": "1",
      "-tests": "3",
      "-time": "0.123",
      "testcase": [
        {
          "-classname": "math.BasicTests",
          "-name": "test_addition",
          "-time": "0.001"
        },
        {
          "-classname": "math.BasicTests",
          "-name": "test_subtraction",
          "-time": "0.002",
          "failure": {
            "#content": "at math.BasicTests.test_subtraction (BasicTests.java:42)",
            "-message": "Expected 5 but got 6",
            "-type": "AssertionError"
          }
        },
        {
          "-classname": "math.BasicTests",
          "-name": "test_division",
          "-time": "0.000",
          "skipped": {
            "-message": "Not implemented yet"
          }
        }
      ]
    }
  }
}

serde-xml-rs - Serde-based with xml-rs backend
  • Version: 0.8+
  • Approach: Serde-based deserialization (uses xml-rs backend)
  • Pros: Familiar serde patterns, simple API
  • Cons: Less maintained, potential performance overhead

Notable Features:

  • Familiar serde patterns
  • Attributes in @attributes object
  • Standard #text for text content
  • May have performance overhead from serde abstraction

Sample Output:

{
  "testsuites": {
    "testsuite": {
      "@attributes": {
        "errors": "0",
        "failures": "1",
        "name": "SimpleTests",
        "skipped": "1",
        "tests": "3",
        "time": "0.123"
      },
      "testcase": [
        {
          "@attributes": {
            "classname": "math.BasicTests",
            "name": "test_addition",
            "time": "0.001"
          }
        },
        {
          "@attributes": {
            "classname": "math.BasicTests",
            "name": "test_subtraction",
            "time": "0.002"
          },
          "failure": {
            "#text": "at math.BasicTests.test_subtraction (BasicTests.java:42)",
            "@attributes": {
              "message": "Expected 5 but got 6",
              "type": "AssertionError"
            }
          }
        },
        {
          "@attributes": {
            "classname": "math.BasicTests",
            "name": "test_division",
            "time": "0.000"
          },
          "skipped": {
            "@attributes": {
              "message": "Not implemented yet"
            }
          }
        }
      ]
    }
  }
}

xml_to_json_rs - Strips root element by default
  • Version: 0.1+
  • Approach: Built on roxmltree with configurable output
  • Pros: Simple API, configurable with with_root() and with_text_name()
  • Cons: Early version, less battle-tested

Notable Features:

  • Unique: Removes the root element by default (it can be kept with with_root())
  • Includes empty #text nodes for whitespace
  • Attributes with @ prefix
  • Flatter structure (starts at testsuite instead of testsuites)

Sample Output:

{
  "#text": "",
  "testsuite": {
    "#text": "",
    "@errors": "0",
    "@failures": "1",
    "@name": "SimpleTests",
    "@skipped": "1",
    "@tests": "3",
    "@time": "0.123",
    "testcase": [
      {
        "@classname": "math.BasicTests",
        "@name": "test_addition",
        "@time": "0.001"
      },
      {
        "#text": "",
        "@classname": "math.BasicTests",
        "@name": "test_subtraction",
        "@time": "0.002",
        "failure": {
          "#text": "at math.BasicTests.test_subtraction (BasicTests.java:42)",
          "@message": "Expected 5 but got 6",
          "@type": "AssertionError"
        }
      },
      {
        "#text": "",
        "@classname": "math.BasicTests",
        "@name": "test_division",
        "@time": "0.000",
        "skipped": {
          "@message": "Not implemented yet"
        }
      }
    ]
  }
}
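
The empty #text entries in this output come from whitespace between elements. If they are unwanted downstream, a post-processing pass can drop them. A minimal sketch, assuming a stand-in `Json` enum in place of serde_json::Value; `strip_empty_text` is a hypothetical helper, not part of xml_to_json_rs:

```rust
use std::collections::BTreeMap;

/// Minimal JSON value, standing in for serde_json::Value.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Recursively remove "#text" entries whose value is empty or whitespace-only.
pub fn strip_empty_text(value: &mut Json) {
    match value {
        Json::Obj(map) => {
            let drop_text =
                matches!(map.get("#text"), Some(Json::Str(s)) if s.trim().is_empty());
            if drop_text {
                map.remove("#text");
            }
            for child in map.values_mut() {
                strip_empty_text(child);
            }
        }
        Json::Arr(items) => {
            for item in items.iter_mut() {
                strip_empty_text(item);
            }
        }
        // Plain strings have nothing to strip.
        Json::Str(_) => {}
    }
}
```

Non-empty text, such as the stack trace inside failure, is left untouched.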

xmltree - DOM-like tree structure
  • Version: 0.12+
  • Approach: DOM-like tree structure
  • Pros: Simple, intuitive API, easy to navigate
  • Cons: Higher memory usage, slower for large files

Notable Features:

  • Simple, intuitive DOM-like API
  • Attributes in @attributes object
  • Preserves text content as #text
  • Higher memory usage due to full tree in memory

Sample Output:

{
  "testsuites": {
    "testsuite": {
      "@attributes": {
        "errors": "0",
        "failures": "1",
        "name": "SimpleTests",
        "skipped": "1",
        "tests": "3",
        "time": "0.123"
      },
      "testcase": [
        {
          "@attributes": {
            "classname": "math.BasicTests",
            "name": "test_addition",
            "time": "0.001"
          }
        },
        {
          "@attributes": {
            "classname": "math.BasicTests",
            "name": "test_subtraction",
            "time": "0.002"
          },
          "failure": {
            "#text": "at math.BasicTests.test_subtraction (BasicTests.java:42)",
            "@attributes": {
              "message": "Expected 5 but got 6",
              "type": "AssertionError"
            }
          }
        },
        {
          "@attributes": {
            "classname": "math.BasicTests",
            "name": "test_division",
            "time": "0.000"
          },
          "skipped": {
            "@attributes": {
              "message": "Not implemented yet"
            }
          }
        }
      ]
    }
  }
}

roxmltree - Fast read-only parser with manual conversion
  • Version: 0.21+
  • Approach: Read-only XML tree with validation
  • Pros: Fast parsing, low memory, validates XML
  • Cons: Read-only, requires manual conversion to JSON

Notable Features:

  • Very fast parsing (read-only, validated)
  • Manual conversion to JSON required
  • Attributes in @attributes object
  • Low memory footprint

Sample Output:

{
  "testsuites": {
    "testsuite": {
      "@attributes": {
        "errors": "0",
        "failures": "1",
        "name": "SimpleTests",
        "skipped": "1",
        "tests": "3",
        "time": "0.123"
      },
      "testcase": [
        {
          "@attributes": {
            "classname": "math.BasicTests",
            "name": "test_addition",
            "time": "0.001"
          }
        },
        {
          "@attributes": {
            "classname": "math.BasicTests",
            "name": "test_subtraction",
            "time": "0.002"
          },
          "failure": {
            "#text": "at math.BasicTests.test_subtraction (BasicTests.java:42)",
            "@attributes": {
              "message": "Expected 5 but got 6",
              "type": "AssertionError"
            }
          }
        },
        {
          "@attributes": {
            "classname": "math.BasicTests",
            "name": "test_division",
            "time": "0.000"
          },
          "skipped": {
            "@attributes": {
              "message": "Not implemented yet"
            }
          }
        }
      ]
    }
  }
}

Comparison Protocol

1. Functional Comparison

Test Cases

Each implementation must handle:

  • Basic JUnit XML: Simple test suite with pass/fail cases
  • Nested structures: Test suites with nested test cases
  • Attributes: Preserving XML attributes in JSON output
  • CDATA sections: Handling CDATA content
  • Empty elements: Self-closing tags and empty elements
  • Special characters: XML entities and Unicode

Output Format

  • JSON structure consistency
  • Attribute representation (e.g., @attribute vs nested object)
  • Text node handling
  • Array vs single object for repeating elements
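
The last point matters to consumers: in the sample outputs, a single testsuite appears as an object while the repeated testcase appears as an array, so the same key can have either shape depending on the input. A minimal sketch of a helper that hides the difference, assuming a stand-in `Json` enum in place of serde_json::Value; `children` is a hypothetical name:

```rust
use std::collections::BTreeMap;

/// Minimal JSON value, standing in for serde_json::Value.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// View a child that may be absent, a single value, or an array as a list.
pub fn children<'a>(parent: &'a Json, key: &str) -> Vec<&'a Json> {
    match parent {
        Json::Obj(map) => match map.get(key) {
            // Repeated element: already an array.
            Some(Json::Arr(items)) => items.iter().collect(),
            // Single element: wrap it so callers always iterate.
            Some(single) => vec![single],
            None => Vec::new(),
        },
        _ => Vec::new(),
    }
}
```

With such a helper, code that processes test cases does not need separate branches for reports with one test case and reports with several.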

Snapshot Testing: Uses cargo-insta to capture and compare JSON outputs. Each converter's output for each test fixture is stored as a snapshot in tests/snapshots/, allowing easy comparison of how different libraries structure the JSON.

2. Performance Benchmarks

Using Criterion 0.7:

  • Small files: ~10KB JUnit reports
  • Medium files: ~100KB JUnit reports

Metrics:

  • Total time (XML → JSON)

Running the Comparison

Setup

# Install dependencies
mise install

# Build project
cargo build

Tests

# Run functional tests
cargo test

# Run specific crate tests
cargo test quick_xml
cargo test quick_xml_to_json
cargo test rsxml2json
cargo test serde_xml
cargo test xml_to_json_rs
cargo test xmltree
cargo test roxmltree

# Review snapshot changes (after modifying converters)
cargo insta test
cargo insta review

# Accept all snapshots
cargo insta accept

# View snapshots
ls tests/snapshots/

Benchmarks

The benchmark suite uses Criterion 0.7 to measure XML to JSON conversion performance.

Structure: One benchmark group per XML file, testing all 7 converters:

  • simple - Basic JUnit XML (3 test cases)
  • nested - Complex nested structure (2 test suites, 6 test cases)
  • attributes - Heavy attribute usage with CDATA

Each converter is benchmarked within each group, measuring the time to convert XML string to serde_json::Value.

# Run all benchmarks (takes ~5 minutes)
# cargo bench -- --quiet
mise run bench

# Run benchmarks for specific fixture
cargo bench simple
cargo bench nested
cargo bench attributes

# Run benchmark for specific converter across all fixtures
cargo bench quick-xml
cargo bench xmltree

# View HTML report with graphs
open target/criterion/report/index.html

Output format:

simple/quick-xml        time:   [7.6106 µs 7.6661 µs 7.7289 µs]
simple/quick-xml-to-json
                        time:   [8.9144 µs 8.9647 µs 9.0181 µs]
simple/rsxml2json       time:   [20.751 µs 20.869 µs 20.987 µs]
simple/serde-xml-rs     time:   [29.547 µs 29.675 µs 29.817 µs]
simple/xml_to_json_rs   time:   [7.3689 µs 7.3922 µs 7.4168 µs]
simple/xmltree          time:   [28.526 µs 28.691 µs 28.839 µs]
simple/roxmltree        time:   [8.4336 µs 8.4788 µs 8.5222 µs]

nested/quick-xml        time:   [21.428 µs 21.509 µs 21.593 µs]
nested/quick-xml-to-json
                        time:   [21.718 µs 21.783 µs 21.848 µs]
nested/rsxml2json       time:   [51.850 µs 51.999 µs 52.177 µs]
nested/serde-xml-rs     time:   [81.787 µs 81.875 µs 81.983 µs]
nested/xml_to_json_rs   time:   [19.440 µs 19.476 µs 19.510 µs]
nested/xmltree          time:   [72.162 µs 72.403 µs 72.723 µs]
nested/roxmltree        time:   [20.481 µs 20.566 µs 20.666 µs]

attributes/quick-xml    time:   [11.704 µs 11.720 µs 11.737 µs]
attributes/quick-xml-to-json
                        time:   [12.149 µs 12.164 µs 12.182 µs]
attributes/rsxml2json   time:   [29.953 µs 30.038 µs 30.124 µs]
attributes/serde-xml-rs time:   [49.541 µs 49.794 µs 50.052 µs]
attributes/xml_to_json_rs
                        time:   [11.249 µs 11.282 µs 11.321 µs]
attributes/xmltree      time:   [44.960 µs 45.107 µs 45.252 µs]
attributes/roxmltree    time:   [11.308 µs 11.344 µs 11.388 µs]

Interpreting results:

  • Lower time = faster conversion
  • Check for outliers (marked in output)
  • Compare within groups for same XML input
  • HTML report provides visual comparison charts

Test Data

Sample XML files are located in tests/fixtures/:

  • sample.xml: An XML file covering various cases (nodes with 0-4 children, text nodes, comments, attributes)
  • simple.xml: Basic test suite (JUnit legacy)
  • nested.xml: Complex nested structure (JUnit legacy)
  • attributes.xml: Heavy attribute usage (JUnit legacy)

TODO

  • evaluate error handling
  • evaluate how easily the output can be manipulated in VRL code (out of scope)

License

Apache 2.0
