# Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 42 in /var/www/app/src/Parser/HtmlSanitizer.php:18

- **ID:** `php/domdocument-load-html-entity-warning`
- **Domain:** php
- **Category:** encoding_error
- **Verification:** ai_generated
- **Fix Rate:** 78%

## Root Cause

The HTML string passed to `DOMDocument::loadHTML()` contains a malformed entity reference, such as `&nbsp` instead of `&nbsp;`, or a bare `&` in text or in a URL query string. The underlying libxml2 HTML parser emits a warning for each one and may not parse the surrounding text as intended.
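
To locate the offending text, it can help to scan the input for ampersands that do not begin a complete entity reference. The sketch below is an illustration only: `findBareAmpersands()` is a hypothetical helper, not part of `DOMDocument`, and it assumes the line number in the warning refers to a line of the HTML string being parsed.

```php
<?php
/**
 * Hypothetical helper: list every "&" that is not followed by a
 * named or numeric reference ending in ";".
 *
 * @return array<int, array{line: int, context: string}>
 */
function findBareAmpersands(string $html): array
{
    $found = [];
    // Same pattern as the regex workaround in this entry.
    preg_match_all('/&(?![a-zA-Z0-9#]+;)/', $html, $matches, PREG_OFFSET_CAPTURE);

    foreach ($matches[0] as [$text, $offset]) {
        $found[] = [
            // Lines are 1-based: count the newlines before the match.
            'line'    => substr_count(substr($html, 0, $offset), "\n") + 1,
            // Up to 20 characters from the ampersand, cut at the end of the line.
            'context' => explode("\n", substr($html, $offset, 20), 2)[0],
        ];
    }

    return $found;
}

$sample = "<p>ok &amp; fine</p>\n<p>Fish &nbsp chips</p>\n<a href=\"?a=1&b=2\">x</a>";
foreach (findBareAmpersands($sample) as $hit) {
    echo "line {$hit['line']}: {$hit['context']}\n";
}
```

Well-formed references such as `&amp;` or `&#169;` are skipped, so the output lists only the places the parser is likely to complain about.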

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| php:8.1.0 | active | — | — |
| php:8.2.0 | active | — | — |
| php:8.3.0 | active | — | — |

## Workarounds

1. **Pre-process the HTML with a regex that escapes malformed entities before loading.** (85% success)
   ```php
   $html = preg_replace('/&(?![a-zA-Z0-9#]+;)/', '&amp;', $html);
   $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
   ```
2. **Pass the `LIBXML_NOERROR` flag to suppress the warning while still parsing the document. Be aware that this may hide other parsing issues.** (80% success)
   ```php
   $dom->loadHTML($html, LIBXML_NOERROR);
   ```
3. **Use a more forgiving HTML parser such as html5-php (GitHub `Masterminds/html5-php`, Composer package `masterminds/html5`), which handles malformed entities gracefully.** (90% success)
   ```php
   $html5 = new Masterminds\HTML5();
   $dom = $html5->loadHTML($html);
   ```
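
The first two workarounds can be sketched together: escape stray ampersands, then parse while collecting libxml's messages rather than discarding them. `escapeBareAmpersands()` and `loadHtmlCollectingErrors()` are hypothetical helper names. `libxml_use_internal_errors()`, `libxml_get_errors()`, and `libxml_clear_errors()` are standard PHP functions, used here as a more inspectable alternative to `LIBXML_NOERROR`. Which messages appear, if any, depends on the libxml2 version PHP is built against.

```php
<?php
/** Hypothetical wrapper around the regex from workaround 1. */
function escapeBareAmpersands(string $html): string
{
    // preg_replace() returns null on a PCRE failure; fall back to the input.
    return preg_replace('/&(?![a-zA-Z0-9#]+;)/', '&amp;', $html) ?? $html;
}

/**
 * Hypothetical helper: parse HTML without emitting PHP warnings and
 * return the parser's messages alongside the document.
 *
 * @return array{0: DOMDocument, 1: string[]}
 */
function loadHtmlCollectingErrors(string $html): array
{
    // Route libxml errors into an internal buffer instead of PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML(
        escapeBareAmpersands($html),
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Empty the buffer and restore the caller's setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$dom, $messages];
}

[$dom, $messages] = loadHtmlCollectingErrors('<p>Fish & chips, caf&eacute</p>');
echo $dom->saveHTML();
foreach ($messages as $message) {
    error_log($message); // or hand off to the application's logger
}
```

Note the trade-off in the sample input: `&eacute` without its semicolon becomes the literal text `&amp;eacute`, not the character the author intended. The regex quiets the parser but does not recover the original entity.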

## Dead Ends

- **Suppressing the warning with `@`** (e.g., `@$dom->loadHTML($html)`): hides the message but does not fix the malformed entity, which can lead to corrupted DOM trees and unexpected behavior when traversing or querying the document. (90% fail)
- **Running `htmlspecialchars()` over the entire HTML input:** encodes every ampersand, including those in valid entities (`&amp;` becomes `&amp;amp;`), and also escapes the `<` and `>` of the tags themselves, breaking the HTML structure further. (80% fail)
- **Switching from `loadHTML()` to `loadXML()`:** fails because HTML5 documents with unclosed tags or other non-well-formed structures are not valid XML, so the XML parser rejects them instead of recovering. (90% fail)
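
The last two dead ends are easy to confirm in isolation. The following sketch uses only standard PHP functions and made-up sample markup. It shows `htmlspecialchars()` escaping both the tags and an already valid entity, and `loadXML()` rejecting markup that an HTML parser would accept.

```php
<?php
$html = '<p>A &amp; B<br></p>';

// htmlspecialchars() escapes the markup itself and double-encodes "&amp;".
$escaped = htmlspecialchars($html);
echo $escaped, "\n"; // &lt;p&gt;A &amp;amp; B&lt;br&gt;&lt;/p&gt;

// loadXML() requires well-formed XML, so the unclosed <br> is a parse error.
$previous = libxml_use_internal_errors(true); // keep parse errors out of the output
$dom = new DOMDocument();
$loaded = $dom->loadXML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

var_dump($loaded); // bool(false)
```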
