Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 42 in /var/www/app/src/Parser/HtmlSanitizer.php:18
ID: php/domdocument-load-html-entity-warning
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| php:8.1.0 | active | — | — | — |
| php:8.2.0 | active | — | — | — |
| php:8.3.0 | active | — | — | — |
Root Cause
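The HTML string passed to DOMDocument::loadHTML() contains a malformed HTML entity, typically an entity reference that is missing its terminating semicolon (e.g., &nbsp instead of &nbsp;). The HTML parser emits a warning for each occurrence, and the recovery it applies may result in incomplete parsing.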
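To see which entities the parser objects to, the libxml errors can be collected rather than emitted as PHP warnings. The following is a minimal sketch: the input string is a hypothetical example, and the exact messages (or whether any are reported at all) depend on the libxml2 version PHP is built against.

```php
<?php
// Hypothetical input: "&nbsp" lacks its terminating semicolon.
$html = '<p>Price:&nbsp 10 EUR</p>';

// Buffer libxml errors in memory instead of raising PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Copy each reported problem into a readable list.
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}

// Empty the buffer and restore the previous error-handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

print_r($messages);
```

Logging $messages instead of discarding them keeps the malformed input visible while the page still renders without warnings.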
Official Documentation
https://www.php.net/manual/en/domdocument.loadhtml.php
Workarounds
-
85% success
Pre-process the HTML with a regex that escapes every ampersand not followed by a semicolon-terminated entity, then load the result: $html = preg_replace('/&(?![a-zA-Z0-9#]+;)/', '&amp;', $html); $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
-
80% success
Use the LIBXML_NOERROR flag to suppress the warning while still parsing the document: $dom->loadHTML($html, LIBXML_NOERROR); be aware that this may also hide other parsing issues.
-
90% success
Use a more forgiving HTML parser such as html5-php (masterminds/html5-php), which handles malformed entities gracefully: $html5 = new Masterminds\HTML5(); $dom = $html5->loadHTML($html);
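The regex workaround can be sketched as a small helper. The function name escapeStrayAmpersands() and the sample input are hypothetical; the pattern is the one quoted in the workaround, with &amp; as the replacement. Parser errors are buffered here only so the sketch runs quietly.

```php
<?php
// Hypothetical helper name; the pattern comes from the regex workaround.
function escapeStrayAmpersands(string $html): string
{
    // Replace "&" unless it starts something shaped like "&name;" or "&#123;".
    $fixed = preg_replace('/&(?![a-zA-Z0-9#]+;)/', '&amp;', $html);

    // preg_replace() returns null on a PCRE error; fall back to the input.
    return $fixed ?? $html;
}

// Hypothetical input: one bare ampersand, one unterminated entity, one valid entity.
$html = '<p>Fish & Chips &copy 2024, Tom &amp; Jerry</p>';
$clean = escapeStrayAmpersands($html);

// Buffer any remaining parser errors instead of raising PHP warnings.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($clean, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// With LIBXML_HTML_NOIMPLIED the <p> element is the document root.
$text = $dom->documentElement->textContent;
```

Note the trade-off: the unterminated &copy ends up as the literal text "&copy" rather than the copyright sign, because the regex escapes the ampersand instead of guessing the intended entity.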
Dead Ends
Common approaches that don't work:
-
90% fail
Suppressing the warning with @ (e.g., @$dom->loadHTML($html)) hides the error but does not fix the malformed entity, which can lead to corrupted DOM trees and unexpected behavior when traversing or querying the document.
-
80% fail
Using htmlspecialchars() on the entire HTML input encodes all ampersands, including those that are part of valid entities (e.g., &amp; becomes &amp;amp;), and also escapes the angle brackets of every tag, so the markup is parsed as plain text instead of elements.
-
90% fail
Switching to loadXML() instead of loadHTML() fails because HTML5 documents with unclosed tags or other non-well-formed structures are not valid XML: the parser reports errors and loadXML() returns false instead of building a document.
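The htmlspecialchars() dead end is easy to confirm on a small input. This is a minimal sketch with a hypothetical sample string, relying on the default flags of PHP 8.1 and later.

```php
<?php
// Hypothetical input containing one element and one valid entity.
$html = '<b>Fish &amp; Chips</b>';

// Escapes "&", "<" and ">" everywhere, including inside valid entities.
$escaped = htmlspecialchars($html);

// Buffer parser errors so the sketch runs without warnings.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($escaped);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The original <b> element no longer exists: it was turned into text.
$count = $dom->getElementsByTagName('b')->length;

echo $escaped, PHP_EOL;
echo $count, PHP_EOL;
```

The escaped string no longer contains any tags, so queries against the resulting document find none of the original elements.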