Discussion:
[jruby-dev] [jira] (JRUBY-7195) REXML gives 1 character strings BINARY text encoding
Ben Summers (JIRA)
2013-08-21 13:02:52 UTC
<style>
/* Changing the layout to use less space for mobiles */
@media screen and (max-device-width: 480px), screen and (-webkit-min-device-pixel-ratio: 2) {
#email-body { min-width: 30em !important; }
#email-page { padding: 8px !important; }
#email-banner { padding: 8px 8px 0 8px !important; }
#email-avatar { margin: 1px 8px 8px 0 !important; padding: 0 !important; }
#email-fields { padding: 0 8px 8px 8px !important; }
#email-gutter { width: 0 !important; }
}
</style>
<div id="email-body">
<table id="email-wrap" align="center" border="0" cellpadding="0" cellspacing="0" style="background-color:#f0f0f0;color:#000000;width:100%;">
<tr valign="top">
<td id="email-page" style="padding:16px !important;">
<table align="center" border="0" cellpadding="0" cellspacing="0" style="background-color:#ffffff;border:1px solid #bbbbbb;color:#000000;width:100%;">
<tr valign="top">
<td bgcolor="#ffffff" style="background-color:#ffffff;color:#00AA00;font-family:Arial,FreeSans,Helvetica,sans-serif;font-size:12px;line-height:1;"></td>
</tr><tr valign="top">
<td id="email-banner" style="padding:32px 32px 0 32px;">




<table align="left" border="0" cellpadding="0" cellspacing="0" width="100%" style="width:100%;">
<tr valign="top">
<td style="color:#505050;font-family:Arial,FreeSans,Helvetica,sans-serif;padding:0;">
<img id="email-avatar" src="https://jira.codehaus.org/secure/useravatar?avatarId=10232" alt="" height="48" width="48" border="0" align="left" style="padding:0;margin: 0 16px 16px 0;" />
<div id="email-action" style="padding: 0 0 8px 0;font-size:12px;line-height:18px;">
<a class="user-hover" rel="bensummers" id="email_bensummers" href="https://jira.codehaus.org/secure/ViewProfile.jspa?name=bensummers" style="color:#005500;">Ben Summers</a>
created <a style='color:#005500;text-decoration:none;' href='https://jira.codehaus.org/browse/JRUBY-7195'>JRUBY-7195</a>
</div>
<div id="email-summary" style="font-size:16px;line-height:20px;padding:2px 0 16px 0;">
<a style='color:#005500;text-decoration:none;' href='https://jira.codehaus.org/browse/JRUBY-7195'><strong>REXML gives 1 character strings BINARY text encoding</strong></a>
</div>
</td>
</tr>
</table>
</td>
</tr>
<tr valign="top">
<td id="email-fields" style="padding:0 32px 32px 32px;">
<table border="0" cellpadding="0" cellspacing="0" style="padding:0;text-align:left;width:100%;" width="100%">
<tr valign="top">
<td id="email-gutter" style="width:64px;white-space:nowrap;"></td>
<td>
<table border="0" cellpadding="0" cellspacing="0" width="100%">
<tr valign="top">
<td style="color:#000000;font-family:Arial,FreeSans,Helvetica,sans-serif;font-size:12px;padding:0 10px 10px 0;white-space:nowrap;">
<strong style="font-weight:normal;color:#505050;">Issue Type:</strong>
</td>
<td style="color:#000000;font-family:Arial,FreeSans,Helvetica,sans-serif;font-size:12px;padding:0 0 10px 0;width:100%;">
<img src="https://jira.codehaus.org/images/icons/issuetypes/bug.png" height="16" width="16" border="0" align="absmiddle" alt="Bug"> Bug
</td>
</tr> <tr valign="top">
<td style="color:#000000;font-family:Arial,FreeSans,Helvetica,sans-serif;font-size:12px;padding:0 10px 10px 0;white-space:nowrap;">
<strong style="font-weight:normal;color:#505050;">Affects Versions:</strong>
</td>
<td style="color:#000000;font-family:Arial,FreeSans,Helvetica,sans-serif;font-size:12px;padding:0 0 10px 0;width:100%;">
JRuby 1.7.4 </td>
</tr>
<tr valign="top">
<td style="color:#000000;font-family:Arial,FreeSans,Helvetica,sans-serif;font-size:12px;padding:0 10px 10px 0;white-space:nowrap;">
<strong style="font-weight:normal;color:#505050;">Assignee:</strong>
</td>
<td style="color:#000000;font-family:Arial,FreeSans,Helvetica,sans-serif;font-size:12px;padding:0 0 10px 0;width:100%;">
Unassigned </td>
</tr> <tr valign="top">
<td style="color:#000000;font-family:Arial,FreeSans,Helvetica,sans-serif;font-size:12px;padding:0 10px 10px 0;white-space:nowrap;">
<strong style="font-weight:normal;color:#505050;">Components:</strong>
</td>
<td style="color:#000000;font-family:Arial,FreeSans,Helvetica,sans-serif;font-size:12px;padding:0 0 10px 0;width:100%;">
Encoding </td>
</tr>
<tr valign="top">
<td style="color:#000000;font-family:Arial,FreeSans,Helvetica,sans-serif;font-size:12px;padding:0 10px 10px 0;white-space:nowrap;">
<strong style="font-weight:normal;color:#505050;">Created:</strong>
</td>
<td style="color:#000000;font-family:Arial,FreeSans,Helvetica,sans-serif;font-size:12px;padding:0 0 10px 0;width:100%;">
21/Aug/13 8:01 AM
</td>
</tr> <tr valign="top">
<td style="color:#000000;font-family:Arial,FreeSans,Helvetica,sans-serif;font-size:12px;padding:0 10px 10px 0;white-space:nowrap;">
<strong style="font-weight:normal;color:#505050;">Description:</strong>
</td>
<td style="color:#000000;font-family:Arial,FreeSans,Helvetica,sans-serif;font-size:12px;padding:0 0 10px 0;width:100%;">
<p style='margin-top:0;margin-bottom:10px;'>The script below on JRuby outputs:</p>

<p style='margin-top:0;margin-bottom:10px;'>[&quot;a&quot;, &quot;ASCII-8BIT&quot;]<br/>
[&quot;aa&quot;, &quot;US-ASCII&quot;]<br/>
[&quot;☃&quot;, &quot;UTF-8&quot;]</p>

<p style='margin-top:0;margin-bottom:10px;'>and on MRI 1.9.3p392:</p>

<p style='margin-top:0;margin-bottom:10px;'>[&quot;a&quot;, &quot;UTF-8&quot;]<br/>
[&quot;aa&quot;, &quot;UTF-8&quot;]<br/>
[&quot;☃&quot;, &quot;UTF-8&quot;]</p>

<p style='margin-top:0;margin-bottom:10px;'>A mix of US-ASCII and UTF-8 would be acceptable, but on JRuby one-character strings come back tagged as ASCII-8BIT (BINARY). This is unexpected, and it breaks code that quite reasonably expects either UTF-8 or 7-bit-clean text.</p>

<p style='margin-top:0;margin-bottom:10px;'>--------------------------------</p>

<p style='margin-top:0;margin-bottom:10px;'># coding: utf-8</p>

<p style='margin-top:0;margin-bottom:10px;'>require 'rexml/document'</p>

<p style='margin-top:0;margin-bottom:10px;'>[&quot;a&quot;, &quot;aa&quot;, &quot;☃&quot;].each do |string|<br/>
doc = REXML::Document.new(%Q!&lt;?xml version="1.0" encoding="UTF-8"?&gt;&lt;string&gt;#{string}&lt;/string&gt;!)<br/>
decoded_string = doc.elements[&quot;string&quot;].text<br/>
p [decoded_string, decoded_string.encoding.name]<br/>
end</p>



</td>
</tr>
<tr valign="top">
<td style="color:#000000;font-family:Arial,FreeSans,Helvetica,sans-serif;font-size:12px;padding:0 10px 10px 0;white-space:nowrap;">
<strong style="font-weight:normal;color:#505050;">Project:</strong>
</td>
<td style="color:#000000;font-family:Arial,FreeSans,Helvetica,sans-serif;font-size:12px;padding:0 0 10px 0;width:100%;">
<a style="color:#005500;" href="https://jira.codehaus.org/browse/JRUBY">JRuby</a>
</td>
</tr> <tr valign="top">
<td style="color:#000000;font-family:Arial,FreeSans,Helvetica,sans-serif;font-size:12px;padding:0 10px 10px 0;white-space:nowrap;">
<strong style="font-weight:normal;color:#505050;">Priority:</strong>
</td>
<td style="color:#000000;font-family:Arial,FreeSans,Helvetica,sans-serif;font-size:12px;padding:0 0 10px 0;width:100%;">
Major
</td>
</tr>
<tr valign="top">
<td style="color:#000000;font-family:Arial,FreeSans,Helvetica,sans-serif;font-size:12px;padding:0 10px 10px 0;white-space:nowrap;">
<strong style="font-weight:normal;color:#505050;">Reporter:</strong>
</td>
<td style="color:#000000;font-family:Arial,FreeSans,Helvetica,sans-serif;font-size:12px;padding:0 0 10px 0;width:100%;">
<a class="user-hover" rel="bensummers" id="email_bensummers" href="https://jira.codehaus.org/secure/ViewProfile.jspa?name=bensummers" style="color:#005500;">Ben Summers</a>
</td>
</tr>


</table>
</td>
</tr>
</table>
</td>
</tr>













</table>
</td><!-- End #email-page -->
</tr>
<tr valign="top">
<td style="color:#505050;font-family:Arial,FreeSans,Helvetica,sans-serif;font-size:10px;line-height:14px;padding: 0 16px 16px 16px;text-align:center;">
This message is automatically generated by JIRA.<br />
If you think it was sent incorrectly, please contact your JIRA administrators<br />
For more information on JIRA, see: <a style='color:#005500;' href='http://www.atlassian.com/software/jira'>http://www.atlassian.com/software/jira</a>
</td>
</tr>
</table><!-- End #email-wrap -->
</div><!-- End #email-body -->
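
A possible stopgap for code affected by the behaviour described above, sketched under stated assumptions: the helper name normalize_text_encoding is hypothetical and is not part of REXML or JRuby. It re-tags a string as UTF-8 only when the string is currently tagged ASCII-8BIT and its bytes are valid UTF-8, and leaves everything else untouched.

```ruby
# coding: utf-8

# Hypothetical helper (not part of REXML or JRuby): re-tag ASCII-8BIT text
# as UTF-8 when, and only when, its bytes form valid UTF-8.
def normalize_text_encoding(text)
  return text if text.nil?
  return text unless text.encoding == Encoding::ASCII_8BIT

  # Work on a copy so the caller's string is never mutated.
  candidate = text.dup.force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? candidate : text
end

# Simulate the one-character result reported for JRuby 1.7.4.
binary_a = "a".dup.force_encoding(Encoding::ASCII_8BIT)
fixed = normalize_text_encoding(binary_a)
p [fixed, fixed.encoding.name] # prints ["a", "UTF-8"]
```

Wrapping the .text call from the script in this helper should then report UTF-8 for the one-character case. It only masks the symptom in calling code; it does not address whatever causes the ASCII-8BIT tag in the first place.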

