Issue with characters encoded in cp1252 (windows-1252)




'value': '<html>


Code Block: issue
### dpage_content : {'title': 'IT', 'type': 'page', 'body': {'storage': {'value': '<html>\n\t\n<head>\n\t<title>2- Add server on XenCenter</title>\n\t<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n    <meta name="generator" content="H\


Using Notepad++ with the HEX-Editor plugin
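
The same check can be done without a hex editor. A minimal sketch, assuming Python 3; the helper name `starts_with_utf8_bom` is hypothetical and only reads the first three bytes of the file:

```python
import codecs


def starts_with_utf8_bom(path):
    """Return True if the file begins with the UTF-8 BOM bytes EF BB BF."""
    with open(path, "rb") as fh:  # binary mode: compare raw bytes, no decoding
        return fh.read(3) == codecs.BOM_UTF8
```

If it returns True, the three bytes in front of `<html>` are the BOM and not part of the page content.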


Python + bs4 (BeautifulSoup) module to remove it


Code Block: remove
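    # Clean up the beginning of the file: strip the UTF-8 BOM (EF BB BF) in front of the first '<'.
    # The pattern below is the BOM as it appears when its three bytes are read as single characters.
    with open(cleaned_html, 'w') as cleaned_file2:
        bom_prefix = u'\xef\xbb\xbf\x3c'
        cleaned_file2.write(str(soup).replace(bom_prefix, '<'))
    # No explicit close() is needed: the with block closes the file.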
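
An alternative to replacing the characters afterwards is to let Python drop the BOM while reading. A minimal sketch, assuming Python 3 and a UTF-8 input file; `read_without_bom` is a hypothetical helper, not part of the original script:

```python
def read_without_bom(path):
    """Read a UTF-8 text file, skipping a leading BOM if one is present."""
    # The 'utf-8-sig' codec decodes UTF-8 and silently drops an initial BOM;
    # files without a BOM are read unchanged.
    with open(path, "r", encoding="utf-8-sig") as fh:
        return fh.read()
```

The returned string can then be passed to BeautifulSoup, so the BOM never reaches `str(soup)`.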




0xE2 0x80 0x9D, or â€ when read as cp1252 (right double quote: ”, RIGHT DOUBLE QUOTATION MARK)

ChassisInfoFetcher Using Vagrant



0xE2 0x80 0x9C, or “ (left double quote, LEFT DOUBLE QUOTATION MARK)
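
To see why these byte sequences turn into odd characters, the same bytes can be decoded both ways. A minimal sketch, assuming Python 3; `show_decodings` is a hypothetical helper name:

```python
def show_decodings(raw):
    """Decode the same bytes as UTF-8 and as cp1252 and return both results."""
    as_utf8 = raw.decode("utf-8")
    # errors="replace": byte 0x9D has no character in cp1252, so it would
    # otherwise raise UnicodeDecodeError; it becomes U+FFFD instead.
    as_cp1252 = raw.decode("cp1252", errors="replace")
    return as_utf8, as_cp1252


for raw in (b"\xe2\x80\x9c", b"\xe2\x80\x9d"):
    utf8_text, cp1252_text = show_decodings(raw)
    print(raw.hex(), repr(utf8_text), repr(cp1252_text))
```

Read as UTF-8, each three-byte sequence is a single quotation mark; read as cp1252, each byte becomes its own character, which is where the leading â€ comes from.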