file-format

Text file with 0D 0D 0A line breaks

♀尐吖头ヾ posted on 2019-11-29 02:53:15
A customer is sending me a .csv file where the line breaks are made up of the sequence 0xD 0xD 0xA. As far as I know, line breaks are either 0xA from Mac or Unix or 0xD 0xA from Windows. Is 0xD 0xD 0xA any known encoding? Is there any known sequence of saves that corrupts a file's line endings and causes this (I think the customer uses a Mac)? The file doesn't start with any encoding markers; it starts with the text contents directly. The text is displayed correctly if opened with code page 1252. The CRCRLF sequence is known as the result of a Windows XP Notepad word-wrap bug. For future reference,
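If the file only needs to be made readable, the stray bytes can be collapsed before parsing. The following is a minimal Python sketch, not part of the original answer; the function name `normalize_line_endings` and the sample bytes are made up for illustration, and it assumes no field legitimately contains a bare carriage return.

```python
def normalize_line_endings(data: bytes) -> bytes:
    """Collapse CR CR LF, CR LF and lone CR into a single LF."""
    # Replace the longest sequence first so CR CR LF does not
    # turn into two separate line breaks.
    data = data.replace(b"\r\r\n", b"\n")
    data = data.replace(b"\r\n", b"\n")
    return data.replace(b"\r", b"\n")


# Hypothetical sample resembling the customer's file.
raw = b"id,name\r\r\n1,Alice\r\r\n2,Bob\r\r\n"
text = normalize_line_endings(raw).decode("cp1252")
print(text.splitlines())  # ['id,name', '1,Alice', '2,Bob']
```

Working on bytes and decoding afterwards keeps the step independent of the code page, which the question reports as 1252.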

Need to export fields containing linebreaks as a CSV from SQL Server

耗尽温柔 posted on 2019-11-28 23:49:27
Question: I'm running a query from SQL Server Management Studio 2005 that returns an HTML file stored as a string, e.g. SELECT html FROM table. This query displays nicely in the "Results" window, i.e., each row contains a string of the whole HTML file, with one record per row. However, when I use the "Results to file" option, it exports an unusable CSV with line breaks occurring wherever line breaks occurred in the field (i.e., in the HTML), rather than one per row as needed. I tried
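For background on why the export breaks: a CSV file can carry line breaks inside a field as long as the field is quoted, and a CSV-aware reader then still sees one record per row. The sketch below uses Python's standard `csv` module rather than Management Studio, so it only illustrates the quoting idea; the helper names and sample rows are hypothetical.

```python
import csv
import io


def write_csv(rows):
    """Serialize rows to CSV text, quoting every field."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
    writer.writerows(rows)
    return buffer.getvalue()


def read_csv(text):
    """Parse CSV text back into a list of rows."""
    return list(csv.reader(io.StringIO(text, newline="")))


# Hypothetical rows: a header plus two HTML documents with line breaks.
rows = [
    ["html"],
    ["<html>\n<body>first</body>\n</html>"],
    ["<html>\n<body>second</body>\n</html>"],
]
text = write_csv(rows)
# Three records come back even though the text spans more physical lines.
print(len(read_csv(text)))
```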

What are important points when designing a (binary) file format? [closed]

匆匆过客 posted on 2019-11-28 15:20:20
Question: When designing a file format for recording binary data, what attributes do you think the format should have? So far, I've come up with the following important points: have some "magic bytes" at the beginning, to be able to recognize the files (in my specific case, this should also help to distinguish the files from "legacy" files); have a file version number at the beginning, so that the file format can be changed later without breaking compatibility; specify the endianness and size of all
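The first points in that list (magic bytes, a version number, a fixed byte order and field sizes) can be sketched with Python's `struct` module. The signature `DEMO`, the major/minor split of the version, and the function names are assumptions made for this example only, not something the question specifies.

```python
import struct

MAGIC = b"DEMO"  # hypothetical 4-byte signature
# "<" fixes little-endian byte order and disables padding:
# 4-byte magic, 2-byte major version, 2-byte minor version.
HEADER = struct.Struct("<4sHH")


def pack_header(major: int, minor: int) -> bytes:
    """Build the fixed-size header that starts every file."""
    return HEADER.pack(MAGIC, major, minor)


def unpack_header(blob: bytes):
    """Validate the magic bytes and return (major, minor)."""
    if len(blob) < HEADER.size:
        raise ValueError("file too short for header")
    magic, major, minor = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("unrecognized file signature")
    return major, minor


blob = pack_header(1, 2) + b"payload bytes"
print(unpack_header(blob))  # (1, 2)
```

Checking the signature before anything else is what lets a reader reject "legacy" or unrelated files early.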

Convert a shapefile (.shp) to xml/json

 ̄綄美尐妖づ posted on 2019-11-28 13:27:28
Question: I'm working with a shapefile (.shp, .dbf, etc.) and would like to convert it to XML. I'm on a Mac, and I'm having trouble finding an application that will help me with the conversion. Does anyone know of a method for converting this file format into an XML file? Answer 1: What dassouki said. Get GDAL from http://www.kyngchaos.com/software:frameworks. Use it to convert a shapefile to GeoJSON like this: $ ogr2ogr -f "GeoJSON" output.json input.shp e.g. $ ogr2ogr -f "GeoJSON" /tmp/world.json world

When are files “splittable”?

不羁岁月 posted on 2019-11-28 11:29:25
When I'm using Spark, I sometimes run into one huge file in a Hive table, and sometimes I am trying to process many smaller files in a Hive table. I understand that when tuning Spark jobs, how it works depends on whether or not the files are splittable. In this page from Cloudera, it says that we should be aware of whether or not the files are splittable: ...For example, if your data arrives in a few large unsplittable files... How do I know if my file is splittable? How do I know the number of partitions to use if the file is splittable? Is it better to err on the side of more partitions if

Is the 2nd and 3rd byte of a JPEG image always the APP0 or APP1 marker?

有些话、适合烂在心里 posted on 2019-11-28 11:26:00
I have a few different JPEG images I've been testing with. As far as I've seen, the 0th and 1st bytes are always 0xFF and 0xD8. The 2nd and 3rd are usually either 0xFF and 0xE0 (APP0), indicating either a JFIF segment or JFIF extension segment, or 0xFF and 0xE1 (APP1), indicating an EXIF segment. My question: is this always the case? Are the 2nd and 3rd bytes always APP0 or APP1? No. There are, e.g., several cameras that create JPEGs without these markers, or with other APP markers. The only thing you can rely on is the SOI sequence, FF D8; not even EOI is produced by all cameras. Also
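Following the answer's advice, a detector should rely only on the SOI bytes and treat whatever marker comes next as informational. This is a small Python sketch under that assumption; the function name and the returned strings are invented for the example.

```python
def describe_jpeg_start(data: bytes) -> str:
    """Check the SOI sequence and report the marker that follows it."""
    # Only FF D8 (SOI) is treated as mandatory.
    if data[:2] != b"\xff\xd8":
        return "not a JPEG (missing SOI)"
    if len(data) < 4 or data[2] != 0xFF:
        return "JPEG, no marker after SOI"
    marker = data[3]
    names = {0xE0: "APP0", 0xE1: "APP1"}
    # Any other marker is reported by its hex code instead of being rejected.
    return "JPEG, first marker " + names.get(marker, f"0x{marker:02X}")


print(describe_jpeg_start(b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"))
print(describe_jpeg_start(b"\xff\xd8\xff\xdb\x00\x43"))
```

The second call uses a made-up prefix whose first marker is neither APP0 nor APP1, which the function still accepts as a JPEG.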

What is the BMP format for Gray scale Images?

跟風遠走 posted on 2019-11-28 10:57:48
What is the BMP format for grayscale images (especially for 16 bits per pixel)? Wikipedia only talks about colour images for BMP. Update: For the information of future visitors, I am going with PGM, as it is uncompressed and can support 16-bit grayscale. Another option was to use PNG, but it compresses the data (which is not what I want), as discussed here. Also note that the image may appear distorted, since most monitors support 256 gray levels and not the 65,536 that 16 bits can encode, so the image will be saturated. It was surprising, though, to learn that BMP is almost helpless in
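For reference, a binary PGM ("P5") file with a maximum value above 255 stores each sample as two bytes, most significant byte first. The sketch below builds such a file in memory; the function name `encode_pgm16` and the tiny 2x1 image are hypothetical.

```python
import struct


def encode_pgm16(width: int, height: int, pixels) -> bytes:
    """Encode 16-bit grayscale samples as a binary PGM (P5) image."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match dimensions")
    # Text header: magic number, dimensions, maximum sample value.
    header = f"P5\n{width} {height}\n65535\n".encode("ascii")
    # ">" selects big-endian, "H" is an unsigned 16-bit sample.
    body = struct.pack(f">{len(pixels)}H", *pixels)
    return header + body


# Hypothetical 2x1 image: one black and one white pixel.
image = encode_pgm16(2, 1, [0, 65535])
print(len(image))
```

Writing `image` to a file opened in binary mode gives an uncompressed file that common image tools can usually open.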

What is the format of a patch file?

ぃ、小莉子 posted on 2019-11-28 06:40:42
What does the following mean ? diff -rBNu src.orig/java/org/apache/nutch/analysis/NutchAnalysisConstants.java src/java/org/apache/nutch/analysis/NutchAnalysisConstants.java --- src.orig/java/org/apache/nutch/analysis/NutchAnalysisConstants.java 2009-03-10 11:34:01.000000000 -0700 +++ src/java/org/apache/nutch/analysis/NutchAnalysisConstants.java 2009-03-10 14:11:55.000000000 -0700 @@ -4,9 +4,12 @@ + int CJK = 21; + int DIGIT = 22; int DEFAULT = 0; String[] tokenImage = { "<EOF>", + "\"OR\"", "<WORD>", "<ACRONYM>", "<SIGRAM>", @@ -39,6 +42,8 @@ "\"\\\"\"", "\":\"", "\"/\"", + "\"(\"", + "\")\""
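The `@@ -4,9 +4,12 @@` lines in that output are hunk headers of the unified diff format: the pair after `-` is the starting line and line count in the old file, and the pair after `+` is the same for the new file. A short Python sketch that decodes them (the function name is made up for illustration):

```python
import re

# "@@ -old_start[,old_count] +new_start[,new_count] @@"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str):
    """Return (old_start, old_count, new_start, new_count)."""
    match = HUNK_RE.match(line)
    if match is None:
        raise ValueError("not a unified diff hunk header")
    old_start, old_count, new_start, new_count = match.groups()
    # An omitted count means the hunk covers exactly one line.
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


print(parse_hunk_header("@@ -4,9 +4,12 @@"))   # (4, 9, 4, 12)
print(parse_hunk_header("@@ -39,6 +42,8 @@"))  # (39, 6, 42, 8)
```

Read this way, the first hunk replaces 9 lines starting at line 4 of the old file with 12 lines starting at line 4 of the new one; lines prefixed with `+` are the additions.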

“Best” Input File Formats for C++? [closed]

南楼画角 posted on 2019-11-28 05:27:31
I am starting work on a new piece of software that will end up needing some robust and expandable file I/O. There are a lot of formats out there: XML, JSON, INI, etc. However, there are always pluses and minuses, so I thought I would ask for some community input. Here are some rough requirements: The format is a "standard"... I don't want to reinvent the wheel if I don't have to. It doesn't have to be a formal IEEE standard, but something you could Google and get some information on as a new user, and that may have some support tools (editors) beyond vi. (Though the software users will generally be

Reading PSV (pipe-separated) file or string

北慕城南 posted on 2019-11-28 05:19:47
Question: I have just received a data file whose extension is "*.psv". After doing a bit of research, I still don't know how to open it in R. Answer 1: Thanks to hrbrmstr, I found the answer. We can use read.csv to read a *.psv file: read.csv("myfile.psv", sep = "|", header = FALSE, stringsAsFactors = FALSE) There might be many different interpretations of a psv file, but when it comes to data mining, it most likely means a "pipe-separated" file: the data in the file is separated by "|". Source: https:/
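The same idea carries over outside R. As a rough equivalent of the read.csv call above, written in Python instead, the standard `csv` module accepts the pipe as a delimiter; the helper name `read_psv` and the sample data are assumptions for this sketch.

```python
import csv
import io


def read_psv(text: str):
    """Parse pipe-separated text into a list of rows (no header row assumed)."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter="|"))


# Hypothetical file contents.
sample = "1|Alice|NY\n2|Bob|LA\n"
print(read_psv(sample))  # [['1', 'Alice', 'NY'], ['2', 'Bob', 'LA']]
```

For a file on disk, passing a file object opened with `newline=""` to `csv.reader` with the same `delimiter="|"` works the same way. All values come back as strings, similar in spirit to `stringsAsFactors = FALSE`.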