Question
I am trying to read a file's bytes using AppleScript or JXA (I don't know which one is better yet). I have already tried this code:
set theFile to (choose file with prompt "Select a file to read:")
open for access theFile
set fileContents to (read theFile)
close access theFile
However that code will read the file as a string and store it in fileContents. I need this to be a byte array.
Answer 1:
I have experimented a little and devised a number of methods with which a file's contents might be read into a list or array of bytes. In each case, the filepath should be a POSIX path to the file being read.
Any snippets using AppleScriptObjC will need appropriate headers inserted at the top of the script, and I have included them at the end, along with the extra block that will be used with JXA scripts.
1. read the file and obtain the ASCII number of each character
The file is read "as is", and each character of the string is converted into an ASCII code value:
to readBytes from filepath as text
local filepath
script bytes
property list : characters of (read the filepath)
end script
repeat with char in (a reference to the list of bytes)
set char's contents to ASCII number char
end repeat
return the list of bytes
end readBytes
Here's a similar implementation using AppleScriptObjC:
to readBytes from filepath as text
local filepath
set bytes to NSMutableArray's new()
set hexdump to (NSString's stringWithContentsOfFile:((NSString's ¬
stringWithString:filepath)'s stringByStandardizingPath()) ¬
encoding:NSASCIIStringEncoding |error|:nil)
repeat with i from 0 to (hexdump's |length|()) - 1
(bytes's addObject:(hexdump's characterAtIndex:i))
end repeat
return the bytes as list
end readBytes
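The character-to-code mapping that both handlers above perform can be sketched in plain JavaScript. This is only an illustration of the idea, not a file reader: `charCodes` is a hypothetical helper name, and it assumes the text contains only single-byte characters. Reading a file as text depends on how the text is decoded, so bytes above 127 may not survive this kind of round trip.

```javascript
// Hypothetical helper: map each character of a string to its code value.
// Only values in the single-byte range (0-255) are meaningful as bytes.
function charCodes(text) {
  return Array.from(text, ch => ch.charCodeAt(0));
}

console.log(charCodes("ABC")); // [ 65, 66, 67 ]
```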
2. read the file into a list of short (2-byte) integers and then extract the high- and low-byte values from each
This is the fastest method, and again uses the Standard Additions read command, this time mapping the contents directly into a list of short integers. If the number of bytes is odd, the first byte is read singly, while the remaining bytes are read as 2-byte pairs, each of which is split into two 1-byte values. All values are returned as a list:
to readBytes from filepath as text
local filepath
script bytes
property length : get eof of filepath
property index : length mod 2 + 1
property shortInts : read filepath as short ¬
from index for length - index + 1
property list : {}
end script
if bytes's index = 2 then set the end of the list of bytes ¬
to ASCII number of (read filepath for 1)
repeat with shortInt in bytes's shortInts
set abs to (shortInt + 65536) mod 65536
set the end of the list of bytes to abs div 256
set the end of the list of bytes to abs mod 256
end repeat
return the list of bytes
end readBytes
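The high-byte/low-byte arithmetic in the repeat loop above can be checked in isolation. The following is a minimal sketch in plain JavaScript; `splitShort` and the sample values are hypothetical, chosen only to show how negative (signed) shorts are mapped back into the 0 to 65535 range before being split.

```javascript
// Split a signed 16-bit integer into its high and low bytes,
// mirroring the arithmetic used in the handler above.
function splitShort(shortInt) {
  const abs = (shortInt + 65536) % 65536; // maps -32768..-1 onto 32768..65535
  return [Math.floor(abs / 256), abs % 256];
}

// Hypothetical input: shorts as they might come back from `read ... as short`.
const sampleShorts = [0x4142, -1, -256];
console.log(sampleShorts.flatMap(splitShort)); // [ 65, 66, 255, 255, 255, 0 ]
```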
3. read the file into a data class object and convert the hexadecimal byte values to their decimal representation
The use of read here pulls a raw data encapsulated object that, strictly speaking, we can't do a lot with, as it isn't a type class that coerces to any other. However, the additional handler __string__() is a quick and dirty method of getting the hexadecimal byte values, which are then converted to decimal form and returned:
to __string__(object)
if the object's class = text then return the object
set tids to my text item delimiters
try
set s to {_:object} as null
on error e
set my text item delimiters to "Can’t make {_:"
set s to text items 2 thru -1 of e as text
set my text item delimiters to "} into type null."
set s to text items 1 thru -2 of s as text
set my text item delimiters to tids
end try
s
end __string__
to readBytes from filepath as text
local filepath
script bytes
property data : read filepath as data
property list : {}
end script
script hexdump
property chars : "0123456789ABCDEF"
property string : text 11 thru -2 of __string__(bytes's data)
property hibyte : a reference to text 2 of my string
property lobyte : a reference to text 1 of my string
to decimal()
set i to (offset of hibyte in chars) - 1
set j to (offset of lobyte in chars) - 1
i + j * 16
end decimal
end script
repeat ((hexdump's string's length) / 2 - 1) times
set the end of the list of bytes to hexdump's decimal()
set hexdump's string to hexdump's string's text 3 thru -1
end repeat
return the list of bytes
end readBytes
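The lookup-based conversion performed by decimal() can be sketched in plain JavaScript as follows. The names `HEX_CHARS`, `decimal` and `hexToBytes` are illustrative only. Here `hi` refers to the first (most significant) digit of each pair, which is multiplied by 16, and `lo` to the second.

```javascript
const HEX_CHARS = "0123456789ABCDEF";

// Convert one two-character hex pair (e.g. "4A") to its decimal value.
function decimal(pair) {
  const hi = HEX_CHARS.indexOf(pair[0].toUpperCase()); // first digit: sixteens
  const lo = HEX_CHARS.indexOf(pair[1].toUpperCase()); // second digit: units
  return hi * 16 + lo;
}

// Walk a hex string two characters at a time.
function hexToBytes(hex) {
  const out = [];
  for (let i = 0; i + 1 < hex.length; i += 2) {
    out.push(decimal(hex.slice(i, i + 2)));
  }
  return out;
}

console.log(hexToBytes("414AFF00")); // [ 65, 74, 255, 0 ]
```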
4. Use AppleScriptObjC to transform an ASCII string into Unicode hex values, then convert to decimal using NSScanner
I included it as an alternative way to convert hexadecimal byte strings to decimal integer values using NSScanner, but it's actually slower than my vanilla AppleScript handler decimal(), so this method is more for general interest:
to readBytes from filepath as text
local filepath
set hexdump to ((NSString's stringWithContentsOfFile:((NSString's ¬
stringWithString:filepath)'s stringByStandardizingPath()) ¬
encoding:NSASCIIStringEncoding |error|:nil)'s ¬
stringByApplyingTransform:"Any-Hex" |reverse|:no)'s ¬
componentsSeparatedByString:"\\u00"
hexdump's removeFirstObject()
set hexbytes to hexdump's objectEnumerator()
script bytes
property list : {}
end script
repeat
set hexbyte to the nextObject() of the hexbytes
if hexbyte = missing value then exit repeat
set scanner to NSScanner's scannerWithString:hexbyte
set [bool, s] to scanner's scanHexInt:_1
set the end of the list of the bytes to s as integer
end repeat
return the list of bytes
end readBytes
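The split-and-parse step of this handler can be illustrated without Foundation. The sketch below is plain JavaScript and assumes the transform yields `\uXXXX` escapes such as `\u0041`; the sample string is made up for the example, and `parseAnyHex` is a hypothetical name. `parseInt(h, 16)` stands in for NSScanner here.

```javascript
// Parse a string of \u00XX escapes into byte values.
function parseAnyHex(transformed) {
  return transformed
    .split("\\u00")              // e.g. ["", "41", "42", "0A"]
    .slice(1)                    // drop the empty leading element
    .map(h => parseInt(h, 16));  // hex digits -> integer
}

// Assumed sample of the transformed text for "AB" followed by a newline.
console.log(parseAnyHex("\\u0041\\u0042\\u000A")); // [ 65, 66, 10 ]
```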
5. Use JSObjC (JXA-ObjectiveC) to read the raw data then...
Retrieve an array of C-pointers to the bytes values directly
One of the nice things about JXA is the access it has to other data types outwith AppleScriptObjC, which means we can manipulate C data types and access array buffers:
function readBytes(filepath) {
    const bytes = $.NSData.dataWithContentsOfFile(
        $.NSString.stringWithString(filepath)
            .stringByStandardizingPath);
    const bytesPtr = bytes.bytes;
    var bytesArr = [];
    const numBytes = Number(bytes.length);
    for (let i = 0; i < numBytes; i++) {
        bytesArr.push(bytesPtr[i]);
    }
    return bytesArr;
}
The disappointing thing in this particular case is that accessing the values in an array buffer has to be done iteratively in order to manually copy the values over into a JavaScript array object. This isn't slower than the other methods, but it's slower than I feel it would have been were this not the case. So it can be a little surprising when a more manual implementation that looks like it ought to be slower is, in fact, noticeably faster than using ready-made API methods/functions:
Access the hexadecimal string value and manually decimalise
The NSData class object has a description that contains the hexadecimal string representing the file's contents. It requires a small amount of clean-up, using regular expressions that trim unwanted characters and split the hex string into an array of paired hex bytes. Then JavaScript provides the map() function that saves iterating manually, allowing each hex byte pair to be sent through the JXA translated version of my decimal() handler from before:
function readBytes(filepath) {
    const bytes = $.NSData.dataWithContentsOfFile(
        $.NSString.stringWithString(filepath)
            .stringByStandardizingPath);
    var bytesArr = [];
    const bytesStr = bytes.description;
    bytesArr = ObjC.deepUnwrap(bytesStr
        .stringByReplacingOccurrencesOfStringWithStringOptionsRange(
            '(?i)\\<?([A-F0-9]{2})\\>?\\B',
            '$1 ',
            $.NSRegularExpressionSearch,
            $.NSMakeRange(0, bytesStr.length)
        ).componentsSeparatedByString(' ')
    ).map(hexbyte => {
        if (hexbyte.length != 2) return null;
        const hexchars = ["0", "1", "2", "3", "4", "5", "6", "7",
                          "8", "9", "a", "b", "c", "d", "e", "f"];
        const hex = hexbyte.split('');
        const hi = hexchars.indexOf(hex[1]),
              lo = hexchars.indexOf(hex[0]);
        return (lo * 16) + hi;
    });
    bytesArr.pop();
    return bytesArr;
}
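The clean-up and conversion of the description string can be tried out in plain JavaScript, without the Objective-C bridge. This sketch assumes the description has the form `<41424344 45464748>`; that format is not guaranteed and may differ between macOS versions, so it is worth inspecting the actual string on your system first. `parseDescription` is a hypothetical name, and `parseInt` replaces the manual digit lookup.

```javascript
// Assumes a description of the form "<41424344 45464748>".
function parseDescription(description) {
  const hex = description.replace(/[^0-9a-fA-F]/g, ""); // strip <, > and spaces
  const pairs = hex.match(/.{2}/g) || [];               // split into byte pairs
  return pairs.map(pair => parseInt(pair, 16));
}

console.log(parseDescription("<41424344 4a>")); // [ 65, 66, 67, 68, 74 ]
```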
Headers
If you want to test any of the AppleScriptObjC code for yourself, include these lines at the top of the script:
use framework "Foundation"
use scripting additions
property this : a reference to the current application
property nil : a reference to missing value
property _1 : a reference to reference
property NSArray : a reference to NSArray of this
property NSData : a reference to NSData of this
property NSMutableArray : a reference to NSMutableArray of this
property NSScanner : a reference to NSScanner of this
property NSString : a reference to NSString of this
property NSASCIIStringEncoding : a reference to 1
property NSRegularExpressionSearch : a reference to 1024
property NSUTF16StringEncoding : a reference to 10
This is an exhaustive list that covers all of the various AppleScriptObjC snippets above, so you can delete any properties that aren't used in a specific script if you want to.
The script that ended up being fastest in my testing (which wasn't by any means thorough or even quantified, but it stood out as returning an immediate result) was number (2), which is written in vanilla AppleScript. Therefore, this does not require the above headers, and it's advisable not to include them if they aren't necessary.
For the JSObjC scripts, you will want to insert this auto-run function below the readBytes function declaration:
(() => {
const filepath = '/Users/CK/Desktop/Pasted on 2019-07-28 at 07h08m.jpg';
return readBytes(filepath);
})();
Answer 2:
I knew I'd seen this somewhere before. There's an old post at MacScripter where people dive into this problem fairly deeply. It's well worth a read if you're inclined that way, but the simplest version seems to be this:
set theFile to choose file
set theBytes to getByteValues(theFile)
on getByteValues(thisFile) -- thisFile's an alias or a file specifier.
script o
property integerValues : {}
property byteValues : {}
on convertBytesToHex()
repeat with thisItem in byteValues
set s to ""
repeat until contents of thisItem = 0
tell (thisItem mod 16)
if it > 9 then
set s to character (it - 9) of "ABCDEF" & s
else
set s to (it as string) & s
end if
end tell
set contents of thisItem to thisItem div 16
end repeat
set contents of thisItem to s
end repeat
end convertBytesToHex
end script
set fRef to (open for access thisFile)
try
-- The file will be read as a set of 4-byte integers, but does it contain an exact multiple of 4 bytes?
set oddByteCount to (get eof fRef) mod 4
set thereAreOddBytes to (oddByteCount > 0)
-- If the number of bytes isn't a multiple of 4, treat the odd ones as being in the first four, then …
if (thereAreOddBytes) then set end of o's integerValues to (read fRef from 1 for 4 as unsigned integer)
-- … read integers from after the odd bytes (if any) to the end of the file.
set o's integerValues to o's integerValues & (read fRef from (oddByteCount + 1) as unsigned integer)
close access fRef
on error errMsg number errNum
close access fRef
error errMsg number errNum
end try
-- Extract the odd-byte values (if any) from the first integer.
if (thereAreOddBytes) then
set n to beginning of o's integerValues
repeat oddByteCount times
set end of o's byteValues to n div 16777216
set n to n mod 16777216 * 256
end repeat
end if
-- Extract the 4 byte values from each of the remaining integers.
repeat with i from 1 + ((thereAreOddBytes) as integer) to (count o's integerValues)
set n to item i of o's integerValues
set end of o's byteValues to n div 16777216
set end of o's byteValues to n mod 16777216 div 65536
set end of o's byteValues to n mod 65536 div 256
set end of o's byteValues to n mod 256 div 1
end repeat
o's convertBytesToHex()
return o's byteValues
end getByteValues
on convertNumberToHex(aNumber)
set s to ""
set n to get aNumber
repeat until n is 0
tell (n mod 16)
if it > 9 then
set s to character (it - 9) of "ABCDEF" & s
else
set s to (it as string) & s
end if
end tell
set n to n div 16
end repeat
set contents of aNumber to s
end convertNumberToHex
I've added a routine to convert the integer values to hex-value strings; not sure which form you prefer.
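The byte extraction and hex formatting used in the script above can be sketched in plain JavaScript to check the arithmetic. The function names and the sample integer are hypothetical. The two-digit padding in `byteToHex` is my own addition; the routine above builds the hex digits without padding.

```javascript
// Split an unsigned 32-bit integer into four bytes, most significant first.
function splitUInt32(n) {
  return [
    Math.floor(n / 16777216),
    Math.floor((n % 16777216) / 65536),
    Math.floor((n % 65536) / 256),
    n % 256,
  ];
}

// Format a byte as two uppercase hex digits, so that 0 becomes "00".
function byteToHex(b) {
  return b.toString(16).toUpperCase().padStart(2, "0");
}

const fourBytes = splitUInt32(0x89504e47); // hypothetical 4-byte integer read from a file
console.log(fourBytes);                    // [ 137, 80, 78, 71 ]
console.log(fourBytes.map(byteToHex));     // [ '89', '50', '4E', '47' ]
```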
Source: https://stackoverflow.com/questions/57235855/how-do-i-read-a-file-as-a-byte-array-in-applescript