I have a directory containing ORC files. I am creating a DataFrame using the code below:
var data = sqlContext.sql("SELECT * FROM orc.`/directory/containing/
If you also have a Parquet version of the table, you can copy its column names over, which is what I did. The date column was the partition key for the ORC table, so I had to move it to the end of the list:
import functools

tx = sqlContext.table("tx_parquet")
df = sqlContext.table("tx_orc")

# Take the column names from the Parquet table
tx_cols = tx.schema.names
tx_cols.remove('started_at_date')
tx_cols.append('started_at_date')  # move the partition column to the end

# Fix the column names for ORC by renaming each column by position
oldColumns = df.schema.names
newColumns = tx_cols
df = functools.reduce(
    lambda df, idx: df.withColumnRenamed(oldColumns[idx], newColumns[idx]),
    range(len(oldColumns)),
    df)
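
For what it's worth, the same positional rename can probably be written more compactly with `DataFrame.toDF(*cols)`, which replaces all column names in one call. The sketch below is not what I ran originally. The helper names `move_to_end` and `rename_positionally` are made up for illustration, and it assumes the ORC columns line up one-to-one, in order, with the Parquet columns once the partition column is moved to the end.

```python
def move_to_end(columns, name):
    """Return a copy of `columns` with `name` moved to the last position."""
    reordered = [c for c in columns if c != name]
    if len(reordered) == len(columns):
        raise ValueError("column %r not found" % name)
    reordered.append(name)
    return reordered


def rename_positionally(df, new_columns):
    """Rename every column of `df` by position (hypothetical helper).

    `df` is expected to be a Spark DataFrame; only its `columns`
    attribute and `toDF(*cols)` method are used here.
    """
    old_columns = df.columns
    if len(old_columns) != len(new_columns):
        # A positional rename only makes sense if the counts match.
        raise ValueError(
            "expected %d column names, got %d"
            % (len(old_columns), len(new_columns)))
    return df.toDF(*new_columns)
```

With the tables from above, usage would look something like `df = rename_positionally(df, move_to_end(tx.schema.names, 'started_at_date'))`. The length check is there because a positional rename silently mislabels data if the two tables do not have the same number of columns.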