SOLR index and extract .sh and .sql files

Submitted by 纵饮孤独 on 2019-12-25 12:46:10

Question


Recently I set up an image with SOLR. My goal is to index and extract files on a Windows server and a Linux server. I can already index and extract data from multiple file types; this is done by the Solr Cell request handler (ExtractingRequestHandler). See the post.jar command below.

java -Dauto -Drecursive -jar post.jar Y:\
SimplePostTool version 1.5
Posting files to base url localhost:8983/solr/update..
Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
Entering recursive mode, max depth=999, delay=0s
0 files indexed.

Is it possible to index and extract metadata/content from file types like .sh and .sql? If so, I would of course like to know how :)


Answer 1:


WHAT specifically do you want to extract from .sh and .sql files that is different from any other generic file (name, location, date, etc.)?

Do you want to extract the command names used in .sh files? Do you want to extract table/field names from .sql files? I don't think that is possible right now, but if a parser exists for the file format, it can be plugged into Tika as a module. Tika is what Solr uses under the covers.
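As a rough illustration of the kind of format-specific extraction meant here, below is a hypothetical sketch (class and method names are made up) that pulls table names out of SQL text with a regular expression. It is not a Tika parser and assumes simple, unquoted identifiers; comments, string literals, and clauses such as IF NOT EXISTS will confuse it.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical, naive extractor: collects identifiers that follow
 * FROM, JOIN, INTO, UPDATE or TABLE in a SQL script. Only meant to show
 * the kind of metadata a real format-specific parser could produce.
 */
final class SqlTableNames {
    // Case-insensitive keyword, whitespace, then an optionally schema-qualified identifier.
    private static final Pattern TABLE_REF = Pattern.compile(
            "\\b(?:FROM|JOIN|INTO|UPDATE|TABLE)\\s+([A-Za-z_][A-Za-z0-9_]*(?:\\.[A-Za-z_][A-Za-z0-9_]*)?)",
            Pattern.CASE_INSENSITIVE);

    static Set<String> extract(String sql) {
        Set<String> names = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        Matcher m = TABLE_REF.matcher(sql);
        while (m.find()) {
            names.add(m.group(1).toLowerCase(Locale.ROOT));
        }
        return names;
    }

    public static void main(String[] args) {
        String sql = "CREATE TABLE orders (id INT);\n"
                + "INSERT INTO orders VALUES (1);\n"
                + "SELECT * FROM orders o JOIN app.customers c ON o.id = c.id;";
        System.out.println(extract(sql)); // [orders, app.customers]
    }
}
```

A real integration would wrap logic like this in a Tika parser so Solr Cell could index the result as fields.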




Answer 2:


I solved it today. I only needed to add sh and sql to the MIME map in SimplePostTool.java:

mimeMap = new HashMap<>();
mimeMap.put("xml", "text/xml");
mimeMap.put("csv", "text/csv");
mimeMap.put("json", "application/json");
mimeMap.put("pdf", "application/pdf");
mimeMap.put("rtf", "text/rtf");
mimeMap.put("html", "text/html");
mimeMap.put("htm", "text/html");
mimeMap.put("doc", "application/msword");
mimeMap.put("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
mimeMap.put("ppt", "application/vnd.ms-powerpoint");
mimeMap.put("pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation");
mimeMap.put("xls", "application/vnd.ms-excel");
mimeMap.put("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
mimeMap.put("odt", "application/vnd.oasis.opendocument.text");
mimeMap.put("ott", "application/vnd.oasis.opendocument.text");
mimeMap.put("odp", "application/vnd.oasis.opendocument.presentation");
mimeMap.put("otp", "application/vnd.oasis.opendocument.presentation");
mimeMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
mimeMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet");
mimeMap.put("txt", "text/plain");
mimeMap.put("log", "text/plain");
// Added: treat shell scripts and SQL files as plain text
mimeMap.put("sh", "text/plain");
mimeMap.put("sql", "text/plain");
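A minimal sketch of why these map entries matter, using hypothetical names (MimeLookup, contentTypeFor) loosely modeled on the map above rather than copied from SimplePostTool: the file extension is looked up in the map, and an extension with no entry has no known content type. With the two added entries, .sh and .sql files resolve to text/plain, which presumably lets Tika handle them as ordinary text.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch: resolve a file's content type from its extension
 * using a map like the one in SimplePostTool. Unknown extensions yield null.
 */
final class MimeLookup {
    static final Map<String, String> MIME_MAP = new HashMap<>();

    static {
        MIME_MAP.put("pdf", "application/pdf");
        MIME_MAP.put("txt", "text/plain");
        MIME_MAP.put("log", "text/plain");
        // The two entries from the answer: shell scripts and SQL as plain text.
        MIME_MAP.put("sh", "text/plain");
        MIME_MAP.put("sql", "text/plain");
    }

    static String contentTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null; // no extension to look up
        }
        // Lower-case the extension so "backup.SQL" and "backup.sql" match the same entry.
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return MIME_MAP.get(ext);
    }

    public static void main(String[] args) {
        System.out.println(contentTypeFor("backup.SQL"));  // text/plain
        System.out.println(contentTypeFor("deploy.sh"));   // text/plain
        System.out.println(contentTypeFor("archive.tar")); // null
    }
}
```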

I also added sh and sql to the DEFAULT_FILE_TYPES constant, which originally reads:

private static final String DEFAULT_FILE_TYPES = "xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log";
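The file-type list acts as an extension filter while auto/recursive mode walks the directory, which is consistent with the "0 files indexed" output in the question. Below is a sketch under that assumption, with hypothetical names (FileTypeFilter, parseTypes, accepts, collect) rather than SimplePostTool's own code, comparing the original list with one that has sh and sql appended.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical sketch: a comma-separated file-type list used as an
 * extension filter during a recursive directory walk.
 */
final class FileTypeFilter {
    static final String ORIGINAL_TYPES =
            "xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log";
    // The change described in the answer: append sh and sql.
    static final String EXTENDED_TYPES = ORIGINAL_TYPES + ",sh,sql";

    /** Splits the list into a set of lower-case extensions. */
    static Set<String> parseTypes(String csv) {
        return Arrays.stream(csv.split(","))
                .map(s -> s.trim().toLowerCase(Locale.ROOT))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toCollection(HashSet::new));
    }

    /** True if the file name's extension is in the accepted set. */
    static boolean accepts(String fileName, Set<String> types) {
        int dot = fileName.lastIndexOf('.');
        return dot >= 0 && types.contains(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    /** Recursively lists regular files under root whose extension is accepted. */
    static List<Path> collect(Path root, Set<String> types) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> accepts(p.getFileName().toString(), types))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("original list: " + collect(root, parseTypes(ORIGINAL_TYPES)).size() + " files");
        System.out.println("extended list: " + collect(root, parseTypes(EXTENDED_TYPES)).size() + " files");
    }
}
```

Run with a directory as the first argument; for a folder that holds only .sh and .sql files, the original list matches nothing while the extended list matches them all.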


Source: https://stackoverflow.com/questions/23514196/solr-index-and-extract-sh-and-sql-files
