How to access tag information on office files via C#

Submitted by ☆樱花仙子☆ on 2019-12-05 07:27:57

Question


I would like to write a simple bit of code that would extract only the tag information from a set of office (docx, pptx, etc.) files that exist in a directory so that it could be indexed and searched easily.

When I say "tag", I mean the tag info that you have been able to add to a file since Vista, typically using Explorer. For example, the pptx file in the screenshot below has the tag "bubble" attached.

But searching those tags is already built into Windows, you say? Why, yes, but I need this to only index the tags and I need to expose the info through an intranet rather than inside of Windows.

I have found that, inside the Office file package, the actual information is stored in the /docProps/core.xml file, in the cp:keywords element. And I do realize that, in code, I could unzip the package, open that file, and extract what I need. I'm hoping that there's a pre-abstracted solution out there somewhere, however. I seriously doubt that unzipping is what Windows does to index that same information (but admittedly, I can't really find any good info on it).

I have also found some discussions about IFilters. However, an IFilter accesses the text of the file, so I don't see how it helps solve this particular problem.
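For reference, the manual unzip-and-read approach described above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes .NET 4.5 or later (for System.IO.Compression.ZipArchive), assumes the core properties part sits at the conventional docProps/core.xml location, and returns the cp:keywords text as a single raw string without guessing how multiple tags are separated. The class name CoreXmlKeywords, the extension list, and the directory argument are hypothetical names chosen for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

static class CoreXmlKeywords
{
    // XML namespace bound to the "cp" prefix in docProps/core.xml.
    static readonly XNamespace Cp =
        "http://schemas.openxmlformats.org/package/2006/metadata/core-properties";

    // Returns the raw cp:keywords text, or null when the part or element is missing.
    public static string Read(Stream package)
    {
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Assumption: the core properties part uses its conventional name.
            var entry = zip.GetEntry("docProps/core.xml");
            if (entry == null) return null;

            using (var xml = entry.Open())
            {
                var keywords = XDocument.Load(xml)
                    .Descendants(Cp + "keywords")
                    .FirstOrDefault();
                return keywords?.Value;
            }
        }
    }

    static void Main(string[] args)
    {
        // Hypothetical usage: the directory to scan is the first argument.
        var dir = args.Length > 0 ? args[0] : ".";
        var extensions = new[] { ".docx", ".pptx", ".xlsx" };

        foreach (var path in Directory.EnumerateFiles(dir))
        {
            if (!extensions.Contains(Path.GetExtension(path), StringComparer.OrdinalIgnoreCase))
                continue;

            try
            {
                using (var file = File.OpenRead(path))
                    Console.WriteLine("{0}: {1}", path, Read(file) ?? "(no keywords)");
            }
            catch (InvalidDataException)
            {
                // Not a valid zip package (for example an Office lock file); skip it.
            }
        }
    }
}
```

Because Read takes a Stream rather than a path, the same method could be reused for files that arrive from somewhere other than the local disk, which may matter for the intranet scenario.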

Can anyone point me in the right direction on this one?


Answer 1:


I don't have Word installed, but my guess is that the tags are accessible through the standard property system as keyword entries, just like the tags on a JPEG picture.

If you want to know exactly how it's done, I played with the shell COM API, and a full code sample is in this Gist: FileTags.cs. That was just for fun, though. You should use the Microsoft Windows API Code Pack, as its implementation is a lot cleaner.

To get the tags (called keywords internally), reference Microsoft.WindowsAPICodePack.Shell.dll, then:

using System;
using Microsoft.WindowsAPICodePack.Shell;

class Program
{
    static void Main()
    {
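        // ShellFile is disposable, so release it once the property has been read.
        using (var shellFile = ShellFile.FromFilePath(@"C:\path\to\some\file.jpg"))
        {
            // Keywords is the property Explorer shows as "Tags"; it is null when no tags are set.
            var tags = shellFile.Properties.System.Keywords.ValueAsObject as string[] ?? new string[0];
            Console.WriteLine("Tags: {0}", String.Join("; ", tags));
        }
        Console.ReadLine();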
    }
}

If they didn't mess it up, it should work from Windows XP SP2 onward. (My own sample should work from SP1, since I avoided helper functions such as PropVariantGetStringElem, but the code is really annoying to write without them.)



Source: https://stackoverflow.com/questions/7759661/how-to-access-tag-information-on-office-files-via-c-sharp
