How to convert img URLs in HTML to BASE64 strings in one method chain using LINQ or Rx

Submitted by a 夏天 on 2019-12-08 08:57:14

Question


I found that I can generate an XDocument object from HTML by using SgmlReader.SL: https://bitbucket.org/neuecc/sgmlreader.sl/

The code looks like this.

public XDocument Html(TextReader reader)
{
    XDocument xml;
    using (var sgmlReader = new SgmlReader { DocType = "HTML", CaseFolding = CaseFolding.ToLower, InputStream = reader })
    {
        xml = XDocument.Load(sgmlReader);
    }
    return xml;
}

We can also get the src attributes of the img tags from the XDocument object.

var ns = xml.Root.Name.Namespace;

var imgQuery = xml.Root.Descendants(ns + "img")
    .Select(e => new
        {
            Link = e.Attribute("src").Value
        });
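For experimenting without SgmlReader.SL: the sample page in this question is already well-formed XML, so XDocument.Parse can load it directly. The sketch below assumes well-formed input (real-world HTML generally still needs SgmlReader or a similar parser); the class and method names are hypothetical. It uses the explicit string cast on XAttribute, which yields null instead of throwing when an img has no src.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ImageSources
{
    // Returns the src value of every <img> element, skipping images without a src attribute.
    public static List<string> GetSources(XDocument xml)
    {
        XNamespace ns = xml.Root.Name.Namespace;
        return xml.Root.Descendants(ns + "img")
                  .Select(e => (string)e.Attribute("src")) // null (not an exception) when src is missing
                  .Where(src => src != null)
                  .ToList();
    }
}
```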

And we can download an image as a stream and convert its data to a BASE64 string.

public static string base64String;

WebClient wc = new WebClient();
// attach the handler before starting the download so the completion event cannot be missed
wc.OpenReadCompleted += wc_OpenReadCompleted;
wc.OpenReadAsync(new Uri(url));  // image url from src attribute

void wc_OpenReadCompleted(object sender, OpenReadCompletedEventArgs e)
{
    // e.Result throws if the download failed or was cancelled
    if (e.Error != null || e.Cancelled) { return; }

    using (Stream stream = e.Result)
    using (MemoryStream ms = new MemoryStream())
    {
        byte[] buf = new byte[32768];  // allocate the buffer once, outside the loop
        int read;
        while ((read = stream.Read(buf, 0, buf.Length)) > 0)
        {
            ms.Write(buf, 0, read);
        }
        byte[] imageBytes = ms.ToArray();
        base64String = Convert.ToBase64String(imageBytes);
    }
}
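On .NET 4 and later, Stream.CopyTo can replace the manual read loop above. This is a sketch assuming a full .NET target (CopyTo may not be available in Silverlight); the StreamBase64 class name is hypothetical.

```csharp
using System;
using System.IO;

public static class StreamBase64
{
    // Copies the whole stream into memory and encodes the bytes as BASE64.
    public static string ToBase64(Stream source)
    {
        using (var ms = new MemoryStream())
        {
            source.CopyTo(ms); // replaces the manual 32 KB read/write loop
            return Convert.ToBase64String(ms.ToArray());
        }
    }
}
```

For example, a stream containing the bytes 1, 2, 3 encodes to "AQID".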

So what I'd like to do is the following steps, all in one method chain using LINQ or Reactive Extensions:

  1. Get the src attributes of the img tags from the XDocument object.
  2. Download the image data from those URLs.
  3. Generate a BASE64 string from each image's data.
  4. Replace the src attributes with the BASE64 strings.
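LINQ queries are lazy and meant to be side-effect free, so one common shape for these four steps is to compute the replacements in a single chain, materialize it, and apply the changes in a final loop. This is a synchronous sketch assuming a blocking download is acceptable; the `fetch` parameter is a hypothetical stand-in for whatever actually downloads the bytes of a URL.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class InlineImagesSync
{
    // Steps 1-4 as one LINQ chain plus a final loop for the side effect.
    public static void Inline(XDocument xml, Func<Uri, byte[]> fetch)
    {
        XNamespace ns = xml.Root.Name.Namespace;
        var replacements = xml.Root.Descendants(ns + "img")          // 1. img elements
            .Where(e => e.Attribute("src") != null)
            .Select(e => new
            {
                Element = e,
                Data = fetch(new Uri(e.Attribute("src").Value))      // 2. image data
            })
            .Select(x => new
            {
                x.Element,
                Base64 = Convert.ToBase64String(x.Data)              // 3. BASE64 string
            })
            .ToList(); // run the query once before modifying the document

        foreach (var r in replacements)
        {
            r.Element.SetAttributeValue("src", r.Base64);            // 4. replace src
        }
    }
}
```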

Here is the simplest possible input and the expected output.

  • Before

    <html>
    <head>
    </head>
    <body>
        <img src='http://image.com/image.jpg' />
        <img src='http://image.com/image2.png' />
    </body>
    </html>
    
  • After

    <html>
    <head>
    </head>
    <body>
        <img src='...' />
        <img src='...' />
    </body>
    </html>
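Note that a browser only displays the replaced value if it is a data URI, i.e. the BASE64 text prefixed with `data:<media type>;base64,` (RFC 2397). A sketch of building one, assuming the media type can be guessed from the URL's file extension; the DataUri helper and its small extension map are hypothetical and cover only the two formats in the example.

```csharp
using System;
using System.IO;

public static class DataUri
{
    // Builds an RFC 2397 data URI such as "data:image/png;base64,...".
    // Only .jpg/.jpeg/.png are mapped (assumption); anything else falls back to octet-stream.
    public static string FromBytes(byte[] data, Uri source)
    {
        string mediaType;
        switch (Path.GetExtension(source.AbsolutePath).ToLowerInvariant())
        {
            case ".jpg":
            case ".jpeg": mediaType = "image/jpeg"; break;
            case ".png": mediaType = "image/png"; break;
            default: mediaType = "application/octet-stream"; break;
        }
        return "data:" + mediaType + ";base64," + Convert.ToBase64String(data);
    }
}
```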
    

Does anyone know how to do this?

Any advice from experts would be appreciated.


Answer 1:


Both LINQ and Rx are designed to promote transformations that result in new objects, not ones that modify existing objects, but this is still doable. You have already done the first step, breaking the task into parts. The next step is to make composable functions that implement those steps.

1) You mostly have this one already, but we should return the elements themselves rather than just their src values, so that we can update them later.

public IEnumerable<XElement> GetImages(XDocument document)
{
    var ns = document.Root.Name.Namespace;
    return document.Root.Descendants(ns + "img");
}
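Qualifying "img" with the root's namespace, as GetImages does, matters when the document carries a default namespace, as XHTML does (`xmlns="http://www.w3.org/1999/xhtml"`): an unqualified Descendants("img") then finds nothing. A small check of that behavior; the NamespaceDemo class and the sample markup are made up for illustration.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NamespaceDemo
{
    // Same logic as GetImages: qualify "img" with the root's namespace.
    public static int CountImages(XDocument document)
    {
        XNamespace ns = document.Root.Name.Namespace;
        return document.Root.Descendants(ns + "img").Count();
    }

    // Unqualified lookup, for comparison: only matches <img> elements in no namespace.
    public static int CountImagesIgnoringNamespace(XDocument document)
    {
        return document.Root.Descendants("img").Count();
    }
}
```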

2) This seems to be where you have hit a wall from the composability point of view. To start, let's make a FromEventAsyncPattern observable generator. Rx already has generators for the Begin/End asynchronous pattern and for standard events; the event-based pattern that WebClient uses falls somewhere in between, since a method call starts the operation and an event fires once when it finishes.

public IObservable<TEventArgs> FromEventAsyncPattern<TDelegate, TEventArgs> 
    (Action method, Action<TDelegate> addHandler, Action<TDelegate> removeHandler
    ) where TEventArgs : EventArgs
{
    return Observable.Create<TEventArgs>(
        obs =>
        {
            //subscribe to the handler before starting the method
            var ret = Observable.FromEventPattern<TDelegate, TEventArgs>(addHandler, removeHandler)
                                .Select(ep => ep.EventArgs)
                                .Take(1) //do this so the observable completes
                                .Subscribe(obs);
            method(); //start the async operation
            return ret;
        }
    );
}
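The key detail in FromEventAsyncPattern is subscribing before calling method(), so an operation that completes quickly cannot raise its event before anyone is listening. The same idea can be expressed without Rx as a Task via TaskCompletionSource. This is a simplified sketch: FromEventAsync and FakeDownloader are hypothetical names, and unlike the Rx version it assumes the event is typed as EventHandler<T> (WebClient's custom delegate types would need an extra TDelegate parameter).

```csharp
using System;
using System.Threading.Tasks;

public static class EventTasks
{
    // Task-based counterpart of FromEventAsyncPattern: attach the handler first,
    // start the operation, complete on the first event, then detach.
    public static Task<TEventArgs> FromEventAsync<TEventArgs>(
        Action start,
        Action<EventHandler<TEventArgs>> addHandler,
        Action<EventHandler<TEventArgs>> removeHandler) where TEventArgs : EventArgs
    {
        var tcs = new TaskCompletionSource<TEventArgs>();
        EventHandler<TEventArgs> handler = null;
        handler = (sender, e) =>
        {
            removeHandler(handler); // plays the role of Take(1)
            tcs.TrySetResult(e);
        };
        addHandler(handler);        // subscribe before starting
        start();
        return tcs.Task;
    }
}

// Hypothetical event source that raises its event synchronously; attaching the
// handler after StartAsync() would miss it.
public class FakeDownloader
{
    public event EventHandler<EventArgs> Completed;
    public void StartAsync() { Completed?.Invoke(this, EventArgs.Empty); }
    public int HandlerCount => Completed?.GetInvocationList().Length ?? 0;
}
```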

Now we can use this method to turn the downloads into observables. Based on your usage, I think you could also use DownloadDataAsync on the WebClient instead.

public IObservable<byte[]> DownloadAsync(Uri address)
{
    return Observable.Using(
             () => new System.Net.WebClient(),
             wc =>
             {
                return FromEventAsyncPattern<System.Net.DownloadDataCompletedEventHandler,
                                             System.Net.DownloadDataCompletedEventArgs>
                          (() => wc.DownloadDataAsync(address),
                           h => wc.DownloadDataCompleted += h,
                           h => wc.DownloadDataCompleted -= h
                          )
                       .Select(e => e.Result);
                //for robustness, you should probably check the error and cancelled
                //properties instead of assuming it finished like I am here.
             });
}
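The comment about checking the error and cancelled properties can be handled by one helper that maps an AsyncCompletedEventArgs (the base class of both DownloadDataCompletedEventArgs and OpenReadCompletedEventArgs) onto a TaskCompletionSource. A sketch; the AsyncCompletion helper is hypothetical, and reading Result is deferred through a callback because Result throws when the operation failed or was cancelled.

```csharp
using System;
using System.ComponentModel;
using System.Threading.Tasks;

public static class AsyncCompletion
{
    // Completes the task from an event-based async result, checking Error and
    // Cancelled before touching Result.
    public static void Complete<T>(TaskCompletionSource<T> tcs,
                                   AsyncCompletedEventArgs e,
                                   Func<T> getResult)
    {
        if (e.Error != null)
            tcs.TrySetException(e.Error);
        else if (e.Cancelled)
            tcs.TrySetCanceled();
        else
            tcs.TrySetResult(getResult());
    }
}
```

With a WebClient, the handler could look like `(s, e) => AsyncCompletion.Complete(tcs, e, () => e.Result)` attached to DownloadDataCompleted.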

EDIT: As per your comment, you appear to be using Silverlight, where WebClient is not IDisposable and does not have the method I was using. To deal with that, try something like:

public IObservable<byte[]> DownloadAsync(Uri address)
{
    var wc = new System.Net.WebClient();
    var eap = FromEventAsyncPattern<OpenReadCompletedEventHandler,
                                    OpenReadCompletedEventArgs>(
                 () => wc.OpenReadAsync(address),
                 h => wc.OpenReadCompleted += h,
                 h => wc.OpenReadCompleted -= h);
    return from e in eap
           from b in e.Result.ReadAsync()
           select b;
}

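You will need to find an implementation of ReadAsync, an extension method on Stream that reads the whole stream and returns its contents as an observable of byte[]; it is not part of the framework. You should be able to find one pretty easily; the post was long enough already, so I left it out.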
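One possible stand-in, assuming .NET 4.5 or later rather than Silverlight (which lacks async/await and CopyToAsync), is a Task-returning extension. It is named ReadAllBytesAsync here (a hypothetical name) to avoid confusion with the built-in Stream.ReadAsync(byte[], int, int). With Rx 2.0+, a Task<byte[]> can be turned into an observable with `.ToObservable()` from System.Reactive.Threading.Tasks, so it could slot into the query above as `e.Result.ReadAllBytesAsync().ToObservable()`.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class StreamExtensions
{
    // Reads a stream to the end without blocking the calling thread.
    public static async Task<byte[]> ReadAllBytesAsync(this Stream stream)
    {
        using (var ms = new MemoryStream())
        {
            await stream.CopyToAsync(ms);
            return ms.ToArray();
        }
    }
}
```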

3 & 4) Now we are ready to put it all together and update the elements. Since step 3 is so simple, I'll just merge it in with step 4.

public IObservable<Unit> ReplaceImageLinks(XDocument document)
{
    return (from element in GetImages(document)
            let address = new Uri(element.Attribute("src").Value)
            select (from data in DownloadAsync(address)
                    select Convert.ToBase64String(data)
                   ).Do(base64 => element.Attribute("src").Value = base64)
           ).Merge()
            .IgnoreElements()
            .Select(s => Unit.Default);
            // Select doesn't really do anything, as IgnoreElements swallows all
            // the values, but it is needed to change the type of the observable.
            // Task may be more appropriate here.
}
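Following the closing remark that Task may be more appropriate: on .NET 4.5+ the same steps can be written with async/await and Task.WhenAll, starting all downloads concurrently and completing once every src has been replaced. A sketch; `download` is a hypothetical parameter (in practice it might wrap HttpClient.GetByteArrayAsync), injected here so the example runs without network access. Since LINQ to XML objects are not documented as thread-safe for concurrent writes, the attributes are updated on one thread after all downloads finish.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using System.Xml.Linq;

public static class InlineImagesAsync
{
    // Downloads every <img> src concurrently, then replaces each src with its BASE64 text.
    public static async Task ReplaceImageLinksAsync(XDocument document,
                                                    Func<Uri, Task<byte[]>> download)
    {
        XNamespace ns = document.Root.Name.Namespace;
        var images = document.Root.Descendants(ns + "img")
            .Where(e => e.Attribute("src") != null)
            .ToList();

        // Start all downloads at once; WhenAll returns the results in input order.
        byte[][] results = await Task.WhenAll(
            images.Select(e => download(new Uri(e.Attribute("src").Value))));

        // Apply the changes sequentially after every download has completed.
        for (int i = 0; i < images.Count; i++)
        {
            images[i].SetAttributeValue("src", Convert.ToBase64String(results[i]));
        }
    }
}
```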


Source: https://stackoverflow.com/questions/8897769/how-to-convert-img-url-to-base64-string-in-html-on-one-method-chain-by-using-lin
