Question
I'm working on a project where I need to extract the text from PDF files and store the whole text in a single DB column. Using iTextSharp, I extract the text and hold it in a string.
Now I need to check whether the string exceeds the 4 MB limit and, if it does, keep only the part of the string that fits within 4 MB.
This is my code:
internal string ReadPdfFiles()
{
    // Path of the selected file
    string filePath = null;
    // Open a dialog box to select the file
    OpenFileDialog file = new OpenFileDialog();
    // Dialog box title
    file.Title = "Select Pdf File";
    // File types offered to the user
    file.Filter = "Pdf file (*.pdf)|*.pdf|All files (*.*)|*.*";
    // Start in the desktop folder
    file.InitialDirectory = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
    // Restore the previous directory when the dialog closes
    file.RestoreDirectory = true;
    // Runs only when the user clicks OK
    if (file.ShowDialog() == DialogResult.OK)
    {
        // Store the selected file path
        filePath = file.FileName;
    }
    // File path
    // TODO: use a string array and pass all the PDFs for searching
    //String filePath = @"D:\Pranay\Documentation\Working on SSAS.pdf";
    try
    {
        // Create an instance of the PdfReader class
        using (PdfReader reader = new PdfReader(filePath))
        {
            // Create an instance of the StringBuilder class
            StringBuilder text = new StringBuilder();
            // The loop bounds decide which pages are read.
            // Start from the 5th page, as Piyush suggested.
            for (int i = 5; i <= reader.NumberOfPages; i++)
            {
                // Read the text of page i
                text.Append(PdfTextExtractor.GetTextFromPage(reader, i));
            } // end of for(i)
            int k = 4096000;
            // Test whether the string exceeds 4 MB
            if (text.Length < k)
            {
                // Keep the string (text1 is not declared in this snippet)
                text1 = text.ToString();
            } // end of if
        } // end of using
    } // end of try
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message, "Please Do select a pdf file!!", MessageBoxButtons.OK, MessageBoxIcon.Warning);
    } // end of catch
    return text1;
} // end of ReadPdfFiles() method
Do help me!
Answer 1:
Setting the Length of the StringBuilder to your desired length is the simplest way to get what you want. There are other ways, as noted in other answers, but you need to account for their side effects, such as exceptions or inefficient string handling.
try
{
    using (PdfReader reader = new PdfReader(filePath))
    {
        StringBuilder text = new StringBuilder();
        .....
        int k = 4096000;
        // If length > limit (k) then truncate
        if (text.Length > k)
            text.Length = k;
        // Truncated at k, or everything if it was shorter
        text1 = text.ToString();
    } // end of using
}
......
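For readers who want to try the idea without iTextSharp, here is a minimal self-contained sketch. It assumes the limit is counted in chars (UTF-16 code units), which is what StringBuilder.Length measures. The names TruncateDemo and Truncate are hypothetical and not part of the original answer.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class TruncateDemo
{
    // Returns the builder's content, cut to at most maxChars UTF-16 code units.
    public static string Truncate(StringBuilder text, int maxChars)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (maxChars < 0) throw new ArgumentOutOfRangeException(nameof(maxChars));

        // Shrinking Length drops the trailing chars in place.
        // A builder that is already short enough is left untouched.
        if (text.Length > maxChars)
            text.Length = maxChars;

        return text.ToString();
    }
}
```

Calling TruncateDemo.Truncate(text, 4096000) after the page loop would then stand in for the if block above. Note that the builder itself is modified.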
Answer 2:
There are several possibilities:
- You could read the documentation for the StringBuilder class.
- You could initialize your StringBuilder with a maximum capacity.
- You could use StringBuilder.ToString(0, maxlength).
- You could use StringBuilder.ToString().Substring(0, maxlength).
BTW: 4 MB = 4194304 bytes
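Two details are worth keeping in mind with these options. ToString(0, maxlength) and Substring(0, maxlength) both throw an ArgumentOutOfRangeException when the text is shorter than maxlength, so the length has to be clamped first. Also, Length counts chars rather than bytes, so the stored size depends on the encoding. The sketch below illustrates both points. The names LimitDemo, TakePrefix and Utf8Size are hypothetical, and UTF-8 is only an assumed storage encoding.

```csharp
using System;
using System.Text;

// Hypothetical helper names, for illustration only.
public static class LimitDemo
{
    // 4 MB read as 4 * 1024 * 1024 = 4194304.
    public const int FourMegabytes = 4 * 1024 * 1024;

    // Returns at most maxChars chars from the start of the builder.
    // The length is clamped because ToString(startIndex, length)
    // throws when the requested range runs past the end.
    public static string TakePrefix(StringBuilder sb, int maxChars)
    {
        if (sb == null) throw new ArgumentNullException(nameof(sb));
        if (maxChars < 0) throw new ArgumentOutOfRangeException(nameof(maxChars));
        return sb.ToString(0, Math.Min(sb.Length, maxChars));
    }

    // Size of the string in bytes when encoded as UTF-8
    // (an assumption about how the column stores text).
    public static int Utf8Size(string s)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        return Encoding.UTF8.GetByteCount(s);
    }
}
```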
Answer 3:
Simply truncating the StringBuilder to a specified length will not handle surrogate pairs and combining character sequences correctly. A surrogate pair is a sequence of two .NET chars that represents a single Unicode code point; certain Kanji characters are represented this way. A combining character sequence represents a character with a diacritical or other modifying mark. Thus, if your PDF document might contain international characters (and you should assume this for any user-created document), you need to truncate the StringBuilder at the last abstract character boundary at or before your maximum length.
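A small sketch of the difference between chars and abstract characters, using StringInfo from System.Globalization. The class and method names are hypothetical and serve only as illustration.

```csharp
using System;
using System.Globalization;

// Hypothetical helper names, for illustration only.
public static class TextElementDemo
{
    // Number of UTF-16 code units, which is what string.Length
    // and StringBuilder.Length report.
    public static int CodeUnits(string s)
    {
        return s.Length;
    }

    // Number of text elements: a surrogate pair, or a base character
    // followed by combining marks, counts as one element.
    public static int TextElements(string s)
    {
        return new StringInfo(s).LengthInTextElements;
    }
}
```

For example, "\U0001F600" (one code point outside the Basic Multilingual Plane) and "e\u0301" (the letter e followed by a combining acute accent) each occupy two chars but form a single text element. Cutting either one after its first char would leave a broken character at the end of the string.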
.NET provides utilities for enumerating the abstract characters in a string; however, it provides no similar tools for enumerating a more general sequence of characters such as a StringBuilder. Thus I would suggest preventing the StringBuilder from ever exceeding your maximum length rather than truncating it afterwards:
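using System;
using System.Globalization;
using System.Text;

// Extension methods must live in a static class; this class name is a placeholder.
public static class StringBuilderExtensions
{
    // Appends as much of str as fits within maxLen chars, stopping only at
    // text element boundaries. Returns false if anything was left out.
    public static bool AppendUpToMaximumLength(this StringBuilder sb, string str, int maxLen)
    {
        if (sb == null)
            throw new ArgumentNullException("sb");
        if (str == null)
            str = string.Empty; // Or throw an exception if that's your coding convention.
        var sbLen = sb.Length;
        if (sbLen > maxLen)
            return false;
        if (sbLen + str.Length <= maxLen)
        {
            sb.Append(str);
            return true;
        }
        // http://referencesource.microsoft.com/#mscorlib/system/globalization/textelementenumerator.cs
        var enumerator = StringInfo.GetTextElementEnumerator(str);
        while (enumerator.MoveNext())
        {
            var textElement = enumerator.GetTextElement();
            var elemLen = textElement.Length;
            // Stop before the first text element that would not fit whole.
            if (sb.Length + elemLen > maxLen)
                return false;
            sb.Append(textElement);
        }
        return true;
    }
}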
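The method above measures the limit in chars. If the column limit is actually measured in bytes, the same text-element walk can be driven by a byte budget instead. The following is only a sketch under that assumption, with UTF-8 as an assumed storage encoding; ByteBudgetDemo and TruncateToUtf8Bytes are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper, for illustration only.
public static class ByteBudgetDemo
{
    // Returns the longest prefix of s, cut at a text element boundary,
    // whose UTF-8 encoding is at most maxBytes bytes long.
    public static string TruncateToUtf8Bytes(string s, int maxBytes)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        if (maxBytes < 0) throw new ArgumentOutOfRangeException(nameof(maxBytes));

        var result = new StringBuilder();
        int usedBytes = 0;
        var enumerator = StringInfo.GetTextElementEnumerator(s);
        while (enumerator.MoveNext())
        {
            string element = enumerator.GetTextElement();
            int elementBytes = Encoding.UTF8.GetByteCount(element);
            // Stop before the first element that would not fit whole.
            if (usedBytes + elementBytes > maxBytes)
                break;
            result.Append(element);
            usedBytes += elementBytes;
        }
        return result.ToString();
    }
}
```

With a budget of 4 bytes, the input "a\U0001F600b" is cut to "a", because the second text element needs 4 bytes on its own and would bring the total to 5.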
Source: https://stackoverflow.com/questions/24710770/how-to-restrict-a-content-of-string-to-less-than-4mb-and-save-that-string-in-db