Import Sharepoint picture library images into SQL Server database by SSIS

We have a request to synchronize a particular customer list on SharePoint with the SQL Server database, this list actually is a picture library, so the task actually need an operation which retrieve Sharepoint library images to the SQL server table (the image would store in a “image” type field). It can be achieved by different ways, one is using Microsoft.Sharepoint.dll which is really simple for development as all objects you needed are there, but this solution only can run on the server which installed sharepoint and limited its usage. So eventually we selected the solution to use SSIS SharePointListAdapters and sharepoint imaging service to implement this.

1. SharepointListAdapter for SSIS
This adapter is issued by Microsoft SQL Server Community on codeplex and accepted by MSDN. So far it supported SQL Server 2005/2008/2008R2/2012, and the installation can be auto recognized on SQL Server 2008 or upper version. The official website can be referenced at sharepoint list adapter, the detail of how to setup and samples can be referenced at Sharepoint Adapter details usage 1 and Sharepoint Adapter details usage 2

2. Sharepoint web service
a. why this SSIS package need call sharepoint web service.
If the request just need to import those regular list or non document/picture library attachments, then the adapter works perfectly. However it doesn’t work for those binary field no matter your data conversion component setup, the trick is as below
20130205_ssis8
as you can see the adapter output columns recognized this column to be a unicode string and cannot be changed to other data type (the correct data type should be “image” or “byte steam”), hence all the input columns afterward this source already convert the binary image data to a string, so there is no luck to fill the sql server “image” field for sure.

b. How to revoke sharepoint web service
First, take a glance of all web services provided by Sharepoint at here web services, I used the imaging web service as the importing referenced to the picture library.
Note 1: Since SSIS 2008, it allow you to add web service reference directly similar as add web service in VS2008/2010 (In Visual Studio 2010–> add “Service Reference” –> From “Advance” add web service), in VS2005, a proxy class needs to be generated by wsdl.exe.
Notes 2: the first retrieve actually got a list of the picture library records (xml content with “z:row” properties, can got it from the outerxml property ), in this xml list, it provided the image absolute path in the property “ows_EncodedAbsUrl”, after got this path, then retrieve it into a steam and write the steam to the database.
Note 3: the ID is the unique ID in sharepoint library, which can be used for updating.
Notes 4: Accessing ADO.Net Connection Managers from an SSIS script task / script component
You cannot access the connection string as the regular VisualStudio project (from app.config / web.config), because the app.config file actually only effective in development time, and at runtime, cannot be access from ConfigurationManager etc. The only way to access the connection is by the connection setup in SSIS itself.
you’ll need to add a reference to the Microsoft.SqlServer.DTSRuntimeWrap assembly to get the IDTSConnectionManagerDatabaseParameters100 interface. If you’re doing this in a script task, you’ll need to prefix the Microsoft.SqlServer.Dts.Runtime.Wrapper namespace (or use fully qualified names) so that it doesn’t conflict with the namespace for the VSTA proxy classes.

For ado.net connection (Native client) use the following:

sqlConnection conn = (SqlConnection)Dts.Connections["adonet"].AcquireConnection(null);

For OLEDB connection managers is little complex

ConnectionManager cm = Dts.Connections["oledb"];
IDTSConnectionManagerDatabaseParameters100 cmParams = cm.InnerObject as IDTSConnectionManagerDatabaseParameters100;
OleDbConnection conn = cmParams.GetConnectionForSchema() as OleDbConnection;

So the step would be
1) add script task to the SSIS package
2) add sharepoint imaging web service to the script project
3) add the following code to revoke the service and insert into the table.

    using System;
    using System.Data;
    using Microsoft.SqlServer.Dts.Runtime;
    using System.Windows.Forms;
    using System.IO;
    using System.Data.SqlClient;
    using System.Diagnostics;
    using System.Collections.Generic;
    using System.Text;
    using System.Net;
    using System.Xml;
    using System.Xml.XPath;

public void Main()
{
   string jpgname = string.Empty;
   string jpgpath = string.Empty;
   string imgid = string.Empty;
   try
      {
        SharepointWS_Photos.Imaging  objLists = new SharepointWS_Photos.Imaging ();
        objLists.Credentials = System.Net.CredentialCache.DefaultCredentials;
        objLists.Url = "http://mysharepoint/sites/mypictures/_vti_bin/Imaging.asmx";
        XmlDocument xmlDoc = new System.Xml.XmlDocument();
        XmlNode ndListItems = objLists.GetListItems("MyPhotosList", "");
        StringReader sr = new StringReader(ndListItems.OuterXml);
        xmlDoc.Load(sr);
        XmlNodeList nodelist = xmlDoc.GetElementsByTagName("z:row");
        foreach (XmlNode mynode in nodelist)
        {
           jpgpath = mynode.Attributes["ows_EncodedAbsUrl"].Value;
           string[] segment = jpgpath.Split(Convert.ToChar("/"));
           jpgname = segment[segment.Length - 1];
           imgid = mynode.Attributes["ows_ID"].Value;
           DownLoadAttachment(jpgpath, jpgname, imgid);
        }
     }
   catch (Exception ex)
      {
         Console.WriteLine(ex.Message);
       }

   Dts.TaskResult = (int)ScriptResults.Success;
        
}

private void DownLoadAttachment(string strURL, string strFileName, string imgid)
{
   HttpWebRequest request = default(HttpWebRequest);
   HttpWebResponse response = null;
   byte[] imageData = null;
   ConnectionManager cm = Dts.Connections["MyADODatabaseConnection"];
   SqlConnection conn = (SqlConnection)cm.AcquireConnection(null);
   try
     {
       request = (HttpWebRequest)WebRequest.Create(strURL);
       request.Credentials = System.Net.CredentialCache.DefaultCredentials;
       request.Timeout = 10000;
       request.AllowWriteStreamBuffering = false;
       response = (HttpWebResponse)request.GetResponse();
       Stream imgstream = response.GetResponseStream();
       BinaryReader br = new BinaryReader(imgstream);
       imageData = br.ReadBytes(Convert.ToInt32(response.ContentLength));
       SqlCommand sqlCmd = new SqlCommand("UPDATE MyPhotoTable SET photoField = @imgbinary where spID=" + imgid, conn);
                sqlCmd.Parameters.Add("@imgbinary", SqlDbType.Image);
                sqlCmd.Parameters["@imgbinary"].Value = imageData;
                sqlCmd.ExecuteNonQuery();

            }
            catch (Exception ex)
            {
              
                Console.WriteLine(ex.Message);
            }
            finally 
            {
                if (conn.State == ConnectionState.Open) conn.Close();
            }


        }


3. The Complete SSIS package
a. Control flow:
The first component is a SQL Server task which delete all the existing records
The second component is a data flow which load all the picture library records with all columns except the picture (binary or image) field
The third component is the SQL Server task which run an SQL statement to check the loading successfully or not
The fourth one is a script task which runs the script showing on upper. Before running this component, there is a condition there (“f(x)”), so this step will only run when the pre-step runs successfully.
20130205_ssis2
b. Data flow:
20130205_ssis5

Advertisements

SSIS – issue of importing Excel data to database

I really want to finish the jQuery series before write anything else, but just cannot help composing this post because it took me hours to solve the issue of importing an Excel sheet to the database.

I have a spread sheet which contains two columns as “Notes” and “Reference”, normally those two columns are filled by explanation words, less than 500 words but some of them exceeded 255 words.

When I first imported from SQL Server Management Studio, it gave the following error “[Excel Source [20]] Error: There was an error with Excel Source.Outputs[Excel Source Output].Columns[Notes] on Excel Source.Outputs[Excel Source Output]. The column status returned was: “Text was truncated or one or more characters had no match in the target code page.”.” which is make sense because some cell’s “notes” data is definitely longer than 255 characters, it seems the solution is simple, just change the field size to a larger one. However even I find every place as I can to change the size to 500, it still gave the upper error. Then I thought maybe it is due to the Management Studio’s restriction, so I create a SSIS package from scratch by VS2010. This SSIS package is very simple as below:

SSISExcelimport_1         SSISExcelimport_2

Actually I tried to add a data conversion component between the source and destination, but it is not the reason of why the truncate error still showing up, so I just ignore that part and keep the description simple.

I did find the place to reset the source input columns definition. Right click the source rectangle and select “show advanced editor” and you will see the screen below by click “Input and Output properties” tab.

SSISExcelimport_3 SSISExcelimport_4

The weird thing is you only be able to change the data type and size under “output columns” but not “External Columns”, actually “External Columns” allow you change everything, but just won’t save it and no alerts (crap…) which confused me for quite a while.

After some research, everything is clear. See the following from MSDN regarding SSIS Excel data driver:

Truncated text. When the driver determines that an Excel column contains text data, the driver selects the data type (string or memo) based on the longest value that it samples. If the driver does not discover any values longer than 255 characters in the rows that it samples, it treats the column as a 255-character string column instead of a memo column. Therefore, values longer than 255 characters may be truncated. To import data from a memo column without truncation, you must make sure that the memo column in at least one of the sampled rows contains a value longer than 255 characters, or you must increase the number of rows sampled by the driver to include such a row. You can increase the number of rows sampled by increasing the value of TypeGuessRows under the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel registry key. For more information, see PRB: Transfer of Data from Jet 4.0 OLEDB Source Fails w/ Error.

That means, SSIS Excel import driver will use the first 10 (default) rows in the sheet for testing, if no data in this column are greater than 255 characters, then it treated this column as a “unicode string [DT_WSTR]” and size would be a firm 255 and cannot be modified under “External Columns”(SSIS looks like too smart…..).

So the solution could be(First of all, ensure your table fields are big enough, for example set to varchar(500) etc.):

1. put a long dummy string into the first cell of that column so that the driver will detected it and treated this column to be a “unicode text stream [DT_NText]”, then you can change the relative column under “Output Columns” to “unicode string [DT_WSTR]” and size could set to 500. Then the importing would pass and no truncate error showing up.

SSISExcelimport_5

2. change HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel registry key to a big number, in this case, the SSIS will detect more rows in Excel. However normally the server registry key won’t be allowed to to touch by developers except you are the super admin…..

3. save this Excel to a csv file, then you will find “External Columns” allow you to modify and save, in this way you can easily to define the data type and size to what you want. However, if there are many double quotes, spaces, single quote, comma etc. inside your “Notes” field, you have to deal with another issue “define the right deliminator to  figure out columns”.

I would recommend the first solution, but it may not fit for some special scenarios. Also please read the full reference from :MSDN: SSIS – Excel Source