CERNBox meetings - SharePoint Alternatives
===

**Date:** 23/04/2020

**Attendees:** Hugo (CERNBox), Pablo (Sharepoint), Eduardo (WebFrameworks)

## Topics discussed:

### Document version

CERNBox supports multiple versions of a file. A new version is generated each time the file is saved. The previous versions are physically stored in a hidden folder named after the pattern `.sys.v#.{ORIGINAL_FILENAME}`. Within this folder, each previous version is a file named `{timestamp}.xxxx` that holds the content of the file at that point in time.

```
$ pwd
/eos/user/e/eduardoa/test
$ ls -la
total 13
drwx------. 2 eduardoa cw 4096 Apr 23 15:24 .
drwx------. 2 eduardoa cw 4096 Feb 28 14:38 ..
drwx------. 2 eduardoa cw 4096 Apr 23 15:24 .sys.v#.test.txt
-rw-r--r--. 1 eduardoa cw   18 Apr 23 15:24 test.txt
$ cd .sys.v#.test.txt/
$ ls -la
total 9
drwx------. 2 eduardoa cw 4096 Apr 23 15:24 .
drwx------. 2 eduardoa cw 4096 Apr 23 15:24 ..
-rw-r--r--. 1 eduardoa cw    0 Dec 10 10:19 1575969589.10c1ba3b
-rw-r--r--. 1 eduardoa cw    8 Dec 10 10:19 1575969593.10c1ba55
$ cat 1575969593.10c1ba55
.m;kdsds
```

The second part of the name (`xxxx`) is the hexadecimal representation of the inode of the file. When generating versions manually, this part can be faked.

### Shared with me list

The CERNBox team is working on improving the structure of the shared files; it is a high-priority milestone for this year. In the future, you will be able to decline or approve requests to share documents with you. For now, the way to reduce the list of shared folders is the "Decline Share" option.

![declineShare](https://codimd.web.cern.ch/uploads/upload_ca11c567b5bcae2633f36f26291c64c6.png)

### Permissions

CERNBox supports user and e-group permissions.

**Important:** For the moment it doesn't allow lightweight accounts, while SharePoint does.

### Individual file permissions

CERNBox supports sharing of individual files by using public links with different permission levels.
User and group sharing is not supported on files due to limitations of the underlying filesystem, EOS. This limitation should by now have been addressed in EOS, and we plan this year to add support for sharing individual files with users and groups.

### Integrations with more applications

#### Kopano, CODIMD, Indico, ...

The CERNBox team had discussions with the CERN mail team and Kopano representatives about connecting Kopano to CERNBox. The ideas discussed were generating public links from Kopano for big file attachments, and uploading and downloading mail attachments to and from CERNBox. Kopano provides an integration with ownCloud, and they are working on a new integration with the new ownCloud UI, codenamed Phoenix.

Apart from the existing ones, the other integrations the CERNBox team is currently looking at are Collabora and CODIMD.

A possible integration with Indico is foreseen as part of the CS3MESH4EOSC project, where we'll get a Fellow to help with this task.

This year the CERNBox team plans to introduce the new ownCloud product, OCIS, which will connect to the new CERN SSO infrastructure using OIDC. We have a test site here: https://ocis-latest.owncloud.com/#/login

User: einstein

Pass: relativity

#### Office

CERNBox currently offers integration with Microsoft Office 365 and with OnlyOffice (in Canary Mode). The team is working on pushing OnlyOffice to production in the coming weeks. Maria Alandes from CDA is leading this effort and Mario Rey will be responsible for running the production OnlyOffice server.

Once OnlyOffice is in production, we plan to move the Office 365 integration into the background, so that users will use OnlyOffice by default and usage of this Microsoft product is reduced.

### Directory Structure

There are two models on CERNBox: user space and project space. For the Sharepoint use case, the project space approach makes more sense. In the project space approach a service account is needed to create the space.
Root access is then controlled by 3 e-groups (admin, read access, write access). Those base permissions can be overridden by more specific ones.

Again there are two possibilities:

- A global project `WorkSpaces` managed by Webservices that will contain one folder per site that needs to be migrated. Owners and administrators of the site will have special permissions on the site's folder (we probably can't grant them admin permissions, only read/write permissions; to be confirmed).
- **(\*Agreed)** Each Sharepoint site will generate a project. This shouldn't be a problem for the CERNBox team in terms of number of projects. It will simplify permissions management, granting owners and administrators of the site admin rights on the project by using the provided e-groups. Each SharePoint Document Library within the site will generate a directory in the project space.

**TODO:** Provide Hugo with an estimate of the number of files and folders and the total size of the SharePoint sites

### Data Migration

Data migration will involve different steps:

#### Synchronization of the files

The SharePoint team will export the document structure from the SharePoint database to a filesystem representation, in a structured directory that will be stored in a DFS space.

A Powershell script will be needed to traverse the SharePoint site's Document Libraries and extract the folders and files stored in the database to generate the filesystem structure.

Following the same approach as for the DFS migration, the CERNBox team has a machine prepared that mounts both DFS and EOS to perform the data migration. The CERNBox team will grant the SharePoint team access to this machine to perform an rsync synchronization, so they can manage the migration process at their own pace.
Basically, running something along these lines:

```
# -r recurses, -t keeps modification times; the trailing slash copies the contents of SiteA
rsync -rt /dfs/projects/SharePoint/exports/SiteA/ /eos/project/s/site-a/
```

#### Versions

Additionally, the script will detect previously tracked versions of the files and generate the necessary version folders and files, as explained in the `Document version` subsection of this document. Rsync is needed again to move the generated files from DFS to EOS/CERNBox.

#### ACLs generation

The first step is to lay down conceptually how to map SharePoint ACLs to CERNBox ACLs. The SharePoint team will need to get a list of the ACLs, what each of them means for SharePoint users, and how they will be mapped in CERNBox.

Once this is identified, the ACLs will need to be extracted from SharePoint. A script will be needed to parse the SharePoint ACLs and run commands on EOS that set these ACLs. The best option for setting the ACLs on EOS is to be decided later.

### Search

Only directory and folder searches are available for the moment, and only within the current working directory. There are no plans this year to improve this search mechanism. However, the upgrade to the new and more scalable product from ownCloud could open the possibility of investigating full-text search using Elasticsearch for individual users or individual directories.

### Mail integration

Some specific sites (fire brigade or project-HL-LHC-Technical-coordination) use workflows or the incoming email feature to send emails to a Sharepoint email account. The emails sent to the site's address end up stored (with attachments) in the document library. This functionality doesn't exist on CERNBox.

**TODO:** Identify the use cases and propose alternatives to the email gateway.

### Next steps

Prepare a test site with a Document Library. Create some files and directories with different permissions and roles. Then prepare the script to create the necessary filesystem structure.

- [x] First, focus only on extracting the structure and files.
- [ ] Then versions
- [ ] Finally permissions

#### Data

* Number of sites or projects: 2496
* Size: 1960,285 GB

This data has not been updated since 2015, so the actual number of files and folders may be higher or lower (there are 1098 GB of site data).

Folders: 279.514

Files: 2.839.943

Example:

Path Export: `\\CERN\dfs\Services\WebArchive\s\SP13-PGJ-20200424T111336Z`

URL Site: https://espace.cern.ch/SP13-PGJ/_layouts/15/start.aspx#/_layouts/15/viewlsts.aspx

In the export file we have the ACLs: `\\CERN\dfs\Services\WebArchive\s\SP13-PGJ-20200424T111336Z\Shared Documents\Documents - data`
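
#### Version folder sketch

As a possible starting point for the "Then versions" step, the layout described under `Document version` could be reproduced on the export side with a small shell helper. This is only a sketch under assumptions: the function name `add_version`, its arguments and the default fake inode value `00000000` are hypothetical, and whether EOS/CERNBox picks up folders created this way still needs to be confirmed with the CERNBox team.

```shell
# Sketch only: add_version is a hypothetical helper, not an existing CERNBox/EOS tool.
# It copies one older version of a file into the hidden folder
# .sys.v#.{ORIGINAL_FILENAME}, naming it {timestamp}.{hex inode}.
#
# Usage: add_version TARGET_FILE OLD_CONTENT_FILE TIMESTAMP [FAKE_INODE_HEX]
add_version() {
    if [ "$#" -lt 3 ]; then
        echo "usage: add_version TARGET_FILE OLD_CONTENT_FILE TIMESTAMP [FAKE_INODE_HEX]" >&2
        return 2
    fi
    av_target=$1                 # current file, e.g. the exported test.txt
    av_content=$2                # file holding the older content
    av_timestamp=$3              # Unix epoch seconds of that version
    av_inode=${4:-00000000}      # assumption: the notes say this part can be faked

    av_dir=$(dirname -- "$av_target")
    av_name=$(basename -- "$av_target")
    av_version_dir="$av_dir/.sys.v#.$av_name"

    # Create the hidden version folder next to the file, then store the old content.
    mkdir -p -- "$av_version_dir" || return 1
    cp -- "$av_content" "$av_version_dir/$av_timestamp.$av_inode" || return 1

    # Print the path of the version file that was written.
    printf '%s\n' "$av_version_dir/$av_timestamp.$av_inode"
}
```

For example, `add_version SiteA/test.txt /tmp/test-v1.txt 1575969593` would write `SiteA/.sys.v#.test.txt/1575969593.00000000`, which the rsync step could then carry over to EOS together with the rest of the export.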