# helix-theblog-scanner

**Repository Path**: mirrors_adobe/helix-theblog-scanner

## Basic Information

- **Project Name**: helix-theblog-scanner
- **License**: Apache-2.0
- **Default Branch**: master

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-09-24
- **Last Updated**: 2026-04-18

## README

# Helix - TheBlog Scanner

TheBlog Scanner should run periodically (via an [OpenWhisk trigger](https://github.com/apache/openwhisk/blob/master/docs/triggers_rules.md)) and scan [theblog.adobe.com](https://theblog.adobe.com) to determine whether new blog entries have been created. For each new blog entry detected, it invokes [TheBlog Importer](https://github.com/adobe/helix-theblog-importer).

The execution flow looks like this:

- fetch the content of the theblog.adobe.com homepage
- compute the list of links on the page
- for each link, check if it is present in a list of already processed URLs stored in a OneDrive XLSX file (`/importer/urls.xlsx`)
- if not present, invoke the [helix-theblog-importer action](https://github.com/adobe/helix-theblog-importer)

Sometimes the post entries published on [theblog.adobe.com](https://theblog.adobe.com) are corrupted and get fixed later. The scanner may have already detected the corrupted version and triggered its import. To re-trigger the import, remove the entry from the `/importer/urls.xlsx` file (delete the row): if the blog entry is still visible on the homepage, it will be re-imported. If not, you need to trigger the import manually: change the URL and run the test at https://github.com/adobe/helix-theblog-importer/blob/master/test/index.test.js#L24.
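The execution flow described above can be sketched roughly as follows. This is a minimal illustration, not the scanner's actual code: `extractLinks`, `findNewUrls`, `scan`, and the three injected callbacks (`fetchHomepage`, `loadProcessedUrls`, `invokeImporter`) are hypothetical names, and the sketch assumes the XLSX file yields a plain list of absolute URLs.

```javascript
// Hypothetical sketch of the scan flow; names are illustrative only.

// Collect absolute, de-duplicated http(s) link targets from an HTML string.
// A regex keeps the sketch dependency-free; a real implementation would
// more likely use an HTML parser.
function extractLinks(html, baseUrl) {
  const links = new Set();
  const hrefPattern = /<a\s[^>]*href\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = hrefPattern.exec(html)) !== null) {
    try {
      const url = new URL(match[1], baseUrl);
      url.hash = ''; // ignore in-page fragments
      if (url.protocol === 'http:' || url.protocol === 'https:') {
        links.add(url.href);
      }
    } catch (e) {
      // Skip values that cannot be resolved to a URL.
    }
  }
  return [...links];
}

// Keep only the links that are not in the list of already processed URLs.
function findNewUrls(links, processedUrls) {
  const seen = new Set(processedUrls);
  return links.filter((url) => !seen.has(url));
}

// Tie the steps together. The callbacks stand in for the homepage fetch,
// the OneDrive lookup of /importer/urls.xlsx, and the importer invocation.
async function scan({ fetchHomepage, loadProcessedUrls, invokeImporter }) {
  const html = await fetchHomepage();
  const links = extractLinks(html, 'https://theblog.adobe.com/');
  const processed = await loadProcessedUrls();
  const fresh = findNewUrls(links, processed);
  for (const url of fresh) {
    await invokeImporter(url);
  }
  return fresh;
}
```

Under these assumptions, deleting a row from the processed list makes the corresponding URL show up in the result of `findNewUrls` again on the next run, which matches the re-import behavior described above.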

## Status

[![CircleCI](https://img.shields.io/circleci/project/github/adobe/helix-theblog-scanner.svg)](https://circleci.com/gh/adobe/helix-theblog-scanner)
[![GitHub license](https://img.shields.io/github/license/adobe/helix-theblog-scanner.svg)](https://github.com/adobe/helix-theblog-scanner/blob/master/LICENSE.txt)
[![GitHub issues](https://img.shields.io/github/issues/adobe/helix-theblog-scanner.svg)](https://github.com/adobe/helix-theblog-scanner/issues)
[![LGTM Code Quality Grade: JavaScript](https://img.shields.io/lgtm/grade/javascript/g/adobe/helix-theblog-scanner.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/adobe/helix-theblog-scanner)
[![semantic-release](https://img.shields.io/badge/%20%20%F0%9F%93%A6%F0%9F%9A%80-semantic--release-e10079.svg)](https://github.com/semantic-release/semantic-release)

## Setup

### Installation

Deploy the action:

```
npm run deploy
```

Create a trigger that fires every five minutes:

```bash
wsk trigger create five-mins-trigger --feed /whisk.system/alarms/alarm --param cron "*/5 * * * *"
```

Link the trigger to the action with a rule:

```bash
wsk rule update five-mins-scan five-mins-trigger helix-theblog/helix-theblog-scanner@latest
```

### Required env variables:

Connection to OneDrive:

- `AZURE_ONEDRIVE_CLIENT_ID`
- `AZURE_ONEDRIVE_CLIENT_SECRET`
- `AZURE_ONEDRIVE_REFRESH_TOKEN`

OneDrive shared folder that contains the `/importer/urls.xlsx` file:

- `AZURE_ONEDRIVE_ADMIN_LINK`

OpenWhisk credentials to invoke the helix-theblog-importer action:

- `OPENWHISK_API_KEY`
- `OPENWHISK_API_HOST`

Coralogix credentials for logging:

- `CORALOGIX_API_KEY`
- `CORALOGIX_LOG_LEVEL`

## Development

### Deploying Helix Service

Deploying Helix Service requires the `wsk` command line client, authenticated to a namespace of your choice. For Project Helix, we use the `helix` namespace.

All commits to master that pass the tests are deployed automatically.
All commits to branches that pass the tests are deployed as `/helix-theblog/helix-theblog-scanner@ci` and tagged with the CI build number.
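
Before deploying, it can help to confirm that every variable listed under "Required env variables" is set. The following is a minimal sketch of such a check, not part of the project: `REQUIRED_PARAMS` and `findMissingParams` are hypothetical names, and the sketch assumes the values are available on a plain object such as `process.env` or the parameters passed to the action.

```javascript
// Hypothetical pre-flight check; the names come from the
// "Required env variables" section of this README.
const REQUIRED_PARAMS = [
  'AZURE_ONEDRIVE_CLIENT_ID',
  'AZURE_ONEDRIVE_CLIENT_SECRET',
  'AZURE_ONEDRIVE_REFRESH_TOKEN',
  'AZURE_ONEDRIVE_ADMIN_LINK',
  'OPENWHISK_API_KEY',
  'OPENWHISK_API_HOST',
  'CORALOGIX_API_KEY',
  'CORALOGIX_LOG_LEVEL',
];

// Return the names of required parameters that are undefined, null,
// or blank in the given object.
function findMissingParams(params) {
  return REQUIRED_PARAMS.filter((name) => {
    const value = params[name];
    return value === undefined || value === null || String(value).trim() === '';
  });
}
```

For example, `findMissingParams(process.env)` returns an empty array when all eight variables are set, and otherwise lists the names that still need a value.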